The stat_summary() function from the ggplot2 package computes a summary of the y values at each unique x value (within each group) and draws the result with the geom you choose. The summary can be any statistic you supply, such as the mean, median, minimum, maximum, or standard deviation.
Syntax
stat_summary(mapping = NULL, data = NULL,
             geom = "pointrange", position = "identity",
             ..., fun.data = NULL, fun = NULL,
             fun.max = NULL, fun.min = NULL, fun.args = list())
Parameters
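- mapping: Aesthetic mapping, usually constructed with aes().
- data: A layer-specific dataset, needed only if you want to override the plot's default data.
- geom: The geometric object used to display the summarised data.
- position: The position adjustment to use for overlapping points on this layer.
- fun, fun.min, fun.max: Functions that each take a numeric vector and return a single number, used for y, ymin, and ymax respectively.
- fun.data: A function that takes a numeric vector and returns a data frame with the columns y, ymin, and ymax.
- fun.args: A list of extra arguments passed on to the summary functions.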
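To make the role of the summary functions concrete, here is a minimal sketch that assumes ggplot2 3.3.0 or later, where the fun, fun.min, and fun.max arguments are available. It draws the median highway mileage per class with a range from minimum to maximum, then uses layer_data() to inspect the values the layer computed. The names p and summary_df are illustrative only.

```r
library(ggplot2)

# Median highway mileage per class, with a range from min to max.
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  stat_summary(
    fun = median,      # value mapped to y
    fun.min = min,     # value mapped to ymin
    fun.max = max,     # value mapped to ymax
    geom = "pointrange"
  )

# layer_data() returns the data frame the stat computed for layer 1.
summary_df <- layer_data(p, 1)
print(summary_df[, c("x", "y", "ymin", "ymax")])
```

Each row of summary_df corresponds to one class on the x axis, which is a convenient way to check what a summary layer will actually draw.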
Example 1: Usage of stat_summary()
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point() +
  stat_summary(fun = mean, geom = "line", aes(group = cyl))
Output
Example 2: Showing the mean with a bootstrapped confidence interval
library(ggplot2)
# mean_cl_boot requires the Hmisc package to be installed
ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(
    fun.data = "mean_cl_boot",  # Mean with bootstrapped confidence limits
    geom = "errorbar"           # Use error bars
  ) +
  stat_summary(
    fun = "mean",               # Calculate mean (fun.y is deprecated since ggplot2 3.3.0)
    geom = "point"              # Use points
  )
Output
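Note that mean_cl_boot draws a bootstrapped confidence interval for the mean, not a standard deviation, and it depends on the Hmisc package. If error bars of one standard deviation are what you want, one option is to pass your own function to fun.data. The sketch below assumes a hypothetical helper named mean_sd, which is not part of ggplot2; it only needs to return a data frame with the columns y, ymin, and ymax.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): mean plus/minus one
# standard deviation, in the y / ymin / ymax format fun.data expects.
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2) +
  stat_summary(fun = mean, geom = "point", size = 3)

p
```

Writing the helper yourself avoids the Hmisc dependency and makes it explicit what the error bars represent.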
Example 3: Overlaying the mean on a boxplot using the “mpg” dataset
library(ggplot2)
ggplot(mpg, aes(class, hwy)) +
  geom_boxplot() +
  stat_summary(
    fun = "mean", geom = "point",
    shape = 20, size = 3, color = "red"
  )
Output
stat_summary() is helpful for datasets with grouping variables, where you want to compare statistical summaries across groups.
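As a quick sanity check on what is being compared across groups, the following sketch (assuming the mpg dataset bundled with ggplot2) confirms that the per-class means plotted by stat_summary() match the means computed directly with base R's aggregate(). The variable names plotted and manual are illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, y = hwy)) +
  stat_summary(fun = mean, geom = "point")

# Values computed by the stat, one row per class.
plotted <- layer_data(p, 1)

# The same group means computed directly with base R.
manual <- aggregate(hwy ~ class, data = mpg, FUN = mean)

# Both results are ordered alphabetically by class, so rows line up.
print(cbind(manual, plotted_y = plotted$y))
```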