What Is the stat_summary() Function in R

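The stat_summary() function from the ggplot2 package offers considerable flexibility in how summary functions are specified. You can pass a single function that returns a data frame with the columns y, ymin, and ymax (argument fun.data), or separate functions that each reduce a vector to a single number (arguments fun, fun.max, and fun.min, which were named fun.y, fun.ymax, and fun.ymin before ggplot2 3.3.0).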
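The two ways of supplying a summary function can be compared side by side. The sketch below assumes ggplot2 3.3.0 or later (where the vector arguments are called fun, fun.min, and fun.max) and uses mean_se(), which ships with ggplot2; std_err() is a hypothetical helper written only for this illustration.

```r
library(ggplot2)

# fun.data: a single function that receives the y values of one group
# and returns a data frame with the columns y, ymin and ymax.
# mean_se() returns the mean minus/plus one standard error.
p_df <- ggplot(mpg, aes(class, hwy)) +
  stat_summary(fun.data = mean_se)

# fun / fun.min / fun.max: three separate functions, each reducing
# a numeric vector to a single number.
std_err <- function(x) sqrt(var(x) / length(x))  # hypothetical helper
p_vec <- ggplot(mpg, aes(class, hwy)) +
  stat_summary(
    fun     = mean,
    fun.min = function(x) mean(x) - std_err(x),
    fun.max = function(x) mean(x) + std_err(x)
  )
```

Both plots use the default "pointrange" geom and should show the same points and ranges; the data-frame form is handy when one function already computes all three values.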

The stat_summary() function calculates summary statistics of the y values at each x position, such as the mean, median, maximum, minimum, or standard deviation. It takes a summary function as an argument, such as mean, median, max, min, or sd. Base R has no q1 or q3 function, but quartiles can be computed with a small wrapper around quantile().
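As a minimal sketch of a custom summary function, the snippet below plots the median with the first and third quartiles as the range. The names q1 and q3 are hypothetical helpers defined here with quantile(); they are not built-in R functions. The choice of drv as the grouping variable is arbitrary.

```r
library(ggplot2)

# Hypothetical helpers: first and third quartile of a numeric vector.
# unname() drops the "25%" / "75%" labels that quantile() attaches.
q1 <- function(x) unname(quantile(x, 0.25))
q3 <- function(x) unname(quantile(x, 0.75))

# Median as the point, quartiles as the ends of the range,
# drawn with the default "pointrange" geom.
p <- ggplot(mpg, aes(drv, hwy)) +
  stat_summary(fun = median, fun.min = q1, fun.max = q3)

p
```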

Syntax

stat_summary(mapping = NULL, data = NULL,
             geom = "pointrange", position = "identity", ...)

Parameters

mapping: Aesthetic mapping, usually constructed with aes(). The older aes_string() is deprecated in current ggplot2 releases.

data: A layer-specific dataset. It is only needed if you want to override the plot defaults.

geom: The geometric object to use to display the data.

position: The position adjustment to use for overlapping points on this layer.

...: Other arguments passed on to layer(). This can include aesthetics whose values you want to set, not map.
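The parameters above can be seen together in one layer. This is an illustrative sketch only: subset_mpg and the choice of the "compact" and "suv" classes are assumptions made for the example, and fun requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Layer-specific data: summarise only two vehicle classes (arbitrary choice).
subset_mpg <- mpg[mpg$class %in% c("compact", "suv"), ]

p <- ggplot(mpg, aes(class, hwy)) +
  geom_point(colour = "grey70") +
  stat_summary(
    data     = subset_mpg,          # data: overrides the plot-level data
    mapping  = aes(colour = class), # mapping: colour varies with class
    fun      = mean,
    geom     = "point",             # geom: draw the summary as points
    position = "identity",          # position: no adjustment
    size     = 4                    # passed through ...: set, not mapped
  )

p
```

Here colour is mapped (it depends on the data through aes()), while size is set to a constant for the whole layer.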

Example 1

The stat_summary() function is often layered on top of geoms such as geom_point() or geom_boxplot() to add summary points or lines to a graph. Its geom argument controls how the summary is drawn, for example geom = "point" or geom = "line". It is useful for quickly visualizing summary statistics across different groups or categories in the data.

library(ggplot2)

ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point() +
  stat_summary(fun = mean, geom = "line", aes(group = cyl))

Output

[Figure: scatterplot of hwy against displ, colored by cyl, with a mean line for each cyl group]

This code draws a scatterplot of highway miles per gallon (hwy) against engine displacement (displ) from the mpg dataset and adds one summary line for each unique value of the cyl variable. Each line connects the mean hwy at every distinct displ value within that cylinder group.
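To check what the summary layer computes, its values can be pulled out with layer_data() and compared with means calculated by hand. This is a sketch of one way to verify the plot, not part of the plotting code itself; line_data and manual are names chosen for the illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point() +
  stat_summary(fun = mean, geom = "line", aes(group = cyl))

# Values computed by the stat_summary() layer (layer 2 of the plot).
line_data <- layer_data(p, 2)

# The same means computed by hand: one value per cyl/displ combination.
manual <- aggregate(hwy ~ cyl + displ, data = mpg, FUN = mean)

# Both should contain the same set of mean values.
all.equal(sort(line_data$y), sort(manual$hwy))
```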

Example 2

library(ggplot2)

ggplot(mpg, aes(class, hwy)) + 
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape = 20, size = 3, color = "red")

Output

[Figure: boxplot of hwy for each vehicle class, with the class mean marked as a red point]

In this example, we created a boxplot of highway miles per gallon (hwy) for each class of vehicle (class) in the mpg dataset.

The stat_summary() function adds a red point to the plot for each class’s mean highway miles per gallon.
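The red point marks the mean, while the line inside each box marks the median, so the two can differ. The sketch below computes both by hand with tapply() and reads the plotted values back with layer_data(); the variable names are chosen for this illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape = 20, size = 3, color = "red")

# Mean and median of hwy per vehicle class, computed by hand.
class_means   <- tapply(mpg$hwy, mpg$class, mean)
class_medians <- tapply(mpg$hwy, mpg$class, median)

# Values actually plotted: box middle line (layer 1) and red point (layer 2).
box_middle <- layer_data(p, 1)$middle
red_points <- layer_data(p, 2)$y

# Difference between the red point and the box's middle line, per class.
round(class_means - class_medians, 2)
```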

That’s it.
