R stat_summary() Function

The stat_summary() function from the ggplot2 package computes a statistical summary of the y values at each unique x value (or within each group) and draws the result with a geom of your choice. Typical summaries are the mean, median, maximum, or minimum, or an interval such as the mean plus or minus one standard deviation.

Syntax

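stat_summary(mapping = NULL, data = NULL,
             geom = "pointrange", position = "identity",
             ...,
             fun.data = NULL, fun = NULL,
             fun.max = NULL, fun.min = NULL,
             fun.args = list())

Only the most commonly used arguments are shown here.

Parameters

  1. mapping: Aesthetic mapping, usually constructed with aes().
  2. data: A layer-specific dataset; only needed if you want to override the plot defaults.
  3. geom: The geometric object used to display the summary. The default is "pointrange".
  4. position: The position adjustment to use for overlapping points on this layer.
  5. fun.data: A function that takes the full vector of y values and returns a data frame with the columns y, ymin, and ymax.
  6. fun, fun.min, fun.max: Functions that each take a vector of y values and return a single number, used for y, ymin, and ymax respectively.
  7. fun.args: A list of extra arguments passed on to the summary functions.

If no summary function is supplied, stat_summary() falls back to mean_se(), which gives the mean plus or minus one standard error.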
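
To make the role of the summary functions concrete, here is a minimal sketch that passes three single-number functions through the fun, fun.min, and fun.max arguments of stat_summary(). The object names p_range and range_values are chosen only for this illustration. The layer_data() helper from ggplot2 is used to look at the values the layer actually computed.

```r
library(ggplot2)

# Median as the point, with a line from the minimum to the maximum,
# drawn by the default "pointrange" geom.
p_range <- ggplot(mpg, aes(x = class, y = hwy)) +
  stat_summary(fun = median, fun.min = min, fun.max = max)

p_range

# layer_data() returns the data computed for a layer:
# one row per class, with the columns ymin, y, and ymax.
range_values <- layer_data(p_range)
head(range_values[, c("x", "ymin", "y", "ymax")])
```

Each row of range_values corresponds to one vehicle class, so the plot is simply a drawing of that small summary table.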

Example 1: Usage of stat_summary()

library(ggplot2)

ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point() +
  stat_summary(fun = mean, geom = "line", aes(group = cyl))

Output

Scatter plot of hwy against displ, with a line through the mean hwy of each cyl group

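Example 2: Showing the mean and a bootstrapped confidence interval for a given dataset

library(ggplot2)

# mean_cl_boot() is a wrapper around Hmisc::smean.cl.boot(),
# so the Hmisc package must be installed for this example to run.
ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(
    fun.data = "mean_cl_boot", # Mean with a bootstrapped 95% confidence interval
    geom = "errorbar"          # Draw the interval as error bars
  ) +
  stat_summary(
    fun = "mean",  # Calculate the mean (fun.y is deprecated since ggplot2 3.3.0)
    geom = "point" # Draw the mean as a point
  )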
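
Note that mean_cl_boot produces a bootstrapped confidence interval for the mean, not a standard deviation. If the mean plus or minus one standard deviation is what you want, one option is to pass a small helper to fun.data. The sketch below assumes a helper named mean_sd, which is a hypothetical name defined here and not part of ggplot2. It returns a one-row data frame with the columns y, ymin, and ymax, and it does not need the Hmisc package.

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): mean plus or minus one
# standard deviation, in the y / ymin / ymax layout used by fun.data.
mean_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p_sd <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun.data = mean_sd, geom = "errorbar", width = 0.2) + # SD bars
  stat_summary(fun = mean, geom = "point")                           # Mean point

p_sd
```

The width argument is passed on to the error bar geom and only controls how wide the bar caps are drawn.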

Output

Plot of the mean Sepal.Length for each species, with bootstrapped confidence-interval error bars

Example 3: Marking group means on a box plot of the mpg dataset

library(ggplot2)

ggplot(mpg, aes(class, hwy)) + 
  geom_boxplot() +
  stat_summary(fun = "mean", geom = "point", shape = 20, size = 3, color = "red")

Output

Box plots of hwy by class, with the mean of each class marked as a red point

stat_summary() is most helpful for datasets with grouping variables, where you want to compare statistical summaries across groups.
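
As a rough illustration of comparing summaries across groups, the sketch below uses the built-in mtcars dataset with two grouping variables: the number of cylinders on the x axis and the transmission type as colour. It relies on mean_se(), which ships with ggplot2, and on position_dodge() to place the two transmission groups side by side. The object name p_groups is chosen only for this example.

```r
library(ggplot2)

# Mean mpg plus or minus one standard error for every combination
# of cylinder count (x axis) and transmission type (colour).
p_groups <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, colour = factor(am))) +
  stat_summary(
    fun.data = mean_se,                      # Returns y, ymin, ymax
    geom = "pointrange",
    position = position_dodge(width = 0.4)   # Separate the two am groups
  )

p_groups
```

Because colour is mapped to a discrete variable, ggplot2 treats each cyl and am combination as its own group and computes one summary per combination.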
