How to Create a Grouped Boxplot in R

To create a grouped boxplot in R, we can use the aes() and geom_boxplot() functions from the ggplot2 package. The aes() function maps the continuous and categorical variables to visual properties of the plot, such as the axes and the fill color. The geom_boxplot() function draws the boxplots.

There is a difference between a standard boxplot and a grouped boxplot.

A standard boxplot provides a snapshot of key statistics such as the median, quartiles, interquartile range (IQR), and potential outliers. It shows how the values of a single variable are spread along the vertical axis.
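As a quick illustration of the statistics a boxplot summarizes, the following sketch computes them with base R. The vector x is hypothetical and its values are made up purely for illustration.

```r
# Hypothetical sample values, chosen only for illustration
x <- c(2.1, 2.8, 3.0, 3.2, 3.5, 3.9, 4.1, 7.5)

# Median and quartiles
x_median <- median(x)
x_quartiles <- quantile(x, probs = c(0.25, 0.75))

# Interquartile range (third quartile minus first quartile)
x_iqr <- IQR(x)

# boxplot.stats() returns the five numbers a boxplot draws ($stats)
# and any points flagged as potential outliers ($out)
x_stats <- boxplot.stats(x)

print(x_median)
print(x_quartiles)
print(x_iqr)
print(x_stats$out)
```

In this sample, the value 7.5 lies far above the rest of the data, so boxplot.stats() reports it as a potential outlier.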

A grouped boxplot shows the distribution of a continuous variable across multiple groups or categories. Each group or category gets its own boxplot. By placing the boxplots side by side, you can compare the median, IQR, range, and outliers across groups. It highlights the central tendency of the data and how its spread varies across the provided categories.

When you are attempting to create a grouped boxplot, you first need to decide which variable will be your continuous variable and which will be your categorical variable(s).

A categorical variable classifies observations into distinct groups, such as diet type.

A continuous variable is a numeric variable that can take an infinite number of values within a given range. Without both a continuous variable and at least one categorical variable, you can’t create a grouped boxplot.
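A minimal sketch of how the two kinds of variables can be told apart in R. The data frame df_example and its columns Group and Score are hypothetical names used only for illustration.

```r
# Hypothetical mini data frame, for illustration only
df_example <- data.frame(
  Group = c("A", "A", "B", "B"),
  Score = c(1.5, 2.25, 3.0, 4.75)
)

# Store the grouping column as a factor so R treats it as categorical
df_example$Group <- factor(df_example$Group)

# Check how R sees each column
group_is_categorical <- is.factor(df_example$Group)
score_is_continuous <- is.numeric(df_example$Score)

print(group_is_categorical)
print(score_is_continuous)
print(levels(df_example$Group))
```

A factor column is a natural fit for the x axis or fill of a grouped boxplot, while a numeric column supplies the values being summarized.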

Let’s take the df_study data frame as an example and decide which variables are continuous and which are categorical:

set.seed(123)

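df_study <- data.frame(
  # 20 rows per diet, so the labels line up with the WeightLoss blocks below
  DietType = rep(c("Vegan", "Vegetarian"), each = 20),
  # Within each diet: 10 rows with no exercise, then 10 with regular exercise
  ExerciseRegimen = rep(c("None", "Regular"), each = 10, times = 2),
  WeightLoss = c(
    rnorm(10, mean = 3, sd = 0.5),
    rnorm(10, mean = 5, sd = 0.7),
    rnorm(10, mean = 2, sd = 0.5),
    rnorm(10, mean = 4, sd = 0.6)
  )
)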

The resulting data frame has 40 rows and three columns.

In this data frame, we can identify:

Two categorical variables: DietType and ExerciseRegimen
One continuous variable: WeightLoss
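One way to confirm this classification is to inspect the column types and count the rows in each group. The sketch below assumes a balanced layout of 10 observations per diet and exercise combination, matching the plotting code later in this article. The name df_check and the placeholder WeightLoss values are assumptions made for illustration.

```r
set.seed(123)

# Illustrative data frame with a balanced 2 x 2 layout (10 rows per cell)
df_check <- data.frame(
  DietType = rep(c("Vegan", "Vegetarian"), each = 20),
  ExerciseRegimen = rep(c("None", "Regular"), each = 10, times = 2),
  WeightLoss = rnorm(40, mean = 3, sd = 0.5)  # placeholder values
)

# Column types: the two grouping columns are text, WeightLoss is numeric
str(df_check)

# Number of observations in each diet and exercise combination
counts <- table(df_check$DietType, df_check$ExerciseRegimen)
print(counts)
```

If any cell of the table is empty or much smaller than the others, the corresponding box in a grouped boxplot will be missing or based on very few points.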

Now that we have the basic variables, we can construct a grouped boxplot:

# Load ggplot2 package
library(ggplot2)

# Set seed for reproducibility
set.seed(123)

# Create the dataset with 40 observations:
# - 20 observations for "Vegan" (10 with "None" and 10 with "Regular")
# - 20 observations for "Vegetarian" (10 with "None" and 10 with "Regular")
df_diet <- data.frame(
  DietType = rep(c("Vegan", "Vegetarian"), each = 20),
  ExerciseRegimen = rep(c("None", "Regular"), each = 10, times = 2),
  WeightLoss = c(
    rnorm(10, mean = 3, sd = 0.5), # Vegan diet, No exercise (10 observations)
    rnorm(10, mean = 5, sd = 0.7), # Vegan diet, Regular exercise (10 observations)
    rnorm(10, mean = 2, sd = 0.5), # Vegetarian diet, No exercise (10 observations)
    rnorm(10, mean = 4, sd = 0.6) # Vegetarian diet, Regular exercise (10 observations)
  )
)

# Creating a grouped boxplot
ggplot(df_diet, aes(x = DietType, y = WeightLoss, fill = ExerciseRegimen)) +
  geom_boxplot(position = position_dodge(width = 0.75)) +
  labs(
    title = "Weight Loss by Diet Type and Exercise Regimen",
    x = "Diet Type",
    y = "Weight Loss (kg)",
    fill = "Exercise Regimen"
  ) +
  theme_minimal()

Run the above code in RStudio to see the chart.

The boxplot compares the distribution of weight loss for the Vegan and Vegetarian diets, split by exercise regimen (None or Regular).

Each box represents the interquartile range (IQR), and the line inside the box marks the median.
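To see the numbers behind each box, the median and IQR can be computed per group. This is a minimal sketch using base R's aggregate() on the same simulated data; the name group_summary is an assumption introduced here for illustration.

```r
set.seed(123)

# Same simulated data as in the plotting code above
df_diet <- data.frame(
  DietType = rep(c("Vegan", "Vegetarian"), each = 20),
  ExerciseRegimen = rep(c("None", "Regular"), each = 10, times = 2),
  WeightLoss = c(
    rnorm(10, mean = 3, sd = 0.5),
    rnorm(10, mean = 5, sd = 0.7),
    rnorm(10, mean = 2, sd = 0.5),
    rnorm(10, mean = 4, sd = 0.6)
  )
)

# Median and IQR of WeightLoss for each diet and exercise combination
group_summary <- aggregate(
  WeightLoss ~ DietType + ExerciseRegimen,
  data = df_diet,
  FUN = function(v) c(median = median(v), IQR = IQR(v))
)

print(group_summary)
```

Each row of the result corresponds to one box in the grouped boxplot.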

The above visualization helps you understand the effect of diet and exercise on weight loss clearly and meaningfully.

Base R also provides the boxplot() function to create a grouped boxplot, but it lacks ggplot2's polished defaults and layered customization. That’s why it is usually the less convenient option.
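For comparison, here is a minimal sketch of the base R approach, assuming the same simulated data as above. The formula interface draws one box per combination of the two grouping variables; the colors are arbitrary choices made for this example.

```r
set.seed(123)

# Same simulated data as in the ggplot2 example
df_diet <- data.frame(
  DietType = rep(c("Vegan", "Vegetarian"), each = 20),
  ExerciseRegimen = rep(c("None", "Regular"), each = 10, times = 2),
  WeightLoss = c(
    rnorm(10, mean = 3, sd = 0.5),
    rnorm(10, mean = 5, sd = 0.7),
    rnorm(10, mean = 2, sd = 0.5),
    rnorm(10, mean = 4, sd = 0.6)
  )
)

# One box per ExerciseRegimen x DietType combination;
# ExerciseRegimen varies fastest, so the two colors alternate by regimen
bp <- boxplot(
  WeightLoss ~ ExerciseRegimen + DietType,
  data = df_diet,
  col = c("tomato", "steelblue"),
  main = "Weight Loss by Diet Type and Exercise Regimen",
  xlab = "Exercise Regimen and Diet Type",
  ylab = "Weight Loss (kg)"
)

# The legend has to be added manually in base R
legend(
  "topright",
  legend = c("None", "Regular"),
  fill = c("tomato", "steelblue"),
  title = "Exercise Regimen"
)
```

Unlike ggplot2, base R does not build the legend from the grouping variable, which is one example of the extra manual work involved.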

Krunal Lathiya

