How to Use the mutate() Function in R

The mutate() function from the dplyr package adds new columns to a data frame, or modifies existing ones, while keeping all other columns.

Syntax

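mutate(.data, ...)

Parameters

  1. .data: A data frame, a data frame extension (e.g. a tibble), or a lazy data frame.
  2. ...: Name-value pairs. Each name is the column to create or overwrite, and each value is an expression computed from the existing columns.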

Example 1

Before using the mutate() function, you need to install and load the dplyr package.

# Install the package if not installed
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the dplyr package
library(dplyr)

# Create a sample data frame
df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  score = c(80, 75, 90, 85, 95)
)

# Create a new column 'grade' based on the 'score' column
df <- df %>%
  mutate(
    grade = ifelse(score >= 90, "A", ifelse(score >= 80, "B", "C"))
  )

# Print the modified data frame
print(df)

Output

  id score grade
1  1    80     B
2  2    75     C
3  3    90     A
4  4    85     B
5  5    95     A

In this code, the mutate() function takes the data frame and creates a new grade column based on the score column. The new column holds "A" for scores of 90 or above, "B" for scores from 80 to 89, and "C" for anything lower.
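Note that the code above assigns the result back to df. The following is a minimal sketch of why that matters, using a made-up data frame (scores) and a made-up column (passed): mutate() returns a new data frame and leaves its input unchanged, and the piped form is equivalent to passing the data frame as the first argument.

```r
library(dplyr)

# Hypothetical data for illustration only
scores <- data.frame(id = 1:3, score = c(80, 75, 90))

# The piped form and the direct call produce the same result
piped  <- scores %>% mutate(passed = score >= 80)
direct <- mutate(scores, passed = score >= 80)

identical(piped, direct)  # TRUE

# The input is untouched unless the result is assigned back to it
names(scores)             # "id" "score"
names(piped)              # "id" "score" "passed"
```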

You can also use the mutate() function to create multiple new columns or modify existing ones simultaneously by providing multiple expressions separated by commas.
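As a small sketch of this, assuming a made-up items data frame (the column names are illustrative, not from the examples in this article), a single mutate() call can add two columns and overwrite a third. Expressions are evaluated in order, so a later expression can use a column created earlier in the same call.

```r
library(dplyr)

# Hypothetical data for illustration only
items <- data.frame(
  item  = c("pen", "book", "bag"),
  price = c(2, 10, 25),
  qty   = c(10, 3, 1)
)

items <- items %>%
  mutate(
    total = price * qty,         # new column
    share = total / sum(total),  # new column that uses 'total' from the line above
    price = price * 1.1          # overwrites the existing 'price' column
  )

print(items)
```

New columns are appended on the right, while an overwritten column keeps its original position. Because total is computed before price is overwritten, it uses the original prices.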

Example 2

In this example, we create a data frame containing the heights and weights of five individuals.

We then use the mutate() function to calculate the Body Mass Index (BMI) for each individual and classify them based on their BMI values.

# Install the package if not installed
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the dplyr package
library(dplyr)

# Create a sample data frame
df <- data.frame(
  id = 1:5,
  height_m = c(1.65, 1.80, 1.75, 1.68, 1.90),
  weight_kg = c(60, 80, 85, 70, 95)
)

# Print the data frame
print(df)

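# Add a 'bmi' column, then classify it in 'bmi_category'
# (cutoffs of 18.5, 25, and 30 follow the commonly used BMI ranges)
df <- df %>%
  mutate(
    bmi = weight_kg / (height_m^2),
    bmi_category = case_when(
      bmi < 18.5 ~ "Underweight",
      bmi >= 18.5 & bmi < 25 ~ "Normal weight",
      bmi >= 25 & bmi < 30 ~ "Overweight",
      TRUE ~ "Obesity"
    )
  )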

# Print the modified data frame
print(df)

Output

  id height_m weight_kg
1  1     1.65        60
2  2     1.80        80
3  3     1.75        85
4  4     1.68        70
5  5     1.90        95

  id height_m weight_kg      bmi  bmi_category
1  1     1.65        60 22.03857 Normal weight
2  2     1.80        80 24.69136 Normal weight
3  3     1.75        85 27.75510    Overweight
4  4     1.68        70 24.80159 Normal weight
5  5     1.90        95 26.31579    Overweight

In this example, the mutate() function creates a new column bmi by dividing the weight_kg column by the square of the height_m column.

Then, another new column, bmi_category, is created using the case_when() function to classify individuals based on their calculated BMI values.
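case_when() checks its conditions from top to bottom and uses the first one that matches. The sketch below illustrates this on a made-up vector of BMI values, assuming the commonly used cutoffs of 18.5, 25, and 30. Because earlier conditions have already been ruled out, the lower bounds can be left implicit.

```r
library(dplyr)

# Hypothetical BMI values for illustration only
bmi <- c(17.0, 22.0, 27.0, 31.0, NA)

category <- case_when(
  bmi < 18.5 ~ "Underweight",
  bmi < 25   ~ "Normal weight",  # only reached when bmi >= 18.5
  bmi < 30   ~ "Overweight",     # only reached when bmi >= 25
  bmi >= 30  ~ "Obesity"
)

print(category)
```

With an explicit final condition, a missing BMI matches nothing and stays NA. A catch-all such as TRUE ~ "Obesity" would instead label missing values as "Obesity", so which form is preferable depends on whether the data can contain NA.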
