R mutate() Function from dplyr

The mutate() function from the dplyr package adds new columns to a data frame or tibble, or modifies existing ones.

Syntax

mutate(df, expression)

Parameters

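  1. df: A data frame, a data frame extension (e.g. a tibble), or a lazy data frame.
  2. expression: One or more name-value pairs such as new_column = value. The name is the column to create or overwrite, and the value is computed from the existing columns.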
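As a minimal sketch of this call form, the data frame can be passed directly as the first argument or supplied through the pipe. The data frame scores and the column score_pct are hypothetical names chosen only for illustration.

```r
library(dplyr)

# Hypothetical data for illustration
scores <- data.frame(id = 1:3, score = c(80, 75, 90))

# Direct call: the data frame is the first argument
direct <- mutate(scores, score_pct = score / 100)

# Piped call: %>% passes 'scores' in as the first argument
piped <- scores %>% mutate(score_pct = score / 100)

# Compare the two results
identical(direct, piped)
```

Both forms do the same work; the piped form tends to read better once several steps are chained together.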

Visual representation

Visual representation of using the mutate function in R to add a grade column to a dataframe based on scores

Example 1: Usage of mutate() function

# Install the package if not installed
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the dplyr package
library(dplyr)

# Create a sample data frame
df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  score = c(80, 75, 90, 85, 95)
)
cat("Data Frame before using mutate() function", "\n")
df

# Create a new column 'grade' based on the 'score' column
df <- df %>%
  mutate(
    grade = ifelse(score >= 90, "A",
      ifelse(score >= 80, "B", "C")
    )
  )

cat("Modified Data Frame After using mutate() function", "\n")
df

Output

Output of mutate() function

In this code, the mutate() function takes the data frame and creates a new grade column from the score column. A score of 90 or above gets “A”, a score of at least 80 but below 90 gets “B”, and any other score gets “C”.

You can also use the mutate() function to create multiple new columns or modify existing ones simultaneously by providing multiple expressions separated by commas.
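A minimal sketch of that pattern, reusing the data frame from Example 1: a single mutate() call adds two columns and overwrites a third. The column names curved and passed, the 5-point curve, and the pass mark of 85 are assumptions made for illustration.

```r
library(dplyr)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  score = c(80, 75, 90, 85, 95)
)

result <- df %>%
  mutate(
    curved = score + 5,      # new column
    passed = curved >= 85,   # new column built from the one created just above
    id = as.integer(id)      # existing column overwritten
  )

print(result)
```

Expressions are evaluated in order, so a later expression can refer to a column created earlier in the same call.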

Example 2: Modifying an existing column

# Load the dplyr package
library(dplyr)

# Create a sample data frame
df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  score = c(80, 75, 90, 85, 95)
)
cat("Before modifying 'score' column", "\n")
df

# Overwrite the existing 'score' column with doubled values
df <- df %>% mutate(score = 2 * score)

cat("After modifying 'score' column", "\n")
df

Output

Output of modifying an existing column
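One detail worth spelling out: mutate() returns a new data frame and leaves its input unchanged, which is why the example assigns the result back to df. A short sketch, with doubled as a hypothetical name for the result:

```r
library(dplyr)

df <- data.frame(
  id = c(1, 2, 3, 4, 5),
  score = c(80, 75, 90, 85, 95)
)

# Store the result under a new name instead of overwriting df
doubled <- df %>% mutate(score = 2 * score)

df$score       # the original values are untouched
doubled$score  # the doubled values live in the new object
```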

Example 3: Calculating BMI

In this example, we create a data frame containing the heights and weights of five individuals.

We then use the mutate() function to calculate the Body Mass Index (BMI) for each individual and classify each one based on the BMI value.

# Install the package if not installed
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load the dplyr package
library(dplyr)

# Create a sample data frame
df <- data.frame(
  id = 1:5,
  height_m = c(1.65, 1.80, 1.75, 1.68, 1.90),
  weight_kg = c(60, 80, 85, 70, 95)
)

# Print the data frame
print(df)

# Compute BMI, then classify it; 'bmi' can be used as soon as it is defined
df <- df %>%
  mutate(
    bmi = weight_kg / (height_m^2),
    bmi_category = case_when(
      bmi < 18.5 ~ "Underweight",
      bmi >= 18.5 & bmi < 25 ~ "Normal weight",
      bmi >= 25 & bmi < 30 ~ "Overweight",
      TRUE ~ "Obesity"
    )
  )

# Print the modified data frame
print(df)

Output

  id height_m weight_kg
1  1     1.65        60
2  2     1.80        80
3  3     1.75        85
4  4     1.68        70
5  5     1.90        95

  id height_m weight_kg      bmi  bmi_category
1  1     1.65        60 22.03857 Normal weight
2  2     1.80        80 24.69136 Normal weight
3  3     1.75        85 27.75510    Overweight
4  4     1.68        70 24.80159 Normal weight
5  5     1.90        95 26.31579    Overweight
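Because case_when() checks its conditions in order and uses the first one that is TRUE, the lower-bound checks are optional. The sketch below shows this on a few standalone BMI values. The helper name classify_bmi and the cutoffs 18.5, 25, and 30 are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical helper: classify BMI values with ordered conditions
classify_bmi <- function(bmi) {
  case_when(
    bmi < 18.5 ~ "Underweight",
    bmi < 25   ~ "Normal weight",  # reached only when bmi >= 18.5
    bmi < 30   ~ "Overweight",     # reached only when bmi >= 25
    TRUE       ~ "Obesity"         # everything that matched nothing above
  )
}

categories <- classify_bmi(c(17, 22.04, 27.76, 31))
print(categories)
```

A missing BMI matches none of the comparisons and would fall through to the TRUE branch, so data that may contain NA values is better handled with an explicit is.na() condition placed first.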

That’s it!
