How to Rank Variables by Group Using dplyr in R

To rank variables by group using dplyr in R, combine the arrange(), group_by(), and mutate() functions with base R's rank().

Syntax

df %>%
  arrange(group_var, numeric_var) %>%
  group_by(group_var) %>%
  mutate(rank = rank(numeric_var))

Example 1: Rank in Ascending Order

library(dplyr)

df <- data.frame(
  Age = c(20, 21, 19, 22, 23, 20, 21),
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female", "Male"),
  Score = c(85, 90, 88, 78, 92, 80, 87)
)

ranked_students <- df %>%
  arrange(Gender, Score) %>%
  group_by(Gender) %>%
  mutate(rank = rank(Score))

ranked_students

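Output

# A tibble: 7 × 4
# Groups:   Gender [2]
    Age Gender Score  rank
  <dbl> <chr>  <dbl> <dbl>
1    22 Female    78     1
2    20 Female    80     2
3    21 Female    90     3
4    20 Male      85     1
5    21 Male      87     2
6    19 Male      88     3
7    23 Male      92     4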

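The arrange(group_var, numeric_var) function sorts the data frame by group_var and then by numeric_var. Sorting only changes the order in which rows are displayed; it does not affect the ranks themselves.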

The group_by(group_var) function groups the data by the group_var.

The mutate(rank = rank(numeric_var)) function calculates the rank of numeric_var within each group.
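
A minimal sketch, reusing the Gender and Score columns from Example 1, suggests that arrange() only controls the row order of the result, while the ranks come from group_by() and mutate() alone. The names unsorted and sorted are illustrative, and ungroup() is added here as an assumption about good practice so that later steps are not silently grouped.

```r
library(dplyr)

df <- data.frame(
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female", "Male"),
  Score = c(85, 90, 88, 78, 92, 80, 87)
)

# Without arrange(): rows keep their original order
unsorted <- df %>%
  group_by(Gender) %>%
  mutate(rank = rank(Score)) %>%
  ungroup()

# With arrange(): rows are sorted first, ranks are computed the same way
sorted <- df %>%
  arrange(Gender, Score) %>%
  group_by(Gender) %>%
  mutate(rank = rank(Score)) %>%
  ungroup()

# Sorting the unsorted result afterwards yields the same ranks
identical(arrange(unsorted, Gender, Score)$rank, sorted$rank)
```

If the final comparison returns TRUE, arrange() can be dropped whenever the display order does not matter.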

Example 2: Rank in Descending Order

library(dplyr)

df <- data.frame(
  Age = c(20, 21, 19, 22, 23, 20, 21),
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female", "Male"),
  Score = c(85, 90, 88, 78, 92, 80, 87)
)

ranked_students <- df %>%
  arrange(Gender, Score) %>%
  group_by(Gender) %>%
  mutate(rank = rank(-Score))

ranked_students

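Output

# A tibble: 7 × 4
# Groups:   Gender [2]
    Age Gender Score  rank
  <dbl> <chr>  <dbl> <dbl>
1    22 Female    78     3
2    20 Female    80     2
3    21 Female    90     1
4    20 Male      85     4
5    21 Male      87     3
6    19 Male      88     2
7    23 Male      92     1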
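
As an alternative sketch, dplyr's desc() helper can replace the minus sign, and using it inside arrange() as well lists the rows in rank order. The name ranked_desc is illustrative and not part of the original example.

```r
library(dplyr)

df <- data.frame(
  Age = c(20, 21, 19, 22, 23, 20, 21),
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female", "Male"),
  Score = c(85, 90, 88, 78, 92, 80, 87)
)

ranked_desc <- df %>%
  arrange(Gender, desc(Score)) %>%      # highest score first within each gender
  group_by(Gender) %>%
  mutate(rank = rank(desc(Score))) %>%  # rank 1 goes to the highest score
  ungroup()

ranked_desc
```

With this variant the top scorer in each group appears first and carries rank 1, which may be easier to read than ascending rows paired with descending ranks.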

How to Handle Ties in Ranking in R

Handling ties in ranking is an important consideration. In R, the rank() function handles ties through its ties.method argument, which accepts the following values:

  1. "average" (default): Tied values each receive the mean of the ranks they would otherwise occupy. For example, two values tied for 3rd and 4th place both receive rank 3.5.
  2. "first": Tied values receive consecutive ranks in the order they appear in the data.
  3. "last": Tied values receive consecutive ranks in the reverse of the order they appear in the data.
  4. "random": Tied values receive consecutive ranks in random order, so results can change between runs unless you set a seed.
  5. "max": All tied values receive the largest of the ranks they occupy (4 in the example above).
  6. "min": All tied values receive the smallest of the ranks they occupy (3 in the example above).
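
To make the differences concrete, here is a small sketch on a hypothetical vector x in which two values tie for 3rd and 4th place. The "random" method is left out because its result varies from run to run.

```r
# Hypothetical scores; the two 70s tie for 3rd and 4th place
x <- c(50, 60, 70, 70, 80)

rank(x, ties.method = "average")  # 1.0 2.0 3.5 3.5 5.0
rank(x, ties.method = "first")    # 1 2 3 4 5
rank(x, ties.method = "last")     # 1 2 4 3 5
rank(x, ties.method = "min")      # 1 2 3 3 5
rank(x, ties.method = "max")      # 1 2 4 4 5
```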

Here’s how you can use these methods with the rank() function in R:

# Using the "average" method (default)
ranked_avg <- df %>%
  arrange(Gender, Score) %>%
  group_by(Gender) %>%
  mutate(rank_avg = rank(Score, ties.method = "average"))

# Using the "min" method
ranked_min <- df %>%
  arrange(Gender, Score) %>%
  group_by(Gender) %>%
  mutate(rank_min = rank(Score, ties.method = "min"))

Choose the method that best fits the context and purpose of your ranking. For many applications, the default “average” method is suitable.
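
Note that the Score column used throughout this post contains no tied values, so every method returns the same ranks on that data. The sketch below uses a hypothetical data frame, scores, with one tie per group to show how the methods diverge within groups. The column names rank_avg, rank_min, and rank_first are illustrative.

```r
library(dplyr)

# Hypothetical data with one tie in each group
scores <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Male", "Male"),
  Score = c(80, 80, 90, 70, 85, 85)
)

tie_ranks <- scores %>%
  group_by(Gender) %>%
  mutate(
    rank_avg = rank(Score, ties.method = "average"),  # tied rows share the mean rank
    rank_min = rank(Score, ties.method = "min"),      # tied rows share the lowest rank
    rank_first = rank(Score, ties.method = "first")   # ties broken by row order
  ) %>%
  ungroup()

tie_ranks
```

In the Female group, the two scores of 80 get 1.5 and 1.5 under "average", 1 and 1 under "min", and 1 and 2 under "first".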

That’s it!
