R Advanced

How to Find the Maximum Value By Group in R

If you want to find the maximum value within a specific subset of your data, you must find the maximum value within each group.

First, you need to group your data based on the values of one or more categorical variables (columns). The second step is identifying the maximum value of a specific numeric variable (numeric column) for each group.

Here are three approaches you can use to find the maximum value by group in R:

  1. Using dplyr (Fast, readable, and user-friendly)
  2. Using base R (No dependencies, Moderate speed)
  3. Using data.table (Fastest and Most Efficient, but Learning Curve)

To illustrate the concepts in this tutorial, we use the following demo DataFrame, which is concise and robust for coding examples.

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

Method 1: Using dplyr

Using dplyr is efficient because its main function is to modify the data frame meaningfully to get the desired output.

It provides a group_by() function to group the data into multiple subsets based on the value of a single or multiple columns. Then, we can use the summarise() function to find the maximum value in each subgroup.

Finding the maximum value by a single group

Let’s get the maximum Salary by Department:

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using dplyr
# Maximum salary per single group
employee_data %>%
  group_by(Department) %>%
  summarize(max_salary = max(Salary))

Output

The above output figure shows a tibble of four rows and two columns. The Department is a categorical column containing unique departments; the next column is max_salary, which is the maximum Salary for each Department.

Finding the maximum value by multiple groups

Let’s find the maximum Salary by Department (group 1) and location (group 2):

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Maximum value by multiple groups
employee_data %>%
  group_by(Department, Location) %>%
  summarize(max_salary = max(Salary))

Output

When you combine multiple groups, R creates unique combinations of these groups (Department and Location) based on the max_salary for each subset.

A group with no data (e.g., Sales) will be omitted by default. To retain empty groups, use .drop = FALSE in the group_by() method.

Finding a single row with the maximum value across all the columns

Using dplyr’s slice_max() function, you can find the maximum value of Salary, and it will return the complete row containing that Salary.

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Maximum value across all columns
employee_data %>%
  slice_max(Salary, n = 1)

Output

If two employees in the same Department have the same maximum Salary, use the slice_max(Salary, n = Inf) function with n = Inf to return all ties.

Method 2: Using base R functions

When it comes to base R, aggregate() is the most popular function that allows grouping data and applying a function like max. You don’t need any third-party package to use it.

Get the maximum Salary per Department

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

# Get maximum salary per group
aggregate(Salary ~ Department, data = employee_data, max)

Output

In this code, we pass Department as a grouping variable for subsetting the data frame, employee_data df as data, and the max() function to get the maximum value out of each subset.

Get maximum value by multiple groups

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Get maximum value by multiple groups
aggregate(Salary ~ Department + Location, data = employee_data, max)

Output

Get the absolute maximum value across all columns

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Get absolute maximum value across all columns
employee_data[which.max(employee_data$Salary), ]

Output

The above output shows that row 5 has the highest Salary of 85000, Department IT, Location CA, and Employee_ID E005.

Method 3: Using data.table

The data.table is highly efficient when you have a large dataset. First, you need to convert the input data frame into a data.table and then you can use its functions to get the maximum value by group(s).

Finding the maximum Salary per Department

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

employee_data_dt <- data.table(employee_data)

# Maximum salary per group
employee_data_dt[, .(max_salary = max(Salary)), by = Department]

Output

In this code, we used data.table() function to convert the input data frame employee_data to data.table (employee_data_dt). 

In the next step, we applied the max() function to the Salary column, grouped by Department, using the “by” argument. The above output is a data table, not a data frame. We get each Department’s highest Salary.

Finding the maximum Salary by Department and Group

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

employee_data_dt <- data.table(employee_data)

# Maximum value by multiple groups
employee_data_dt[, .(max_salary = max(Salary)), by = .(Department, Location)]

Output

Finding the maximum value across all columns

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

employee_data_dt <- data.table(employee_data)

# Maximum value across all columns
employee_data_dt[which.max(Salary), ]

Output

That’s all!

Recent Posts

How to Check If File and Folder Already Exists in R

Whether you are reading or writing files via programs in the file system, it is…

1 day ago

How to Check Data type of a Variable in R

When it comes to checking the data type of a variable, it depends on what…

2 days ago

Mastering grepl() Function in R

The grepl() function (stands for "grep logical") in R searches for patterns within each element…

3 days ago

zip(), unzip() and tar(), untar() Functions in R

The zip() function creates a new zip archive file. You must ensure that the zip tool…

4 days ago

How to Create Directory and File If It doesn’t Exist in R

When working with file systems, checking the directory or file existence is always better before…

5 days ago

How to Create a Grouped Boxplot in R

To create a grouped boxplot in R, we can use the ggplot2 library's aes() and…

7 days ago