How to Find the Maximum Value By Group in R

If you want to find the maximum value within a specific subset of your data, you must find the maximum value within each group.

First, you need to group your data based on the values of one or more categorical variables (columns). The second step is identifying the maximum value of a specific numeric variable (numeric column) for each group.

Here are three approaches you can use to find the maximum value by group in R:

  1. Using dplyr (Fast, readable, and user-friendly)
  2. Using base R (No dependencies, Moderate speed)
  3. Using data.table (Fastest and Most Efficient, but Learning Curve)

To illustrate the concepts in this tutorial, we use the following demo DataFrame, which is concise and robust for coding examples.

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

employee_data DataFrame

Method 1: Using dplyr

Using dplyr is efficient because its main function is to modify the data frame meaningfully to get the desired output.

It provides a group_by() function to group the data into multiple subsets based on the value of a single or multiple columns. Then, we can use the summarise() function to find the maximum value in each subgroup.

Finding the maximum value by a single group

Let’s get the maximum Salary by Department:

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using dplyr
# Maximum salary per single group
employee_data %>%
  group_by(Department) %>%
  summarize(max_salary = max(Salary))

Output

Finding maximum value by a single group using dplyr

The above output figure shows a tibble of four rows and two columns. The Department is a categorical column containing unique departments; the next column is max_salary, which is the maximum Salary for each Department.

Finding the maximum value by multiple groups

Let’s find the maximum Salary by Department (group 1) and location (group 2):

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Maximum value by multiple groups
employee_data %>%
  group_by(Department, Location) %>%
  summarize(max_salary = max(Salary))

Output

Finding maximum value by multiple groups using dplyr

When you combine multiple groups, R creates unique combinations of these groups (Department and Location) based on the max_salary for each subset.

A group with no data (e.g., Sales) will be omitted by default. To retain empty groups, use .drop = FALSE in the group_by() method.

Finding a single row with the maximum value across all the columns

Using dplyr’s slice_max() function, you can find the maximum value of Salary, and it will return the complete row containing that Salary.

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Maximum value across all columns
employee_data %>%
  slice_max(Salary, n = 1)

Output

Finding a single row with the maximum value across all the columns

If two employees in the same Department have the same maximum Salary, use the slice_max(Salary, n = Inf) function with n = Inf to return all ties.

Method 2: Using base R functions

When it comes to base R, aggregate() is the most popular function that allows grouping data and applying a function like max. You don’t need any third-party package to use it.

Get the maximum Salary per Department

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

# Get maximum salary per group
aggregate(Salary ~ Department, data = employee_data, max)

Output

Get the maximum Salary per Department using aggregate()In this code, we pass Department as a grouping variable for subsetting the data frame, employee_data df as data, and the max() function to get the maximum value out of each subset.

Get maximum value by multiple groups

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Get maximum value by multiple groups
aggregate(Salary ~ Department + Location, data = employee_data, max)

Output

Get maximum value by multiple groups using aggregate()Get the absolute maximum value across all columns

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Get absolute maximum value across all columns
employee_data[which.max(employee_data$Salary), ]

Output

Get the absolute maximum value across all columns using aggregate()

The above output shows that row 5 has the highest Salary of 85000, Department IT, Location CA, and Employee_ID E005.

Method 3: Using data.table

The data.table is highly efficient when you have a large dataset. First, you need to convert the input data frame into a data.table and then you can use its functions to get the maximum value by group(s).

Finding the maximum Salary per Department

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

employee_data_dt <- data.table(employee_data)

# Maximum salary per group
employee_data_dt[, .(max_salary = max(Salary)), by = Department]

Output

Finding the maximum salary per department using data.table

In this code, we used data.table() function to convert the input data frame employee_data to data.table (employee_data_dt). 

In the next step, we applied the max() function to the Salary column, grouped by Department, using the “by” argument. The above output is a data table, not a data frame. We get each Department’s highest Salary.

Finding the maximum Salary by Department and Group

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

employee_data_dt <- data.table(employee_data)

# Maximum value by multiple groups
employee_data_dt[, .(max_salary = max(Salary)), by = .(Department, Location)]

Output

Finding the maximum Salary by Department and GroupFinding the maximum value across all columns

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

employee_data_dt <- data.table(employee_data)

# Maximum value across all columns
employee_data_dt[which.max(Salary), ]

Output

Finding the maximum value across all columns using data.table

That’s all!

Leave a Comment