R Advanced

How to Find the Minimum Value By Group in R

The most common and efficient way to find the minimum value by group is to use the dplyr package. It provides group_by(), summarise(), and filter() methods to extract the smallest value based on our requirement.

Finding the minimum value by group means getting the smallest value within each group in our data frame. For example, if I have a column defined as a “group” and another column called “value” that contains different numeric values, I will extract the minimum value from the “value” column for each distinct group.

Here is the demo data frame that forms the basis of this tutorial:

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

Method 1: Using dplyr

Using dplyr’s group_by() method, we can divide the data frame based on the “Department” column. In other words, we can create a subgroup based on the unique Department values.

Then, using dplyr’s summarise(), we can find the minimum value for each unique group using the min() function.

Finding the Minimum Salary by Department

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Minimum Salary by Department
employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

The above output image shows the department-wise minimum salary.

Two identical minimum values

What if we have a data frame with two rows with the same minimum values in a single group? How do we deal with that? If you use dplyr’s group_by() and summarise(), it will return a single minimum value, and the other will be discarded from the output.

Let’s say we have the following modified data frame.

library(dplyr)

# data frame with two same minimum values
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

print(modified_employee_data)

In the modified_employee_data, we have two employees, E001 and E006, who work under the HR department and have a minimum salary of 60000.

Let’s fetch only one minimum value:

# Only the Minimum Value (No Employee Details)
modified_employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

Retrieve all rows with the minimum salary

Even if there are multiple minimum value salary rows by group, we will fetch each row using the dplyr filter() function.

library(dplyr)

# Two Employees with Minimum Salary
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX"))

print(modified_employee_data)

# Retrieve All Rows with the Minimum Salary
modified_employee_data %>%
  group_by(Department) %>%
  filter(Salary == min(Salary))

Output

The filter() function filters out all the rows except the minimum Salary rows by group.

Method 2: Using aggregate()

If your dataset is small and you don’t want to use any third-party package, you can use the base R’s aggregate() function.

Let’s return to the original data frame and use the aggregate() function to find the minimum salary by location.

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using aggregate
# Group-wise minimum salary using aggregate() by location
aggregate(Salary ~ Location, data = employee_data, FUN = min)

Output

Method 3: Using data.table

If your input dataset is extremely large, consider using the data table approach. In large data sets, a data table is very efficient and faster than all the other methods. The output is a data table.

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using data.table
employee_data_dt <- data.table(employee_data)

employee_data_dt[, .(min_salary = min(Salary)), by = Department]

Output

That’s all!

Krunal Lathiya

Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.