R Advanced

How to Find the Minimum Value By Group in R

The most common and efficient way to find the minimum value by group is to use the dplyr package. It provides group_by(), summarise(), and filter() methods to extract the smallest value based on our requirement.

Finding the minimum value by group means getting the smallest value within each group in our data frame. For example, if I have a column defined as a “group” and another column called “value” that contains different numeric values, I will extract the minimum value from the “value” column for each distinct group.

Here is the demo data frame that forms the basis of this tutorial:

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

Method 1: Using dplyr

Using dplyr’s group_by() method, we can divide the data frame based on the “Department” column. In other words, we can create a subgroup based on the unique Department values.

Then, using dplyr’s summarise(), we can find the minimum value for each unique group using the min() function.

Finding the Minimum Salary by Department

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Minimum Salary by Department
employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

The above output image shows the department-wise minimum salary. 

Two identical minimum values

What if we have a data frame with two rows with the same minimum values in a single group? How do we deal with that? If you use dplyr’s group_by() and summarise(), it will return a single minimum value, and the other will be discarded from the output.

Let’s say we have the following modified data frame.

library(dplyr)

# data frame with two same minimum values
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

print(modified_employee_data)

In the modified_employee_data, we have two employees, E001 and E006, who work under the HR department and have a minimum salary of 60000.

Let’s fetch only one minimum value:

# Only the Minimum Value (No Employee Details)
modified_employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

Retrieve all rows with the minimum salary

Even if there are multiple minimum value salary rows by group, we will fetch each row using the dplyr filter() function.

library(dplyr)

# Two Employees with Minimum Salary
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX"))

print(modified_employee_data)

# Retrieve All Rows with the Minimum Salary
modified_employee_data %>%
  group_by(Department) %>%
  filter(Salary == min(Salary))

Output

The filter() function filters out all the rows except the minimum Salary rows by group.

Method 2: Using aggregate()

If your dataset is small and you don’t want to use any third-party package, you can use the base R’s aggregate() function.

Let’s return to the original data frame and use the aggregate() function to find the minimum salary by location.

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using aggregate
# Group-wise minimum salary using aggregate() by location
aggregate(Salary ~ Location, data = employee_data, FUN = min)

Output

Method 3: Using data.table

If your input dataset is extremely large, consider using the data table approach. In large data sets, a data table is very efficient and faster than all the other methods. The output is a data table.

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using data.table
employee_data_dt <- data.table(employee_data)

employee_data_dt[, .(min_salary = min(Salary)), by = Department]

Output

That’s all!

Recent Posts

R paste() Function

The paste() function in R concatenates vectors after converting them to character. paste("Hello", 19, 21,…

7 days ago

paste0() Function in R

R paste0() function concatenates strings without any separator between them. It is a shorthand version…

1 week ago

How to Calculate Standard Error in R

Standard Error (SE) measures the variability or dispersion of the sample mean estimate of a…

2 weeks ago

R max() and min() Functions

max() The max() function in R finds the maximum value of a vector or data…

2 weeks ago

R as.Date() Function: Working with Dates

The as.Date() function in R converts various types of date and time objects or character…

2 weeks ago

R pnorm() Function [With Graphical Representation]

The pnorm() function in R calculates the cumulative density function (cdf) value of the normal…

3 weeks ago