R Advanced

How to Find the Minimum Value By Group in R

Finding the minimum value by group means getting the smallest value within each group in our data frame. For example, if I have a column that I defined as a “group” and another column called “value” that contains different numeric values, then I will extract the minimum value from the value column for each distinct group.

The most common and efficient way to find the minimum value by group is to use the dplyr package. It provides group_by(), summarise(), and filter() methods to extract the smallest value based on our requirement.

Here is the demo data frame that forms the basis of this tutorial:

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

Method 1: Using dplyr

Using dplyr’s group_by() method, we can divide the data frame based on the “Department” column. Meaning, we can create a subgroup based on the unique Department values.

Then, using dplyr’s summarise(), we can find the minimum value for each unique group using the min() function.

Finding the minimum Salary by Department

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Minimum Salary by Department
employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

The above output image shows the department-wise minimum salary. 

Two same minimum values

What if we have a data frame with two rows with the same minimum values in a single group? How do we deal with that? If you use dplyr’s group_by() and summarise(), it will return a single minimum value, and the other will be discarded from the output.

Let’s say we have the following modified data frame.

library(dplyr)

# data frame with two same minimum values
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

print(modified_employee_data)

In the modified_employee_data, we have two employees, E001 and E006, who work under the HR department and have a minimum salary of 60000.

Let’s fetch only one minimum value:

# Only the Minimum Value (No Employee Details)
modified_employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

Retrieve all rows with the minimum salary

Even if there are multiple minimum value salary rows by group, we will fetch each row using the dplyr filter() function.

library(dplyr)

# Two Employees with Minimum Salary
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX"))

print(modified_employee_data)

# Retrieve All Rows with the Minimum Salary
modified_employee_data %>%
  group_by(Department) %>%
  filter(Salary == min(Salary))

Output

The filter() function filters out all the rows except the minimum Salary rows by group.

Method 2: Using aggregate()

If your dataset is small and you don’t want to use any third-party package, you can use the base R’s aggregate() function.

Let’s return to the original data frame and use the aggregate() function to find the minimum salary by location.

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using aggregate
# Group-wise minimum salary using aggregate() by location
aggregate(Salary ~ Location, data = employee_data, FUN = min)

Output

Method 3: Using data.table

If your input dataset is extremely large, you should consider the data table approach. In large data sets, data table is very efficient and faster than all the other methods. The output is a data table.

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using data.table
employee_data_dt <- data.table(employee_data)

employee_data_dt[, .(min_salary = min(Salary)), by = Department]

Output

That’s all!

Recent Posts

cbind() Function: Binding R Objects by Columns

R cbind (column bind) is a function that combines specified vectors, matrices, or data frames…

1 week ago

rbind() Function: Binding Rows in R

The rbind() function combines R objects, such as vectors, matrices, or data frames, by rows.…

1 week ago

as.numeric(): Converting to Numeric Values in R

The as.numeric() function in R converts valid non-numeric data into numeric data. What do I…

2 weeks ago

Calculating Natural Log using log() Function in R

The log() function calculates the natural logarithm (base e) of a numeric vector. By default,…

3 weeks ago

Dollar Sign ($ Operator) in R

In R, you can use the dollar sign ($ operator)  to access elements (columns) of…

1 month ago

Calculating Absolute Value using abs() Function in R

The abs() function calculates the absolute value of a numeric input, returning a non-negative (only…

1 month ago