
How to Find the Minimum Value By Group in R

Finding the minimum value by group means extracting the smallest value within each group of a data frame. For example, if a data frame has a “group” column and a numeric “value” column, the goal is to return the smallest entry in “value” for each distinct group.

A common and readable way to find the minimum value by group is to use the dplyr package. It provides the group_by(), summarise(), and filter() functions, which can be combined to extract the smallest value per group.

Here is the demo data frame that forms the basis of this tutorial:

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

Method 1: Using dplyr

Using dplyr’s group_by() function, we can split the data frame by the “Department” column, creating one group for each unique Department value.

Then, using dplyr’s summarise(), we can find the minimum value for each unique group using the min() function.

Finding the minimum Salary by Department

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Minimum Salary by Department
employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

The result has one row per department, showing the department-wise minimum salary: Finance 75000, HR 60000, IT 80000, and Marketing 70000.
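To make the split-apply-combine idea behind group_by() and summarise() concrete, here is a minimal base R sketch of the same steps. The names salary_by_dept and min_by_dept are illustrative only and are not part of the original tutorial.

```r
employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

# Split: one numeric vector of salaries per department
salary_by_dept <- split(employee_data$Salary, employee_data$Department)

# Apply and combine: take the minimum of each vector and collect the results
min_by_dept <- sapply(salary_by_dept, min)

print(min_by_dept)
```

The result is a named numeric vector with one element per department, which is roughly what summarise() returns as a two-column tibble.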

Tied minimum values

What if two rows in the same group share the minimum value? With dplyr’s group_by() and summarise(), each group collapses to a single row, so the minimum appears only once. The output does not show that the value is tied, and the other columns (such as Employee_ID) are dropped.

Let’s say we have the following modified data frame.

library(dplyr)

# data frame with two same minimum values
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

print(modified_employee_data)

In modified_employee_data, two employees in the HR department, E001 and E006, share the minimum salary of 60000.

Let’s first fetch only the minimum value per department:

# Only the Minimum Value (No Employee Details)
modified_employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

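Output

The result again has one row per department: Finance 75000, HR 60000, IT 80000, and Marketing 70000. HR appears once with 60000, so nothing in the output reveals that two employees share that salary.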
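If you want the summary to flag ties, one option is to count how many rows sit at the minimum. This is a minimal sketch; the names tie_summary and n_at_min are made up for illustration.

```r
library(dplyr)

modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

# For each department, report the minimum salary and how many rows share it
tie_summary <- modified_employee_data %>%
  group_by(Department) %>%
  summarise(
    min_salary = min(Salary),
    n_at_min = sum(Salary == min(Salary))
  )

print(tie_summary)
```

Here n_at_min is 2 for HR and 1 for every other department, which signals that the HR minimum is shared.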

Retrieve all rows with the minimum salary

To keep every row that holds the group minimum, including ties, replace summarise() with dplyr’s filter() function and compare each Salary against its group’s min(Salary).

library(dplyr)

# Two Employees with Minimum Salary
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

print(modified_employee_data)

# Retrieve All Rows with the Minimum Salary
modified_employee_data %>%
  group_by(Department) %>%
  filter(Salary == min(Salary))

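Output

The result contains five rows: E001 and E006 for HR (both at 60000), E002 for IT (80000), E003 for Finance (75000), and E004 for Marketing (70000).

Within each group, filter() keeps only the rows whose Salary equals that group’s minimum and drops the rest, so all tied rows are retained.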
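As an alternative sketch, dplyr’s slice_min() can do the same job and lets you choose whether ties are kept. This assumes dplyr 1.0.0 or later, where slice_min() is available; the names all_min_rows and one_min_row are illustrative.

```r
library(dplyr)

modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

# with_ties = TRUE (the default) keeps every row tied for the minimum
all_min_rows <- modified_employee_data %>%
  group_by(Department) %>%
  slice_min(Salary, n = 1) %>%
  ungroup()

# with_ties = FALSE keeps exactly one row per group
one_min_row <- modified_employee_data %>%
  group_by(Department) %>%
  slice_min(Salary, n = 1, with_ties = FALSE) %>%
  ungroup()

print(all_min_rows)
print(one_min_row)
```

Use the default when every tied employee matters, and with_ties = FALSE when you need exactly one representative row per group.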

Method 2: Using aggregate()

If your dataset is small and you would rather not depend on an external package, base R’s aggregate() function does the job.

Let’s return to the original data frame and use the aggregate() function to find the minimum salary by location.

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using aggregate
# Group-wise minimum salary using aggregate() by location
aggregate(Salary ~ Location, data = employee_data, FUN = min)

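Output

The result is a data frame with one row per location, sorted alphabetically: CA 62000, IL 70000, NY 75000, and TX 60000.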
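aggregate() returns only the grouping column and the minimum. If you also need the full rows in base R, one possible sketch uses ave() to compute each row’s group minimum and then subsets on it. The names group_min and min_rows are illustrative.

```r
employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

# ave() returns, for every row, the minimum Salary of that row's Location
group_min <- ave(employee_data$Salary, employee_data$Location, FUN = min)

# Keep the rows whose Salary equals their group's minimum
min_rows <- employee_data[employee_data$Salary == group_min, ]

print(min_rows)
```

Because the comparison is done row by row, tied rows within a location would all be kept, mirroring the dplyr filter() approach.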

Method 3: Using data.table

If your dataset is very large, consider the data.table package. On large datasets it is typically faster and more memory-efficient than the approaches above, although the exact gain depends on the data. The result is itself a data.table.

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using data.table
employee_data_dt <- data.table(employee_data)

employee_data_dt[, .(min_salary = min(Salary)), by = Department]

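Output

The result is a data.table with one row per department, listed in order of first appearance: HR 60000, IT 80000, Finance 75000, and Marketing 70000.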
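To retrieve the full rows at the minimum with data.table, including ties, one possible sketch subsets .SD within each group. It uses the modified data frame from the tied-values section so that the tie in HR is visible; the names modified_employee_dt and min_rows_dt are illustrative.

```r
library(data.table)

modified_employee_dt <- data.table(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

# .SD is the subset of rows for the current group;
# keep only the rows whose Salary equals that group's minimum
min_rows_dt <- modified_employee_dt[, .SD[Salary == min(Salary)], by = Department]

print(min_rows_dt)
```

The grouping column comes first in the result, followed by the remaining columns, and both HR employees at 60000 are returned.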

That’s all!
