R Advanced

How to Find the Minimum Value By Group in R

The most common and efficient way to find the minimum value by group is to use the dplyr package. It provides group_by(), summarise(), and filter() methods to extract the smallest value based on our requirement.

Finding the minimum value by group means getting the smallest value within each group in our data frame. For example, if I have a column defined as a “group” and another column called “value” that contains different numeric values, I will extract the minimum value from the “value” column for each distinct group.

Here is the demo data frame that forms the basis of this tutorial:

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

Method 1: Using dplyr

Using dplyr’s group_by() method, we can divide the data frame based on the “Department” column. In other words, we can create a subgroup based on the unique Department values.

Then, using dplyr’s summarise(), we can find the minimum value for each unique group using the min() function.

Finding the Minimum Salary by Department

library(dplyr)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Minimum Salary by Department
employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

The above output image shows the department-wise minimum salary. 

Two identical minimum values

What if we have a data frame with two rows with the same minimum values in a single group? How do we deal with that? If you use dplyr’s group_by() and summarise(), it will return a single minimum value, and the other will be discarded from the output.

Let’s say we have the following modified data frame.

library(dplyr)

# data frame with two same minimum values
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX")
)

print(modified_employee_data)

In the modified_employee_data, we have two employees, E001 and E006, who work under the HR department and have a minimum salary of 60000.

Let’s fetch only one minimum value:

# Only the Minimum Value (No Employee Details)
modified_employee_data %>%
  group_by(Department) %>%
  summarise(min_salary = min(Salary))

Output

Retrieve all rows with the minimum salary

Even if there are multiple minimum value salary rows by group, we will fetch each row using the dplyr filter() function.

library(dplyr)

# Two Employees with Minimum Salary
modified_employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007", "E008"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance", "HR"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 60000, 77000, 62000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY", "TX"))

print(modified_employee_data)

# Retrieve All Rows with the Minimum Salary
modified_employee_data %>%
  group_by(Department) %>%
  filter(Salary == min(Salary))

Output

The filter() function filters out all the rows except the minimum Salary rows by group.

Method 2: Using aggregate()

If your dataset is small and you don’t want to use any third-party package, you can use the base R’s aggregate() function.

Let’s return to the original data frame and use the aggregate() function to find the minimum salary by location.

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using aggregate
# Group-wise minimum salary using aggregate() by location
aggregate(Salary ~ Location, data = employee_data, FUN = min)

Output

Method 3: Using data.table

If your input dataset is extremely large, consider using the data table approach. In large data sets, a data table is very efficient and faster than all the other methods. The output is a data table.

library(data.table)

employee_data <- data.frame(
  Employee_ID = c("E001", "E002", "E003", "E004", "E005", "E006", "E007"),
  Department = c("HR", "IT", "Finance", "Marketing", "IT", "HR", "Finance"),
  Salary = c(60000, 80000, 75000, 70000, 85000, 62000, 77000),
  Location = c("TX", "CA", "NY", "IL", "CA", "CA", "NY")
)

print(employee_data)

# Using data.table
employee_data_dt <- data.table(employee_data)

employee_data_dt[, .(min_salary = min(Salary)), by = Department]

Output

That’s all!

Recent Posts

R scale(): Scaling and Centering of Matrix-like Objects

The scale() function in R centers (subtracting the mean) and/or scales (dividing by the standard…

2 weeks ago

file.rename(): Renaming Single and Multiple Files in R

To rename a file in R, you can use the file.rename() function. It renames a…

3 weeks ago

R prop.table() Function

The prop.table() function in R calculates the proportion or relative frequency of values in a…

3 weeks ago

exp() Function: Calculate Exponential of a Number in R

The exp() is a built-in function that calculates the exponential of its input, raising Euler's…

3 weeks ago

R split() Function: Splitting a Data

The split() function divides the input data into groups based on some criteria, typically specified…

4 weeks ago

colMeans(): Calculating the Mean of Columns in R Data Frame

The colMeans() function in R calculates the arithmetic mean of columns in a numeric matrix,…

1 month ago