R dplyr::filter() Function: Complete Guide

The dplyr filter() function in R subsets a data frame, keeping only the rows that satisfy the conditions you supply.

A row is retained only if every condition evaluates to TRUE for it. Rows for which a condition evaluates to FALSE or NA are dropped from the result.
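The NA rule is easy to miss, so here is a minimal sketch of it. The three-row data frame na_demo_df is hypothetical and is not part of this tutorial's sample data. The comparison Score > 80 evaluates to NA for the missing score, so filter() drops that row together with the FALSE one.

```r
library(dplyr)

# Hypothetical data for illustration only
na_demo_df <- data.frame(
  Name = c("Emma", "Mia", "Rachel"),
  Score = c(95, NA, 72)
)

# The raw condition evaluates to TRUE, NA, FALSE
print(na_demo_df$Score > 80)

# filter() keeps only the TRUE row; the NA and FALSE rows are dropped
kept_rows <- na_demo_df %>% filter(Score > 80)
print(kept_rows)

# To keep rows with a missing score as well, test for NA explicitly
kept_with_na <- na_demo_df %>% filter(Score > 80 | is.na(Score))
print(kept_with_na)
```

This differs from base R subsetting with [ ], which would return an all-NA row for an NA condition instead of dropping it.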

Syntax

filter(.data, ..., .by = NULL, .preserve = FALSE)

Parameters

Name Value
.data It is an input data frame or tibble.
... These are logical conditions for filtering rows. Multiple conditions separated by commas are combined with the & (AND) operator.
.by (Optional) Columns to group by for this operation only (an alternative to group_by()).
.preserve (Optional) Relevant for grouped data. If FALSE (the default), the grouping structure is recalculated from the filtered result; if TRUE, it is kept as is.
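Two of these parameters can be sketched briefly: comma-separated conditions, and the .by argument. The data frame demo_df is hypothetical, and .by assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical data for illustration only
demo_df <- data.frame(
  Team = c("A", "A", "B", "B"),
  Points = c(10, 20, 5, 30)
)

# Comma-separated conditions behave like a single & condition
comma_result <- demo_df %>% filter(Points > 5, Team == "A")
and_result <- demo_df %>% filter(Points > 5 & Team == "A")
print(identical(comma_result, and_result))

# .by groups the data for this call only: keep each team's top scorer.
# The result is not grouped afterwards.
top_per_team <- demo_df %>% filter(Points == max(Points), .by = Team)
print(top_per_team)
```

With group_by() you would usually follow up with ungroup(); .by avoids that extra step because the grouping never persists.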

Sample Data Frame

Here is the sample data frame we will use for this tutorial:

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

print(student_grades_df)

Install the dplyr package if you have not already (for example, with install.packages("dplyr")), then load it at the top of your script:

library(dplyr)

Basic Numeric Filter

Let’s create a basic filter that selects the rows with student scores greater than 85.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

filtered_rows <- student_grades_df %>% filter(Score > 85)

print(filtered_rows)

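Output

Four rows are returned: the students with StudentID 101, 104, 105, and 107. The student with a score of exactly 85 is excluded because the comparison is strict.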

Dplyr filter() with multiple conditions

If you have multiple conditions, you can combine them with the logical AND (&) or OR (|) operator.

AND (&) operator

What if we want to get data with scores greater than 85, and the subject is Math? Let’s implement these conditions using the & operator.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# Multiple Conditions (AND)
filtered_multiple <- student_grades_df %>% filter(Score > 85 & Subject == "Math")

print(filtered_multiple)

Output

Only one row with StudentID 101 satisfies these conditions.

OR (|) operator

If a row should be kept when at least one of several conditions is TRUE, use the | operator inside the filter() function.

Let’s consider conditions where the score exceeds 85 or the grade is exactly A.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)


# Multiple Conditions (OR)

filtered_or_rows <- student_grades_df %>% filter(Score > 85 | Grade == "A")
print(filtered_or_rows)

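Output

Four rows are returned (StudentID 101, 104, 105, and 107). Both students with grade A also score above 85, so in this data set the OR condition returns the same rows as Score > 85 alone.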

With %in% operator

You can use the %in% operator inside the filter() function to select rows where a column’s value matches any value in a given vector.

Let’s select the students whose subject is Math or Science.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)


# Filter with %in%
filter_in_rows <- student_grades_df %>% filter(Subject %in% c("Math", "Science"))

print(filter_in_rows)

Output

Four rows are returned: StudentID 101, 102, 106, and 107.

With “not in” (! %in%) operator

If you want to filter out rows where a column’s value is present in a specific vector, negate the %in% test with the ! operator inside the filter() function.

Let’s drop the rows where the subject is Math or Science and keep all the other rows.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# Using filter() with "not in" (! %in%)
rows_not_in_df <- student_grades_df %>% filter(!(Subject %in% c("Math", "Science")))

print(rows_not_in_df)

Output

Three rows are returned: StudentID 103 (English), 104 (History), and 105 (Art).

String matching

You can create a filter that selects only rows where a string column includes a specific substring using the grepl() or stringr::str_detect() function.

Let’s create a condition that selects only the rows whose Name contains the letter “i”. Because ignore.case = TRUE is set, an uppercase “I” would match as well.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# dplyr filter string includes
string_matching_df <- student_grades_df %>% filter(grepl("i", Name, ignore.case = TRUE))

print(string_matching_df)

Output

Three rows are returned: Mia (103), Khushi (104), and Yogita (105).
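The same filter can be sketched with stringr::str_detect(), assuming the stringr package is installed. The argument order is reversed compared with grepl() (string first, pattern second), and case-insensitive matching goes through regex(ignore_case = TRUE). The data frame student_names_df is a trimmed copy of the sample data.

```r
library(dplyr)
library(stringr)

# Trimmed copy of the tutorial's sample data
student_names_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy")
)

# Keep rows whose Name contains "i" in either case
str_detect_df <- student_names_df %>%
  filter(str_detect(Name, regex("i", ignore_case = TRUE)))

print(str_detect_df)
```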

Group filtering

Group filtering is helpful when rows must be filtered based on conditions applied within groups rather than the entire dataset.

Let’s group the rows by Subject and then keep the rows with a score greater than 85.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# Grouped Filtering
group_filtering_df <- student_grades_df %>%
  group_by(Subject) %>%
  filter(Score > 85)

print(group_filtering_df)

Output

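Note that the output is a grouped tibble rather than a plain data frame, because group_by() converts its input. A tibble is an extended version of the data frame. Four rows are returned (StudentID 101, 104, 105, and 107). Because the condition Score > 85 does not depend on the group, these are the same rows an ungrouped filter would return; grouping changes the result only when the condition uses a per-group value such as max(Score) or mean(Score).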
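Grouping matters once the condition uses a summary function, because the summary is then computed per group. A minimal sketch, using a trimmed copy of the sample data (scores_df is a name chosen for this example), keeps the top scorer in each subject:

```r
library(dplyr)

# Trimmed copy of the tutorial's sample data
scores_df <- data.frame(
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# max(Score) is evaluated within each Subject, not over the whole column
top_by_subject <- scores_df %>%
  group_by(Subject) %>%
  filter(Score == max(Score)) %>%
  ungroup()

print(top_by_subject)
```

Without group_by(), the same condition would keep only the single row holding the overall maximum score.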

Filtering across multiple columns

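The dplyr if_any() and if_all() helpers are useful when the same condition must be checked on several columns. Use if_any() to keep rows where at least one of the selected columns satisfies the condition, and if_all() to keep rows where every selected column does. They replace the older pattern of calling across() inside filter(), which is deprecated.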

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

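# Filter across multiple columns
# Keep rows where ALL selected columns satisfy the condition.
# Only numeric columns are selected: Grade is a character column, so
# comparing it with 85 would be a string comparison, not a numeric one.

across_df <- student_grades_df %>% filter(if_all(c(StudentID, Score), ~ .x > 85))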

print(across_df)

Output

Four rows are returned: StudentID 101, 104, 105, and 107.
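For the "at least one column" case, a sketch with if_any() follows. It assumes dplyr 1.0.4 or later, and the data frame subjects_df is a trimmed copy of the sample data.

```r
library(dplyr)

# Trimmed copy of the tutorial's sample data
subjects_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# Keep rows where Name OR Subject starts with "M"
starts_with_m_df <- subjects_df %>%
  filter(if_any(c(Name, Subject), ~ grepl("^M", .x)))

print(starts_with_m_df)
```

Swapping if_any() for if_all() here would require both columns to start with "M", which no row in this data satisfies.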

Handling NA values

To remove rows that have NA in a particular column, negate is.na() for that column inside the filter() function.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", NA, "Khushi", "Yogita", "Rachel", NA),
  Grade = c("A", "B", NA, "B+", "A-", "B", NA),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", NA, "History", "Art", "Math", "Science")
)

print(student_grades_df)

# Filtering Rows with NA Values
filter_na_df <- student_grades_df %>% filter(!is.na(Name))

print(filter_na_df)

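Output

The first print() shows all seven rows, including the NA values. After filtering, five rows remain (StudentID 101, 102, 104, 105, and 106); the two rows with a missing Name are gone.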
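The example above checks a single column. As a sketch of two related cases, the code below keeps only rows with no NA in any column, and separately keeps only the rows where Name is missing. The data frame na_students_df is hypothetical, and if_all() assumes dplyr 1.0.4 or later.

```r
library(dplyr)

# Hypothetical data for illustration only
na_students_df <- data.frame(
  StudentID = 101:104,
  Name = c("Emma", NA, "Mia", "Khushi"),
  Subject = c("Math", "Science", NA, "History")
)

# Keep only rows with no NA in any column
complete_rows_df <- na_students_df %>%
  filter(if_all(everything(), ~ !is.na(.x)))
print(complete_rows_df)

# Keep only rows where Name is missing
missing_name_df <- na_students_df %>% filter(is.na(Name))
print(missing_name_df)
```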

That’s all!

Published by
Krunal Lathiya
