R dplyr::filter() Function: Complete Guide

The dplyr filter() function in R subsets a data frame and retains all rows that satisfy the conditions. In other words, you can select the data frame rows based on conditions.

A row is retained only when the condition evaluates to TRUE. Rows for which the condition evaluates to FALSE or NA are dropped from the result.
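To make the NA rule concrete, here is a minimal sketch. It assumes a small hypothetical data frame (scores_df is made up for illustration and is not the tutorial data) in which one score is missing.

```r
library(dplyr)

# Hypothetical data: Ben's score is missing
scores_df <- data.frame(
  Name = c("Ann", "Ben", "Cara"),
  Score = c(90, NA, 70)
)

# Ben's condition evaluates to NA and Cara's to FALSE, so both rows are dropped
kept <- scores_df %>% filter(Score > 85)

# To keep rows with a missing score, ask for them explicitly
kept_with_na <- scores_df %>% filter(Score > 85 | is.na(Score))

print(kept)
print(kept_with_na)
```

The second call works because NA | TRUE evaluates to TRUE, so Ben's row passes the combined condition.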

Syntax

filter(.data, ..., .by = NULL, .preserve = FALSE)

Parameters

Name Value
.data It is an input data frame or tibble.
... These are logical conditions for filtering rows, written in terms of the column names. Multiple conditions separated by commas are combined with the & (AND) operator.
.by (Optional) Columns to group by for this operation only (an alternative to group_by()).
.preserve (Optional) Relevant for grouped data. If FALSE (the default), the grouping structure is recalculated from the filtered result; if TRUE, it is kept as is.
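As a rough illustration of the ... and .by arguments, the sketch below uses a hypothetical sales_df data frame (not part of this tutorial). It assumes dplyr 1.1.0 or later, since .by is not available in earlier versions.

```r
library(dplyr)

# Hypothetical data for illustration only
sales_df <- data.frame(
  Region = c("East", "East", "West", "West"),
  Units = c(10, 25, 30, 5)
)

# Conditions separated by commas are combined with &
comma_result <- sales_df %>% filter(Units > 8, Region == "East")
and_result <- sales_df %>% filter(Units > 8 & Region == "East")

# .by groups the data for this one call only (assumes dplyr >= 1.1.0);
# here it keeps the top-selling row in each region and returns ungrouped data
top_per_region <- sales_df %>% filter(Units == max(Units), .by = Region)

print(comma_result)
print(top_per_region)
```

The comma form and the & form return the same rows, so the choice between them is mostly a matter of readability.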

Sample Data Frame

Here is the sample data frame we will use for this tutorial:

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

print(student_grades_df)

The data frame has seven rows and five columns: StudentID, Name, Grade, Score, and Subject.

Also, install the dplyr package if you have not already (install.packages("dplyr")) and load it at the top of your script:

library(dplyr)

Basic Numeric Filter

Let’s create a basic filter that selects the rows with student scores greater than 85.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

filtered_rows <- student_grades_df %>% filter(Score > 85)

print(filtered_rows)

Output

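Four rows are returned: StudentID 101 (Emma, 95), 104 (Khushi, 88), 105 (Yogita, 90), and 107 (Gracy, 96). Sydney's score of exactly 85 does not satisfy Score > 85.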

Dplyr filter() with multiple conditions

If you have multiple conditions, you can combine them with the logical AND (&) or OR (|) operator.

AND (&) operator

What if we want to get data with scores greater than 85, and the subject is Math? Let’s implement these conditions using the & operator.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# Multiple Conditions (AND)
filtered_multiple <- student_grades_df %>% filter(Score > 85 & Subject == "Math")

print(filtered_multiple)

Output

Only one row, StudentID 101 (Emma, Math, score 95), satisfies both conditions.

OR (|) operator

If a row should be kept when at least one of several conditions is TRUE, use the | operator inside the filter() function.

Let’s consider conditions where the score exceeds 85 or the grade is exactly A.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)


# Multiple Conditions (OR)

filtered_or_rows <- student_grades_df %>% filter(Score > 85 | Grade == "A")
print(filtered_or_rows)

Output

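Four rows are returned: StudentID 101, 104, 105, and 107. Both students with grade A (Emma and Gracy) also score above 85, so in this data the OR condition adds no rows beyond those matched by Score > 85.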

With %in% operator

You can use the %in% operator inside the filter() function to select rows where a column’s value matches any value in a given vector.

Let’s select the students whose subject is Math or Science.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)


# Filter with %in%
filter_in_rows <- student_grades_df %>% filter(Subject %in% c("Math", "Science"))

print(filter_in_rows)

Output

Four rows are returned: StudentID 101 and 106 (Math), and 102 and 107 (Science).

With “not in” (! %in%) operator

If you want to exclude rows where a column’s value is present in a given vector, negate the %in% test with ! inside the filter() function.

Let’s exclude the rows where the subject is Math or Science and keep all the other rows.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# Using filter() with "not in" (! %in%)
rows_not_in_df <- student_grades_df %>% filter(!(Subject %in% c("Math", "Science")))

print(rows_not_in_df)

Output

Three rows are returned: StudentID 103 (English), 104 (History), and 105 (Art).

String matching

You can create a filter that selects only rows where a string column includes a specific substring using the grepl() or stringr::str_detect() function.

Let’s create a condition that selects only the rows whose Name contains the letter “i”, ignoring case.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# dplyr filter string includes
string_matching_df <- student_grades_df %>% filter(grepl("i", Name, ignore.case = TRUE))

print(string_matching_df)

Output

Three rows are returned: Mia, Khushi, and Yogita.
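The same filter can be written with stringr::str_detect(), which this section mentions as an alternative to grepl(). The sketch below assumes the stringr package is installed; names_df is a cut-down copy of the Name column used only for illustration.

```r
library(dplyr)
library(stringr)

names_df <- data.frame(
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy")
)

# regex(..., ignore_case = TRUE) mirrors grepl(..., ignore.case = TRUE)
with_i <- names_df %>%
  filter(str_detect(Name, regex("i", ignore_case = TRUE)))

# fixed() matches the literal text instead of treating it as a regular expression
with_literal_i <- names_df %>% filter(str_detect(Name, fixed("i")))

print(with_i)
```

Note that str_detect() takes the string first and the pattern second, the reverse of grepl().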

Group filtering

Group filtering is helpful when rows must be filtered based on conditions applied within groups rather than the entire dataset.

Let’s group the data by Subject and keep the rows with a score greater than 85.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# Grouped Filtering
group_filtering_df <- student_grades_df %>%
  group_by(Subject) %>%
  filter(Score > 85)

print(group_filtering_df)

Output

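Note that the output is a grouped tibble rather than a plain data frame, because group_by() converts its input to a tibble (an extended version of the data frame). The four rows returned are those of StudentID 101, 104, 105, and 107.

Because the condition Score > 85 is evaluated row by row and does not use a per-group summary, the grouping does not change which rows are kept here. Grouping affects the result only when the condition involves a grouped calculation such as max(Score) or mean(Score).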
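For a filter whose result does depend on the groups, the condition needs to use a value computed within each group. The following is a minimal sketch, using a reduced copy of the sample data, that keeps the highest-scoring row in each subject; the name top_per_subject is chosen here for illustration.

```r
library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

# max(Score) is computed separately for each Subject
top_per_subject <- student_grades_df %>%
  group_by(Subject) %>%
  filter(Score == max(Score)) %>%
  ungroup()  # drop the grouping so later steps see a plain tibble

print(top_per_subject)
```

Here StudentID 102 and 106 are dropped because another student has a higher score in the same subject, which an ungrouped filter could not express with a single threshold.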

Filtering across multiple columns

To apply the same condition to several columns, use if_any() or if_all() inside filter(), depending on whether you want to keep rows where at least one or all of the selected columns satisfy the condition. These helpers take the same arguments as across(); using across() directly inside filter() is deprecated.

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", "Mia", "Khushi", "Yogita", "Rachel", "Gracy"),
  Grade = c("A", "B", "C", "B+", "A-", "B", "A"),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", "English", "History", "Art", "Math", "Science")
)

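# Filter across multiple columns
# Keep rows where ALL selected columns satisfy the condition.
# Only numeric columns are selected: comparing the character column Grade
# with 85 would be a string comparison, not a numeric one.

across_df <- student_grades_df %>% filter(if_all(c(StudentID, Score), ~ .x > 85))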

print(across_df)

Output

Four rows are returned: StudentID 101, 104, 105, and 107.
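The difference between if_all() and if_any() is easier to see with more than one score column. The sketch below assumes a hypothetical exam_df data frame with Midterm and Final columns, which are not part of the tutorial data.

```r
library(dplyr)

# Hypothetical data with two numeric score columns
exam_df <- data.frame(
  Name = c("Emma", "Sydney", "Mia", "Rachel"),
  Midterm = c(90, 80, 88, 60),
  Final = c(92, 91, 70, 70)
)

# if_all(): every selected column must exceed 85
both_high <- exam_df %>% filter(if_all(c(Midterm, Final), ~ .x > 85))

# if_any(): at least one selected column must exceed 85
either_high <- exam_df %>% filter(if_any(c(Midterm, Final), ~ .x > 85))

print(both_high)
print(either_high)
```

Only Emma passes the if_all() filter, while Emma, Sydney, and Mia pass the if_any() filter.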

Handling NA values

If you want to remove rows with NA values in a column, combine filter() with a negated is.na() call, !is.na().

library(dplyr)

student_grades_df <- data.frame(
  StudentID = 101:107,
  Name = c("Emma", "Sydney", NA, "Khushi", "Yogita", "Rachel", NA),
  Grade = c("A", "B", NA, "B+", "A-", "B", NA),
  Score = c(95, 85, 72, 88, 90, 84, 96),
  Subject = c("Math", "Science", NA, "History", "Art", "Math", "Science")
)

print(student_grades_df)

# Filtering Rows with NA Values
filter_na_df <- student_grades_df %>% filter(!is.na(Name))

print(filter_na_df)

Output

The two rows with a missing Name (StudentID 103 and 107) are removed, leaving five rows: StudentID 101, 102, 104, 105, and 106.
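The example above checks a single column. To check every column at once, one option is to combine is.na() with if_all() or if_any(). This is a sketch on a small hypothetical data frame (na_df), not the only way to do it.

```r
library(dplyr)

# Hypothetical data with NA values in different columns
na_df <- data.frame(
  StudentID = 101:104,
  Name = c("Emma", NA, "Mia", "Khushi"),
  Score = c(95, 85, NA, 88)
)

# Keep rows that have no NA in any column
complete_rows <- na_df %>% filter(if_all(everything(), ~ !is.na(.x)))

# Keep rows that have an NA in at least one column
rows_with_na <- na_df %>% filter(if_any(everything(), is.na))

print(complete_rows)
print(rows_with_na)
```

The second filter is handy for inspecting the incomplete rows before deciding whether to drop them.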

That’s all!
