R filter() Function from dplyr Package

The filter() function from the dplyr package subsets a data frame, keeping only the rows that satisfy all of the conditions you supply. Row filtering is one of the most common steps in data analysis.

Syntax

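filter(.data, ..., .by = NULL, .preserve = FALSE)

Parameters

  1. .data: A data frame or a data frame extension such as a tibble.
  2. ...: One or more expressions that return a logical value, written in terms of the columns in .data. You can use comparison operators (==, !=, >, <, >=, <=) and functions. Multiple conditions can be combined using & (and) or | (or); conditions separated by commas are combined with &. Rows for which a condition evaluates to NA are dropped.
  3. .by: An optional selection of columns to group by for just this operation. It is available in dplyr 1.1.0 and later as an alternative to group_by().
  4. .preserve: It is relevant only when the .data input is grouped. With the default FALSE, the grouping structure is recalculated from the filtered rows; with TRUE, the original grouping structure is kept as is.

Return value

It returns an object of the same type as .data. The rows are a subset of the input and appear in the same order, and the columns are not modified.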
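As a minimal sketch of the .by argument and the return type described above, the snippet below keeps the top-scoring row within each subject. The scores data frame is hypothetical sample data, and the example assumes dplyr 1.1.0 or later is installed.

```r
library(dplyr)

# Hypothetical sample data, used only for illustration
scores <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Niva"),
  score = c(85, 90, 78, 95),
  subject = c("Math", "Math", "History", "History")
)

# Keep the highest score within each subject.
# .by groups the rows for this call only (assumes dplyr >= 1.1.0).
top_by_subject <- filter(scores, score == max(score), .by = subject)
top_by_subject

# The result is a plain, ungrouped data frame, the same type as the input
class(top_by_subject)
is_grouped_df(top_by_subject)
```

Because the grouping applies to this one call, no ungroup() step is needed afterwards.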

Useful filter functions

Condition    Description
==           Checks if values are equal.
!=           Checks if values are not equal.
>            Checks if a value is greater than another.
>=           Checks if a value is greater than or equal to another.
<            Checks if a value is less than another.
<=           Checks if a value is less than or equal to another.
&            Logical AND. Both conditions must be true.
|            Logical OR. Either condition can be true.
!            Logical NOT. Negates a condition.
xor()        Exclusive OR. True if either condition is true, but not both.
is.na()      Checks for NA (missing) values.
between()    Checks if a numeric value lies between two other values (inclusive).
near()       Checks if values are approximately equal, useful for floating point comparisons.
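A few of the helpers in the table are easiest to understand in action. The following is a small sketch on hypothetical data (the ratio column and its values are made up for illustration) showing between(), is.na(), and the difference between == and near().

```r
library(dplyr)

# Hypothetical sample data: one missing score and one value built from 0.1 + 0.2
df <- data.frame(
  name  = c("Krunal", "Ankit", "Rushabh", "Niva"),
  score = c(70, 85, NA, 90),
  ratio = c(0.3, 0.1 + 0.2, 0.5, 0.3)
)

# between() is inclusive at both ends; the row with an NA score is dropped
in_range <- filter(df, between(score, 70, 85))

# is.na() keeps only the rows with a missing score
missing_score <- filter(df, is.na(score))

# 0.1 + 0.2 is not exactly 0.3 in floating point, so == misses that row
exact_match <- filter(df, ratio == 0.3)

# near() compares within a small tolerance and keeps it
near_match <- filter(df, near(ratio, 0.3))

nrow(exact_match)  # 2
nrow(near_match)   # 3
```

Note that filter() silently drops rows where a condition evaluates to NA, so use is.na() explicitly when missing values should be kept.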

Visual representation

[Figure: Visual representation of the filter() function in R]

Example 1: Usage of filter() function

For this example, let's say we want to keep only the rows where the score is greater than 80 and the subject is "Math".

library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th")
)

filtered_df <- df %>% filter(score > 80, subject == "Math")
filtered_df

Output

   name    score   subject   grade
1  Krunal   85     Math      10th
2  Ankit    90     Math      11th

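score > 80: This condition selects rows where the score column has values greater than 80.

subject == "Math": This condition selects rows where the subject column exactly matches the string "Math".

Because the two conditions are separated by a comma, filter() combines them with AND and keeps only the rows that satisfy both.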
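To make the comma behaviour concrete, here is a short sketch that reuses the data from Example 1 and checks that comma-separated conditions give the same result as joining them with &. The variable names with_comma and with_and are introduced only for this illustration.

```r
library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th")
)

# Comma-separated conditions are combined with AND
with_comma <- filter(df, score > 80, subject == "Math")

# Joining the same conditions with & is equivalent
with_and <- filter(df, score > 80 & subject == "Math")

# Both calls return the same two rows
identical(with_comma, with_and)
```

Which form to use is mostly a matter of style; commas tend to read better when there are many independent conditions.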

Example 2: Combine multiple conditions

You can combine multiple conditions using logical operators such as & (and) and | (or), as listed in the filter functions table above.

library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Niva", "Hemang"),
  score = c(85, 90, 78, 95, 80),
  subject = c("Math", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "11th")
)

# Filter rows where score is greater than 70 and grade is "10th"
filtered_output <- filter(df, score > 70 & grade == "10th")
filtered_output

Output

   name    score   subject   grade
1  Krunal   85     Math      10th
2  Niva     95     History   10th
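The same data can also illustrate | (or) and ! (not). The snippet below is a sketch of those operators under the same sample data; the result variable names are chosen here for illustration.

```r
library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Niva", "Hemang"),
  score = c(85, 90, 78, 95, 80),
  subject = c("Math", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "11th")
)

# OR: keep a row if either condition is TRUE
high_or_history <- filter(df, score >= 90 | subject == "History")

# NOT: negate a condition to keep everything except 10th grade
not_tenth <- filter(df, !(grade == "10th"))

# Mixing operators: parentheses make the intended precedence explicit
math_mixed <- filter(df, (score > 80 | grade == "11th") & subject == "Math")

high_or_history$name  # "Ankit" "Rushabh" "Niva"
not_tenth$name        # "Ankit" "Rushabh" "Hemang"
math_mixed$name       # "Krunal" "Ankit" "Hemang"
```

In R, & binds more tightly than |, so adding parentheses whenever the two are mixed avoids surprises.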

Grouped tibbles

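A grouped tibble is what group_by() returns: a data frame that also records which rows belong to which group. dplyr verbs then operate group by group, which keeps group-wise code concise and readable. When filter() is applied to a grouped tibble, its conditions are evaluated within each group. dplyr is designed to perform grouped operations efficiently, which helps on large datasets.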

To calculate the average score for each subject, you can use: df %>% group_by(subject) %>% summarize(average_score = mean(score)).

[Figure: Visual representation of grouped tibbles]

library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th")
)

# Group the rows by subject, then compute one average per group
average_scores <- df %>%
  group_by(subject) %>%
  summarize(average_score = mean(score))

average_scores

Output

   subject   average_score
1  History   78.0
2  Math      87.5
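Grouping also changes how filter() itself behaves, because summary functions inside a condition are computed per group. The following sketch, using the five-row sample data from Example 2, keeps the students who scored above the average of their own subject; the name above_avg is introduced only for this illustration.

```r
library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Niva", "Hemang"),
  score = c(85, 90, 78, 95, 80),
  subject = c("Math", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "11th")
)

# mean(score) is evaluated separately for each subject:
# Math averages 85 and History averages 86.5
above_avg <- df %>%
  group_by(subject) %>%
  filter(score > mean(score)) %>%
  ungroup()  # drop the grouping once it is no longer needed

above_avg$name  # "Ankit" "Niva"
```

Without group_by(), the same condition would compare every score against the overall mean instead.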

That’s it!
