
How to Select Rows by Single or Multiple Conditions in R

Here are the two most prominent ways to select rows by single or multiple conditions in R:

  1. Subsetting with [ ]
  2. Using dplyr::filter()

To isolate the data relevant to an analysis, we filter the rows of a data frame based on conditions. Working with the filtered subset makes it easier to spot patterns and trends in the data.

Method 1: Subsetting with [ ]

Subsetting with square brackets ([ ]) is the basic way to select rows where a column meets a specific condition, for example df[df$column_name > value, ]. The condition goes before the comma; leaving the part after the comma empty keeps all columns.

Here is the input data frame:

df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

Single condition

Let’s apply a single condition to filter rows using subsetting.

Select the rows where score > 85.

df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

df[df$score > 85, ]

The column we filter on is score, which we access with the df$score syntax; any column of a data frame can be selected this way.

The greater-than (>) operator then keeps only the rows whose score is greater than 85.
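To see why this works, it can help to look at the logical vector that the condition produces before it is passed to [ ]. The sketch below reuses the same df; the variable names keep and high are only illustrative.

```r
df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

# The comparison is evaluated element-wise: one TRUE/FALSE per row
keep <- df$score > 85
keep
# [1]  TRUE  TRUE FALSE

# Rows whose position holds TRUE are kept; the empty slot after
# the comma keeps every column
high <- df[keep, ]
nrow(high)
# [1] 2
```

Storing the condition in a variable first is optional, but it makes longer filters easier to read and inspect.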

Multiple conditions

To apply multiple conditions at once, combine them with the AND (&) or OR (|) operator. Make sure to use the vectorized operators & and |, not && or ||, which expect a single TRUE or FALSE value rather than one value per row.

Let’s select only rows whose score is > 70 & grade == 11.

df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

df[df$score > 70 & df$grade == 11, ]

Only row 3 (KMJ) satisfies both conditions, so it is the only row included in the output.
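The OR operator (|) is mentioned above but only & is demonstrated. As a minimal sketch on the same df, the following keeps rows where at least one of the two conditions holds. The parentheses are optional here and only make the grouping explicit; the variable name either is illustrative.

```r
df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

# Keep rows where score is above 90 OR grade equals 11
either <- df[(df$score > 90) | (df$grade == 11), ]
either
#     name score subject grade
# 2 Yogita    95 Biology    12
# 3    KMJ    77 Biology    11
```

Note that the original row numbers (2 and 3) are preserved in the result.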

Handling NA values

If the column contains NA values, the condition evaluates to NA for those rows. With [ ], an NA in the logical index is not treated as FALSE; instead, it returns a row filled with NA values.

To avoid this, we can use is.na() to exclude missing values as part of the condition.

Our data frame does not contain any NA values, so let's use a different one that does.

df <- data.frame(
  name = c("Millie", "Yogita", NA),
  score = c(90, 95, NA),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

df[!is.na(df$score) & df$score > 70, ]

Output

    name score subject grade
1 Millie    90 Biology    12
2 Yogita    95 Biology    12

The third row has an NA score, so the !is.na(df$score) check excludes it, and the two remaining rows both have a score above 70.
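To make the NA behaviour concrete, here is a small sketch using a data frame with a missing score (the names df_na, raw, and clean are illustrative). It compares an unguarded condition with one wrapped in which(), another common way to discard NA positions.

```r
df_na <- data.frame(
  name = c("Millie", "Yogita", NA),
  score = c(90, 95, NA),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

# NA > 70 is NA, so the third element of the index is NA, not FALSE
df_na$score > 70
# [1] TRUE TRUE   NA

# An NA index yields an all-NA row instead of dropping the row
raw <- df_na[df_na$score > 70, ]
nrow(raw)
# [1] 3

# which() returns only the positions that are TRUE, so the NA is discarded
clean <- df_na[which(df_na$score > 70), ]
nrow(clean)
# [1] 2
```

The extra row in raw contains NA in every column, which can silently distort later summaries if it goes unnoticed.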

String matching with grepl()

Let’s select rows whose name contains a given substring using the grepl() function. In the example below, only "Yogita" contains "gita", so only that row is returned.

df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

df[grepl("gita", df$name), ]
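grepl() treats its pattern as a regular expression and is case-sensitive by default. The sketch below, on the same df, shows the logical vector it returns along with two of its standard arguments, ignore.case and fixed, which change how the pattern is matched. The variable names are illustrative.

```r
df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

# grepl() returns one TRUE/FALSE per element of df$name
hits <- grepl("gita", df$name)
hits
# [1] FALSE  TRUE FALSE

# ignore.case = TRUE matches regardless of letter case
case_hit <- df[grepl("mil", df$name, ignore.case = TRUE), ]

# fixed = TRUE matches the pattern literally, not as a regular expression
literal_hit <- df[grepl("KMJ", df$name, fixed = TRUE), ]
```

Using fixed = TRUE is a reasonable default when the search text may contain characters such as . or ( that have special meaning in regular expressions.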

Method 2: Using dplyr::filter()

The dplyr filter() function subsets a data frame, retaining all rows that satisfy your conditions, for example df %>% filter(column_name > value).

dplyr is a third-party package, so we need to install it once and then load it in each session:

install.packages("dplyr")

library(dplyr)

Single condition

Let’s select the rows where score > 80 by piping df into filter().

library(dplyr)

df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

df %>% filter(score > 80)

Multiple conditions

For multiple conditions, separate them with commas, as in df %>% filter(first_column > value1, second_column == "value2"), or join them with &.

library(dplyr)

df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

df %>% filter(score > 70 & grade == 11)
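As a quick check that commas and & behave the same way inside filter(), the sketch below compares both forms and also shows an OR condition. It assumes dplyr is installed; the variable names are illustrative.

```r
library(dplyr)

df <- data.frame(
  name = c("Millie", "Yogita", "KMJ"),
  score = c(90, 95, 77),
  subject = c("Biology", "Biology", "Biology"),
  grade = c(12, 12, 11)
)

# Comma-separated conditions are combined with AND
with_comma <- df %>% filter(score > 70, grade == 11)
with_and <- df %>% filter(score > 70 & grade == 11)
all.equal(with_comma, with_and)
# [1] TRUE

# OR has no comma shorthand; it needs an explicit | in one condition
with_or <- df %>% filter(score > 90 | grade == 11)
nrow(with_or)
# [1] 2
```

Unlike [ ], filter() drops rows where the condition evaluates to NA, so no explicit is.na() guard is needed for that case.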

Both methods provide flexibility; you can choose whichever method suits your workflow and readability needs.
