R select_if() Function from dplyr

The dplyr::select_if() function selects the columns of a data frame or tibble for which a predicate function returns TRUE. The predicate is applied to each column in turn, so columns are chosen dynamically rather than listed by name.

This is helpful when the selection depends on a property of the column itself, such as its data type or a custom condition on its values.

Syntax

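select_if(.tbl, .predicate, .funs = list(), ...)

Parameters

Name        Description
.tbl        A data frame or tibble from which columns will be selected.
.predicate  A predicate function applied to each column; columns for which it returns TRUE are kept. A logical vector with one entry per column is also accepted.
.funs       An optional function used to rename the selected columns.
...         Additional arguments passed to the function in .funs.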
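As a minimal sketch of the optional renaming step: the dplyr reference documents a third argument, .funs, that is applied to the names of the selected columns. The data frame below is illustrative, and toupper() is only one possible renaming function.

```r
library(dplyr)

# Illustrative data, not taken from a real dataset
df <- data.frame(
  a = c(1, 2, 3),
  b = c("A", "B", "C"),
  c = c(7, 8, 9)
)

# Keep the numeric columns and rename them with toupper()
renamed <- df %>% select_if(is.numeric, toupper)

names(renamed)
```

Under these assumptions, the result should contain the two numeric columns under the names A and C.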

Return value

It returns an object of the same type as .tbl (a data frame or tibble) containing only the columns for which the predicate returned TRUE. The rows are left unchanged.
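The following sketch checks the return type for both kinds of input. The column names (id, label, score) are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical tibble with two numeric columns and one character column
tbl <- tibble(
  id    = 1:3,
  label = c("x", "y", "z"),
  score = c(0.5, 1.5, 2.5)
)

# Tibble in, tibble out
tbl_result <- tbl %>% select_if(is.numeric)
inherits(tbl_result, "tbl_df")

# Plain data frame in, plain data frame out
df_result <- as.data.frame(tbl) %>% select_if(is.numeric)
inherits(df_result, "tbl_df")

# The number of rows is the same as in the input
nrow(tbl_result) == nrow(tbl)
```

Only the set of columns changes; the class of the input and its rows are carried through.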

Example 1: Selecting numeric columns

library(dplyr)

df <- data.frame(
  a = c(1, 2, 3),
  b = c("A", "B", "C"),
  c = c(7, 8, 9)
)

# Select all numeric columns
df %>% select_if(is.numeric)

Output

  a c
1 1 7
2 2 8
3 3 9
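For comparison, the dplyr documentation marks the scoped verbs, including select_if(), as superseded; the same selection is now usually written with select() and where(). The sketch below assumes dplyr 1.0.0 or later, where where() is available, and compares the two forms on the same data.

```r
library(dplyr)

df <- data.frame(
  a = c(1, 2, 3),
  b = c("A", "B", "C"),
  c = c(7, 8, 9)
)

# Scoped verb, as used in this example
old_style <- df %>% select_if(is.numeric)

# Equivalent selection with select() and where()
new_style <- df %>% select(where(is.numeric))

# Both forms should pick the same columns
identical(names(old_style), names(new_style))
```

select_if() continues to work, so existing code does not need to change; where() is mainly relevant for new code.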

Example 2: Selecting columns based on multiple custom conditions

library(dplyr)

df <- data.frame(
  a = c(1, 2, 3),
  b = c("A", "B", "C"),
  c = c(7, 8, 9)
)

# Select columns where the mean is greater than 2,
# but only for numeric columns

df %>% select_if(~ is.numeric(.) && mean(.) > 2)

Output

  c
1 7
2 8
3 9

Here is a brief explanation:

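  1. is.numeric(.) checks whether the column is numeric.
  2. mean(.) > 2 computes the column mean and checks whether it is greater than 2.
  3. The && operator short-circuits: mean(.) is evaluated only when is.numeric(.) is TRUE, which avoids the warning that mean() raises for non-numeric columns.

Column a is numeric, but its mean is exactly 2, so it fails the second condition and only c is returned.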
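The same condition can also be written as a named predicate, which is easier to reuse and extend. The sketch below is one possible variant, not part of the original example: the helper mean_above() and its threshold argument are hypothetical, and na.rm = TRUE is added on the assumption that columns may contain missing values (without it, the comparison yields NA rather than TRUE or FALSE).

```r
library(dplyr)

# Illustrative data with missing values
df_na <- data.frame(
  a = c(1, NA, 3),
  b = c("A", "B", "C"),
  c = c(7, 8, NA)
)

# Hypothetical helper: TRUE only for numeric columns whose mean,
# ignoring NA values, is greater than the threshold
mean_above <- function(x, threshold = 2) {
  is.numeric(x) && mean(x, na.rm = TRUE) > threshold
}

# Use the helper directly as the predicate
selected <- df_na %>% select_if(mean_above)

# Or wrap it in a lambda to change the threshold
selected_strict <- df_na %>% select_if(~ mean_above(., threshold = 5))

names(selected)
names(selected_strict)
```

Here the mean of a (ignoring NA) is 2 and the mean of c is 7.5, so both calls should keep only c.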
