What is the subset() Function in R

The subset() function in R returns the rows of a data frame that meet a condition. It can also be used to keep or drop columns. The basic call is subset(df, expr), where df is the data frame and expr is a logical expression that specifies the rows to include in the subset.
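As a quick illustration of both uses, here is a minimal sketch. The data frame and its column names are made up for this example; a leading minus sign in select is one common way to drop a column.

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(x = 1:4, y = c("a", "b", "c", "d"), z = c(10, 20, 30, 40))

# Keep only the rows where x is greater than 2
rows_kept <- subset(df, x > 2)

# Drop column y by negating it in select
cols_dropped <- subset(df, select = -y)

# Filter rows and drop a column in a single call
both <- subset(df, x > 2, select = -y)
```

The original df is not modified; each call returns a new data frame.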

Syntax

subset(x, subset, select, drop = FALSE, ...)

Parameters

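  1. x – The object to be subsetted. It can be a vector, a data frame, or a matrix.
  2. subset – A logical expression that indicates which elements or rows to keep. Missing values are treated as FALSE.
  3. select – An expression that indicates which columns to keep in a data frame or matrix.
  4. drop – Passed on to the [ indexing method for matrices and data frames.
  5. ... – Further arguments passed to or from other methods.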
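The effect of the subset and drop parameters is easiest to see on a small case. The sketch below uses a made-up data frame with a missing value; it compares subset() with plain bracket indexing and shows drop = TRUE on a single selected column.

```r
# Hypothetical data frame with a missing value in x
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))

# subset() treats an NA condition as FALSE, so the NA row is left out
kept <- subset(df, x > 1)

# Bracket indexing with the same condition returns an extra all-NA row
bracket <- df[df$x > 1, ]

# drop = TRUE lets a single selected column collapse to a plain vector
y_vec <- subset(df, x > 1, select = y, drop = TRUE)
```

Here kept has one row, bracket has two (one of them filled with NA), and y_vec is a vector rather than a one-column data frame.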

Return value

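The subset() function returns an object of the same kind as x that contains only the selected elements (for a vector) or the selected rows and columns (for a matrix or data frame). For data frames, rows are typically selected by row name, by a list of values, or by a logical condition.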
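To make the return value concrete, the following sketch calls subset() on a vector, a matrix, and a data frame. All objects are invented for illustration. Note that for vectors and matrices the condition has to refer to the object explicitly, because column names are only looked up automatically for data frames.

```r
# Vector: the result is a shorter vector, with the NA element removed
v <- c(5, NA, 12, 8)
v_sub <- subset(v, v > 6)

# Matrix: the result is still a matrix
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
m_sub <- subset(m, m[, "a"] > 1)

# Data frame: the result is still a data frame
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
df_sub <- subset(df, x > 1)
```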

Example 1: Using subset() by row name

You can use the subset() function to get a subset of rows from a data frame based on row names. You can specify a vector of required row names and use the %in% operator to check for the presence of data frame row names in that vector.

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
rownames(df) <- c("A", "B", "C")

# Subset by row name
dfsub <- subset(df, rownames(df) %in% c("A", "C"))
dfsub

Output

   x   y
A  1   a
C  3   c

In the above code example, we created a data frame df with three rows and two columns. The rows are named “A”, “B”, and “C”.

Then we used the subset() function to create a new data frame dfsub that contains only the rows with names “A” and “C”.
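For comparison, the same rows can be picked with bracket indexing. The sketch below rebuilds the data frame from this example; the row name "Z" is a deliberately nonexistent name added to show how the two approaches differ.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
rownames(df) <- c("A", "B", "C")

# Both approaches return rows A and C
by_subset <- subset(df, rownames(df) %in% c("A", "C"))
by_bracket <- df[c("A", "C"), ]

# With a name that does not exist ("Z"), %in% simply finds no match
only_a <- subset(df, rownames(df) %in% c("A", "Z"))

# Bracket indexing returns a row of NA values for the unknown name
with_na <- df[c("A", "Z"), ]
```

The %in% form is the safer choice when the vector of names may contain entries that are not in the data frame.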

Example 2: Using subset() with a list of values

You can use the subset() function to get a subset of rows from a data frame based on a list of values. Create a vector with the values you want to match and use the %in% operator in the condition passed to subset().

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
rownames(df) <- c("A", "B", "C")

# Subset by a list of values
dfsub <- subset(df, x %in% c(1, 2))
dfsub

Output

   x   y
A  1   a
B  2   b

In the above code, we created a data frame df with three rows and two columns. The rows are named “A”, “B”, and “C”.

Then, we used the subset() function to create a new data frame dfsub that contains only the rows where column x equals 1 or 2.
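Value lists can be combined with other conditions using & (and) and | (or). The following is a small sketch with a made-up five-row data frame:

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(x = 1:5, y = c("a", "b", "c", "d", "e"))

# Rows where x is 1, 2, or 4 AND y is not "b"
both <- subset(df, x %in% c(1, 2, 4) & y != "b")

# Rows where x is greater than 4 OR y equals "a"
either <- subset(df, x > 4 | y == "a")
```

In this sketch, both keeps the rows with x equal to 1 and 4, and either keeps the rows with x equal to 1 and 5.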

Example 3: Using subset() to select columns by name

You can use the subset() function to get a subset of columns from a data frame based on column names. You can use the select argument with either a single column name or a vector of column names.

df <- data.frame(x = 1:3, y = c("a", "b", "c"), z = c("A", "B", "C"))

# Subset by column name
dfsub <- subset(df, select = c("x", "z"))
dfsub

Output

   x   z
1  1   A
2  2   B
3  3   C

You can see that we created a data frame df with three rows and three columns.

Then, we used the subset() function with the select argument to create a new data frame dfsub that contains only columns “x” and “z”.
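The select argument also accepts unquoted column names, ranges of adjacent columns, and negated names. A brief sketch using the same df as in this example:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), z = c("A", "B", "C"))

# Unquoted names work the same way as quoted ones
unquoted <- subset(df, select = c(x, z))

# A range of adjacent columns, from x through y
ranged <- subset(df, select = x:y)

# A minus sign excludes the listed columns and keeps the rest
only_x <- subset(df, select = -c(y, z))
```

Because drop defaults to FALSE, only_x is still a data frame even though a single column remains.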

Example 4: Subsetting columns by index

The select argument of subset() also accepts column positions, so subset(df, select = c(1, 2)) keeps the first two columns. You can achieve the same result using standard subsetting with square brackets [], which is the approach shown here.

df <- data.frame(x = 1:3, y = c("a", "b", "c"), z = c("A", "B", "C"))

# Subset by column index
dfsub <- df[, c(1, 2)]
dfsub

Output

   x   y
1  1   a
2  2   b
3  3   c 

In the above code example, we created the data frame df with three rows and three columns.

Then, we used standard subsetting with square brackets to create a new data frame dfsub that contains only columns 1 and 2.
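As a cross-check, column positions can also be passed to the select argument of subset(). The sketch below reuses df from this example and compares the results with the bracket version:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), z = c("A", "B", "C"))

# Bracket indexing by column position
by_bracket <- df[, c(1, 2)]

# The same columns picked through select
by_select <- subset(df, select = c(1, 2))

# A position range works as well
by_range <- subset(df, select = 1:2)
```

All three results contain columns x and y. One difference to keep in mind: df[, 1] with a single index returns a vector, while subset(df, select = 1) keeps the data frame structure.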

That’s it.
