R split() Function: Splitting Data into Groups

The split() function divides the input data into groups based on some criteria, typically specified by one or more grouping factors.

The split() function always returns a list whose elements are named after the levels of the grouping factor. It does not modify the original data.

In the example below, we split a data frame by its subject column, so the rows are divided according to the subject values.

If there are two unique subjects, the data frame is divided into two sub-data frames. If you have worked with databases, this resembles the grouping step of a GROUP BY clause, except that split() only partitions the rows and does not aggregate them.

Here the output is a list with two elements, one per subject.

df <- data.frame(
 name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas"),
 score = c(85, 90, 78, 92, 88),
 subject = c("Math", "Math", "History", "History", "Math"),
 grade = c("10th", "11th", "11th", "10th", "10th")
)

# Split the data frame by subject
split_df <- split(df, df$subject)

# Print the split data frame
print(split_df)

# Output:
# $History
#      name score subject grade
# 3 Rushabh    78 History  11th
# 4  Dhaval    92 History  10th

# $Math
#     name score subject grade
# 1 Krunal    85    Math  10th
# 2  Ankit    90    Math  11th
# 5  Tejas    88    Math  10th
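Because the result is an ordinary list of data frames, a common next step is to apply a function to each group. The following is a minimal sketch of that split-then-apply pattern using sapply(); the variable name mean_by_subject is only illustrative.

```r
# Rebuild the example data frame so this snippet runs on its own
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th")
)

split_df <- split(df, df$subject)

# The list elements are named after the subject values
names(split_df)
# [1] "History" "Math"

# Apply a summary to every group: mean score per subject
# (History is 85, Math is about 87.67)
mean_by_subject <- sapply(split_df, function(g) mean(g$score))
mean_by_subject
```

Each element of split_df is a regular data frame, so any function that accepts a data frame can be used inside sapply() or lapply().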

You can use the unsplit() function to reassemble the original data frame. Pass it the list returned by split(), not the original data frame: unsplit(split_df, f = df$subject)
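As a quick check of the round trip, the sketch below splits the example data frame, reassembles it with unsplit(), and compares the result with the original. The name restored is only illustrative.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th")
)

# Split by subject, then put the pieces back together
split_df <- split(df, df$subject)
restored <- unsplit(split_df, f = df$subject)

# Rows come back in their original order, not grouped by subject
restored$name

# Check that every cell and every row name matches the original
all(restored == df)
identical(rownames(restored), rownames(df))
```

The same grouping vector must be passed to both split() and unsplit(), since unsplit() uses it to decide where each row belongs.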

Syntax

split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)

Parameters

Argument	Description
x	A vector or data frame to be divided into groups. For data frames, splitting occurs row-wise.
f	A factor (or an object that can be coerced to one), or a list of factors, defining the groups.
drop (default: FALSE)	A logical value. If TRUE, factor levels that do not occur in the data are dropped.
sep (default: ".")	A character string used to join the level names when f is a list of factors.
lex.order (default: FALSE)	A logical value, used when f is a list of factors. If TRUE, the combined group names are ordered lexicographically, with the first factor varying slowest.
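The sep and lex.order arguments only matter when f is a list of factors. The sketch below shows their effect on the group names; the vectors scores, subject, and grade are made up for illustration.

```r
scores  <- c(85, 90, 78, 92, 88)
subject <- c("Math", "Math", "History", "History", "Math")
grade   <- c("10th", "11th", "11th", "10th", "10th")

# Default: names are joined with ".", and the first factor varies fastest
by_both <- split(scores, list(subject, grade))
names(by_both)
# [1] "History.10th" "Math.10th"    "History.11th" "Math.11th"

# sep changes the string that joins the level names
by_sep <- split(scores, list(subject, grade), sep = "_")
names(by_sep)
# [1] "History_10th" "Math_10th"    "History_11th" "Math_11th"

# lex.order = TRUE makes the first factor vary slowest
by_lex <- split(scores, list(subject, grade), lex.order = TRUE)
names(by_lex)
# [1] "History.10th" "History.11th" "Math.10th"    "Math.11th"
```

Only the naming and ordering of the groups change; the values inside each group are the same in all three calls.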

Splitting a vector

You can split a named vector into groups of elements that share the same name by passing names(vec) to the f argument.

In the example below, we split the vector based on its names.

The resulting list has two elements, $x and $y, and each contains the values that carried that name.

vec <- c(x = 3, y = 5, x = 1, x = 4)

vec

# Output:
# x y x x
# 3 5 1 4


data <- split(vec, f = names(vec))

data

# Output:
# $x
# x x x
# 3 1 4

# $y
# y
# 5

Splitting a list

If you split a list, split() returns a list of sub-lists, one for each group.

main_list <- list(a = 1:2, b = 3:4)

split(main_list, c("g1", "g2"))

# Output:
# $g1
# $g1$a
# [1] 1 2

# $g2
# $g2$b
# [1] 3 4

Splitting a dataset into groups

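You can also split a built-in dataset into groups based on the values of a column. Here we split the first six rows of ToothGrowth by len. Because every len value in those rows is unique, each group contains exactly one row.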

data("ToothGrowth")

df <- head(ToothGrowth)

data <- split(df, f = df$len)

data

Output

$`4.2`
  len supp dose
1 4.2   VC  0.5

$`5.8`
  len supp dose
4 5.8   VC  0.5

$`6.4`
  len supp dose
5 6.4   VC  0.5

$`7.3`
  len supp dose
3 7.3   VC  0.5

$`10`
  len supp dose
6  10   VC  0.5

$`11.5`
   len supp dose
2 11.5   VC  0.5
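Splitting on a continuous column such as len mostly produces one-row groups. In practice, a categorical column is usually a better grouping variable. The sketch below splits the full ToothGrowth dataset by supp, and then by supp and dose together; the variable names by_supp and by_supp_dose are only illustrative.

```r
data("ToothGrowth")

# Split the whole data frame by supplement type
by_supp <- split(ToothGrowth, ToothGrowth$supp)

names(by_supp)
# [1] "OJ" "VC"

# Number of rows in each group
sapply(by_supp, nrow)

# Split only the len column by two factors at once
by_supp_dose <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))

names(by_supp_dose)
# [1] "OJ.0.5" "VC.0.5" "OJ.1"   "VC.1"   "OJ.2"   "VC.2"

# Mean tooth length for each supplement and dose combination
sapply(by_supp_dose, mean)
```

Passing a list of factors creates one group per combination of levels, with names joined by the sep string.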

Use drop = TRUE

Let’s say we have a vector with only two values, but the factor we use for splitting has three levels. That means one level has no observations.

By default (drop = FALSE), the unused level still appears in the result as an empty element. With drop = TRUE, it is omitted from the output.

f <- factor(c("A", "B"), levels = c("A", "B", "C"))

split(1:2, f, drop = TRUE)

# Output:
# $A
# [1] 1

# $B
# [1] 2

The output shows that level “C” is dropped and only two groups remain, $A and $B, each holding one value.
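For comparison, here is a short sketch of the same call with and without drop = TRUE. The names kept and dropped are only illustrative.

```r
f <- factor(c("A", "B"), levels = c("A", "B", "C"))

# Default (drop = FALSE): the unused level "C" is kept as an empty group
kept <- split(1:2, f)
names(kept)
# [1] "A" "B" "C"
length(kept$C)
# [1] 0

# drop = TRUE: the unused level is removed from the result
dropped <- split(1:2, f, drop = TRUE)
names(dropped)
# [1] "A" "B"
```

Keeping empty groups can be useful when later code expects one element per factor level, so the default is not always the wrong choice.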

Empty vector

If both the vector and the factor are empty, the output is an empty named list, since there is nothing to divide.

# Splitting empty objects
input_empty <- numeric(0)

factor_empty <- factor(character(0))

empty_list <- split(input_empty, factor_empty)

print(empty_list)

# Output: named list()

That’s it.

Krunal Lathiya
