split in R: How to Split a Vector and a Data Frame in R

The R language has various data types, and the most common one is the vector. Merging and splitting are common operations in any programming language, and today we will see how to split vectors and data frames into groups in R.

split in R

The split() function is a built-in R function that divides a vector or data frame into the groups defined by its argument f. It accepts the vector or data frame as an argument and returns the data divided into groups.

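The value returned by split() is a list with one element per group: vectors when the input is a vector, data frames when the input is a data frame. The unsplit() function does the reverse of split(): it reassembles such a list into the original vector or data frame.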
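A minimal round trip illustrates the relationship: split() produces a named list with one element per group, and unsplit(), given the same grouping, puts every value back in its original position. The names scores, groups, parts, and restored are illustrative only.

```r
# Illustrative data: five values and the group each one belongs to
scores <- c(3, 5, 1, 4, 3)
groups <- c("a", "b", "a", "a", "b")

# split() returns a named list with one numeric vector per group
parts <- split(scores, f = groups)
str(parts)

# unsplit() reverses the operation when given the same grouping
restored <- unsplit(parts, f = groups)
print(restored)
```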

Syntax

split(x, f, drop = FALSE, ...)                                 # generic
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)   # default method
unsplit(value, f, drop = FALSE)                                # reverse of split()

Parameters

The x is a vector or data frame to be divided into groups.

The f is a ‘factor’ in the sense that as.factor(f) defines the grouping, or a list of such factors, in which case their interaction is used for the grouping.

The drop is a logical argument indicating whether levels that do not occur should be dropped.

The sep is a character string separator passed to interaction() when f is a list.

The lex.order is a logical argument passed to interaction() when f is a list; it controls whether the combined groups are ordered lexically.
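When f is a list, sep and lex.order only change how the group names are built and ordered. A minimal sketch with two illustrative grouping vectors, g1 and g2: by default the first factor varies fastest, while lex.order = TRUE orders the combinations lexically, so the first factor varies slowest.

```r
x  <- 1:4
g1 <- c("A", "B", "A", "B")
g2 <- c("x", "x", "y", "y")

# Default: first factor varies fastest, groups joined with "."
default_names <- names(split(x, f = list(g1, g2)))
print(default_names)   # "A.x" "B.x" "A.y" "B.y"

# Custom separator plus lexical ordering of the combinations
lex_names <- names(split(x, f = list(g1, g2), sep = "-", lex.order = TRUE))
print(lex_names)       # "A-x" "A-y" "B-x" "B-y"
```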

Example

Suppose you have a named vector, where the name of each element corresponds to the group to which the element belongs.

You can then split the vector into one vector per group by passing the names of the vector, obtained with the names() function, to the argument f.

Let’s define a named vector using the c() function.

rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv

Output

x y x x y
3 5 1 4 3

To divide the vector into groups, use the split() function. Here we divide the data into the x and y groups.

rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv
data <- split(rv, f = names(rv))
data

Output

x y x x y
3 5 1 4 3
$x
x x x
3 1 4

$y
y y
5 3

You can see that the vector is divided into the groups defined by its names.
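Because split() returns an ordinary named list, each group can be accessed by name or passed to sapply() to compute a per-group summary. A sketch, assuming you want the sum of each group in rv (group_sums is an illustrative name):

```r
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
data <- split(rv, f = names(rv))

# Access a single group by name
print(data$x)

# Apply a function to every group: x = 3 + 1 + 4, y = 5 + 3
group_sums <- sapply(data, sum)
print(group_sums)
```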

You can also pass f a character vector that gives the group of each element, or pass a factor object directly.

rv <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv
data <- split(rv, f = factor(rv))
data

Output

[1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
$Mando1
[1] "Mando1" "Mando1" "Mando1"

$Mando2
[1] "Mando2" "Mando2"
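Passing a factor also controls which groups appear. If the factor declares a level that never occurs (the extra level "Mando3" below is hypothetical), split() still returns an empty group for it unless drop = TRUE.

```r
rv <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")

# Declare an extra level that no element uses
f <- factor(rv, levels = c("Mando1", "Mando2", "Mando3"))

all_groups <- split(rv, f = f)
print(names(all_groups))   # "Mando1" "Mando2" "Mando3"
print(all_groups$Mando3)   # character(0)

# drop = TRUE removes levels with no observations
kept <- split(rv, f = f, drop = TRUE)
print(names(kept))         # "Mando1" "Mando2"
```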

Split data by multiple groups in R

To split the data by more than one grouping variable, pass a list of factors to the argument f.

rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)

rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv1

rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
rv2

data <- split(rv, f = list(rv1, rv2))
data

Output

[1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
[1] "DarkTrooper1" "DarkTrooper2" "DarkTrooper2" "DarkTrooper1" "DarkTrooper1"
$Mando1.DarkTrooper1
x x
3 4

$Mando2.DarkTrooper1
y
3

$Mando1.DarkTrooper2
x
1

$Mando2.DarkTrooper2
y
5

You can see that, by default, the group names are joined with a dot and the output contains every combination of levels, even combinations with no observations (in this example all four combinations happen to occur).

You can change both behaviors with the sep and drop arguments, respectively, as the following code shows.

rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)

rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")

rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")

data <- split(rv, f = list(rv1, rv2), drop = TRUE, sep = ": ")
data

Output

$`Mando1: DarkTrooper1`
x x
3 4

$`Mando2: DarkTrooper1`
y
3

$`Mando1: DarkTrooper2`
x
1

$`Mando2: DarkTrooper2`
y
5

Splitting the data frame in R

To split a data frame in R, use the split() function. You can split a dataset into subsets based on one or more variables that represent groups in the data. R comes with some built-in datasets, which we will use in this example.

Let’s use the built-in R dataset called ToothGrowth.

data("ToothGrowth")

head(ToothGrowth)

Output

   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

The head() function returns the first six rows of the dataset.

You can use the split() function to split the data frame into groups based on the len variable.

data("ToothGrowth")

df <- head(ToothGrowth)

data <- split(df, f = df$len)
data

Output

$`4.2`
  len supp dose
1 4.2   VC  0.5

$`5.8`
  len supp dose
4 5.8   VC  0.5

$`6.4`
  len supp dose
5 6.4   VC  0.5

$`7.3`
  len supp dose
3 7.3   VC  0.5

$`10`
  len supp dose
6  10   VC  0.5

$`11.5`
   len supp dose
2 11.5   VC  0.5

You can see from the output that the data frame has been divided into one subset per distinct value of len. Because f is numeric here, split() converts it to a factor, so each of the six different len values becomes its own group, and the groups are listed in ascending order of their values rather than in row order.

If you want to divide a data frame based on more columns, pass a list of those columns to f. For example, see the following code snippet.
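Splitting on a numeric column such as len usually yields one tiny group per distinct value. A categorical column is the more typical grouping variable. A sketch using the full ToothGrowth dataset and its supp factor (OJ or VC), followed by the mean len of each group (by_supp is an illustrative name):

```r
data("ToothGrowth")

# One data frame per supplement type
by_supp <- split(ToothGrowth, f = ToothGrowth$supp)
print(names(by_supp))            # "OJ" "VC"
print(sapply(by_supp, nrow))     # 30 rows in each group

# Mean tooth length per supplement type
print(sapply(by_supp, function(d) mean(d$len)))
```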

split(df, f = list(df$len, df$dose))
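Applied to the full dataset, a list of two columns produces one subset per combination of supp and dose. A sketch (the variable name groups is illustrative):

```r
data("ToothGrowth")

# supp has 2 levels and dose has 3 distinct values, so 2 x 3 = 6 subsets
groups <- split(ToothGrowth, f = list(ToothGrowth$supp, ToothGrowth$dose))
print(names(groups))
print(sapply(groups, nrow))      # 10 rows per combination
```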

To recover the original data frame, use the unsplit() function. Pass it the list returned by split() together with the same grouping factor:

unsplit(data, f = df$len)
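A minimal round trip on the full dataset, assuming you split by supp: the VC rows come first in ToothGrowth while the OJ group comes first in the list, and unsplit() uses the same grouping factor to put every row back in its original position (parts and restored are illustrative names).

```r
data("ToothGrowth")

parts    <- split(ToothGrowth, f = ToothGrowth$supp)
restored <- unsplit(parts, f = ToothGrowth$supp)

# Rows are back in their original order
print(head(restored))
print(all(restored$len == ToothGrowth$len))
print(all(as.character(restored$supp) == as.character(ToothGrowth$supp)))
```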

Conclusion

To split a vector or data frame in R, use the split() function. To recover the original vector or data frame from the split result, use the unsplit() function.


Leave a Comment