R has several data types, and the vector is the most common of them. Splitting and merging data are common operations in any programming language, and in this article we will see how to split vectors and data frames into groups in R.
split in R
The split() function in R divides a vector or data frame into groups defined by a factor, and the unsplit() function does the reverse. The value returned by split() is a list whose elements contain the values of each group.
split(x, f, drop = FALSE, ...)
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)
x is the vector or data frame to be divided into groups.
f is a 'factor' in the sense that as.factor(f) defines the grouping. It can also be a list of such factors, in which case their interaction is used for the grouping.
drop is a logical value indicating whether levels that do not occur should be dropped.
sep is a character string used as the separator. It is passed to interaction() when f is a list.
lex.order is a logical value passed to interaction() when f is a list. It controls the order of the combined group levels.
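To make the drop argument concrete, here is a minimal sketch. The names scores and groups are hypothetical and chosen only for illustration; the grouping factor is given a level "c" that never occurs in the data.

```r
# Hypothetical data: five scores and a grouping factor with an unused level "c".
scores <- c(10, 20, 30, 40, 50)
groups <- factor(c("a", "b", "a", "b", "a"), levels = c("a", "b", "c"))

# Default (drop = FALSE): the unused level "c" is kept as an empty element.
kept <- split(scores, groups)
names(kept)     # "a" "b" "c"
length(kept$c)  # 0

# drop = TRUE: levels with no observations are removed from the result.
dropped <- split(scores, groups, drop = TRUE)
names(dropped)  # "a" "b"
```

Keeping empty groups can be useful when later code expects one element per level, while drop = TRUE gives a more compact result.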
Suppose you have a named vector, where the name of each element corresponds to the group to which the element belongs.
Hence, you can split the vector into one vector per group by passing the names of the vector, obtained with the names() function, to the argument f.
Let’s define a named vector using the c() function.
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv

x y x x y
3 5 1 4 3
To divide into groups, use the split() function. We will divide the data into the x and y groups.
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv
data <- split(rv, f = names(rv))
data

x y x x y
3 5 1 4 3
$x
x x x
3 1 4

$y
y y
5 3
You can see that the vector is divided into the groups defined by its names.
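Because the result is an ordinary list, each group can be accessed by name and summarized with the usual list tools. The following is a small sketch that reuses the rv vector from above; the choice of lengths() and sapply() is only one way to inspect the groups.

```r
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
data <- split(rv, f = names(rv))

is.list(data)      # TRUE: split() returns a list
data$x             # the elements of group "x"
data[["y"]]        # the elements of group "y"
lengths(data)      # number of elements in each group
sapply(data, sum)  # a per-group summary, here the sum
```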
You can also pass a character vector to f that indicates the group of each element, or pass a factor object directly.
rv <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv
data <- split(rv, f = factor(rv))
data

[1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
$Mando1
[1] "Mando1" "Mando1" "Mando1"

$Mando2
[1] "Mando2" "Mando2"
Split data into multiple groups in R
To split the data by more than one grouping variable, pass a list of grouping vectors to the argument f.
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv1
rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
rv2
data <- split(rv, f = list(rv1, rv2))
data

[1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
[1] "DarkTrooper1" "DarkTrooper2" "DarkTrooper2" "DarkTrooper1" "DarkTrooper1"
$Mando1.DarkTrooper1
x x
3 4

$Mando2.DarkTrooper1
y
3

$Mando1.DarkTrooper2
x
1

$Mando2.DarkTrooper2
y
5
By default, the names of the combined groups are joined with a dot, and the output contains every possible combination of groups, even those that have no observations.
You can customize both behaviors with the sep and drop arguments, respectively. See the following code.
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
data <- split(rv, f = list(rv1, rv2), drop = TRUE, sep = ": ")
data

$`Mando1: DarkTrooper1`
x x
3 4

$`Mando2: DarkTrooper1`
y
3

$`Mando1: DarkTrooper2`
x
1

$`Mando2: DarkTrooper2`
y
5
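In the example above every combination of the two groupings has at least one observation, so drop = TRUE only changes the names through sep. The sketch below uses hypothetical vectors (values, g1 and g2 are made up for illustration) in which the combination of "b" and "y" never occurs, so the effect of drop, sep and lex.order is visible.

```r
# Hypothetical data: the combination ("b", "y") has no observations.
values <- c(1, 2, 3, 4)
g1 <- c("a", "a", "b", "b")
g2 <- c("x", "y", "x", "x")

# Default: every combination appears, including the empty group "b.y".
all_groups <- split(values, f = list(g1, g2))
names(all_groups)   # "a.x" "b.x" "a.y" "b.y"

# drop = TRUE removes the empty combination; sep changes the separator.
used_groups <- split(values, f = list(g1, g2), drop = TRUE, sep = "_")
names(used_groups)  # "a_x" "b_x" "a_y"

# lex.order = TRUE orders the groups by the first grouping, then the second.
lex_groups <- split(values, f = list(g1, g2), lex.order = TRUE)
names(lex_groups)   # "a.x" "a.y" "b.x" "b.y"
```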
Splitting the data frame in R
To split a data frame in R, use the split() function. You can split a data set into subsets based on one or more variables that represent groups in the data. R comes with several built-in datasets, one of which we will use in this example.
Let’s use R's built-in dataset called ToothGrowth and look at its first rows with head(ToothGrowth).

   len supp dose
1  4.2   VC  0.5
2 11.5   VC  0.5
3  7.3   VC  0.5
4  5.8   VC  0.5
5  6.4   VC  0.5
6 10.0   VC  0.5

The head() function returns the first six rows of the dataset.
You can use the split() function to split the data frame into groups based on the len variable.
data("ToothGrowth")
df <- head(ToothGrowth)
data <- split(df, f = df$len)
data

$`4.2`
  len supp dose
1 4.2   VC  0.5

$`5.8`
  len supp dose
4 5.8   VC  0.5

$`6.4`
  len supp dose
5 6.4   VC  0.5

$`7.3`
  len supp dose
3 7.3   VC  0.5

$`10`
  len supp dose
6  10   VC  0.5

$`11.5`
   len supp dose
2 11.5   VC  0.5
You can see from the output that the data frame is divided into six subsets, one for each distinct value of len, ordered by that value. Since every len value in these six rows is unique, each subset contains exactly one row.
If you want to divide a data frame based on more than one column, pass a list of the grouping columns to f. For example, see the following code snippet.
split(df, f = list(df$len, df$dose))
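The six-row sample contains a single dose value, so a list of grouping columns is easier to appreciate on the full dataset. As a sketch, the following splits all of ToothGrowth by supp and dose; the choice of these two columns and the name groups are made here only for illustration.

```r
data("ToothGrowth")

# Split the full data frame by supplement type and dose.
groups <- split(ToothGrowth, f = list(ToothGrowth$supp, ToothGrowth$dose))

length(groups)         # 6 groups: 2 supplement types x 3 doses
names(groups)          # combined names such as "OJ.0.5" and "VC.2"
sapply(groups, nrow)   # number of rows in each group
```

Each element of groups is itself a data frame, so it can be passed to any function that accepts one.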
To recover the original data frame from the output of split(), use the unsplit() function. Pass it the list returned by split() (stored in data above) together with the same grouping factor.
unsplit(data, f = df$len)
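A short round trip shows that unsplit() puts the rows back in their original positions. This is a minimal sketch; the names pieces and restored are hypothetical, and the check compares individual columns rather than the whole data frame.

```r
data("ToothGrowth")
df <- head(ToothGrowth)

# split() returns a list of one-row data frames, ordered by the value of len.
pieces <- split(df, f = df$len)

# unsplit() takes that list plus the same grouping vector and restores
# the rows to the positions they had in df.
restored <- unsplit(pieces, f = df$len)

restored$len                      # same order as df$len
identical(restored$len, df$len)   # check the round trip on one column
```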
To split a vector or data frame in R, use the split() function. To recover the original vector or data frame from the split result, use the unsplit() function.
Krunal Lathiya is an Information Technology Engineer by education and a web developer by profession. He has worked with many back-end platforms, including Node.js, PHP, and Python. In addition, Krunal has excellent knowledge of Data Science and Machine Learning, and he is an expert in the R language. Krunal has written many programming blogs, which showcase his vast expertise in this field.