R has several data types, and the vector is the most common of them. Merging and splitting data are common operations in any programming language, and in this article we will see how to split vectors and data frames into groups in R.

**split in R**

The **split()** function divides a vector or data frame into the groups defined by the factor (or list of factors) passed to it. It accepts the vector or data frame as an argument and returns the data divided into groups.

The **unsplit()** function in R does the reverse of the **split()** function. The value returned from **split()** is a list of vectors (or, for a data frame, a list of data frames) containing the values of each group.
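To make the return value concrete, here is a small sketch using made-up `scores` and `team` vectors (hypothetical data, not used elsewhere in this article). It shows that split() returns a named list with one element per group, and that unsplit() with the same grouping puts the values back in their original order.

```r
# Hypothetical example data
scores <- c(90, 75, 60, 85)
team   <- c("red", "blue", "red", "blue")

# split() returns a list with one element per group;
# the group names come from the sorted factor levels
groups <- split(scores, f = team)
print(is.list(groups))  # TRUE
print(names(groups))    # "blue" "red"
print(groups$red)       # 90 60

# unsplit() with the same grouping restores the original order
restored <- unsplit(groups, f = team)
print(identical(restored, scores))  # TRUE
```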

**Syntax**

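```
# Generic function
split(x, f, drop = FALSE, ...)
# Default method, used for vectors
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)
# Reverse operation
unsplit(value, f, drop = FALSE)
```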

**Parameters**

The **x** argument is the vector or data frame to be divided into groups.

The **f** argument is a ‘**factor**’ in the sense that as.factor(f) defines the grouping, or a list of such factors, in which case their interaction is used for the grouping.

The **drop** argument is a logical value indicating whether levels that do not occur should be dropped.

The **sep** argument is a character string used as a separator; it is passed to interaction() when **f** is a list.

The **lex.order** argument is a logical value that is passed to interaction() when **f** is a list.
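The description of **f** can be checked directly. In the sketch below (with hypothetical vectors `x`, `g1`, and `g2`), splitting by a list of two grouping vectors gives the same result as splitting by their interaction(), while sep and lex.order only change the group names and their order.

```r
# Hypothetical data: one value vector and two grouping vectors
x  <- c(10, 20, 30, 40)
g1 <- c("a", "b", "a", "b")
g2 <- c("u", "u", "v", "v")

# A list in f is grouped by the interaction of its elements
by_list        <- split(x, f = list(g1, g2))
by_interaction <- split(x, f = interaction(g1, g2))
print(identical(by_list, by_interaction))  # TRUE
print(names(by_list))                      # "a.u" "b.u" "a.v" "b.v"

# sep changes the separator in the group names
print(names(split(x, f = list(g1, g2), sep = "_")))         # "a_u" "b_u" "a_v" "b_v"
# lex.order = TRUE makes the first factor vary slowest
print(names(split(x, f = list(g1, g2), lex.order = TRUE)))  # "a.u" "a.v" "b.u" "b.v"
```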

**Example**

Suppose you have a named vector in which the name of each element corresponds to the group that element belongs to.

You can split such a vector into one vector per group by passing its names, obtained with the names() function, to the argument f.

Let’s define a named vector using the c() function.

```
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv
```

**Output**

```
x y x x y
3 5 1 4 3
```

To divide into groups, use the split() function. We will divide the data into the x and y groups.

```
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv
data <- split(rv, f = names(rv))
data
```

**Output**

```
x y x x y
3 5 1 4 3
$x
x x x
3 1 4
$y
y y
5 3
```

You can see that the vector has been divided into the groups defined by its names.
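Because split() returns an ordinary list, each group can then be processed on its own. As one possible follow-up (our own addition, not part of the original example), sapply() can summarise every group of rv:

```r
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
groups <- split(rv, f = names(rv))

# Apply a function to each list element, i.e. to each group
group_sums  <- sapply(groups, sum)     # x = 8, y = 8
group_sizes <- sapply(groups, length)  # x = 3, y = 2
print(group_sums)
print(group_sizes)
```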

Instead of names, you can also pass f a character vector that indicates the group of each element, or pass a factor object directly.

```
rv <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv
data <- split(rv, f = factor(rv))
data
```

**Output**

```
[1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
$Mando1
[1] "Mando1" "Mando1" "Mando1"
$Mando2
[1] "Mando2" "Mando2"
```
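The sketch below (our own illustration) checks two related points. First, a plain character vector passed to f is converted with as.factor(), so it gives the same groups as factor(rv). Second, a factor can carry levels that never occur; `Mando3` is a hypothetical extra level added only to show that such a level produces an empty group unless drop = TRUE.

```r
rv <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")

# A character vector is converted with as.factor(), so the groups match
by_chr <- split(rv, f = rv)
print(identical(by_chr, split(rv, f = factor(rv))))  # TRUE

# "Mando3" is a hypothetical level that never occurs in rv
f3 <- factor(rv, levels = c("Mando1", "Mando2", "Mando3"))
with_empty    <- split(rv, f = f3)               # keeps an empty $Mando3
without_empty <- split(rv, f = f3, drop = TRUE)  # unused level is dropped
print(names(with_empty))     # "Mando1" "Mando2" "Mando3"
print(names(without_empty))  # "Mando1" "Mando2"
```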

**Split data by multiple groups in R**

To split the data by more than one grouping variable, pass a list of factors to the argument **f**.

```
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv1
rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
rv2
data <- split(rv, f = list(rv1, rv2))
data
```

**Output**

```
[1] "Mando1" "Mando2" "Mando1" "Mando1" "Mando2"
[1] "DarkTrooper1" "DarkTrooper2" "DarkTrooper2" "DarkTrooper1" "DarkTrooper1"
$Mando1.DarkTrooper1
x x
3 4
$Mando2.DarkTrooper1
y
3
$Mando1.DarkTrooper2
x
1
$Mando2.DarkTrooper2
y
5
```

By default, the names of the group combinations are joined with a dot, and the output contains every possible combination, even those with no observations (in this example every combination happens to occur, so no group is empty).

You can customize these two behaviors with the **sep** and **drop** arguments, respectively. See the following code.

```
rv <- c(x = 3, y = 5, x = 1, x = 4, y = 3)
rv1 <- c("Mando1", "Mando2", "Mando1", "Mando1", "Mando2")
rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2", "DarkTrooper1", "DarkTrooper1")
data <- split(rv, f = list(rv1, rv2), drop = TRUE, sep = ": ")
data
```

**Output**

```
$`Mando1: DarkTrooper1`
x x
3 4
$`Mando2: DarkTrooper1`
y
3
$`Mando1: DarkTrooper2`
x
1
$`Mando2: DarkTrooper2`
y
5
```
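In the five-element example every combination of rv1 and rv2 occurs at least once, so drop = TRUE has no visible effect there. The hypothetical three-element variant below is chosen so that the Mando2/DarkTrooper1 combination is empty, which makes the difference visible.

```r
# Hypothetical smaller data: the combination Mando2 + DarkTrooper1 never occurs
rv  <- c(x = 3, y = 5, x = 1)
rv1 <- c("Mando1", "Mando2", "Mando1")
rv2 <- c("DarkTrooper1", "DarkTrooper2", "DarkTrooper2")

all_groups  <- split(rv, f = list(rv1, rv2))               # keeps empty combinations
kept_groups <- split(rv, f = list(rv1, rv2), drop = TRUE)  # drops them

print(names(all_groups))                            # four combinations
print(length(all_groups[["Mando2.DarkTrooper1"]]))  # 0
print(names(kept_groups))                           # three combinations
```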

**Splitting the data frame in R**

To split a data frame in R, use the same split() function. You can split a dataset into subsets based on one or more variables that represent groups in the data. R ships with several built-in datasets, which we will use in this example.

Let’s use the built-in R dataset called **ToothGrowth**.

```
data("ToothGrowth")
head(ToothGrowth)
```

**Output**

```
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
```

The head() function returns the first six rows of the dataset.

You can use the split() function to split the data frame into groups based on the **len** variable.

```
data("ToothGrowth")
df <- head(ToothGrowth)
data <- split(df, f = df$len)
data
```

**Output**

```
$`4.2`
len supp dose
1 4.2 VC 0.5
$`5.8`
len supp dose
4 5.8 VC 0.5
$`6.4`
len supp dose
5 6.4 VC 0.5
$`7.3`
len supp dose
3 7.3 VC 0.5
$`10`
len supp dose
6 10 VC 0.5
$`11.5`
len supp dose
2 11.5 VC 0.5
```

The output shows that the six-row sample was divided into six subsets, one for each distinct value of **len**. Because **len** is a numeric measurement, every distinct value becomes its own group; in practice you would usually split by a categorical column such as **supp** or **dose**.
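A more typical use is to split by a categorical column. The sketch below splits the full ToothGrowth dataset by the supplement type **supp**, a factor with the levels OJ and VC.

```r
data("ToothGrowth")

# supp is a factor with two levels, OJ and VC
by_supp <- split(ToothGrowth, f = ToothGrowth$supp)
print(names(by_supp))         # "OJ" "VC"
print(sapply(by_supp, nrow))  # 30 rows in each group
print(head(by_supp$OJ, 3))
```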

To divide a data frame based on more than one column, pass a **list** of columns to f. For example, see the following code snippet.

`split(df, f = list(df$len, df$dose))`
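Applied to the full dataset with two categorical columns, a list in f produces one group per combination of values. In this sketch, **supp** (2 levels) and **dose** (0.5, 1, and 2) give six groups.

```r
data("ToothGrowth")

# Group by every combination of supplement type and dose
groups <- split(ToothGrowth, f = list(ToothGrowth$supp, ToothGrowth$dose))
print(names(groups))         # "OJ.0.5" "VC.0.5" "OJ.1" "VC.1" "OJ.2" "VC.2"
print(sapply(groups, nrow))  # 10 rows per combination
```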

To recover the original data frame from the result of split(), use the unsplit() function. Its first argument is the list returned by split(), and its second is the same grouping that was used to split the data:

`unsplit(data, f = df$len)`
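Note that unsplit() takes the list produced by split() as its first argument, not the original data frame. A minimal round trip on the six-row sample looks like this:

```r
data("ToothGrowth")
df <- head(ToothGrowth)

# split() gives a list of one-row data frames, one per len value
data <- split(df, f = df$len)

# unsplit() takes that list first, then the same grouping used to split
restored <- unsplit(data, f = df$len)
print(restored)

print(identical(restored$len, df$len))              # rows are back in the original order
print(identical(rownames(restored), rownames(df)))  # row names are restored as well
```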

**Conclusion**

To split a vector or data frame in R, use the **split()** function. To recover the original vector or data frame from the split result, use the **unsplit()** function.

