Subsetting means selecting data from a one-dimensional object, such as the LETTERS vector, or a two-dimensional object, such as a data frame. You can select data by simple indexing and slicing, or apply logical operators to get specific values.
What is Subsetting in R
Subsetting in R is a powerful indexing feature for accessing the elements of an object, and it is commonly used to filter data. Once you master it, you can express complex operations more concisely than in many other languages. Let’s print all the capital letters of the alphabet using the built-in LETTERS vector.
cat(LETTERS)
It gives us the following output.
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
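LETTERS and its lowercase counterpart letters are character vectors that ship with base R, which makes them handy for practicing subsetting. The short sketch below inspects them; print() shows quoted values with an index prefix, while cat() prints the bare values, as in the output above.

```r
# LETTERS and letters are built-in character constants in base R
print(class(LETTERS))    # [1] "character"
print(length(LETTERS))   # [1] 26
print(head(letters, 3))  # [1] "a" "b" "c"

# cat() prints the bare values separated by spaces, without quotes or [1]
cat(head(LETTERS, 3), "\n")
```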
Selecting Variables in R
To select a single letter, pass its index inside square brackets.
cat(LETTERS[11])
It gives the 11th letter of the alphabet, which is K. R indexes start at 1, so LETTERS[1] is A.
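A few edge cases of single-index selection are worth knowing. The sketch below shows that R counts from 1, that an index past the end returns NA instead of raising an error, and that index 0 selects nothing. The variable names are only for illustration.

```r
# R vectors are 1-indexed: position 1 holds the first element
first  <- LETTERS[1]                 # "A"
last   <- LETTERS[length(LETTERS)]   # "Z"

# Indexing past the end does not fail; it returns NA
beyond <- LETTERS[27]

# Index 0 is silently dropped, leaving an empty character vector
empty  <- LETTERS[0]

cat(first, last, "\n")
print(beyond)   # [1] NA
print(empty)    # character(0)
```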
To pull out a range of elements, pass a sequence created with the colon (:) operator.
cat(LETTERS[11:12])
It selects the 11th letter up to the 12th letter, which gives us K L.
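The colon operator is not limited to ascending ranges. As a sketch, a decreasing range returns the elements in reverse order, and seq() with its by argument selects every n-th element.

```r
# A decreasing range returns the elements in reverse order
backwards <- LETTERS[12:11]

# seq(from, to, by) builds positions 1, 6, 11, 16, 21, 26
every_fifth <- LETTERS[seq(1, 26, by = 5)]

cat(backwards, "\n")    # L K
cat(every_fifth, "\n")  # A F K P U Z
```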
You can also pass a vector of indices created with c(). The elements come back in the order you list them.
cat(LETTERS[c(11, 4, 12)])
K D L
You can also mix single indices and ranges inside c().
cat(LETTERS[c(11, 4, 12:16)])
K D L M N O P
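Because the index is just a vector, you can store it in a variable and reuse it. Positions may also repeat, in which case the element appears more than once in the result. The name idx below is hypothetical.

```r
# Store the index vector once and reuse it
idx <- c(11, 4, 12:16)
selected <- LETTERS[idx]

# Repeating a position repeats the element in the result
repeated <- LETTERS[c(11, 11, 4)]

cat(selected, "\n")  # K D L M N O P
cat(repeated, "\n")  # K K D
```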
Excluding selections in R
To exclude elements, use the negative sign (-) and pass the indices that you don’t want in your output.
cat(LETTERS[-c(11, 4, 12:16)])
A B C E F G H I J Q R S T U V W X Y Z
You can see that the output does not contain K, D, L, M, N, O, or P.
Another way to exclude elements is to make each index negative individually.
cat(LETTERS[c(-11, -4, -12:-16)])
It gives the same result as above, but in this case, we write each index as a negative number instead of negating the whole vector.
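Two related points are shown in the sketch below: R does not allow positive and negative indices in the same call (only zeros may be mixed with negative subscripts), and to drop elements by value rather than by position you can negate a %in% test. The vowels vector is an illustrative assumption.

```r
# Mixing positive and negative indices is an error; tryCatch() captures it here
mixed <- tryCatch(LETTERS[c(1, -2)], error = function(e) "error")
print(mixed)  # [1] "error"

# Exclude by value: keep the letters that are NOT in the vowels vector
vowels <- c("A", "E", "I", "O", "U")
consonants <- LETTERS[!LETTERS %in% vowels]
cat(consonants, "\n")
```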
rep() function in R
rep() is a built-in generic function in R that replicates the values in x. rep.int() and rep_len() are faster, simplified versions for two common cases: repeating a vector a given number of times, and repeating it up to a given length.
rep(x, …)
rep.int(x, times)
rep_len(x, length.out)
cat(rep(c(19, 21), 5))
In this example, the pair c(19, 21) is repeated five times, which gives the following output.
19 21 19 21 19 21 19 21 19 21
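rep() also accepts each and length.out arguments, and rep.int() and rep_len() cover the two common cases directly. The sketch below compares them on the same c(19, 21) vector.

```r
x <- c(19, 21)

times_ex <- rep(x, times = 2)        # whole vector twice: 19 21 19 21
each_ex  <- rep(x, each = 2)         # each element twice: 19 19 21 21
len_ex   <- rep(x, length.out = 5)   # recycle up to 5 values: 19 21 19 21 19

# The faster, simplified variants
int_ex     <- rep.int(x, 2)          # same as rep(x, times = 2)
rep_len_ex <- rep_len(x, 5)          # same as rep(x, length.out = 5)

cat(each_ex, "\n")
cat(len_ex, "\n")
```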
To get every other letter of the alphabet using the rep() function, use the following code.
cat(LETTERS[rep(c(TRUE, FALSE), 13)])
A C E G I K M O Q S U W Y
Here, A is paired with TRUE, so it is included in the output; B is paired with FALSE, so it is skipped. rep(c(TRUE, FALSE), 13) repeats this TRUE, FALSE pattern 13 times, producing a logical vector of length 26, one value per letter. Since the alphabet has 26 letters, 13 of them line up with TRUE and are printed.
The rep() function is handy for regular selection patterns like this one. For more complex selections, build the logical vector from a condition instead.
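R also recycles a logical index that is shorter than the vector, so the TRUE, FALSE pattern does not strictly need rep(). The sketch below checks that both forms agree, uses rep_len() to build a pattern of exactly 26 values when the pattern length does not divide 26 evenly, and finally builds a logical index from a condition.

```r
# Explicit pattern of length 26
with_rep <- LETTERS[rep(c(TRUE, FALSE), 13)]

# Shorter logical index: R recycles c(TRUE, FALSE) along the vector
recycled <- LETTERS[c(TRUE, FALSE)]
print(identical(with_rep, recycled))  # [1] TRUE

# Every third letter: rep_len() cuts the pattern to exactly 26 values
every_third <- LETTERS[rep_len(c(TRUE, FALSE, FALSE), length(LETTERS))]
cat(every_third, "\n")  # A D G J M P S V Y

# For irregular selections, derive the logical vector from a condition
vowels_only <- LETTERS[LETTERS %in% c("A", "E", "I", "O", "U")]
cat(vowels_only, "\n")  # A E I O U
```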
Subset two dimensions of data in R
Two-dimensional data can also be subsetted. The most common two-dimensional structure in R is the data frame. Let’s create a quick data frame and then apply the selection.
df <- data.frame(LETTERS, letters, position = 1:length(letters))
print(df)
   LETTERS letters position
1        A       a        1
2        B       b        2
3        C       c        3
4        D       d        4
5        E       e        5
6        F       f        6
7        G       g        7
8        H       h        8
9        I       i        9
10       J       j       10
11       K       k       11
12       L       l       12
13       M       m       13
14       N       n       14
15       O       o       15
16       P       p       16
17       Q       q       17
18       R       r       18
19       S       s       19
20       T       t       20
21       U       u       21
22       V       v       22
23       W       w       23
24       X       x       24
25       Y       y       25
26       Z       z       26
Woohoo! We have created a data frame in R with three columns.
The alphabet has 26 letters, which is why the data frame contains 26 rows.
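To confirm the shape of a data frame without printing every row, you can query it directly. The sketch below rebuilds the same df, but with seq_along(letters) in place of 1:length(letters); the two are equivalent here, and seq_along() is slightly safer because 1:length(x) yields c(1, 0) for an empty vector.

```r
df <- data.frame(LETTERS, letters, position = seq_along(letters))

print(dim(df))      # number of rows and columns
print(names(df))    # column names taken from the argument names
print(head(df, 3))  # first three rows only
```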
Now, let’s select the 3rd row of the data frame.
df <- data.frame(LETTERS, letters, position = 1:length(letters))
print(df[3, ])
  LETTERS letters position
3       C       c        3
You can see that it subsets the third row of the data frame. Because the column position after the comma is left empty, all columns are returned.
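Row selection accepts the same index forms as vectors: ranges, c() vectors, and negative indices. The sketch below selects several rows at once and drops a row.

```r
df <- data.frame(LETTERS, letters, position = 1:length(letters))

first_three <- df[1:3, ]        # rows 1 to 3, all columns
some_rows   <- df[c(11, 4), ]   # rows 11 and 4, in that order
without_1st <- df[-1, ]         # every row except the first

print(nrow(first_three))  # [1] 3
print(some_rows$LETTERS)  # [1] "K" "D"
print(nrow(without_1st))  # [1] 25
```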
To select a column of a data frame, leave the row position inside the brackets empty and pass the number of the column you want. For example, df[, 3] returns the third column.
df <- data.frame(LETTERS, letters, position = 1:length(letters))
cat(df[, 3])
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
You can see that it gives us the values of the position column, which is the 3rd column of the data frame.
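A column can also be selected by name, and R offers several equivalent forms. One subtlety: df[, 3] drops to a plain vector, whereas drop = FALSE keeps a one-column data frame. The sketch below compares them.

```r
df <- data.frame(LETTERS, letters, position = 1:length(letters))

by_number <- df[, 3]            # by position
by_name   <- df[, "position"]   # by column name
by_dollar <- df$position        # $ accessor
by_double <- df[["position"]]   # [[ ]] accessor

# All four forms return the same integer vector
stopifnot(identical(by_number, by_name),
          identical(by_name, by_dollar),
          identical(by_dollar, by_double))

# drop = FALSE keeps the result as a one-column data frame
as_frame <- df[, 3, drop = FALSE]
print(class(by_number))  # [1] "integer"
print(class(as_frame))   # [1] "data.frame"
```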
Select ranges in R
To select ranges in a data frame, use the colon (:) operator.
df <- data.frame(LETTERS, letters, position = 1:length(letters))
cat(df[2:4, 2])
b c d
Here, we select rows 2 to 4 of column 2, which holds the lowercase letters. So the output is b c d.
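Ranges work in both positions of the brackets at once, and column numbers can be replaced by names. The sketch below takes rows 2 to 4 of two columns in both styles.

```r
df <- data.frame(LETTERS, letters, position = 1:length(letters))

# Rows 2 to 4, columns 2 to 3
block <- df[2:4, 2:3]

# The same block, selecting the columns by name
block_named <- df[2:4, c("letters", "position")]

print(block)
stopifnot(identical(block, block_named))
```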
Subset rows based on logical conditions in R
To select data frame rows based on a logical condition, put the condition in the row position of the brackets and the column name in the column position to get the exact value.
In other words, the condition picks the matching rows, and the column name, passed as the second argument, picks the value from those rows.
df <- data.frame(LETTERS, letters, position = 1:length(letters))
cat(df[df$LETTERS == "K", "letters"])
In this example, the condition df$LETTERS == "K" fetches the row whose LETTERS value is K, and the column name letters selects the lowercase value from that row. So it returns the small letter k. Writing df$LETTERS, rather than the global LETTERS vector, makes the condition refer to the data frame’s own column.
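Conditions can use any column, not just equality on text. As a sketch, the following keeps the rows whose position is greater than 23; base R's subset() function expresses the same filter while referring to column names directly.

```r
df <- data.frame(LETTERS, letters, position = 1:length(letters))

# Bracket form: condition in the row position, column name in the column position
late <- df[df$position > 23, "letters"]

# subset() form: the condition and select= can use bare column names
late_df <- subset(df, position > 23, select = letters)

cat(late, "\n")  # x y z
stopifnot(identical(late, late_df$letters))
```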
We can apply the OR (|) and AND (&) logical operators to subset data from a data frame.
df <- data.frame(LETTERS, letters, position = 1:length(letters))
cat(df[df$LETTERS == "K" | df$LETTERS == "B", "letters"])
In this example, we select both the b and k observations from the data frame using the | (OR) operator; the output is b k because the rows keep their original order. The & (AND) operator works the same way but keeps only the rows that meet both conditions.
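The sketch below shows & combining two conditions on the same numeric column, and why combining two equality tests on the same text column with & returns nothing: no row can be both K and B. For matching a set of values, %in% is a shorter alternative to chaining |.

```r
df <- data.frame(LETTERS, letters, position = 1:length(letters))

# AND: both conditions must hold for a row to be kept
middle <- df[df$position > 10 & df$position <= 13, "LETTERS"]
cat(middle, "\n")  # K L M

# No row is both "K" and "B", so this selects nothing
none <- df[df$LETTERS == "K" & df$LETTERS == "B", "letters"]
print(length(none))  # [1] 0

# %in% matches any value in a set, like chaining | conditions
either <- df[df$LETTERS %in% c("K", "B"), "letters"]
cat(either, "\n")  # b k
```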
Subsetting means selecting data from one-dimensional, two-dimensional, or higher-dimensional objects, and R provides several ways to do it. We have seen how to select data by index, with repeated logical patterns built by rep(), and with logical operators.
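The same bracket syntax extends to higher-dimensional arrays, with one index position per dimension. As a minimal sketch, assuming a 2 by 3 by 4 array filled with the numbers 1 to 24 (R fills arrays column by column):

```r
# A 3D array: 2 rows, 3 columns, 4 layers, filled column-major with 1:24
arr <- array(1:24, dim = c(2, 3, 4))

one_value <- arr[1, 2, 3]   # row 1, column 2, layer 3
layer_one <- arr[, , 1]     # the whole first layer, as a 2 x 3 matrix

print(one_value)  # [1] 15
print(layer_one)
```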
That is it for Subsetting in R.
Krunal Lathiya is an Information Technology Engineer by education and web developer by profession. He has worked with many back-end platforms, including Node.js, PHP, and Python. In addition, Krunal has excellent knowledge of Data Science and Machine Learning, and he is an expert in R Language. Krunal has written many programming blogs, which showcase his vast expertise in this field.