R Data Frame: How to Create, Read, Modify, and Delete Data Frame

A data frame in R is a “table or two-dimensional array-like structure in which a row contains a set of values, and each column holds values of one variable”. Internally, a data frame is a list of equal-length vectors: each element of the list forms a column, and the values at the same position across those columns form a row.

In short, a data frame describes a set of cases: each row is one observation, and each column is one measurement (variable) recorded for every observation. Together, the rows and columns form a tabular data structure.

How to Create Data Frame in R

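To create a data frame in R, you can use the data.frame() function. It creates data frames, which are tightly coupled collections of variables that share many of the properties of matrices and lists and serve as the fundamental data structure for most of R's modeling software. The stringsAsFactors = FALSE argument keeps text columns as character vectors; it has been the default since R 4.0.0.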

streaming <- data.frame(
  service_id = c(1:5),
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)
# Print the data frame. 
print(streaming)

Output

   service_id   service_name   service_price
1    1            Netflix           18
2    2            Disney+           10
3    3            HBOMAX            15
4    4            Hulu               7
5    5            Peacock           12
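
As a quick check after creating a data frame, the shape can be inspected with the base R functions nrow(), ncol(), and names(). The following is a minimal sketch that rebuilds the same streaming data frame; the variable names n_rows, n_cols, and col_names are only illustrative.

```r
# Rebuild the example data frame so this snippet runs on its own.
streaming <- data.frame(
  service_id = 1:5,
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

# Number of rows (observations) and columns (variables).
n_rows <- nrow(streaming)
n_cols <- ncol(streaming)

# Column names, in the order they were passed to data.frame().
col_names <- names(streaming)

print(n_rows)
print(n_cols)
print(col_names)
```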

Get the Structure of the Data Frame

To get the structure of the data frame in R, you can use the str() function.

streaming <- data.frame(
 service_id = c(1:5),
 service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
 service_price = c(18, 10, 15, 7, 12),
 stringsAsFactors = FALSE
)

# Display the structure of the data frame.
# str() prints its result directly and returns NULL, so it does not need print().
str(streaming)

Output

'data.frame': 5 obs. of 3 variables:
$ service_id : int 1 2 3 4 5
$ service_name : chr "Netflix" "Disney+" "HBOMAX" "Hulu" ...
$ service_price: num 18 10 15 7 12
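
When only the column types are needed, rather than the full structure listing, one option is to apply class() to every column with sapply(). This is a small sketch under that assumption; col_types is an illustrative name.

```r
# Rebuild the example data frame so this snippet runs on its own.
streaming <- data.frame(
  service_id = 1:5,
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so sapply() visits each column in turn
# and returns a named character vector of class names.
col_types <- sapply(streaming, class)
print(col_types)
```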

Summary of Data in Data Frame

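To get a statistical summary of each column in the data frame, use the summary() function. For numeric columns it reports the minimum, quartiles, median, mean, and maximum; for character columns it reports the length, class, and mode.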

streaming <- data.frame(
  service_id = c(1:5),
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

print(summary(streaming))

Output

[Image: Summary of R Data Frame]
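
The statistics that summary() reports for a numeric column can also be computed one at a time with base R functions. The sketch below does this for the service_price column; the variable names are illustrative, and the quartiles rely on the default settings of quantile().

```r
# Rebuild the example data frame so this snippet runs on its own.
streaming <- data.frame(
  service_id = 1:5,
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

prices <- streaming$service_price

# Individual statistics for the price column.
price_min <- min(prices)
price_median <- median(prices)
price_mean <- mean(prices)
price_max <- max(prices)

# First and third quartiles (default quantile algorithm).
price_quartiles <- quantile(prices, probs = c(0.25, 0.75))

print(c(min = price_min, median = price_median, mean = price_mean, max = price_max))
print(price_quartiles)
```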

How to access Components of a Data Frame

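To access the columns of a data frame, use the [, [[, or $ operator. Single brackets [ return a data frame containing the selected columns, while [[ and $ return the column itself as a vector. In other words, a data frame can be indexed like a list, and it can also be indexed like a matrix.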

streaming <- data.frame(
  service_id = c(1:5),
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

streaming["service_name"]
streaming$service_price
streaming[["service_name"]]

Output

   service_name
1  Netflix
2  Disney+
3  HBOMAX
4  Hulu
5  Peacock
[1] 18 10 15 7 12
[1] "Netflix" "Disney+" "HBOMAX" "Hulu" "Peacock"
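
The difference between the three operators shows up in the type of the result. The following minimal sketch checks it with class(); name_df, name_vec, and price_vec are illustrative names.

```r
# Rebuild the example data frame so this snippet runs on its own.
streaming <- data.frame(
  service_id = 1:5,
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

name_df <- streaming["service_name"]     # single brackets keep a data frame
name_vec <- streaming[["service_name"]]  # double brackets extract the column
price_vec <- streaming$service_price     # $ also extracts the column

print(class(name_df))    # "data.frame"
print(class(name_vec))   # "character"
print(class(price_vec))  # "numeric"

# Because name_vec is a plain vector, a single value can be picked by position.
second_name <- name_vec[2]
print(second_name)       # "Disney+"
```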

Accessing data frames like a matrix

You can also access a data frame like a matrix by providing row and column indexes in the form df[rows, columns].

To demonstrate this, we use a dataset that ships with R. The available datasets can be listed with the command library(help = "datasets"). We will use the women dataset.

You can examine the data set using functions like str() and head().

str(women)

Output

'data.frame': 15 obs. of 2 variables:
$ height: num 58 59 60 61 62 63 64 65 66 67 ...
$ weight: num 115 117 120 123 126 129 132 135 139 142 ...

We can see the first three rows of the women dataset using the head() function.

head(women, n=3)

Output

   height  weight
1   58      115
2   59      117
3   60      120

Now we will access the data frame like a matrix.

Let’s select only the 2nd and 3rd rows.

women[2:3,]

Output

    height  weight
2     59     117
3     60     120

Let’s select the rows with heights greater than 70.

women[women$height > 70,]

Output

    height weight
14    71    159
15    72    164

Next, let’s select rows 10 to 14 of the second column (weight).

women[10:14, 2]

Output

[1] 142 146 150 154 159

In this case, the returned type is a vector since we extracted data from a single column. This behavior can be avoided by passing the argument drop=FALSE as follows.

women[10:14, 2, drop = FALSE]

Output

    weight
10   142
11   146
12   150
13   154
14   159
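
Row conditions and column names can be combined in a single matrix-style index. The sketch below uses the built-in women dataset; tall_weights and first_height are illustrative names.

```r
# women is a built-in dataset with the columns height and weight.
# Select the weight column for rows where height is greater than 70.
tall_weights <- women[women$height > 70, "weight"]
print(tall_weights)

# Select a single cell: the height in the first row.
first_height <- women[1, "height"]
print(first_height)

# Keep the result as a data frame by selecting with drop = FALSE.
tall_df <- women[women$height > 70, "weight", drop = FALSE]
print(tall_df)
```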

How to add a row in the R Data Frame

To add rows to a data frame in R, use the rbind() function.

streaming <- data.frame(
  service_id = c(1:5),
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)
streaming

cat("After adding a row", "\n")
rbind(streaming, list(6, "Quibi", 5))

The rbind() function takes the data frame and the new row, passed as an R list with one value per column, and returns a new data frame; the original streaming data frame is not modified.

Output

    service_id   service_name   service_price
1      1           Netflix           18
2      2           Disney+           10
3      3           HBOMAX            15
4      4           Hulu               7
5      5           Peacock           12
After adding a row
    service_id   service_name    service_price
1      1            Netflix          18
2      2            Disney+          10
3      3            HBOMAX           15
4      4            Hulu              7
5      5            Peacock          12
6      6            Quibi             5
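
Because rbind() returns a new data frame, the result has to be assigned to keep the added row. The following is a minimal sketch of that pattern, assuming the new row is built as a one-row data frame with matching column names; new_row and streaming_more are illustrative names.

```r
# Rebuild the example data frame so this snippet runs on its own.
streaming <- data.frame(
  service_id = 1:5,
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

# A one-row data frame whose column names match the original.
new_row <- data.frame(
  service_id = 6L,
  service_name = "Quibi",
  service_price = 5,
  stringsAsFactors = FALSE
)

# Assign the result; streaming itself still has five rows afterwards.
streaming_more <- rbind(streaming, new_row)

print(nrow(streaming))
print(nrow(streaming_more))
print(streaming_more)
```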

How to add a column in the R Data Frame

To add a column to an R data frame, use the cbind() function. The new column needs one value for each existing row.

streaming <- data.frame(
  service_id = c(1:5),
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)
streaming

cat("After adding a column", "\n")
cbind(streaming, service_show=c("Stranger Things", "The Mandalorian", 
                 "Friends", "Castle Rock", "The Office"))

Output

    service_id   service_name   service_price
1       1          Netflix           18
2       2          Disney+           10
3       3          HBOMAX            15
4       4          Hulu               7
5       5          Peacock           12
After adding a column
    service_id   service_name   service_price   service_show
1       1          Netflix           18         Stranger Things
2       2          Disney+           10         The Mandalorian
3       3          HBOMAX            15         Friends
4       4          Hulu               7         Castle Rock
5       5          Peacock           12         The Office
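
A column can also be added in place by assigning a vector to a new column name with the $ operator. This is a small sketch of that alternative to cbind(); unlike cbind(), it modifies the streaming variable directly.

```r
# Rebuild the example data frame so this snippet runs on its own.
streaming <- data.frame(
  service_id = 1:5,
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

# Assigning to a column name that does not exist yet creates the column.
streaming$service_show <- c("Stranger Things", "The Mandalorian",
                            "Friends", "Castle Rock", "The Office")

print(names(streaming))
print(streaming)
```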

How to delete a column in the R Data Frame

To remove a column in the R data frame, assign NULL to that column.

streaming <- data.frame(
  service_id = c(1:5),
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)
streaming

cat("After removing a service_price column", "\n")
streaming$service_price <- NULL
streaming

Output

     service_id   service_name   service_price
1         1           Netflix          18
2         2           Disney+          10
3         3           HBOMAX           15
4         4           Hulu              7
5         5           Peacock          12
After removing a service_price column
      service_id   service_name
1         1           Netflix
2         2           Disney+
3         3           HBOMAX
4         4           Hulu
5         5           Peacock
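
Assigning NULL changes the data frame in place. When the original should stay intact, one alternative is to build a copy that keeps every column except the unwanted one. The sketch below assumes selection by column name; streaming_no_price is an illustrative name.

```r
# Rebuild the example data frame so this snippet runs on its own.
streaming <- data.frame(
  service_id = 1:5,
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for every column we want to keep.
keep <- names(streaming) != "service_price"

# drop = FALSE guarantees a data frame even if only one column remains.
streaming_no_price <- streaming[, keep, drop = FALSE]

print(names(streaming_no_price))
print(names(streaming))  # the original still has all three columns
```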

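How to remove a row in the R Data Frame

To remove a row from a data frame, use a negative row index to select every row except the one you want to drop, and assign the result back to the variable. For example, streaming[-1, ] returns all rows except the first.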

streaming <- data.frame(
  service_id = c(1:5),
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)
streaming

cat("After removing the first row", "\n")
streaming <- streaming[-1, ]
streaming

Output

    service_id   service_name   service_price
1       1         Netflix           18
2       2         Disney+           10
3       3         HBOMAX            15
4       4         Hulu               7
5       5         Peacock           12
After removing the first row
    service_id   service_name   service_price
2       2         Disney+            10
3       3         HBOMAX             15
4       4         Hulu                7
5       5         Peacock            12
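
Rows can also be removed by condition instead of by position, by keeping only the rows that pass a logical test. The following is a minimal sketch; the price threshold of 15 and the name affordable are chosen purely for illustration.

```r
# Rebuild the example data frame so this snippet runs on its own.
streaming <- data.frame(
  service_id = 1:5,
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)

# Keep only the rows where the price is below 15; all other rows are dropped.
affordable <- streaming[streaming$service_price < 15, ]
print(affordable)

# The remaining rows keep their original row names (2, 4, 5).
original_row_names <- rownames(affordable)

# Setting row names to NULL renumbers the rows from 1.
rownames(affordable) <- NULL
print(affordable)
```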

That is it.
