How to Remove Single or Multiple Rows from a Data Frame in R

For meaningful and accurate analysis, you may want to remove rows with too many missing or incorrect values from the data frame or dataset to improve the quality of your data.

The simplest way to remove rows from a data frame is “negative indexing”. It is a base R approach that does not require any packages.

However, there are other approaches worth knowing, depending on the situation you are facing.

Here are five ways:

  1. Using negative indexing (For single or multiple rows)
  2. Removing rows by name
  3. Using the subset() function
  4. Using the dplyr package
  5. Using the na.omit() function (For removing rows with NA values)

Method 1: Using negative indexing

The negative sign (-) means exclusion. If you have a data frame df and want to remove the first row, write df[-1, ].

For multiple rows, use df[-c(1,3,5), ], which will remove rows 1, 3, and 5. It is a removal by row number. The c() function combines the indices, and the negative sign excludes them.

Negative indexing is a form of row indexing: it returns the data frame without the rows named by the negative index. The original df is unchanged unless you assign the result back to it.

Syntax

df[-c(row_index_1, row_index_2),]

Removing a single row (By row number)

df <- data.frame(
   Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
   Price = c(3200, 1900, 1500, 2200, 1400)
)

df_remain <- df[-c(3), ]

df_remain

Output

    Shares Price
1      TCS  3200
2 Reliance  1900
4      HUL  2200
5     KPIT  1400

Removing multiple rows (By row numbers)

To remove the second and third rows, use -c(2, 3). The negative sign before c(2, 3) tells R to exclude those rows.

df <- data.frame(
   Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
   Price = c(3200, 1900, 1500, 2200, 1400)
)

df_remain <- df[-c(2, 3),]

df_remain

Output

  Shares Price
1    TCS  3200
4    HUL  2200
5   KPIT  1400

Using range

If the rows you want to remove are consecutive, you can define them as a range. For example, -c(2:4) refers to the sequence of rows (rows 2, 3, and 4) that will be excluded from the data frame.

df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

modified_df <- df[-c(2:4), ]

modified_df

Output

  Shares Price
1    TCS  3200
5   KPIT  1400
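
When the rows to drop sit at the very start or end of the data frame, base R's head() and tail() accept a negative n, which can read more clearly than a hand-written range. The following is a minimal sketch using the same sample data; the result variable names are only illustrative.

```r
df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

# Drop the last two rows: a negative n keeps all but the last 2 rows
without_last_two <- head(df, -2)

# Drop the first two rows: a negative n keeps all but the first 2 rows
without_first_two <- tail(df, -2)

# A range must be negated as a whole: -(2:4) or -c(2:4).
# Writing -2:4 instead means (-2):4, which mixes negative and
# positive indices and raises an error.
without_middle <- df[-(2:4), ]

print(without_last_two)
print(without_first_two)
print(without_middle)
```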

Pros

  1. Negative indexing removes rows directly by their position.
  2. It is fast and requires no packages beyond base R.
  3. It adds little overhead: the only extra object is the index vector.

Cons

  1. If the order of the rows keeps changing, it will be hard to remove specific rows.
  2. This approach is only helpful when you know the exact row numbers.
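
Both drawbacks can be softened by computing the row numbers from the data instead of hard-coding them. The sketch below is one way to do that with which(); the rows_to_drop name and the price threshold are made up for illustration. It also guards against a common pitfall: if no row matches, which() returns an empty vector, and a negative empty index selects zero rows rather than all of them.

```r
df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

# Look up the row numbers from a condition instead of hard-coding them
rows_to_drop <- which(df$Price < 1600)   # rows 3 and 5

# Guard against an empty match: df[-integer(0), ] would return zero rows
if (length(rows_to_drop) > 0) {
  df_remain <- df[-rows_to_drop, ]
} else {
  df_remain <- df
}

df_remain

# With a single-column data frame, add drop = FALSE so the result
# stays a data frame instead of collapsing to a vector
prices <- df["Price"]
prices_remain <- prices[-1, , drop = FALSE]
```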

Method 2: Removing rows by name

If your data frame has row names, you can remove a row by its name. The rownames() function returns the names, and which() converts the matching name into a row number that negative indexing can exclude.

df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

rownames(df) <- c("row1", "row2", "row3", "row4", "row5")

cat("After removing 4th row", "\n")

modified_df <- df[-which(rownames(df) == "row4"), ]
modified_df

Output

After removing 4th row
        Shares Price
row1       TCS  3200
row2  Reliance  1900
row3 HDFC Bank  1500
row5      KPIT  1400

Pros

  1. You can target rows by name, which is a good identifier.
  2. It makes the code self-documenting.

Cons

  1. Row names must be unique; R raises an error if you try to assign duplicate row names to a data frame.
  2. If the rows only have the default numeric names, this approach offers nothing over removing by row number.
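
The which() call is not strictly required. A logical test on rownames() does the same job and, unlike a negative index, still returns every row when no name matches. The sketch below assumes the same row names as the example above; names_to_drop is an illustrative variable.

```r
df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)
rownames(df) <- c("row1", "row2", "row3", "row4", "row5")

# Names of the rows to remove (illustrative)
names_to_drop <- c("row2", "row4")

# Keep every row whose name is NOT in names_to_drop
modified_df <- df[!(rownames(df) %in% names_to_drop), ]
modified_df

# A name that does not exist simply removes nothing
unchanged_df <- df[!(rownames(df) %in% "row9"), ]
nrow(unchanged_df)
```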

Method 3: Using the subset() function

The subset() function is helpful when you have a specific logical condition. It keeps only the rows that meet that condition and drops the rest.

To remove rows with subset(), you must therefore provide a logical expression that evaluates to FALSE for the rows you want to remove.

df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

df_after_removed <- subset(df, Price > 1900)

df_after_removed

Output

  Shares Price
1    TCS  3200
4    HUL  2200

Pros

  1. It provides a clean syntax for logical conditions.
  2. You can refer to columns by name without the df$ prefix.

Cons

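  1. Chaining several operations is clumsier than with dplyr verbs, and R's own documentation recommends standard [ indexing over subset() for programmatic use.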
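
Because subset() keeps the rows for which the condition is TRUE, a removal rule is usually written by negating it. The sketch below shows that pattern along with a combined condition; the thresholds are arbitrary. Rows where the condition evaluates to NA are dropped as well.

```r
df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

# Remove one specific share by keeping everything that is not "HUL"
without_hul <- subset(df, Shares != "HUL")

# Remove rows priced below 1500 or above 3000 by negating that rule
mid_priced <- subset(df, !(Price < 1500 | Price > 3000))

without_hul
mid_priced
```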

Method 4: Using the dplyr package

If you already use the dplyr package in your project, you can use its filter() or slice() function.

Using dplyr::filter()

Pass a logical expression to the filter() function. It keeps the rows where the condition is TRUE and drops the rest, so write the condition for the rows you want to keep.

library(dplyr)

df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

filtered_df <- df %>% filter(Price > 1400)

filtered_df

Output

     Shares Price
1       TCS  3200
2  Reliance  1900
3 HDFC Bank  1500
4       HUL  2200
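
Since filter() keeps the rows where the condition is TRUE, a removal rule can also be expressed by negating it with !. The following is a minimal sketch, assuming dplyr is installed; the to_remove vector is illustrative.

```r
library(dplyr)

df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

# Shares to remove (illustrative)
to_remove <- c("Reliance", "KPIT")

# Keep rows whose Shares value is not in to_remove
filtered_df <- df %>% filter(!(Shares %in% to_remove))

filtered_df
```

Like subset(), filter() also drops rows where the condition evaluates to NA.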

Using dplyr::slice()

Pass the indices you want to remove to the slice() function with a negative sign, and it returns the data frame without those rows.

library(dplyr)

df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, 1500, 2200, 1400)
)

# Remove rows using slice()
df_sliced <- df %>%
  slice(-c(1, 4, 5)) # Exclude rows 1, 4, and 5

print(df_sliced)

Output

     Shares Price
1  Reliance  1900
2 HDFC Bank  1500

Removing duplicate rows

See the separate article on removing duplicate rows from a data frame for more information.

Pros

  1. It provides an intuitive syntax for readability. 
  2. It works seamlessly with %>% pipes and other dplyr verbs.
  3. It is efficient for medium-to-large datasets.

Cons

  1. It requires an external dependency called the “dplyr” package.
  2. It can be overkill for a very small dataset.

Method 5: Using the na.omit() function

If you want to quickly remove rows with NA values, use the built-in na.omit() function. For more targeted NA removal, I recommend using the complete.cases() or tidyr::drop_na() functions.

df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", NA, "KPIT"),
  Price = c(3200, 1900, 1500, NA, 1400)
)

df_na_removed <- na.omit(df)

df_na_removed

Output

     Shares Price
1       TCS  3200
2  Reliance  1900
3 HDFC Bank  1500
5      KPIT  1400

Pros

  1. It is a one-liner.
  2. Requires no external dependency.

Cons

  1. You cannot target specific columns using this approach.
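
For the more targeted NA removal mentioned above, complete.cases() can be applied to a subset of columns. The sketch below drops only the rows where Price is missing and leaves a missing Shares value in place; the sample data is adjusted for illustration.

```r
df <- data.frame(
  Shares = c("TCS", NA, "HDFC Bank", "HUL", "KPIT"),
  Price = c(3200, 1900, NA, 2200, 1400)
)

# Keep rows that have a value in Price; NA in other columns is tolerated
price_known <- df[complete.cases(df["Price"]), ]

# For a single column, is.na() is an equivalent shortcut
price_known_alt <- df[!is.na(df$Price), ]

# Checking all columns keeps the same rows as na.omit(df)
all_complete <- df[complete.cases(df), ]

price_known
```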

That’s it!
