How to Remove the Last Row or N Rows from DataFrame in R

In a real-life dataset, the last row (or the last few rows) may contain metadata, summary totals, or footnotes that are not needed for analysis. Removing those rows is a routine data-cleaning step that protects the accuracy and integrity of the data.

We have already seen how to remove the first row and its use cases.

Here are three main ways to remove the last row from a data frame in R:

  1. Using nrow() with negative indexing
  2. Using head()
  3. Using dplyr::slice()

Method 1: Using nrow() with negative indexing

The built-in nrow() function returns the total number of rows, and a negative row index excludes that row from the result. Combining the two gives us df[-nrow(df), ], which drops the last row.

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", "Mansi"),
  score = c(85, 90, 78, 92, 92, 88),
  subject = c("Math", "Math", "History", "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", "12th", "10th")
)

df # Printing df before removing the last row

df <- df[-nrow(df), ]

df 
# Printing df after removing the last row

Output: the second print shows five rows, with row 6 (Mansi) removed.

nrow(df) returns the total number of rows, which is 6 in our case. The negative index -6 does not subtract anything; it tells R to exclude row 6, so the result keeps rows 1 through 5.
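One caveat with this pattern: when a data frame has a single column, df[-nrow(df), ] returns a plain vector rather than a data frame. The sketch below uses a hypothetical one-column data frame called scores (not part of the original example) to show the effect, and how drop = FALSE keeps the data frame structure.

```r
# Hypothetical single-column data frame, used only for illustration
scores <- data.frame(score = c(85, 90, 78))

# Without drop = FALSE, the result collapses to a numeric vector
as_vector <- scores[-nrow(scores), ]
class(as_vector)

# With drop = FALSE, the result stays a data frame
as_df <- scores[-nrow(scores), , drop = FALSE]
class(as_df)
nrow(as_df)
```

For data frames with two or more columns, as in the examples in this article, the default behavior already returns a data frame.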

Removing the last N rows

To remove the last N rows, we have to modify our previous logic because N can be any number of rows.

Let’s say N = 3, which means we need to remove the last three rows. How are we going to do that?

First, use nrow() to get the total number of rows, then create a sequence from the N-th last row, nrow(df) - N + 1, to the last row, nrow(df). Putting a minus sign (-) in front of this sequence excludes those rows.

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", "Mansi"),
  score = c(85, 90, 78, 92, 92, 88),
  subject = c("Math", "Math", "History", "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", "12th", "10th")
)

df # Before removing the last N rows

# Specify number of rows to remove
N <- 3

# Remove the last N rows
df <- df[-((nrow(df) - N + 1):nrow(df)), ]

df 
# After removing the last N = 3 rows

Output: the updated data frame contains only the first three rows (Krunal, Ankit, and Rushabh).
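The sequence-based expression works as intended when N is between 1 and nrow(df). The sketch below shows what happens outside that range and one possible alternative; keep_first() is a hypothetical helper written for this illustration, not a built-in function.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", "Mansi"),
  score = c(85, 90, 78, 92, 92, 88)
)

# N = 0: the sequence becomes 7:6, i.e. c(7, 6), so row 6 is still dropped
N <- 0
zero_case <- df[-((nrow(df) - N + 1):nrow(df)), ]
nrow(zero_case)  # 5, although we asked to remove nothing

# N = 8 with 6 rows: the sequence is -1:6, so negating it mixes
# positive and negative indices and the subsetting fails
N <- 8
too_many <- tryCatch(
  df[-((nrow(df) - N + 1):nrow(df)), ],
  error = function(e) "error"
)

# Alternative (hypothetical helper): keep the first nrow - N rows instead.
# seq_len(0) is an empty index, so removing everything is handled too.
keep_first <- function(data, n_remove) {
  n_keep <- max(nrow(data) - n_remove, 0)
  data[seq_len(n_keep), , drop = FALSE]
}

nrow(keep_first(df, 0))  # 6
nrow(keep_first(df, 3))  # 3
nrow(keep_first(df, 8))  # 0
```

If N comes from user input or from another computation, checking it before indexing avoids both surprises.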

Data Frame with only one row

If a data frame contains only a single row, removing the last row (which is also the only row) returns an empty data frame without raising an error.

df <- data.frame(
  name = c("Krunal"),
  score = c(85),
  subject = c("Math"),
  grade = c("10th")
)

df # Printing before removing the last (only) row

df <- df[-nrow(df), ]

df 
# Printing after removing the last (only) row

Output: the second print shows an empty data frame that keeps the four column names but has zero rows.

Empty data frame

If you provide an empty data frame, it will still return an empty data frame without any error.

df <- data.frame(
  name = c(),
  score = c(),
  subject = c(),
  grade = c()
)

df # data frame with 0 columns and 0 rows

df <- df[-nrow(df), ]

df 
# data frame with 0 columns and 0 rows

Output: both prints show a data frame with 0 columns and 0 rows.
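The reason no error occurs is that nrow(df) is 0 here, -0 is simply 0, and a row index of 0 selects no rows. Note also that c() is NULL, so the data frame above has no columns at all. The sketch below contrasts that with an empty data frame built from typed zero-length vectors; the variable names are only illustrative.

```r
# c() is NULL, so these columns are dropped: 0 rows and 0 columns
no_columns <- data.frame(name = c(), score = c())
dim(no_columns)

# Typed zero-length vectors keep the columns: 0 rows and 2 columns
typed_empty <- data.frame(name = character(), score = numeric())
dim(typed_empty)

# nrow() is 0, and -0 is just 0, which selects no rows at all
result <- typed_empty[-nrow(typed_empty), ]
dim(result)
```

Keeping the column structure this way can be useful when later code expects specific column names to exist.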

Safe Approach

The approaches above do not throw errors, but they can silently leave you with an empty data frame, which may break later steps. A safer approach is to remove the last row only when the data frame contains at least two rows, so that at least one row always remains.

df_with_one_row <- data.frame(
  name = c("Krunal"),
  score = c(85),
  subject = c("Math"),
  grade = c("10th")
)

df_empty <- data.frame(
  name = c(),
  score = c(),
  subject = c(),
  grade = c()
)
# Function to safely remove the last row
remove_last_row <- function(df) {
  if (nrow(df) > 1) {
    df <- df[-nrow(df), ]
  } else {
    message("DataFrame is empty or has only one row. Returning original DataFrame.")
  }
  return(df)
}

# Test the function
df_empty <- remove_last_row(df_empty)
df_one_row <- remove_last_row(df_with_one_row)

# Output: DataFrame is empty or has only one row. Returning original DataFrame.
# Output: DataFrame is empty or has only one row. Returning original DataFrame.
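The same guard can be extended to the last N rows. The following is a minimal sketch under stated assumptions: remove_last_n() and its min_rows argument are hypothetical names introduced here, and returning the original data frame when too few rows would remain is just one possible policy.

```r
# Hypothetical helper: remove the last n rows, but only if at least
# min_rows rows would remain afterwards
remove_last_n <- function(df, n = 1, min_rows = 1) {
  stopifnot(is.data.frame(df), is.numeric(n), length(n) == 1, n >= 0)
  if (nrow(df) - n < min_rows) {
    message("Too few rows to remove ", n, ". Returning original data frame.")
    return(df)
  }
  # seq_len() also handles n = 0 correctly by keeping every row
  df[seq_len(nrow(df) - n), , drop = FALSE]
}

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", "Mansi"),
  score = c(85, 90, 78, 92, 92, 88)
)

nrow(remove_last_n(df, 3))  # 3 rows remain
nrow(remove_last_n(df, 6))  # guard triggers, all 6 rows are returned
```

Depending on the use case, calling stop() or warning() instead of message() may be a better fit, since a message is easy to miss in a long script.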

Method 2: Using head()

By default, the head() function returns the first six rows. Its n argument sets how many rows to return, and a negative n returns all rows except the last |n|. So head(df, -1) excludes the last row and returns the remaining rows.

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", "Mansi"),
  score = c(85, 90, 78, 92, 92, 88),
  subject = c("Math", "Math", "History", "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", "12th", "10th")
)

df # Print the data frame

head(df, -1) 
# Print the data frame without the last row

Output: the second print shows the first five rows, without row 6 (Mansi).

Removing the last N rows

For example, if I want to remove the last three rows, I will pass -3 as the second argument. Like this: head(df, -3).

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", "Mansi"),
  score = c(85, 90, 78, 92, 92, 88),
  subject = c("Math", "Math", "History", "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", "12th", "10th")
)

df # Print the data frame


df <- head(df, -3) # Minus will exclude the rows. Removing last 3 rows.

df 
# Print the updated data frame

Output: the updated data frame contains only the first three rows (Krunal, Ankit, and Rushabh).

If you have a data frame with only one row and you try to remove that row, it will return an empty data frame without giving you an error.

If you operate head() on an empty data frame, it will still return an empty data frame.
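These edge cases can be checked directly. The sketch below uses a shortened version of the sample data; one_row and no_rows are illustrative names. It also shows one caveat: because -0 is 0, passing N = 0 returns no rows rather than all of them.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78)
)
one_row <- df[1, ]  # data frame with a single row
no_rows <- df[0, ]  # data frame with zero rows

nrow(head(one_row, -1))  # 0: the only row is removed
nrow(head(no_rows, -1))  # 0: nothing to remove
nrow(head(df, -10))      # 0: asking for more rows than exist is not an error

# Caveat: -0 is 0, so N = 0 returns no rows instead of all rows
N <- 0
nrow(head(df, -N))       # 0
```

If N can be zero in your code, handle that case separately before calling head().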

Method 3: Using dplyr::slice()

The dplyr package provides the slice() function for selecting rows by position. It accepts positive values (to include specific rows) or negative values (to exclude specific rows).

Install the dplyr package if you have not already done so, using install.packages("dplyr"), and then load it.

library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", "Mansi"),
  score = c(85, 90, 78, 92, 92, 88),
  subject = c("Math", "Math", "History", "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", "12th", "10th")
)

df 
# Print the data frame


df %>% slice(-nrow(.))

Output: the piped result contains the first five rows, without row 6 (Mansi).

Removing the last N rows

To remove the last N rows with dplyr, use nrow(.) inside the pipe to get the number of rows of the piped data frame. Then create a sequence from the N-th last row to the last row, negate it, and pass it to slice() to exclude those rows.

# Load dplyr package
library(dplyr)

# Sample DataFrame
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", "Mansi"),
  score = c(85, 90, 78, 92, 92, 88),
  subject = c("Math", "Math", "History", "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", "12th", "10th")
)

# Print the original DataFrame
df

# Specify the number of rows to remove
N <- 3

# Remove the last N rows
df <- df %>% slice(-((nrow(.) - N + 1):nrow(.)))

# Print the updated DataFrame
df

Output: the updated data frame contains only the first three rows (Krunal, Ankit, and Rushabh).

If your data frame is empty or has only a single row and you try to remove the last row, it will return an empty data frame without giving you an error.

That’s it!
