How to Remove Duplicate Rows in R

Here are three ways to remove duplicate rows in R:

  1. Using !duplicated()
  2. Using unique()
  3. Using the dplyr package’s distinct()

Method 1: Using !duplicated()

The duplicated() function returns a logical vector that marks each row that repeats an earlier row. Negating it with ! and using the result as a row index keeps only the unique rows.

[Figure: using !duplicated() to remove duplicate rows from a data frame]

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Keep only the rows that are not repeats of an earlier row
df_unique <- df[!duplicated(df), ]

print(df_unique)

Output

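     name score subject grade
1  Krunal    85    Math  10th
2   Ankit    90    Math  11th
3 Rushabh    78 History  11th
4  Dhaval    92 History  10th
5   Tejas    88    Math  10th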

Note that this approach keeps the first occurrence of each duplicated row and removes the later copies.
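To see why the first occurrence survives, it can help to inspect the logical vector that duplicated() returns. The sketch below reuses the same df; the variable name dup_flags is only illustrative.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# duplicated() flags a row only if an identical row appeared earlier
dup_flags <- duplicated(df)
print(dup_flags)
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

# Row positions that df[!duplicated(df), ] would drop
print(which(dup_flags))
# [1] 6 7 8
```

Rows 3, 4, and 5 are not flagged because nothing identical precedes them, so they are the copies that remain.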

If you want to drop every row that has a duplicate, including its first occurrence, use the following code:

df_unique_all <- df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]
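The expression above combines two passes over the data: one scanning from the top and one scanning from the bottom. A minimal sketch that splits the steps apart, using illustrative names (from_start, from_end, has_twin) that are not part of the original code:

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Scanning from the top flags the later copies (rows 6, 7, 8)
from_start <- duplicated(df)

# Scanning from the bottom flags the earlier copies (rows 3, 4, 5)
from_end <- duplicated(df, fromLast = TRUE)

# A row has a twin if either scan flagged it
has_twin <- from_start | from_end

df_unique_all <- df[!has_twin, ]
print(df_unique_all)
#     name score subject grade
# 1 Krunal    85    Math  10th
# 2  Ankit    90    Math  11th
```

Only Krunal and Ankit remain, because they are the only rows that never appear twice.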

Method 2: Using unique()

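The unique() function returns a data frame with duplicate rows removed, keeping the first occurrence of each. For a data frame, it selects the same rows as df[!duplicated(df), ].

[Figure: using unique() to extract unique rows from a data frame in R]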

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

df_unique <- unique(df)

print(df_unique)

Output

     name score subject grade
1  Krunal    85    Math  10th
2   Ankit    90    Math  11th
3 Rushabh    78 History  11th
4  Dhaval    92 History  10th
5   Tejas    88    Math  10th
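For data frames, unique() is built on duplicated(), so the two base R methods should agree. The data frame method also accepts fromLast = TRUE, which keeps the last occurrence of each duplicated row instead of the first. A short sketch with the same df (df_last is an illustrative name):

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Same rows and row names as the !duplicated() approach
print(identical(unique(df), df[!duplicated(df), ]))
# [1] TRUE

# fromLast = TRUE keeps the last occurrence of each duplicated row
df_last <- unique(df, fromLast = TRUE)
print(rownames(df_last))
# [1] "1" "2" "6" "7" "8"
```

The row names show which original rows were kept: with fromLast = TRUE, rows 6, 7, and 8 survive in place of rows 3, 4, and 5.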

Method 3: Using the dplyr package’s distinct() function

distinct() is a function from the dplyr package that keeps only the unique rows of a data frame. If there are duplicate rows, only the first occurrence is preserved.

[Figure: using the dplyr package's distinct() function]

library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

df_unique <- df %>% distinct()

print(df_unique)

Output

     name score subject grade
1  Krunal    85    Math  10th
2   Ankit    90    Math  11th
3 Rushabh    78 History  11th
4  Dhaval    92 History  10th
5   Tejas    88    Math  10th

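To remove duplicate rows based on a single column (variable), pass the column name to distinct(). Setting .keep_all = TRUE keeps the other columns, taken from the first row for each distinct value.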

df %>% distinct(subject, .keep_all = TRUE)
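The effect of .keep_all is easiest to see by running the call with and without it. A short sketch, assuming dplyr is installed and using the same df; the result names subject_only and subject_rows are illustrative:

```r
library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Without .keep_all, only the column named in distinct() is returned
subject_only <- df %>% distinct(subject)
print(subject_only)
#   subject
# 1    Math
# 2 History

# With .keep_all = TRUE, the first row for each subject keeps all columns
subject_rows <- df %>% distinct(subject, .keep_all = TRUE)
print(subject_rows)
#      name score subject grade
# 1  Krunal    85    Math  10th
# 2 Rushabh    78 History  11th
```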

To remove duplicate rows based on multiple columns (variables), use the following code.

df %>% distinct(subject, name, .keep_all = TRUE)

This returns the first row for each unique combination of subject and name.
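If dplyr is not available, a similar result can be obtained in base R by calling duplicated() on just the columns of interest. This is an alternative sketch rather than part of dplyr; key_cols, df_by_keys, and df_by_subject are illustrative names.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Check for duplicates on a subset of columns, but keep every column
key_cols <- c("subject", "name")
df_by_keys <- df[!duplicated(df[, key_cols]), ]
print(nrow(df_by_keys))
# [1] 5

# With a single key column, duplicated() works on the vector directly
df_by_subject <- df[!duplicated(df$subject), ]
print(df_by_subject$name)
# [1] "Krunal"  "Rushabh"
```

As with distinct(), the first row for each key combination is the one that is kept.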
