# How to Remove Duplicate Rows in R

Here are three ways to remove duplicate rows in R:

1. Using !duplicated()
2. Using unique()
3. Using the dplyr package’s distinct()

## Method 1: Using !duplicated()

`duplicated()` returns a logical vector that marks every row repeating an earlier row. Negating it with `!` and using the result as a row index keeps only the unique rows.

```r
# Sample data: rows 6-8 repeat rows 3-5
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Keep only the rows that are not flagged as duplicates
df_unique <- df[!duplicated(df), ]

print(df_unique)
```

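Output: `df_unique` holds the five distinct rows (rows 1 to 5 of `df`); rows 6 to 8, which repeat rows 3 to 5, are dropped.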

Note that this approach keeps the first occurrence of each duplicated row and removes the later ones.
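
To see why the first occurrence survives, it can help to look at the flags themselves. The sketch below rebuilds the same sample data frame and prints what `duplicated()` returns; the names `dup_flags` and `df_keep_last` are only illustrative. It also shows the `fromLast = TRUE` argument, which scans from the bottom so that the last occurrence is kept instead.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# TRUE marks a row that repeats an earlier row:
# FALSE for rows 1-5, TRUE for rows 6-8
dup_flags <- duplicated(df)
print(dup_flags)

# Scanning from the bottom flags rows 3-5 instead,
# so rows 1, 2, 6, 7 and 8 are kept
df_keep_last <- df[!duplicated(df, fromLast = TRUE), ]
print(df_keep_last)
```

Both variants return the same five distinct records here; they differ only in which copy (and therefore which row name) is retained.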

If you want to drop every row that has a duplicate, including its first occurrence, combine a forward pass and a backward pass:

``df_unique_all <- df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]``
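
As a quick check of that expression, the sketch below applies it to the sample data. The intermediate name `has_duplicate` is only illustrative; it is TRUE for every row that appears more than once, whichever copy it is.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Forward pass flags rows 6-8, backward pass flags rows 3-5;
# combining them with | flags every copy of a repeated row
has_duplicate <- duplicated(df) | duplicated(df, fromLast = TRUE)

# Only rows that occur exactly once remain (Krunal and Ankit)
df_unique_all <- df[!has_duplicate, ]
print(df_unique_all)
```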

## Method 2: Using unique()

When called on a data frame, `unique()` returns it with the duplicate rows removed, keeping the first occurrence of each.

```r
# Sample data: rows 6-8 repeat rows 3-5
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Drop repeated rows, keeping the first occurrence of each
df_unique <- unique(df)

print(df_unique)
```

Output: the five distinct rows (rows 1 to 5 of `df`); the repeats in rows 6 to 8 are removed.
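
For a data frame like this one, `unique()` and the `!duplicated()` indexing from Method 1 should give the same result. The following sketch compares the two with `identical()`; the variable names `via_unique` and `via_duplicated` are only illustrative.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

via_unique <- unique(df)
via_duplicated <- df[!duplicated(df), ]

# Compare values, column types and row names in one step
same_result <- identical(via_unique, via_duplicated)
print(same_result)
```

Choosing between the two is mostly a matter of readability: `unique()` is shorter, while `duplicated()` exposes the flags when you need to inspect or combine them.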

## Method 3: Using the dplyr package’s distinct() function

The `distinct()` function from the dplyr package keeps only the unique (distinct) rows of a data frame. When rows are duplicated, only the first occurrence is preserved.

```r
library(dplyr)

# Sample data: rows 6-8 repeat rows 3-5
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# With no columns named, distinct() compares entire rows
df_unique <- df %>% distinct()

print(df_unique)
```

Output: the five distinct rows of `df`, with the three repeated rows removed.

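To remove duplicate rows based on a single column (variable), pass that column to `distinct()`. Setting `.keep_all = TRUE` retains the remaining columns, taken from the first row found for each value.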

``df %>% distinct(subject, .keep_all = TRUE)``
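
The effect of `.keep_all` is easiest to see side by side. This sketch runs the call with and without it on the sample data, assuming dplyr is installed; `by_subject` and `only_subject` are illustrative names.

```r
library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# One row per subject, all columns kept from the first matching row
# (Krunal for Math, Rushabh for History)
by_subject <- df %>% distinct(subject, .keep_all = TRUE)
print(by_subject)

# Without .keep_all, only the subject column is returned
only_subject <- df %>% distinct(subject)
print(only_subject)
```

Because the first matching row wins, the result depends on row order; sorting the data frame beforehand is one way to control which row is kept.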

To remove duplicate rows based on multiple columns (variables), use the following code.

``df %>% distinct(subject, name, .keep_all = TRUE)``

This returns one row for each unique combination of `subject` and `name`, keeping the first row found for each combination.
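
If dplyr is not available, a similar result can be approximated in base R by applying `duplicated()` to just the key columns. This is a sketch of an alternative, not a description of how `distinct()` works internally; `key_cols` and `df_by_key` are illustrative names.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Tejas", "Rushabh", "Dhaval", "Tejas"),
  score = c(85, 90, 78, 92, 88, 78, 92, 88),
  subject = c("Math", "Math", "History", "History", "Math", "History", "History", "Math"),
  grade = c("10th", "11th", "11th", "10th", "10th", "11th", "10th", "10th")
)

# Columns that define what counts as a duplicate
key_cols <- c("subject", "name")

# Flag repeats using only the key columns, then keep the unflagged rows
# with all of their original columns
df_by_key <- df[!duplicated(df[key_cols]), ]
print(df_by_key)
```

Unlike `distinct()`, this keeps the original row names and column order.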