What is the duplicated() Function in R

The duplicated() function in R checks which elements of a vector, or which rows of a data frame, repeat an earlier element or row. It returns a logical vector in which TRUE marks each duplicate; the first occurrence of a value is not marked.

Syntax

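duplicated(x, incomparables = FALSE, fromLast = FALSE, ...)

Parameters

x: A vector, data frame, or array.
incomparables: A vector of values that cannot be compared, or FALSE (the default), which means all values can be compared.
fromLast: A logical value. If TRUE, duplication is checked starting from the last element (or row), so the last occurrence of a value is the one not marked as a duplicate. The default is FALSE.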

Return value

The duplicated() function returns a logical vector. For a vector input, the result has the same length as the input; for a data frame, it has one element per row.
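As a quick check of the return value, the sketch below compares the length of the result with the size of the input. The objects x and df_demo are made-up sample data, not part of the examples that follow.

```r
# Hypothetical sample inputs, chosen only for illustration
x <- c("a", "b", "a", "c")
df_demo <- data.frame(id = c(1, 2, 1), label = c("p", "q", "p"))

vec_flags <- duplicated(x)        # one flag per element of the vector
row_flags <- duplicated(df_demo)  # one flag per row of the data frame

length(vec_flags) == length(x)      # TRUE
length(row_flags) == nrow(df_demo)  # TRUE
```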

Example 1: Apply duplicated() Function to Vector Object

Applied to a vector, duplicated() returns a logical vector: TRUE for each element that repeats an earlier one, and FALSE otherwise.

data <- c(11, 19, 11, 19, 46, 21)
duplicated(data)

Output

[1] FALSE FALSE  TRUE  TRUE FALSE FALSE

You can use the duplicated() function to extract the duplicate elements of a vector.

data <- c(11, 19, 11, 19, 46, 21)
data[duplicated(data)]

Output

[1] 11 19

Indexing the vector with the result of duplicated() extracts the repeated values. Only the second and later occurrences are returned, not the first occurrence of each value.
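If you need every element whose value occurs more than once, including the first occurrence, one possible approach is to combine a forward and a backward pass using the fromLast argument. This is a minimal sketch; the variable names later, earlier, and all_dups are illustrative.

```r
data <- c(11, 19, 11, 19, 46, 21)

# Default scan flags the later repeats; fromLast = TRUE flags the earlier ones
later   <- duplicated(data)
earlier <- duplicated(data, fromLast = TRUE)

# An element is part of a duplicated group if either scan flags it
all_dups <- later | earlier

data[all_dups]  # 11 19 11 19
which(later)    # positions of the later repeats: 3 4
```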

Example 1.1: Use duplicated() with the ! operator to remove duplicate elements from a vector

To remove duplicated elements, use “!duplicated()”, where ! is the logical negation operator. The expression !duplicated(data) is TRUE for the first occurrence of each value, so indexing with it keeps only those elements.

data <- c(11, 19, 11, 19, 46, 21)
data[!duplicated(data)]

Output

[1] 11 19 46 21

Here, we are extracting the unique values from the vector.
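For a plain vector, base R's unique() function should give the same result as indexing with !duplicated(). The short comparison below is a sketch using the same sample vector; via_duplicated and via_unique are illustrative names.

```r
data <- c(11, 19, 11, 19, 46, 21)

via_duplicated <- data[!duplicated(data)]  # keep first occurrences only
via_unique     <- unique(data)             # base R shortcut for the same task

identical(via_duplicated, via_unique)  # TRUE
```

The !duplicated() form is still useful when you need the logical mask itself, for example to filter a second vector of the same length.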

Example 2: Apply duplicated() Function to Data Frame

For a data frame, the “duplicated()” function compares whole rows and returns a logical vector with one value per row, where TRUE marks a row that repeats an earlier row.

df <- data.frame(
  Shares = c("TCS", "Reliance", "TCS", "HUL", "Reliance"),
  Price = c(3200, 1900, 3200, 2200, 1900)
)

duplicated(df)

Output

[1] FALSE FALSE  TRUE FALSE  TRUE
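Because the result is a logical vector, it can also be used to count or inspect the duplicate rows. The following is a small sketch on the same data frame; dup_rows, n_dups, and dup_df are illustrative names.

```r
df <- data.frame(
  Shares = c("TCS", "Reliance", "TCS", "HUL", "Reliance"),
  Price = c(3200, 1900, 3200, 2200, 1900)
)

dup_rows <- duplicated(df)  # TRUE for rows 3 and 5

n_dups <- sum(dup_rows)     # TRUE counts as 1, so this is the number of duplicate rows
dup_df <- df[dup_rows, ]    # the duplicate rows themselves

n_dups  # 2
dup_df
```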

Example 2.1: Use duplicated() with the ! operator to remove duplicate rows from a data frame

You can remove duplicate rows from the data frame using the “!duplicated()” expression.

df <- data.frame(Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "Reliance"),
                 Price = c(3200, 1900, 1500, 2200, 1900))

# Print the original data frame
df

cat("After Removing Duplicates", "\n")
# Keep only the first occurrence of each row (all columns are compared)
df[!duplicated(df), ]

Output

     Shares Price
1       TCS  3200
2  Reliance  1900
3 HDFC Bank  1500
4       HUL  2200
5  Reliance  1900
After Removing Duplicates
     Shares Price
1       TCS  3200
2  Reliance  1900
3 HDFC Bank  1500
4       HUL  2200

The second occurrence of the Reliance row (row 5) is dropped from the result, while its first occurrence (row 2) is kept.
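Note that duplicated(df) compares entire rows, whereas duplicated(df$Price) compares only the Price column, and the two can give different results. The sketch below uses a made-up data frame (prices, with a hypothetical "Infosys" entry) in which two different shares happen to have the same price.

```r
# Hypothetical data: two different shares share the same price
prices <- data.frame(
  Shares = c("TCS", "Reliance", "Infosys"),
  Price = c(3200, 1900, 1900)
)

# Whole-row comparison: no row repeats, so nothing is dropped
by_row <- prices[!duplicated(prices), ]

# Single-column comparison: the Infosys row is dropped because 1900 already appeared
by_price <- prices[!duplicated(prices$Price), ]

nrow(by_row)    # 3
nrow(by_price)  # 2
```

Choose the whole-row form when you want to drop exact repeats, and the column form only when a repeated value in that column alone should count as a duplicate.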
