duplicated() Function in R with Example

When working with large data sets, we often run into an obstacle: repeated values. Repeated values are not a serious problem as long as we can identify them. Once we know which values are repeated, it is easy to remove or extract them. R provides a built-in function that does this job for us. Let's look at that function in detail.

duplicated() in R

duplicated() is a built-in R function that determines which elements of a vector (or rows of a data frame) are duplicates of elements with smaller subscripts. It returns a logical vector indicating which elements (rows) are duplicates.

Syntax

duplicated(x, incomparables = FALSE, fromLast = FALSE, nmax = NA, …)

Parameters

x: It is the input data: a vector, a data frame, an array, or NULL.

incomparables: It is a vector of values that cannot be compared. FALSE is a special value, meaning that all values can be compared, and it may be the only value accepted for methods other than the default. It will be coerced internally to the same type as the input data.

fromLast: It is a logical argument that indicates whether duplication should be considered from the reverse side. With fromLast = TRUE, the last (or rightmost) of a set of identical elements is the one that gets FALSE.

nmax: It is the maximum number of unique items expected (greater than one).

…: They are the arguments for specific methods.

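MARGIN: It is the array margin to be held fixed, as in apply(). It is used by the matrix and array methods, where the default MARGIN = 1 compares rows; note that MARGIN = 0 may be useful because it compares individual elements.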
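To make the effect of fromLast concrete, here is a minimal sketch that runs duplicated() in both directions on an illustrative vector. The variable names (x, from_first, from_last, kept_last) are only for this example.

```r
# Illustrative vector; the values are made up for this sketch.
x <- c(11, 19, 11, 19, 46, 21)

# Default: scan left to right, so the later copies are flagged.
from_first <- duplicated(x)
print(from_first)
# [1] FALSE FALSE  TRUE  TRUE FALSE FALSE

# fromLast = TRUE: scan right to left, so the earlier copies are flagged.
from_last <- duplicated(x, fromLast = TRUE)
print(from_last)
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

# Keep the last occurrence of each value instead of the first.
kept_last <- x[!from_last]
print(kept_last)
# [1] 11 19 46 21
```

Both directions flag the same number of elements; only the copy that counts as the "original" changes.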

Return Value

duplicated() returns a logical vector of the same length as the input if the input is a vector. For a data frame, it returns a logical vector with one element for each row. For a matrix or array with MARGIN = 0, it returns a logical array with the same dimensions and dimnames.
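The three return shapes can be checked with a small sketch. The matrix m and the data frame df_m below are hypothetical examples, not data from this tutorial.

```r
# A 3 x 2 matrix whose first two rows are identical (illustrative data).
m <- matrix(c(1, 2,
              1, 2,
              3, 4), nrow = 3, byrow = TRUE)

# Default MARGIN = 1: one logical value per row.
row_dups <- duplicated(m)
print(row_dups)
# [1] FALSE  TRUE FALSE

# MARGIN = 0: one logical value per element, same dimensions as m.
elem_dups <- duplicated(m, MARGIN = 0)
print(dim(elem_dups))
# [1] 3 2

# Data frame: one logical value per row.
df_m <- as.data.frame(m)
df_dups <- duplicated(df_m)
print(df_dups)
# [1] FALSE  TRUE FALSE
```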

Missing values (NA) are regarded as equal to each other. For numeric and complex vectors, NA is treated as different from NaN. Character strings are compared in a "common encoding".

Values in incomparables will never be marked as duplicated. This is intended to be used for a fairly small set of values and will not be efficient for an extensive set.
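The NA and incomparables rules above can be illustrated with a short sketch. The vectors v and w are made up for this example.

```r
# NA matches NA and NaN matches NaN, but NA and NaN differ from each other.
v <- c(NA, NaN, NA, NaN, 1, 1)
na_dups <- duplicated(v)
print(na_dups)
# [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE

# Values listed in incomparables are never flagged as duplicates.
w <- c(1, 1, 2, 2)
inc_dups <- duplicated(w, incomparables = 1)
print(inc_dups)
# [1] FALSE FALSE FALSE  TRUE
```

In the second call, the repeated 1 is left unflagged because 1 is listed as incomparable, while the repeated 2 is still reported.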

Find the duplicate elements in an R vector

To find the duplicate elements in a vector, use the duplicated() function in R.

data <- c(11, 19, 11, 19, 46, 21)
data[duplicated(data)]

Output

[1] 11 19

Here we combine vector indexing with duplicated() to extract the duplicate values from the vector.
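One detail worth keeping in mind: a value that occurs more than twice shows up more than once in the extracted result, because every copy after the first is flagged. A minimal sketch, using a hypothetical vector data3:

```r
# 11 appears three times, so two of its copies are flagged.
data3 <- c(11, 11, 11, 19)

all_dups <- data3[duplicated(data3)]
print(all_dups)
# [1] 11 11

# Wrap the result in unique() to list each repeated value once.
dup_values <- unique(all_dups)
print(dup_values)
# [1] 11
```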

If you want to remove duplicated elements, use !duplicated(), where ! is logical negation. !duplicated(data) is TRUE for every element that is not a duplicate of an earlier one, so indexing with it keeps only the first occurrence of each value.

data <- c(11, 19, 11, 19, 46, 21)
data[!duplicated(data)]

Output

[1] 11 19 46 21

Here, we are extracting the unique values from the vector.

Called on its own, the duplicated() function returns a plain vector of logical values.

data <- c(11, 19, 11, 19, 46, 21)
duplicated(data)

Output

[1] FALSE FALSE  TRUE  TRUE FALSE FALSE
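Because the result is an ordinary logical vector, it can be passed to other base R functions. The sketch below shows a few options; the variable names are illustrative, and anyDuplicated() is a related base R function that returns the index of the first duplicate (or 0 if there is none).

```r
data <- c(11, 19, 11, 19, 46, 21)

# Positions of the duplicated elements.
dup_positions <- which(duplicated(data))
print(dup_positions)
# [1] 3 4

# Number of duplicated elements (TRUE counts as 1).
dup_count <- sum(duplicated(data))
print(dup_count)
# [1] 2

# Index of the first duplicate, or 0 when there are none.
first_dup <- anyDuplicated(data)
print(first_dup)
# [1] 3
```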

Remove duplicate rows from an R data frame

To remove duplicates from a data frame in R, pass the column you want to check (here df$Price) to duplicated() and negate the result with !. Indexing the rows of the data frame with that logical vector keeps only the first row for each distinct value in the column.

df <- data.frame(Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "Reliance"),
                 Price = c(3200, 1900, 1500, 2200, 1900))

df

cat("After Removing Duplicates", "\n")
df[!duplicated(df$Price),]

Output

     Shares Price
1       TCS  3200
2  Reliance  1900
3 HDFC Bank  1500
4       HUL  2200
5  Reliance  1900

After Removing Duplicates

     Shares Price
1       TCS  3200
2  Reliance  1900
3 HDFC Bank  1500
4       HUL  2200
Row 5 is removed from the data frame because its Price (1900) already appears in row 2; only the first row for each price is kept.
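Checking a single column can drop rows that are not true duplicates. The following sketch contrasts column-based and whole-row checks; the data frame df2 and its "Infosys" row are hypothetical, added only to show the difference.

```r
# Two different shares happen to have the same price (illustrative data).
df2 <- data.frame(Shares = c("TCS", "Reliance", "HDFC Bank", "Infosys", "Reliance"),
                  Price = c(3200, 1900, 1500, 1500, 1900))

# Column-based check: Infosys is dropped because 1500 was already seen.
by_price <- df2[!duplicated(df2$Price), ]
print(nrow(by_price))
# [1] 3

# Whole-row check: only the exact repeat of the Reliance row is dropped.
by_row <- df2[!duplicated(df2), ]
print(nrow(by_row))
# [1] 4

# Check a subset of columns by passing a smaller data frame.
by_both <- df2[!duplicated(df2[, c("Shares", "Price")]), ]
print(nrow(by_both))
# [1] 4
```

Passing the whole data frame (or a subset of its columns) to duplicated() is usually the safer choice when a row should count as a duplicate only if several fields match.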

That is it for this tutorial on the duplicated() function in R.
