When working with large datasets, we often run into an obstacle: repeated values. Repeated values are not a serious problem as long as we can identify them. Once we know which values are repeated, it is easy to remove or extract them. R provides a built-in function that does the job for us. Let’s look at that function in detail.
duplicated() in R
duplicated() is a built-in R function that determines which elements of a vector, or rows of a data frame, are duplicates of elements with smaller subscripts. It returns a logical vector in which TRUE marks each element (or row) that repeats an earlier one.
duplicated(data, incomparables = FALSE, fromLast = FALSE, nmax = NA, …)
data: It is a vector, a data frame, or an array.
incomparables: It is a vector of values that cannot be compared. FALSE is a special value, meaning that all values can be compared, and it may be the only value accepted for methods other than the default. It will be coerced internally to the same type as data.
fromLast: It is a logical value indicating whether duplication should be considered from the reverse side. With fromLast = TRUE, the last (rightmost) of a set of identical elements is the one reported as FALSE, and the earlier copies are marked TRUE.
nmax: It is the maximum number of unique items expected (greater than one).
…: They are the arguments for specific methods.
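MARGIN: It applies only to the matrix and array methods, which is why it does not appear in the syntax above. It is the array margin to be held fixed, as in apply(): MARGIN = 1 (the default) compares rows, MARGIN = 2 compares columns, and MARGIN = 0 compares individual elements.
For a vector, the duplicated() function returns a logical vector of the same length as the input. For a data frame, it returns a logical vector with one element for each row. For a matrix or array with MARGIN = 0, it returns a logical array with the same dimensions and dimnames as the input.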
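As a rough illustration of fromLast, the sketch below takes the sample vector used later in this tutorial and compares the two scan directions. The variable names first_kept and last_kept are only illustrative, and which() is used here just to show the positions that survive.

```r
data <- c(11, 19, 11, 19, 46, 21)

# Default: the first occurrence is reported as FALSE, later copies as TRUE
first_kept <- duplicated(data)

# fromLast = TRUE: the last occurrence is reported as FALSE, earlier copies as TRUE
last_kept <- duplicated(data, fromLast = TRUE)

print(first_kept)          # FALSE FALSE TRUE TRUE FALSE FALSE
print(last_kept)           # TRUE TRUE FALSE FALSE FALSE FALSE

# Positions of the elements that are kept in each case
print(which(!first_kept))  # 1 2 5 6
print(which(!last_kept))   # 3 4 5 6
```

The kept values are the same either way (11, 19, 46, 21); what changes is which copy of each repeated value is retained, which matters when the order of the result or the other columns of a row are important.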
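To make the different return shapes concrete, here is a small sketch using a made-up 3 x 2 matrix m (it is not part of the original examples). It shows what duplicated() gives back when rows, columns, or individual elements are compared.

```r
# Hypothetical matrix: rows are (1, 2), (1, 2), (3, 4)
m <- matrix(c(1, 2,
              1, 2,
              3, 4), nrow = 3, byrow = TRUE)

# Default (MARGIN = 1): one logical value per row
row_dups <- duplicated(m)
print(row_dups)               # FALSE TRUE FALSE

# MARGIN = 2: one logical value per column
col_dups <- duplicated(m, MARGIN = 2)
print(col_dups)               # FALSE FALSE

# MARGIN = 0: every element is compared individually (in column-major
# order) and the result keeps the 3 x 2 shape of the input
elem_dups <- duplicated(m, MARGIN = 0)
print(dim(elem_dups))         # 3 2
print(elem_dups)
```

With MARGIN = 0, the second row of elem_dups is TRUE in both columns, because the 1 and the 2 in that row repeat the values directly above them.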
Values in incomparables will never be marked as duplicated. This is intended to be used for a fairly small set of values and will not be efficient for an extensive set.
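A short sketch of how incomparables behaves on vectors is shown below. The scores vector is a hypothetical example added for illustration; a common use is to stop missing values (NA) from being treated as duplicates of each other.

```r
data <- c(11, 19, 11, 19, 46, 21)

# Without incomparables, both repeated values are flagged
plain <- duplicated(data)
print(plain)        # FALSE FALSE TRUE TRUE FALSE FALSE

# With incomparables = 11, the value 11 is never marked as duplicated
skip_11 <- duplicated(data, incomparables = 11)
print(skip_11)      # FALSE FALSE FALSE TRUE FALSE FALSE

# Hypothetical vector with missing values
scores <- c(NA, 5, NA, 5)

# By default, the second NA counts as a duplicate of the first
na_default <- duplicated(scores)
print(na_default)   # FALSE FALSE TRUE TRUE

# Treat NA as incomparable so that it is never flagged
na_skipped <- duplicated(scores, incomparables = NA)
print(na_skipped)   # FALSE FALSE FALSE TRUE
```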
Find the duplicate elements in an R vector
To find the duplicate elements in a vector, use the duplicated() function as a logical index into that vector.
data <- c(11, 19, 11, 19, 46, 21)
data[duplicated(data)]
[1] 11 19
Here we combine vector indexing with the duplicated() function to extract the duplicated values from the vector.
If you want to remove the duplicated elements instead, use !duplicated(), where ! is logical negation. It selects only the elements that are not duplicates of an earlier element.
data <- c(11, 19, 11, 19, 46, 21)
data[!duplicated(data)]
[1] 11 19 46 21
Here, we are extracting the unique values from the vector.
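For a plain vector, base R's unique() function gives the same result as indexing with !duplicated(). The quick check below is only a sketch to show the equivalence; the names via_index and via_unique are illustrative.

```r
data <- c(11, 19, 11, 19, 46, 21)

# Keep the first occurrence of every value by negating duplicated()
via_index <- data[!duplicated(data)]

# unique() returns the same values in the same order
via_unique <- unique(data)

print(via_index)                        # 11 19 46 21
print(identical(via_index, via_unique)) # TRUE
```

The duplicated() form is still useful when we need the logical vector itself, for example to filter a second object by the same positions.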
On its own, the duplicated() function returns a plain logical vector in which TRUE marks each element that repeats an earlier one.
data <- c(11, 19, 11, 19, 46, 21)
duplicated(data)
[1] FALSE FALSE  TRUE  TRUE FALSE FALSE
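Because the result is a logical vector, it can be summarised directly. The sketch below counts the duplicates and uses base R's anyDuplicated(), which returns the index of the first duplicated element (or 0 if there is none); the variable names are illustrative.

```r
data <- c(11, 19, 11, 19, 46, 21)

# TRUE counts as 1, so sum() gives the number of duplicated elements
n_dups <- sum(duplicated(data))
print(n_dups)        # 2

# Index of the first element that repeats an earlier one (0 if none)
first_dup <- anyDuplicated(data)
print(first_dup)     # 3

# A vector without repeats gives 0
print(anyDuplicated(c(46, 21)))  # 0
```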
Find the duplicate elements in the R data frame.
To remove duplicates from a data frame in R, call duplicated() on the column that defines a duplicate (here df$Price), negate the result with !, and use it as the row index. This keeps the first row for each distinct value of that column.
df <- data.frame(Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "Reliance"),
                 Price = c(3200, 1900, 1500, 2200, 1900))
df
cat("After Removing Duplicates", "\n")
df[!duplicated(df$Price),]
     Shares Price
1       TCS  3200
2  Reliance  1900
3 HDFC Bank  1500
4       HUL  2200
5  Reliance  1900
After Removing Duplicates 
     Shares Price
1       TCS  3200
2  Reliance  1900
3 HDFC Bank  1500
4       HUL  2200
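Row 5 is dropped entirely because its Price value (1900) already appears in row 2. Note that only the Price column is compared here, so any later row with a repeated price would be removed even if its Shares value were different.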
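Comparing a single column and comparing whole rows can give different results. The sketch below extends the data frame with one hypothetical row ("Infosys", 1500), which is not in the original example, to show the difference, and also shows one way to list every row involved in a repeat.

```r
# Same data as above plus one hypothetical row: Infosys at 1500
df <- data.frame(
  Shares = c("TCS", "Reliance", "HDFC Bank", "HUL", "Reliance", "Infosys"),
  Price  = c(3200, 1900, 1500, 2200, 1900, 1500)
)

# Compare the Price column only: rows 5 and 6 repeat an earlier price
by_price <- df[!duplicated(df$Price), ]
print(nrow(by_price))   # 4

# Compare whole rows: only row 5 repeats an earlier row exactly
by_row <- df[!duplicated(df), ]
print(nrow(by_row))     # 5

# Every row whose price occurs more than once (first copies included):
# combine a forward scan and a backward scan
repeated <- duplicated(df$Price) | duplicated(df$Price, fromLast = TRUE)
all_repeats <- df[repeated, ]
print(all_repeats)      # rows 2, 3, 5 and 6
```

Passing the whole data frame to duplicated() is usually the safer choice when a row should count as a duplicate only if all of its columns match.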
That is it for this tutorial on the duplicated() function in R.
Krunal Lathiya is an Information Technology Engineer by education and a web developer by profession. He has worked with many back-end platforms, including Node.js, PHP, and Python. In addition, Krunal has excellent knowledge of Data Science and Machine Learning, and he is an expert in the R language. Krunal has written many programming blogs, which showcase his expertise in this field.