How to Remove Duplicates from a Vector in R

Duplicate elements in a vector are values that appear more than once. Duplicates can skew a data analysis and lead to inaccurate results, so removing them gives more reliable insights.

In R, unique() and subsetting with !duplicated() are efficient ways to remove duplicates.

Method 1: Using unique()

The unique() function is a quick, one-step solution: it returns the vector with duplicate elements removed, keeping each value at the position of its first occurrence. It is implemented in compiled code and handles large vectors efficiently.

vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

unique_vec <- unique(vec)

unique_vec

# Output: [1] 11 21 19 18

The element “11” appears once, “21” twice, and “19” and “18” three times each. The final output contains each value exactly once.

Handling NA

If a vector contains multiple NA values, unique() keeps the first NA and removes the rest.

vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

unique_vec_na <- unique(vec_na)

unique_vec_na

# Output: [1] 11 21 19 NA 18

Pros

  1. Minimal code required.
  2. The function name clearly conveys the intent: return the unique values.

Cons

  1. It returns only the values, so combining deduplication with other filtering conditions requires additional steps.
  2. It does not return metadata, such as which elements were duplicates or how many there were.
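If you need that metadata, base R can supply it in a couple of extra lines. The following is a minimal sketch, not part of unique() itself; the names n_removed and counts are only illustrative.

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# How many elements unique() would drop
n_removed <- length(vec) - length(unique(vec))
n_removed
# [1] 5

# How often each distinct value occurs (table() sorts by value)
counts <- table(vec)
counts
# 11 -> 1, 18 -> 3, 19 -> 3, 21 -> 2
```

Here table() answers "how many times does each value appear", while the length difference answers "how many elements were removed".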

Method 2: Subsetting with !duplicated()

The duplicated() function returns a logical vector that marks each element repeating an earlier one as TRUE. The ! operator negates that vector, so !duplicated() is TRUE for every first occurrence, and you can use it to subset the original vector and keep only the unique elements.

vec[!duplicated(vec)] keeps the first occurrence of each value and drops the later repeats.

vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

unique_vec <- vec[!duplicated(vec)]

unique_vec

# Output: [1] 11 21 19 18
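To see why the subset works, it can help to print the intermediate logical vector. This is a small sketch that reuses the same vec; the name is_dup is only illustrative.

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# TRUE marks an element that repeats an earlier one
is_dup <- duplicated(vec)
is_dup
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

# Positions of the repeated elements
dup_positions <- which(is_dup)
dup_positions
# [1] 4 5 6 8 9

# Negating the mask selects the first occurrences only
first_only <- vec[!is_dup]
first_only
# [1] 11 21 19 18
```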

Handling NA

If a vector contains multiple NA values, vec[!duplicated(vec)] keeps the first NA and removes the rest, just as unique() does.

vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

unique_vec_na <- vec_na[!duplicated(vec_na)]

unique_vec_na

# Output: [1] 11 21 19 NA 18

Pros

  1. Subsetting is flexible: you can combine the logical vector with other conditions for more advanced filtering.
  2. duplicated() returns a logical vector, which you can use as metadata when debugging.
  3. It works well with data frames and matrices.
  4. You can keep the last occurrence of each value instead of the first by passing fromLast = TRUE.
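The points above can be sketched briefly. The vector vec2 and the data frame df below are made-up illustrations, chosen so that the first-occurrence and last-occurrence results differ visibly.

```r
vec2 <- c(11, 21, 11, 18)

# Default: keep the first occurrence of each value
keep_first <- vec2[!duplicated(vec2)]
keep_first
# [1] 11 21 18

# fromLast = TRUE: keep the last occurrence instead
keep_last <- vec2[!duplicated(vec2, fromLast = TRUE)]
keep_last
# [1] 21 11 18

# Combine with another condition: unique values greater than 18
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)
unique_above_18 <- vec[!duplicated(vec) & vec > 18]
unique_above_18
# [1] 21 19

# Data frames: duplicated() compares whole rows
df <- data.frame(id = c(1, 2, 2, 3), score = c(90, 85, 85, 70))
df_unique <- df[!duplicated(df), ]
nrow(df_unique)
# [1] 3
```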

Cons

  1. It is more verbose than unique().
  2. It builds an intermediate logical vector and then subsets, so it may use more time and memory than unique() when the vector is large.
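If performance matters for your data, it is worth measuring rather than assuming. The sketch below uses a hypothetical one-million-element vector (big_vec) to confirm that both methods return the same result and to time them. Timings depend on the machine and R version, so no expected numbers are shown.

```r
set.seed(42)
big_vec <- sample(1:1000, size = 1e6, replace = TRUE)

via_unique <- unique(big_vec)
via_subset <- big_vec[!duplicated(big_vec)]

# Both approaches keep first occurrences in the same order
same_result <- identical(via_unique, via_subset)
same_result
# [1] TRUE

# Compare elapsed time on your own machine
system.time(unique(big_vec))
system.time(big_vec[!duplicated(big_vec)])
```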

That’s all!
