How to Remove Duplicates from a Vector in R

In R, unique() and subsetting with !duplicated() are efficient ways to remove duplicates.

Figure: Removing duplicates from a vector using unique() in R

Duplicate elements are values that appear more than once in a vector.

Duplicates can skew data analysis and lead to inaccurate results, for example by inflating counts or biasing averages. Removing them leads to more reliable insights.

Method 1: Using unique()

The unique() function is a quick, one-step solution: it returns the vector with duplicate elements removed, keeping each value at the position of its first occurrence. It is implemented in compiled code, so it remains fast on large vectors.

vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

unique_vec <- unique(vec)

unique_vec

# Output: [1] 11 21 19 18

Element “11” appears once, “21” twice, and “19” and “18” three times each. The final output contains each element exactly once, in order of first appearance.
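If the last occurrence of each value should be kept instead of the first, unique() accepts a fromLast argument. The sketch below uses a hypothetical vector, vec_last, that is not part of the examples above; it is only meant to illustrate how the choice changes the order of the result.

```r
# Hypothetical example vector: 11 appears at the start and again at the end.
vec_last <- c(11, 21, 19, 11)

# Default behaviour: keep the first occurrence of each value.
first_kept <- unique(vec_last)

# fromLast = TRUE: duplicates are considered from the end,
# so the last occurrence of each value is kept.
last_kept <- unique(vec_last, fromLast = TRUE)

first_kept

# Output: [1] 11 21 19

last_kept

# Output: [1] 21 19 11
```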

Handling NA

If a vector contains multiple NA values, the unique() function treats them as duplicates of each other: it keeps the first NA and removes the rest.

vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

unique_vec_na <- unique(vec_na)

unique_vec_na

# Output: [1] 11 21 19 NA 18
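If the remaining NA is not wanted either, one possible approach is to filter out missing values with is.na() before calling unique(). This is a minimal sketch; the names no_na and unique_no_na are illustrative only.

```r
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

# Drop every NA first, then remove the duplicates that remain.
no_na <- vec_na[!is.na(vec_na)]

unique_no_na <- unique(no_na)

unique_no_na

# Output: [1] 11 21 19 18
```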

Method 2: Subsetting with !duplicated()

Figure: Removing duplicates from a vector by subsetting with !duplicated() in R

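The duplicated() function returns a logical vector of the same length as its input, with TRUE for every element that repeats an earlier one and FALSE for each first occurrence.

The ! operator performs logical negation. Negating the result with !duplicated() therefore marks the first occurrences, and that logical vector can be used to subset the original vector and obtain only the unique elements.

Using vec[!duplicated(vec)] keeps the first occurrence of each value and removes the later duplicates.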

vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

unique_vec <- vec[!duplicated(vec)]

unique_vec

# Output: [1] 11 21 19 18
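To see why the subsetting works, it can help to look at the intermediate logical vector. The sketch below prints the mask and the positions it flags, then shows one practical difference from unique(): subsetting keeps element names, while unique() drops them. The variables dup_mask, dup_positions, and the named vector scores are hypothetical and used only for illustration.

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# TRUE marks an element that repeats an earlier one.
dup_mask <- duplicated(vec)

dup_mask

# Output: [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

# Positions of the elements that would be removed.
dup_positions <- which(dup_mask)

dup_positions

# Output: [1] 4 5 6 8 9

# Hypothetical named vector: "a" and "c" hold the same value.
scores <- c(a = 11, b = 21, c = 11)

# unique() returns the values without names.
names_after_unique <- names(unique(scores))

# Subsetting with !duplicated() keeps the names of the retained elements.
names_after_subset <- names(scores[!duplicated(scores)])

names_after_subset

# Output: [1] "a" "b"
```

The subsetting form is therefore worth considering when the names, or the positions of the removed elements, matter for later steps.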

Handling NA

If a vector contains multiple NA values, duplicated() flags every NA after the first as a duplicate, so vec[!duplicated(vec)] keeps the first NA and removes the rest.

vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

unique_vec_na <- vec_na[!duplicated(vec_na)]

unique_vec_na

# Output: [1] 11 21 19 NA 18
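As a quick check, the logical vector returned by duplicated() can be inspected for the NA positions, and the two methods can be compared with identical(). This is a small sketch; na_mask and same_result are illustrative names.

```r
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

# The first NA (position 5) is FALSE; the second NA (position 9) is TRUE.
na_mask <- duplicated(vec_na)

na_mask

# Output: [1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE

# Both methods return the same vector for this input.
same_result <- identical(unique(vec_na), vec_na[!duplicated(vec_na)])

same_result

# Output: [1] TRUE
```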

That’s all!
