
How to Remove Duplicates from a Vector in R

In R, unique() and subsetting with !duplicated() are efficient ways to remove duplicates.

Duplicates are values that appear more than once in a vector.

Unintended duplicates can skew an analysis and lead to inaccurate results, so removing them makes the results more reliable.
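Before removing anything, you may want to check whether a vector contains duplicates at all. A minimal sketch using base R's anyDuplicated(), a function this post does not otherwise cover:

```r
# anyDuplicated() returns the index of the first element that repeats an
# earlier value, or 0 if the vector has no duplicates
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

anyDuplicated(vec)
# [1] 4

anyDuplicated(c(11, 21, 19))
# [1] 0
```

Because 0 means "no duplicates", the result can be used directly in a check such as `if (anyDuplicated(vec) > 0)`.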

Method 1: Using unique()

The unique() function is a quick, one-step solution: it returns the vector with duplicate elements removed, keeping each value in the order of its first occurrence. It is implemented in C and uses hashing internally, so it handles large vectors efficiently.
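To illustrate the first-occurrence ordering, here is a small sketch with a character vector (the name letters_vec is only for illustration). Note that unique() does not sort its result; sorting is a separate step:

```r
# unique() keeps values in the order they first appear; it does not sort
letters_vec <- c("b", "a", "b", "c", "a")

unique(letters_vec)
# [1] "b" "a" "c"

# Wrap the result in sort() if you want the distinct values in ascending order
sort(unique(letters_vec))
# [1] "a" "b" "c"
```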

vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

unique_vec <- unique(vec)

unique_vec

# Output: [1] 11 21 19 18

The element 11 appears once, 21 twice, and 19 and 18 three times each. The output keeps exactly one copy of each value, in the order in which the values first appear.
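One way to confirm these counts is table(), which tallies each distinct value. A short sketch (counts is an illustrative name); note that table() orders its entries by sorted value, not by first appearance:

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# table() counts how often each distinct value occurs (names come out sorted)
counts <- table(vec)
counts
# vec
# 11 18 19 21
#  1  3  3  2

# Reorder the counts to match unique()'s first-occurrence order
counts[as.character(unique(vec))]
```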

Handling NA

If a vector contains multiple NA values, unique() keeps only the first NA (at the position where it first appears) and removes the rest.

vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

unique_vec_na <- unique(vec_na)

unique_vec_na

# Output: [1] 11 21 19 NA 18
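Two related cases, sketched below: removing NA entirely, and how NaN is treated. Keep in mind that is.na() is also TRUE for NaN, so the filter shown drops both; unique() itself, however, treats NA and NaN as distinct values and keeps one of each.

```r
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

# To drop missing values entirely, filter them out before calling unique()
# (!is.na() removes both NA and NaN)
unique(vec_na[!is.na(vec_na)])
# [1] 11 21 19 18

# NA and NaN are considered different values, so one of each survives
unique(c(1, NA, NaN, NA, NaN))
# [1]   1  NA NaN
```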

Method 2: Subsetting with !duplicated()

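The duplicated() function returns a logical vector of the same length as its input: TRUE for every element that repeats a value seen earlier in the vector, and FALSE otherwise. The first occurrence of each value is therefore FALSE.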

The ! operator performs logical negation, so !duplicated() is TRUE exactly for the first occurrence of each value. I can therefore use it to subset the original vector and keep only the unique elements.
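Printing both logical vectors for the example vector makes the mask explicit:

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# TRUE marks an element that repeats a value seen earlier in the vector
duplicated(vec)
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

# ! flips the mask: TRUE now marks the first occurrence of each value
!duplicated(vec)
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
```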

vec[!duplicated(vec)] keeps the first occurrence of each value and drops every later repeat.

vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

unique_vec <- vec[!duplicated(vec)]

unique_vec

# Output: [1] 11 21 19 18
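Because duplicated() returns a mask rather than the values themselves, it also tells you which positions were kept. It accepts a fromLast argument (unique() has one too) that keeps the last occurrence of each value instead of the first. A small sketch:

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# Positions of the kept elements (first occurrence of each value)
which(!duplicated(vec))
# [1] 1 2 3 7

# fromLast = TRUE scans from the end, so the last occurrence is kept instead
which(!duplicated(vec, fromLast = TRUE))
# [1] 1 5 6 9

vec[!duplicated(vec, fromLast = TRUE)]
# [1] 11 21 19 18
```

For this particular vector the kept values are the same either way; only the positions they are taken from differ.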

Handling NA

As with unique(), if a vector contains multiple NA values, vec[!duplicated(vec)] keeps only the first NA and removes the rest.

vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

unique_vec_na <- vec_na[!duplicated(vec_na)]

unique_vec_na

# Output: [1] 11 21 19 NA 18
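To see why the second NA is dropped, and to check that both methods agree on this vector, a quick sketch:

```r
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

# The NA at position 9 is flagged as a duplicate of the NA at position 5
duplicated(vec_na)
# [1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE

# Both methods return the same vector
identical(unique(vec_na), vec_na[!duplicated(vec_na)])
# [1] TRUE
```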

That’s all!
