How to Remove Duplicates from a Vector in R

In R, you can remove duplicates from a vector with unique() or by subsetting with !duplicated(). Both approaches keep the first occurrence of each value.

Duplicate elements are values that appear more than once in a vector.

Duplicates can skew an analysis, for example by inflating counts, so removing them when each value should be counted only once leads to more reliable results.

Method 1: Using unique()

The unique() function is a one-step solution: it returns the vector with duplicate elements removed, keeping each value at the position of its first occurrence. It is implemented in compiled code in base R, so it stays fast on large vectors.

vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

unique_vec <- unique(vec)

unique_vec

# Output: [1] 11 21 19 18

In the input, 11 appears once, 21 appears twice, and 19 and 18 each appear three times. The output contains each value exactly once.
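The counts described above can be checked with table(), which tallies how often each value occurs. This is a minimal sketch; the names counts and repeated are only illustrative, and note that table() sorts its result by value rather than by input order.

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# Count how often each value occurs (the result is sorted by value)
counts <- table(vec)
counts

# Values that occur more than once
repeated <- as.numeric(names(counts[counts > 1]))
repeated
# Output: [1] 18 19 21
```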

Handling NA

If a vector contains multiple NA values, unique() treats them as duplicates of one another: it keeps the first NA and removes the rest.

vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

unique_vec_na <- unique(vec_na)

unique_vec_na

# Output: [1] 11 21 19 NA 18
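If the remaining NA is not wanted either, one option is to drop missing values with is.na() before calling unique(). This is a sketch of that variation rather than part of the method above; unique_no_na is an illustrative name.

```r
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

# Drop missing values first, then remove duplicates
unique_no_na <- unique(vec_na[!is.na(vec_na)])
unique_no_na
# Output: [1] 11 21 19 18
```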

Method 2: Subsetting with !duplicated()

The duplicated() function returns a logical vector of the same length as its input, with TRUE for every element that repeats an earlier element and FALSE otherwise.

The ! operator performs logical negation, so !duplicated(vec) is TRUE for the first occurrence of each value and FALSE for every later repeat.

Subsetting with vec[!duplicated(vec)] therefore keeps the first occurrence of each value and removes the duplicates.

vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

unique_vec <- vec[!duplicated(vec)]

unique_vec

# Output: [1] 11 21 19 18
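To see why the subsetting works, it can help to inspect the intermediate logical vectors step by step. The names is_dup and keep below are only illustrative.

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# TRUE marks an element that repeats an earlier one
is_dup <- duplicated(vec)
is_dup
# Output: [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

# Negation flips the mask: TRUE now marks first occurrences
keep <- !is_dup

# Positions of the elements that are kept
which(keep)
# Output: [1] 1 2 3 7

vec[keep]
# Output: [1] 11 21 19 18
```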

Handling NA

If a vector contains multiple NA values, vec[!duplicated(vec)] keeps the first NA and removes the rest, just as unique() does.

vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

unique_vec_na <- vec_na[!duplicated(vec_na)]

unique_vec_na

# Output: [1] 11 21 19 NA 18
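For a plain, unnamed vector like the one above, the two methods should return the same result, which can be confirmed with identical(). The second half of this sketch shows one reason to reach for duplicated(): its fromLast argument keeps the last occurrence instead of the first. The vector letters_vec is hypothetical example data, not taken from the examples above.

```r
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

# Both methods return the same result for this vector
identical(unique(vec_na), vec_na[!duplicated(vec_na)])
# Output: [1] TRUE

# fromLast = TRUE scans from the end, so the last occurrence is kept
letters_vec <- c("a", "b", "a", "c", "b")  # hypothetical example data
letters_vec[!duplicated(letters_vec, fromLast = TRUE)]
# Output: [1] "a" "c" "b"
```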

That’s all!
