In R, unique() and subsetting with !duplicated() are two efficient ways to remove duplicates from a vector.
Duplicate elements in a vector are values that appear more than once.
Duplicates can skew an analysis and lead to inaccurate results, so removing them gives more reliable insights.
The unique() function is a quick, one-step solution: it returns the vector with duplicate elements removed, keeping each value at the position of its first occurrence. It is implemented in R's compiled internals, so it also performs well on large vectors.
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)
unique_vec <- unique(vec)
unique_vec
# Output: [1] 11 21 19 18
The value 11 appears once, 21 appears twice, and 19 and 18 each appear three times. In the final output, each value appears exactly once.
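To double-check the counts described above, here is a minimal sketch using base R's table() function. The names counts and dup_values are only illustrative. Note that table() sorts its labels, so the order differs from the order of appearance in vec.

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# table() tallies how many times each distinct value occurs
counts <- table(vec)
counts

# Keep only the values that occur more than once
dup_values <- as.numeric(names(counts[counts > 1]))
dup_values
# Output: [1] 18 19 21
```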
If a vector contains multiple NA values, the unique() function treats them as duplicates of one another: it keeps the first NA and removes the rest.
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)
unique_vec_na <- unique(vec_na)
unique_vec_na
# Output: [1] 11 21 19 NA 18
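If you would rather not keep any NA in the result, one possible approach (a sketch, not the only way) is to filter out missing values with is.na() before calling unique(). The name unique_no_na is just an illustrative choice.

```r
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

# Drop the missing values first, then remove duplicates
unique_no_na <- unique(vec_na[!is.na(vec_na)])
unique_no_na
# Output: [1] 11 21 19 18
```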
The duplicated() function returns a logical vector of the same length as the input, in which TRUE marks an element that has already appeared earlier in the vector.
The ! operator performs logical negation. Therefore, if I negate the result with !duplicated(), I can subset the original vector to obtain only the unique elements.
Using vec[!duplicated(vec)] keeps the first occurrence of each value and removes the later duplicates.
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)
unique_vec <- vec[!duplicated(vec)]
unique_vec
# Output: [1] 11 21 19 18
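To see what the subsetting is working with, the sketch below prints the logical vector that duplicated() produces for vec. It also shows the fromLast argument of duplicated(), which keeps the last occurrence of each value instead of the first. The small character vector x is a made-up example, chosen only because the two results differ in order.

```r
vec <- c(11, 21, 19, 19, 21, 19, 18, 18, 18)

# TRUE marks an element that already appeared earlier in the vector
dup_flags <- duplicated(vec)
dup_flags
# Output: [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

# Hypothetical vector to compare "keep first" with "keep last"
x <- c("a", "b", "a", "c")

keep_first <- x[!duplicated(x)]
keep_first
# Output: [1] "a" "b" "c"

# fromLast = TRUE scans from the end, so the last occurrence survives
keep_last <- x[!duplicated(x, fromLast = TRUE)]
keep_last
# Output: [1] "b" "a" "c"
```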
If a vector contains multiple NA values, the vec[!duplicated(vec)] approach behaves the same way as unique(): it keeps the first NA and removes the rest.
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)
unique_vec_na <- vec_na[!duplicated(vec_na)]
unique_vec_na
# Output: [1] 11 21 19 NA 18
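As a quick sanity check, the two approaches can be compared directly with identical(). This is a small sketch for the example vector used here, and same_result is an illustrative name.

```r
vec_na <- c(11, 21, 19, 19, NA, 19, 18, 18, NA)

# Compare the result of unique() with the !duplicated() subset
same_result <- identical(unique(vec_na), vec_na[!duplicated(vec_na)])
same_result
# Output: [1] TRUE
```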
That’s all!
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.