How to Extract Unique Elements in R using unique() Function

The unique() function plays an important role in EDA (Exploratory Data Analysis) because it identifies and drops duplicate values in the data. Let's see how to use it to extract unique elements from R objects.

Extract Unique Elements in R

To extract unique elements from a vector, data frame, or array-like R object, use the unique() function. unique() is a built-in R function that returns an object of the same kind as its input (vector, data frame, or array) with duplicate elements or rows removed.

Syntax of unique() function

unique(x, incomparables = FALSE, fromLast = FALSE, nmax = NA, ...)

For matrices and arrays, the method also accepts a MARGIN argument:

unique(x, incomparables = FALSE, MARGIN = 1, fromLast = FALSE, ...)

Parameters

x: a vector, data frame, array, or NULL.

incomparables: a vector of values that cannot be compared. FALSE is a special value meaning that all values can be compared, and it may be the only value accepted by methods other than the default. It is coerced internally to the same type as x.

fromLast: a logical value indicating whether duplication should be considered from the last element, i.e., the last (or rightmost) of identical elements is kept. For matrices and data frames, this changes which rows are kept and their order in the result.

nmax: the maximum number of unique items expected (greater than one). See ?duplicated.

...: arguments passed to particular methods.

MARGIN: the array margin to be held fixed, as a single integer (matrix and array methods only; the default, 1, compares rows).

Return Value

If the input is a vector, unique() returns an object of the same type, but with only one copy of each duplicated element.

If the input is a data frame, it returns a data frame with the same columns but possibly fewer rows.

If the input is a matrix or array, it is subset with [, drop = FALSE], so dimensions and dimnames are copied appropriately, and the result always has the same number of dimensions as the input.

Missing values (NA) are regarded as equal to each other, while for numeric and complex data NA is treated as different from NaN. Character strings are compared in a "common encoding".

Values in incomparables are never marked as duplicated. This is intended for a fairly small set of values and is not efficient for a large set.
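A short sketch of the NA, NaN, and incomparables rules above. The vectors and the variable names (x, u, y, v) are made up for illustration:

```r
# Made-up numeric vector with repeated NA values and one NaN
x <- c(1, NA, NaN, NA, 1)

# NA values count as equal to each other, but NaN is kept as a separate value
u <- unique(x)
print(u)  # values: 1, NA, NaN

# Values listed in incomparables are never marked as duplicated,
# so both NA values survive here while the repeated 11 is dropped
y <- c(11, NA, 11, NA)
v <- unique(y, incomparables = NA)
print(v)  # values: 11, NA, NA
```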

Find Unique Values in an R Vector

To find unique values in an R vector, pass the vector to unique(). It returns a vector containing each distinct value once, in the order in which the values first appear.

data <- c(11, 19, 11, 19, 46, 21)
unique(data, incomparables = FALSE)

Output

[1] 11 19 46 21

The input vector has two values that appear more than once: 11 and 19. In the output, each of them appears only once.

In other words, unique() keeps only the first copy of each repeated element.
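One way to see the "first copy wins" rule is to compare unique() with duplicated(), which flags every element that repeats an earlier one. For a plain unnamed vector like this one, keeping the unflagged elements gives the same result; the names flags and manual are illustrative only:

```r
data <- c(11, 19, 11, 19, 46, 21)

# duplicated() flags each element that repeats an earlier one
flags <- duplicated(data)
print(flags)  # FALSE FALSE TRUE TRUE FALSE FALSE

# Keeping only the unflagged elements reproduces unique(data)
manual <- data[!flags]
identical(manual, unique(data))  # TRUE
```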

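Extract Unique Rows from an R Matrix

To remove duplicates from a matrix, pass it to unique(). For a matrix, unique() compares whole rows by default (MARGIN = 1), so it returns the distinct rows rather than the individual distinct numbers. The example matrix is created with the matrix() function.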

# Fill a 6 x 3 matrix row by row; rows 5 and 6 repeat rows 1 and 2
mtrx <- matrix(rep(1:12, length.out = 18), nrow = 6, ncol = 3, byrow = TRUE)
mtrx
cat("Unique values from matrix", "\n")
unique(mtrx)

Output

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
[5,]    1    2    3
[6,]    4    5    6

Unique values from matrix

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
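Because unique() on a matrix compares whole rows, it does not return the individual distinct numbers. If those are what you need, one option is to flatten the matrix with as.vector() first. The values then come out in column-major order; the name vals is illustrative only:

```r
mtrx <- matrix(rep(1:12, length.out = 18), nrow = 6, ncol = 3, byrow = TRUE)

# as.vector() drops the dim attribute and reads the matrix column by column
vals <- unique(as.vector(mtrx))
print(vals)

# sort() puts the distinct values in ascending order
sort(vals)
```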

With fromLast = TRUE, duplicates are identified starting from the last row, so the last occurrence of each repeated row is kept and the earlier copies are dropped.

mtrx <- matrix(rep(1:12, length.out = 18), nrow = 6, ncol = 3, byrow = TRUE)
mtrx
cat("Unique values from matrix", "\n")
# Keep the last copy of each repeated row instead of the first
unique(mtrx, fromLast = TRUE)

Output

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
[5,]    1    2    3
[6,]    4    5    6

Unique values from matrix

     [,1] [,2] [,3]
[1,]    7    8    9
[2,]   10   11   12
[3,]    1    2    3
[4,]    4    5    6
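The MARGIN parameter listed in the syntax section switches the comparison from rows to columns. Here is a small sketch with a made-up matrix m whose first and third columns are identical:

```r
# Made-up 2 x 3 matrix: columns 1 and 3 are the same
m <- cbind(c(1, 2), c(3, 4), c(1, 2))

# MARGIN = 2 compares whole columns instead of rows
u_cols <- unique(m, MARGIN = 2)
print(u_cols)
dim(u_cols)  # 2 2
```

With the default MARGIN = 1, the same matrix would come back unchanged, because its two rows differ.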

Find Unique Rows of Data Frame

To extract the unique rows of a data frame in R, pass the data frame to unique(). It returns a data frame that contains only the distinct rows.

data <- data.frame(a1 = c(11, 11, 21, 31, 41, 21, 21),
                   a2 = c("x", "x", "x", "b", "y", "y", "x"))

data

Output

  a1 a2
1 11  x
2 11  x
3 21  x
4 31  b
5 41  y
6 21  y
7 21  x

Now, use the unique() function to return unique rows from the data frame.

data <- data.frame(a1 = c(11, 11, 21, 31, 41, 21, 21),
                   a2 = c("x", "x", "x", "b", "y", "y", "x"))

data_unique <- unique(data)
data_unique

Output

  a1 a2
1 11  x
3 21  x
4 31  b
5 41  y
6 21  y

The columns stay the same, but the duplicate rows (rows 2 and 7) have been removed. The remaining rows keep their original row names (1, 3, 4, 5, 6).
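The fromLast argument works the same way for data frames. This sketch keeps the last copy of each repeated row and then, optionally, renumbers the rows; data_last and data_renumbered are illustrative names:

```r
data <- data.frame(a1 = c(11, 11, 21, 31, 41, 21, 21),
                   a2 = c("x", "x", "x", "b", "y", "y", "x"))

# fromLast = TRUE keeps the last copy of each repeated row,
# so rows 2 and 7 survive instead of rows 1 and 3
data_last <- unique(data, fromLast = TRUE)
print(data_last)
rownames(data_last)  # "2" "4" "5" "6" "7"

# Optional: renumber the remaining rows as 1..n
data_renumbered <- data_last
rownames(data_renumbered) <- NULL
```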

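Get Unique Values of a Column in a Data Frame

You can also pass a single column, such as data$a2, to unique() to get the distinct values of that column. (In R 4.0.0 and later, data.frame() keeps strings as character vectors by default; in earlier versions, a2 would be a factor and the output would also list its levels.)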

data <- data.frame(a1 = c(11, 11, 21, 31, 41, 21, 21),
                   a2 = c("x", "x", "x", "b", "y", "y", "x"))

data_column_unique <- unique(data$a2)
data_column_unique

Output

[1] "x" "b" "y"
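To get the unique values of every column at once, one option is to apply unique() to each column with lapply(). A list is used because the columns can have different numbers of distinct values; col_uniques is an illustrative name:

```r
data <- data.frame(a1 = c(11, 11, 21, 31, 41, 21, 21),
                   a2 = c("x", "x", "x", "b", "y", "y", "x"))

# lapply() calls unique() on each column and returns a named list
col_uniques <- lapply(data, unique)
print(col_uniques)
```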

Count the Distinct Values of a Column in R

To count the distinct values in a column, wrap unique() in length(), which returns the number of elements in a vector.

data <- data.frame(a1 = c(11, 11, 21, 31, 41, 21, 21),
                   a2 = c("x", "x", "x", "b", "y", "y", "x"))

data_column_unique_length <- length(unique(data$a2))
data_column_unique_length

Output

[1] 3

That means the a2 column has 3 unique values, which are x, b, and y.
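The same idea extends to every column and to whole rows. This sketch uses sapply() to count the distinct values per column and nrow() to count distinct rows; distinct_counts and distinct_rows are illustrative names:

```r
data <- data.frame(a1 = c(11, 11, 21, 31, 41, 21, 21),
                   a2 = c("x", "x", "x", "b", "y", "y", "x"))

# Number of distinct values in each column, returned as a named integer vector
distinct_counts <- sapply(data, function(col) length(unique(col)))
print(distinct_counts)

# Number of distinct rows: unique() on the whole data frame, then nrow()
distinct_rows <- nrow(unique(data))
print(distinct_rows)
```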

Conclusion

The unique() function eliminates duplicate elements/rows from a vector, data frame, or array.
