What is the n_distinct() Function in R

The n_distinct() function from the dplyr package in R counts the number of unique (distinct) combinations in a set of one or more vectors. It is a faster and more concise equivalent of the expression nrow(unique(data.frame(…))).
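As a quick sanity check of that equivalence, the minimal sketch below compares both approaches on two small vectors. The vectors x and y are made up for illustration and are not taken from any real dataset.

```r
library(dplyr)

# Hypothetical example vectors of equal length
x <- c(1, 1, 2, 2)
y <- c("a", "a", "b", "c")

# Count distinct (x, y) combinations with dplyr
count_dplyr <- n_distinct(x, y)

# Same count using base R only
count_base <- nrow(unique(data.frame(x, y)))

print(count_dplyr)
print(count_base)
```

Both calls should report the same number of distinct pairs; the dplyr version simply avoids building an intermediate data frame by hand.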

Syntax

n_distinct(..., na.rm = FALSE)

Parameters

…: Unnamed vectors. If multiple vectors are supplied, then they should have the same length.

na.rm: If TRUE, exclude missing observations from the count. If there are multiple vectors in …, an observation will be excluded if any of the values are missing.
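To illustrate the na.rm parameter, here is a small sketch using a single made-up vector that contains missing values. By default NA is counted as its own distinct value; with na.rm = TRUE it is left out of the count.

```r
library(dplyr)

# Hypothetical vector with two missing values
x <- c(1, 2, NA, 2, NA)

# NA counts as one distinct value by default: 1, 2, NA
with_na <- n_distinct(x)

# Missing values are dropped before counting: 1, 2
without_na <- n_distinct(x, na.rm = TRUE)

print(with_na)
print(without_na)
```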

Example 1: How to Use the n_distinct() Function

library(dplyr)

vec <- c(1, 2, 3, 3, 4, 4, 4, 5)

num_unique <- n_distinct(vec)
print(num_unique)

Output

[1] 5

Example 2: Using the n_distinct() Function with a Data Frame

library(dplyr)

# Example with a data frame
df <- data.frame(
  x = c(1, 2, 3, 3, 4, 4),
  y = c("a", "b", "b", "c", "c", "c")
)

# Count unique values in the x column
num_unique_x <- n_distinct(df$x)
print(num_unique_x) # This will output 4

# Count unique combinations of x and y
num_unique_combinations <- n_distinct(df$x, df$y)
print(num_unique_combinations)

Output

[1] 4
[1] 5

The n_distinct() function is particularly useful when you are working with grouped data and want to calculate the number of unique values within each group. You would typically combine it with the group_by() and summarise() functions for that.
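A minimal sketch of that grouped usage might look like the following. The data frame and its column names (team, position) are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical data: team labels and player positions
df <- data.frame(
  team = c("A", "A", "A", "B", "B", "B"),
  position = c("G", "G", "F", "F", "F", "F")
)

# Count the distinct positions within each team
result <- df %>%
  group_by(team) %>%
  summarise(n_positions = n_distinct(position))

print(result)
```

Here team A has two distinct positions (G and F) and team B has one (F), so the summary contains one row per team with those counts.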

That’s it!
