cut() Function in R with Example

R's cut() function divides data into bins and lets you label each bin, which makes it useful for creating a factor from a continuous variable.

cut() Function in R

cut() is a built-in R function that divides the range of x into intervals and codes each value in x according to the interval it falls into. The result is a factor, so cut() is the usual way to convert a numeric vector to a factor in R.

Syntax

cut(x, breaks, labels = NULL,
    include.lowest = FALSE, right = TRUE,
    dig.lab = 3, ordered_result = FALSE, ...)

Arguments

x: A numeric input vector.

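breaks: Either a numeric vector of two or more unique cut points, or a single number (at least 2) giving the number of intervals to cut the data into.

labels = NULL: Labels for the levels of the resulting factor. By default, labels are built in interval notation such as "(a,b]". With labels = FALSE, integer codes are returned instead of a factor.

include.lowest = FALSE: Whether a value equal to the lowest break (or the highest break when right = FALSE) is included in the first (or last) interval.

right = TRUE: Whether the intervals are closed on the right and open on the left, or vice versa.

dig.lab = 3: Number of digits used to format the break numbers when labels are not given.

ordered_result = FALSE: Whether the result should be an ordered factor.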
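The effect of include.lowest, right, and ordered_result is easiest to see on values that sit exactly on the break points. The following is a minimal sketch; the vector and the variable names are made up for illustration.

```r
# Hypothetical values chosen to sit exactly on the break points.
x <- c(0, 1, 2, 3)
brks <- c(0, 1, 2, 3)

# Default: intervals (0,1], (1,2], (2,3]; the value 0 falls in no interval.
default_bins <- cut(x, breaks = brks)

# include.lowest = TRUE closes the first interval: [0,1], (1,2], (2,3].
lowest_bins <- cut(x, breaks = brks, include.lowest = TRUE)

# right = FALSE flips the closed side: [0,1), [1,2), [2,3); now 3 is left out.
left_closed_bins <- cut(x, breaks = brks, right = FALSE)

# ordered_result = TRUE returns an ordered factor.
ordered_bins <- cut(x, breaks = brks, ordered_result = TRUE)

print(default_bins)
print(lowest_bins)
print(left_closed_bins)
print(is.ordered(ordered_bins))
```

A value that falls in no interval is coded as NA, which is why the first element is missing with the defaults and the last element is missing with right = FALSE.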

Example

To generate random numbers from a normal distribution in R, use the rnorm() function. The example below draws 20 values from the standard normal distribution and cuts them at the integers from -3 to 3. Because the values are random, your output will differ from run to run.

data <- stats::rnorm(20)

c <- cut(data, breaks = -3:3)

c

Output

 [1] (0,1] (-1,0] (-2,-1] (0,1] (0,1] (1,2] (-1,0] (2,3] (-1,0]
[10] (0,1] (-1,0] (0,1] (0,1] (-2,-1] (-1,0] (0,1] (-1,0] (1,2]
[19] (-1,0] (-1,0]

Levels: (-3,-2] (-2,-1] (-1,0] (0,1] (1,2] (2,3]

The breaks argument defines the bins the data is cut into, which turns each numeric value into a category.

To count how many values fall into each bin, use the summary() function.

data <- stats::rnorm(20)

c <- cut(data, breaks = -3:3)
summary(c)

Output

(-3,-2] (-2,-1]  (-1,0]   (0,1]   (1,2]   (2,3]
      0       1       9       9       1       0

The breaks -3:3 define six intervals, so the factor has six levels. Levels that contain no values are kept and show a count of 0.
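Since the random draws above change on every run, the counting step can also be checked on a small fixed vector. This is a sketch with made-up values; table() is used here as an alternative way to count factor levels, and droplevels() removes the empty ones.

```r
# Hypothetical fixed values so the counts can be verified by hand.
x <- c(-1.5, -0.2, 0.3, 0.7, 1.1)

bins <- cut(x, breaks = -3:3)

# Count the values per level; empty levels are reported with 0.
counts <- table(bins)
print(counts)

# Drop the levels that contain no values.
non_empty <- droplevels(bins)
print(table(non_empty))
```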

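You can also set the breaks argument to a single integer of at least 2. cut() then splits the range of the data into that many intervals (levels) of equal width, moving the two outermost limits slightly outward so that the minimum and maximum values fall inside an interval.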

c <- cut(data, breaks = 2)

c

Output

 [1] (-1.39,0.534] (-1.39,0.534] (-1.39,0.534] (0.534,2.46] (-1.39,0.534]
 [6] (-1.39,0.534] (0.534,2.46] (-1.39,0.534] (-1.39,0.534] (-1.39,0.534]
[11] (-1.39,0.534] (0.534,2.46] (-1.39,0.534] (0.534,2.46] (-1.39,0.534]
[16] (-1.39,0.534] (0.534,2.46] (-1.39,0.534] (0.534,2.46] (-1.39,0.534]

Levels: (-1.39,0.534] (0.534,2.46]
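To make the equal-width split easy to verify, here is a small sketch on a fixed vector (the values are made up). The range 0 to 10 is split at its midpoint 5, and with the default right = TRUE the value 5 itself belongs to the lower interval. Passing labels = FALSE returns the interval numbers instead of a factor.

```r
# Hypothetical values spanning the range 0 to 10.
x <- c(0, 2, 5, 10)

# Integer codes of the interval each value falls into.
codes <- cut(x, breaks = 2, labels = FALSE)
print(codes)

# The same split as a factor with two levels.
two_bins <- cut(x, breaks = 2)
print(levels(two_bins))
```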

With breaks = 2, the numbers are divided into two intervals. You can also specify the break points yourself. Values that fall outside the outermost breaks are coded as NA.

data <- stats::rnorm(20)

# Break points listed in increasing order.
c <- cut(data, breaks = c(-2, 1, 2))

c

Output

 [1] (1,2] (-2,1] (-2,1] (-2,1] (-2,1] (1,2] (1,2] (-2,1] (-2,1] (1,2]
[11] (-2,1] (-2,1] <NA> (-2,1] (1,2] (-2,1] (-2,1] (-2,1] (-2,1] (-2,1]

Levels: (-2,1] (1,2]
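The NA in the output above comes from a value outside the range covered by the breaks. One common way to avoid this, sketched below with made-up values, is to use -Inf and Inf as the outermost break points so that every value lands in some interval.

```r
# Hypothetical values; -5 and 7 lie outside the breaks -2, 1, 2.
x <- c(-5, -1, 0.5, 1.5, 7)

# Values outside (-2, 2] are coded as NA.
bounded <- cut(x, breaks = c(-2, 1, 2))
print(bounded)

# Open-ended outer intervals catch everything.
unbounded <- cut(x, breaks = c(-Inf, -2, 1, 2, Inf))
print(unbounded)
```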

If the interval limits are not whole numbers, you can control how many significant digits are shown in the labels with the dig.lab argument.

data <- stats::rnorm(30)

c <- cut(data, breaks = 6, dig.lab=2)
c

Output

 [1] (1,1.8] (-1.4,-0.59] (-0.59,0.22] (-0.59,0.22] (0.22,1]
 [6] (-0.59,0.22] (1,1.8] (0.22,1] (-1.4,-0.59] (0.22,1]
[11] (-0.59,0.22] (0.22,1] (1,1.8] (-0.59,0.22] (-2.2,-1.4]
[16] (1,1.8] (1,1.8] (1.8,2.7] (-2.2,-1.4] (0.22,1]
[21] (-2.2,-1.4] (0.22,1] (-1.4,-0.59] (-0.59,0.22] (0.22,1]
[26] (-0.59,0.22] (-2.2,-1.4] (0.22,1] (1.8,2.7] (-0.59,0.22]

Levels: (-2.2,-1.4] (-1.4,-0.59] (-0.59,0.22] (0.22,1] (1,1.8] (1.8,2.7]
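The effect of dig.lab can also be seen with fixed break points, where only the labels change and the binning stays the same. This is a sketch with made-up values and breaks.

```r
# Hypothetical values and break points with several decimals.
x <- c(0.1234, 0.5678, 0.9)
brks <- c(0, 0.3333, 0.6667, 1)

# Two significant digits in the labels.
short_labels <- cut(x, breaks = brks, dig.lab = 2)
print(levels(short_labels))

# Four significant digits in the labels.
long_labels <- cut(x, breaks = brks, dig.lab = 4)
print(levels(long_labels))

# The assignment of values to intervals is identical in both cases.
print(as.integer(short_labels))
print(as.integer(long_labels))
```

dig.lab only affects how the labels are formatted; the break points used for binning are not rounded.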

Passing labels argument to the cut() function in R

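To replace the default interval labels of the resulting factor with names of your own, use the labels argument. It needs one label per interval, which is one fewer than the number of break points.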

info <- c(11, 21, 18, 19, 23, 46, 29, 37)

# Six break points in increasing order define five intervals, one per label.
cut(info, breaks = c(0, 2, 10, 40, 50, 60),
          labels = c("First", "Second", "Third", "Fourth", "Fifth"))

Output

[1] Third Third Third Third Third Fourth Third Third

Levels: First Second Third Fourth Fifth
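Custom labels combine naturally with ordered_result = TRUE, which makes the levels comparable with operators such as >=. The following sketch uses made-up scores and label names; labels = FALSE is shown as well for the case where plain integer codes are enough.

```r
# Hypothetical scores and grade boundaries.
scores <- c(35, 58, 72, 90)
brks <- c(0, 50, 75, 100)

# Named, ordered levels: low < mid < high.
grades <- cut(scores, breaks = brks,
              labels = c("low", "mid", "high"),
              ordered_result = TRUE)
print(grades)

# Ordered factors support comparisons between levels.
at_least_mid <- grades >= "mid"
print(at_least_mid)

# labels = FALSE returns the interval numbers instead of a factor.
grade_codes <- cut(scores, breaks = brks, labels = FALSE)
print(grade_codes)
```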

That is it for this tutorial.
