What Are R Factors and Factor Levels

A factor in R is a data structure that stores categorical data, such as gender, country, or marital status. The distinct categories a factor can take are called its levels, and each element of the factor is stored as a reference to one of those levels.

How to Create a Factor in R

To create a factor in R, use the “factor()” function. It takes a vector as input and encodes it as a factor. If the ordered argument is TRUE, the factor levels are treated as ordered. For compatibility with S, there is also a function called ordered().

Syntax

fct = factor(x = character(), levels, labels = levels, 
            exclude = NA, ordered = is.ordered(x), nmax = NA)

Parameters

The factor() function takes the following parameters.

  1. x: A vector of data, usually taking a small number of distinct values.
  2. levels: It is an optional vector of the unique values that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x.
  3. labels: It is either an optional character vector of labels for the levels or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.
  4. exclude: A vector of values to be excluded when forming the set of levels. It may be a factor with the same level set as x, or it should be a character vector.
  5. ordered: It is a logical flag to decide if the levels should be regarded as ordered (in the given order).
  6. nmax: It is an upper bound on the number of levels.
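The parameters above can be combined in a single call. The following is a minimal sketch, assuming a made-up vector of clothing sizes; the variable names (sizes, size_f, no_small) and the labels are hypothetical and chosen only for illustration.

```r
# Illustrative data; the values and labels are made up for this sketch.
sizes <- c("m", "s", "l", "m", "xl", "s")

# Fix the level order, attach readable labels, and mark the factor as ordered.
# "xl" is not listed in levels, so it becomes NA.
size_f <- factor(sizes,
                 levels = c("s", "m", "l"),
                 labels = c("Small", "Medium", "Large"),
                 ordered = TRUE)
print(size_f)
print(levels(size_f))

# Ordered factors support comparisons between elements.
print(size_f[1] > size_f[2])

# exclude removes a value from the level set; matching elements become NA.
no_small <- factor(sizes, exclude = "s")
print(levels(no_small))
```

Values of x that are missing from levels are not rejected; they silently turn into NA, so it is worth checking for NA after supplying levels explicitly.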

Return Value

It returns an object of class "factor", which also carries the class "ordered" when ordered = TRUE.

Example

Factors are saved as integers and have labels associated with these unique integers. While factors look like character vectors, they are integers under the hood, and you need to be careful while treating them like strings.

Let’s define a vector and then use the factor() function to create a factor from the vector.

rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(typeof(rv))

print("After converting to factor")
rf <- factor(rv)
print(rf)
print(typeof(rf))

Output

[1] 11 19 21 46 18 21
[1] "double"
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] "integer"

The printed factor lists its levels below the values, and typeof() reports "integer" because the values are stored as integer codes.
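Because a factor stores integer codes rather than the original values, converting it straight to a number can give surprising results. The sketch below reuses the same vector to show the difference; the names codes, values, and values2 are introduced here only for illustration.

```r
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)

# as.integer() returns the internal codes (positions in levels(rf)),
# not the original numbers.
codes <- as.integer(rf)
print(codes)

# Go through the labels to recover the original values.
values <- as.numeric(as.character(rf))
print(values)

# Equivalent: index the numeric levels by the factor's integer codes.
values2 <- as.numeric(levels(rf))[rf]
print(values2)
```

The first call prints the codes 1 3 4 5 2 4, while the other two recover 11 19 21 46 18 21.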

How to check factors in R

To check if a variable is a factor, you can use the “is.factor()” function.

rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(is.factor(rv))

print("After converting to factor")
rf <- factor(rv)
print(rf)
print(is.factor(rf))

When we execute the above code, it produces the following result.

[1] 11 19 21 46 18 21
[1] FALSE
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] TRUE

When we pass a plain vector to is.factor(), it returns FALSE; when we pass the factor, it returns TRUE. That confirms we have successfully created a factor.

The is.factor(), is.ordered(), as.factor(), and as.ordered() are the membership and coercion functions for these classes.
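A short sketch of these four functions working together, assuming a small made-up vector of color names (colors, cf, and of are hypothetical names):

```r
colors <- c("red", "green", "red", "blue")

# as.factor() coerces a vector to an unordered factor.
cf <- as.factor(colors)
print(is.factor(cf))
print(is.ordered(cf))

# as.ordered() returns an ordered factor; the levels keep their sorted order.
of <- as.ordered(colors)
print(is.ordered(of))
print(is.factor(of))
print(of)
```

An ordered factor is still a factor, so is.factor() returns TRUE for both cf and of, while is.ordered() is TRUE only for of.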

How to access elements of a factor in R

To access elements of a factor, use indexing. Factor indexing starts from 1.

rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[3])

Output

[1] 21
Levels: 11 18 19 21 46

You can access multiple elements by passing a vector of indices.

rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(2, 3)])

Output

[1] 19 21
Levels: 11 18 19 21 46

Pass a negative index to select every element except the one at that position.

rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[-4])

Output

[1] 11 19 21 18 21
Levels: 11 18 19 21 46

You can also pass a logical vector as an index.

rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)])

Output

[1] 19 21 46 18
Levels: 11 18 19 21 46

An element is included in the output where the index is TRUE and skipped where it is FALSE.
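In every indexing example above, the result still lists all five levels, including ones that no longer appear in the data. If that is not wanted, one option is to drop the unused levels, as in this small sketch (sub is a name introduced here for illustration):

```r
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)

# The subset keeps all five levels, even though only two values remain.
sub <- rf[c(2, 3)]
print(levels(sub))

# droplevels() removes levels that no longer occur in the data.
print(levels(droplevels(sub)))

# drop = TRUE does the same thing during indexing.
print(levels(rf[c(2, 3), drop = TRUE]))
```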

How to modify a factor in R

To modify a factor in R, use the “assignment (<-)” operator. However, we cannot choose values outside of its predefined levels.

rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 11
print(rf)

Output

[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 11 18
Levels: 11 18 19 21 46

You can see that I have modified the 4th component from 46 to 11, but it is modified within the level values. We cannot assign values outside levels.

rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 29
print(rf)

Output

[1] 11 19 21 46 18
Levels: 11 18 19 21 46
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = 29) :
invalid factor level, NA generated
[1] 11 19 21 <NA> 18
Levels: 11 18 19 21 46

R does not raise an error here; it issues a warning (invalid factor level, NA generated) and stores NA in that position.

To avoid the invalid factor level warning, add the value to the levels first.

rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
levels(rf) <- c(levels(rf), "29")
rf[4] <- 29
print(rf)

Output

[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 29 18
Levels: 11 18 19 21 46 29
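A slightly more defensive variant of the same idea is sketched below. It assumes you want to add the level only when it is missing, and it also shows that renaming a level updates every element that uses it. The name new_value and the label "eleven" are hypothetical.

```r
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)

new_value <- "29"   # value chosen only for illustration

# Add the level only if it is missing, so the levels stay unique.
if (!(new_value %in% levels(rf))) {
  levels(rf) <- c(levels(rf), new_value)
}
rf[4] <- new_value
print(rf)

# Renaming a level changes every element that uses it.
levels(rf)[levels(rf) == "11"] <- "eleven"
print(rf)
```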

Generating Factor Levels in R

To generate factor levels, use the gl() function. It takes two integers as input: the number of levels and the number of times each level is repeated.

Syntax

gl(n, k, labels)

Parameters

The following is the description of the parameters used:

  1. n parameter is the integer giving the number of levels.
  2. k parameter is the integer giving the number of replications.
  3. labels parameter is a vector of labels for the resulting factor levels.

Example

Let’s generate factor levels.

vf <- gl(3, 3, labels = c("Sydney", "Perth", "Melbourne"))
print(vf)

Output

[1] Sydney Sydney Sydney Perth Perth Perth Melbourne
[8] Melbourne Melbourne
Levels: Sydney Perth Melbourne
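Besides n, k, and labels, gl() also accepts length and ordered arguments. The sketch below shows both; the labels and the variable names alt and grades are made up for illustration.

```r
# Two levels, one repetition each, recycled to a total length of six.
alt <- gl(2, 1, length = 6, labels = c("Control", "Treatment"))
print(alt)

# ordered = TRUE returns an ordered factor.
grades <- gl(3, 2, labels = c("Low", "Mid", "High"), ordered = TRUE)
print(grades)
```

With k = 1 and a longer length, the levels alternate instead of appearing in consecutive runs.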

Changing the Order of Levels

The order of the levels in a factor can be changed by applying the factor function again with the new order of the levels.

rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)

new_rf <- factor(rf, levels = c(46, 21, 19, 18, 11))
print(new_rf)

Output

[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 46 18
Levels: 46 21 19 18 11

You can see from the output that the order of the levels has changed while the data values stay the same.
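Typing out every level by hand is not always necessary. As a sketch of two alternatives, the existing levels can be reversed with rev(), or a single level can be moved to the front with relevel(); the names rev_rf and ref_rf are introduced here for illustration.

```r
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)

# Reverse the existing level order without retyping the values.
rev_rf <- factor(rf, levels = rev(levels(rf)))
print(levels(rev_rf))

# relevel() moves one level to the front and keeps the rest in order.
ref_rf <- relevel(rf, ref = "46")
print(levels(ref_rf))

# The data values are unchanged in both cases.
print(as.character(rev_rf))
```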

Factors in Data Frame

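Before R 4.0.0, data.frame() treated text columns as categorical data and converted them to factors by default. Since R 4.0.0, the stringsAsFactors argument defaults to FALSE, so text columns stay as character vectors unless you convert them, as the output below shows.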

# Create the vectors for data frame.
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")

# Create the data frame.
input_data <- data.frame(name, age, gender)
print(input_data)

# Print the gender column to inspect its values.
print(input_data$gender)

Output

    name age gender
1 Krunal  27   male
2  Ankit  25   male
3   Niva  23 female
4  Mansi  19 female
[1] "male" "male" "female" "female"
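To get a factor column in a data frame regardless of the R version, you can request the conversion explicitly. The sketch below shows two options using the same vectors; df_all and df_one are names introduced here for illustration.

```r
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")

# Option 1: convert every character column while building the data frame.
df_all <- data.frame(name, age, gender, stringsAsFactors = TRUE)
print(is.factor(df_all$gender))

# Option 2: keep text columns as character and convert only the one you need.
df_one <- data.frame(name, age, gender, stringsAsFactors = FALSE)
df_one$gender <- factor(df_one$gender)
print(df_one$gender)
print(is.factor(df_one$name))
```

The second option is often preferable when only some text columns are truly categorical, since identifiers such as names usually work better as plain character vectors.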

That’s it.
