R Factor and Factor Levels: How to Create Factors in R

A factor stores a vector of categorical values as integer codes, together with a set of unique labels called levels. Factors are useful in data analysis and statistical modeling. They are the data structure to use for fields that take only a predefined, finite number of values (categorical data).

R Factor

Factors in R are the data objects used to categorize the data and store it as levels. Factors can store both strings and integers. Factors can be ordered or unordered and are an important class for statistical analysis and for plotting.

Factors are saved as integers and have labels associated with these unique integers. While factors look like character vectors, they are actually integers under the hood, and you need to be careful while treating them like strings.
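The following is a minimal sketch of what "integers under the hood" means in practice. The vectors sizes and nums are made up for illustration. It shows the integer codes behind a factor and a common pitfall when converting a numeric-looking factor back to numbers.

```r
# A factor built from character data.
sizes <- factor(c("small", "large", "medium", "small"))

# The levels are the sorted unique values; the integer codes index into them.
print(levels(sizes))      # "large" "medium" "small"
print(as.integer(sizes))  # 3 1 2 3

# Pitfall: for a factor made from numbers, as.numeric() returns the codes,
# not the original values. Convert through character to recover them.
nums <- factor(c(11, 19, 21))
codes  <- as.numeric(nums)                # 1 2 3
values <- as.numeric(as.character(nums))  # 11 19 21
print(codes)
print(values)
```

Going through as.character() first is the usual way to get the original numbers back from a factor.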

How to Create a Factor in R

To create a factor in R, use the factor() function. It takes a vector as input, encodes it as a factor, and returns that factor. If the argument ordered is TRUE, the factor levels are treated as ordered. For compatibility with S, there is also a function ordered().

Syntax

fct = factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)

Parameters

The factor() function takes the following parameters.

x

It is a vector of data, usually one that takes a small number of distinct values.

levels

It is an optional vector of the unique values that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x.

labels

It is either an optional character vector of labels for the levels or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.

exclude

It is a vector of values to be excluded when establishing the set of levels. This may be a factor with the same level set as x or should be a character vector.

ordered

It is a logical flag to decide if the levels should be regarded as ordered (in the given order).

nmax

It is an upper bound on the number of levels.
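As a rough illustration of how the levels, labels, ordered, and exclude parameters interact, here is a small sketch. The responses vector and the names rating and no_low are hypothetical and used only for this example.

```r
# Hypothetical survey responses, used only for illustration.
responses <- c("low", "high", "medium", "low", "unknown")

# levels fixes the set and order of levels; labels renames them;
# ordered = TRUE makes comparisons such as < meaningful.
# "unknown" is not listed in levels, so it becomes NA.
rating <- factor(responses,
                 levels = c("low", "medium", "high"),
                 labels = c("L", "M", "H"),
                 ordered = TRUE)
print(rating)
print(rating[1] < rating[2])  # TRUE, because L < H

# exclude removes the listed values from the level set.
no_low <- factor(responses, exclude = "low")
print(levels(no_low))  # "high" "medium" "unknown"
```

Values that fall outside the level set, whether through levels or exclude, are stored as NA rather than raising an error.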

Return Value

It returns the factor.

Example

Let’s define a vector and then use the factor() function to create a factor from the vector.

# Pro.R

rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(typeof(rv))

print("After converting to factor")
rf <- factor(rv)
print(rf)
print(typeof(rf))

Output

Rscript Pro.R
[1] 11 19 21 46 18 21
[1] "double"
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] "integer"

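You can see that printing the factor also shows its levels, and that typeof() now returns "integer" because a factor stores integer codes rather than the original values.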
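Beyond print() and typeof(), a few base R functions are handy for inspecting a factor. The sketch below reuses the same values as the example above.

```r
rf <- factor(c(11, 19, 21, 46, 18, 21))

print(class(rf))    # "factor"
print(typeof(rf))   # "integer"
print(levels(rf))   # the level labels, as character strings
print(nlevels(rf))  # 5
print(table(rf))    # number of occurrences of each level
```

Note that class() reports "factor" while typeof() reports the underlying storage type.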

How to check factor in R

To check whether an object is a factor, use the is.factor() function.

rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(is.factor(rv))

print("After converting to factor")
rf <- factor(rv)
print(rf)
print(is.factor(rf))

When we execute the above code, it produces the following result.

[1] 11 19 21 46 18 21
[1] FALSE
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] TRUE

When we pass the numeric vector to the is.factor() function, it returns FALSE, but when we pass the factor, it returns TRUE. That means we have successfully created a factor.

is.factor(), is.ordered(), as.factor(), and as.ordered() are the membership and coercion functions for these classes.
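A short sketch of these four functions together may help. The cities vector is made up for illustration.

```r
cities <- c("Perth", "Sydney", "Perth")

cf <- as.factor(cities)   # unordered factor
of <- as.ordered(cities)  # ordered factor, levels in sorted order

print(is.factor(cf))   # TRUE
print(is.ordered(cf))  # FALSE
print(is.factor(of))   # TRUE: an ordered factor is still a factor
print(is.ordered(of))  # TRUE
```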

How to access elements of a factor

To access elements of a factor, use indexing. Factor indexing starts from 1.

rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[3])

Output

Rscript Pro.R
[1] 21
Levels: 11 18 19 21 46

You can access multiple components by passing a vector as the index.

rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(2, 3)])

Output

[1] 19 21
Levels: 11 18 19 21 46

Pass a negative index to select every component except the one at that position.

rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[-4])

Output

[1] 11 19 21 18 21
Levels: 11 18 19 21 46

You can also pass the logical vector as an index.

rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)])

Output

[1] 19 21 46 18
Levels: 11 18 19 21 46

If the index is TRUE, the component is included in the output; otherwise, it is not.
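In practice the logical index usually comes from a comparison rather than being typed by hand. The sketch below also shows that a subset keeps all of the original levels, as the outputs above suggest, and that droplevels() can be used to remove the unused ones.

```r
rf <- factor(c(11, 19, 21, 46, 18, 21))

# A comparison against a level label yields a logical index.
twenty_ones <- rf[rf == "21"]
print(twenty_ones)

# Subsetting keeps every original level; droplevels() removes unused ones.
trimmed <- droplevels(twenty_ones)
print(levels(twenty_ones))  # still all five levels
print(levels(trimmed))      # "21"
```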

How to modify a factor in R

To modify a factor in R, use the assignment (<-) operator. However, we cannot choose values outside of its predefined levels.

rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 11
print(rf)

Output

[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 11 18
Levels: 11 18 19 21 46

You can see that I have modified the 4th component from 46 to 11, but it is modified within the level values. We cannot assign values outside levels.

rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 29
print(rf)

Output

[1] 11 19 21 46 18
Levels: 11 18 19 21 46
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = 29) :
invalid factor level, NA generated
[1] 11 19 21 <NA> 18
Levels: 11 18 19 21 46

R does not stop with an error here. It issues a warning, invalid factor level, NA generated, and stores NA in place of the value.

To avoid this invalid factor level warning, we need to add the value to the levels first.

rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
levels(rf) <- c(levels(rf), "29")
rf[4] <- 29
print(rf)

Output

[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 29 18
Levels: 11 18 19 21 46 29
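The two steps above can be wrapped in a small helper. The function assign_with_level below is hypothetical, not part of base R, and is only a sketch of one way to combine the level check with the assignment.

```r
# Hypothetical helper (not part of base R): assign `value` at position `i`,
# adding it as a new level first if it is not already one.
assign_with_level <- function(f, i, value) {
  value <- as.character(value)
  if (!value %in% levels(f)) {
    levels(f) <- c(levels(f), value)
  }
  f[i] <- value
  f
}

rf <- factor(c(11, 19, 21, 46, 18))
rf <- assign_with_level(rf, 4, 29)
print(rf)
```

New levels are appended at the end, so the level order is no longer sorted after the call.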

Generating Factor Levels in R

To generate factor levels, use the gl() function. It takes two integers as input: the number of levels and the number of times each level is repeated.

Syntax

gl(n, k, labels)

Parameters

The following is the description of the parameters used:

  1. n parameter is the integer giving the number of levels.
  2. k parameter is the integer giving the number of replications.
  3. labels parameter is a vector of labels for the resulting factor levels.

Example

Let’s generate factor levels.

vf <- gl(3, 3, labels = c("Sydney", "Perth", "Melbourne"))
print(vf)

Output

[1] Sydney Sydney Sydney Perth Perth Perth Melbourne
[8] Melbourne Melbourne
Levels: Sydney Perth Melbourne
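The gl() function in base R also accepts length and ordered arguments, which the simplified syntax above leaves out. The sketch below uses made-up labels to show both.

```r
# length recycles the pattern of levels until the result has that many elements.
alt <- gl(2, 1, length = 6, labels = c("control", "treatment"))
print(alt)

# ordered = TRUE returns an ordered factor.
dose <- gl(3, 2, labels = c("low", "medium", "high"), ordered = TRUE)
print(dose)
```

With k = 1 and a longer length, the levels alternate, which can be convenient for laying out a balanced design.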

Changing the Order of Levels

The order of the levels in a factor can be changed by applying the factor function again with the new order of the levels.

rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)

new_rf <- factor(rf, levels = c(46, 21, 19, 18, 11))
print(new_rf)

Output

[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 46 18
Levels: 46 21 19 18 11

You can see from the output that the order of the levels has changed while the values themselves stay the same.
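When only one level needs to move to the front, for example to serve as the reference level in a model, base R's relevel() is a shorter alternative to listing every level. A minimal sketch with the same values:

```r
rf <- factor(c(11, 19, 21, 46, 18))

# Move "46" to the first position; the other levels keep their order.
ref_first <- relevel(rf, ref = "46")
print(levels(ref_first))        # "46" "11" "18" "19" "21"

# The data values themselves are unchanged.
print(as.character(ref_first))  # "11" "19" "21" "46" "18"
```

relevel() works on unordered factors only.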

Factors in Data Frame

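In R versions before 4.0.0, data.frame() converted text columns to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so a text column stays a character vector unless you pass stringsAsFactors = TRUE or convert the column with factor(). The example below uses the default, and the output is from R 4.0.0 or later.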

# Create the vectors for data frame.
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")

# Create the data frame.
input_data <- data.frame(name, age, gender)
print(input_data)

# Print the gender column to see whether it has levels.
print(input_data$gender)

Output

    name age gender
1 Krunal  27   male
2  Ankit  25   male
3   Niva  23 female
4  Mansi  19 female
[1] "male" "male" "female" "female"
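To get factor columns in a data frame on any R version, you can ask for them explicitly. The sketch below reuses the vectors from the example. The names df_fct and df_chr are chosen here for illustration.

```r
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")

# Option 1: convert every character column to a factor (name is converted too).
df_fct <- data.frame(name, age, gender, stringsAsFactors = TRUE)
print(is.factor(df_fct$gender))  # TRUE
print(levels(df_fct$gender))     # "female" "male"

# Option 2: convert only the column that is truly categorical.
df_chr <- data.frame(name, age, gender)
df_chr$gender <- factor(df_chr$gender)
print(df_chr$gender)
```

Option 2 is often preferable, since free-text columns such as name rarely benefit from being factors.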

That is it for the R Factor example.

