Factors are R data structures for categorical data, that is, fields that take only a finite, predefined set of values. Each value is stored as an integer code that points into a set of unique labels called levels. Factors are especially useful in data analysis and statistical modeling.
R Factor
Factors in R are data objects used to categorize data and store it as levels. A factor can be created from a character vector or a numeric vector; either way, its levels are stored as character strings. Factors can be ordered or unordered, and they are an important class for statistical analysis and plotting.
Factors are stored as integer codes, and each unique code is associated with a label (its level). Although factors print like character vectors, they are integers under the hood, so be careful when treating them as strings or numbers.
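To see the integer codes directly, here is a small sketch; the vector c(10, 20, 10, 30) is just an illustrative input. It also shows a common pitfall: as.numeric() on a factor returns the codes, not the original numbers.

```r
# A factor built from numbers: what prints are the level labels
f <- factor(c(10, 20, 10, 30))
print(levels(f))                     # levels are stored as character strings
print(as.integer(f))                 # the underlying integer codes: 1 2 1 3
# Pitfall: as.numeric() returns the codes, not the original numbers
print(as.numeric(f))                 # 1 2 1 3
# Convert through the labels to recover the original values
print(as.numeric(as.character(f)))   # 10 20 10 30
print(as.numeric(levels(f))[f])      # 10 20 10 30, the idiom recommended in ?factor
```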
How to Create Factor in R
To create a factor in R, use the factor() function. It takes a vector as input and encodes it as a factor. If the argument ordered is TRUE, the factor levels are considered to be ordered. For compatibility with S, there is also a function, ordered(), that creates ordered factors directly.
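As a rough illustration of the ordered argument, the sketch below builds the same ordered factor twice, once with factor(..., ordered = TRUE) and once with ordered(). The size labels are made up for the example. With an ordered factor, comparisons such as < and functions such as max() respect the level order.

```r
sizes <- c("small", "large", "medium", "small")
# ordered = TRUE makes the order of the levels meaningful
f1 <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
# ordered() is shorthand for factor(..., ordered = TRUE)
f2 <- ordered(sizes, levels = c("small", "medium", "large"))
print(f1)                # Levels: small < medium < large
print(identical(f1, f2)) # TRUE
print(f1 < "large")      # TRUE FALSE TRUE TRUE
print(max(f1))           # large
```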
Syntax
fct = factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)
Parameters
The factor() function takes the following parameters.
x
A vector of data, usually with a small number of distinct values, to be converted to a factor.
levels
It is an optional vector of the unique values that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x.
labels
It is either an optional character vector of labels for the levels or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.
exclude
A vector of values to be excluded when establishing the set of levels. It may be a factor with the same level set as x, or a character vector.
ordered
It is a logical flag to decide if the levels should be regarded as ordered (in the given order).
nmax
It is an upper bound on the number of levels.
Return Value
It returns the factor.
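Here is a minimal sketch of the levels, labels, and exclude parameters in action. The input values ("lo", "hi", "skip", and so on) are hypothetical and chosen only to make each effect visible.

```r
# levels: fixes the set and order of allowed values;
# "n/a" is not a listed level, so that element becomes NA
f <- factor(c("lo", "hi", "mid", "lo", "n/a"), levels = c("lo", "mid", "hi"))
print(f)

# labels: renames the levels; duplicated labels merge several values into one level
g <- factor(c(1, 2, 3, 1), levels = 1:3, labels = c("low", "high", "high"))
print(g)   # low high high low, with levels low and high

# exclude: drops a value from the level set (matching elements become NA)
h <- factor(c("a", "b", "skip", "a"), exclude = "skip")
print(levels(h))   # "a" "b"
```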
Example
Let’s define a vector and then use the factor() function to create a factor from the vector.
# Pro.R
rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(typeof(rv))
print("After converting to factor")
rf <- factor(rv)
print(rf)
print(typeof(rf))
Output
Rscript Pro.R
[1] 11 19 21 46 18 21
[1] "double"
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] "integer"
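The factor prints the same values as the vector, followed by its levels (the sorted unique values). typeof() now reports "integer" because the values are stored as integer codes.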
How to check factor in R
To check whether an object is a factor, use the is.factor() function.
rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(is.factor(rv))
print("After converting to factor")
rf <- factor(rv)
print(rf)
print(is.factor(rf))
When we execute the above code, it produces the following result.
[1] 11 19 21 46 18 21
[1] FALSE
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] TRUE
is.factor() returns FALSE for the numeric vector and TRUE for the factor, which confirms that the conversion worked.
is.factor(), is.ordered(), as.factor(), and as.ordered() are the membership and coercion functions for the factor and ordered classes.
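A quick sketch of these four functions side by side, using an arbitrary character vector: an ordered factor passes both is.factor() and is.ordered(), while a plain factor passes only is.factor().

```r
v <- c("b", "a", "c", "a")
f <- as.factor(v)    # coerce to an unordered factor
o <- as.ordered(v)   # coerce to an ordered factor (levels sorted: a < b < c)
print(is.factor(f))  # TRUE
print(is.ordered(f)) # FALSE
print(is.factor(o))  # TRUE: an ordered factor is still a factor
print(is.ordered(o)) # TRUE
print(class(o))      # "ordered" "factor"
```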
How to access elements of a factor
To access elements of a factor, use indexing. Like vectors, factors are indexed starting from 1.
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[3])
Output
Rscript Pro.R
[1] 21
Levels: 11 18 19 21 46
You can access multiple elements by passing a vector as the index.
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(2, 3)])
Output
[1] 19 21
Levels: 11 18 19 21 46
Pass a negative index to select every element except the one at that position.
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[-4])
Output
[1] 11 19 21 18 21
Levels: 11 18 19 21 46
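Notice that 46 is still listed among the levels even though no remaining element uses it: subsetting keeps unused levels by default. A short sketch of two standard ways to drop them, droplevels() and the drop argument of factor indexing:

```r
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
sub <- rf[-4]
print(levels(sub))                  # "11" "18" "19" "21" "46" (46 kept though unused)
print(levels(droplevels(sub)))      # "11" "18" "19" "21"
print(levels(rf[-4, drop = TRUE]))  # same result as droplevels()
```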
You can also pass the logical vector as an index.
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)])
Output
[1] 19 21 46 18
Levels: 11 18 19 21 46
Elements whose index is TRUE are included in the output; elements whose index is FALSE are left out.
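In practice, the logical index is usually built from a comparison rather than typed by hand. A minimal sketch: comparisons with == and %in% work on the factor's labels, so the values are written as strings here.

```r
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
# == compares against the labels, producing a logical index
idx <- rf == "21"
print(idx)                        # FALSE FALSE TRUE FALSE FALSE TRUE
print(rf[!idx])                   # every element that is not 21
print(rf[rf %in% c("11", "46")])  # only the elements 11 and 46
```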
How to modify a factor in R
To modify an element of a factor, assign to it by index with the <- operator. The new value must be one of the factor's existing levels.
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 11
print(rf)
Output
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 11 18
Levels: 11 18 19 21 46
The 4th element changed from 46 to 11, which is allowed because 11 is already a level. Note that 46 stays in the level set even though no element uses it anymore. Assigning a value that is not a level does not work:
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 29
print(rf)
Output
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = 29) :
invalid factor level, NA generated
[1] 11 19 21 <NA> 18
Levels: 11 18 19 21 46
R issues a warning, "invalid factor level, NA generated", and sets the 4th element to NA. This is a warning, not an error, so the code keeps running with a missing value.
To assign a new value, first add it to the levels.
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
levels(rf) <- c(levels(rf), "29")
rf[4] <- 29
print(rf)
Output
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 29 18
Levels: 11 18 19 21 46 29
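Appending with c(levels(rf), "29") puts the new level at the end, so the levels are no longer in numeric order. If the order matters (for example in plots or tables), one option, sketched below, is to rebuild the factor with a sorted level set before assigning.

```r
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
# Add 29 to the existing levels and sort them numerically
new_levels <- sort(c(as.numeric(levels(rf)), 29))
# Re-create the factor with the extended level set; element values are unchanged
rf <- factor(rf, levels = new_levels)
rf[4] <- 29
print(rf)   # Levels: 11 18 19 21 29 46
```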
Generating Factor Levels in R
To generate factor levels, use the gl() function. In its simplest form, gl() takes two integers: the number of levels and the number of times each level is repeated.
Syntax
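gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)
Parameters
The following is the description of the parameters used:
- n is an integer giving the number of levels.
- k is an integer giving the number of replications of each level.
- length is an integer giving the length of the result (default n*k).
- labels is an optional vector of labels for the resulting factor levels.
- ordered is a logical flag indicating whether the result should be ordered.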
Example
Let’s generate factor levels.
vf <- gl(3, 3, labels = c("Sydney", "Perth", "Melbourne"))
print(vf)
Output
[1] Sydney    Sydney    Sydney    Perth     Perth     Perth     Melbourne
[8] Melbourne Melbourne
Levels: Sydney Perth Melbourne
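The length argument is what makes gl() handy for laying out experimental designs. In this sketch (the labels "control" and "treatment" are only illustrative), k = 1 with a longer length cycles through the levels, while a larger k repeats each level in a block.

```r
# k = 1 with length = 6 alternates between the two levels
alt <- gl(2, 1, length = 6, labels = c("control", "treatment"))
print(alt)   # control treatment control treatment control treatment
# k = 2 repeats each level twice before moving to the next one
blk <- gl(2, 2, labels = c("control", "treatment"))
print(blk)   # control control treatment treatment
```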
Changing the Order of Levels
The order of the levels in a factor can be changed by applying the factor function again with the new order of the levels.
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
new_rf <- factor(rf, levels = c(46, 21, 19, 18, 11))
print(new_rf)
Output
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 46 18
Levels: 46 21 19 18 11
The output shows that the data values are unchanged; only the order of the levels is different.
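Two related tools are worth knowing, sketched below. relevel() moves a single level to the front, which sets the reference level used by modeling functions such as lm(). By contrast, assigning directly to levels() does not reorder anything: it renames the integer codes, which silently changes the data.

```r
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
# relevel() moves one level to the front and keeps the data intact
print(levels(relevel(rf, ref = "21")))  # "21" "11" "18" "19" "46"
# Reordering with factor() keeps each element's value
ok <- factor(rf, levels = rev(levels(rf)))
print(ok)                               # 11 19 21 46 18
# Assigning to levels() RENAMES the codes instead, which changes the data
bad <- rf
levels(bad) <- rev(levels(bad))
print(bad)                              # 46 19 18 11 21
```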
Factors in Data Frame
Before R 4.0.0, data.frame() converted text columns to factors by default. Since R 4.0.0, the stringsAsFactors argument defaults to FALSE, so text columns stay character vectors, as the output below shows.
# Create the vectors for data frame.
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")
# Create the data frame.
input_data <- data.frame(name, age, gender)
print(input_data)
# Print the gender column to check its type.
print(input_data$gender)
Output
name age gender
1 Krunal 27 male
2 Ankit 25 male
3 Niva 23 female
4 Mansi 19 female
[1] "male" "male" "female" "female"
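To get a factor column in current R, you can either pass stringsAsFactors = TRUE to data.frame() or convert a single column with factor(). A minimal sketch reusing the same vectors:

```r
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")
# stringsAsFactors = TRUE converts every character column to a factor
df1 <- data.frame(name, age, gender, stringsAsFactors = TRUE)
print(levels(df1$gender))     # "female" "male"
# Alternatively, convert only the column that is categorical
df2 <- data.frame(name, age, gender)
df2$gender <- factor(df2$gender)
print(is.factor(df2$gender))  # TRUE
print(table(df2$gender))      # counts per level: female 2, male 2
```

Converting only the categorical column is usually the safer choice, because free-text columns such as name are rarely meant to be factors.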
That is it for the R Factor example.