The as.factor() function in R converts a vector object into a factor. Factors store unique values as levels and are essential for statistical modeling, data visualization, and ensuring the proper handling of categorical data in analyses.
The figure above illustrates the conversion from a vector to a factor.
The main difference between factor() and as.factor() is that as.factor() does not let you specify the level order or whether the factor is ordered, whereas factor() does, through its levels and ordered arguments.
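To make the difference concrete, here is a minimal sketch. The `sizes` vector and the variable names are made up for illustration. It compares the levels that as.factor() derives on its own with the levels passed explicitly to factor().

```r
# Hypothetical example data: clothing sizes with a natural order
sizes <- c("medium", "small", "large", "small")

# as.factor() derives the levels itself (alphabetical for character input)
f_default <- as.factor(sizes)
levels(f_default)      # "large" "medium" "small"
is.ordered(f_default)  # FALSE

# factor() lets the caller fix the level order and mark the factor as ordered
f_ordered <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
levels(f_ordered)      # "small" "medium" "large"
is.ordered(f_ordered)  # TRUE

# Ordered factors support comparisons: "medium" ranks above "small"
f_ordered[1] > f_ordered[2]  # TRUE
```

If the order of the categories matters for a model or a plot, factor() with explicit levels is usually the safer choice.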
If your input vector is very large, the conversion can take noticeable time, because the function has to find and sort the unique values to build the levels.
Categorical data is data that is divided into distinct groups and categories. For example, colors (red, green, blue), brands (Nike, Puma, Asics), or types of cuisine (Italian, Chinese, Indian) are categories.
As you can see in the above figure, we defined two types of data:
as.factor(input)
Name | Description |
input (required) | It is a vector object. |
It returns a factor object.
Let’s define a vector of floating-point values and convert it into a factor. The levels are sorted by numeric value, so 2.2 comes before 11.
mixed_vec <- c(1.1, 11, 2.2, 19)
as.factor(mixed_vec)
# Output:
# [1] 1.1 11 2.2 19
# Levels: 1.1 2.2 11 19
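As a short sketch of what the resulting factor holds, the snippet below inspects the levels and the internal integer codes of the same vector. The variable names `f`, `codes`, and `recovered` are introduced here only for illustration.

```r
mixed_vec <- c(1.1, 11, 2.2, 19)
f <- as.factor(mixed_vec)

# Levels are stored as character labels, sorted by numeric value
levels(f)                 # "1.1" "2.2" "11" "19"

# as.integer() returns the level codes, not the original numbers
codes <- as.integer(f)
codes                     # 1 3 2 4

# To get the original numbers back, go through the character labels
recovered <- as.numeric(as.character(f))
recovered                 # 1.1 11.0 2.2 19.0
```

The detour through as.character() matters because calling as.numeric() directly on a factor returns the codes rather than the values.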
Let’s define a vector of characters and ensure that the character vector is treated as a categorical variable in analyses.
char_vec <- c("zack", "john", "jian")
as.factor(char_vec)
# Output:
# [1] zack john jian
# Levels: jian john zack
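Once a character vector is a factor, each unique value appears only once among the levels, no matter how often it occurs in the data. The sketch below assumes a hypothetical vector with one repeated name to show this.

```r
# Hypothetical vector: "john" appears twice
names_vec <- c("zack", "john", "jian", "john")
name_factor <- as.factor(names_vec)

levels(name_factor)    # "jian" "john" "zack"
nlevels(name_factor)   # 3 unique categories

# table() counts the observations per level
name_counts <- table(name_factor)
name_counts[["john"]]  # 2
```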
For a data frame, apply as.factor() to individual columns rather than to the data frame as a whole. Say we have a data frame with a ‘grade’ column; we apply the function to that column, which is essentially a vector.
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th")
)
df$grade <- as.factor(df$grade)
print(df$grade)
# Output:
# [1] 10th 11th 11th
# Levels: 10th 11th
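When several columns are categorical, one possible approach is to convert them together with lapply(). This is a sketch, not the only way to do it; the `cat_cols` vector is a name chosen for this example, and `stringsAsFactors = FALSE` is set explicitly so the starting point is the same on older R versions.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th"),
  stringsAsFactors = FALSE
)

# Columns to treat as categorical (chosen for this example)
cat_cols <- c("subject", "grade")

# Apply as.factor() to each selected column and write the results back
df[cat_cols] <- lapply(df[cat_cols], as.factor)

# Check which columns are now factors
sapply(df, is.factor)
levels(df$subject)  # "History" "Math"
```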
A logical vector has only two possible non-missing values: TRUE and FALSE. Therefore, the resulting factor has at most two levels, FALSE and TRUE.
flags <- c(TRUE, FALSE, TRUE)
factor_flags <- as.factor(flags)
print(factor_flags)
# Output:
# [1] TRUE FALSE TRUE
# Levels: FALSE TRUE
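Note that the levels come from the values actually present in the input. The sketch below uses a hypothetical all-TRUE vector to show that only one level is created in that case, and that factor() with explicit levels can keep both.

```r
# Hypothetical input that contains only TRUE
all_true <- as.factor(c(TRUE, TRUE))
levels(all_true)    # "TRUE"

# factor() with explicit levels keeps both categories available
both_levels <- factor(c(TRUE, TRUE), levels = c("FALSE", "TRUE"))
levels(both_levels) # "FALSE" "TRUE"

# The unused level shows up with a count of zero
flag_counts <- table(both_levels)
flag_counts[["FALSE"]]  # 0
```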
What if the input vector contains a missing value, such as NA? In that case, NA is preserved as an element of the factor but is not included among the levels.
survey <- c("Yes", "No", NA, "Yes")
factor_survey <- as.factor(survey)
print(factor_survey)
# Output:
# [1] Yes No <NA> Yes
# Levels: No Yes
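If the missing responses should count as a category of their own, base R's addNA() can add NA as an extra level. The following is a minimal sketch; `survey_with_na` is a name introduced for this example.

```r
survey <- c("Yes", "No", NA, "Yes")
factor_survey <- as.factor(survey)

# NA is kept as an element but is not a level
levels(factor_survey)      # "No" "Yes"
is.na(factor_survey)       # FALSE FALSE TRUE FALSE

# addNA() turns NA into an explicit level
survey_with_na <- addNA(factor_survey)
levels(survey_with_na)     # "No" "Yes" NA
nlevels(survey_with_na)    # 3

# With NA as a level, table() counts the missing response as well
table(survey_with_na)
```

Whether NA should be a level depends on the analysis; by default it is treated as missing data and left out of the counts.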
That’s all!
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.