as.factor() in R: Converting a Vector to Categorical Data

The as.factor() function in R converts a vector object into a factor. Factors store unique values as levels and are essential for statistical modeling, data visualization, and ensuring the proper handling of categorical data in analyses.

The figure above illustrates the conversion from a vector to a factor.

The main difference between factor() and as.factor() is that as.factor() takes only the vector itself and does not let us specify the level order or whether the factor is ordered, whereas factor() allows both through its levels and ordered arguments.
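A minimal sketch of that difference, using a made-up vector of T-shirt sizes (the `sizes` data and variable names are illustrative and not part of the original examples):

```r
# Hypothetical example data: sizes in no particular order
sizes <- c("medium", "small", "large", "small")

# as.factor() derives the levels itself, sorted alphabetically
f_default <- as.factor(sizes)
levels(f_default)    # "large" "medium" "small"

# factor() lets us set the level order and mark the factor as ordered
f_ordered <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
levels(f_ordered)    # "small" "medium" "large"

# Comparisons such as < are only meaningful for the ordered factor
f_ordered < "large"  # TRUE TRUE FALSE TRUE
```

When the order of the categories matters, as with sizes or ratings, factor() is usually the better starting point.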

If your input is very large, the conversion can take noticeable time, because R has to find and sort the unique values to determine the levels.

Categorical data is data whose values fall into a limited set of distinct groups or categories. For example, colors (red, green, blue), brands (Nike, Puma, Asics), and types of cuisine (Italian, Chinese, Indian) are categories.

The figure above distinguishes two types of data:

  1. Numerical data: numbers or numeric values.
  2. Categorical data: labels that fall into specific groups or categories, such as modes of transportation or colors.

Syntax

as.factor(input)

Parameters

Name               Description
input (required)   The vector to convert. R's own signature names this argument x.

Return value

It returns a factor object.

Converting a numeric vector to a factor

Let’s define a numeric vector and convert it into a factor. The levels are the unique values, sorted in increasing numeric order and stored as character labels.

mixed_vec <- c(1.1, 11, 2.2, 19)

as.factor(mixed_vec)

# Output:
# [1] 1.1 11 2.2 19
# Levels: 1.1 2.2 11 19
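A related point worth sketching: once numbers are stored as a factor, as.numeric() returns the internal level codes rather than the original values. The example below reuses mixed_vec from above; the variable f is introduced only for illustration.

```r
mixed_vec <- c(1.1, 11, 2.2, 19)
f <- as.factor(mixed_vec)

# as.numeric() on a factor returns the level codes, not the values
as.numeric(f)                 # 1 3 2 4

# Going through the level labels recovers the original numbers
as.numeric(as.character(f))   # 1.1 11.0 2.2 19.0

# Equivalent, but converts each level only once
as.numeric(levels(f))[f]
```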

Converting a character vector to a factor

Let’s define a character vector and convert it so that analyses treat it as a categorical variable. The levels are the unique strings in sorted order.

char_vec <- c("zack", "john", "jian")

as.factor(char_vec)

# Output:
# [1] zack john jian
# Levels: jian john zack
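To see how the factor stores its data, the following sketch inspects the result with a few base R helpers. It reuses char_vec from above; f is an illustrative variable name.

```r
char_vec <- c("zack", "john", "jian")
f <- as.factor(char_vec)

levels(f)      # the distinct labels: "jian" "john" "zack"
nlevels(f)     # 3
as.integer(f)  # position of each element's level: 3 2 1
is.factor(f)   # TRUE
```

Each element is stored as an integer code that points into the levels, which is why the printed values and the Levels line can appear in different orders.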

Converting a data frame column to a factor

For a data frame, we apply as.factor() to individual columns, since each column is itself a vector. Suppose we have a data frame with a ‘grade’ column; we convert that column and assign the result back to it.

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th")
)

df$grade <- as.factor(df$grade)

print(df$grade)

# Output:
# [1] 10th 11th 11th
# Levels: 10th 11th
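When several columns are categorical, one possible approach is to convert them together with lapply(). This is a sketch rather than the only way to do it; the cat_cols vector is a hypothetical choice of columns for this example.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th")
)

# Hypothetical choice of columns to treat as categorical
cat_cols <- c("subject", "grade")

# lapply() applies as.factor() to each selected column in turn
df[cat_cols] <- lapply(df[cat_cols], as.factor)

# subject and grade now report "factor"
sapply(df, class)
```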

Converting a logical vector to a factor

A logical vector can hold only two non-missing values: TRUE and FALSE. Therefore, the resulting factor has at most two levels, FALSE and TRUE.

flags <- c(TRUE, FALSE, TRUE)

factor_flags <- as.factor(flags)

print(factor_flags)

# Output:
# [1] TRUE FALSE TRUE
# Levels: FALSE TRUE
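A short sketch of two details that follow from this: the levels are character labels, and only values that actually occur become levels. It reuses flags and factor_flags from above.

```r
flags <- c(TRUE, FALSE, TRUE)
factor_flags <- as.factor(flags)

# The levels are the character labels "FALSE" and "TRUE"
levels(factor_flags)                     # "FALSE" "TRUE"

# Converting the labels back restores the logical vector
as.logical(as.character(factor_flags))   # TRUE FALSE TRUE

# Only values that actually occur become levels
levels(as.factor(c(TRUE, TRUE)))         # "TRUE"
```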

Handling missing values

What if the input vector contains a missing value, such as NA? In that case, NA is preserved as a value in the factor, but it is not added as a level.

survey <- c("Yes", "No", NA, "Yes")

factor_survey <- as.factor(survey)

print(factor_survey)

# Output:
# [1] Yes No <NA> Yes
# Levels: No Yes
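If the missing responses should count as their own category, base R offers addNA() and the exclude argument of factor(). The sketch below reuses survey from above; the names with_na and with_na_2 are illustrative.

```r
survey <- c("Yes", "No", NA, "Yes")

# Default: NA is kept as a value but is not a level
levels(as.factor(survey))       # "No" "Yes"

# addNA() appends NA to the levels of an existing factor
with_na <- addNA(as.factor(survey))
levels(with_na)                 # "No" "Yes" NA

# factor() with exclude = NULL keeps NA as a level at creation time
with_na_2 <- factor(survey, exclude = NULL)
levels(with_na_2)               # "No" "Yes" NA

# table() then counts the missing responses as well
table(with_na)
```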

That’s all!
