What is the as.factor() Function in R

Introduction to Factors in R

Definition of factors in R

A factor in R is a data type for storing categorical variables.

Internally, a factor is stored as a vector of integer codes together with a set of character labels called the “levels”. Each unique integer code maps to one level.

Use the factor() function to create a factor in R.
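To make the integer-plus-levels storage concrete, here is a minimal sketch. The vector `sizes` is made-up sample data, not part of the original article.

```r
# Hypothetical sample data for illustration
sizes <- c("small", "large", "medium", "small")

f <- factor(sizes)

levels(f)      # the character labels, sorted alphabetically by default
as.integer(f)  # the integer codes; each code is a position in levels(f)
```

Here levels(f) is "large", "medium", "small", so the value "small" is stored as code 3.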

Why factors are essential in data analysis

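Factors are helpful in data analysis because they store each category once as a level and represent the observations as compact integer codes, which makes categorical data straightforward to group, count, and model.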

Categorical data represents characteristics or attributes of observations, such as the type of a product, the ethnicity of a person, or the color of clothes.

Factors are also essential because many statistical and machine learning models work on numeric input, and R's modeling functions such as lm() and glm() use factors to encode categorical data as numeric indicator variables automatically.
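As an illustration of how a factor ends up as numbers inside a model, the sketch below uses base R's model.matrix() on a made-up `color` vector. It assumes R's default treatment contrasts, under which the first level becomes the baseline.

```r
# Hypothetical data for illustration
color <- factor(c("red", "blue", "green", "blue"))

# model.matrix() expands the factor into an intercept plus 0/1 indicator columns
mm <- model.matrix(~ color)

colnames(mm)
mm
```

The first level, "blue", has no column of its own because it serves as the baseline against which the other levels are compared.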

What is as.factor in R

The as.factor() function is a built-in R function that converts an R object, such as a character vector, a numeric vector, or a data frame column, to a factor.

It takes a single vector (for example, one column of a data frame) as its argument and returns that data as a factor. If the input is already a factor, it is returned unchanged.

Syntax

as.factor(x)

Parameters

x: A vector of data to convert, such as a character vector, a numeric vector, or a single column of a data frame.

Return value

It returns a factor containing the values of x, with the sorted unique values as its levels.

Usage

  1. The as.factor() function can convert a character vector to a factor.
  2. The as.factor() function can convert a numeric vector to a factor.
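  3. It sets the levels automatically from the sorted unique values; to specify the levels yourself, use factor() with its levels argument.
  4. It does not reorder levels; to reorder them, use factor() or relevel().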
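When a specific level order is needed, one option is to call factor() with its levels argument. A minimal sketch with made-up data (the `sizes` vector is an assumption for illustration):

```r
# Hypothetical data for illustration
sizes <- c("medium", "small", "large", "small")

default_order <- as.factor(sizes)  # levels in alphabetical order
custom_order  <- factor(sizes, levels = c("small", "medium", "large"))  # explicit order

levels(default_order)  # "large"  "medium" "small"
levels(custom_order)   # "small"  "medium" "large"
```

Both factors hold the same values; only the order of the levels differs.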

Coding implementation

Let’s define a character vector using the c() function.

data <- c("m", "l", "a")

as.factor(data)

Output

[1] m l a
Levels: a l m

How to use as.factor in R

Using the as.factor() function on a numeric vector

Apply the as.factor() function to a numeric vector to convert it to a factor and see the output.

data <- c(1.1, 11, 2.2, 19, 21)

as.factor(data)

Output

[1] 1.1 11 2.2 19 21
Levels: 1.1 2.2 11 19 21
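The level order depends on the type of the input: numeric input is sorted numerically, while the same values stored as text are sorted as text. A small sketch with made-up values:

```r
num_f <- as.factor(c(10, 9, 100))        # numeric input
chr_f <- as.factor(c("10", "9", "100"))  # the same values stored as text

levels(num_f)  # "9"   "10"  "100"
levels(chr_f)  # "10"  "100" "9"
```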

Example of converting a character vector to a factor

You can convert a character vector to a factor using the as.factor() function. 

The as.factor() function takes a vector of character values and returns a factor whose levels are the unique values in sorted order.

data <- c("zack", "synder", "cut")

as.factor(data)

Output

[1] zack synder cut
Levels: cut synder zack

Example of converting a factor to a character vector

Use the as.character() function to convert a factor to a character vector.

data <- c("zack", "synder", "cut")

factr <- as.factor(data)

chr <- as.character(factr)

chr

Output

[1] "zack" "synder" "cut"

Using the as.numeric() function, you can convert a factor to its underlying integer codes.

data <- c("zack", "synder", "cut")

factr <- as.factor(data)

intr <- as.numeric(factr)

intr

Output

[1] 3 2 1

We used the as.numeric() function to convert a factor to a numeric vector. The output shows the integer code of each value, which is the position of its level in the sorted levels: “zack” corresponds to 3, “synder” corresponds to 2, and “cut” corresponds to 1.
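The same behavior applies to factors whose labels look like numbers, where it is a common pitfall: as.numeric() returns the codes, not the labels. A minimal sketch that reuses the numeric vector from the earlier example and recovers the original numbers by going through as.character():

```r
f <- as.factor(c(1.1, 11, 2.2, 19, 21))

codes  <- as.numeric(f)                # integer codes: positions in levels(f)
values <- as.numeric(as.character(f))  # original numbers, recovered via the labels

codes   # 1 3 2 4 5
values  # 1.1 11.0  2.2 19.0 21.0
```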

Example of using the as.factor() function on a data frame column

You can use the as.factor() function to convert a specific data frame column to a factor.

df <- data.frame(Singer = c("MJ", "Justin", "Drake", "Selena", "Rema", "Ed"),
                    Age = c(64, 30, 40, 30, 25, 38))

df$Singer <- as.factor(df$Singer)

print(df$Singer)

Output

[1] MJ Justin Drake Selena Rema Ed
Levels: Drake Ed Justin MJ Rema Selena

In this example, we created a data frame using the data.frame() function which has two columns.

  1. Singer
  2. Age

We converted the “Singer” column to a factor using the as.factor() function and printed the resulting factor, which has six levels.
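To convert several columns at once, one possible approach is to select the character columns and apply as.factor() to each. This is a sketch rather than the only way to do it; the `Genre` column is a hypothetical addition for illustration.

```r
# Hypothetical data frame for illustration
df <- data.frame(
  Singer = c("MJ", "Justin", "Drake"),
  Genre  = c("Pop", "Pop", "Rap"),
  Age    = c(64, 30, 40),
  stringsAsFactors = FALSE
)

# Find the character columns and convert each one to a factor
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], as.factor)

str(df)
```

The numeric Age column is left untouched because it is not selected by is_chr.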

In-depth look at as.factor() function and its parameters

Changing the levels of a factor

The levels() function provides access to the levels attribute of a factor. Assign a new character vector to levels() to change the levels of a factor in R.

char_vec <- c("k", "b", "l", "c", "n", "d")
factr <- as.factor(char_vec)

levels(factr) <- c("k", "b", "l", "f", "d", "m", "n")
factr

Output

[1] f k d b m l
Levels: k b l f d m n

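In this example, assigning a new vector to levels() renames the existing levels by position rather than reordering them. The original levels were “b”, “c”, “d”, “k”, “l”, and “n”, so “b” becomes “k”, “c” becomes “b”, “d” becomes “l”, “k” becomes “f”, “l” becomes “d”, and “n” becomes “m”, which is why the printed values differ from the input vector.

The new vector must contain at least as many entries as there are existing levels. The extra seventh entry, “n”, is added as a level that no value uses.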
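Because assigning to levels() replaces labels by position, renaming a single level by name can be safer. A minimal sketch that changes only the level "c" to "f":

```r
f <- as.factor(c("k", "b", "l", "c", "n", "d"))

# Rename only the level "c"; every other level keeps its label
levels(f)[levels(f) == "c"] <- "f"

f
levels(f)  # "b" "f" "d" "k" "l" "n"
```

Only the value that was "c" changes; the renamed level stays in its original position.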

Reordering the levels of a factor

Use the relevel() function to reorder the levels of a factor in R by moving one chosen level to the first position.

char_vec <- c("k", "b", "l", "c", "n", "d")
factr <- as.factor(char_vec)

factr <- relevel(factr, ref = "b")
factr

Output

[1] k b l c n d
Levels: b c d k l n

In this code example, the relevel() function makes “b” the first (reference) level. Because “b” already sorts first alphabetically, the order of the levels is the same as before.

The resulting factor has levels “b”, “c”, “d”, “k”, “l”, and “n”.
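The effect is easier to see with a reference level that does not already come first. The sketch below also shows factor() with an explicit levels argument, which is one way to set a complete custom order; the reversed order is chosen purely for illustration.

```r
f <- as.factor(c("k", "b", "l", "c", "n", "d"))

k_first  <- relevel(f, ref = "k")               # move one level to the front
reversed <- factor(f, levels = rev(levels(f)))  # set a complete custom order

levels(k_first)   # "k" "b" "c" "d" "l" "n"
levels(reversed)  # "n" "l" "k" "d" "c" "b"
```

In both cases the values of the factor are unchanged; only the level order differs.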

Combining multiple factors into a single factor

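You can use the c() function to combine multiple factors into a single factor in R. This relies on c() returning a factor when given factors, which is the behavior in R 4.1.0 and later; in earlier versions, c() returned the underlying integer codes instead. The surrounding as.factor() call then leaves the already-combined factor unchanged.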

char_vec_one <- c("k", "b", "l")
char_vec_two <- c("c", "n", "d", "s")

factor_one <- as.factor(char_vec_one)
factor_two <- as.factor(char_vec_two)

combined_factor <- as.factor(c(factor_one, factor_two))
combined_factor

Output

[1] k b l c n d s
Levels: b k l c d n s

The combined factor contains the values of factor_one followed by those of factor_two, and its levels are the levels of both factors.

The resulting factor “combined_factor” has seven levels: “b”, “k”, “l”, “c”, “d”, “n” and “s”.
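For code that should not depend on how c() treats factors in a particular R version, one option is to combine the labels rather than the factors. A sketch of that approach, in which the levels come out in alphabetical order rather than in the order shown above:

```r
factor_one <- as.factor(c("k", "b", "l"))
factor_two <- as.factor(c("c", "n", "d", "s"))

# Convert both factors to character first so the labels, not the codes, are combined
combined <- factor(c(as.character(factor_one), as.character(factor_two)))

combined
levels(combined)  # "b" "c" "d" "k" "l" "n" "s"
```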

Performing advanced operations on factors

Splitting a factor into multiple factors

You can use the split() function to split a factor into multiple factors in R. The example below also wraps the factor in unlist(), which leaves a factor unchanged.

char_vec_one <- c("k", "b", "l", "s", "d", "n")

factor_one <- as.factor(char_vec_one)

# Store the result as parts so the variable does not mask the split() function
parts <- split(unlist(factor_one), rep(1:2, c(3, 3)))

parts

Output

$`1`
[1] k b l
Levels: b d k l n s

$`2`
[1] s d n
Levels: b d k l n s

In this code, we split a factor into two factors using the split() and unlist() functions.

Each element is a factor with three values: “k”, “b”, and “l” for the first element, and “s”, “d”, and “n” for the second. Both elements keep all six levels of the original factor, as the output shows.
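If each piece should carry only the levels it actually uses, one possible follow-up is to apply droplevels() to every element. A minimal sketch (the names `parts` and `parts_trimmed` are chosen for illustration):

```r
factor_one <- as.factor(c("k", "b", "l", "s", "d", "n"))

parts <- split(factor_one, rep(1:2, c(3, 3)))

# Drop the levels that do not occur within each piece
parts_trimmed <- lapply(parts, droplevels)

levels(parts_trimmed[["1"]])  # "b" "k" "l"
levels(parts_trimmed[["2"]])  # "d" "n" "s"
```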

Removing unused levels from a factor

You can use the droplevels() function to remove unused levels from a factor in R. It removes any levels that are declared but have no corresponding values in the factor.

char_vec_one <- c("k", "b", "l", "s", "d", "n")

factor_one <- factor(char_vec_one, levels = c("k", "b", "l", "s", "d", "n", "f", "m"))

drop <- droplevels(factor_one)

drop

Output

[1] k b l s d n
Levels: k b l s d n

In this example, the factor_one has levels “k”, “b”, “l”, “s”, “d”, “n”, “f” and “m”.

The levels “f” and “m” never occur among the values of the factor, so they are removed by the droplevels() function.

The resulting factor “drop” has only six levels: “k”, “b”, “l”, “s”, “d” and “n”.
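Unused levels also appear after subsetting, because a subset keeps the full level set of the original factor. A short sketch of that situation (the subset condition is made up for illustration):

```r
f <- as.factor(c("k", "b", "l", "s", "d", "n"))

subset_f <- f[f %in% c("k", "b")]  # two values, but still all six levels

nlevels(subset_f)              # 6
nlevels(droplevels(subset_f))  # 2
table(droplevels(subset_f))    # counts only for "b" and "k"
```

Without droplevels(), table(subset_f) would also report zero counts for the four levels that no longer occur.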

Conclusion

The as.factor() function is a lightweight wrapper around factor() that returns its input unchanged if it is already a factor.
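This difference is visible when the input factor has an unused level. In the sketch below, built on a made-up factor, as.factor() hands the factor back as it is, while factor() rebuilds it and drops the unused level:

```r
f <- factor(c("a", "b"), levels = c("a", "b", "c"))  # "c" is an unused level

same    <- as.factor(f)  # already a factor, returned unchanged
rebuilt <- factor(f)     # re-created, which drops the unused level "c"

levels(same)     # "a" "b" "c"
levels(rebuilt)  # "a" "b"
```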

The as.factor() function is helpful for converting a numeric or character vector to a factor.
