# Understanding the as.factor() Function in R: Converting to Categorical Data

Before diving into the as.factor() function, let’s understand categorical data and why it is helpful in statistical analysis.

Categorical data is data that falls into distinct groups or categories. For example, **colors (red, green, blue)**, **brands (Nike, Puma, Asics)**, and **types of cuisine (Italian, Chinese, Indian)** are all categories.

Data generally falls into two types:

**Numerical data:** It consists of numbers or numeric values.

**Categorical data:** It consists of labels you can divide into specific groups or categories, such as mode of transportation and colors.

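The main difference between numerical and categorical data is that numerical data measures quantities you can do arithmetic on, whereas categorical data represents characteristics or attributes and is descriptive in nature. Some categorical data has a natural order (for example, small, medium, large), but the gaps between categories are not numeric.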

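In R, categorical data is stored as a factor: a vector whose values come from a fixed set of categories called levels. Factors are helpful in statistical modeling because they tell R to treat a variable as a set of categories rather than as plain text or as a quantity.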

The as.factor() function converts a vector object to a factor in R.

For example, suppose you have a vector of t-shirt sizes and you want R to treat **“Small”, “Medium”,** and **“Large”** as distinct categories rather than just text.
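The t-shirt case can be sketched as follows. The `sizes` vector and its values are made up for illustration; only `as.factor()`, `levels()`, and `table()` are base R functions.

```r
# Hypothetical t-shirt sizes stored as plain text
sizes <- c("Small", "Large", "Medium", "Small", "Large")

# Convert the character vector to a factor
size_factor <- as.factor(sizes)

# Each distinct string is now a level (sorted alphabetically by default)
levels(size_factor)

# Count how many observations fall into each category
size_counts <- table(size_factor)
print(size_counts)
```

Note that the levels come out in alphabetical order, not in size order. If the order matters, `factor(sizes, levels = c("Small", "Medium", "Large"))` lets you set it explicitly.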

`as.factor(x)`

| Name | Description |
|---|---|
| x (required) | The vector object to convert to a factor. |

It returns a **factor object.**
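To get a feel for what the returned factor object holds, a short sketch like the one below can help. The `colors` vector is an invented example; the inspection functions are all part of base R.

```r
# Illustrative input; the values are made up for this sketch
colors <- c("red", "green", "blue", "red")
f <- as.factor(colors)

class(f)       # the class of the result
is.factor(f)   # TRUE when the object is a factor
levels(f)      # the distinct categories
nlevels(f)     # how many categories there are
as.integer(f)  # the integer codes stored behind the labels
```

A factor stores one integer code per element plus a lookup table of level labels, which is why `as.integer()` returns positions in `levels(f)` rather than the original strings.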

When you convert a numeric vector to a factor, each unique value of the vector becomes a level of the factor, and the levels are sorted in increasing numeric order.

```
# A numeric vector with four distinct values
num_vec <- c(1.1, 11, 2.2, 19)

# Each distinct number becomes a level of the factor
as.factor(num_vec)
```

**Output**

```
[1] 1.1 11 2.2 19
Levels: 1.1 2.2 11 19
```
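One thing to watch for with numeric factors is converting them back to numbers. The sketch below is an aside rather than part of the original walkthrough, and the variable names are chosen for illustration.

```r
num_vec <- c(1.1, 11, 2.2, 19)
f <- as.factor(num_vec)

# as.numeric() on a factor returns the level codes, not the original values
codes <- as.numeric(f)

# Going through character first recovers the original numbers
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Here `codes` holds the position of each element in the sorted levels, while `values` holds the numbers you started with.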

For a character vector, the as.factor() function turns each unique string into a level of the factor, and the levels are sorted alphabetically.

```
char_vec <- c("zack", "john", "jian")
as.factor(char_vec)
```

**Output**

```
[1] zack john jian
Levels: jian john zack
```
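Because as.factor() sorts the levels alphabetically, the level order differs from the order in which the names appear. If the order of appearance matters, one option (not used elsewhere in this article) is the base R `factor()` function with an explicit `levels` argument, as in this sketch:

```r
char_vec <- c("zack", "john", "jian")

# Default behaviour: levels are sorted alphabetically
f_sorted <- as.factor(char_vec)
levels(f_sorted)

# Alternative: keep the levels in order of first appearance
f_ordered <- factor(char_vec, levels = unique(char_vec))
levels(f_ordered)
```

Both factors hold the same values; only the order of their levels differs, which affects things like the order of bars in a plot or rows in a summary table.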

You can also use the as.factor() function to convert a single column of a data frame to a factor.

```
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th")
)

# Convert only the grade column to a factor
df$grade <- as.factor(df$grade)
print(df$grade)
```

**Output**

```
[1] 10th 11th 11th
Levels: 10th 11th
```
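If more than one column should be categorical, the same idea extends to several columns at once. The following is a minimal sketch, assuming `subject` and `grade` are the columns you want as factors; the `cat_cols` name is hypothetical, and `stringsAsFactors = FALSE` is set explicitly so the result is the same on older R versions.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh"),
  score = c(85, 90, 78),
  subject = c("Math", "Math", "History"),
  grade = c("10th", "11th", "11th"),
  stringsAsFactors = FALSE
)

# Columns to treat as categorical (chosen for illustration)
cat_cols <- c("subject", "grade")

# Apply as.factor() to each selected column
df[cat_cols] <- lapply(df[cat_cols], as.factor)

# Check the resulting column types
str(df)
```

The `name` and `score` columns are left untouched, so only the columns listed in `cat_cols` change type.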

That’s all!

Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.
