A **factor** in **R** is a data structure that stores categorical data, such as gender, country, or marital status. A factor records the unique values of the data as **levels** and stores each element as a reference to one of those levels.

**How to Create Factor in R**

To **create** a **factor** in **R**, use the **factor()** function. It takes a vector as input, encodes it as a factor, and returns the result. If the argument ordered is **TRUE**, the factor levels are treated as ordered. For compatibility with **S**, there is also a function named **ordered()**.

**Syntax**

```
fct = factor(x = character(), levels, labels = levels,
exclude = NA, ordered = is.ordered(x), nmax = NA)
```

**Parameters**

The factor() function takes the following parameters.

- **x:** A vector of data, usually with a small number of distinct values.
- **levels:** An optional vector of the unique values that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x.
- **labels:** Either an optional character vector of labels for the levels or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.
- **exclude:** A vector of values to be excluded when establishing the set of levels. It may be a factor with the same level set as x, or it should be a character vector.
- **ordered:** A logical flag that decides whether the levels should be regarded as ordered (in the given order).
- **nmax:** An upper bound on the number of levels.
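The parameters above can be sketched with a small example. The `sizes` vector and its values are made up for illustration; only **factor()** and its documented arguments are used.

```r
# Hypothetical data, used only to illustrate the arguments.
sizes <- c("small", "large", "medium", "small", "large")

# levels fixes the level order; ordered = TRUE makes comparisons meaningful.
size_f <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(size_f)
print(size_f < "large")

# labels renames the levels, position by position.
size_lab <- factor(sizes,
                   levels = c("small", "medium", "large"),
                   labels = c("S", "M", "L"))
print(levels(size_lab))

# exclude removes a value from the level set; matching elements become NA.
size_ex <- factor(sizes, exclude = "medium")
print(size_ex)
```

Setting levels explicitly is usually the safer choice when the natural order is not alphabetical, because the default sorts the levels.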

**Return Value**

It returns the factor.

**Example**

Factors are saved as integers and have labels associated with these unique integers. While factors look like character vectors, they are integers under the hood, and you need to be careful while treating them like strings.

Let’s define a vector and then use the **factor()** function to create a factor from the vector.

```
rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(typeof(rv))
print("After converting to factor")
rf <- factor(rv)
print(rf)
print(typeof(rf))
```

**Output**

```
[1] 11 19 21 46 18 21
[1] "double"
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] "integer"
```

Printing the factor shows its values followed by its levels, and **typeof()** reports "integer" because a factor stores integer codes internally.
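Because factors are integers under the hood, converting one straight to a number returns the internal codes rather than the original values. A minimal sketch of the pitfall and one common workaround, reusing the same vector:

```r
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)

# as.integer() exposes the internal codes, i.e. positions in levels(rf).
codes <- as.integer(rf)
print(codes)    # 1 3 4 5 2 4

# Convert to character first to recover the original numbers.
values <- as.numeric(as.character(rf))
print(values)   # 11 19 21 46 18 21
```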

**How to check factors in R**

To check whether a variable is a factor, use the **is.factor()** function.

```
rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(is.factor(rv))
print("After converting to factor")
rf <- factor(rv)
print(rf)
print(is.factor(rf))
```

When we execute the above code, it produces the following result.

```
[1] 11 19 21 46 18 21
[1] FALSE
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] TRUE
```

When we pass a plain vector to **is.factor()**, it returns **FALSE**, but when we pass the factor, it returns **TRUE**. That means we have successfully created a factor.

The **is.factor()**, **is.ordered()**, **as.factor()**, and **as.ordered()** functions are the membership and coercion functions for the factor and ordered factor classes.
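A short sketch of these four functions on a made-up `grades` vector (the data is an assumption for illustration):

```r
# Hypothetical data.
grades <- c("low", "high", "medium", "low")

# as.factor() gives an unordered factor with alphabetically sorted levels.
gf <- as.factor(grades)
print(is.factor(gf))    # TRUE
print(is.ordered(gf))   # FALSE

# as.ordered() gives an ordered factor, still using the sorted levels.
go <- as.ordered(grades)
print(is.ordered(go))   # TRUE
print(levels(go))       # "high" "low" "medium"
```

Note that the ordering here is alphabetical (high < low < medium), which is probably not what is meant; passing levels to **factor()** with ordered = TRUE gives control over the order.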

**How to access elements of a factor in R**

To access elements of a factor, use indexing. Factor indexing starts from 1.

```
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[3])
```

**Output**

```
[1] 21
Levels: 11 18 19 21 46
```

You can access multiple components by passing a vector of indices.

```
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(2, 3)])
```

**Output**

```
[1] 19 21
Levels: 11 18 19 21 46
```

Pass a negative index to select every component except the one at that position.

```
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[-4])
```

**Output**

```
[1] 11 19 21 18 21
Levels: 11 18 19 21 46
```

You can also pass the logical vector as an index.

```
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)])
```

**Output**

```
[1] 19 21 46 18
Levels: 11 18 19 21 46
```

If the value at a position is **TRUE**, the component at that position is included in the output; otherwise, it is left out.
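In practice the logical vector usually comes from a comparison rather than being typed by hand. A minimal sketch, comparing against the level label as a string:

```r
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)

# Comparing a factor with a label yields one logical value per element.
is_21 <- rf == "21"
print(is_21)          # FALSE FALSE TRUE FALSE FALSE TRUE

# Use the logical vector as an index, or ask for the matching positions.
print(rf[is_21])
print(which(is_21))   # 3 6
```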

**How to modify a factor in R**

To modify a factor in R, use the **“assignment (<-)”** operator. However, we cannot choose values outside of its predefined levels.

```
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 11
print(rf)
```

**Output**

```
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 11 18
Levels: 11 18 19 21 46
```

You can see that I have modified the 4th component from 46 to 11, but it is modified within the level values. We cannot assign values outside levels.

```
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 29
print(rf)
```

**Output**

```
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = 29) :
invalid factor level, NA generated
[1] 11 19 21 <NA> 18
Levels: 11 18 19 21 46
```

R does not stop with an error here. It issues the warning `invalid factor level, NA generated` and stores **NA** in the 4th position.

To avoid the **invalid factor level warning**, add the new value to the levels first.

```
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
levels(rf) <- c(levels(rf), "29")
rf[4] <- 29
print(rf)
```

**Output**

```
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 29 18
Levels: 11 18 19 21 46 29
```
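The two steps above can be wrapped in a small helper. The function name `set_factor_value` is hypothetical and not part of base R; it is only a sketch of the pattern.

```r
# Hypothetical helper: add the level if it is missing, then assign the value.
set_factor_value <- function(f, index, value) {
  value <- as.character(value)
  if (!(value %in% levels(f))) {
    levels(f) <- c(levels(f), value)
  }
  f[index] <- value
  f
}

rf <- factor(c(11, 19, 21, 46, 18))
rf <- set_factor_value(rf, 4, 29)
print(rf)
```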

**Generating Factor Levels in R**

To generate factor levels, use the **gl()** function. The **gl()** function takes two integers as input: the number of levels and the number of times each level is repeated.

**Syntax**

```
gl(n, k, labels)
```

**Parameters**

The following is the description of the parameters used:

- **n:** An integer giving the number of levels.
- **k:** An integer giving the number of replications.
- **labels:** A vector of labels for the resulting factor levels.

**Example**

Let’s generate factor levels.

```
vf <- gl(3, 3, labels = c("Sydney", "Perth", "Melbourne"))
print(vf)
```

**Output**

```
[1] Sydney    Sydney    Sydney    Perth     Perth     Perth     Melbourne
[8] Melbourne Melbourne
Levels: Sydney Perth Melbourne
```
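As a further sketch, **gl()** can be called without labels, in which case the levels are the integers 1 to n. It also accepts a length argument, which is not covered in the syntax above; with k = 1 it produces an alternating pattern.

```r
# Two levels, each repeated three times; levels default to "1" and "2".
g1 <- gl(2, 3)
print(g1)   # 1 1 1 2 2 2

# k = 1 with length = 6 cycles through the levels: 1 2 1 2 1 2.
g2 <- gl(2, 1, length = 6)
print(g2)
```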

**Changing the Order of Levels**

The order of the levels in a factor can be changed by applying the factor function again with the new order of the levels.

```
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
new_rf <- factor(rf, levels = c(46, 21, 19, 18, 11))
print(new_rf)
```

**Output**

```
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 46 18
Levels: 46 21 19 18 11
```

You can see from the output that the order of the levels has changed, while the values themselves are the same.
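Reordering the levels keeps the labels of each element but changes the integer codes stored underneath. A minimal sketch that compares the two factors:

```r
rf <- factor(c(11, 19, 21, 46, 18))
new_rf <- factor(rf, levels = c(46, 21, 19, 18, 11))

# The visible values are identical element by element.
print(as.character(rf) == as.character(new_rf))

# The internal codes differ because they index into different level orders.
print(as.integer(rf))       # 1 3 4 5 2
print(as.integer(new_rf))   # 5 3 2 1 4
```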

**Factors in Data Frame**

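Before R 4.0.0, **data.frame()** converted text columns to factors by default. Since R 4.0.0, the default is stringsAsFactors = FALSE, so a text column stays a character vector unless you request the conversion or call **factor()** on the column. The example below uses the default, so the gender column is printed as a character vector.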

```
# Create the vectors for data frame.
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")
# Create the data frame.
input_data <- data.frame(name, age, gender)
print(input_data)
# Print the gender column to see how it is stored.
print(input_data$gender)
```

**Output**

```
    name age gender
1 Krunal  27   male
2  Ankit  25   male
3   Niva  23 female
4  Mansi  19 female
[1] "male"   "male"   "female" "female"
```
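If the text column comes back as a character vector, as in the output above, there are two common ways to get a factor instead. The sketch below reuses the same vectors; `input_factors` and `input_data2` are names chosen here for illustration.

```r
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")

# Option 1: convert every text column while building the data frame.
# Note that this also turns the name column into a factor.
input_factors <- data.frame(name, age, gender, stringsAsFactors = TRUE)
print(is.factor(input_factors$gender))
print(levels(input_factors$gender))

# Option 2: convert only the column that is truly categorical.
input_data2 <- data.frame(name, age, gender)
input_data2$gender <- factor(input_data2$gender)
print(input_data2$gender)
```

Option 2 is often preferable when only some text columns are categorical, since identifiers such as names rarely benefit from being factors.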

That’s it.

Krunal Lathiya is a Software Engineer with over eight years of experience. He has developed a strong foundation in computer science principles and a passion for problem-solving. In addition, Krunal has excellent knowledge of Data Science and Machine Learning, and he is an expert in R Language.