A factor is R's data structure for categorical data: fields that can take only a predefined, finite set of values. Each value is stored as an integer code that points into a set of unique labels called levels. Factors are especially useful in data analysis and statistical modeling.

**R Factor**

Factors in R are data objects used to categorize data and store it as levels. You can create a factor from a character vector or from a numeric vector; either way, the levels themselves are stored as character strings. Factors can be ordered or unordered, and they are an important class for statistical analysis and plotting.

Factors are stored as integers, with a label (level) attached to each unique integer. Although factors print like character vectors, they are integers under the hood, so be careful when treating them as strings.
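The integer storage matters most when a factor was built from numbers. The sketch below (variable names are illustrative) shows that as.integer() returns the internal codes rather than the original values, and two common ways to get the numbers back.

```r
# A numeric vector converted to a factor; levels are 11 18 19 21 46
rf <- factor(c(11, 19, 21, 46, 18, 21))

# as.integer() returns the internal codes (positions in the levels),
# not the original numbers
codes <- as.integer(rf)
print(codes)    # 1 3 4 5 2 4

# To recover the original numbers, go through the level labels first
values <- as.numeric(as.character(rf))
print(values)   # 11 19 21 46 18 21

# Equivalent: convert the levels once, then index them by the codes
values2 <- as.numeric(levels(rf))[rf]
print(values2)  # 11 19 21 46 18 21
```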

**How to Create Factor in R**

To create a factor in R, use the factor() function. It takes a vector as input, encodes it as a factor, and returns the result. If the argument ordered is **TRUE**, the factor levels are treated as ordered. For compatibility with **S**, there is also an ordered() function.

**Syntax**

`fct = factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)`

**Parameters**

The factor() function takes the following parameters.

**x**

A vector of data, usually taking a small number of distinct values, to be converted into a factor.

**levels**

It is an optional vector of the unique values that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x.
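A short sketch of why passing levels explicitly can help; the sizes vector is a made-up example. Explicit levels fix the level order and keep categories that do not occur in the data, which then appear with a count of zero in table().

```r
sizes <- c("small", "large", "small", "small")

# Default levels: sorted unique values, so "medium" is unknown to the factor
f_default <- factor(sizes)
print(levels(f_default))   # "large" "small"

# Explicit levels: fixes the order and keeps "medium" even though it is unused
f_explicit <- factor(sizes, levels = c("small", "medium", "large"))
print(levels(f_explicit))  # "small" "medium" "large"
print(table(f_explicit))   # small = 3, medium = 0, large = 1
```

Any value of x that is not listed in levels becomes NA in the resulting factor.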

**labels**

It is either an optional character vector of labels for the levels or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.
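For example, with some hypothetical survey answers coded inconsistently as "y"/"Y" and "n"/"N", duplicated labels collapse them into two levels. Each entry in labels names the level at the same position in levels.

```r
# Hypothetical raw answers with inconsistent capitalization
answers <- c("y", "n", "Y", "y", "N")

# levels lists every raw value; labels gives each one a display name.
# Duplicated labels merge "y"/"Y" and "n"/"N" into single levels.
f <- factor(answers,
            levels = c("y", "Y", "n", "N"),
            labels = c("yes", "yes", "no", "no"))
print(f)          # yes no yes yes no
print(levels(f))  # "yes" "no"
```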

**exclude**

The exclude argument is a vector of values to be excluded when forming the set of levels. It may be a factor with the same level set as x; otherwise it should be a character vector.
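A minimal sketch with a made-up ratings vector: excluding "unknown" keeps it out of the level set, so the element that held it becomes NA.

```r
ratings <- c("low", "high", "unknown", "low")

# "unknown" is dropped from the levels, so that element becomes NA
f <- factor(ratings, exclude = "unknown")
print(f)          # low high <NA> low
print(levels(f))  # "high" "low"
```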

**ordered**

It is a logical flag to decide if the levels should be regarded as ordered (in the given order).
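A sketch of an ordered factor, using an illustrative sizes vector. With ordered = TRUE, comparison operators and functions such as max() follow the level order, and ordered() gives the same result as factor(..., ordered = TRUE).

```r
sizes <- c("small", "large", "medium", "small")

# ordered = TRUE makes the order given in levels meaningful
osz <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(osz)            # Levels: small < medium < large

# Comparisons follow the level order
print(osz > "small")  # FALSE TRUE TRUE FALSE
print(max(osz))       # large

# ordered() is a shorthand for factor(..., ordered = TRUE)
osz2 <- ordered(sizes, levels = c("small", "medium", "large"))
print(identical(osz, osz2))  # TRUE
```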

**nmax**

It is an upper bound on the number of levels.

**Return Value**

It returns an object of class "factor". When ordered = TRUE, the class is c("ordered", "factor").

**Example**

Let’s define a vector and then use the factor() function to create a factor from the vector.

```
# Pro.R
rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(typeof(rv))
print("After converting to factor")
rf <- factor(rv)
print(rf)
print(typeof(rf))
```

**Output**

```
Rscript Pro.R
[1] 11 19 21 46 18 21
[1] "double"
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] "integer"
```

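The printed factor shows the original values followed by its levels: the unique values, sorted in increasing order. typeof() now returns "integer" because the factor stores integer codes rather than the original doubles.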
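Besides printing, a few base functions inspect a factor's levels directly. This short sketch reuses the same numbers.

```r
rf <- factor(c(11, 19, 21, 46, 18, 21))

lv <- levels(rf)     # the labels, always a character vector
n <- nlevels(rf)     # how many distinct levels there are
counts <- table(rf)  # how often each level occurs

print(lv)      # "11" "18" "19" "21" "46"
print(n)       # 5
print(counts)  # 21 appears twice, every other level once
```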

**How to check factor in R**

To check whether an object is a factor, use the **is.factor()** function.

```
rv <- c(11, 19, 21, 46, 18, 21)
print(rv)
print(is.factor(rv))
print("After converting to factor")
rf <- factor(rv)
print(rf)
print(is.factor(rf))
```

When we execute the above code, it produces the following result.

```
[1] 11 19 21 46 18 21
[1] FALSE
[1] "After converting to factor"
[1] 11 19 21 46 18 21
Levels: 11 18 19 21 46
[1] TRUE
```

When we pass the plain vector to **is.factor()**, it returns **FALSE**, but when we pass the factor, it returns **TRUE**. That confirms the factor was created successfully.

The **is.factor()**, **is.ordered()**, **as.factor()**, and **as.ordered()** functions are the membership and coercion functions for these classes.
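A quick sketch of these four functions on an illustrative colors vector: as.factor() creates an unordered factor, and as.ordered() turns it into an ordered one while keeping the existing level order.

```r
colors <- c("red", "green", "red", "blue")

cf <- as.factor(colors)   # unordered factor, levels sorted
print(is.factor(cf))      # TRUE
print(is.ordered(cf))     # FALSE

co <- as.ordered(cf)      # same levels, now treated as ordered
print(is.ordered(co))     # TRUE
print(levels(co))         # "blue" "green" "red"
```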

**How to access elements of a factor**

To access elements of a factor, use indexing. As with other R vectors, factor indexing starts at 1.

```
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[3])
```

**Output**

```
Rscript Pro.R
[1] 21
Levels: 11 18 19 21 46
```

You can access multiple elements by passing a vector of indices.

```
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(2, 3)])
```

**Output**

```
[1] 19 21
Levels: 11 18 19 21 46
```

Pass a negative index to select every element except the one at that position.

```
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[-4])
```

**Output**

```
[1] 11 19 21 18 21
Levels: 11 18 19 21 46
```
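Notice that the level 46 is still listed even though no remaining element has that value; subsetting keeps the full level set. A minimal sketch using droplevels() to remove unused levels:

```r
rf <- factor(c(11, 19, 21, 46, 18, 21))

rf_sub <- rf[-4]           # removes the only 46
print(levels(rf_sub))      # "46" is still a level

rf_sub <- droplevels(rf_sub)  # drop levels no element uses
print(levels(rf_sub))      # "11" "18" "19" "21"
```

Alternatively, the factor subsetting method accepts drop = TRUE, as in rf[-4, drop = TRUE].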

You can also pass a logical vector as an index.

```
rv <- c(11, 19, 21, 46, 18, 21)
rf <- factor(rv)
print(rf[c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)])
```

**Output**

```
[1] 19 21 46 18
Levels: 11 18 19 21 46
```

Elements whose index is **TRUE** are included in the output; elements whose index is **FALSE** are not.

**How to modify a factor in R**

To modify a factor in R, use the assignment operator (<-). However, you can only assign values that are already among the factor's levels.

```
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 11
print(rf)
```

**Output**

```
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 11 18
Levels: 11 18 19 21 46
```

The 4th element changed from 46 to 11, which works because 11 is already one of the levels. Note that 46 stays in the level set even though no element uses it anymore. Values outside the levels cannot be assigned, as the next example shows.

```
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
rf[4] <- 29
print(rf)
```

**Output**

```
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = 29) :
invalid factor level, NA generated
[1] 11 19 21 <NA> 18
Levels: 11 18 19 21 46
```

R does not stop with an error here. Instead, it issues the warning `invalid factor level, NA generated` and stores NA at that position.

To avoid this **invalid factor level** warning, add the new value to the factor's levels first.

```
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
levels(rf) <- c(levels(rf), "29")
rf[4] <- 29
print(rf)
```

**Output**

```
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 29 18
Levels: 11 18 19 21 46 29
```
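Another approach, sketched below as an alternative idiom rather than the method shown above, is to convert the factor to character, edit the text, and rebuild the factor. Because factor() recomputes the levels, the unused 46 disappears and 29 is sorted into place.

```r
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)

# Work on the labels as plain text, then rebuild the factor
chr <- as.character(rf)
chr[4] <- "29"
rf2 <- factor(chr)
print(rf2)  # 11 19 21 29 18, Levels: 11 18 19 21 29
```

The trade-off is that any custom level order is lost; pass levels explicitly to factor() if the order matters.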

**Generating Factor Levels in R**

To generate a factor with a regular pattern of levels, use the **gl()** function. It takes two integers as input: the number of levels and the number of times each level is repeated. It also accepts optional length and ordered arguments.

**Syntax**

```
gl(n, k, labels)
```

**Parameters**

The following is the description of the parameters used:

**n** is an integer giving the number of levels.

**k** is an integer giving the number of replications.

**labels** is an optional vector of labels for the resulting factor levels.

**Example**

Let’s generate factor levels.

```
vf <- gl(3, 3, labels = c("Sydney", "Perth", "Melbourne"))
print(vf)
```

**Output**

```
[1] Sydney Sydney Sydney Perth Perth Perth Melbourne
[8] Melbourne Melbourne
Levels: Sydney Perth Melbourne
```
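A sketch of the optional length and ordered arguments, with hypothetical group and block labels. length recycles the pattern, which is handy for alternating designs.

```r
# Two groups alternating over six units:
# k = 1 repeats each level once, length = 6 recycles the pattern
alt <- gl(2, 1, length = 6, labels = c("control", "treatment"))
print(alt)     # control treatment control treatment control treatment

# Three blocks of two, as an ordered factor
blocks <- gl(3, 2, labels = c("low", "mid", "high"), ordered = TRUE)
print(blocks)  # low low mid mid high high, Levels: low < mid < high
```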

**Changing the Order of Levels**

To change the order of the levels in a factor, call factor() again on it and pass the new order in the levels argument.

```
rv <- c(11, 19, 21, 46, 18)
rf <- factor(rv)
print(rf)
new_rf <- factor(rf, levels = c(46, 21, 19, 18, 11))
print(new_rf)
```

**Output**

```
[1] 11 19 21 46 18
Levels: 11 18 19 21 46
[1] 11 19 21 46 18
Levels: 46 21 19 18 11
```

The output shows that the values are unchanged, but the levels now appear in the new order.
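Two other ways to reorder levels are sketched below: relevel() moves a single level to the front (it works only on unordered factors), and rev(levels()) reverses the current order. Avoid reordering by assigning to levels(rf) directly; that renames the levels by position and silently changes what each element means.

```r
rf <- factor(c(11, 19, 21, 46, 18))

# relevel() moves one level to the front and keeps the rest in order
r1 <- relevel(rf, ref = "21")
print(levels(r1))  # "21" "11" "18" "19" "46"

# Reverse the current order with factor(); the values stay the same
r2 <- factor(rf, levels = rev(levels(rf)))
print(levels(r2))  # "46" "21" "19" "18" "11"
```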

**Factors in Data Frame**

Before R 4.0.0, data.frame() converted text columns to factors by default. Since R 4.0.0, the stringsAsFactors argument defaults to FALSE, so text columns stay character vectors, as the output below shows.

```
# Create the vectors for data frame.
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")
# Create the data frame.
input_data <- data.frame(name, age, gender)
print(input_data)
# Print the gender column to check its type.
print(input_data$gender)
```

**Output**

```
    name age gender
1 Krunal  27   male
2  Ankit  25   male
3   Niva  23 female
4  Mansi  19 female
[1] "male" "male" "female" "female"
```
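Because the gender column above is a character vector, it has no levels. The sketch below, which assumes R 4.0.0 or later, shows two ways to get a factor: setting stringsAsFactors = TRUE, which converts every character column, or converting just one column with factor().

```r
name <- c("Krunal", "Ankit", "Niva", "Mansi")
age <- c(27, 25, 23, 19)
gender <- c("male", "male", "female", "female")

# Option 1: convert every character column (name and gender)
df1 <- data.frame(name, age, gender, stringsAsFactors = TRUE)
print(levels(df1$gender))  # "female" "male"

# Option 2: keep text as character and convert only the column you need
df2 <- data.frame(name, age, gender)
df2$gender <- factor(df2$gender)
print(is.factor(df2$gender))  # TRUE
print(is.factor(df2$name))    # FALSE (R >= 4.0.0)
```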

That is it for the R Factor example.