What is String as factor in R

A data frame in R is a table, a two-dimensional array-like structure in which each column is a vector. Different columns can hold different data types, but all values within a single column must share the same type. To create a data frame, use the data.frame() function.
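Here is a minimal sketch of that rule. The data frame and its column names are made up for illustration. It builds three columns of different types and asks R for the class of each column:

```r
# Hypothetical data frame: each column holds exactly one type
df <- data.frame(
  id     = 1:3,                   # integer column
  name   = c("a", "b", "c"),      # character column
  active = c(TRUE, FALSE, TRUE),  # logical column
  stringsAsFactors = FALSE        # keep 'name' as character on any R version
)

# Report the class of every column
sapply(df, class)
# id: "integer", name: "character", active: "logical"
```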

In R versions before 4.0.0, data.frame() converted character vectors to factors by default. Since R 4.0.0, the default is stringsAsFactors = FALSE, so strings stay as character vectors unless you ask for factors.
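The default behaviour depends on the R version, with stringsAsFactors = FALSE becoming the default in R 4.0.0. You can check which behaviour your installation uses by creating a data frame without the argument and inspecting the column class. This is a minimal sketch, and the column x is a made-up example:

```r
# Build a data frame WITHOUT passing stringsAsFactors
df <- data.frame(x = c("apple", "banana"))

# "factor" on R < 4.0.0, "character" on R >= 4.0.0
class(df$x)

# The running R version, for reference
getRversion()
```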

What is stringsAsFactors in R

stringsAsFactors is a logical argument of the data.frame() function. It controls whether the character vectors you pass to data.frame() are converted to factor variables (TRUE) or kept as plain character strings (FALSE).
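To see the difference, you can build the same one-column data frame with both settings and compare the results. This is a small sketch, and the two show names are just sample values:

```r
shows <- c("Stranger Things", "Locke and Key")

# Same input, opposite settings of stringsAsFactors
as_factor <- data.frame(show_name = shows, stringsAsFactors = TRUE)
as_string <- data.frame(show_name = shows, stringsAsFactors = FALSE)

is.factor(as_factor$show_name)     # TRUE
is.character(as_string$show_name)  # TRUE

# A factor keeps its unique strings as sorted levels
levels(as_factor$show_name)        # "Locke and Key" "Stranger Things"
```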

netflix_data <- data.frame(
  show_id = 1:4,
  show_name = c("Cabinet of Curiosities", "Stranger Things", "Rick and Morty", "Locke and Key"),
  seasons = c(1, 4, 6, 3),
  stringsAsFactors = FALSE  # keep show_name as a character column
)

print(netflix_data)

Output

  show_id              show_name seasons
1       1 Cabinet of Curiosities       1
2       2        Stranger Things       4
3       3         Rick and Morty       6
4       4          Locke and Key       3

We set stringsAsFactors = FALSE because we plan to change the strings stored in the data frame later, and a plain character column accepts any new value.
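Because show_name is a plain character column, you can overwrite a value or append a row with a new name and R will not warn. This is a minimal sketch. The block rebuilds netflix_data so it runs on its own, and the replacement name "Wednesday" and the extra row for "Dark" are illustrative values that are not part of the original data:

```r
netflix_data <- data.frame(
  show_id = 1:4,
  show_name = c("Cabinet of Curiosities", "Stranger Things", "Rick and Morty", "Locke and Key"),
  seasons = c(1, 4, 6, 3),
  stringsAsFactors = FALSE
)

# Any new string is accepted in a character column
netflix_data$show_name[3] <- "Wednesday"

# Appending a row with a brand-new name also works without warnings
new_row <- data.frame(show_id = 5L, show_name = "Dark", seasons = 3,
                      stringsAsFactors = FALSE)
netflix_data <- rbind(netflix_data, new_row)

netflix_data$show_name
```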

When strings are stored as factors, R keeps each unique string only once, as a level, and stores every value in the column as an integer code that points to its level. This can make storage more compact when a column repeats a small set of strings many times.
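You can inspect that internal structure directly. In this small sketch, the genres vector is a made-up example. levels() shows the unique strings, and as.integer() shows the code stored for each element:

```r
# A factor stores each distinct string once (its levels) plus integer codes
genres <- factor(c("drama", "comedy", "drama", "drama"))

levels(genres)      # "comedy" "drama"  (unique strings, sorted)
as.integer(genres)  # 2 1 2 2           (code of each element's level)
typeof(genres)      # "integer"         (the underlying storage type)
```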

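The drawback is that a factor only accepts its existing levels. If you assign a value that is not one of the levels, R does not store it. Instead, it issues the warning “invalid factor level, NA generated” and puts NA in that position.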
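The sketch below reproduces that behaviour with a made-up genres factor. It then shows one way to make the assignment succeed, which is to add the new level before assigning it:

```r
genres <- factor(c("drama", "comedy"))

# "horror" is not a level: R warns
# ("invalid factor level, NA generated") and stores NA instead
genres[2] <- "horror"
is.na(genres[2])  # TRUE

# Adding the level first makes the assignment succeed
levels(genres) <- c(levels(genres), "horror")
genres[2] <- "horror"
as.character(genres)  # "drama" "horror"
```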

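To keep strings from being converted to factors when you create a data frame with base R's data.frame(), pass stringsAsFactors = FALSE. On R 4.0.0 and later this is already the default, but setting it explicitly makes the code behave the same on older versions.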
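If a data frame already contains factor columns, for example one created with stringsAsFactors = TRUE, you can convert them back to character afterwards with as.character(). This is a minimal sketch, and the name and score columns are hypothetical:

```r
# Hypothetical data frame whose string column was stored as a factor
df <- data.frame(name = c("a", "b"), score = c(10, 20), stringsAsFactors = TRUE)

# Convert every factor column to character and leave other columns alone;
# df[] <- keeps the data.frame structure while replacing its columns
df[] <- lapply(df, function(col) if (is.factor(col)) as.character(col) else col)

sapply(df, class)  # name: "character", score: "numeric"
```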

Conclusion

In R versions before 4.0.0, data.frame() converts character columns to factors by default. Since R 4.0.0, it keeps them as character vectors. To make sure strings are never converted to factors, regardless of the R version, pass stringsAsFactors = FALSE.

That’s it.
