Suppose you need to perform arithmetic, such as addition or subtraction, on a data frame column, but that column is stored as character even though its values are numbers. Arithmetic operators do not work on character vectors, so you must first convert that column to numeric.
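A minimal sketch of the problem, using an illustrative one-column data frame: arithmetic on a character column stops with an error, while the same operation works after conversion. `stringsAsFactors = FALSE` is set explicitly so the column stays character on R versions older than 4.0, where it would otherwise become a factor.

```r
# Illustrative data frame: the numbers are stored as character strings
df <- data.frame(COL1 = c("1", "2", "3", "4"), stringsAsFactors = FALSE)

# Arithmetic on a character column raises an error; catch it to show the message
result <- tryCatch(df$COL1 + 1, error = function(e) conditionMessage(e))
print(result)  # prints the error message instead of stopping the script

# After conversion, the same operation works
df$COL1 <- as.numeric(df$COL1)
print(df$COL1 + 1)  # [1] 2 3 4 5
```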
Here are four ways to convert a data frame column from character to numeric in R:
- Using transform() with as.numeric()
- Using as.numeric() with the $ operator
- Using as.numeric() with the [] operator
- Using dplyr::mutate()
Method 1: Using transform() with as.numeric()
When working with a data frame, you can call as.numeric() inside the transform() function to convert character columns to numeric vectors.
If a character string does not represent a numeric value, as.numeric() returns NA for that element and issues the warning “NAs introduced by coercion”.
df <- data.frame(
COL1 = c("1", "2", "3", "4"),
COL2 = c("11", "19", "21", "46")
)
# Print the original data frame
print("Original data frame:")
print(df)
# Convert the columns to numeric using transform() and as.numeric()
df_numeric <- transform(df,
COL1 = as.numeric(COL1),
COL2 = as.numeric(COL2)
)
# Print the converted data frame
print("Converted data frame:")
print(df_numeric)
sapply(df_numeric, class)
The sapply() output confirms that COL1 and COL2, which were originally of type character, are numeric after applying transform() with as.numeric().
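As noted earlier, as.numeric() returns NA for strings it cannot parse. A short sketch, assuming an illustrative vector with one word and one empty string, shows how to spot which values failed before they silently become NA in your data. suppressWarnings() is used here only to keep the example output clean; in real code, the coercion warning is a useful signal.

```r
# Illustrative input: "two" and "" cannot be parsed as numbers
x <- c("1", "two", "3.5", "")

# Unparseable strings become NA (the coercion warning is silenced here)
y <- suppressWarnings(as.numeric(x))
print(y)  # 1, NA, 3.5, NA

# Values that were present in x but turned into NA after conversion
bad <- x[is.na(y) & !is.na(x)]
print(bad)  # "two" and ""
```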
Pros
- It is part of base R, so you don’t need to load an external package.
- Using the transform() function, you can modify a single column or multiple columns.
Cons
- It is less flexible than the dplyr::mutate() function for conditional transformations.
When to use
- When you want a quick base R conversion of one or more columns without extra packages.
Converting multiple columns from character to numeric
With transform(), you can convert multiple columns in a single call by applying as.numeric() to each of them.
# Sample data frame with character columns
df <- data.frame(
COL1 = c("1", "2", "3", "4"),
COL2 = c("11", "19", "21", "46"),
COL3 = c("1.5", "2.7", "3.9", "4.1") # Added a column with decimal values
)
# Print the original data frame and its structure
print("Original data frame:")
print(df)
str(df) # Check the structure
# Convert multiple columns to numeric using transform() and as.numeric()
df_numeric <- transform(df,
COL1 = as.numeric(COL1),
COL2 = as.numeric(COL2),
COL3 = as.numeric(COL3)
)
# Print the converted data frame and its structure
print("Converted data frame:")
print(df_numeric)
str(df_numeric) # Check the structure
# Verify the class of each column using sapply()
print("Column classes:")
sapply(df_numeric, class)
Method 2: Using as.numeric() with the $ operator
The $ operator selects a single column of a data frame by name; wrap it in as.numeric() and assign the result back to convert that column to numeric.
df <- data.frame(
COL1 = c("1", "2", "3", "4"),
COL2 = c("11", "19", "21", "46")
)
# Print the original data frame
print("Original data frame:")
print(df)
df$COL1 <- as.numeric(df$COL1)
# Print the converted data frame
print("Converted data frame:")
print(df)
sapply(df, class)
Pros
- The syntax is very concise and straightforward.
- It does not require any external packages.
Cons
- You can only modify one column at a time.
- Less suitable for complex transformations.
When to use
- Use this approach when you want to keep dependencies to a minimum and only need to convert one column.
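Because $ handles one column per expression, converting several columns means repeating the assignment. A minimal sketch reusing the example data frame; TOTAL is a hypothetical column name added only to show that arithmetic works once both columns are numeric.

```r
df <- data.frame(
  COL1 = c("1", "2", "3", "4"),
  COL2 = c("11", "19", "21", "46"),
  stringsAsFactors = FALSE  # keep the columns as character on R < 4.0 too
)

# One assignment per column with the $ operator
df$COL1 <- as.numeric(df$COL1)
df$COL2 <- as.numeric(df$COL2)

# Arithmetic now works; TOTAL is a hypothetical new column
df$TOTAL <- df$COL1 + df$COL2
print(df$TOTAL)  # 12 21 24 50
```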
Method 3: Using as.numeric() with the [] operator
The [] operator lets you refer to columns by name or by index, which makes it more flexible when the column to convert is stored in a variable or chosen at run time.
df <- data.frame(
COL1 = c("1", "2", "3", "4"),
COL2 = c("11", "19", "21", "46")
)
# Print the original data frame
print("Original data frame:")
print(df)
df[, "COL1"] <- as.numeric(df[, "COL1"])
# Print the converted data frame
print("Converted data frame:")
print(df)
sapply(df, class)
Pros
- It requires no external dependencies.
- You can convert multiple columns by selecting them with a vector of column names or indices and applying as.numeric() to each one with lapply().
Cons
- It is less suitable for complex or conditional transformations.
When to use
- Use it when you need to convert a small number of columns, especially ones selected by name or position.
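The bracket operators are what make dynamic and multi-column conversion possible. A sketch under illustrative assumptions: the column name lives in a variable (`col`), several names live in a vector (`cols`), and COL3 is a hypothetical non-numeric column that is deliberately left alone. Note that as.numeric() cannot coerce a whole data frame, so lapply() applies it column by column.

```r
df <- data.frame(
  COL1 = c("1", "2", "3", "4"),
  COL2 = c("11", "19", "21", "46"),
  COL3 = c("a", "b", "c", "d"),  # hypothetical non-numeric column, left as-is
  stringsAsFactors = FALSE
)

# Column name stored in a variable: df$col would look for a column literally
# named "col", but [[ ]] evaluates the variable
col <- "COL1"
df[[col]] <- as.numeric(df[[col]])

# Several columns by name; df[1:2] would select the same columns by position
cols <- c("COL1", "COL2")
df[cols] <- lapply(df[cols], as.numeric)

sapply(df, class)  # COL1 and COL2 numeric, COL3 still character
```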
Method 4: Using the mutate() function from the dplyr package
The mutate() function creates or modifies columns in a data frame; combine it with as.numeric() to convert specific columns to numeric.
library(dplyr)
df <- data.frame(
COL1 = c("1", "2", "3", "4"),
COL2 = c("11", "19", "21", "46")
)
# Print the original data frame
print("Original data frame:")
print(df)
df <- df %>%
mutate(COL1 = as.numeric(COL1))
# Print the converted data frame
print("Converted data frame:")
print(df)
sapply(df, class)
Be careful with factor columns: calling as.numeric() directly on a factor returns its internal integer level codes, not the numbers shown in its labels. Convert the factor to character first, and then to numeric:
df$COL1 <- as.numeric(as.character(df$COL1))
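A small sketch of the factor pitfall, using an illustrative factor whose labels look like numbers. Its levels are sorted as strings ("10", "20", "5"), so the internal codes are 1, 2, 3 and have nothing to do with the values. The last line shows the alternative idiom documented in R’s ?factor help page, which converts each level only once.

```r
# A factor whose labels look like numbers
f <- factor(c("10", "20", "5"))

# Wrong: returns the internal level codes (1, 2, 3), not the labels
wrong <- as.numeric(f)

# Right: go through character first
right <- as.numeric(as.character(f))  # 10, 20, 5

# Equivalent idiom from ?factor: convert the levels, then index by the factor
right2 <- as.numeric(levels(f))[f]    # 10, 20, 5
```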
Pros
- Its syntax is highly readable and expressive.
- It allows for conditional transformations using ifelse() or case_when().
- I highly recommend this approach when you need complex transformations, since several conversions and conditions can be combined in a single pipeline.
Cons
- It requires an external “dplyr” package.
- It can be slightly slower than base R methods for simple conversions.
When to use
- When you are already using the dplyr package for data manipulation.
- When code readability and maintainability are a priority.
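To illustrate the multi-column and conditional strengths listed above, here is a sketch assuming dplyr 1.0.0 or later (needed for across()). The "n/a" entry is an illustrative bad value, and replacing the resulting NA with 0 is just one possible rule chosen for the example, not a general recommendation.

```r
library(dplyr)

df <- data.frame(
  COL1 = c("1", "2", "3", "4"),
  COL2 = c("11", "19", "21", "n/a"),  # "n/a" is an illustrative bad value
  stringsAsFactors = FALSE
)

df <- df %>%
  # Convert several columns at once; unparseable values become NA
  mutate(across(c(COL1, COL2), ~ suppressWarnings(as.numeric(.x)))) %>%
  # Conditional transformation: here, missing values are replaced with 0
  mutate(COL2 = case_when(is.na(COL2) ~ 0, TRUE ~ COL2))

print(df)
```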
That’s it!
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.