Whether you want to add new data to an existing dataset or create new variables based on existing ones, you need to add columns to the existing data frame.
Let’s understand how this works behind the scenes. In a data frame, each column is a vector, and all columns have the same length. To add a column, we assign a new vector to a new column name. For multiple columns, we assign multiple vectors to multiple new column names.
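To make the column-as-vector idea concrete, here is a minimal base R sketch that inspects the structure of a data frame. The data frame df and its values are illustrative, not taken from a real dataset.

```r
# A small illustrative data frame (hypothetical values)
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6)
)

# A data frame is stored as a list of equal-length vectors, one per column
is_list <- is.list(df)
col_lengths <- lengths(df)   # length of each column vector
first_col <- df[["col1"]]    # extracting one column returns a plain vector

print(is_list)
print(col_lengths)
print(is.vector(first_col))
```

Because every column is a vector of length nrow(df), any new column has to fit that same length.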
Here are the five ways to add single or multiple columns to a data frame in R:
- Using $ operator
- Using square ([ ]) notation
- Using cbind()
- Using tibble::add_column()
- Using dplyr::mutate()
Method 1: Using the $ operator
Define a new vector that holds the values for the new column. Then, using the $ operator, write the new column name after the data frame name and assign the vector to it.
The new vector must have the same length as the number of rows in the data frame (a single value is also accepted and is repeated for every row). The assignment creates a new column in your existing data frame, and the values in the vector are placed into it row by row.
Syntax
df$new_column <- vector
Example
df <- data.frame(
col1 = c(1, 2, 3),
col2 = c(4, 5, 6),
col3 = c(7, 8, 9)
)
df
new_col <- c(10, 11, 12)
df$col4 <- new_col
df
Output
The output shows that a new column, col4, with the values 10, 11, and 12 has been added to the data frame df.
Pros
- It provides a simple and cleaner syntax to add a single column.
- It updates the data frame without requiring an explicit reassignment such as df <- .... Under R’s copy-on-modify semantics, this is usually a cheap shallow copy rather than a full duplicate of the data.
Cons
- The column name must be typed literally; you cannot use a name stored in a variable.
- Each assignment adds only one column.
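The following sketch illustrates the points above with the $ operator: deriving a column from existing ones, recycling a single value, and what happens when the vector length does not fit. The column names total, source, and bad are hypothetical, chosen only for illustration.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# Derive a new column from existing ones
df$total <- df$col1 + df$col2

# A length-1 value is repeated for every row
df$source <- "survey"

# A vector whose length does not fit the row count is rejected
result <- tryCatch(
  {
    df$bad <- c(1, 2)   # 2 values for 3 rows
    "assigned"
  },
  error = function(e) "error"
)

print(df)
print(result)
```

Wrapping the failing assignment in tryCatch() keeps the script running, and df is left without the bad column.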
Method 2: Using Square Brackets ([])
Another way is to use square brackets, as in df["new_col"] <- vector. Inside the square brackets, you give the name of the new column you want to add to the data frame df, and vector holds the column values.
Ensure the new column has the same number of rows as the data frame.
If the new column name matches an existing column name, the existing column’s values are overwritten without warning. So, verify your column names before executing the operation.
Syntax
df[["new_column"]] <- vector
# Or
df["new_column"] <- vector
Example
df <- data.frame(
col1 = c(1, 2, 3),
col2 = c(4, 5, 6),
col3 = c(7, 8, 9)
)
df
new_col <- c(10, 11, 12)
df["col4"] <- new_col
df
Output
The output shows that by specifying the name of the new column “col4” within the brackets on the left side of the assignment, new_col is assigned as the values for that column. The final data frame has the new column appended.
Pros
- It allows dynamic column names (e.g., df[[var_name]] <- vector).
Cons
- The [[ ]] form adds only one column at a time; adding several columns in one go requires single brackets with a list, e.g., df[c("a", "b")] <- list(vec1, vec2).
- It is slightly more verbose than $ for simple cases.
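As a rough illustration of bracket notation, the sketch below uses a column name held in a variable, guards against overwriting an existing column, and assigns two columns at once with single brackets. The variable var_name and the column values are assumptions made for the example.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# Column name held in a variable (hypothetical name)
var_name <- "col3"

# Guard against silently overwriting an existing column
if (!(var_name %in% names(df))) {
  df[[var_name]] <- c(7, 8, 9)
}

# Single brackets accept several names and a list of vectors
df[c("col4", "col5")] <- list(c(10, 11, 12), c(13, 14, 15))

print(names(df))
```

The %in% check is optional, but it makes an accidental overwrite visible in the code.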
Method 3: Using cbind()
The basic function of cbind() is to combine data frames by columns. So, if I use the cbind() function with a data frame and another data frame or a vector, it will effectively add new columns to the original data frame.
We can add single or multiple columns to the data frame using the cbind() function.
Syntax
df <- cbind(df, new_col1, new_col2)
Adding a single column
df <- data.frame(
col1 = c(1, 2, 3),
col2 = c(4, 5, 6),
col3 = c(7, 8, 9)
)
df
col4 <- c(10, 11, 12)
df <- cbind(df, col4)
df
Output
From the above output, we can see that we combined a data frame with a vector by columns using the cbind() function to append a single column (“col4”).
Adding multiple columns
Let’s define three vectors (columns) that will be added to an existing data frame by columns using the cbind() function.
df <- data.frame(
col1 = c(1, 2, 3),
col2 = c(4, 5, 6),
col3 = c(7, 8, 9)
)
df
col4 <- c(10, 11, 12)
col5 <- c(21, 41, 51)
col6 <- c(71, 81, 91)
df <- cbind(df, col4, col5, col6)
df
Output
In the above output, you can see that we added multiple columns “col4”, “col5”, and “col6” to the data frame.
Pros
- Concise method for adding multiple columns at once.
- It is compatible with matrices, vectors, or lists.
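- It is a convenient way to place two data frames with the same number of rows side by side. Note that it pairs rows by position and does not match on keys the way merge() does.
Cons
- It returns a new data frame that you must assign back, which can add overhead for large data.
- Column names come from the argument or variable names, so an unnamed expression such as cbind(df, c(10, 11, 12)) yields an awkward name unless you name it explicitly.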
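A short sketch of the naming behavior described above: named arguments to cbind() set the new column names directly, and binding a second data frame keeps that data frame's own names. The objects df_named, extra, and df_wide are hypothetical names used only here.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# Named arguments set the new column names directly
df_named <- cbind(df, col3 = c(7, 8, 9), col4 = c(10, 11, 12))

# Binding another data frame keeps that data frame's column names
extra <- data.frame(col5 = c("a", "b", "c"))
df_wide <- cbind(df_named, extra)

print(names(df_wide))
```

Rows are combined purely by position, so both inputs should already be in the same row order.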
Method 4: Using add_column() function from tibble
The add_column() function comes from the tibble package, which is loaded as part of the tidyverse. It allows us to insert columns at specific positions in the data frame.
Syntax
library(tidyverse)
df <- df %>% add_column(new_col = vector, .after = 2)
# Or position it with .before instead (specify only one of .before and .after)
df <- df %>% add_column(new_col = vector, .before = 1)
Example
library(tidyverse)
df <- data.frame(
col1 = c(1, 2, 3),
col2 = c(4, 5, 6),
col3 = c(7, 8, 9)
)
df
col4 <- c(10, 11, 12)
df <- add_column(df, col4 = col4, .after = "col3")
df
Output
The above output shows that we have added “col4” after “col3” in the data frame by specifying the .after argument.
Pros
- You can use .before or .after arguments to control where the new column is added.
- Works seamlessly with Tibbles.
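- It recycles a single (length-1) value to every row; vectors of any other length must match the data frame’s row count.
- It accepts dynamic dots, so column names can be supplied programmatically (e.g., with := or !!!).
Cons
- It requires the tibble package (installed with the tidyverse) in your R environment.
- Recent tibble versions appear to return an object of the same type as the input, but check class(df) afterward if your code depends on base data.frame behavior.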
- It creates a modified copy of a data frame, so it is not memory-efficient for large data sets.
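The sketch below exercises the add_column() behavior discussed above: positioning with .before and .after, recycling a single value, and the error raised when a column name already exists. It assumes the tibble package is installed; the column names id and flag are hypothetical.

```r
library(tibble)

df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# Insert a column before the first existing column
df2 <- add_column(df, id = c("a", "b", "c"), .before = 1)

# A length-1 value is recycled to every row
df2 <- add_column(df2, flag = TRUE, .after = "col1")

# Reusing an existing column name is an error under the default .name_repair
dup <- tryCatch(
  add_column(df2, col1 = c(0, 0, 0)),
  error = function(e) "error"
)

print(names(df2))
print(dup)
```

Unlike $ or bracket assignment, add_column() refuses to overwrite an existing column, which can help catch naming mistakes early.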
Method 5: Using dplyr::mutate()
The dplyr::mutate() function appends one or more columns in a single call while preserving the existing ones. For example, after loading dplyr, df %>% mutate(new_col1 = vec1, new_col2 = vec2) adds both columns in one go.
Syntax
library(dplyr)
df <- df %>% mutate(new_col1 = vec1, new_col2 = vec2)
Example
library(dplyr)
df <- data.frame(
col1 = c(1, 2, 3),
col2 = c(4, 5, 6),
col3 = c(7, 8, 9)
)
df
col4 <- c(10, 11, 12)
df <- df %>% mutate(col4 = col4)
df
Output
You can see from the above output that we mutated an existing data frame to add “col4” using the dplyr::mutate() function.
Pros
- It provides intuitive and cleaner syntax.
- It supports dynamic column names with := and glue syntax.
- You can seamlessly work with remote databases (via dbplyr).
Cons
- It requires installing and loading the dplyr package.
- It can be overkill for simple tasks, because loading an extra package just to add a single column is rarely worthwhile.
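As an illustration of mutate() creating several columns at once, the sketch below derives columns from existing ones and names one column dynamically with := and glue syntax. It assumes dplyr is installed; new_name and the column names total and doubled are hypothetical.

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# Hypothetical column name chosen at run time
new_name <- "ratio"

df <- df %>%
  mutate(
    total = col1 + col2,           # derived from existing columns
    doubled = total * 2,           # may refer to a column created just above
    "{new_name}" := col1 / col2    # dynamic name via := and glue syntax
  )

print(df)
```

Because expressions inside mutate() are evaluated in order, later columns can build on earlier ones within the same call.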
That’s it!
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.