
Adding Single or Multiple Columns to Data Frame in R

Whether you want to add new data to your existing datasets or create new variables based on existing ones, you need to add columns to the existing Data Frame.

Let’s understand how this works behind the scenes. In the Data Frame, each column represents a vector. To add a column, we just need to assign a new vector to a new column name. For multiple columns, we must assign multiple vectors to multiple new columns.
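As a minimal sketch of that idea, the snippet below uses a hypothetical scores data frame (the data and column names are illustrative, not from the examples that follow). It shows that a column is an ordinary vector with one element per row, and that assigning a vector to a new name creates the column.

```r
# Hypothetical example data; the names are illustrative only.
scores <- data.frame(
  math = c(80, 65, 90),
  art  = c(70, 85, 60)
)

# Each column is an ordinary vector with one element per row.
is.numeric(scores$math)
length(scores$math) == nrow(scores)

# Assigning a vector to a new name creates the column.
scores$total <- scores$math + scores$art
names(scores)
```

The new column here is derived from existing ones, which is the "create new variables based on existing ones" case mentioned at the start.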

Here are the five ways to add single or multiple columns to a data frame in R:

  1. Using $ operator
  2. Using square ([ ]) notation
  3. Using cbind()
  4. Using tibble::add_column()
  5. Using dplyr::mutate()

Method 1: Using the $ operator

Define a vector that holds the values for the new column. Then use the $ operator with a new column name on the left side of the assignment and the vector on the right.

The vector should have as many elements as the data frame has rows, so that its values are placed into the new column row by row. A shorter vector is recycled only when its length divides the number of rows evenly (a single value fills every row); any other length raises an error.

Syntax

df$new_column <- vector

Example

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)
df
new_col <- c(10, 11, 12)

df$col4 <- new_col
df

Output

After the assignment, the data frame df has a new column, col4, holding the values 10, 11, and 12.
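The length requirement can be checked with a short sketch. It assumes a small two-column data frame and uses tryCatch() only to capture the error instead of stopping the script; the column names flag and bad are made up for illustration.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# A length-1 value is repeated for every row.
df$flag <- TRUE

# A length-2 vector does not fit 3 rows, so the assignment fails.
result <- tryCatch(
  {
    df$bad <- c(1, 2)
    "assigned"
  },
  error = function(e) "error"
)
result
names(df)
```

Because the failed assignment raises an error, df is left without a bad column.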

Pros

  1. It provides a simple and clean syntax for adding a single column.
  2. It updates df directly, with no need to reassign the result, and in current R versions adding a column does not deep-copy the existing columns, which keeps it fast.

Cons

  1. The column name must be written literally in the code, so it cannot be generated dynamically (for example, from a variable).
  2. It adds only one column per assignment.

Method 2: Using Square Brackets ([])

Another way is to use square brackets, as in df["new_col"] <- vector. Inside the brackets, give the name of the new column as a string; vector holds the values for that column.

Ensure the new vector has as many elements as the data frame has rows.

If the new column name matches an existing column name, the assignment overwrites that column's values instead of adding a new column. So, verify your column names before executing the operation.
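One way to guard against an accidental overwrite is to test the name against names(df) first. This is only a sketch: the "_new" suffix and the variable names new_name and new_values are arbitrary choices made for illustration.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

new_name <- "col2"            # deliberately clashes with an existing column
new_values <- c(40, 50, 60)

if (new_name %in% names(df)) {
  # Pick a different name instead of silently overwriting col2.
  new_name <- paste0(new_name, "_new")
}

df[[new_name]] <- new_values
names(df)
```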

Syntax

df[["new_column"]] <- vector

# Or

df["new_column"] <- vector

Example

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)
df

new_col <- c(10, 11, 12)

df["col4"] <- new_col
df

Output

Specifying the name of the new column, “col4”, within the brackets on the left side of the assignment sets new_col as the values of that column. The final data frame has col4 appended as its last column.

Pros

  1. It allows dynamic column names (e.g., df[[var_name]] <- vector).

Cons

  1. Double brackets (df[["name"]]) add only one column per assignment; adding several at once requires single brackets with a vector of names and a list of values.
  2. It is slightly more verbose than $ for simple cases.
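The following sketch illustrates both bracket forms under simple assumptions: var_name is a hypothetical variable holding the column name, and the single-bracket assignment pairs a character vector of names with a list containing one vector per name.

```r
df <- data.frame(col1 = c(1, 2, 3))

# Dynamic name: the column name is held in a variable.
var_name <- paste0("col", 2)
df[[var_name]] <- c(4, 5, 6)

# Several columns at once: a character vector of names on the left,
# a list with one vector per name on the right.
df[c("col3", "col4")] <- list(c(7, 8, 9), c(10, 11, 12))
df
```

The list on the right must supply its vectors in the same order as the names on the left.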

Method 3: Using cbind()

The cbind() function combines objects by columns. Calling it with a data frame and a vector (or another data frame) returns a new data frame with the extra columns appended, so assign the result back to df to keep it.

We can add single or multiple columns to the data frame using the cbind() function.

Syntax

df <- cbind(df, new_col1, new_col2)

Adding a single column

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)
df

col4 <- c(10, 11, 12)

df <- cbind(df, col4)
df

Output

Here, cbind() combined the data frame with a vector by columns, appending a single column (“col4”).

Adding multiple columns

Let’s define three vectors (columns) that will be added to an existing data frame by columns using the cbind() function.

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)
df

col4 <- c(10, 11, 12)
col5 <- c(21, 41, 51)
col6 <- c(71, 81, 91)

df <- cbind(df, col4, col5, col6)
df

Output

The resulting data frame contains the three new columns “col4”, “col5”, and “col6”.

Pros

  1. It is a concise way to add several columns at once.
  2. It is compatible with matrices, vectors, or lists.
  3. It is convenient for placing two data frames with the same number of rows side by side; note that it does not match rows by key the way merge() does.

Cons

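  1. It returns a new data frame that must be assigned back, which can add overhead on large data.
  2. A new column takes its name from the variable passed in; for an expression or a literal vector you must supply the name explicitly, as in cbind(df, col4 = c(10, 11, 12)).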
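A short sketch of both points follows. The names col_sum and extra are made up for illustration; the first call names a computed column explicitly, and the second binds another data frame with the same number of rows.

```r
df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# Name a computed column explicitly in the call.
df <- cbind(df, col_sum = df$col1 + df$col2)

# Bind another data frame that has the same number of rows.
extra <- data.frame(col3 = c(7, 8, 9), col4 = c(10, 11, 12))
df <- cbind(df, extra)
names(df)
```

When a whole data frame is bound, its columns keep their own names.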

Method 4: Using the add_column() function from tibble (part of the tidyverse)

The add_column() function comes from the tibble package, which library(tidyverse) attaches. It lets us insert columns at a specific position in the data frame.

Syntax

library(tidyverse)

# Supply either .before or .after (not both); each takes a column name or index.
df <- df %>% add_column(new_col = vector, .after = 2, .name_repair = "check_unique")

Example

library(tidyverse)

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)
df

col4 <- c(10, 11, 12)

df <- add_column(df, col4 = col4, .after = "col3")
df

Output

Because we specified the .after argument, “col4” is inserted right after “col3” in the data frame.

Pros

  1. You can use .before or .after arguments to control where the new column is added.
  2. Works seamlessly with Tibbles.
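  3. It recycles a length-1 value to every row; any other length must match the data frame’s row count.
  4. Its arguments are dynamic dots passed on to tibble(), so column names can be supplied programmatically with := or spliced from a list with !!!.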

Cons

  1. It requires the tibble package (installed with the tidyverse) in your R environment.
  2. It does not overwrite existing columns (reusing a name raises an error), and it keeps the class of its input, so a plain data frame is not converted to a tibble.
  3. It returns a modified copy rather than changing the data frame in place, which can cost memory on large data sets.
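The sketch below exercises the positioning, recycling, and duplicate-name behaviour, assuming the tibble package is installed. The column names id and source are hypothetical, and tryCatch() is used only to capture the error.

```r
library(tibble)  # add_column() lives in tibble

df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# Insert an id column in front of col1.
df <- add_column(df, id = c("a", "b", "c"), .before = "col1")

# A length-1 value is repeated for each row; the default position is the end.
df <- add_column(df, source = "survey")
names(df)

# Reusing an existing name is an error rather than a silent overwrite.
dup <- tryCatch(add_column(df, col1 = 0), error = function(e) "error")
dup
```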

Method 5: Using dplyr::mutate()

The dplyr::mutate() function adds single or multiple columns at once while preserving existing ones. After loading dplyr, a call such as df %>% mutate(new_col1 = vec1, new_col2 = vec2) adds both columns in one go.

Syntax

library(dplyr)

df <- df %>% mutate(new_col1 = vec1, new_col2 = vec2)

Example

library(dplyr)

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)
df

col4 <- c(10, 11, 12)

df <- df %>% mutate(col4 = col4)
df

Output

The resulting data frame contains “col4”, added with dplyr::mutate() while the existing columns stay unchanged.

Pros

  1. It provides intuitive and clean syntax.
  2. It supports dynamic column names with := and glue syntax.
  3. The same code works with remote database tables (via dbplyr).

Cons

  1. It requires installing and loading the dplyr package.
  2. It can be overkill for simple tasks, where loading an extra package just to add one column is not worth it.
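As a sketch of what mutate() offers beyond attaching a ready-made vector, the example below computes columns from existing ones and builds a column name from a variable. It assumes a reasonably recent dplyr (1.0 or later) for the glue-style name; col_sum, col_share, and new_name are illustrative names.

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))

# New columns can be computed from existing ones, and later
# expressions can use columns created earlier in the same call.
df <- df %>%
  mutate(
    col_sum = col1 + col2,
    col_share = col1 / col_sum
  )

# Dynamic column name: the name comes from a variable via := and glue syntax.
new_name <- "col_double"
df <- df %>% mutate("{new_name}" := col1 * 2)
names(df)
```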

That’s it!
