colSums(): Calculating the Sum of Columns of a Data Frame in R

The colSums() function in R calculates the sums of columns for numeric matrices, data frames, or arrays.

The example below calculates the sum of each column in a data frame.

For col1, the sum of values is 1 + 2 + 3 = 6. For col2, the sum of values is 4 + 5 + 6 = 15. For col3, the sum of values is 7 + 8 + 9 = 24.

# Creating a data frame
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)

# Calculate the column sums.
colSums(df)

# Output:
# col1   col2   col3
# 6       15     24

Syntax

colSums(x, na.rm = FALSE, dims = 1)

Parameters

Argument    Description
x           An array of two or more dimensions containing numeric, complex, integer, or logical values, or a numeric data frame.
na.rm       Logical. Should missing values (including NA and NaN) be omitted from the calculations? The default is FALSE.
dims        An integer (default 1). It specifies which dimensions are treated as "columns" to sum over: colSums() sums over dimensions 1:dims.
m, n        The number of rows (m) and columns (n) of the matrix x. These are arguments of the lower-level .colSums(x, m, n, na.rm = FALSE) variant, not of colSums() itself.

Handling NA Values

If a column of a data frame contains an NA value, the colSums() function returns NA for that column.

# Create a data frame.
df <- data.frame(
  col1 = c(NA, 2, 3),
  col2 = c(4, NA, 6),
  col3 = c(7, 8, NA)
)

# Calculate the column sums.
colSums(df)

# Output:
# col1    col2    col3
#  NA      NA      NA

You can exclude NA values from the calculation by passing the na.rm = TRUE argument.

# Create a data frame.
df <- data.frame(
  col1 = c(NA, 2, 3),
  col2 = c(4, NA, 6),
  col3 = c(7, 8, NA)
)

# Calculate the column sums.
colSums(df, na.rm = TRUE)

# Output
# col1    col2   col3 
#  5      10      15
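
Because logical values are summed as 0 and 1, colSums() can also count the missing values in each column, which may help when deciding whether na.rm = TRUE is appropriate. The following is a minimal sketch that reuses the data frame above; the variable name na_counts is only illustrative.

```r
# Same data frame as above: one NA in each column.
df <- data.frame(
  col1 = c(NA, 2, 3),
  col2 = c(4, NA, 6),
  col3 = c(7, 8, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUE values.
na_counts <- colSums(is.na(df))
na_counts

# Output:
# col1 col2 col3
#    1    1    1
```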

Usage with specific columns

We can use indexing to calculate the sum of specific columns. Pass the selected columns to the colSums() function, and it will return the sum for each of them.

# Create a data frame.
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 2, 6),
  col3 = c(7, 8, 3)
)

# Calculate the column sums.
colSums(df[, c(2, 3)])

# Output
# col2    col3 
#  12      18
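
Two variations on the indexing approach are sketched below: selecting columns by name, and selecting a single column. The single-column case needs care because df[, 2] is simplified to a plain vector, which colSums() rejects since it requires at least two dimensions. The variable names by_name and single are illustrative.

```r
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 2, 6),
  col3 = c(7, 8, 3)
)

# Select columns by name instead of position.
by_name <- colSums(df[, c("col2", "col3")])

# drop = FALSE keeps the one-column selection as a data frame,
# so colSums() still receives a two-dimensional object.
single <- colSums(df[, 2, drop = FALSE])
single

# Output:
# col2
#   12
```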

Usage with Matrix

Since a matrix also has rows and columns, we can find the sum of individual columns by passing the matrix to the colSums() function.

# Create a 3 x 3 matrix, filled column by column.
mtrx <- matrix(1:9, nrow = 3, ncol = 3)
mtrx
cat("The sum of columns is:\n")
colSums(mtrx)
# [1]  6 15 24
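
As a cross-check, the same totals can be obtained with apply() over the column margin. This is only a sketch for comparison; colSums() is the more direct choice and is documented as being a lot faster. The names via_apply and via_colsums are illustrative.

```r
mtrx <- matrix(1:9, nrow = 3, ncol = 3)

# MARGIN = 2 applies sum() to each column in turn.
via_apply <- apply(mtrx, 2, sum)
via_colsums <- colSums(mtrx)

# Compare values; all.equal() tolerates the integer vs. double difference.
all.equal(via_apply, via_colsums)

# Output:
# [1] TRUE
```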

Higher-Dimensional Arrays

When you are working with higher-dimensional arrays, the dims argument specifies how many of the leading dimensions are treated as "columns" and summed over. The remaining trailing dimensions are kept in the result.

For example, with a 4D array shaped (a, b, c, d), using colSums(x, dims = 2) sums over the first two dimensions (a, b) and returns a result shaped (c, d).

Increasing the value of dims collapses more dimensions together, while smaller values keep more of them. The value must be between 1 and one less than the number of dimensions of the array.
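
To see which dimensions survive for each value of dims, the sketch below inspects the shape of the result for a 4D array of ones. The array x and its shape (2, 3, 4, 5) are assumptions chosen only for illustration.

```r
# A 4D array of ones with shape (2, 3, 4, 5).
x <- array(1, dim = c(2, 3, 4, 5))

# dims = 1 sums over the first dimension only.
dim(colSums(x, dims = 1))     # 3 4 5

# dims = 2 sums over the first two dimensions.
res <- colSums(x, dims = 2)
dim(res)                      # 4 5
res[1, 1]                     # 6, i.e. 2 * 3 ones added together

# dims = 3 sums over the first three dimensions, leaving a plain vector.
length(colSums(x, dims = 3))  # 5
```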

arr <- array(1:24, dim = c(2, 3, 4)) # 2 rows, 3 columns, 4 slices

colSums(arr, dims = 1)

# Output:
#      [,1] [,2]  [,3]  [,4]
# [1,]  3   15    27    39
# [2,]  7   19    31    43
# [3,]  11  23    35    47

colSums(arr, dims = 2) # Sum over rows and columns (dims=2), result: 4-vector

# Output:
# [1]  21  57  93 129

In the code arr <- array(1:24, dim = c(2, 3, 4)), we create a 3D array with two rows, three columns, and four slices.

When we call colSums(arr, dims = 1), the function sums over the first dimension (rows) and preserves the remaining dimensions (columns and slices).

This results in a 3×4 matrix, where each entry is the sum of the two row values for a given column and slice.

On the other hand, colSums(arr, dims = 2) sums over both the row and column dimensions and preserves only the slice dimension.

This produces a vector of length 4, holding the total of all six values in each slice.
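
One way to confirm which dimensions are collapsed is to compare the results against a single slice extracted by hand. The sketch below does this for the first slice; the names by_col and by_slice are illustrative.

```r
arr <- array(1:24, dim = c(2, 3, 4))

# dims = 1: column j of the result holds the column sums of slice j.
by_col <- colSums(arr, dims = 1)
by_col[, 1]            # 3 7 11
colSums(arr[, , 1])    # 3 7 11

# dims = 2: element j of the result is the grand total of slice j.
by_slice <- colSums(arr, dims = 2)
by_slice[1]            # 21
sum(arr[, , 1])        # 21
```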

Calculating the sum of column values in the Data Set

You can also use a built-in dataset, such as USArrests, and calculate the sum of its column values.

But first, let’s get a snapshot of the USArrests dataset using the head() function.

head(USArrests, 5)

Output

          Murder Assault UrbanPop  Rape
Alabama    13.2    236     58      21.2
Alaska     10.0    263     48      44.5
Arizona     8.1    294     80      31.0
Arkansas    8.8    190     50      19.5
California  9.0    276     91      40.6

We will calculate the sum of the Murder, Assault, UrbanPop, and Rape column values.

colSums(USArrests)

Output

Murder  Assault  UrbanPop   Rape
389.4   8538.0    3277.0   1061.6

Ensure that the object passed to this function is numeric or can be coerced to numeric. If it contains non-numeric columns, it will throw an error.
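
One possible way to avoid that error is to keep only the numeric columns before summing. The sketch below uses a hypothetical data frame named mixed; the column names and the helper variables num_cols and totals are assumptions for illustration.

```r
# Hypothetical data frame with one non-numeric column.
mixed <- data.frame(
  name  = c("a", "b", "c"),
  score = c(10, 20, 30),
  count = c(1L, 2L, 3L)
)

# colSums(mixed) would stop with an error because of the character column.
# Flag the numeric columns, then sum only those.
num_cols <- sapply(mixed, is.numeric)
totals <- colSums(mixed[, num_cols, drop = FALSE])
totals

# Output:
# score count
#    60     6
```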
