colMeans(): Calculating the Mean of Columns in an R Data Frame

The colMeans() function in R calculates the arithmetic mean of each column in a numeric matrix, data frame, or array. For each column, it sums the elements and divides by the number of elements (or by the number of non-missing elements when na.rm = TRUE).

Figure of calculating mean of every column of data frame in R

In the above figure, we defined an input data frame, df, that contains three columns.

The colMeans() function accepts the whole data frame and returns a column-wise mean.

Here is the program that demonstrates the above figure:

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)

# Calculate the mean of every column.
colMeans(df)

#  Output:
#  col1    col2    col3
#    2       5      8
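
As a rough cross-check of what colMeans() computes, the sketch below reproduces the result by hand in two ways: column sums divided by the number of rows, and mean() applied to each column with sapply(). The variable names manual and per_column are illustrative only.

```r
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)

# Sum each column, then divide by the number of rows.
manual <- colSums(df) / nrow(df)

# Apply mean() to each column separately.
per_column <- sapply(df, mean)

# Both approaches agree with colMeans().
all.equal(manual, colMeans(df))
all.equal(per_column, colMeans(df))
```

Both comparisons return TRUE here. colMeans() is generally the more direct choice when every selected column is numeric.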

Syntax

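colMeans(x, na.rm = FALSE, dims = 1)

Parameters

Argument    Description
x           An array of two or more dimensions containing numeric, complex, integer, or logical values, or a numeric data frame.
na.rm       Logical. If TRUE, NA values are dropped before the mean is computed. The default is FALSE.
dims        Integer. For colMeans(), the mean is taken over dimensions 1:dims. The default, 1, averages over the rows to give one mean per column.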
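
The following sketch illustrates the kinds of input x can take. A logical matrix is averaged as 0/1 values, so each column mean is the proportion of TRUE entries. For arrays with more than two dimensions, base R's colMeans() also takes a dims argument (default 1) that controls how many leading dimensions are averaged over. The object names flags and arr are illustrative.

```r
# Logical input: TRUE counts as 1, FALSE as 0.
flags <- matrix(c(TRUE, FALSE, TRUE, TRUE),
  nrow = 2,
  dimnames = list(NULL, c("a", "b"))
)
colMeans(flags)
# a = 0.5, b = 1.0

# Three-dimensional array with dimensions 2 x 3 x 4.
arr <- array(1:24, dim = c(2, 3, 4))

# Default dims = 1: average over the first dimension only,
# giving a 3 x 4 matrix of means.
dim(colMeans(arr))

# dims = 2: average over the first two dimensions,
# giving one mean per 2 x 3 slice.
colMeans(arr, dims = 2)
# 3.5  9.5 15.5 21.5
```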

    Mean of each column and exclude NA values

    What if a data frame contains NA values, and we try to calculate the mean of every column?

    Figure of calculating the mean of every column containing NA values

    df <- data.frame(
      col1 = c(NA, 2, 3),
      col2 = c(4, NA, 6),
      col3 = c(7, 8, NA)
    )
    
    # Calculate the mean of every column.
    colMeans(df)
    
    #  Output:
    #  col1  col2   col3
    #   NA    NA     NA

    Every column returns NA, as shown in the above figure and code. Each column contains at least one NA (a missing value), and with the default na.rm = FALSE, the mean of any column that includes an NA is itself NA.

    To ignore NA values in a data frame, you need to pass na.rm = TRUE.

    Handling NA values while using the colMeans() function in R

    # Create a data frame.
    df <- data.frame(
      col1 = c(NA, 2, 3),
      col2 = c(4, NA, 6),
      col3 = c(7, 8, NA)
    )
    
    # Calculate the mean of every column.
    colMeans(df, na.rm = TRUE)
    
    # Output:
    #  col1   col2   col3
    #  2.5    5.0    7.5
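
To make the effect of na.rm = TRUE concrete, this sketch counts the non-missing values per column and divides the NA-free column sums by those counts. It also shows one edge case worth knowing: a column made up entirely of NA values yields NaN, since there is nothing left to average. The names n_valid, manual, and all_missing are illustrative.

```r
df <- data.frame(
  col1 = c(NA, 2, 3),
  col2 = c(4, NA, 6),
  col3 = c(7, 8, NA)
)

# Number of non-missing values in each column (2 in every column here).
n_valid <- colSums(!is.na(df))

# Sum of the non-missing values divided by their count.
manual <- colSums(df, na.rm = TRUE) / n_valid

all.equal(manual, colMeans(df, na.rm = TRUE))

# A column containing only NA values has no values left to average.
all_missing <- data.frame(a = c(NA_real_, NA_real_), b = c(1, 3))
colMeans(all_missing, na.rm = TRUE)
# a is NaN, b is 2
```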
    

    Calculating the mean of specific columns

    You can calculate the mean of specific columns by subsetting the data frame, by column name or by column index, before passing it to colMeans().

    Figure of Calculating the mean of specific columns

    df <- data.frame(
      col1 = c(1, 2, 3),
      col2 = c(4, 5, 6),
      col3 = c(7, 8, 9)
    )
    
    # Calculate the mean of col1 and col3.
    colMeans(df[c("col1", "col3")])
    
    #  Output:
    #  col1    col3
    #    2      8
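
Columns can also be picked by position, and a common variation is to keep only the numeric columns before calling colMeans(), since a character column makes the call fail. A minimal sketch, assuming a hypothetical data frame named mixed with one character column named label:

```r
mixed <- data.frame(
  label = c("a", "b", "c"),
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6)
)

# Select columns by position: the second and third columns.
colMeans(mixed[c(2, 3)])
# col1 = 2, col2 = 5

# Keep only the numeric columns, whatever their position.
numeric_cols <- sapply(mixed, is.numeric)
colMeans(mixed[numeric_cols])
# col1 = 2, col2 = 5
```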

    For row-wise means, you can use the rowMeans() function.
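
As a short illustration, here is rowMeans() applied to the same three-column data frame used earlier. It averages across each row instead of down each column.

```r
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)

# Mean of each row: (1 + 4 + 7) / 3, (2 + 5 + 8) / 3, (3 + 6 + 9) / 3
rowMeans(df)
# 4 5 6 (named by row: 1, 2, 3)
```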

    With a Matrix

    A matrix has rows and columns, and colMeans() averages down each column. It works the same way as it does with a data frame.

    mat <- matrix(1:6,
      nrow = 2, ncol = 3,
      dimnames = list(NULL, c("A", "B", "C"))
    )
    
    colMeans(mat)
    
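    # Output:
    #    A    B    C
    #  1.5  3.5  5.5

    It returns the mean of each column. Because matrix() fills values column by column, column A holds 1 and 2, column B holds 3 and 4, and column C holds 5 and 6.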
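
For comparison, the same column means can be obtained with apply() and MARGIN = 2, which calls mean() once per column. This is only a sketch to show the equivalence; the name via_apply is illustrative.

```r
mat <- matrix(1:6,
  nrow = 2, ncol = 3,
  dimnames = list(NULL, c("A", "B", "C"))
)

# Print the matrix to see how the values 1 to 6 are laid out.
mat

# apply() with MARGIN = 2 calls mean() once per column.
via_apply <- apply(mat, 2, mean)

# The result matches colMeans().
all.equal(via_apply, colMeans(mat))
```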

    Matrix with a single element

    If the input matrix contains a single element, colMeans() returns that element, because the mean of a single value is the value itself.

    single_mat <- matrix(5)
    
    colMeans(single_mat)
    
    # Output: 5
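
A single-element matrix works because it still has two dimensions. A bare vector does not, which matters when selecting one column from a data frame: df[, "col1"] is simplified to a vector, and colMeans() rejects it. The sketch below shows one way around this, using drop = FALSE. The error text stored in result is a placeholder written for this example, not R's own message.

```r
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6)
)

# df[, "col1"] is simplified to a plain vector, which colMeans() rejects.
result <- tryCatch(
  colMeans(df[, "col1"]),
  error = function(e) "error: input has fewer than two dimensions"
)
result

# drop = FALSE keeps the one-column data frame, so colMeans() works.
colMeans(df[, "col1", drop = FALSE])
# col1 = 2
```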

    That’s all!
