# colMeans in R: How to Use colMeans() Function in R

As a data scientist, you often work with datasets in which each column represents a category and each row holds the values for one item.

The colMeans() function is useful for finding the mean value of each such column. But what is the colMeans() function, and how do you use it with a numeric matrix, array, data frame, or built-in dataset? Let’s find out in detail.

## colMeans in R

colMeans() is a built-in R function that calculates the mean of each column of a numeric matrix, array, or data frame. To get the means of specific columns only, subset the data before passing it to colMeans().

### Syntax

``colMeans(x, na.rm = FALSE, dims = 1)``

### Parameters

x: It is an array of two or more dimensions, containing numeric, complex, integer, or logical values, or a numeric data frame.

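dims: It is an integer that specifies which dimensions are regarded as ‘columns’ to average over. For colMeans(), the mean is taken over dimensions 1:dims. With the default, dims = 1, the mean is taken over the first dimension (the rows), which gives one mean per column.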

na.rm: It is a logical argument. If TRUE, NA values are ignored.
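To make the dims argument more concrete, here is a minimal sketch on a hypothetical 2 x 3 x 4 array (the name arr3 and its contents are made up for illustration). With dims = 1 the mean is taken over the first dimension only; with dims = 2 it is taken over the first two dimensions.

```r
# Hypothetical 2 x 3 x 4 array holding the values 1 to 24
arr3 <- array(1:24, dim = c(2, 3, 4))

# dims = 1 (default): average over dimension 1, leaving a 3 x 4 matrix
m1 <- colMeans(arr3)
dim(m1)

# dims = 2: average over dimensions 1 and 2, leaving one mean per 2 x 3 slice
m2 <- colMeans(arr3, dims = 2)
m2
```

Here m2 has one value per slice along the third dimension, because each 2 x 3 slice is averaged as a whole.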

### Example

Let’s create a matrix using the matrix() function and calculate the mean of each of its columns.

``````rv <- rep(1:4)

mtrx <- matrix(rv, 2, 2)
mtrx
cat("The mean of columns is: ", "\n")
colMeans(mtrx)``````

#### Output

``````     [,1] [,2]
[1,]    1    3
[2,]    2    4

The mean of columns is:

[1] 1.5 3.5``````

The rep() function replicates the values of a vector a given number of times. Called without a times argument, as here, rep(1:4) simply returns 1:4.

The matrix() function creates a 2 x 2 matrix, filled column by column.

The mean of the first column is 1.5 because (1 + 2) / 2 = 1.5. Likewise, the mean of the second column is (3 + 4) / 2 = 3.5.
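The arithmetic above can be checked in code. The sketch below compares colMeans() with two equivalent calculations, one using colSums() and nrow() and one using apply(); the variable names by_hand and by_apply are only illustrative.

```r
mtrx <- matrix(1:4, 2, 2)

# Column means computed by hand: column sums divided by the number of rows
by_hand <- colSums(mtrx) / nrow(mtrx)

# The same calculation using apply() over margin 2 (the columns)
by_apply <- apply(mtrx, 2, mean)

all.equal(colMeans(mtrx), by_hand)
all.equal(colMeans(mtrx), by_apply)
```

All three approaches agree on this small matrix; colMeans() is simply the more direct way to write it.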

## Calculate the mean of columns of the array in R

To create an array in R, use the array() function. Let’s create an array and use the colMeans() function to calculate the mean of columns of the array.

``````arr <- array(1:4, c(2, 2, 2))
arr
cat("The mean of columns is: ", "\n")
colMeans(arr)``````

#### Output

``````, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    1    3
[2,]    2    4

The mean of columns is:
     [,1] [,2]
[1,]  1.5  1.5
[2,]  3.5  3.5``````
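For a three-dimensional array, the result is a matrix in which entry [j, k] is the mean of column j in slice k, that is, the mean of arr[, j, k]. A quick check of two entries, using the same array as in the example (the name res is only illustrative):

```r
arr <- array(1:4, c(2, 2, 2))
res <- colMeans(arr)

# Entry [j, k] of the result is the mean of column j in slice k
res[2, 1] == mean(arr[, 2, 1])
res[1, 2] == mean(arr[, 1, 2])
```

This is why both columns of the result are identical here: the two slices of arr contain the same values.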

## Calculating the mean of columns of a data frame in R

To create a data frame in R, use the data.frame() function. To calculate the mean of columns of the data frame, use the colMeans() function.

``````x <- c(2:4)
y <- c(2:4 * 2)
z <- c(2:4 * 3)
w <- c(2:4 * 4)

df <- data.frame(x, y, z, w)
df
cat("The mean of columns of df is: ", "\n")
colMeans(df)``````

#### Output

``````  x y  z  w
1 2 4  6  8
2 3 6  9 12
3 4 8 12 16

The mean of columns of df is:

 x  y  z  w
 3  6  9 12``````
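colMeans() expects a numeric data frame, as noted in the description of x. For a data frame that also contains non-numeric columns, one possible approach is to keep only the numeric columns before averaging. The data frame df_mixed and its name column below are hypothetical and serve only as an illustration.

```r
# Hypothetical data frame with one character column and two numeric columns
df_mixed <- data.frame(
  name = c("a", "b", "c"),
  x = 2:4,
  y = 2:4 * 2
)

# Flag the numeric columns, then average only those
is_num <- sapply(df_mixed, is.numeric)
num_means <- colMeans(df_mixed[, is_num, drop = FALSE])
num_means
```

The drop = FALSE argument keeps the selection a data frame even if only one numeric column remains.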

## Calculate the mean of columns of a data set in R

You can calculate the mean of columns of the dataset in R using the colMeans() function. We will use the USArrests dataset.

``colMeans(USArrests)``

#### Output

``````  Murder  Assault UrbanPop     Rape
   7.788  170.760   65.540   21.232``````
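If only some of the columns are of interest, the dataset can be subset before calling colMeans(). A minimal sketch using two columns of USArrests (the name sel_means is only illustrative):

```r
# Means of two chosen columns of the built-in USArrests dataset
sel_means <- colMeans(USArrests[, c("Murder", "Rape")])
sel_means
```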

## Handling NA Values (na.rm) in colMeans() function

One of the most common issues when using the colMeans() function is the presence of NAs (i.e., missing values) in the data. Let’s see what happens when we apply the function to data with missing values.

``````x <- c(1, 2, NA, 3)
y <- c(NA, 4, 5, 6)
z <- c(7, NA, 8, 9)
w <- c(10, 11, NA, 13)

df <- data.frame(x, y, z, w)
df
cat("The mean of columns of df is: ", "\n")
colMeans(df)``````

#### Output

``````   x  y  z  w
1  1 NA  7 10
2  2  4 NA 11
3 NA  5  8 NA
4  3  6  9 13

The mean of columns of df is:

 x  y  z  w
NA NA NA NA``````

Every mean in the output is NA because each column contains an NA. With the default na.rm = FALSE, a single NA in a column makes that column’s mean NA.

There is an easy solution: pass na.rm = TRUE to colMeans().

``````x <- c(1, 2, NA, 3)
y <- c(NA, 4, 5, 6)
z <- c(7, NA, 8, 9)
w <- c(10, 11, NA, 13)

df <- data.frame(x, y, z, w)
cat("The mean of columns of df is: ", "\n")
colMeans(df, na.rm = TRUE)``````

#### Output

``````The mean of columns of df is:
       x        y        z        w
 2.00000  5.00000  8.00000 11.33333``````

As you can see, colMeans() ignored the NA values and calculated the mean of the remaining values in each column. Please note that the handling of missing values is a research topic in itself, and simply ignoring NA values is often not the best idea.
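Before dropping missing values, it can help to see how many there are in each column. The sketch below uses a reduced version of the df from this section and adds a hypothetical all-NA column v (not part of the original example) to show one edge case: with na.rm = TRUE, a column with no non-missing values yields NaN rather than a number.

```r
df <- data.frame(
  x = c(1, 2, NA, 3),
  y = c(NA, 4, 5, 6),
  v = c(NA_real_, NA_real_, NA_real_, NA_real_)  # hypothetical all-NA column
)

# Number of missing values per column
na_counts <- colSums(is.na(df))
na_counts

# Means with NAs dropped; column v has nothing left to average
means <- colMeans(df, na.rm = TRUE)
means
```

Checking na_counts first shows how much data each mean is actually based on, which is useful when deciding whether dropping the NAs is acceptable.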

That is it for this tutorial on the colMeans() function in R.