Calculating Mean, Median, and Mode in R

For understanding the central tendency of your input dataset, you need to calculate the basic summaries like mean, median, and mode. It gives you the data distribution, which tells you whether it is symmetrical or skewed.

Mean

Mean means the arithmetic average of a number in mathematics. An average is the sum of the total numbers divided by the count of numbers.

Calculating simple mean or average in maths

The above figure shows the mean of individual numbers, which is 2.5.

Vector

To calculate the arithmetic mean of a vector, use the mean() function.

Calculating the mean of a vector

In the above figure, we calculated the mean of the vector containing seven elements.

vec <- c(11, 21, 19, 18, 51, 51, 71)

# Calculating average using mean() 
mean(vec)

# [1] 34.57143

Data frame column

You can calculate the mean of a data frame column using the “mean(df$column)” expression.

Calculating the mean of a data frame column

The above figure returns the mean of DataFrame’s price column.

df <- data.frame(
  id = c(11, 22, 33, 44, 55),
  price = c(19, 46, 21, 11, 18)
)

# Calculate mean of DataFrame column
mean_of_col <- mean(df$price)

mean_of_col

# [1] 23

Excluding NA values

By default, if data frame column has NA, mean() returns NA

If your data frame contains NA values, it does not exclude them by default and returns NA as an output.

This is a default principle of missing or unknown values that makes the mean undefined or not meaningful.

df <- data.frame(
  id = c(11, 22, 33, 44, NA),
  price = c(19, 46, 21, 11, NA)
)

# Calculate mean of DataFrame column
mean_of_col <- mean(df$price)

mean_of_col

# [1] NA

Passing the na.rm = TRUE

In the above figure, we passed the na.rm = TRUE to the mean() function to skip the NA value while calculating the mean of the remaining column values.

df <- data.frame(
  id = c(1, 2, 3, 4, NA),
  price = c(11, 22, 33, 44, NA)
)

# Calculate mean of DataFrame column
mean_of_col <- mean(df$price, na.rm = TRUE)

mean_of_col

# [1] 24.5

In the above code example, we calculated the mean value of the price column’s (11, 22, 33, 44) values, excluding the NA values.

Median

The median is the middle value of the ordered dataset from least to greatest. 

Vector

You can use the built-in median() function to calculate the vector’s median. For an odd number of values in a vector, it will return the exact middle value. For an even number of values, it returns the average of the two medians.

Calculating the median of odd elements of a vector

vec <- 1:5 

median(vec)

# [1] 3

Let’s calculate for an even number of elements:

Calculating the median of even elements of a vector

vec <- 1:6 median

(vec)

# [1] 3.5

Data Frame

If you want to calculate the median of the data frame, you can do it by calculating a specific numeric column. For example, median(df$column).

Calculating the mode of data frame column using DescTools's Mode() function

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)

# Calculate median of col1
median(df$col1)

# [1] 2

Handling NA values

If your data frame contains NA values, it will return NA as an output.

Calculating the median on a column that has NA values

The above figure shows that we have a missing value, NA, as an output.

However, we can ignore the NA values in the dataset by using the na.rm = TRUE argument.

Passing na.rm = TRUE to exclude NA values

df <- data.frame(
  col1 = c(1, NA, 3),
  col2 = c(4, 5, 6),
  col3 = c(7, 8, 9)
)

# Calculate median of col1
median(df$col1, na.rm = TRUE)
 # [1] 2

Mode

The mode is the value that appears most frequently in your dataset. A datset can have unimode (1 mode), bimodal (2 modes), or multimodes. Using mode, you can identify the most common category or value in the dataset.

There is no built-in function in R that you can use to calculate the mode.

However, there are two different ways:

  1. Using the DescTools package’s Mode() function
  2. Creating a custom function

Method 1: DescTools package’s Mode()

Install and load the DescTools package:

install.packages("DescTools")

library(DescTools)

Installing DescTools

Mode of a numeric vector

Calculating mode of numeric vector

library(DescTools)

numeric_vector <- c(11, 21, 19, 19, 12)

cat("The mode of numeric vector:", Mode(numeric_vector), "\n")

# The mode of numeric vector: 19

Mode of a character vector

Mode of a character vector

library(DescTools)

character_vector <- c("k", "l", "b", "k")

cat("The mode of character vector:", Mode(character_vector), "\n")

# The mode of character vector: k

Method 2: Creating a custom function

Let’s create a custom function to calculate the mode for data frame columns or a dataset.

Calculating the mode using custom function of data frame columns

mode <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  }
  val <- unique(x)
  return(val[which.max(tabulate(match(x, val)))])
}

df <- data.frame(
  col1 = c(1, 2, 3, 2),
  col2 = c(4, 5, 6, 6),
  col3 = c("a", "b", "k", "b")
)

cat("The mode of col1 column:", mode(df$col1), "\n")
cat("The mode of col2 column:", mode(df$col2), "\n")
cat("The mode of col3 column:", mode(df$col3), "\n")


# The mode of col1 column: 2
# The mode of col2 column: 6
# The mode of col3 column: b

That’s it!

Leave a Comment