How to Calculate Average By Group in R

To calculate the average by group in R, you can use the “aggregate()” function, the dplyr package’s “group_by()” function, or the “data.table” package.

Method 1: Using the aggregate() function

The easiest way to calculate the average of a variable by group in R is to use the “aggregate()” function. The aggregate() function is used to apply a function to each group of a data frame and return the results in a new data frame.

Example

To work with the aggregate() function, we need a data frame, and to create a data frame in R, we use the data.frame() function.

df <- data.frame(
 students = c("Michael", "Justin", "Taylor", "Selena", 
              "Michael", "Michael", "Taylor"),
 marks = c(90, 80, 85, 75, 89, 87, 89),
 levels = c(10, 7, 8, 7, 6, 8, 5)
)

print(df)
cat("----Calculating the average by name----", "\n")
aggregate(df$marks, list(df$students), FUN = mean)

Output

   students  marks  levels
1   Michael   90     10
2   Justin    80      7
3   Taylor    85      8
4   Selena    75      7
5   Michael   89      6
6   Michael   87      8
7   Taylor    89      5

----Calculating the average by name----

   Group.1      x
1   Justin   80.00000
2   Michael  88.66667
3   Selena   75.00000
4   Taylor   87.00000

In this call, the aggregate() function takes three arguments: the values to average (df$marks), a list of the grouping variables (list(df$students)), and the function to apply to each group (FUN). In this case, the function is mean, which calculates the average.

You can see that the output is a new data frame with two columns: Group.1, which holds the group (the student name), and x, which holds the average of marks for each group.
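The positional call above labels its result columns Group.1 and x. As a minimal sketch, assuming the same sample data, the formula interface of aggregate() keeps the original column names instead. The name avg_by_student is only an illustrative choice.

```r
# Same sample data as in the example above
df <- data.frame(
  students = c("Michael", "Justin", "Taylor", "Selena",
               "Michael", "Michael", "Taylor"),
  marks = c(90, 80, 85, 75, 89, 87, 89),
  levels = c(10, 7, 8, 7, 6, 8, 5)
)

# Formula interface: read as "marks grouped by students"
avg_by_student <- aggregate(marks ~ students, data = df, FUN = mean)

# The result columns are named students and marks
print(avg_by_student)
```

The formula form is often easier to read when the grouping variable and the value live in the same data frame, because the column names carry over to the result.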

Method 2: Using the dplyr package

To calculate the average by group using the dplyr package, you can use the “group_by()” function together with a summarise function such as “summarise_at()”. To work with the dplyr package in R, you must first install it, for example with install.packages("dplyr").

library("dplyr")

df <- data.frame(
  students = c("Michael", "Justin", "Taylor", "Selena", 
               "Michael", "Michael", "Taylor"),
  marks = c(90, 80, 85, 75, 89, 87, 89),
  levels = c(10, 7, 8, 7, 6, 8, 5)
)

print(df)
cat("----Calculating the average by group using dplyr----", "\n")
df %>%
  group_by(students) %>%
  summarise_at(vars(marks), list(group = mean))

Output

   students  marks  levels
1   Michael   90     10
2   Justin    80      7
3   Taylor    85      8 
4   Selena    75      7
5   Michael   89      6
6   Michael   87      8
7   Taylor    89      5

----Calculating the average by group using dplyr----

# A tibble: 4 × 2
    students   group
     <chr>     <dbl>
1   Justin      80
2   Michael     88.7
3   Selena      75
4   Taylor      87

You can see that group_by() followed by summarise_at() returns a 4 x 2 tibble. A tibble is a type of data frame in R designed to be easier to use and more consistent than standard data frames.
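The example above uses summarise_at(), which recent dplyr releases mark as superseded. A minimal sketch of the same calculation with plain summarise() is shown here, assuming the same sample data. The column name average and the variable avg_by_student are illustrative choices, not part of the original example.

```r
library(dplyr)

# Same sample data as in the example above
df <- data.frame(
  students = c("Michael", "Justin", "Taylor", "Selena",
               "Michael", "Michael", "Taylor"),
  marks = c(90, 80, 85, 75, 89, 87, 89),
  levels = c(10, 7, 8, 7, 6, 8, 5)
)

# Group the rows by student, then compute one mean per group
avg_by_student <- df %>%
  group_by(students) %>%
  summarise(average = mean(marks))

print(avg_by_student)
```

Naming the result column directly inside summarise() avoids the vars() and list() wrappers when only one column is being averaged.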

Method 3: Using the data.table package

To find the average by group using the data.table package, call mean() inside the square brackets of a data.table and pass the grouping column to the by argument. The data.table package provides a high-performance version of the data.frame class, with additional functionality for working with large datasets.

To work with the data.table package, you need to install it first, for example with install.packages("data.table"). After that, load the package at the top of your R file.

After creating the data frame, convert it to a data.table using the setDT() function, which converts the object in place without making a copy.

library("data.table")

df <- data.frame(
  students = c("Michael", "Justin", "Taylor", "Selena", 
               "Michael", "Michael", "Taylor"),
  marks = c(90, 80, 85, 75, 89, 87, 89),
  levels = c(10, 7, 8, 7, 6, 8, 5)
)

print(df)
setDT(df)
cat("----Calculating the average by group using data.table----", "\n")
df[ ,list(average=mean(marks)), by=students]

Output

   students   marks   levels
1   Michael    90       10
2   Justin     80        7
3   Taylor     85        8
4   Selena     75        7
5   Michael    89        6
6   Michael    87        8
7   Taylor     89        5

----Calculating the average by group using data.table----

    students     average
1:  Michael     88.66667
2:  Justin      80.00000
3:  Taylor      87.00000
4:  Selena      75.00000

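In this example, we defined a data frame and then converted it to a data.table using the setDT() function.

In the next step, we calculated the average of marks for each student by calling mean(marks) with by=students. Unlike aggregate(), data.table keeps the groups in order of first appearance, so Michael comes first in the output.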
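The same pattern extends to several summaries per group. The following is a minimal sketch, assuming the same sample data, that builds the table directly with data.table() and adds a row count using the special symbol .N. The names dt, result, and count are illustrative choices.

```r
library(data.table)

# Build the table directly as a data.table (no setDT() needed)
dt <- data.table(
  students = c("Michael", "Justin", "Taylor", "Selena",
               "Michael", "Michael", "Taylor"),
  marks = c(90, 80, 85, 75, 89, 87, 89),
  levels = c(10, 7, 8, 7, 6, 8, 5)
)

# .() is shorthand for list(); .N is the number of rows in each group
result <- dt[, .(average = mean(marks), count = .N), by = students]

print(result)
```

Seeing the count next to each average makes it clear which means rest on a single observation, such as Justin and Selena in this data.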
