R Advanced

How to Calculate the Sum by Group in R Data Frame

In R, you can calculate the sum by group using the base aggregate(), dplyr’s group_by() with summarise(), or the data table package.

What do we mean when we say sum by group? Well, it means dividing the dataset into meaningful groups based on categorical variables and then calculating the summation of the values within each group.

For example, you may need to calculate the total number of quantities sold per category or product. Another example is summing the total revenue, which you can calculate as (Price × Quantity), for each group.

For this practical, we will use the df_sales_data data frame:

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)

Method 1: Using aggregate()

If you are looking for a built-in solution that does not require packages, you should use the aggregate() function. It accepts a formula for summarization, a dataset, and a function to apply and returns data summarised by groups.

Sum by single group

Let’s calculate the Sum of Quantities by Category:

library(data.table)


df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)

# Calculate sum of quantities by category
aggregate(Quantity ~ Category, data = df_sales_data, FUN = sum)

Output

The above output image shows that we are calculating the total number of quantities based on each category.

  • The Bakery category has only one product, which has only 3 quantities.
  • The dairy category has 3 products, and the total quantities for the category are 16 (2 + 12 + 2).
  • The fruit category has 3 products (2 Apples and 1 Banana), and the total quantities for the category are 20 (5 + 10 + 5).

Sum by multiple groups

Let’s calculate the sum of the Quantity group by Category and Product.

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)


# Sum by Multiple Groups
aggregate(Quantity ~ Category + Product, data = df_sales_data, FUN = sum)

Output

The above figure shows that we group data frames based on Category and Product and calculate the sum of quantities for each unique product.

Sum of multiple columns by single group

Let’s calculate the sum of multiple columns (Quantity, Price) grouped by Category.

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)


# Sum Multiple Columns
aggregate(cbind(Quantity, Price) ~ Category, data = df_sales_data, FUN = sum)

Output

The above figure shows the sum of multiple columns categorywise.

Method 2: Using group_by() with summarise()

The dplyr package provides an easy way to group and summarize data frame using the group_by() and summarise() functions.

The group_by() function groups data frame by categorical column.

The summarise() function applies aggregation functions like sum(), mean(), count(), etc. For example, we have already calculated the mean by group and count rows by group.

The output from dplyr functions is a tibble, which is an extended version of the data frame.

Sum by single group

library(dplyr)


df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)


# Using dplyr Approach
# Basic Grouped Sum

df_sales_data %>%
  group_by(Category) %>%
  summarize(Total_Quantity = sum(Quantity))

Output

Grouped sum with multiple columns

Let’s calculate the sum of the total quantity and the sum of the total price grouped by Category and Product.

library(dplyr)


df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)


# Grouped Sum with Multiple Columns
df_sales_data %>%
  group_by(Category, Product) %>%
  summarize(Total_Quantity = sum(Quantity), Total_Price = sum(Price))

Output

Sum all numeric columns

If you want to sum all numeric columns while grouping by one column, you can use group_by(), summarise(), across(), where(), and is.numeric() in dplyr.

library(dplyr)


df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)



# Sum All Numeric Columns

df_sales_data %>% 
  group_by(Category) %>%
  summarise(across(where(is.numeric), sum))

Output

Method 3: Using data.table package

The data.table package provides a highly optimized way to perform data manipulation.

Basic grouped sum

library(data.table)


df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)


# Convert to data.table
dt_sales <- as.data.table(df_sales_data)

# Basic Grouped Sum
dt_sales[, .(Total_Quantity = sum(Quantity)), by = Category]

Output

Grouped sum with multiple columns

library(data.table)


df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)


# Convert to data.table
dt_sales <- as.data.table(df_sales_data)


# Grouped Sum with Multiple Columns
dt_sales[, .(Total_Quantity = sum(Quantity), Total_Price = sum(Price)), by = .(Category, Product)]

Output

If your data set is very large, I highly recommend you use data.table package for the efficient performance.

Recent Posts

How to Calculate Percentage by Group in R Data Frame

To calculate the percentage by the group in R, you need to combine various dplyr…

14 hours ago

How to Calculate the Mean by Group in R Data Frame

What does it mean when we say "mean by group"? It means grouping the data…

3 days ago

How to Count Number of Rows by Group using dplyr in R

Whether you want to summarize the categorical data, identify patterns and trends, or calculate percentages…

4 days ago

dplyr group_by() Function in R

The group_by() function from the dplyr package allows us to group data frames by one…

6 days ago

R dplyr::slice() Function

The dplyr::slice() function subsets rows by their position or index within a data frame. If…

1 week ago

How to Create an Empty Vector and Append Values in R

R vectors are atomic, which means they have homogeneous data types. They are contiguous in…

1 week ago