R dplyr

R dplyr::summarise() or summarize() Function

The dplyr summarise()(or summarize()) function aggregates data into a single summary value for each group or the entire dataset if ungrouped.  

It collapses multiple rows into a concise statistical summary, such as the mean, sum, and count

Developers often use summarize() with group_by(), which splits the data into groups based on one or more categorical variables (Columns you can use for groupings).

They then apply the summary function(s), such as mean and sum, to each group separately to create a comprehensive summary that makes sense.

The resulting data frame (tibble) has a single row or row for each group, with the summary statistic(s) for the entire data frame or for that group.

Syntax

summarise(.data, .by = NULL, .groups = NULL)

# OR

summarize(.data, .by = NULL, .groups = NULL, ...)

Parameters

Name Value
.data It is an input data frame, tibble, or other dataset type.
.by It suggests grouping things before performing operations like mean, sum, count, etc.
.groups It determines how to maintain groups after operating on them.
It tells whether you want to count or summarise.

Example data frame

We will use the stock_market_data data frame.

library(dplyr)

stock_market_data <- data.frame(
  Stock_Symbol = c("AAPL", "TSLA", "MSFT", "NVDA", "JPM", "AMZN", "GOOGL"),
  Industry = c(
    "Technology", "Automotive", "Technology",
    "Technology", "Banking", "E-commerce", "Technology"
  ),
  Share_Price = c(150.75, 720.46, 210.22, 200.36, 135.78, 3456.22, 2800.12)
)

print(stock_market_data)

Basic summary of the data frame

We can calculate the basic summary of the column Share_Price of the data frame by calculating the mean of that column without any type of grouping.

library(dplyr)

stock_market_data <- data.frame(
  Stock_Symbol = c("AAPL", "TSLA", "MSFT", "NVDA", "JPM", "AMZN", "GOOGL"),
  Industry = c(
    "Technology", "Automotive", "Technology",
    "Technology", "Banking", "E-commerce", "Technology"
  ),
  Share_Price = c(150.75, 720.46, 210.22, 200.36, 135.78, 3456.22, 2800.12)
)


print(stock_market_data)

Output

The above output figure illustrates that we utilized the aggregate function mean() for the Share_Price column to calculate the average price for the entire data frame.

You can analyze the data as follows: To buy at least one share of any company, you must have a minimum balance of $1,096.27 in your bank account.

Grouped summary

Summary without a group is like counting your candy without sorting your flavors.

Let’s calculate the average share price by industry:

library(dplyr)

stock_market_data <- data.frame(
  Stock_Symbol = c("AAPL", "TSLA", "MSFT", "NVDA", "JPM", "AMZN", "GOOGL"),
  Industry = c(
    "Technology", "Automotive", "Technology",
    "Technology", "Banking", "E-commerce", "Technology"
  ),
  Share_Price = c(150.75, 720.46, 210.22, 200.36, 135.78, 3456.22, 2800.12)
)


print(stock_market_data)

# Grouped Summary
stock_market_data %>%
  group_by(Industry) %>%
  summarise(avg_price = mean(Share_Price))

Output

The above figure shows the average Share_Price for each Industry. The output is a 4X2 tibble, as you can see.

Multiple summaries

Multiple summaries indicate that you are calculating the average and maximum share prices for the data frame, grouped by industry.

library(dplyr)

stock_market_data <- data.frame(
  Stock_Symbol = c("AAPL", "TSLA", "MSFT", "NVDA", "JPM", "AMZN", "GOOGL"),
  Industry = c(
    "Technology", "Automotive", "Technology",
    "Technology", "Banking", "E-commerce", "Technology"
  ),
  Share_Price = c(150.75, 720.46, 210.22, 200.36, 135.78, 3456.22, 2800.12)
)


print(stock_market_data)


# Multiple Summaries
stock_market_data %>%
  group_by(Industry) %>%
    summarise(
      avg_price = mean(Share_Price),
      min_price = min(Share_Price),
      max_price = max(Share_Price)
  )

Output

In this code, we calculated the average, minimum, and maximum share price (multiple summaries) of Stocks based on Industry. The output is a 4X4 tibble, as you can see.

summarise(across()) with Multiple Columns

The dplyr::across() function allows us to apply multiple summary functions to multiple columns simultaneously.

It is helpful when we want to calculate min, max, count, and mean for multiple columns using the dplyr::summarise() function.

library(dplyr)

stock_market_data <- data.frame(
  Stock_Symbol = c("AAPL", "TSLA", "MSFT", "NVDA", "JPM", "AMZN", "GOOGL"),
  Industry = c(
    "Technology", "Automotive", "Technology",
    "Technology", "Banking", "E-commerce", "Technology"
  ),
  Share_Price = c(150.75, 720.46, 210.22, 200.36, 135.78, 3456.22, 2800.12)
)


print(stock_market_data)


# Summarize across multiple columns with multiple functions
stock_market_data %>%
  summarise(
    across(
      Share_Price, # Column to summarize
      list(mean = mean, max = max), # Functions to apply
      .names = "{col}_{fn}" # Custom naming for output columns
    ),
    unique_industries = n_distinct(Industry) # Additional summary
 )

Output

In this code example, we counted the number of unique industries in our data frame.

We also calculated the mean share price and the max share price from the data frame.

We applied multiple aggregate functions to multiple columns to obtain a summary of the data frame.

Share
Published by
Krunal Lathiya

Recent Posts

cbind() Function: Binding R Objects by Columns

R cbind (column bind) is a function that combines specified vectors, matrices, or data frames…

1 week ago

rbind() Function: Binding Rows in R

The rbind() function combines R objects, such as vectors, matrices, or data frames, by rows.…

1 week ago

as.numeric(): Converting to Numeric Values in R

The as.numeric() function in R converts valid non-numeric data into numeric data. What do I…

2 weeks ago

Calculating Natural Log using log() Function in R

The log() function calculates the natural logarithm (base e) of a numeric vector. By default,…

3 weeks ago

Dollar Sign ($ Operator) in R

In R, you can use the dollar sign ($ operator)  to access elements (columns) of…

1 month ago

Calculating Absolute Value using abs() Function in R

The abs() function calculates the absolute value of a numeric input, returning a non-negative (only…

1 month ago