
How to Calculate Percentage by Group in R Data Frame

To calculate a percentage by group in R, combine dplyr functions such as group_by(), summarise(), mutate(), and ungroup().

Percentage by group is the share that each value (or count) contributes to the total of its group, where the groups are defined by another variable in the dataset.

Here is the core concept behind it:

  1. Split the data frame into subgroups based on the unique values of one or more categorical variables using the group_by() function.
  2. Count the number of occurrences, or sum a specific value, within each subgroup.
  3. Calculate the total count or sum that serves as the denominator: either the total of the enclosing group or the total of the entire dataset.
  4. Finally, divide each count or sum by that total and multiply by 100 to get the percentage by group.
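
The four steps above can be sketched with dplyr as follows. This is a minimal illustration, assuming a small hypothetical data frame `df` with a `Product` and a `Quantity` column; the names `pct_by_product`, `Total_Quantity`, and `Percent` are chosen here for illustration only.

```r
library(dplyr)

# Hypothetical example data (not from a real dataset)
df <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk"),
  Quantity = c(5, 10, 5, 20),
  stringsAsFactors = FALSE
)

pct_by_product <- df %>%
  group_by(Product) %>%                          # Step 1: split into subgroups
  summarise(Total_Quantity = sum(Quantity)) %>%  # Step 2: sum within each subgroup
  ungroup() %>%                                  # drop any remaining grouping
  mutate(Percent = Total_Quantity / sum(Total_Quantity) * 100)  # Steps 3 and 4

print(pct_by_product)
```

In this sketch the denominator is the total over the entire dataset, so the percentages add up to 100 across all products.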

Percentage of Quantity within each Product Group

Let’s calculate the percentage of quantity within each product group.

library(dplyr)


df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Banana", "Butter", "Apple"),
  Quantity = c(5, 10, 5, 2, 10, 12, 5),
  stringsAsFactors = FALSE
)

print(df_sales_data)

# Percentage of Quantity within each Product Group

df_sales_data %>%
  group_by(Product) %>%
  mutate(Percent_Quantity = (Quantity / sum(Quantity)) * 100)

Output

The result is a grouped tibble with 7 rows and 3 columns.

The Percent_Quantity column shows 33.3% for every Apple row because Apple appears three times with a quantity of 5 each, so each row is 5 out of the Apple total of 15.

Banana appears twice with a quantity of 10 each, so each Banana row is 50%.

Milk and Butter each appear only once (with quantities of 2 and 12), so each of those rows accounts for 100% of its product group.
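
One way to sanity-check a within-group percentage is to confirm that the values add up to 100 inside every group. The sketch below repeats the calculation above, removes the grouping with ungroup(), and then sums the percentages per product. The names `df_pct`, `check`, and `Total_Percent` are illustrative assumptions.

```r
library(dplyr)

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Banana", "Butter", "Apple"),
  Quantity = c(5, 10, 5, 2, 10, 12, 5),
  stringsAsFactors = FALSE
)

# Same within-group percentage as above, with the grouping removed afterwards
df_pct <- df_sales_data %>%
  group_by(Product) %>%
  mutate(Percent_Quantity = (Quantity / sum(Quantity)) * 100) %>%
  ungroup()

# Each product's percentages should add up to 100
check <- df_pct %>%
  group_by(Product) %>%
  summarise(Total_Percent = sum(Percent_Quantity))

print(check)
```

Calling ungroup() at the end is optional here, but it helps prevent later operations on `df_pct` from being applied per product by accident.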

Calculate the percentage by sales (price × quantity) within groups

Let’s calculate the percentage of revenue (price × quantity) that each row contributes to the total revenue of its category.

library(dplyr)

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)

# Percentage by Sales (Price × Quantity) Within Groups

df_sales_data %>%
  group_by(Category) %>%
  mutate(Percentage = 100 * (Price * Quantity) / sum(Price * Quantity))

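Output

Within Fruit, each Apple row accounts for about 35.3% of the category's revenue and the Banana row for about 29.4%. Within Dairy, each Milk row accounts for about 14.7% and the Butter row for about 70.6%. Bread is the only Bakery row, so it accounts for 100%.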
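
Because Apple and Milk each appear in two rows, the code above reports a percentage per row rather than per product. If one figure per product is preferred, a possible variation (a sketch, not the only approach) is to total the revenue per product first and then compute each product's share of its category. The names `product_share` and `Revenue` are illustrative assumptions.

```r
library(dplyr)

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

product_share <- df_sales_data %>%
  group_by(Category, Product) %>%
  # Total revenue per product; keep the result grouped by Category only
  summarise(Revenue = sum(Price * Quantity), .groups = "drop_last") %>%
  # Share of each product within its category
  mutate(Percentage = 100 * Revenue / sum(Revenue)) %>%
  ungroup()

print(product_share)
```

The `.groups = "drop_last"` argument leaves the summarised result grouped by Category, so the sum(Revenue) inside mutate() is the category total.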

Using data.table package

For large datasets, consider the data.table package, which is designed for fast, memory-efficient grouped operations.

Convert your input data frame to a data.table and then calculate the frequency percentage of each product within its category.

library(data.table)


df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

print(df_sales_data)


# Convert the data frame to a data.table
dt_sales_data <- as.data.table(df_sales_data)

# Calculate frequency percentage within each category
dt_sales_data[, .(Count = .N), by = .(Category, Product)][
  , Total_count_by_category := sum(Count), by = Category][
  , FrequencyPercent := (Count / Total_count_by_category) * 100][]

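Output

Within Fruit, Apple accounts for 2 of the 3 rows (66.7%) and Banana for 1 (33.3%). Within Dairy, Milk accounts for 2 of the 3 rows (66.7%) and Butter for 1 (33.3%). Bread is the only Bakery row, so its frequency percentage is 100%.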
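
The same `:=` with `by` pattern can also reproduce the revenue percentage from the dplyr section. The following is a minimal sketch, assuming the same sample data; `Revenue` and `Percentage` are column names chosen here for illustration.

```r
library(data.table)

dt_sales_data <- data.table(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2)
)

# Add a revenue column by reference
dt_sales_data[, Revenue := Price * Quantity]

# Percentage of each row's revenue within its category
dt_sales_data[, Percentage := 100 * Revenue / sum(Revenue), by = Category]

print(dt_sales_data)
```

Because `:=` modifies the table in place, no reassignment is needed, and the original row order is preserved.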

That’s all!
