To calculate a percentage by group in R, combine dplyr functions such as group_by(), summarise(), mutate(), and ungroup().
Percentage by group means calculating the percentage of a variable within each group defined by another variable in a dataset.
The core concept is simple: for each row, divide its value by the sum of the values in its group, then multiply by 100.
Let’s calculate the percentage of quantity within each product group.
library(dplyr)

# Sample sales data: some products appear on more than one row
df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Banana", "Butter", "Apple"),
  Quantity = c(5, 10, 5, 2, 10, 12, 5),
  stringsAsFactors = FALSE
)
print(df_sales_data)

# Percentage of Quantity within each Product group
df_sales_data %>%
  group_by(Product) %>%
  mutate(Percent_Quantity = (Quantity / sum(Quantity)) * 100) %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident
Output
The Percent_Quantity column shows 33.3% for each Apple row because Apple appears three times with a quantity of 5 each, so every row is 5 out of a group total of 15.
Banana appears twice with a quantity of 10 each, so each Banana row is 50% of its group.
Milk and Butter each appear only once (with quantities of 2 and 12), so each row accounts for 100% of its own product group.
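The example above keeps every row and uses mutate(). When one row per group is preferable, summarise() can collapse the data first. The following is a minimal sketch of that variation using the same sample data; the names product_share, Total_Quantity, and Percent_Of_Total are illustrative choices and do not appear in the original example. Note that it computes each product's share of the overall quantity, not a within-group percentage.

```r
library(dplyr)

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Banana", "Butter", "Apple"),
  Quantity = c(5, 10, 5, 2, 10, 12, 5),
  stringsAsFactors = FALSE
)

# Collapse to one row per product, then compute each product's
# share of the total quantity across all products.
# (product_share and the new column names are hypothetical.)
product_share <- df_sales_data %>%
  group_by(Product) %>%
  summarise(Total_Quantity = sum(Quantity), .groups = "drop") %>%
  mutate(Percent_Of_Total = Total_Quantity / sum(Total_Quantity) * 100)

print(product_share)
```

Because summarise() is called with .groups = "drop", the result is ungrouped, so sum(Total_Quantity) in the mutate() step runs over all products.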
Let’s calculate the percentage of revenue each product contributes to its category.
library(dplyr)

# Sample sales data with a category, unit price, and quantity per row
df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)
print(df_sales_data)

# Percentage of sales (Price * Quantity) within each Category
df_sales_data %>%
  group_by(Category) %>%
  mutate(Percentage = 100 * (Price * Quantity) / sum(Price * Quantity)) %>%
  ungroup()  # drop the grouping once the percentages are computed
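Output
Within Fruit, each Apple row contributes about 35.3% of the category's revenue and the Banana row about 29.4%. Within Dairy, Butter contributes about 70.6% and each Milk row about 14.7%. Bread is the only Bakery item, so it contributes 100%.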
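In the code above, products that appear on several rows (Apple and Milk) get one percentage per row. If one figure per product is wanted, the rows can be collapsed with summarise() first. The following is a sketch of that variation under the same sample data; revenue_share and the Revenue column are illustrative names, not part of the original example.

```r
library(dplyr)

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)

# Total revenue per product; .groups = "drop_last" keeps the result
# grouped by Category, so the next mutate() works within each category.
# (revenue_share and Revenue are hypothetical names.)
revenue_share <- df_sales_data %>%
  group_by(Category, Product) %>%
  summarise(Revenue = sum(Price * Quantity), .groups = "drop_last") %>%
  mutate(Percentage = 100 * Revenue / sum(Revenue)) %>%
  ungroup()

print(revenue_share)
```

Each category's percentages should add up to 100, which makes a quick sanity check on the result.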
For large datasets, consider the data.table package, which is designed for fast, memory-efficient grouped operations.
Convert your input data frame to a data.table, count the rows for each product within a category, and then divide each count by the category total to get the frequency percentage.
library(data.table)

df_sales_data <- data.frame(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2),
  stringsAsFactors = FALSE
)
print(df_sales_data)

# Convert to data.table
dt_sales_data <- data.table(df_sales_data)

# Calculate frequency percentage within each category
dt_sales_data[, .(Count = .N), by = .(Category, Product)][
  , Total_count_by_category := sum(Count), by = Category][
  , FrequencyPercent := (Count / Total_count_by_category) * 100][]
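Output
Apple accounts for about 66.7% of the rows in Fruit and Banana for about 33.3%. Milk accounts for about 66.7% of the rows in Dairy and Butter for about 33.3%. Bread is the only product in Bakery, so its frequency percentage is 100%.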
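The same data.table syntax can also reproduce the revenue percentage from the earlier dplyr example. The following is a minimal sketch, assuming the same sample data; the Revenue and Percentage columns are added by reference with :=, and Revenue is an illustrative helper column that the original code does not create.

```r
library(data.table)

dt_sales_data <- data.table(
  Product = c("Apple", "Banana", "Apple", "Milk", "Bread", "Butter", "Milk"),
  Category = c("Fruit", "Fruit", "Fruit", "Dairy", "Bakery", "Dairy", "Dairy"),
  Price = c(1.2, 0.5, 1.2, 2.5, 1.8, 2.0, 2.5),
  Quantity = c(5, 10, 5, 2, 3, 12, 2)
)

# Add a revenue column by reference (Revenue is a hypothetical helper column)
dt_sales_data[, Revenue := Price * Quantity]

# Revenue share of each row within its category
dt_sales_data[, Percentage := 100 * Revenue / sum(Revenue), by = Category]

print(dt_sales_data)
```

Because := modifies the table in place, no reassignment is needed, which avoids copying the data on large tables.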
That’s all!
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.