Percentages and percentiles are related, and the two names are sometimes used interchangeably, but they mean different things. A percentage expresses a fraction of a whole out of 100, while a percentile is a value below which a given percentage of the data points in a dataset fall. Both provide useful information about a dataset, but they are not the same, so keep the distinction in mind.
What is a Percentile
The nth percentile of a dataset is the value that has a specific percentage (n) of the data points below it. To illustrate how percentiles work, consider finding the 11th, 19th, and 21st percentiles of the following values.
11 19 21 29 37 46 52
Here is our example, already in ascending order; there are seven values in this dataset. To find a percentile, we take that percentage of the number of values in the dataset, count up that many values (rounding down), and then go to the next value. That value is our percentile. For example, 21% of 7 is 1.47, so we count one value (11) and move to the next one, which makes 19 the 21st percentile.
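The counting procedure can be sketched in a few lines of R. This is only a minimal illustration, assuming the product is rounded down before counting; the helper name manual_percentile is hypothetical and not part of R. R's own quantile() interpolates between values by default, so its results can differ from this simple counting rule.

```r
# The seven example values, already in ascending order
x <- c(11, 19, 21, 29, 37, 46, 52)

# Hypothetical helper: take p percent of the number of values,
# count up that many values (rounded down), then step to the next value.
manual_percentile <- function(values, p) {
  sorted <- sort(values)
  k <- floor(p / 100 * length(sorted))  # number of values to count up
  idx <- min(k + 1, length(sorted))     # next value, capped at the largest
  sorted[idx]
}

manual_percentile(x, 11)  # 0.77 -> count 0 values, next value is 11
manual_percentile(x, 19)  # 1.33 -> count 1 value, next value is 19
manual_percentile(x, 21)  # 1.47 -> count 1 value, next value is 19
```

The cap on idx is a design choice of this sketch so that the 100th percentile returns the largest value instead of running past the end of the vector.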
How to Calculate Percentile in R
To calculate percentiles in R, use the quantile() function. quantile() is a built-in generic function that produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.
Syntax of quantile()
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE, names = TRUE, type = 7, ...)
x: A numeric vector whose sample quantiles are wanted, or an object of a class for which a method has been defined. NA and NaN values are not allowed in numeric vectors unless na.rm is TRUE.
probs: A numeric vector of probabilities with values in [0, 1]. (Values up to 2e-14 outside that range are accepted and moved to the nearby endpoint.)
na.rm: A logical value; if TRUE, any NA and NaN values are removed from x before the quantiles are computed.
names: A logical value; if TRUE, the result has a names attribute. Set it to FALSE for a speedup when there are many probs.
type: An integer between 1 and 9 selecting one of the nine quantile algorithms to be used. The default is 7.
...: Further arguments passed to or from other methods.
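The effect of these arguments is easier to see on a small vector. The sketch below reuses the seven example values and adds one NA purely for illustration; the data is made up and the calls only show how na.rm, names, and type change the result.

```r
# Small illustrative vector with one missing value
x <- c(11, 19, 21, NA, 29, 37, 46, 52)

# na.rm = TRUE drops the NA before the quantiles are computed;
# with the default na.rm = FALSE this call would stop with an error.
quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

# names = FALSE returns a plain numeric vector without the "25%" style labels
q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
q

# type selects the algorithm: type = 1 always returns an observed value,
# while the default type = 7 interpolates between neighbouring values
q_type1 <- quantile(x, probs = 0.21, na.rm = TRUE, type = 1)
q_type7 <- quantile(x, probs = 0.21, na.rm = TRUE, type = 7)
q_type1
q_type7
```

Which type is appropriate depends on the convention you need to match, so it is worth checking the documentation of quantile() before relying on a specific one.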
To load a built-in R dataset explicitly, call data(dataset_name) before you use it, for example data(iris). Datasets from the datasets package, such as iris, are normally available even without this call.
The nth percentile of an observation variable is the value that cuts off the first n percent of the data values when they are sorted in ascending order. We will use the iris dataset and find the percentiles of the Petal.Length column at probabilities 0, 0.25, 0.5, 0.75, and 1, that is, the 0th, 25th, 50th, 75th, and 100th percentiles.
data(iris)
ln <- iris$Petal.Length
quantile(ln, probs = c(0, 0.25, 0.5, 0.75, 1))
  0%  25%  50%  75% 100%
1.00 1.60 4.35 5.10 6.90
We can also find the 19th, 21st, and 46th percentiles of Petal.Length in the iris dataset.
data(iris)
ln <- iris$Petal.Length
quantile(ln, probs = c(0.19, 0.21, 0.46))
  19%   21%   46%
1.500 1.500 4.154
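One way to sanity-check a percentile is to count how much of the data actually lies at or below it. The following is a small sketch of that check for the 46th percentile; the variable names p46 and frac_below are only illustrative.

```r
data(iris)
ln <- iris$Petal.Length

# 46th percentile of petal length (default type = 7)
p46 <- quantile(ln, probs = 0.46, names = FALSE)

# Share of observations at or below that value; it should be close to 0.46
frac_below <- mean(ln <= p46)
frac_below
```

Because quantile() interpolates by default and the data contains ties, the share will not always match the requested probability exactly, but it should land near it.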
Applications of calculating percentile in R
If you have a large dataset, the values at chosen percentiles can tell you a lot about it. For example, they show how concentrated and how skewed the values are.
That is it for this tutorial on calculating percentiles in R.
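As a rough illustration of this idea, the quartiles can be turned into simple measures of spread and asymmetry. This is only a sketch: the names iqr, lower_half, and upper_half are made up for the example, and comparing quartile distances is just one informal way to look at skew.

```r
data(iris)
ln <- iris$Petal.Length

# Quartiles of petal length
q <- quantile(ln, probs = c(0.25, 0.5, 0.75), names = FALSE)

# Interquartile range: width of the middle 50% of the data
iqr <- q[3] - q[1]

# Distance from the median to each quartile; unequal distances hint at skew
lower_half <- q[2] - q[1]   # median minus first quartile
upper_half <- q[3] - q[2]   # third quartile minus median

c(iqr = iqr, lower_half = lower_half, upper_half = upper_half)
```

For Petal.Length the median sits much closer to the third quartile than to the first, which suggests that the lower half of the values is more spread out than the upper half.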