The summary() function in R is a generic function whose output depends on the object it is given. For a numeric vector, it returns the minimum value, first quartile (25th percentile), median (50th percentile), mean, third quartile (75th percentile), and maximum value.
In practice, summary() is often one of the first functions called after importing data or fitting a model, because it gives a quick first look at the data or the model results.
Use it to get a high-level overview of an object such as a data frame, a vector, or a fitted statistical model.
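To make the six statistics concrete, the following minimal sketch recomputes them one by one with base R functions and compares the result with summary(). The vector x is made-up sample data, and the comparison assumes the default quantile method that R uses.

```r
# Hypothetical sample data, used only for illustration
x <- c(7, 10, 12, 15, 18)

# Recompute the six statistics individually with base R
by_hand <- c(
  min(x),
  quantile(x, 0.25, names = FALSE),
  median(x),
  mean(x),
  quantile(x, 0.75, names = FALSE),
  max(x)
)

# as.numeric() drops the names and class that summary() attaches
from_summary <- as.numeric(summary(x))

# Check that both approaches agree
all.equal(by_hand, from_summary)
```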
Syntax
summary(object, maxsum, ...)
Parameters
- object: The R object for which you want a summary.
- maxsum: An integer indicating how many levels should be shown for factors.
- ...: Additional arguments passed on to the method for the object's class.
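The effect of maxsum is easiest to see on a factor with more levels than the limit. The sketch below uses a hypothetical factor named plan; the level names and counts are invented for illustration.

```r
# Hypothetical factor with four levels of decreasing frequency
plan <- factor(c(rep("basic", 4), rep("standard", 3), rep("premium", 2), "trial"))

# Default call: every level is listed with its count
summary(plan)

# maxsum = 3: the two most frequent levels are kept and the
# remaining levels are collapsed into a single "(Other)" entry
top <- summary(plan, maxsum = 3)
top
```

For factors the documented default of maxsum is 100, and for data frames it is 7, so the collapsing only becomes visible with many levels or an explicitly small value.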
Return Value
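The return value depends on the class of the object: a numeric vector yields six summary statistics, a data frame yields a table with one column of statistics per variable, and a fitted model yields a model-specific summary object.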
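One way to see how the return value changes is to inspect its class for a few kinds of input. This is a small sketch, assuming a recent version of R; it uses the built-in cars dataset for the model, which is not part of the examples in this article.

```r
# Summaries of three different kinds of object
num_summary <- summary(c(18, 10, 15, 7, 12))
df_summary <- summary(data.frame(price = c(18, 10, 15, 7, 12)))
lm_summary <- summary(lm(dist ~ speed, data = cars))  # cars ships with base R

# Each call returns an object of a different class
class(num_summary)
class(df_summary)
class(lm_summary)
```

Because the classes differ, each result has its own print method, which is why the printed output looks so different from one example to the next.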
Example 1: Summary of data frame
df <- data.frame(
service_id = c(1:5),
service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
service_price = c(18, 10, 15, 7, 12),
stringsAsFactors = FALSE
)
cat("The summary() of data frame is", "\n")
summary(df)
Output
The summary() of data frame is
   service_id service_name       service_price
 Min.   :1    Length:5           Min.   : 7.0
 1st Qu.:2    Class :character   1st Qu.:10.0
 Median :3    Mode  :character   Median :12.0
 Mean   :3                       Mean   :12.4
 3rd Qu.:4                       3rd Qu.:15.0
 Max.   :5                       Max.   :18.0
Example 2: Summary of list
vec <- 1:5
# Use a name other than "list" so the built-in list() function is not masked
lst <- list(vec)
cat("The summary() of list is", "\n")
summary(lst)
Output
The summary() of list is
     Length Class  Mode
[1,] 5      -none- numeric
Example 3: Summary of array
rv <- c(19, 21)
rv2 <- c(46, 4)
# The four values are recycled to fill all eight cells of the 2 x 2 x 2 array
arr <- array(c(rv, rv2), dim = c(2, 2, 2))
cat("The summary() of array is", "\n")
summary(arr)
Output
The summary() of array is
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   4.00   15.25   20.00   22.50   27.25   46.00
Example 4: Summary of matrix
rv <- c(11, 18, 19, 21)
mtrx <- matrix(rv, nrow = 2, ncol = 2)
cat("The summary() of matrix is", "\n")
summary(mtrx)
Output
The summary() of matrix is
       V1              V2
 Min.   :11.00   Min.   :19.0
 1st Qu.:12.75   1st Qu.:19.5
 Median :14.50   Median :20.0
 Mean   :14.50   Mean   :20.0
 3rd Qu.:16.25   3rd Qu.:20.5
 Max.   :18.00   Max.   :21.0
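Note that a matrix is summarized column by column, unlike the array in Example 3. If a single summary over every cell is wanted instead, one possible approach (a sketch, not the only option) is to flatten the matrix with as.vector() first.

```r
mtrx <- matrix(c(11, 18, 19, 21), nrow = 2, ncol = 2)

# Column-wise summary, one block of statistics per column
summary(mtrx)

# Flatten to a plain vector to get one summary over all four cells
overall <- summary(as.vector(mtrx))
overall
```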
Example 5: Summary of vector
vec <- 1:5
vec
cat("The summary() of vector is", "\n")
summary(vec)
Output
[1] 1 2 3 4 5
The summary() of vector is
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1       2       3       3       4       5
Example 6: Summary of linear regression model
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is treated as the explanatory variable, and the other as the dependent variable.
A common use of summary() is to report the key statistics of a fitted model, such as coefficient estimates, standard errors, and R-squared. In the example below, l_y is generated as l_x plus random noise, so the estimated slope should be close to 1.
set.seed(93274)
l_x <- rnorm(1000)
l_y <- rnorm(1000) + l_x
mod <- lm(l_y ~ l_x)
summary(mod)
Output
Call:
lm(formula = l_y ~ l_x)

Residuals:
    Min      1Q  Median      3Q     Max
-3.7337 -0.6964 -0.0047  0.7333  3.3489

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.02159    0.03292  -0.656    0.512
l_x          1.00156    0.03262  30.707   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.041 on 998 degrees of freedom
Multiple R-squared:  0.4858,  Adjusted R-squared:  0.4853
F-statistic: 942.9 on 1 and 998 DF,  p-value: < 2.2e-16
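The printed report is convenient to read, but the same numbers can also be pulled out of the summary object for further use. The sketch below refits the model from this example and extracts a few components; the variable names mod_summary, coef_table, slope, r_squared, adj_r_squared, and resid_se are chosen here for illustration.

```r
# Refit the model from the example above
set.seed(93274)
l_x <- rnorm(1000)
l_y <- rnorm(1000) + l_x
mod <- lm(l_y ~ l_x)
mod_summary <- summary(mod)

# Coefficient table as a numeric matrix
# (columns: Estimate, Std. Error, t value, Pr(>|t|))
coef_table <- coef(mod_summary)
slope <- coef_table["l_x", "Estimate"]

# Goodness-of-fit values stored in the summary object
r_squared <- mod_summary$r.squared
adj_r_squared <- mod_summary$adj.r.squared
resid_se <- mod_summary$sigma  # residual standard error

slope
r_squared
```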
For more detailed or specific summaries, other functions like str(), table(), or specialized packages for statistical modeling might be necessary.
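As a brief illustration of two of those alternatives, the sketch below applies str() and table() to a small data frame. The tier column and its values are hypothetical and were added only so that table() has something to count.

```r
df <- data.frame(
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  tier = c("premium", "basic", "premium", "basic", "basic"),  # hypothetical column
  stringsAsFactors = FALSE
)

# str() shows the type and the first few values of each column
str(df)

# table() counts how often each distinct value occurs
tier_counts <- table(df$tier)
tier_counts
```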
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.