To get a better idea of the distribution of your variables in the dataset, use the summary() function. If you need a quick survey of your dataset, you can, of course, always use the R str() function and look at the structure.
summary() Function in R
summary() is a built-in generic function in R that produces summaries of objects, including the results of various model-fitting functions. It invokes a specific method that depends on the class of its first argument.
summary(object, maxsum = 7, digits = max(3, getOption("digits") - 3), ...)
object: It is an object for which a summary is desired.
maxsum: It is an integer, indicating how many levels should be shown for factors.
digits: It is an integer, used for number formatting with signif().
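As a rough illustration of the maxsum and digits arguments, the sketch below uses made-up data (the plans factor and the prices vector are assumptions, not part of the original examples). The exact formatting of the printed numbers may differ between R versions.

```r
# Hypothetical factor with four levels (illustrative data only)
plans <- factor(c("basic", "basic", "basic", "premium",
                  "premium", "family", "student"))

# Default: every level is listed with its count
summary(plans)

# maxsum = 3 keeps the two most frequent levels and pools the rest
# into a single "(Other)" entry
top_levels <- summary(plans, maxsum = 3)
top_levels

# digits influences how many significant digits appear when the
# summary of a numeric vector is printed
prices <- c(18.456, 10.123, 15.789)
summary(prices, digits = 3)
```

With maxsum = 3, the levels that do not make the cut are not dropped; their counts are added together under "(Other)", so the total still equals the length of the factor.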
The summary() function returns a value whose form depends on the class of its argument.
Let’s apply the summary() function to a simple vector, which serves as the R object.
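To make the class-based dispatch concrete, here is a minimal sketch with an invented S3 class. The class name reading_log and its summary method are hypothetical and exist only for this illustration.

```r
# "reading_log" is a made-up S3 class used only for this illustration
log_obj <- structure(list(pages = c(12, 30, 8, 25)), class = "reading_log")

# A summary method for that class; summary() finds it through S3 dispatch
summary.reading_log <- function(object, ...) {
  c(days = length(object$pages), total_pages = sum(object$pages))
}

custom_result <- summary(log_obj)         # calls summary.reading_log()
custom_result

default_result <- summary(log_obj$pages)  # plain numeric vector: default method
default_result
```

The same call, summary(), returns a two-element named vector for the custom class and the usual six statistics for the plain numeric vector.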
vec <- 1:5
vec
cat("The summary() of vector is", "\n")
summary(vec)
[1] 1 2 3 4 5
The summary() of vector is 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1       2       3       3       4       5 
As the output shows, the summary() of a numeric vector returns descriptive statistics: the minimum, the 1st quartile, the median, the mean, the 3rd quartile, and the maximum value of the input data.
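The result of summary() on a numeric vector is a named object, so individual statistics can be pulled out by name. The following sketch reuses the vec example; the variable name s is only for illustration.

```r
vec <- 1:5
s <- summary(vec)

names(s)        # labels of the six statistics
s[["Median"]]   # pull out a single value by name
s[["Mean"]]

# The quartiles match what quantile() reports for the same data
q <- quantile(vec, probs = c(0.25, 0.75))
q
```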
summary() function on R List
To summarize a list in R, pass it to the summary() function. To define a list, use the list() function and pass the elements as arguments. For a list, summary() reports the length, class, and mode of each element rather than descriptive statistics.
vec <- 1:5
lst <- list(vec)
cat("The summary() of list is", "\n")
summary(lst)
The summary() of list is 
     Length Class  Mode   
[1,] 5      -none- numeric
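To get descriptive statistics for the contents of a list, one option is to apply summary() to each element separately. The sketch below assumes a list of numeric vectors; the measurements data and its element names are invented for the example.

```r
# Illustrative list with two numeric elements (names are made up)
measurements <- list(
  heights = c(150, 162, 171, 180),
  weights = c(55, 61, 70, 82)
)

# Summarize each element separately; the result is a list of summaries
per_element <- lapply(measurements, summary)
per_element$heights

# sapply() arranges the same summaries as a matrix, one column per element
as_matrix <- sapply(measurements, summary)
as_matrix
```

lapply() keeps each summary as its own object, which is convenient when the elements have different types, while sapply() is handy for a compact side-by-side view of numeric elements.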
summary() function on R Array
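To get the summary of an array in R, use the summary() function. To create an array in R, use the array() function. The array() function takes a vector as an argument and uses the dim parameter to set the dimensions, recycling the vector if it is shorter than the array. Note that summary() treats the array as a single set of values, so the statistics cover all of its elements at once.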
rv <- c(19, 21)
rv2 <- c(46, 4)
arr <- array(c(rv, rv2), dim = c(2, 2, 2))
cat("The summary() of array is", "\n")
summary(arr)

The summary() of array is 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   4.00   15.25   20.00   22.50   27.25   46.00 
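The statistics above are computed over all eight values of the array. If a summary per slice is wanted instead, one possible approach is to combine apply() with summary(), as in this sketch (per_slice is just an illustrative name).

```r
rv <- c(19, 21)
rv2 <- c(46, 4)
arr <- array(c(rv, rv2), dim = c(2, 2, 2))

# The four input values are recycled to fill all eight cells
length(arr)

# Summarize each 2 x 2 slice along the third dimension
per_slice <- apply(arr, 3, summary)
per_slice
```

Each column of per_slice holds the six statistics for one slice. Here both slices contain the same four values, so the two columns are identical.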
summary() function on R Matrix
To get the summary of a matrix in R, use the summary() function. To create a matrix in R, use the matrix() function, and pass the vector, nrow, and ncol parameters.
rv <- c(11, 18, 19, 21)
mtrx <- matrix(rv, nrow = 2, ncol = 2)
cat("The summary() of matrix is", "\n")
summary(mtrx)

The summary() of matrix is 
       V1              V2      
 Min.   :11.00   Min.   :19.0  
 1st Qu.:12.75   1st Qu.:19.5  
 Median :14.50   Median :20.0  
 Mean   :14.50   Mean   :20.0  
 3rd Qu.:16.25   3rd Qu.:20.5  
 Max.   :18.00   Max.   :21.0  
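Unlike an array, a matrix is summarized column by column. The sketch below gives the columns readable names before summarizing and then summarizes a single column; the names low and high are invented for the example.

```r
rv <- c(11, 18, 19, 21)
mtrx <- matrix(rv, nrow = 2, ncol = 2)

# Column names here are invented for the example
colnames(mtrx) <- c("low", "high")
col_summary <- summary(mtrx)
col_summary

# For a single column, index it and summarize the resulting vector
high_summary <- summary(mtrx[, "high"])
high_summary
```

Naming the columns replaces the default V1 and V2 labels in the printed table, which makes the output easier to read when the matrix has many columns.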
summary() function on R data frame
To get the summary of a data frame in R, use the summary() function. To create a data frame in R, use the data.frame() function.
df <- data.frame(
  service_id = c(1:5),
  service_name = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, 15, 7, 12),
  stringsAsFactors = FALSE
)
cat("The summary() of data frame is", "\n")
summary(df)

The summary() of data frame is 
   service_id service_name       service_price 
 Min.   :1    Length:5           Min.   : 7.0  
 1st Qu.:2    Class :character   1st Qu.:10.0  
 Median :3    Mode  :character   Median :12.0  
 Mean   :3                       Mean   :12.4  
 3rd Qu.:4                       3rd Qu.:15.0  
 Max.   :5                       Max.   :18.0  
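Each column of a data frame is summarized according to its type. The following sketch, which borrows two columns from the example and adds a made-up tier factor and a deliberate missing price, shows how factor columns and NA values appear in the result. The df2 data frame is an assumption for illustration only.

```r
# Columns borrowed from the example, plus a made-up "tier" factor;
# one price is set to NA on purpose
df2 <- data.frame(
  service_name  = c("Netflix", "Disney+", "HBOMAX", "Hulu", "Peacock"),
  service_price = c(18, 10, NA, 7, 12),
  tier          = factor(c("premium", "basic", "premium", "basic", "basic"))
)

# Factor columns show counts per level; numeric columns with
# missing values gain an extra NA's entry
summary(df2)

price_summary <- summary(df2$service_price)
price_summary

tier_counts <- summary(df2$tier)
tier_counts
```

The statistics for service_price are computed from the four non-missing values, and the number of missing values is reported separately.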
summary() function on Linear Regression Model
Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.
A widespread application of the summary() function is computing summary statistics for fitted statistical models. Let’s see the following code.
set.seed(93274)
l_x <- rnorm(1000)
l_y <- rnorm(1000) + l_x
mod <- lm(l_y ~ l_x)
summary(mod)

Call:
lm(formula = l_y ~ l_x)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7337 -0.6964 -0.0047  0.7333  3.3489 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.02159    0.03292  -0.656    0.512    
l_x          1.00156    0.03262  30.707   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.041 on 998 degrees of freedom
Multiple R-squared:  0.4858,    Adjusted R-squared:  0.4853 
F-statistic: 942.9 on 1 and 998 DF,  p-value: < 2.2e-16
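Our example data consists of two random numeric vectors: l_x is drawn from a normal distribution, and l_y is l_x plus normally distributed noise. We then fit a linear regression of l_y on l_x with lm().
The object mod contains the fitted linear regression model. Applying the summary() function to this model object prints its summary statistics: the residual quantiles, the coefficient table, the residual standard error, the R-squared values, and the F-statistic.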
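Beyond printing, the object returned by summary() for an lm model can be stored and queried. The sketch below refits the same model and extracts a few components; mod_summary and coefs are illustrative names.

```r
set.seed(93274)
l_x <- rnorm(1000)
l_y <- rnorm(1000) + l_x
mod <- lm(l_y ~ l_x)

mod_summary <- summary(mod)

# Coefficient table: estimates, standard errors, t values, p-values
coefs <- coef(mod_summary)
coefs

coefs["l_x", "Estimate"]     # slope estimate
mod_summary$r.squared        # multiple R-squared
mod_summary$adj.r.squared    # adjusted R-squared
mod_summary$sigma            # residual standard error
```

Storing the summary is useful when the values feed into later calculations or a report, since it avoids copying numbers from the printed output by hand.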
That is it for the summary() function tutorial.