How to Use the predict() Function in R

The predict() function in R makes predictions from a fitted model. You can use it with linear regression models, logistic regression models, decision trees, and many other model types to predict the dependent variable's value for new, unseen data points.

Syntax

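predict(object, newdata, interval = "none", level = 0.95)

This is the form used for lm models, shown with its default values. The generic function itself is predict(object, ...), and each model type adds its own arguments.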

Parameters

  1. object: The fitted model object, typically created using functions like lm(), glm(), or other model-fitting functions in R.
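  2. newdata: A data frame containing the values of the independent variables for which you want to predict the dependent variable. The column names in newdata must match the names of the independent variables used in the model. If newdata is omitted, predict() returns the fitted values for the data the model was trained on.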
  3. interval: A character string specifying the type of interval to compute for lm models: "none" (the default, point predictions only), "confidence", or "prediction".
  4. level: The confidence level for the interval, 0.95 by default for a 95% interval. It can be changed to any value strictly between 0 and 1.
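
As a rough illustration of how interval and level interact, the sketch below fits the same kind of lm model used later in this article and compares 90% and 99% confidence intervals. The variable names (point_only, ci_90, ci_99, width_90, width_99) are chosen only for this example.

```r
# Fit a simple linear model on the built-in mtcars dataset
model <- lm(mpg ~ wt, data = mtcars)
new_data <- data.frame(wt = c(2.5, 3.0))

# With no interval argument, predict() returns point predictions only
point_only <- predict(model, newdata = new_data)

# The same predictions with 90% and 99% confidence intervals
ci_90 <- predict(model, newdata = new_data, interval = "confidence", level = 0.90)
ci_99 <- predict(model, newdata = new_data, interval = "confidence", level = 0.99)

# Interval width = upper bound - lower bound
width_90 <- ci_90[, "upr"] - ci_90[, "lwr"]
width_99 <- ci_99[, "upr"] - ci_99[, "lwr"]

print(width_90)
print(width_99)
```

A higher level gives a wider interval, so width_99 should be larger than width_90 for every row, while the fit column is the same in both.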

Return value

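The predict() function returns a vector or matrix of predicted values. For an lm model, it returns a named numeric vector by default, or a matrix with fit, lwr, and upr columns when interval is set to "confidence" or "prediction". Other model types may return different structures, depending on the model and the nature of the dependent variable (e.g., continuous, binary, categorical).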
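
To show how the return value can depend on the model type, here is a minimal sketch using a logistic regression fitted with glm(). The choice of am (transmission type) as the binary outcome and the names logit_model, log_odds, and probabilities are assumptions made for this illustration only.

```r
# Fit a logistic regression on mtcars:
# am is 0 for automatic and 1 for manual transmission
logit_model <- glm(am ~ wt, data = mtcars, family = binomial)
new_data <- data.frame(wt = c(2.5, 3.0))

# For glm models the default is type = "link":
# predictions on the log-odds scale
log_odds <- predict(logit_model, newdata = new_data)

# type = "response" returns predicted probabilities instead
probabilities <- predict(logit_model, newdata = new_data, type = "response")

print(log_odds)
print(probabilities)
```

In both cases the result is a numeric vector with one element per row of new_data, but only the type = "response" values can be read directly as probabilities.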

Example: Using predict() with a linear model

# Load the dataset
data(mtcars)

# Fit a linear model with miles per gallon (mpg) as the dependent
# variable and weight (wt) as the independent variable
model <- lm(mpg ~ wt, data = mtcars)

# Create a data frame with the new data points
# for which we want to predict mpg
new_data <- data.frame(wt = c(2.5, 3.0))

# Use the predict() function to predict mpg for the new data points
predicted_mpg <- predict(model, newdata = new_data)

# Print the predicted values
print(predicted_mpg)

Output

       1        2
23.92395 21.25171

In this code, we used the predict() function with a fitted linear model and a data frame, new_data, containing new data points to predict the mpg (miles per gallon) values. The output is a named vector with one predicted value for each row of the new_data data frame.
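
For a simple linear model, each prediction is the estimated intercept plus the estimated slope times the new wt value. The sketch below recomputes the predictions by hand from coef(model) and compares them with predict(). The names coefs and manual_mpg are introduced only for this check.

```r
# Fit the model and define the new data points
model <- lm(mpg ~ wt, data = mtcars)
new_data <- data.frame(wt = c(2.5, 3.0))

# Extract the estimated intercept and slope
coefs <- coef(model)

# Compute intercept + slope * wt by hand
manual_mpg <- coefs["(Intercept)"] + coefs["wt"] * new_data$wt

# Compare with predict(); unname() drops the element labels
# so that only the numeric values are compared
predicted_mpg <- predict(model, newdata = new_data)
print(all.equal(unname(manual_mpg), unname(predicted_mpg)))
```

The two results should agree up to floating-point rounding, which confirms that predict() applies the fitted equation to each row of new_data.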

Confidence Intervals for the Predicted Values

You can obtain confidence intervals for the predicted values by setting the interval argument to "confidence" when using the predict() function in R.

A confidence interval describes the uncertainty in the estimated mean response at each value of the independent variable. That uncertainty comes from estimating the model parameters from a finite sample. It does not include the extra scatter of individual observations around the mean; for that, set interval to "prediction" instead.

# Load the dataset
data(mtcars)

# Fit a linear model with miles per gallon (mpg) as the dependent
# variable and weight (wt) as the independent variable
model <- lm(mpg ~ wt, data = mtcars)

# Create a data frame with the new data points
# for which we want to predict mpg
new_data <- data.frame(wt = c(2.5, 3.0))

# Use the predict() function to predict mpg for the new data points
# and obtain 95% confidence intervals
predicted_mpg_confidence <- predict(model,
  newdata = new_data,
  interval = "confidence", level = 0.95
)

# Print the predicted values and confidence intervals
print(predicted_mpg_confidence)

Output

       fit      lwr      upr
1 23.92395 22.55284 25.29506
2 21.25171 20.12444 22.37899

The output is a matrix with three columns: fit, lwr, and upr. The fit column contains the predicted values, while the lwr and upr columns contain the lower and upper bounds of the confidence intervals, respectively.
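
For comparison, predict() for lm models also accepts interval = "prediction", which gives an interval for an individual new observation rather than for the mean response. The following sketch places both kinds of interval side by side. The data frame name comparison and its column names are chosen only for this illustration.

```r
# Fit the model and define the new data points
model <- lm(mpg ~ wt, data = mtcars)
new_data <- data.frame(wt = c(2.5, 3.0))

# 95% confidence intervals (for the mean mpg at each wt)
conf_int <- predict(model, newdata = new_data,
                    interval = "confidence", level = 0.95)

# 95% prediction intervals (for a single new car at each wt)
pred_int <- predict(model, newdata = new_data,
                    interval = "prediction", level = 0.95)

# Collect both sets of bounds in one data frame
comparison <- data.frame(
  wt       = new_data$wt,
  fit      = conf_int[, "fit"],
  conf_lwr = conf_int[, "lwr"],
  conf_upr = conf_int[, "upr"],
  pred_lwr = pred_int[, "lwr"],
  pred_upr = pred_int[, "upr"]
)
print(comparison)
```

The fit column is identical in both results, but the prediction interval is wider because it also accounts for the residual variation of individual observations.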
