A scatterplot in R displays individual observations as points, with one variable on the horizontal axis and another on the vertical axis. You create one with the plot() function: the first argument is the x-axis variable, and the second argument is the y-axis variable. The basic syntax is:
plot(x, y, main, xlab, ylab, xlim, ylim, axes)
- x: The vector of values used as the horizontal coordinates.
- y: The vector of values used as the vertical coordinates.
- main: The title of the graph.
- xlab: The label on the horizontal axis.
- ylab: The label on the vertical axis.
- xlim: The range of the horizontal axis, given as a two-element vector such as c(min, max).
- ylim: The range of the vertical axis, given as a two-element vector such as c(min, max).
- axes: A logical value (TRUE or FALSE) indicating whether both axes should be drawn on the plot.
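The less obvious arguments (xlim, ylim, and axes) can be illustrated with a minimal sketch. It assumes the built-in cars dataset, which is not part of this tutorial's own data and is used here only because it ships with R; the axis ranges are arbitrary choices.

```r
# Illustration only: cars is a built-in dataset, not the tutorial's CSV file.
x <- cars$speed
y <- cars$dist

# xlim and ylim each take a two-element vector c(min, max).
# axes = FALSE suppresses the default axes and the surrounding box.
plot(x, y,
     main = "Stopping distance vs speed",
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     xlim = c(0, 30),
     ylim = c(0, 130),
     axes = FALSE)

# With the defaults suppressed, the axes and box can be added back manually.
axis(1)  # horizontal axis
axis(2)  # vertical axis
box()
```

Leaving axes at its default of TRUE makes the three trailing calls unnecessary; suppressing the axes is mainly useful when you want custom tick positions.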
Example 1: Simple scatterplot
To create a scatterplot, we will use the shows_data.csv file.
From that CSV file, we will use the Year and IMDb columns to draw the scatterplot.
To read a CSV file, use the read.csv() function.
data <- read.csv("shows_data.csv")
df <- head(data)
print(df)
We will extract the Year and IMDb columns to create the scatterplot.
Let’s plot the first 30 rows.
data <- read.csv("shows_data.csv")
# Keep only the first 30 rows
df <- head(data, 30)
print(df)
x <- df$Year
y <- df$IMDb
# pch = 19 draws solid filled circles
plot(x, y,
     main = "IMDb vs Year",
     xlab = "Year",
     ylab = "IMDb Ratings",
     pch = 19)
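Since shows_data.csv may not be at hand, the following sketch reproduces the same read-and-plot workflow on a small stand-in file. The file contents are made up purely for illustration (hypothetical titles and values, not real ratings); only the column names Year and IMDb come from the example above. It also shows one way to drop rows with missing values before plotting, which is an addition and not part of the original example.

```r
# Hypothetical stand-in for shows_data.csv: every value below is made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Title,Year,IMDb",
             "Show A,2015,7.1",
             "Show B,2017,8.3",
             "Show C,2019,6.4",
             "Show D,2020,"),   # empty IMDb field is read as NA
           csv_path)

shows <- read.csv(csv_path)

# Fail early if the expected columns are missing.
stopifnot(all(c("Year", "IMDb") %in% names(shows)))

# Keep only rows where both plotted columns are present.
clean <- shows[complete.cases(shows[, c("Year", "IMDb")]), ]

plot(clean$Year, clean$IMDb,
     main = "IMDb vs Year",
     xlab = "Year",
     ylab = "IMDb Ratings",
     pch = 19)
```

plot() would skip the incomplete row on its own, so the explicit filtering mostly serves to make the number of plotted points predictable.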
Example 2: Use a built-in dataset to create a scatterplot
We will use the built-in faithful dataset, which records the eruption durations and waiting times of the Old Faithful geyser.
df <- head(faithful)
print(df)
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
In the faithful dataset, we pair the eruptions and waiting values from the same observation as (x, y) coordinates. Then we plot those points in the Cartesian plane.
df <- head(faithful)
print(df)
duration <- faithful$eruptions
waiting <- faithful$waiting
plot(duration, waiting,
     xlab = "Eruption duration",
     ylab = "Time waited",
     main = "Duration vs Time waited")
We can generate a linear regression model of the two variables with the lm function and then draw a trend line with abline.
abline(lm(waiting ~ duration))
The complete code is shown below.
df <- head(faithful)
print(df)
duration <- faithful$eruptions
waiting <- faithful$waiting
plot(duration, waiting,
     xlab = "Eruption duration",
     ylab = "Time waited",
     main = "Duration vs Time waited")
# Fit waiting as a linear function of duration and draw the fitted line
abline(lm(waiting ~ duration))