How to Create Scatterplots in R

Here are two ways to create scatterplots in R:

  1. Using plot()
  2. Using the geom_point() function from ggplot2

A scatterplot displays pairs of values as points: each observation is drawn at the position given by one variable on the horizontal axis and another variable on the vertical axis.

Method 1: Using plot()

plot() is a generic plotting function in base R that can be used to create basic graphs of different types, including scatterplots.
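Because plot() is generic, the function that actually runs depends on the class of its first argument. The minimal sketch below uses small illustrative vectors (made-up numbers, not from any dataset in this post) to show that both two numeric vectors and a two-column data frame produce a scatterplot.

```r
# Illustrative values only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Two numeric vectors: plot() falls through to plot.default(),
# which draws one point per (x[i], y[i]) pair
plot(x, y)

# A two-column data frame: plot() dispatches to the data frame method,
# which plots the first column against the second
pts <- data.frame(x = x, y = y)
plot(pts)
```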

Example 1: Creating a simple scatterplot

We will use the shows_data.csv file.

From that CSV file, we will use the Year and IMDb columns to draw a scatterplot.

To read CSV data, use the read.csv() function. By default, head() returns the first six rows.
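Since shows_data.csv is not included with this post, here is a minimal self-contained sketch that reads a few lines of CSV from a string through the text argument, which read.csv() passes on to read.table(). The values are made up for illustration and are not from shows_data.csv.

```r
# Made-up CSV contents using the same column names as this post
csv_text <- "Year,IMDb
2001,7.5
2005,8.1
2010,6.9
2015,8.4"

shows <- read.csv(text = csv_text)

str(shows)   # Year is read as integer, IMDb as numeric
head(shows)  # at most the first six rows
```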

data <- read.csv("shows_data.csv")

df <- head(data)

print(df)

Output

[Image: the first six rows of shows_data.csv printed as a data frame]

We will extract the Year and IMDb columns to create a scatterplot.

Let's plot the first 30 rows.

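# Read the file and keep the first 30 rows
data <- read.csv("shows_data.csv")
df <- head(data, 30)
print(df)

# x-axis: Year column; y-axis: IMDb column
x <- df$Year
y <- df$IMDb

# pch = 19 draws solid circles
plot(x, y, main = "IMDb vs Year",
     xlab = "Year", ylab = "IMDb Ratings",
     pch = 19)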

Output

[Image: scatterplot of IMDb ratings against year]
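plot() also has a formula interface, which looks the columns up by name instead of requiring separate x and y vectors. The sketch below uses a small hypothetical stand-in for the shows data (made-up values) so it runs on its own; it also shows that head(df, 30) simply returns every row when the data has fewer than 30.

```r
# Hypothetical stand-in for the shows data (made-up values)
shows <- data.frame(
  Year = c(2001, 2005, 2010, 2015, 2020),
  IMDb = c(7.5, 8.1, 6.9, 8.4, 7.2)
)

# head() returns at most n rows, so this 5-row data frame stays 5 rows
df <- head(shows, 30)

# Formula interface: y ~ x, with both columns taken from `data`
plot(IMDb ~ Year, data = df,
     main = "IMDb vs Year", pch = 19)
```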

Example 2: Use a built-in dataset

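We will use the built-in faithful dataset, which records the eruption duration and the waiting time until the next eruption (both in minutes) for the Old Faithful geyser in Yellowstone National Park.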

df <- head(faithful)
print(df)

Output

  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55

In the faithful dataset, we pair the eruptions and waiting values from each observation as (x, y) coordinates and then plot those points in the Cartesian plane.
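To see this pairing explicitly, we can bind the two columns into a two-column matrix with one row per observation. plot() accepts such a matrix directly and reads the first column as x and the second as y.

```r
# One row per observation: column 1 = eruption duration, column 2 = waiting time
xy <- cbind(duration = faithful$eruptions, waiting = faithful$waiting)

head(xy, 3)  # the first three (x, y) pairs

# A two-column matrix is interpreted as (x, y) coordinates
plot(xy)
```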

df <- head(faithful)
print(df)

duration <- faithful$eruptions
waiting <- faithful$waiting

plot(duration, waiting,
 xlab = "Eruption duration",
 ylab = "Time waited",
 main = "Duration vs Time waited"
)

Output

[Image: scatterplot of waiting time against eruption duration]
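The points can be styled with the pch, cex, and col arguments. In this sketch, the 3-minute cutoff used to color the points is an illustrative assumption, not something defined by the dataset; the colors are likewise arbitrary.

```r
duration <- faithful$eruptions
waiting <- faithful$waiting

# Color each point by an illustrative 3-minute cutoff on eruption duration
cols <- ifelse(duration > 3, "firebrick", "steelblue")

plot(duration, waiting,
     xlab = "Eruption duration (min)",
     ylab = "Waiting time (min)",
     main = "Duration vs Time waited",
     pch = 19,   # solid circles
     cex = 0.8,  # points at 80% of the default size
     col = cols)

legend("topleft", legend = c("3 min or less", "over 3 min"),
       col = c("steelblue", "firebrick"), pch = 19)
```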

Example 3: Adding a regression line

We can fit a linear regression model of waiting time on eruption duration with the lm() function and then draw the fitted line with abline().

abline(lm(waiting ~ duration))

Here is the complete code:

df <- head(faithful)
print(df)

duration <- faithful$eruptions
waiting <- faithful$waiting

plot(duration, waiting,
 xlab = "Eruption duration",
 ylab = "Time waited",
 main = "Duration vs Time waited"
 )

abline(lm(waiting ~ duration))

Output

[Image: scatterplot of waiting time against eruption duration with a fitted regression line]
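If you also want the numbers behind the line, a minimal sketch is to store the fitted model in a variable: coef() then returns its intercept and slope, and the same object can be passed to abline(), where col and lwd control the line's color and width.

```r
duration <- faithful$eruptions
waiting <- faithful$waiting

# Fit waiting time as a linear function of eruption duration
fit <- lm(waiting ~ duration)

# Intercept and slope of the fitted line
coef(fit)

plot(duration, waiting,
     xlab = "Eruption duration",
     ylab = "Time waited",
     main = "Duration vs Time waited")

# abline() accepts the fitted lm object directly
abline(fit, col = "red", lwd = 2)
```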

Method 2: Using geom_point() function from ggplot2

You can use the geom_point() function from the ggplot2 package to create scatterplots, which are helpful for examining the relationship between two continuous variables.

First, you need to install and load ggplot2:

# Install ggplot2 only if it isn't already installed
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")

library(ggplot2)

Now, you can create a scatterplot:

library(ggplot2)

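# Sample data: 100 random draws from a standard normal distribution
set.seed(42)  # make the random sample reproducible
x <- rnorm(100)
y <- rnorm(100)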

df <- data.frame(x, y)

# Scatterplot with ggplot2
ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  ggtitle("Scatterplot") +
  xlab("X-axis") +
  ylab("Y-axis")

Output

[Image: scatterplot of 100 random points created with ggplot2]
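The regression line from Example 3 has a ggplot2 counterpart: geom_smooth(method = "lm") adds a least-squares line, and se = FALSE hides the shaded confidence band. This sketch assumes ggplot2 is installed and reuses the faithful dataset; the axis labels are our own choice.

```r
library(ggplot2)

p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  # Least-squares line, the ggplot2 analogue of abline(lm(...))
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Duration vs Time waited",
       x = "Eruption duration (min)",
       y = "Waiting time (min)")

print(p)
```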
