Here are two ways to create scatterplots in R:
- Using plot()
- Using the geom_point() function from ggplot2
A scatterplot displays individual observations as points, with one variable on the horizontal axis and another on the vertical axis.
Method 1: Using plot()
plot() is a generic plotting function that can be used to create basic graphs of different types, including scatterplots.
Example 1: Creating a simple scatterplot
We will use the shows_data.csv file.
From that CSV file, we will use the Year and IMDb columns to draw a scatterplot.
To read CSV data, use the read.csv() function.
data <- read.csv("shows_data.csv")
df <- head(data)
print(df)
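If shows_data.csv is not at hand, the reading step can be tried on a small inline table. This is a minimal sketch: the titles and values below are invented for illustration and are not taken from shows_data.csv; it only assumes the real file has numeric Year and IMDb columns. read.csv() accepts a text argument in place of a file name.

```r
# Hypothetical stand-in for shows_data.csv (all values are invented)
csv_text <- "Title,Year,IMDb
Show A,2008,9.5
Show B,2016,8.8
Show C,2011,9.3"

shows <- read.csv(text = csv_text)

# str() reports each column's type; Year and IMDb must be numeric for plot()
str(shows)
```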
We will extract the Year and IMDb columns to create a scatterplot.
Let’s create a scatterplot from the first 30 rows.
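# Read the CSV file and keep the first 30 rows
data <- read.csv("shows_data.csv")
df <- head(data, 30)
print(df)
# Year on the x-axis, IMDb rating on the y-axis
x <- df$Year
y <- df$IMDb
# pch = 19 draws solid circles
plot(x, y, main = "IMDb vs Year",
     xlab = "Year", ylab = "IMDb Ratings",
     pch = 19)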
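Besides pch, other standard graphical parameters such as col (point colour) and cex (point size) can be passed to plot() in the same way. The sketch below swaps in the built-in cars dataset so it runs without shows_data.csv; the colour and size values are arbitrary choices.

```r
# Built-in dataset: speed (mph) and stopping distance (ft) of 50 cars
x <- cars$speed
y <- cars$dist

plot(x, y,
     main = "Stopping distance vs Speed",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     pch = 19,           # solid circles
     col = "steelblue",  # point colour
     cex = 0.8)          # points drawn at 80% of the default size
```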
Example 2: Using a built-in dataset
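We will use the built-in faithful dataset, which records the eruption durations of the Old Faithful geyser and the waiting times between eruptions.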
df <- head(faithful)
print(df)
Output
  eruptions waiting
1     3.600      79
2     1.800      54
3     3.333      74
4     2.283      62
5     4.533      85
6     2.883      55
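head() returns only the first six rows, while the plotting code below uses the full faithful columns. A quick check of the dataset's size:

```r
# head() shows 6 rows by default; the full dataset is larger
dims <- dim(faithful)
print(dims)             # 272 rows, 2 columns
print(names(faithful))  # "eruptions" "waiting"
```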
In the faithful dataset, each observation pairs an eruption duration (eruptions) with a waiting time (waiting). We treat each pair as an (x, y) coordinate and plot the points on the Cartesian plane.
# Preview the first six rows
df <- head(faithful)
print(df)
# Use the full columns, not just the preview, for the plot
duration <- faithful$eruptions
waiting <- faithful$waiting
plot(duration, waiting,
     xlab = "Eruption duration",
     ylab = "Time waited",
     main = "Duration vs Time waited"
)
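plot() also accepts a formula of the form y ~ x together with a data argument, which pairs the two columns for you. A minimal equivalent of the faithful plot, written with the formula interface:

```r
# waiting (y) against eruptions (x), both taken from the faithful data frame
plot(waiting ~ eruptions, data = faithful,
     xlab = "Eruption duration",
     ylab = "Time waited",
     main = "Duration vs Time waited")
```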
Example 3: Adding a regression line
We can fit a linear regression model of waiting time on eruption duration with the lm() function and then draw the fitted trend line with abline():
abline(lm(waiting ~ duration))
Here is the complete code:
# Preview the first six rows
df <- head(faithful)
print(df)
duration <- faithful$eruptions
waiting <- faithful$waiting
plot(duration, waiting,
     xlab = "Eruption duration",
     ylab = "Time waited",
     main = "Duration vs Time waited"
)
# Fit waiting ~ duration by least squares and draw the fitted line
abline(lm(waiting ~ duration))
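abline() reads the intercept and slope from the fitted model it is given. Storing the model makes those two numbers available for inspection, and passing them to abline() explicitly as a and b draws the same line. This sketch refers to the faithful columns by name through the data argument; the red line colour is an arbitrary choice.

```r
# Fit waiting time as a linear function of eruption duration
fit <- lm(waiting ~ eruptions, data = faithful)

# Intercept and slope of the fitted line
print(coef(fit))

plot(waiting ~ eruptions, data = faithful,
     xlab = "Eruption duration", ylab = "Time waited")
# Equivalent to abline(fit): a = intercept, b = slope
abline(a = coef(fit)[1], b = coef(fit)[2], col = "red")
```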
Method 2: Using the geom_point() function from ggplot2
You can use the geom_point() function from the ggplot2 package to create scatterplots, which are helpful for examining the relationship between two continuous variables.
First, you need to install and load ggplot2:
# Install ggplot2 if you haven't already
install.packages("ggplot2")
library(ggplot2)
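To avoid reinstalling the package every time the script runs, the installation can be made conditional. requireNamespace() returns FALSE when a package is not installed:

```r
# Install ggplot2 only if it is not already available, then attach it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
```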
Now, you can create a scatterplot:
library(ggplot2)
# Sample data: 100 random draws from a standard normal distribution
set.seed(42)  # makes the random sample reproducible
x <- rnorm(100)
y <- rnorm(100)
df <- data.frame(x, y)
# Scatterplot with ggplot2
ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  ggtitle("Scatterplot") +
  xlab("X-axis") +
  ylab("Y-axis")
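The regression line from Example 3 has a ggplot2 counterpart: geom_smooth(method = "lm") fits a linear model and draws it over the points, and labs() sets the title and axis labels in one call. A sketch on the faithful dataset; se = FALSE hides the confidence band and is an arbitrary choice here.

```r
library(ggplot2)

p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point() +
  # Least-squares line of waiting on eruptions, without the confidence band
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Duration vs Time waited",
       x = "Eruption duration", y = "Time waited")

print(p)
```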
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.