
Understanding the rnorm() Function in R

The rnorm() function in R generates random numbers from a normal (Gaussian) distribution, the bell-shaped curve defined by its mean and standard deviation. It is widely used in statistical simulations and data analysis.

To put it in simple terms, let me take an example.

Let’s say I want to select some candies. Some candies are small and some are big, but most of them are just the right size: not too small, not too big. That “just right” size is called the mean (the average).

Suppose we select five candies. Most should be around size 10, but each can be somewhat larger or smaller, typically by about 2 sizes. That typical distance of 2 from the mean is the standard deviation (sd), and 5 is the total number of random numbers (n). In rnorm() terms, this is n = 5, mean = 10, and sd = 2.
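The candy analogy translates directly into a call. This is a minimal sketch; the seed value and the variable name candy_sizes are arbitrary choices made for illustration.

```r
# Candy analogy: 5 candies, typical size 10, typical deviation 2
set.seed(1)  # arbitrary seed, used only so the run is repeatable
candy_sizes <- rnorm(5, mean = 10, sd = 2)
print(candy_sizes)
```

Each value is a candy size; most land within a couple of sizes of 10.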

To get the same sequence of random numbers every time the code runs, call set.seed() with a fixed value before calling rnorm().
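A short sketch of what seeding does in practice. The seed value 42 and the variable names are arbitrary, chosen only for illustration.

```r
set.seed(42)
first_draw <- rnorm(3)

set.seed(42)               # reset the generator to the same state
second_draw <- rnorm(3)

third_draw <- rnorm(3)     # no reset: the generator has moved on

identical(first_draw, second_draw)  # TRUE
identical(first_draw, third_draw)   # FALSE
```

Resetting the seed replays the same sequence; drawing again without resetting continues from where the generator left off.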

Syntax

rnorm(n, mean = 0, sd = 1)

Parameters

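Name	Description
n	The number of random numbers to generate. It must be a non-negative integer; if a vector of length greater than 1 is passed, its length is used as the count.
mean	The mean of the distribution (a vector of means is also accepted). The default value is 0.
sd	The standard deviation of the distribution (a vector is also accepted). The default value is 1.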
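Because mean and sd can be vectors, a single call can draw from several normal distributions at once. The sketch below uses an arbitrary seed and illustrative values; each of the three draws comes from a distribution with a different mean.

```r
set.seed(2)  # arbitrary seed for repeatability

# One draw each from N(0, 1), N(100, 1), and N(1000, 1)
draws <- rnorm(3, mean = c(0, 100, 1000), sd = 1)
print(round(draws, 2))
```

The first value lands near 0, the second near 100, and the third near 1000, since sd = 1 keeps each draw close to its own mean.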

Return value

It returns a numeric vector of n random numbers. By default, they are drawn from a standard normal distribution (mean = 0, standard deviation = 1).
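A quick way to inspect what comes back, as a minimal sketch (the variable name x is illustrative):

```r
x <- rnorm(5)

is.numeric(x)      # TRUE
length(x)          # 5
length(rnorm(0))   # 0: asking for zero numbers gives an empty numeric vector
```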

Generating standard normal random numbers

Let’s generate 10 random numbers from a standard normal distribution (mean = 0, sd = 1). No seed is set here, so your output will differ from the one shown.

data <- rnorm(10) 

print(data)

# Output: [1] -1.5360670 0.2471094 -1.1806552 -1.1448586 -0.6113512 
#         [6] 0.3448430  0.9522712 0.3176696 2.3503968 -0.1918144

Custom mean and standard deviation

Let’s generate five random numbers with a mean of 100 and a standard deviation of 15.

set.seed(456)

iq_scores <- rnorm(5, mean = 100, sd = 15)

print(iq_scores)

# Output: [1] 79.84718 109.32663 112.01312 79.16661 89.28465

We have five observations scattered around the mean of 100.
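One way to see what mean and sd do: shifting and scaling standard normal draws gives the same numbers as passing mean and sd directly. The sketch below reuses the seed from the example above; the variable names are illustrative.

```r
set.seed(456)
direct <- rnorm(5, mean = 100, sd = 15)

set.seed(456)
rescaled <- 100 + 15 * rnorm(5)   # shift and scale standard normal draws

all.equal(direct, rescaled)  # TRUE
```

In other words, sd stretches the spread of the draws and mean moves their center.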

Simulating the heights of the human population

Let’s consider a real-world example where we want to simulate the heights of 100 men, whose average height is 170 cm and whose standard deviation is 10 cm.

That means the mean is 170 and the sd is 10, so a typical height is within about 10 cm of 170 cm.

data <- rnorm(100, 170, 10)

print(data)

Output

 [1] 158.2510 180.6479 187.3056 191.6965 194.6436 156.7804 173.1956
 [8] 179.4331 176.0277 164.2596 154.9793 169.3680 165.8675 145.3365
 [15] 195.5529 151.9154 161.6852 171.3717 174.6469 145.7286 161.3719
 [22] 190.5701 167.9142 163.6796 157.5316 172.2447 174.3563 171.3076
 [29] 166.3947 185.0756 167.1701 171.4087 151.0265 163.9975 177.3185
 [36] 176.2810 179.3618 166.9730 166.9589 174.7424 166.5725 181.3311
 [43] 175.2131 152.9896 189.9179 161.2098 171.1099 174.0227 172.7908
 [50] 163.0921 164.6657 163.1869 175.9643 177.6391 178.2297 163.2634
 [57] 168.8777 164.9482 157.1909 175.8665 154.4594 178.2447 172.8234
 [64] 168.0787 168.0108 153.4720 163.0311 154.9616 166.4673 184.2978
 [71] 153.4157 164.4439 180.2366 170.2234 168.1334 167.0586 185.9537
 [78] 169.2638 172.0199 183.1606 162.9529 163.0902 167.0587 181.1529
 [85] 163.2014 174.1792 169.4121 162.7256 169.6268 164.5417 170.7820
 [92] 181.8166 171.0858 181.1536 153.4309 167.6885 170.7262 185.7054
 [99] 174.6625 180.9943

The output shows that most heights are close to 170 cm, which is the mean value.

With a standard deviation of 10, roughly 68% of men are expected to be between 160 cm and 180 cm and roughly 95% between 150 cm and 190 cm. Heights beyond that range are the outliers.
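The link between the standard deviation and the spread of heights can be checked empirically. The sketch below assumes a much larger sample (100,000 draws, an arbitrary choice) so the proportions are stable, and compares them with the theoretical values from pnorm().

```r
set.seed(123)
heights <- rnorm(100000, mean = 170, sd = 10)

# Share of simulated heights within 1 and 2 standard deviations of the mean
within_1sd <- mean(heights >= 160 & heights <= 180)
within_2sd <- mean(heights >= 150 & heights <= 190)
print(c(within_1sd = within_1sd, within_2sd = within_2sd))

# Theoretical shares for any normal distribution (about 0.683 and 0.954)
print(c(pnorm(1) - pnorm(-1), pnorm(2) - pnorm(-2)))
```

The simulated shares should sit close to the theoretical ones, with small differences due to sampling noise.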

Let’s plot a histogram using ggplot2. The code below sets a seed and draws a fresh sample, so the plotted values differ from the unseeded output above:

# Load the ggplot2 package
library(ggplot2)

# Setting the seed for reproducibility
set.seed(123)

# Generating random heights
heights <- rnorm(100, mean = 170, sd = 10)

# Creating a data frame (ggplot2 works best with data frames)
heights_df <- data.frame(heights = heights)

# Plot the histogram
ggplot(heights_df, aes(x = heights)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black") +
  labs(title = "Distribution of Simulated Heights",
       x = "Height (cm)",
       y = "Frequency")

The histogram shows the simulated heights of men, drawn from a normal distribution with a mean of 170 cm and a standard deviation of 10 cm.
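To judge how closely a sample of 100 follows the theoretical bell curve, the density can be drawn over the histogram. This sketch uses base R graphics instead of ggplot2 so that it needs no extra package; the number of breaks and the styling are arbitrary choices.

```r
set.seed(123)
heights <- rnorm(100, mean = 170, sd = 10)

# freq = FALSE puts the histogram on the density scale,
# so it is comparable with the dnorm() curve
hist(heights, breaks = 20, freq = FALSE,
     main = "Simulated Heights vs. Normal Density",
     xlab = "Height (cm)")

# Overlay the theoretical N(170, 10) density
curve(dnorm(x, mean = 170, sd = 10), add = TRUE, lwd = 2)
```

With only 100 draws the bars look ragged next to the smooth curve; larger samples follow it more closely.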

Generating test scores for a class

Let’s consider an example of generating test scores for a class of 30 students whose average score is 75.

A standard deviation of 15 means most students score between roughly 60 and 90. Because rnorm() draws from an unbounded distribution, a few simulated scores can land at or above 100, which you can count as outliers.

# Set the seed for reproducibility
set.seed(123)

test_scores <- rnorm(30, mean = 75, sd = 15)

print(test_scores)

Output

[1] 66.59287 71.54734 98.38062 76.05763 76.93932 100.72597 81.91374
[8] 56.02408 64.69721 68.31507 93.36123 80.39721 81.01157 76.66024
[15] 66.66238 101.80370 82.46776 45.50074 85.52034 67.90813 58.98264
[22] 71.73038 59.60993 64.06663 65.62441 49.69960 87.56681 77.30060
[29] 57.92795 93.80722
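The output above includes 100.72597 and 101.80370, which are not valid scores if the test is graded from 0 to 100. One simple option, sketched below, is to clamp the values with pmin() and pmax(). The 0 to 100 range is an assumption about the grading scale, and clamping slightly distorts the distribution near the limits.

```r
set.seed(123)
test_scores <- rnorm(30, mean = 75, sd = 15)

# Count the simulated scores that exceed the assumed maximum of 100
sum(test_scores > 100)

# Clamp every score into the assumed valid range [0, 100]
bounded_scores <- pmin(pmax(test_scores, 0), 100)
range(bounded_scores)
```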

Here is the code that plots a histogram of the normally distributed random scores:

# Install ggplot2 if it's not already installed, then load it
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)

# Set the seed for reproducibility
set.seed(123)

# Generate 30 random numbers with mean = 75 and sd = 15
test_scores <- rnorm(30, mean = 75, sd = 15)

# Create a data frame from the test scores
test_scores_df <- data.frame(score = test_scores)

# Plot the histogram using ggplot2
ggplot(test_scores_df, aes(x = score)) +
  geom_histogram(binwidth = 5, fill = "green", color = "black") +
  ggtitle("Histogram of Simulated Test Scores") +
  xlab("Test Score") +
  ylab("Frequency")

The histogram shows how the simulated test scores are distributed: most scores fall near the mean of 75, the bulk lie within one standard deviation of it (60 to 90), and the few scores far from the mean are the outliers.
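One way to mark the mean and the standard deviation on such a histogram is to draw vertical reference lines. The sketch below uses base R graphics rather than ggplot2 so that it runs without extra packages; the colors, line styles, and legend position are arbitrary choices.

```r
set.seed(123)
test_scores <- rnorm(30, mean = 75, sd = 15)

hist(test_scores,
     main = "Histogram of Simulated Test Scores",
     xlab = "Test Score", col = "lightgreen")

# Solid line at the mean, dashed lines at mean +/- 1 sd,
# dotted lines at mean +/- 2 sd
abline(v = 75, col = "red", lwd = 2)
abline(v = c(60, 90), col = "red", lty = 2)
abline(v = c(45, 105), col = "red", lty = 3)

legend("topright",
       legend = c("mean", "mean +/- 1 sd", "mean +/- 2 sd"),
       col = "red", lty = c(1, 2, 3), lwd = c(2, 1, 1))
```

Scores falling outside the dotted lines are more than two standard deviations from the mean and are reasonable candidates to call outliers.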

Monte Carlo Simulation

Let’s use the rnorm() function to estimate the mean of a normal distribution via simulation.

set.seed(101)

n_sim <- 10000

samples <- rnorm(n_sim, mean = 10, sd = 2)

estimated_mean <- mean(samples)

print(estimated_mean)

# Output: [1] 10.01056

The sample mean (~10.01) is close to the true mean (10), which shows how rnorm() can be used in simulations to approximate population parameters.
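How close the estimate is expected to be can be quantified with the standard error of the mean. The sketch below reuses the same seed and sample as above and builds an approximate 95% confidence interval; the names std_error and ci_95 are illustrative.

```r
set.seed(101)
n_sim <- 10000
samples <- rnorm(n_sim, mean = 10, sd = 2)

# Standard error of the sample mean: sd / sqrt(n), about 2 / 100 = 0.02 here
std_error <- sd(samples) / sqrt(n_sim)

# Approximate 95% confidence interval for the true mean
ci_95 <- mean(samples) + c(-1, 1) * qnorm(0.975) * std_error

print(std_error)
print(ci_95)
```

Because the standard error shrinks with the square root of n_sim, quadrupling the number of simulations roughly halves the expected error of the estimate.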

Krunal Lathiya
