R sample Function: The Complete Guide

Statisticians often need to draw samples from a dataset and then calculate statistics on them. A sample is nothing more than a subset of the data, and R makes drawing one easy with the sample() function.

R sample

The sample() function is built into R. It draws a sample of the specified size from the input elements, either with or without replacement. It takes four arguments: the data to sample from (named x in R's documentation), size, replace, and prob.

By default, the sample() function randomly reorders the elements passed as the first argument. This is because the default size is the length of the input and the default for replace is FALSE, so every element is drawn exactly once.

Syntax

sample(x, size, replace = FALSE, prob = NULL)

Parameters

x: The data to sample from. It is either a vector of one or more elements from which to choose or a positive integer.

n: A positive number, the number of items to choose from. This argument belongs to the related sample.int() function rather than to sample() itself.

size: A non-negative integer giving the number of items to choose.

replace: A logical value. Should sampling be with replacement? The default is FALSE.

prob: A vector of probability weights for obtaining the elements of the vector being sampled. The default, NULL, gives every element the same chance.

Example

Let’s define a numeric vector using the colon operator (:) and sample five values from it.

data <- 1:20
sample(data, 5, replace = FALSE, prob = NULL)

Output

[1] 17 6 13 11 19

In this example, we create a vector with 20 values. We then call the sample() function with the data vector and a size of five, so it picks five random elements from the vector and returns them.

For sample(), the default for size is the number of items inferred from the first argument, so sample(x) generates a random permutation of the elements of x (or of 1:x when x is a single positive number).
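The default behavior described above can be checked with a short sketch. The vector name x and its values are chosen only for illustration.

```r
# Illustrative vector; the name and values are assumptions for this sketch.
x <- c(10, 20, 30, 40, 50)

# With size omitted, sample() returns every element once, in random order.
shuffled <- sample(x)

# The result has the same length and the same elements as the input.
length(shuffled) == length(x)   # TRUE
identical(sort(shuffled), x)    # TRUE
```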

If prob is supplied and replace is FALSE, the probabilities are applied sequentially; that is, the probability of choosing the next element is proportional to its weight among the remaining items.
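A minimal sketch of weighted sampling without replacement follows. The items, the weights, and the seed are made up for illustration. Each item can be drawn only once, so the result is always a reordering of the input, with the heavily weighted item usually first.

```r
# Hypothetical items and weights, chosen only for this sketch.
items <- c("a", "b", "c")
w <- c(0.98, 0.01, 0.01)

# Without replacement, every item appears exactly once in the result.
draw <- sample(items, size = 3, replace = FALSE, prob = w)
setequal(draw, items)   # TRUE

# Repeat the draw many times and record which item came first.
set.seed(7)             # arbitrary seed so the experiment is repeatable
first_picks <- replicate(1000, sample(items, 3, prob = w)[1])
mean(first_picks == "a")   # close to 0.98
```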

If the first argument has length 1, is numeric, and is >= 1, sample() samples from 1:x instead of from that single value.
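This rule is convenient, but it can surprise you when a vector happens to have length 1. The sketch below shows both sides. The helper name resample is an assumption made for this example; it samples positions with sample.int() so that it always draws from the elements of x.

```r
# A single number n >= 1 means "sample from 1:n".
picks <- sample(10, 3)          # three distinct values from 1:10
all(picks %in% 1:10)            # TRUE

# Pitfall: a numeric vector of length 1 triggers the same rule.
x <- c(10)
from_range <- sample(x, 1)      # drawn from 1:10, not necessarily 10

# Hypothetical helper: sample positions, then index into x.
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(x, 1)                  # always 10
```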

Applying replace = TRUE in sample() function

To simulate 12 rolls of a die, call the sample() function with a size of 12 and replace = TRUE. Because a die has only 6 different numbers, some of them must repeat.

See the following code.

data <- 1:6
sample(data, 12, replace = TRUE)

Output

 [1] 4 3 5 1 2 2 2 3 6 6 5 6

You can see that some numbers appear three times, some appear twice, and some appear only once.
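Counting the repeats by eye gets tedious. As a rough sketch, table() can tally the rolls, and tryCatch() can show what happens when the same request is made without replacement. The variable names are illustrative.

```r
# Roll a die 12 times.
rolls <- sample(1:6, 12, replace = TRUE)

# Tally how often each face came up; factor() keeps faces with zero rolls.
counts <- table(factor(rolls, levels = 1:6))
print(counts)
sum(counts)   # 12

# Without replacement, asking for more items than exist raises an error.
result <- tryCatch(sample(1:6, 12), error = function(e) "error")
result        # "error"
```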

Because the values returned by the sample() function are chosen at random, running the same call repeatedly gives different results each time.

➜ R RScript Pro.R
[1] 4 3 5 1 2 2 2 3 6 6 5 6
➜ R RScript Pro.R
[1] 1 2 6 5 6 6 3 1 5 5 6 6
➜ R RScript Pro.R
[1] 4 1 1 6 1 6 1 3 5 5 5 2
➜ R RScript Pro.R
[1] 2 2 4 5 6 1 1 3 3 5 1 5
➜ R RScript Pro.R
[1] 3 4 3 1 1 2 1 5 6 3 3 2

Each run of the program produces a different output.
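When you need the same result on every run, for example to reproduce an analysis, you can fix the random number generator's state with set.seed() before sampling. The seed value 42 below is arbitrary.

```r
# Fix the seed, then sample.
set.seed(42)   # arbitrary seed chosen for this sketch
first <- sample(1:6, 12, replace = TRUE)

# Resetting the same seed reproduces the same draw.
set.seed(42)
second <- sample(1:6, 12, replace = TRUE)

identical(first, second)   # TRUE
```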

Random Subsampling of Data using sample() function

The most common use of the sample() function is random subsampling of data. First, let’s subsample a vector.

rv <- 1:20

sample(rv, size = 10)

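Output

[1] 16  8 11 20 19 10  4 17  2 12

This is one possible result. Every value lies between 1 and 20 and none is repeated, because sampling is without replacement by default. The exact values differ between runs.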

Generating a Sample from a Dataset

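The sample() function can generate random row indices for a dataset. Use nrow() to get the number of rows; length() on a data frame returns the number of columns instead.

n_rows <- nrow(mtcars)              # number of rows in the dataset
sample_rows <- sample(n_rows, 10)   # 10 distinct row indices
print(sample_rows)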

Output

 [1]  8  1  6  9  2  10  11  3  5  7
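The indices are only a means to an end. A minimal sketch of using them to pull the sampled rows out of mtcars follows. The variable names are illustrative.

```r
# Draw 10 distinct row indices from the built-in mtcars dataset.
idx <- sample(nrow(mtcars), 10)

# Use the indices to select rows; the empty column slot keeps every column.
mtcars_sample <- mtcars[idx, ]

nrow(mtcars_sample)                    # 10
ncol(mtcars_sample) == ncol(mtcars)    # TRUE
```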

Sampling with uneven probabilities using sample() function

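To change the probabilities of the random selection, pass a vector of weights as the prob argument of the sample() function, with one weight per element. The weights need not sum to one, but they must be non-negative and not all zero.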

rv <- 1:11

sample(rv, size = 10, replace = TRUE, prob = c(0.6, rep(0.1, 10)))

Output

 [1]  1  11  1  1  5  3  10  1  1  2
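The weights in this example sum to 1.6 rather than 1, so the first element is not drawn 60% of the time. As a rough check, the sketch below rescales the weights by hand and compares the result with the observed frequency over many draws. The seed and the number of draws are arbitrary.

```r
rv <- 1:11
weights <- c(0.6, rep(0.1, 10))

# Rescale the weights so they sum to one.
p <- weights / sum(weights)
p[1]   # 0.375

# Draw many values and measure how often 1 comes up.
set.seed(1)   # arbitrary seed chosen for this sketch
draws <- sample(rv, size = 10000, replace = TRUE, prob = weights)
freq <- prop.table(table(factor(draws, levels = rv)))
freq[["1"]]   # close to 0.375
```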

Random sampling of list elements using the sample() function

You can use the sample() function to pick random elements from a list in R.

lst <- list(
 1:5,
 833,
 c("K", "LLL", "Ouija"),
 "Board",
 5
)
len_list <- length(lst)
list_samp <- lst[sample(len_list, size = 3)]
list_samp

Output

[[1]]
[1] 1 2 3 4 5

[[2]]
[1] "K"  "LLL"  "Ouija"

[[3]]
[1] 833
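Sampling the positions and then indexing, as above, makes each step explicit. A shorter sketch passes the list straight to sample(), which indexes into its first argument and so returns a list as well. The list contents mirror the example above.

```r
lst <- list(1:5, 833, c("K", "LLL", "Ouija"), "Board", 5)

# sample() on a list of length > 1 returns a list of the chosen elements.
direct <- sample(lst, size = 3)

is.list(direct)   # TRUE
length(direct)    # 3
```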

Random sampling of data frame rows

To extract a random subset of rows from a data frame in R, sample the row indices with the sample() function and use them to index the data frame.

df <- data.frame(a1 = 1:10,
 a2 = letters[1:10],
 a3 = letters[1:10],
 a4 = letters[1:10],
 a5 = letters[1:10],
 a6 = letters[1:10],
 a7 = letters[1:10],
 a8 = letters[1:10],
 a9 = letters[1:10],
 a10 = letters[1:10])

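# nrow() gives the number of rows; length(df) would give the number of columns
df_len <- nrow(df)

# Pick 3 distinct row indices and keep all columns
df_sample <- df[sample(seq_len(df_len), size = 3), ]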

df_sample

Output

   a1  a2  a3  a4  a5  a6  a7  a8  a9  a10
8  8   h   h   h   h   h   h   h   h    h
1  1   a   a   a   a   a   a   a   a    a
10 10  j   j   j   j   j   j   j   j    j
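The same indices can also select the rows that were not sampled, by negating them. The sketch below uses a smaller, hypothetical data frame and splits it into a sampled part and the remainder; the 70% share is an arbitrary choice.

```r
# Hypothetical data frame for this sketch.
df <- data.frame(id = 1:10, letter = letters[1:10])
n <- nrow(df)

# Sample about 70% of the row indices without replacement.
picked_idx <- sample(seq_len(n), size = round(0.7 * n))

picked <- df[picked_idx, ]    # the sampled rows
rest <- df[-picked_idx, ]     # every row that was not sampled

nrow(picked)   # 7
nrow(rest)     # 3
```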

That is it for the sample() function in R.
