setdiff() in R: How to Calculate the Set Difference

The R setdiff() function finds the elements that are in one set but not in another. Let’s take a closer look at it.

setdiff in r

The setdiff() function is built into base R and calculates the set difference of two vectors. It returns the elements of x that do not exist in y, with duplicates removed. Some packages, such as prob, provide their own setdiff() that also works on data frames, for example subsets of a probability space.

The elements of setdiff(x,y) are those elements in x but not in y.
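The definition above can be sketched by hand with the %in% operator. This is only an illustration of the idea for plain vectors, not a claim about how setdiff() is implemented internally; the name manual_diff is made up for this example.

```r
x <- c(19, 21, 11, 18, 22)
y <- c(11, 18, 20, 22, 46)

# Keep the elements of x that are not found in y, then drop duplicates.
manual_diff <- unique(x[!(x %in% y)])

# For plain vectors like these, the hand-written version and setdiff() agree.
print(manual_diff)
print(identical(manual_diff, setdiff(x, y)))
```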

Syntax

setdiff(x, y)

Arguments

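x: A vector (or a data frame, when a package such as prob supplies a data frame version) whose elements you want to keep.

y: A vector (or data frame) whose elements are removed from x.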

Return value

It returns an object of the same type as x (a vector in base R, or a data frame when a package such as prob supplies the data frame version) that holds the elements of x not found in y. The order of the arguments matters: if you swap x and y, you will generally get a different result.
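Two details of the return value are easy to miss: repeated values in x appear only once in the result, and the result is an empty vector when every element of x is also in y. A small sketch with made-up vectors:

```r
x <- c(1, 1, 2, 3, 3)
y <- c(3, 4)

# The two 1s collapse into one; every 3 is dropped because 3 is in y.
res <- setdiff(x, y)
print(res)

# Nothing in x is missing from x itself, so the result has length zero.
empty <- setdiff(x, x)
print(length(empty))
```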

Applying setdiff() to numeric vectors in R

A vector is a fundamental data structure in R that holds a sequence of items sharing the same data type. To create a vector in R, use the c() function. For example, let’s create two vectors and then pass them to the setdiff() function.

rv <- c(19, 21, 11, 18, 22)
rv2 <- c(11, 18, 20, 22, 46)

setdiff(rv, rv2)

Output

[1] 19 21

In this example, the first vector (rv) has two values, 19 and 21, that do not exist in the second vector (rv2), so setdiff() returns these two values. In short, the output values appear in x but not in y.

Now let’s swap the arguments and pass rv2 first. Note that rv is redefined here with a second 21 in place of 22, so 22 is no longer in rv.

rv <- c(19, 21, 11, 18, 21)
rv2 <- c(11, 18, 20, 22, 46)

setdiff(rv2, rv)

Output

[1] 20 22 46

The output now contains the values of rv2 that do not exist in rv.
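Because setdiff() is one-directional, getting the values that appear in exactly one of the two vectors (the symmetric difference) takes both calls. A minimal sketch that combines them with base R's union(); the name sym_diff is chosen just for this example.

```r
rv  <- c(19, 21, 11, 18, 21)
rv2 <- c(11, 18, 20, 22, 46)

# Values only in rv, followed by values only in rv2.
sym_diff <- union(setdiff(rv, rv2), setdiff(rv2, rv))
print(sym_diff)
```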

Using setdiff() function on character vectors

A character vector in R consists of strings, and it is how text is represented in R. The setdiff() function works on character vectors the same way it works on numeric ones.

rv <- c("Shiba Inu", "Doge", "Bitcoin Cash")
rv2 <- c("Polkadot", "Bitcoin", "Bitcoin Cash")

setdiff(rv, rv2)

Output

[1] "Shiba Inu" "Doge"

In this example, the output consists of character values that exist in the rv vector but not in the rv2 vector.
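String comparison is exact, so values that differ only in letter case count as different elements. The sketch below uses made-up vectors (coins_a and coins_b) and normalizes case with tolower() before comparing; whether that normalization is appropriate depends on your data.

```r
coins_a <- c("Shiba Inu", "Doge", "bitcoin cash")
coins_b <- c("Polkadot", "Bitcoin", "Bitcoin Cash")

# "bitcoin cash" and "Bitcoin Cash" are different strings, so it is kept.
exact <- setdiff(coins_a, coins_b)
print(exact)

# Lower-casing both sides first makes the comparison case-insensitive.
ignore_case <- setdiff(tolower(coins_a), tolower(coins_b))
print(ignore_case)
```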

Applying setdiff() to data frames

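A data frame is a tabular data structure in R that consists of rows and columns. You can also pass two data frames to setdiff(). Base R documents setdiff() for vectors, so read the result on data frames with care: in the example below, nothing in x is identical to anything in y, so x comes back unchanged.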

x <- data.frame(
 x1 = c(11, 21, 19, 46),
 x2 = c(51, 15, 11, 14),
 x3 = c(19, 21, 13, 41)
)

y <- data.frame(
 x1 = c(11, 14, 8, 1),
 x2 = c(51, 15, 1, 41),
 x3 = c(12, 42, 43, 4)
)

setdiff(x, y)

Output

  x1 x2 x3
1 11 51 19
2 21 15 21
3 19 11 13
4 46 14 41
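If what you need is a row-by-row difference, one way to get it in base R is to build a text key per row and filter with %in%. This is a sketch under assumptions: both data frames have the same columns in the same order, and values compare sensibly once converted to text. The helper row_key() and the data frame y2 are hypothetical names introduced here. Packages such as dplyr also offer a setdiff() that compares data frames by row.

```r
x <- data.frame(
  x1 = c(11, 21, 19, 46),
  x2 = c(51, 15, 11, 14),
  x3 = c(19, 21, 13, 41)
)

# y2 shares exactly one row (21, 15, 21) with x.
y2 <- data.frame(
  x1 = c(21, 14),
  x2 = c(15, 15),
  x3 = c(21, 42)
)

# Hypothetical helper: paste the values of each row into a single string.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# Keep the rows of x whose key does not occur among the keys of y2.
rows_only_in_x <- x[!(row_key(x) %in% row_key(y2)), , drop = FALSE]
print(rows_only_in_x)
```

Here the second row of x is dropped because it also appears in y2, and the remaining rows keep their original row names.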

Use third-party packages

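The prob package provides a cards() function that returns a standard deck of playing cards as a data frame. To use it, first install the prob package in RStudio or whichever R environment you use.

After installing it, load it with library() at the top of your script.

We will then apply setdiff() to two subsets of the cards() data: all aces, and all diamonds. The result is the aces that are not diamonds.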

library("prob")

kads <- cards()
a <- subset(kads, suit == "Diamond")
v <- subset(kads, rank == "A")
setdiff(v, a)

Output

Loading required package: combinat

Attaching package: ‘combinat’

The following object is masked from ‘package:utils’:

 combn

Loading required package: fAsianOptions
Loading required package: timeDate
Loading required package: timeSeries
Loading required package: fBasics
Loading required package: fOptions

Attaching package: ‘prob’

The following objects are masked from ‘package:base’:

 intersect, setdiff, union

   rank  suit
13    A  Club
39    A Heart
52    A Spade
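The startup messages show that prob masks intersect, setdiff, and union from base. If you want to be sure you are calling the base R version while such a package is loaded, you can name the namespace explicitly with the :: operator. A minimal sketch that uses only base R, so it runs whether or not prob is installed:

```r
# base::setdiff always refers to base R's function, even when another
# attached package defines its own setdiff().
base_result <- base::setdiff(c(19, 21, 11), c(11, 18))
print(base_result)
```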

That’s it for this tutorial.
