NA in R: How to Represent Missing Data in R

R has two ways of representing missing or undefined data: NaN and NA. NaN means “Not a Number” and marks a numeric result that is undefined, such as 0/0. NA means “Not Available” and marks a value that is missing or unknown. Let’s take a closer look at NA in R.

NA in R

To represent missing data in R, use the NA symbol. NA stands for Not Available. It is a special constant in R that acts as a placeholder for a value that is not known.

NA is not part of the IEEE standard for floating-point numbers, whereas NaN is.

Using NA in a vector to mark missing values

Let’s define a vector with an NA value and use the is.na() function to check which components are NA. It returns a logical vector that is TRUE for each NA component and FALSE for all other values.

rv <- c(11, NA, 18, 19, 21)
rv
is.na(rv)

Output

[1] 11 NA 18 19 21
[1] FALSE  TRUE FALSE FALSE FALSE

As you can see, the second component of the vector contains an NA value.

The test for missing values

To find the missing values in R, use the is.na() function, which returns a logical vector with TRUE at each missing position. In our example, is.na() returns TRUE for the second component and FALSE for all the others.
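
Because is.na() returns a logical vector, it combines naturally with other base R functions. The following is a small sketch, using a made-up vector, of how you might check for, count, and locate missing values; the variable names are only illustrative.

```r
# Example vector with two missing values (made up for illustration)
rv <- c(11, NA, 18, NA, 21)

# Is there at least one NA?
has_na <- anyNA(rv)
has_na

# How many NA values are there? (each TRUE counts as 1)
n_missing <- sum(is.na(rv))
n_missing

# At which positions do the NA values sit?
na_positions <- which(is.na(rv))
na_positions
```

Here has_na is TRUE, n_missing is 2, and na_positions contains 2 and 4.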

Using NA in a matrix to mark missing values

Let’s build a matrix from a vector that contains NA values and see the output. Note that matrix() fills the matrix column by column by default.

rv <- c(11, NA, 18, 19, 21, 46, NA, 29, 20)
mtrx <- matrix(rv, nrow = 3, ncol = 3)
mtrx

Output

     [,1] [,2] [,3]
[1,]   11   19   NA
[2,]   NA   21   29
[3,]   18   46   20

To check for NA values in a matrix, use the is.na() function.

rv <- c(11, NA, 18, 19, 21, 46, NA, 29, 20)
mtrx <- matrix(rv, nrow = 3, ncol = 3)
is.na(mtrx)

Output

      [,1]  [,2]  [,3]
[1,] FALSE FALSE  TRUE
[2,]  TRUE FALSE FALSE
[3,] FALSE FALSE FALSE
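
For a larger matrix, reading the logical matrix by eye is impractical. As a rough sketch, the same is.na() result can be passed to which() with arr.ind = TRUE to list the row and column of each NA, or to colSums() to count the NA values per column. The names na_cells and na_per_col are only illustrative.

```r
rv <- c(11, NA, 18, 19, 21, 46, NA, 29, 20)
mtrx <- matrix(rv, nrow = 3, ncol = 3)

# Row and column index of every NA cell
na_cells <- which(is.na(mtrx), arr.ind = TRUE)
na_cells

# Number of NA values in each column
na_per_col <- colSums(is.na(mtrx))
na_per_col
```

For this matrix, na_cells lists the cells at row 2, column 1 and row 1, column 3, and na_per_col shows one NA in the first and third columns.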

Applying mathematical operations to NA values

If you add any number to NA, the result is NA. The same holds for most other arithmetic operations and mathematical functions.

21 + NA
sqrt(NA)
NA + NA

Output

[1] NA
[1] NA
[1] NA

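We get NA in all the outputs. Adding NaN and NA, however, returns NaN in this example. R’s documentation notes that when both NA and NaN are involved, either one may result, depending on the platform.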

NaN + NA

Output

[1] NaN
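
The same propagation applies to element-wise arithmetic on vectors, to summaries such as sum(), and to comparisons. The sketch below, with a made-up vector, illustrates this; it also shows why missing values are tested with is.na() rather than with == NA.

```r
rv <- c(1, 2, NA, 4)

# Element-wise arithmetic: only the NA position stays NA
doubled <- rv * 2
doubled

# A summary over a vector containing NA is NA
total <- sum(rv)
total

# Comparing with NA yields NA, not TRUE or FALSE
cmp <- NA == NA
cmp
```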

How to exclude NA values from the analysis

If you calculate the mean of a vector that contains NA values, you can exclude the NA values and calculate the mean of the remaining values. If you don’t exclude them, mean() returns NA.

rv <- c(1, 2, NA, 4, 5)
mean(rv)

Output

[1] NA

To exclude the NA values, pass na.rm = TRUE as a named argument to the mean() function.

rv <- c(1, 2, NA, 4, 5)
mean(rv, na.rm = TRUE)

Output

[1] 3

That means it has calculated the mean of the four remaining values (1, 2, 4, 5), whose sum is 12, so the mean is 3.

To remove the NA values from a vector, use the na.omit() function. The na.omit() function returns the object with listwise deletion of missing values.
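
The na.rm argument is not specific to mean(). As a brief sketch, several other base R summary functions accept it in the same way; the variable names below are only illustrative.

```r
rv <- c(1, 2, NA, 4, 5)

# Each call ignores the NA and works on 1, 2, 4, 5
total  <- sum(rv, na.rm = TRUE)
top    <- max(rv, na.rm = TRUE)
middle <- median(rv, na.rm = TRUE)

total
top
middle
```

Here total is 12, top is 5, and middle is 3.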

rv <- c(1, 2, NA, 4, 5)
na.omit(rv)

Output

[1] 1 2 4 5
attr(,"na.action")
[1] 3
attr(,"class")
[1] "omit"

As you can see in the output, the NA is omitted, and the na.action attribute records that it was at position 3.
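
If the extra attributes are not wanted, logical indexing with is.na() is a common alternative. The sketch below also applies na.omit() to a small, made-up data frame to show what listwise deletion means there: every row that contains an NA is dropped. The names clean, df, and complete are only illustrative.

```r
rv <- c(1, 2, NA, 4, 5)

# Keep only the non-missing values; the result is a plain numeric vector
clean <- rv[!is.na(rv)]
clean

# Hypothetical data frame with one incomplete row
df <- data.frame(id = 1:3, score = c(10, NA, 30))

# na.omit() removes the whole row that contains the NA
complete <- na.omit(df)
complete
```

Here clean holds 1, 2, 4, 5 with no attributes, and complete keeps only the rows with id 1 and 3.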

Difference between NA and NaN in R

NaN is different from NA. NaN marks a numeric result that is undefined and so cannot be represented as an ordinary number.

Numeric calculations whose result is undefined, such as ‘0/0’, give the NaN value. NA is usually interpreted as a missing value. By itself, NA is a logical constant, and it has typed variants, including NA_integer_, NA_real_, and NA_character_.
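
The difference can be checked directly with is.na(), is.nan(), and typeof(). This is a short sketch; the variable names are only illustrative.

```r
undefined <- 0 / 0

# is.nan() is TRUE only for NaN
nan_is_nan <- is.nan(undefined)   # TRUE
na_is_nan  <- is.nan(NA)          # FALSE

# is.na() is TRUE for both NA and NaN
nan_is_na <- is.na(undefined)     # TRUE

# NA is logical by default; the typed variants carry their own type
na_types <- c(typeof(NA), typeof(NA_integer_),
              typeof(NA_real_), typeof(NA_character_))
na_types
```

The na_types vector contains "logical", "integer", "double", and "character". Note that is.na() does not distinguish NA from NaN, so use is.nan() when the difference matters.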

Conclusion

If you are working with an external dataset, you will often come across NA values. Handling them can be difficult, but with the proper functions, such as is.na() and na.omit(), and the na.rm argument, you can handle them efficiently. NA is a placeholder stating that a value is missing. That is it for this tutorial.
