
How to Remove NA Values from Data Frame in R

NA values represent missing data in R. Before building a model on a data frame, you usually need to deal with these missing values, and the right way to remove them depends on the scenario.

Here are four ways to remove NA values from a data frame, each suited to a different scenario:

  1. Use na.omit()
  2. Use complete.cases()
  3. Use is.na() with Subsetting
  4. Tidyverse approach

Here is the sample data frame that contains NA values:

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

print(df)
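Before removing anything, it can help to see where the missing values are. The following is a minimal sketch using base R's is.na(), colSums(), and complete.cases(); the variable names na_counts and rows_with_na are only illustrative.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

# is.na(df) returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(df))
print(na_counts)

# Number of rows that contain at least one NA
rows_with_na <- sum(!complete.cases(df))
print(rows_with_na)
```

For this sample, "grade" has two missing values, every other column has one, and three rows are incomplete.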

Method 1: Using na.omit()

The na.omit() function removes every row that contains at least one NA value from a data frame.

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

print(df)

# Removing rows with NA values
df_without_na <- na.omit(df)

print(df_without_na)

The output contains only rows 1, 2, and 4. Rows 3, 5, and 6 have been removed because each of them contains at least one NA value.
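If you want to confirm which rows were dropped, na.omit() records them in the "na.action" attribute of its result. A small sketch, where removed_rows is just an illustrative variable name:

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

df_without_na <- na.omit(df)

# na.omit() stores the indices of the dropped rows in the "na.action" attribute
removed_rows <- as.integer(attr(df_without_na, "na.action"))
print(removed_rows)

# Number of rows dropped
print(nrow(df) - nrow(df_without_na))
```

This is handy as a sanity check when the data frame is too large to inspect by eye.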

Method 2: Using complete.cases()

The complete.cases() function returns a logical vector that is TRUE for every row without missing values. This gives you flexibility: you can either remove all the rows that contain NA values or remove only the rows with NA in specific columns.

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

print(df)

# Removing rows with NA values
df_clean <- df[complete.cases(df), ]

print(df_clean)

This keeps the same rows as na.omit(): only rows 1, 2, and 4 remain.

You can also apply complete.cases() to selected columns. Let’s remove the rows that have NA in the “name” or “grade” column.

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

print(df)

# Removing rows with NA values in columns "name" or "grade"
df_clean_cols <- df[complete.cases(df[, c("name", "grade")]), ]

print(df_clean_cols)

Rows 5 and 6 are removed because they have NA in “name” or “grade”. Row 3 has NA in the “subject” column, but we did not include that column in the check, so the row stays in the output.
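To see why the two calls keep different rows, you can inspect the logical vectors that complete.cases() returns. This is a minimal sketch; keep_all and keep_cols are illustrative names.

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

# TRUE where the whole row has no NA
keep_all <- complete.cases(df)
print(keep_all)

# TRUE where "name" and "grade" are both present, regardless of other columns
keep_cols <- complete.cases(df[, c("name", "grade")])
print(keep_cols)
```

Row 3 is FALSE in the first vector but TRUE in the second, which is why it survives the column-specific check.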

Method 3: Use is.na() with Subsetting

Subsetting selects the rows of a data frame that satisfy a condition, much like applying a filter.

The is.na() function returns TRUE for missing values, so negating it with ! lets you keep only the rows where a specific column is not NA.

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

print(df)

# Removing rows with NA values from column "name"
df_clean_name <- df[!is.na(df$name), ]

print(df_clean_name)

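Only row 6 is removed, because it is the only row with NA in the “name” column. NA values in the other columns are left in place.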
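The same idea extends to several columns by combining conditions with the & operator. A short sketch, assuming you want rows where both "subject" and "grade" are present (df_clean_two is an illustrative name):

```r
df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

# Keep rows where neither "subject" nor "grade" is NA
df_clean_two <- df[!is.na(df$subject) & !is.na(df$grade), ]

print(df_clean_two)
```

Here rows 3, 5, and 6 are dropped, since each has NA in at least one of the two columns.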

Method 4: Using tidyverse (dplyr/tidyr)

The tidyr::drop_na() function is designed to drop rows containing missing values.

If the tidyr package is not installed yet, install it with install.packages("tidyr"), then load it using the library() function.

library(tidyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

print(df)

# Removing all rows with NA values
df_clean <- df %>% drop_na()

print(df_clean)

Only the three complete rows (Krunal, Ankit, and Dhaval) remain.

You can also remove rows with NA based on specific columns using the drop_na() function.

library(tidyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

print(df)

# Removing rows with NA values in columns "name" or "grade"
df_clean_cols <- df %>% drop_na(name, grade)

print(df_clean_cols)

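The rows for Niva and the unnamed student are dropped because they have NA in “name” or “grade”. The row for Rushabh is kept, even though its “subject” is NA.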
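Since this method also falls under dplyr, here is a rough equivalent using dplyr::filter() with is.na(). It is a sketch that assumes the dplyr package is installed; df_clean_filter is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  name = c("Krunal", "Ankit", "Rushabh", "Dhaval", "Niva", NA),
  score = c(85, 90, 78, 92, 92, NA),
  subject = c("Math", "Math", NA, "History", "Biology", "Science"),
  grade = c("10th", "12th", "11th", "10th", NA, NA)
)

# Conditions separated by commas are combined with AND:
# keep rows where both "name" and "grade" are present
df_clean_filter <- df %>% filter(!is.na(name), !is.na(grade))

print(df_clean_filter)
```

filter() is convenient when the NA check is only one of several conditions you want to apply to the rows.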

That’s all!
