Mastering grepl() Function in R

The grepl() function (short for “grep logical”) in R searches for a pattern within each element of a character vector. It returns a logical vector of the same length as the input: TRUE for each element the pattern matches and FALSE for each element it does not.

[Figure: Basic usage of the grepl() function in R]

The figure above shows that grepl() returns TRUE only for the element “GT”, because that is the only element the pattern matches; every other element returns FALSE.

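It is a variant of the grep() function: where grep() returns the positions (or values) of the matching elements, grepl() returns a logical vector. That makes grepl() convenient for filtering datasets, pattern matching, and general text processing.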
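To make the relationship with grep() concrete, here is a minimal sketch that runs both functions on the same vector. The vector ipl is borrowed from the examples later in this article; the variable names match_idx, match_val, and match_lgl are only illustrative.

```r
# Same vector and pattern, three different return types
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# grep() returns the positions of the matching elements
match_idx <- grep("C", ipl)
print(match_idx)
# Output: [1] 1 2 4

# grep() with value = TRUE returns the matching elements themselves
match_val <- grep("C", ipl, value = TRUE)
print(match_val)
# Output: [1] "CSK" "RCB" "DC"

# grepl() returns one TRUE/FALSE per input element
match_lgl <- grepl("C", ipl)
print(match_lgl)
# Output: [1] TRUE TRUE FALSE TRUE FALSE
```

Because grepl() always returns a vector as long as its input, its result can be used directly as a row or element selector.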

Syntax

grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)

Parameters

Name         Description
pattern      A character string containing a regular expression, or a fixed string if fixed = TRUE.
x            A character vector in which to search for the pattern.
ignore.case  A logical flag. If TRUE, matching is case-insensitive. The default, FALSE, is case-sensitive.
perl         A logical flag. If TRUE, Perl-compatible regular expressions (PCRE) are used.
fixed        A logical flag. If TRUE, the pattern is interpreted as a fixed string, not a regex.
useBytes     A logical flag. If TRUE, matching is done byte-by-byte rather than character-by-character.
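As an illustration of the perl flag, the sketch below uses a lookahead, a PCRE feature that needs perl = TRUE. The vector teams and the result names are hypothetical and chosen only for this example.

```r
# Hypothetical example vector
teams <- c("CSK", "RCB", "GT", "DC", "KKR")

# perl = TRUE enables PCRE features such as lookahead:
# match a "C" only when it is immediately followed by "S"
lookahead <- grepl("C(?=S)", teams, perl = TRUE)
print(lookahead)
# Output: [1] TRUE FALSE FALSE FALSE FALSE

# Flags can be combined, for example case-insensitive PCRE matching
combined <- grepl("c(?=s)", teams, ignore.case = TRUE, perl = TRUE)
print(combined)
# Output: [1] TRUE FALSE FALSE FALSE FALSE
```

Only “CSK” contains a “C” directly followed by an “S”, so it is the only element that matches.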

Basic pattern matching

Let’s initialize a character vector and check each of its elements for a specific pattern (string).

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Search for the pattern "GT"
matches <- grepl("GT", ipl)

print(matches)

# Output: [1] FALSE FALSE TRUE FALSE FALSE

Since “GT” is at index 3, the third element of the output is TRUE. All other elements are FALSE.

Let’s check for just “C” in the ipl vector:

[Figure: Single-character matching using grepl()]

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Search for the pattern "C"
matches <- grepl("C", ipl)

print(matches)

# Output: [1] TRUE TRUE FALSE TRUE FALSE

Since “C” appears in the first, second, and fourth elements, the output is TRUE at those positions and FALSE everywhere else.
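The logical vector is most useful as a selector. The following sketch, using the same ipl vector, shows how the result can subset, negate, and count matches; the variable names are only illustrative.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# One TRUE/FALSE per element
has_c <- grepl("C", ipl)

# Keep only the elements that match
with_c <- ipl[has_c]
print(with_c)
# Output: [1] "CSK" "RCB" "DC"

# Negate the vector to keep the elements that do not match
without_c <- ipl[!has_c]
print(without_c)
# Output: [1] "GT" "KKR"

# TRUE counts as 1, so sum() gives the number of matches
n_c <- sum(has_c)
print(n_c)
# Output: [1] 3

# Missing values never match: grepl() returns FALSE for NA
na_result <- grepl("C", c("CSK", NA))
print(na_result)
# Output: [1] TRUE FALSE
```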

Case-Insensitive Search

In the code example above, we searched for the capital letter “C”. Let’s see what happens when we search for the lowercase letter “c” instead:

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Searching for the pattern "c"
matches <- grepl("c", ipl)

print(matches)

# Output: [1] FALSE FALSE FALSE FALSE FALSE

No element matched. Why? The vector contains the uppercase letter “C” but no lowercase “c”, and grepl() is case-sensitive by default, so the two are treated as different characters.

The solution is to set ignore.case = TRUE:

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Searching for the pattern "c"
matches <- grepl("c", ipl, ignore.case = TRUE)

print(matches)

# Output: [1] TRUE TRUE FALSE TRUE FALSE

Now it matches, because we explicitly told grepl() to ignore case while matching.
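There are other ways to get the same case-insensitive result. The sketch below compares ignore.case = TRUE with two alternatives, a character class and lowercasing the input first; the variable names are only illustrative, and which approach reads best is a matter of taste.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Option 1: let grepl() ignore case
by_flag <- grepl("c", ipl, ignore.case = TRUE)

# Option 2: list both cases in a character class
by_class <- grepl("[Cc]", ipl)

# Option 3: lowercase the input, then match a literal "c"
by_lower <- grepl("c", tolower(ipl), fixed = TRUE)

print(by_flag)
# Output: [1] TRUE TRUE FALSE TRUE FALSE

# All three approaches agree on this vector
print(identical(by_flag, by_class) && identical(by_flag, by_lower))
# Output: [1] TRUE
```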

Regular Expressions

By default, grepl() treats the pattern as a regular expression, so you can match far more than literal text. The only requirement is that you know enough regex syntax to write a pattern that describes the elements you want to match.

# Character Vector
ipl <- c("CSK7", "RCB18", "GT", "DC", "KKR")

# Searching for any digit with the pattern "\\d"
matches <- grepl("\\d", ipl)

print(matches)

# Output: [1] TRUE TRUE FALSE FALSE FALSE

The pattern “\\d” matches any element that contains at least one digit. If an element contains a digit, grepl() returns TRUE for it; otherwise, it returns FALSE.
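A few more patterns on the same vector show how anchors, quantifiers, and POSIX character classes change what counts as a match. This is a small sketch rather than a full regex reference, and the variable names are only illustrative.

```r
ipl <- c("CSK7", "RCB18", "GT", "DC", "KKR")

# POSIX class equivalent of "\\d": at least one digit anywhere
has_digit <- grepl("[[:digit:]]", ipl)
print(has_digit)
# Output: [1] TRUE TRUE FALSE FALSE FALSE

# Quantifier: two consecutive digits
two_digits <- grepl("[0-9]{2}", ipl)
print(two_digits)
# Output: [1] FALSE TRUE FALSE FALSE FALSE

# Anchor: element must start with "C"
starts_with_c <- grepl("^C", ipl)
print(starts_with_c)
# Output: [1] TRUE FALSE FALSE FALSE FALSE

# Both anchors: element must consist of uppercase letters only
letters_only <- grepl("^[[:upper:]]+$", ipl)
print(letters_only)
# Output: [1] FALSE FALSE TRUE TRUE TRUE
```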

Fixed String Matching

Instead of passing a regex, we can pass a fixed string as the pattern by setting fixed = TRUE. The pattern is then matched literally: if an element contains that exact string, grepl() returns TRUE; otherwise, it returns FALSE.

# Character Vector
ipl <- c("CSK7", "RCB.18", "GT", "DC", "KKR")

# Searching for a literal "."
matches <- grepl(".", ipl, fixed = TRUE)

print(matches)

# Output: [1] FALSE TRUE FALSE FALSE FALSE

Since only “RCB.18” contains a literal “.”, grepl() returns TRUE for that element and FALSE for all the others.
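The fixed flag matters here because “.” is a regex metacharacter that matches any single character. The sketch below contrasts the three ways of writing this search; the variable names are only illustrative.

```r
ipl <- c("CSK7", "RCB.18", "GT", "DC", "KKR")

# As a regex, "." matches any character, so every non-empty element matches
as_regex <- grepl(".", ipl)
print(as_regex)
# Output: [1] TRUE TRUE TRUE TRUE TRUE

# With fixed = TRUE, "." is a literal dot
as_fixed <- grepl(".", ipl, fixed = TRUE)
print(as_fixed)
# Output: [1] FALSE TRUE FALSE FALSE FALSE

# Escaping the dot in a regex gives the same result
as_escaped <- grepl("\\.", ipl)
print(as_escaped)
# Output: [1] FALSE TRUE FALSE FALSE FALSE
```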

Filtering Data Frames

Let’s use the built-in dataset mtcars and find the records whose row names start with the letter “M”. The “^” anchor in the pattern restricts the match to the beginning of each row name.

mtcars[grepl("^M", rownames(mtcars)), ]

[Figure: Output of filtering data frames]
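The anchor makes a difference in this dataset. As a rough sketch, the code below compares “^M” (row names starting with “M”) with “M” (row names containing “M” anywhere) and then combines grepl() with a numeric condition. The variable names and the mpg threshold of 20 are arbitrary choices for illustration.

```r
# Row names that start with "M"
starts_with_m <- mtcars[grepl("^M", rownames(mtcars)), ]
print(nrow(starts_with_m))
# Output: [1] 10

# Row names that contain "M" anywhere
contains_m <- mtcars[grepl("M", rownames(mtcars)), ]
print(nrow(contains_m))
# Output: [1] 11

# The extra row has its "M" in the middle of the name
extra <- setdiff(rownames(contains_m), rownames(starts_with_m))
print(extra)
# Output: [1] "AMC Javelin"

# grepl() combines with other logical conditions using &
merc_efficient <- mtcars[grepl("^Merc", rownames(mtcars)) & mtcars$mpg > 20, ]
print(rownames(merc_efficient))
# Output: [1] "Merc 240D" "Merc 230"
```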

grepl() in RStudio

The image below shows the information RStudio provides about the grepl() function.

[Figure: grepl() function in RStudio]

Functions such as grep(), grepl(), regexpr(), gregexpr(), and regexec() all search for matches to the pattern argument within each element of a character vector.
