
Mastering grepl() Function in R

The grepl() function (short for “grep logical”) in R searches for a pattern within each element of a character vector. It returns a logical vector of the same length as the input: TRUE if the pattern matches that element, FALSE otherwise.
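As a quick sketch of that contract, the snippet below uses a made-up vector of fruit names (hypothetical data, only for illustration) and checks that the result has one value per input element. Note that a missing value (NA) in x does not produce NA in the result; grepl() reports it as FALSE.

```r
# Hypothetical vector for illustration, including a missing value
fruits <- c("apple", NA, "banana", "kiwi")

# One logical value per element of the input; NA elements give FALSE
res <- grepl("an", fruits)
print(res)
# [1] FALSE FALSE  TRUE FALSE

# The result always has the same length as the input
length(res) == length(fruits)
# [1] TRUE
```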

The figure above shows that grepl() returns TRUE only for “GT”, because that is the only element matching the pattern; all other elements return FALSE.

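It’s a variation of the grep() function. While grep() returns the indices of the matching elements (or the elements themselves with value = TRUE), grepl() returns a logical vector. You can use grepl() to filter datasets, perform pattern matching, or process text.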
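A short side-by-side comparison makes the difference between grep() and grepl() concrete. This sketch defines the same ipl vector of team abbreviations used in the examples of this article:

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# grep() returns the indices of the matching elements
grep("C", ipl)
# [1] 1 2 4

# grep(value = TRUE) returns the matching elements themselves
grep("C", ipl, value = TRUE)
# [1] "CSK" "RCB" "DC"

# grepl() returns one TRUE/FALSE per element
grepl("C", ipl)
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# which() turns grepl()'s logical result into grep()'s indices
identical(which(grepl("C", ipl)), grep("C", ipl))
# [1] TRUE
```

In practice, grepl() is the natural choice when you need a TRUE/FALSE condition (for example, to index rows), and grep() when you need positions or values.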

Syntax

grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)

Parameters

pattern: A character string containing a regular expression, or a fixed string if you set fixed = TRUE.
x: A character vector in which to search for the pattern.
ignore.case: A logical flag. If TRUE, matching ignores the difference between upper and lower case.
perl: A logical flag. If TRUE, Perl-compatible regular expressions (PCRE) are used.
fixed: A logical flag. If TRUE, the pattern is interpreted as a fixed (literal) string, not a regex.
useBytes: A logical flag. If TRUE, matching is done byte-by-byte rather than character-by-character.
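To illustrate the perl flag, here is a small sketch using a lookahead, a PCRE feature that R’s default regex engine does not accept. The pattern "C(?=S)" matches a “C” only when it is immediately followed by “S”. The second call shows that flags can be combined.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Lookahead: match "C" only when the next character is "S".
# This needs perl = TRUE; the default engine rejects "(?=".
grepl("C(?=S)", ipl, perl = TRUE)
# [1]  TRUE FALSE FALSE FALSE FALSE

# Flags can be combined, e.g. a case-insensitive PCRE match
grepl("c(?=s)", ipl, ignore.case = TRUE, perl = TRUE)
# [1]  TRUE FALSE FALSE FALSE FALSE
```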

Basic Pattern Matching

Let’s create a character vector and check each of its elements for a specific pattern (string).

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Search for the pattern "GT"
matches <- grepl("GT", ipl)

print(matches)

# Output: [1] FALSE FALSE TRUE FALSE FALSE

Since “GT” is the third element, only the third value in the output is TRUE; all the others are FALSE.

Let’s check for just “C” in the ipl vector:

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Search for the pattern "C"
matches <- grepl("C", ipl)

print(matches)

# Output: [1] TRUE TRUE FALSE TRUE FALSE

Since “C” appears in the first, second, and fourth elements, those positions are TRUE and the rest are FALSE.
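Because the result is a plain logical vector, it plugs straight into other base R tools. A minimal sketch (the variable name has_c is just an illustrative choice):

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")
has_c <- grepl("C", ipl)

# TRUE counts as 1, so sum() gives the number of matching elements
sum(has_c)
# [1] 3

# Logical indexing keeps only the matching elements
ipl[has_c]
# [1] "CSK" "RCB" "DC"

# Negate with ! to keep the elements that do NOT match
ipl[!has_c]
# [1] "GT"  "KKR"
```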

Case-Insensitive Search

In the example above, we searched for the uppercase letter “C”. What happens if we search for the lowercase “c” instead?

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Searching for the pattern "c"
matches <- grepl("c", ipl)

print(matches)

# Output: [1] FALSE FALSE FALSE FALSE FALSE

No matches at all. Why? The vector contains an uppercase “C” but no lowercase “c”, and grepl() is case-sensitive by default, so “c” does not match “C”.

The solution is to set ignore.case = TRUE:

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Searching for the pattern "c"
matches <- grepl("c", ipl, ignore.case = TRUE)

print(matches)

# Output: [1] TRUE TRUE FALSE TRUE FALSE

Now the matches are found, because we explicitly told grepl() to ignore case while matching.
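ignore.case = TRUE is the most direct fix, but two other common approaches give the same result here. This sketch compares them; which one to prefer is a judgment call (a character class only covers the letters you list, while tolower() changes the text being searched, not the original vector):

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Option 1: a character class that lists both cases
grepl("[cC]", ipl)
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# Option 2: convert the input to lower case before matching
grepl("c", tolower(ipl))
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# Both agree with ignore.case = TRUE for this vector
identical(grepl("[cC]", ipl), grepl("c", ipl, ignore.case = TRUE))
# [1] TRUE
```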

Regular Expressions

By default, grepl() interprets pattern as a regular expression, so you can match far more than literal text. You only need to know the regex rules to build a pattern that matches the elements you are looking for.

# Character Vector
ipl <- c("CSK7", "RCB18", "GT", "DC", "KKR")

# Search for any digit
matches <- grepl("\\d", ipl)

print(matches)

# Output: [1] TRUE TRUE FALSE FALSE FALSE

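The pattern “\\d” matches any digit, so grepl() returns TRUE for every element that contains at least one digit and FALSE otherwise. The backslash is doubled because a single backslash is itself an escape character inside R strings.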
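A few more regex building blocks, sketched on the same vector: anchors, alternation, and repetition. grepl() uses a single pattern (only the first element of a longer pattern vector is used, with a warning), so the paste(collapse = "|") line shows one way to test several strings at once; the teams vector is a hypothetical example.

```r
ipl <- c("CSK7", "RCB18", "GT", "DC", "KKR")

# ^ anchors the match to the start of the string
grepl("^K", ipl)
# [1] FALSE FALSE FALSE FALSE  TRUE

# $ anchors the match to the end of the string
grepl("C$", ipl)
# [1] FALSE FALSE FALSE  TRUE FALSE

# | means "either/or"
grepl("GT|KKR", ipl)
# [1] FALSE FALSE  TRUE FALSE  TRUE

# Build one alternation from several strings (hypothetical list)
teams <- c("GT", "KKR")
grepl(paste(teams, collapse = "|"), ipl)
# [1] FALSE FALSE  TRUE FALSE  TRUE

# {2,} requires at least two consecutive digits
grepl("\\d{2,}", ipl)
# [1] FALSE  TRUE FALSE FALSE FALSE
```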

Fixed String Matching

Setting fixed = TRUE tells grepl() to treat the pattern as a literal string instead of a regular expression. It returns TRUE for each element that contains that exact substring and FALSE otherwise.

# Character Vector
ipl <- c("CSK7", "RCB.18", "GT", "DC", "KKR")

# Search for a literal dot
matches <- grepl(".", ipl, fixed = TRUE)

print(matches)

# Output: [1] FALSE TRUE FALSE FALSE FALSE

Only “RCB.18” contains a literal “.”, so it is the only element that returns TRUE.
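Why does fixed = TRUE matter here? In a regular expression, “.” matches any single character, so the same pattern without fixed = TRUE behaves very differently. This sketch contrasts the three options:

```r
ipl <- c("CSK7", "RCB.18", "GT", "DC", "KKR")

# As a regex, "." matches ANY character, so every non-empty string matches
grepl(".", ipl)
# [1] TRUE TRUE TRUE TRUE TRUE

# Escaping the dot makes it literal inside a regex
grepl("\\.", ipl)
# [1] FALSE  TRUE FALSE FALSE FALSE

# fixed = TRUE gives the same result without any escaping
grepl(".", ipl, fixed = TRUE)
# [1] FALSE  TRUE FALSE FALSE FALSE
```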

Filtering Data Frames

Let’s use the built-in mtcars dataset and select the cars whose names (the row names) start with the letter “M”. The ^ anchor restricts the match to the start of each name.

mtcars[grepl("^M", rownames(mtcars)), ]
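The same approach works on an ordinary column. In this sketch the row names are copied into a column called model (an illustrative name, not part of mtcars), and grepl() is used inside subset() and with ! to exclude rows:

```r
# Rows whose names start with "M" (the example above)
m_cars <- mtcars[grepl("^M", rownames(mtcars)), ]
nrow(m_cars)
# [1] 10

# Copy the row names into a regular 'model' column
cars <- data.frame(model = rownames(mtcars), mtcars, row.names = NULL)

# Keep only the "Merc ..." models and a few columns
merc <- subset(cars, grepl("^Merc", model), select = c(model, mpg, cyl))
nrow(merc)
# [1] 7

# Negate the condition to drop those rows instead
non_merc <- cars[!grepl("^Merc", cars$model), ]
nrow(non_merc)
# [1] 25
```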

grepl() in RStudio

In RStudio, running ?grepl in the console opens the function’s help page, shown in the image below.

As the help page explains, grep(), grepl(), regexpr(), gregexpr(), and regexec() all search for matches to the pattern argument within each element of a character vector.
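The difference between these functions is what they report. As a small sketch, compare grepl() with regexpr(), which returns the starting position of the first match in each element, or -1 when there is no match:

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# grepl(): is there a match in each element?
grepl("C", ipl)
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# regexpr(): position of the first match, or -1 if there is none
pos <- regexpr("C", ipl)
as.integer(pos)
# [1]  1  3 -1  2 -1

# Any position other than -1 is a match, which reproduces grepl()
identical(as.integer(pos) != -1, grepl("C", ipl))
# [1] TRUE
```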
