What Is the grepl() Function in R

The grepl() function (short for “grep logical”) in R searches for a pattern within each element of a character vector. It returns a logical vector of the same length as the input, where each value is either TRUE or FALSE.

If the pattern matches that specific element, it returns TRUE. Otherwise, it returns FALSE.

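The figure shows that grepl() returns TRUE only for “GT” because the pattern matches that element; every other element returns FALSE. grepl() is a variant of the grep() function: grep() returns the positions of the matching elements, whereas grepl() returns one logical value per element.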
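To make the relationship with grep() concrete, here is a minimal sketch that runs both functions on the same vector. It uses the ipl vector from the examples in this article; the variable names idx, vals, and flags are only illustrative.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# grep() returns the integer positions of the matching elements
idx <- grep("GT", ipl)
print(idx)
# Output: [1] 3

# grep() with value = TRUE returns the matching elements themselves
vals <- grep("GT", ipl, value = TRUE)
print(vals)
# Output: [1] "GT"

# grepl() returns one TRUE/FALSE per element
flags <- grepl("GT", ipl)
print(flags)
# Output: [1] FALSE FALSE  TRUE FALSE FALSE
```

Calling which(flags) on the grepl() result gives the same positions that grep() returns.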

Syntax

grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)

Parameters

Name          Description
pattern       A character string containing a regular expression, or a fixed string if fixed = TRUE.
x             The character vector in which to search for the pattern.
ignore.case   If TRUE, matching is case-insensitive. If FALSE (the default), matching is case-sensitive.
perl          A logical flag. If TRUE, Perl-compatible regular expressions (PCRE) are used.
fixed         A logical flag. If TRUE, the pattern is interpreted as a fixed string, not a regex.
useBytes      A logical flag. If TRUE, matching is done byte-by-byte rather than character-by-character.
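As a rough illustration of the perl flag, the sketch below uses a lookahead, which is a PCRE feature. The vector x is made up for this example and does not come from the article.

```r
# Hypothetical vector, used only for illustration
x <- c("RCB18", "RCB7", "GT")

# (?=18) is a lookahead: "RCB" must be immediately followed by "18"
lookahead <- grepl("RCB(?=18)", x, perl = TRUE)

print(lookahead)
# Output: [1]  TRUE FALSE FALSE
```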

Basic pattern matching

Let’s initialize a character vector and check for the specific pattern (string) for each element in that vector.

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Search for the pattern "GT"
matches <- grepl("GT", ipl)

print(matches)

# Output: [1] FALSE FALSE TRUE FALSE FALSE

Since “GT” is the third element of the vector, only the third value in the output is TRUE. All the others are FALSE.

Let’s check for just “C” in the ipl vector:

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Search for the pattern "C"
matches <- grepl("C", ipl)

print(matches)

# Output: [1] TRUE TRUE FALSE TRUE FALSE

Since “C” appears in the first, second, and fourth elements, those positions are TRUE and the others are FALSE.
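The logical vector is mostly useful as an index or as input to summary functions. The following sketch, which reuses the ipl vector from above, shows a few common ways to work with it; the variable names are illustrative.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")
matches <- grepl("C", ipl)

# Keep only the elements that match
matched <- ipl[matches]
print(matched)
# Output: [1] "CSK" "RCB" "DC"

# Keep only the elements that do not match
unmatched <- ipl[!matches]
print(unmatched)
# Output: [1] "GT"  "KKR"

# Count the matches (TRUE counts as 1)
n_matches <- sum(matches)
print(n_matches)
# Output: [1] 3

# Check whether at least one element matches
has_match <- any(matches)
print(has_match)
# Output: [1] TRUE
```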

Case-Insensitive Search

In the example above, we searched for the uppercase “C”. Let’s see what happens when we search for the lowercase “c” instead:

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Searching for the pattern "c"
matches <- grepl("c", ipl)

print(matches)

# Output: [1] FALSE FALSE FALSE FALSE FALSE

There is no match. Matching is case-sensitive by default, so the lowercase “c” does not match the uppercase “C” that appears in the vector.

The solution is to set ignore.case = TRUE:

# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Searching for the pattern "c"
matches <- grepl("c", ipl, ignore.case = TRUE)

print(matches)

# Output: [1] TRUE TRUE FALSE TRUE FALSE

Now it matches, because we explicitly told grepl() to ignore case while matching.
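There are other ways to get a case-insensitive match. The sketch below compares ignore.case = TRUE with two alternatives: a character class in the pattern, and lowercasing the input with tolower(). For this vector all three give the same result; which one to use is a matter of preference.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Option 1: let grepl() ignore case
by_flag <- grepl("c", ipl, ignore.case = TRUE)

# Option 2: accept either case in the pattern itself
by_class <- grepl("[Cc]", ipl)

# Option 3: lowercase the input before matching
by_lower <- grepl("c", tolower(ipl))

print(by_flag)
# Output: [1]  TRUE  TRUE FALSE  TRUE FALSE

print(identical(by_flag, by_class) && identical(by_flag, by_lower))
# Output: [1] TRUE
```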

Regular Expressions

By default, grepl() interprets the pattern as a regular expression.

The only requirement is that you know enough regex syntax to write a pattern that describes what each element should contain.

# Character Vector
ipl <- c("CSK7", "RCB18", "GT", "DC", "KKR")

# Searching for any digit
matches <- grepl("\\d", ipl)

print(matches)

# Output: [1] TRUE TRUE FALSE FALSE FALSE

The pattern “\\d” matches any digit, so grepl() returns TRUE for every element that contains at least one digit and FALSE otherwise.
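A few more patterns on the same vector may help show what regular expressions add over plain substring matching. This is only a sketch: the patterns and variable names are illustrative, not the only way to express these checks.

```r
ipl <- c("CSK7", "RCB18", "GT", "DC", "KKR")

# ^ and $ anchor the pattern to the start and end of the element:
# TRUE only if the whole element consists of uppercase letters
letters_only <- grepl("^[A-Z]+$", ipl)
print(letters_only)
# Output: [1] FALSE FALSE  TRUE  TRUE  TRUE

# {2} is a quantifier: TRUE if the element ends with two digits
two_digits_at_end <- grepl("[0-9]{2}$", ipl)
print(two_digits_at_end)
# Output: [1] FALSE  TRUE FALSE FALSE FALSE

# | means "or": TRUE if the element is exactly "GT" or "DC"
gt_or_dc <- grepl("^(GT|DC)$", ipl)
print(gt_or_dc)
# Output: [1] FALSE FALSE  TRUE  TRUE FALSE
```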

Fixed String Matching

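Instead of a regex, you can pass a fixed string as the pattern by setting fixed = TRUE. The pattern is then matched literally, so regex metacharacters such as “.” lose their special meaning.

If an element contains the exact string, grepl() returns TRUE; otherwise, it returns FALSE.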

# Character Vector
ipl <- c("CSK7", "RCB.18", "GT", "DC", "KKR")

# Searching for a literal "."
matches <- grepl(".", ipl, fixed = TRUE)

print(matches)

# Output: [1] FALSE TRUE FALSE FALSE FALSE

Since only “RCB.18” contains a literal “.”, only the second value is TRUE; all the others are FALSE.
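To see why fixed = TRUE matters here, the sketch below runs the same search three ways on the same vector. Without fixed = TRUE, “.” is a regex metacharacter that matches any single character, so every non-empty element matches. Escaping the dot is an alternative to fixed = TRUE.

```r
ipl <- c("CSK7", "RCB.18", "GT", "DC", "KKR")

# As a regex, "." matches any single character
as_regex <- grepl(".", ipl)
print(as_regex)
# Output: [1] TRUE TRUE TRUE TRUE TRUE

# Escaping the dot makes the regex match a literal "."
escaped <- grepl("\\.", ipl)
print(escaped)
# Output: [1] FALSE  TRUE FALSE FALSE FALSE

# fixed = TRUE treats the whole pattern as a literal string
as_fixed <- grepl(".", ipl, fixed = TRUE)
print(as_fixed)
# Output: [1] FALSE  TRUE FALSE FALSE FALSE
```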

Filtering Data Frames

Let’s use the built-in mtcars dataset and select the rows whose names start with the letter “M”. The “^” anchor restricts the match to the beginning of each row name.

mtcars[grepl("^M", rownames(mtcars)), ]
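The same idea works on an ordinary column instead of row names. The sketch below builds a small data frame for illustration (the teams data frame and its column names are assumptions, not part of mtcars) and contrasts an anchored pattern with an unanchored one.

```r
# Illustrative data frame; the name and columns are made up for this example
teams <- data.frame(
  team = c("CSK", "RCB", "GT", "DC", "KKR"),
  city = c("Chennai", "Bengaluru", "Ahmedabad", "Delhi", "Kolkata"),
  stringsAsFactors = FALSE
)

# Rows whose team code starts with "C"
starts_with_c <- teams[grepl("^C", teams$team), ]
print(starts_with_c$team)
# Output: [1] "CSK"

# Rows whose team code contains "C" anywhere
contains_c <- teams[grepl("C", teams$team), ]
print(contains_c$team)
# Output: [1] "CSK" "RCB" "DC"
```

The trailing comma inside the square brackets keeps all columns while the logical vector selects the rows.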

grepl() in RStudio

The image below shows the help page for the grepl() function in RStudio. You can open the same page by running ?grepl in the console.

Functions such as grep(), grepl(), regexpr(), gregexpr(), and regexec() search for matches to the pattern argument within each element of a character vector.
