The grepl() function (short for “grep logical”) in R searches for a pattern within each element of a character vector. It returns a logical vector of the same length as the input: TRUE where the pattern matches that element, and FALSE otherwise.
The above figure shows that grepl() returns TRUE only for “GT”, the one element that matches the pattern, and FALSE for all the others.
It’s a variation of the grep() function: grep() returns the indices of the matching elements, whereas grepl() returns a logical vector. You can use grepl() to filter datasets, match patterns, or process text.
grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
Name | Description |
pattern | It is a character string containing a regular expression, or a fixed string if you set fixed = TRUE. |
x | It is the character vector in which to search for the pattern. |
ignore.case | It is a logical flag. If set to TRUE, the matching is case-insensitive. The default, FALSE, is case-sensitive. |
perl | It is a logical flag. If set to TRUE, Perl-compatible regular expressions (PCRE) are used. |
fixed | It is a logical flag. If set to TRUE, the pattern is interpreted as a fixed string, not a regex. |
useBytes | It is a logical flag. If set to TRUE, the matching is done byte-by-byte rather than character-by-character. |
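To make the perl flag from the table a little more concrete, here is a minimal sketch that uses a lookahead, a construct that needs perl = TRUE. The vector teams and the variable name are made up for this illustration.

```r
# Hypothetical sample vector, used only for this illustration
teams <- c("CSK7", "RCB18", "GT")

# (?=\\d) is a lookahead: match leading capital letters only when
# they are followed by a digit. Lookaheads are a PCRE feature,
# so perl = TRUE is required for this pattern.
letters_then_digit <- grepl("^[A-Z]+(?=\\d)", teams, perl = TRUE)
print(letters_then_digit)
# Output: [1] TRUE TRUE FALSE
```

"GT" has no digit after its letters, so the lookahead fails and the result for that element is FALSE.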
Let’s initialize a character vector and check for the specific pattern (string) for each element in that vector.
# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")
# Search for the pattern "GT"
matches <- grepl("GT", ipl)
print(matches)
# Output: [1] FALSE FALSE TRUE FALSE FALSE
Since “GT” is at index 3, the third element of the output is TRUE and all the others are FALSE.
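Because grepl() is a variation of grep(), it can help to see the two side by side. The sketch below reuses the same ipl vector; the variable names idx and vals are only illustrative.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# grepl() returns one TRUE/FALSE per element
is_match <- grepl("GT", ipl)

# grep() returns the positions of the matching elements instead
idx <- grep("GT", ipl)
print(idx)
# Output: [1] 3

# which() on the grepl() result gives the same positions
print(identical(which(is_match), idx))
# Output: [1] TRUE

# grep() with value = TRUE returns the matching elements themselves
vals <- grep("GT", ipl, value = TRUE)
print(vals)
# Output: [1] "GT"
```

As a rough rule of thumb, grepl() is convenient when you need a logical mask (for example, for subsetting), and grep() when you need positions or values.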
Let’s check for just “C” in the ipl vector:
# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")
# Search for the pattern "C"
matches <- grepl("C", ipl)
print(matches)
# Output: [1] TRUE TRUE FALSE TRUE FALSE
Since “C” appears in the first, second, and fourth elements, those positions are TRUE and the others are FALSE.
In the above example, we searched for the capital letter “C”. Let’s search for the lowercase “c” instead and see the output:
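The logical vector from the example above can be used directly as a mask for filtering. This is a small sketch; the name has_c is just an illustrative choice.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Logical mask: TRUE where the element contains "C"
has_c <- grepl("C", ipl)

# Keep only the matching elements
print(ipl[has_c])
# Output: [1] "CSK" "RCB" "DC"

# Negate the mask to keep the non-matching elements
print(ipl[!has_c])
# Output: [1] "GT" "KKR"

# TRUE counts as 1, so sum() gives the number of matches
print(sum(has_c))
# Output: [1] 3
```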
# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")
# Searching for the pattern "c"
matches <- grepl("c", ipl)
print(matches)
# Output: [1] FALSE FALSE FALSE FALSE FALSE
We did not find a match. Why? By default, grepl() is case-sensitive: the vector contains the capital “C” but no lowercase “c”, so none of the elements match.
The solution is to set the ignore.case = TRUE:
# Character Vector
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")
# Searching for the pattern "c"
matches <- grepl("c", ipl, ignore.case = TRUE)
print(matches)
# Output: [1] TRUE TRUE FALSE TRUE FALSE
Now it matches because we explicitly told grepl() to ignore case while matching.
For pattern matching, you can use regular expressions. The only requirement is that you know the rules of regex well enough to write a pattern, which grepl() then tests against each element of the character vector.
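Setting ignore.case = TRUE is the most direct option. For comparison, here are two other ways to get a case-insensitive match, sketched on the same vector; the variable names are illustrative only.

```r
ipl <- c("CSK", "RCB", "GT", "DC", "KKR")

# Option 1: lowercase the input first, then match a lowercase pattern
via_tolower <- grepl("c", tolower(ipl))
print(via_tolower)
# Output: [1] TRUE TRUE FALSE TRUE FALSE

# Option 2: the inline (?i) modifier with Perl-compatible regex
via_inline <- grepl("(?i)c", ipl, perl = TRUE)
print(via_inline)
# Output: [1] TRUE TRUE FALSE TRUE FALSE

# Both agree with ignore.case = TRUE
print(identical(via_tolower, grepl("c", ipl, ignore.case = TRUE)))
# Output: [1] TRUE
```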
# Character Vector
ipl <- c("CSK7", "RCB18", "GT", "DC", "KKR")
# Searching for any digit with the pattern "\\d"
matches <- grepl("\\d", ipl)
print(matches)
# Output: [1] TRUE TRUE FALSE FALSE FALSE
The pattern “\\d” matches any element that contains a digit. If grepl() finds one, it returns TRUE for that element; otherwise, it returns FALSE.
Instead of passing a regex, we will pass a fixed string as the pattern by setting fixed = TRUE. If an element contains that exact string, grepl() returns TRUE; otherwise, it returns FALSE.
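A few more regex building blocks can be tried on the same vector. The patterns below are only a sketch of common cases (anchors, character classes, repetition, and alternation), and the variable names are illustrative.

```r
ipl <- c("CSK7", "RCB18", "GT", "DC", "KKR")

# ^ and $ anchor the match to the whole string: letters only
letters_only <- grepl("^[[:alpha:]]+$", ipl)
print(letters_only)
# Output: [1] FALSE FALSE TRUE TRUE TRUE

# {2} requires two digits in a row
two_digits <- grepl("[0-9]{2}", ipl)
print(two_digits)
# Output: [1] FALSE TRUE FALSE FALSE FALSE

# | is alternation: starts with "C" or starts with "D"
starts_c_or_d <- grepl("^(C|D)", ipl)
print(starts_c_or_d)
# Output: [1] TRUE FALSE FALSE TRUE FALSE
```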
# Character Vector
ipl <- c("CSK7", "RCB.18", "GT", "DC", "KKR")
# Searching for a literal "." as a fixed string
matches <- grepl(".", ipl, fixed = TRUE)
print(matches)
# Output: [1] FALSE TRUE FALSE FALSE FALSE
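Since only “RCB.18” contains a literal “.”, grepl() returns TRUE for that element and FALSE for the others.
Let’s use the built-in dataset mtcars and find the records whose row names start with the letter “M”. The ^ anchor in the pattern matches the start of the string.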
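To see why fixed = TRUE matters here, compare it with the same pattern treated as a regex, where “.” means “any single character”. This is a short sketch with illustrative variable names.

```r
ipl <- c("CSK7", "RCB.18", "GT", "DC", "KKR")

# As a regex, "." matches any character, so every element matches
as_regex <- grepl(".", ipl)
print(as_regex)
# Output: [1] TRUE TRUE TRUE TRUE TRUE

# Escaping the dot makes the regex look for a literal "."
escaped <- grepl("\\.", ipl)
print(escaped)
# Output: [1] FALSE TRUE FALSE FALSE FALSE

# fixed = TRUE gives the same result without any escaping
literal <- grepl(".", ipl, fixed = TRUE)
print(identical(escaped, literal))
# Output: [1] TRUE
```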
mtcars[grepl("^M", rownames(mtcars)), ]
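The ^ anchor makes a difference in this dataset. The sketch below compares “starts with M” against “contains M anywhere” on the row names of mtcars; the variable names are illustrative.

```r
# Row names that start with a capital "M"
starts_with_m <- grepl("^M", rownames(mtcars))
print(sum(starts_with_m))
# Output: [1] 10

# Row names that contain a capital "M" anywhere
contains_m <- grepl("M", rownames(mtcars))
print(sum(contains_m))
# Output: [1] 11

# The one row that contains "M" but does not start with it
print(rownames(mtcars)[contains_m & !starts_with_m])
# Output: [1] "AMC Javelin"

# The mask can be combined with column selection when filtering
m_cars <- mtcars[starts_with_m, c("mpg", "cyl")]
print(nrow(m_cars))
# Output: [1] 10
```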
You will find all the information about the grepl() function in the image below in RStudio.
Functions such as grep(), grepl(), regexpr(), gregexpr(), and regexec() search for matches to the pattern argument within each element of a character vector.
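As a rough illustration of how one related function differs from grepl(), the sketch below uses regexpr() to find where the first match starts and regmatches() to extract it. The sample vector and variable names are made up for this example.

```r
# Hypothetical sample vector, used only for this illustration
teams <- c("CSK7", "RCB18", "GT")

# grepl() only says whether each element matches
print(grepl("\\d+", teams))
# Output: [1] TRUE TRUE FALSE

# regexpr() gives the start position of the first match (-1 if none)
pos <- regexpr("\\d+", teams)
print(as.integer(pos))
# Output: [1] 4 4 -1

# The length of each match is stored in the "match.length" attribute
print(attr(pos, "match.length"))
# Output: [1] 1 2 -1

# regmatches() extracts the matched text, skipping non-matching elements
digits <- regmatches(teams, pos)
print(digits)
# Output: [1] "7" "18"
```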
Krunal Lathiya is a seasoned Computer Science expert with over eight years in the tech industry. He boasts deep knowledge in Data Science and Machine Learning. Versed in Python, JavaScript, PHP, R, and Golang. Skilled in frameworks like Angular and React and platforms such as Node.js. His expertise spans both front-end and back-end development. His proficiency in the Python language stands as a testament to his versatility and commitment to the craft.