grep in R: How to Use grep() Function in R

Suppose we want to identify the records for all the victims of floods in China. How could we do that? One way is to use grep() to match a literal string against a character vector of victim records. Let’s see how to use the grep() function in R and how it differs from the grepl() function.

grep in R

grep() is a built-in R function that searches for matches to the pattern argument within each element of a character vector.

Syntax

grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)

Arguments

pattern: It takes a character string containing a regular expression (or a literal string when fixed = TRUE) to be matched in the given character vector.

x: It is a character vector where matches are sought, or an object which can be coerced by as.character to a character vector.

ignore.case: If FALSE, the pattern matching is case sensitive, and if TRUE, the case is ignored during matching.

perl: It is a logical argument. If it is TRUE, Perl-compatible regular expressions are used.

value: If it is FALSE, a vector containing the (integer) indices of the matches determined by grep() is returned, and if TRUE, a vector containing the matching elements themselves is returned.

fixed: It is a logical argument. If it is TRUE, the pattern is a string to be matched as is. Overrides all conflicting arguments.

useBytes: It is a logical argument, and if it is TRUE, the matching is done byte-by-byte rather than character-by-character.

invert: It is a logical argument. If it is TRUE, grep() returns indices or values for the elements that do not match.
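As a rough illustration of how these arguments change the result, here is a minimal sketch using a small vector of company names. The dotted vector is a made-up input, introduced only to show the effect of fixed = TRUE.

```r
# Sample vectors; `dotted` is a hypothetical input used only to illustrate fixed = TRUE
data <- c("Newgen", "Happiest Minds", "Tata Elxsi", "LTTS")
dotted <- c("a.b", "axb")

# ignore.case = TRUE: "t" also matches the uppercase "T" in "Tata Elxsi" and "LTTS"
idx_any_case <- grep("t", data, ignore.case = TRUE)

# value = TRUE: return the matching strings instead of their indices
matched_values <- grep("a", data, value = TRUE)

# invert = TRUE: return the indices of the elements that do NOT contain "a"
idx_without_a <- grep("a", data, invert = TRUE)

# As a regular expression, "." matches any character, so both elements match
idx_regex <- grep("a.b", dotted)

# fixed = TRUE: "." is treated as a literal dot, so only "a.b" matches
idx_fixed <- grep("a.b", dotted, fixed = TRUE)

print(idx_any_case)
print(matched_values)
print(idx_without_a)
print(idx_regex)
print(idx_fixed)
```

Here idx_any_case holds 2 3 4, matched_values holds the two strings containing “a”, idx_without_a holds 1 4, idx_regex holds 1 2, and idx_fixed holds only 1.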

Return Value

By default, the grep() function returns an integer vector of the indices of the elements of x that yielded a match (or did not, for invert = TRUE). With value = TRUE, it returns the matching elements themselves.
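One consequence of returning indices may be worth a quick sketch: when nothing matches, the result is an empty integer vector rather than NULL or NA. The pattern “z” is used here only because it does not occur in the sample vector.

```r
data <- c("Newgen", "Happiest Minds", "Tata Elxsi", "LTTS")

hits <- grep("a", data)
no_hits <- grep("z", data)

print(class(hits))      # the indices are stored as integers
print(length(no_hits))  # an empty result has length zero

# Pitfall: negative indexing with an empty result drops every element
dropped_all <- data[-no_hits]

# invert = TRUE with value = TRUE keeps all elements when nothing matches
kept_all <- grep("z", data, invert = TRUE, value = TRUE)

print(dropped_all)
print(kept_all)
```

If the goal is to exclude matching elements, invert = TRUE is therefore a safer choice than a negative index built from the result of grep().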

Example

Let’s search for the character “a” in a character vector.

data <- c("Newgen", "Happiest Minds", "Tata Elxsi", "LTTS")

print(grep("a", data))

Output

[1] 2 3

In this example, grep() searches each element of the vector data for the character “a”. It finds a match in “Happiest Minds” and “Tata Elxsi”, the second and third elements, so it returns the indices 2 and 3.
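The returned indices are mostly useful for subsetting, for instance to pull matching records out of a table. The sketch below assumes a hypothetical data frame named companies whose id column is made up for illustration.

```r
# Hypothetical data frame; the id column is made up for illustration
companies <- data.frame(
  id = 1:4,
  name = c("Newgen", "Happiest Minds", "Tata Elxsi", "LTTS"),
  stringsAsFactors = FALSE
)

# Indices of the rows whose name contains "a"
rows <- grep("a", companies$name)

# Use the indices to keep only the matching rows
matched <- companies[rows, ]

print(matched)
```

The same approach would apply to the flood records mentioned at the start: search the relevant column with grep() and use the result as a row index.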

grep() vs. grepl() functions in R

Both grep() and grepl() are built-in functions that search for matches of a regular expression/pattern in a character vector. They differ in what they return. grep() returns the indices of the elements that contain a match, or the matching strings themselves when value = TRUE. grepl() returns a TRUE/FALSE vector, with one entry per element, indicating which elements of the character vector contain a match.

data <- c("Newgen", "Happiest Minds", "Tata Elxsi", "LTTS")

print(grep("a", data))

print(grepl("a", data))

Output

[1] 2 3

[1] FALSE  TRUE  TRUE FALSE

You can see that the grepl() function returns logical (TRUE/FALSE) values instead of the indices that the grep() function returns.
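Because grepl() returns one logical value per element, it tends to be convenient for counting and filtering. A minimal sketch, reusing the same sample vector:

```r
data <- c("Newgen", "Happiest Minds", "Tata Elxsi", "LTTS")

has_a <- grepl("a", data)

# Count the matching elements: TRUE counts as 1 when summed
n_matches <- sum(has_a)

# Keep the elements that do not match by negating the logical vector
without_a <- data[!has_a]

# which() converts the logical vector into the indices grep() would return
same_as_grep <- identical(which(has_a), grep("a", data))

print(n_matches)
print(without_a)
print(same_as_grep)
```

Negating the logical vector also behaves sensibly when nothing matches, since every element is then kept.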

Apply grep() function with Multiple Patterns

The grep() function can also check for multiple character patterns at once. Separate the alternatives with the | (or) operator inside the pattern.

data <- c("Newgen", "Happiest Minds", "Tata Elxsi", "LTTS")

print(grep("a|t", data))

Output

[1] 2 3

In this example, we search for “a” or “t”, and grep() finds two elements that contain at least one of them. “LTTS” is not matched because matching is case sensitive by default and it contains only an uppercase “T”.
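When the alternatives are stored in a vector, one possible approach is to join them with paste() before calling grep(). The patterns vector below is a hypothetical helper, and this sketch assumes the alternatives contain no regular expression metacharacters.

```r
data <- c("Newgen", "Happiest Minds", "Tata Elxsi", "LTTS")

# Hypothetical helper vector holding the alternatives to search for
patterns <- c("a", "t")

# Join the alternatives with "|" to build the pattern "a|t"
combined <- paste(patterns, collapse = "|")

# Case-sensitive search, as in the example above
idx_default <- grep(combined, data)

# ignore.case = TRUE also picks up the uppercase "T" in "LTTS"
idx_any_case <- grep(combined, data, ignore.case = TRUE)

print(combined)
print(idx_default)
print(idx_any_case)
```

With ignore.case = TRUE the result grows from 2 3 to 2 3 4, because “LTTS” now matches as well.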

That’s it for the grep() function in R.
