R grep(): Finding the Position of a Matched Pattern

The grep() function in R searches for matches to a pattern within each element of a character vector. It returns the indices (or, optionally, the values) of the elements that match. It belongs to a family of pattern-matching functions that also includes grepl(), regexpr(), gregexpr(), sub(), and gsub().
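For a quick orientation, the sketch below runs grep() and some of its relatives on the same toy vector. The vector x is an illustrative value. grep() tells you which elements match, grepl() gives one TRUE/FALSE per element, regexpr() reports where inside each string the first match starts (-1 means no match), and sub()/gsub() replace the first or every match.

```r
x <- c("apple", "banana")

# Which elements contain "an"? -> index 2
grep("an", x)

# One TRUE/FALSE per element -> FALSE TRUE
grepl("an", x)

# Start position of the first match inside each string (-1 = no match)
as.integer(regexpr("an", x))

# Replace the first match vs. every match
sub("an", "AN", x)   # "apple" "bANana"
gsub("an", "AN", x)  # "apple" "bANANa"
```

In other words, grep() finds the position of a match within the vector, while regexpr() and gregexpr() find positions within each string.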

Syntax

grep(
  pattern,
  x,
  ignore.case = FALSE,
  perl = FALSE,
  value = FALSE,
  fixed = FALSE,
  useBytes = FALSE,
  invert = FALSE
)

Parameters

Name          Description
pattern       The regular expression to search for (or, with fixed = TRUE, a literal string).
x             The character vector to search in.
ignore.case   If TRUE, matching is case-insensitive. Default: FALSE.
perl          If TRUE, uses Perl-compatible regular expressions (PCRE), which support advanced features such as lookahead. Default: FALSE.
value         If TRUE, returns the matching elements instead of their indices. Default: FALSE.
fixed         If TRUE, pattern is matched as a literal string, not as a regular expression. Default: FALSE.
useBytes      If TRUE, matching is done byte by byte rather than character by character. Default: FALSE.
invert        If TRUE, returns the elements (or indices) that do NOT match. Default: FALSE.

Return Value

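By default, grep() returns an integer vector with the indices of the matching elements. With value = TRUE, it instead returns a character vector containing the matching elements of x themselves. If nothing matches, the result is an empty vector: integer(0) for indices, character(0) for values.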
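A small sketch of the different return types, using the same four-element vector as the examples below. The pattern "fli" only occurs in "Netflix", and "Google" (an illustrative value) occurs nowhere.

```r
rv <- c("Amazon", "Apple", "Netflix", "Spotify")

idx   <- grep("fli", rv)                   # integer index of the match
vals  <- grep("fli", rv, value = TRUE)     # the matching element itself
none  <- grep("Google", rv)                # no match: empty integer vector
none2 <- grep("Google", rv, value = TRUE)  # no match: empty character vector

print(idx)    # [1] 3
print(vals)   # [1] "Netflix"
print(none)   # integer(0)
print(none2)  # character(0)
```

Checking length() of the result is a simple way to test whether anything matched at all.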

Basic usage

Figure of using grep() function in R

vec <- c("Amazon", "Apple", "Netflix", "Spotify")

print(grep("i", vec))

Output

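[1] 3 4

Only “Netflix” (element 3) and “Spotify” (element 4) contain a lowercase “i”, so grep() returns their indices.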
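grep() is closely related to grepl(), which returns one logical value per element. The sketch below shows that wrapping grepl() in which() yields the same indices as grep() for a vector without NA values.

```r
vec <- c("Amazon", "Apple", "Netflix", "Spotify")

# grepl() gives TRUE/FALSE per element; which() converts TRUE positions to indices
which(grepl("i", vec))                              # [1] 3 4

# Same result as calling grep() directly
identical(grep("i", vec), which(grepl("i", vec)))   # [1] TRUE
```

grepl() is usually the better choice inside if() conditions or for filtering data frame rows, while grep() is handy when you need the positions.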

Case Insensitivity

Figure of passing ignore.case argument to the grep() function

In the above figure, we search the input character vector for the character “a”. By default, grep() is case-sensitive, so only “Amazon” (which contains a lowercase “a”) would match. Passing ignore.case = TRUE makes grep() match both “a” and “A”, so “Apple” matches as well, and grep() returns the indices of both elements.

rv <- c("Amazon", "Apple", "Netflix", "Spotify")

print(grep("a", rv, ignore.case = TRUE))

# Output: [1] 1 2
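To make the effect of ignore.case visible, here is the same search with and without it, plus a character-class alternative that handles both cases inside the pattern itself.

```r
rv <- c("Amazon", "Apple", "Netflix", "Spotify")

# Default: case-sensitive, so only "Amazon" contains a lowercase "a"
grep("a", rv)                       # [1] 1

# ignore.case = TRUE also accepts an uppercase "A"
grep("a", rv, ignore.case = TRUE)   # [1] 1 2

# A character class gives the same result without ignore.case
grep("[aA]", rv)                    # [1] 1 2
```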

Passing multiple patterns

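grep() takes a single pattern, but that pattern can use the regular-expression alternation operator | to match any of several alternatives. The call below returns the indices of the elements that contain “o” or “i”. (If you pass a vector of several patterns, only the first element is used, with a warning.)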

Figure of passing multiple patterns to the grep() function in R

rv <- c("Amazon", "Apple", "Netflix", "Spotify")

print(grep("o|i", rv, ignore.case = TRUE))

# Output: [1] 1 3 4
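When the search terms already live in a vector, a common approach is to join them into one alternation pattern with paste(). This is a minimal sketch; the terms "zon" and "fli" are illustrative, and terms that contain regex metacharacters such as "." or "+" would need escaping first.

```r
rv <- c("Amazon", "Apple", "Netflix", "Spotify")

# Several search terms (illustrative values)
terms <- c("zon", "fli")

# Join them into a single alternation pattern: "zon|fli"
pattern <- paste(terms, collapse = "|")

grep(pattern, rv)   # [1] 1 3
```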

Returning Matching Elements

If you set the argument value=TRUE, it will return the actual matched elements themselves instead of their indices.

Figure of Passing value = TRUE argument to the grep() function

rv <- c("Amazon", "Apple", "Netflix", "Spotify")

print(grep("o|i", rv, ignore.case = TRUE, value = TRUE))

# Output: [1] "Amazon" "Netflix" "Spotify"
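Another way to read value = TRUE: it gives the same elements you would get by subsetting x with the indices that grep() returns. The sketch below checks that equivalence.

```r
rv <- c("Amazon", "Apple", "Netflix", "Spotify")

idx <- grep("o|i", rv, ignore.case = TRUE)

# Subsetting with the indices matches the value = TRUE result
identical(rv[idx], grep("o|i", rv, ignore.case = TRUE, value = TRUE))  # [1] TRUE
```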

Fixed Strings

To match the pattern as a literal string rather than a regular expression, use fixed = TRUE. Regular-expression metacharacters such as ., | and ^ then lose their special meaning:

rv <- c("Amazon", "Apple", "Netflix", "Spotify")

print(grep("A", rv, fixed = TRUE))

# Output: [1] 1 2
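Since "A" contains no metacharacters, the example above gives the same result with or without fixed = TRUE. The difference shows up with a pattern like ".csv", where the dot means "any character" in a regular expression. The file names below are illustrative.

```r
files <- c("data.csv", "datacsv", "report.txt")

# As a regex, "." matches any character, so "datacsv" matches too
grep(".csv", files)                 # [1] 1 2

# fixed = TRUE matches the literal text ".csv"
grep(".csv", files, fixed = TRUE)   # [1] 1

# Escaping the dot in a regex has the same effect
grep("\\.csv", files)               # [1] 1
```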

Perl-compatible regular expressions

Setting perl = TRUE switches grep() to the PCRE (Perl-compatible regular expressions) engine, which supports more advanced pattern features such as lookahead and lookbehind assertions.

IDs <- c("ID:219", "ID:4567", "ID:89")

# match exactly three digits
grep("^ID:\\d{3}$", IDs, perl = TRUE, value = TRUE)

# Output: [1] "ID:219"

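The ^ and $ anchors force the pattern to match the whole string, so only “ID:219”, which has exactly three digits after “ID:”, is returned. “ID:4567” (four digits) and “ID:89” (two digits) are rejected.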
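Lookahead is a good example of where perl = TRUE matters, since R's default regex engine does not support (?=...) assertions. The sketch below keeps only strings that contain at least one digit and at least one uppercase letter; the passwords vector is illustrative.

```r
passwords <- c("abc123", "Abc123", "ABCdef")

# Two lookaheads from the start of the string:
#   (?=.*\\d)    at least one digit somewhere
#   (?=.*[A-Z])  at least one uppercase letter somewhere
# Lookahead needs the PCRE engine, hence perl = TRUE.
grep("^(?=.*\\d)(?=.*[A-Z])", passwords, perl = TRUE, value = TRUE)
# [1] "Abc123"
```

Each lookahead checks a condition without consuming characters, which is why the two conditions can be combined in any order.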

Invert match (elements that don’t match)

Passing invert = TRUE reverses the selection: grep() returns the elements (or, without value = TRUE, their indices) that do NOT match the pattern.

rv <- c("Amazon", "Apple", "Netflix", "Spotify")

grep("A", rv, invert = TRUE, value = TRUE)

# Output: [1] "Netflix" "Spotify"
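The same selection can be written by negating grepl() and subsetting. The sketch below confirms that both approaches agree and also shows the index form of invert = TRUE.

```r
rv <- c("Amazon", "Apple", "Netflix", "Spotify")

# invert = TRUE keeps the elements that do NOT match "A"
kept <- grep("A", rv, invert = TRUE, value = TRUE)

# Equivalent: negate grepl() and subset
same <- rv[!grepl("A", rv)]

identical(kept, same)           # [1] TRUE

# Without value = TRUE, you get the non-matching indices
grep("A", rv, invert = TRUE)    # [1] 3 4
```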

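Searching file names returned by list.files()

Let’s say you want a list of only the CSV files in your current directory. You can pass the file names returned by list.files() to grep() with value = TRUE. The pattern \\.csv$ matches names that end in “.csv”; the escaped dot matches a literal period, and $ anchors the match to the end of the name.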

csv_files <- grep("\\.csv$", list.files(), value = TRUE)

print(csv_files)

# Output (depends on the files in your directory):
# [1] "data_types.csv" "data.csv" "input_domains.csv"
# [4] "missing_data.csv"
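Because that output depends on your own files, here is a self-contained sketch that creates a temporary folder with a few empty, illustrative files. It adds ignore.case = TRUE so that an uppercase ".CSV" extension is caught too, and it shows that list.files() can do the same filtering through its own pattern and ignore.case arguments.

```r
# Create a throwaway folder with a few empty files (illustrative names)
demo_dir <- tempfile("grep_demo")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("a.csv", "b.CSV", "notes.txt"))))

# Filter the names with grep(); ignore.case = TRUE also matches ".CSV"
csv_files <- grep("\\.csv$", list.files(demo_dir), ignore.case = TRUE, value = TRUE)
print(csv_files)   # [1] "a.csv" "b.CSV"

# list.files() can filter on its own via its pattern argument
list.files(demo_dir, pattern = "\\.csv$", ignore.case = TRUE)
```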

That’s it.
