R agrep() Function: How to Match Patterns in R String

The agrep() function in R uses TRE, a portable POSIX-compliant regex library that supports approximate (fuzzy) matching. If fixed is FALSE, the pattern is interpreted as a POSIX extended regular expression, as in grep() called with perl = FALSE. Let's look at the definition of the agrep() function and how to use it in R programming.

R agrep() Function

agrep() is a built-in R function that searches for approximate matches to a pattern within each element of a character vector. Matches are judged by the generalized Levenshtein edit distance: the minimal, possibly weighted, number of insertions, deletions, and substitutions needed to transform one string into another.
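To make the edit-distance idea concrete, here is a small sketch using adist() from the utils package (attached by default in R), which computes the generalized Levenshtein distance directly. The sample strings are illustrative only and are not part of the examples later in this article.

```r
# Whole-string distance: "lasy" becomes "lazy" with one substitution (s -> z).
d_full <- adist("lasy", "lazy")[1, 1]
print(d_full)  # 1

# Partial distance: the best match of the pattern against any substring of
# the target. This is the kind of distance agrep() compares with max.distance.
d_part <- adist("lasy", "1 lazy 2", partial = TRUE)[1, 1]
print(d_part)  # 1
```

Because agrep() works with the partial distance, the extra characters around "lazy" in the second call do not count against the match.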

Syntax

agrep(pattern, x, max.distance = 0.1, costs = NULL,
      ignore.case = FALSE, value = FALSE, fixed = TRUE,
      useBytes = FALSE)

Arguments

pattern: The pattern to match against each element of x.

x: A character vector in which matches are sought.

ignore.case: If TRUE, case is ignored during matching.

max.distance: Maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length times the maximal transformation cost (rounded up to the next integer), or as a list with possible components cost, insertions, deletions, substitutions, and all.

costs: A numeric vector or list with names partially matching insertions, deletions, and substitutions, giving the respective costs for computing the generalized Levenshtein distance, or NULL (the default), which uses unit cost for all three transformations.

value: If TRUE, agrep() returns a vector of the matching elements; otherwise, it returns a vector of their indices.

fixed: A logical argument. If TRUE (the default), the pattern is matched literally (as is). Otherwise, it is matched as a regular expression.

useBytes: A logical argument. In a multibyte locale, should the comparison be character-by-character (the default) or byte-by-byte?
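As a rough illustration of the max.distance argument, the sketch below runs the same pattern with different bounds. The vector fruits is made up for this illustration, and the results in the comments follow from counting edits by hand.

```r
fruits <- c("apple", "aple", "appple", "banana")

# max.distance = 0: no edits allowed, so the pattern must occur verbatim.
exact <- agrep("apple", fruits, max.distance = 0)
print(exact)     # 1

# max.distance = 1: one insertion, deletion, or substitution is allowed.
one_edit <- agrep("apple", fruits, max.distance = 1)
print(one_edit)  # 1 2 3

# List form: forbid substitutions, so "lazy" no longer matches "lasy".
no_sub <- agrep("lasy", c(" 1 lazy 2", "1 lasy 2"),
                max.distance = list(substitutions = 0))
print(no_sub)    # 2
```

An integer bound is usually easier to reason about than the default fraction, because the number of allowed edits does not change with the pattern length.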

Return Value

The agrep() function returns a vector giving the indices of the elements that yielded a match or, if value is TRUE, the matched elements themselves.
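The different return shapes can be sketched as follows. The vector words is a made-up example; agrepl() is the companion base R function that returns a logical vector instead of indices.

```r
words <- c("color", "colour", "collar")

idx <- agrep("colour", words)                 # indices of matching elements
print(idx)                                    # 1 2

vals <- agrep("colour", words, value = TRUE)  # the matching elements themselves
print(vals)                                   # "color" "colour"

none <- agrep("cherry", words)                # no match: a zero-length vector
print(length(none))                           # 0

flags <- agrepl("colour", words)              # one logical value per element
print(flags)                                  # TRUE TRUE FALSE
```

When nothing matches, the result has length zero rather than being NA or NULL, so checking length() is a simple way to test for a hit.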

Example

Let's define a vector with four elements arranged as two pairs. Within each pair, the two elements have the same value and differ only in case: one is uppercase and the other is lowercase.

To create a vector in R, use the c() function.

data <- c("R LANG", "r lang", "LOS", "los")

The first pair is R LANG and r lang. The second pair is LOS and los.

Now, we will apply the agrep() function to this vector with different arguments and see what it returns.

data <- c("R LANG", "r lang", "LOS", "los")

# Calling agrep() function 

agrep("R LANG", data)
agrep("LOS", data)

Output

[1] 1
[1] 3

The agrep() function compares the pattern with each element of the vector and returns the index of every element that matches. By default, matching is case-sensitive, and the default max.distance of 0.1 allows only one edit for patterns this short, so elements that differ from the pattern only in case do not match.

The result for R LANG is 1 because it matches the first element of the vector; likewise, LOS matches the third element. Remember that vector indices in R start at 1, not 0.

Passing ignore.case = TRUE

If we pass the optional argument ignore.case = TRUE, agrep() ignores case when matching and returns the index of every element whose value matches.

data <- c("R LANG", "r lang", "LOS", "los")

# Calling agrep() function 

agrep("r lang", data, ignore.case = TRUE)
agrep("los", data, ignore.case = TRUE)

Output

[1] 1 2
[1] 3 4

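Case is ignored in both calls, so each returns the indices of both elements of the matching pair.

Passing max.distance = 1 to the agrep() function

Let's pass max = 1 to the agrep() function and see the output. R's partial argument matching resolves max to max.distance, so this allows at most one edit. With case ignored, both R LANG and r lang still match.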

data <- c("R LANG", "r lang", "LOS", "los")

# Calling agrep() function 

agrep("r lang", data, ignore.case = TRUE, max = 1)

Output

[1] 1 2
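To see max.distance actually change a result, here is a small sketch that reuses the same vector with a deliberately misspelled pattern. The pattern "r lnag" is an invented typo: turning it into "r lang" takes two edits, so a bound of 1 is too strict.

```r
data <- c("R LANG", "r lang", "LOS", "los")

# Two letters are swapped in the pattern, which costs two edits to repair.
strict <- agrep("r lnag", data, ignore.case = TRUE, max.distance = 1)
print(length(strict))  # 0

# Allowing two edits recovers both spellings of the first pair.
loose <- agrep("r lnag", data, ignore.case = TRUE, max.distance = 2)
print(loose)           # 1 2
```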

Other Examples

Here are a few more examples that vary max.distance (abbreviated as max), value, and ignore.case.

agrep("sheldor", "1 sheldor 2")
agrep("sheldor", c(" 1 sheldon 2", "1 sheldor 2"), max = list(sub = 0))
agrep("sheldy", c("1 sheldon", "1", "1 SHELDON"), max = 2)
agrep("sheldy", c("1 sheldon", "1", "1 SHELDON"), max = 2, value = TRUE)
agrep("sheldy", c("1 sheldon", "1", "1 SHELDON"), max = 2, ignore.case = TRUE)

Output

[1] 1
[1] 1 2
[1] 1
[1] "1 sheldon"
[1] 1 3

That is it for the agrep() function in the R programming language.
