Require usage of direct methods for subsetting strings via regex
Source: R/regex_subset_linter.R
Using value = TRUE in grep() returns the subset of the input that matches the pattern, e.g. grep("[a-m]", letters, value = TRUE) will return the first 13 elements (a through m).
Details
letters[grep("[a-m]", letters)] and letters[grepl("[a-m]", letters)] both return the same thing, but more circuitously and more verbosely.
The stringr package also provides an even more readable alternative, namely str_subset(), which should be preferred to versions using str_detect() and str_which().
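As a quick check of the equivalence described above, the following sketch compares the three base R forms on letters. The variable names (via_value, via_index, via_mask) are illustrative only, and the stringr comparison is guarded so it runs only when that package happens to be installed.

```r
# Three ways to keep the elements of `letters` that match "[a-m]".
via_value <- grep("[a-m]", letters, value = TRUE)  # direct form preferred by the linter
via_index <- letters[grep("[a-m]", letters)]       # integer indices, then subset
via_mask  <- letters[grepl("[a-m]", letters)]      # logical mask, then subset

# All three yield the same character vector of 13 letters.
stopifnot(
  identical(via_value, via_index),
  identical(via_value, via_mask),
  length(via_value) == 13L
)

# The stringr alternative, compared only if stringr is available.
if (requireNamespace("stringr", quietly = TRUE)) {
  stopifnot(identical(stringr::str_subset(letters, "[a-m]"), via_value))
}
```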
Exceptions
Note that x[grep(pattern, x)] and grep(pattern, x, value = TRUE) are not completely interchangeable when x is not character (most commonly, when x is a factor), because the output of the latter will be a character vector while the former remains a factor.
It may still be preferable to refactor such code, as it may be faster to match the pattern on levels(x) and use that to subset instead.
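The factor case can be illustrated with a small sketch. The data, the pattern, and the names by_index, by_value and by_levels are made up for illustration, and the levels-based subset is only one possible way to write the refactoring suggested above.

```r
# A small factor with repeated values (illustrative data).
x <- factor(c("apple", "banana", "cherry", "apple"))
pattern <- "a"

by_index <- x[grep(pattern, x)]             # stays a factor
by_value <- grep(pattern, x, value = TRUE)  # becomes a character vector

stopifnot(is.factor(by_index), is.character(by_value))
# Same elements, different types.
stopifnot(identical(as.character(by_index), unname(by_value)))

# Matching on the (usually much shorter) set of levels, then subsetting.
matching_levels <- grep(pattern, levels(x), value = TRUE)
by_levels <- x[x %in% matching_levels]

# This keeps the factor type and selects the same elements as x[grep(pattern, x)].
stopifnot(identical(by_levels, by_index))
```

Whether matching on levels(x) is actually faster depends on how many distinct levels there are relative to the length of x, so it is worth measuring on real data.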
See also
linters for a complete list of linters available in lintr.
Examples
# will produce lints
lint(
text = "x[grep(pattern, x)]",
linters = regex_subset_linter()
)
#> ::warning file=<text>,line=1,col=3::file=<text>,line=1,col=3,[regex_subset_linter] Prefer grep(pattern, x, ..., value = TRUE) over x[grep(pattern, x, ...)] and x[grepl(pattern, x, ...)].
lint(
text = "x[stringr::str_which(x, pattern)]",
linters = regex_subset_linter()
)
#> ::warning file=<text>,line=1,col=3::file=<text>,line=1,col=3,[regex_subset_linter] Prefer stringr::str_subset(x, pattern) over x[str_detect(x, pattern)] and x[str_which(x, pattern)].
# okay
lint(
text = "grep(pattern, x, value = TRUE)",
linters = regex_subset_linter()
)
lint(
text = "stringr::str_subset(x, pattern)",
linters = regex_subset_linter()
)