Creating new linters
lintr maintainers
2024-03-25
Source:vignettes/creating_linters.Rmd
creating_linters.Rmd
This vignette describes the steps necessary to create a new linter.
See the last section for some details specific to writing new linters for lintr.
A good example of a simple linter is the
pipe_call_linter
.
#' Pipe call linter
#'
#' Force explicit calls in magrittr pipes, e.g.,
#' `1:3 %>% sum()` instead of `1:3 %>% sum`.
#'
#' @evalRd rd_tags("pipe_call_linter")
#' @seealso [linters] for a complete list of linters available in lintr.
#' @export
pipe_call_linter <- function() {
xpath <- "//expr[preceding-sibling::SPECIAL[text() = '%>%'] and *[1][self::SYMBOL]]"
Linter(function(source_expression) {
if (!is_lint_level(source_expression, "expression")) {
return(list())
}
xml <- source_expression$xml_parsed_content
bad_expr <- xml2::xml_find_all(xml, xpath)
xml_nodes_to_lints(
bad_expr,
source_expression = source_expression,
lint_message = "Use explicit calls in magrittr pipes, i.e., `a %>% foo` should be `a %>% foo()`.",
type = "warning"
)
})
}
Let’s walk through the parts of the linter individually.
Writing the linter
#' Pipe call linter
#'
#' Force explicit calls in magrittr pipes, e.g.,
#' `1:3 %>% sum()` instead of `1:3 %>% sum`.
Describe the linter, giving it a title and briefly covering the usages that are discouraged when the linter is active.
#' @evalRd rd_tags("pipe_call_linter")
#' @seealso [linters] for a complete list of linters available in lintr.
#' @export
These lines (1) generate a Tags section in the documentation for the
linter1;
(2) link to the full table of available linters; and (3) mark the
function for export. The most unfamiliar here is probably (1), which can
be skipped outside of lintr
itself.
Next, we define the name of the new linter. The convention is to
suffix all linter names with _linter
. All
_linter
functions are function factories that return a
closure that will do the actual linting function. We could define
additional parameters that are useful for the linter in this function
declaration (see, e.g. assignment_linter
), but
pipe_call_linter
requires no additional arguments.
xpath <- "//expr[preceding-sibling::SPECIAL[text() = '%>%'] and *[1][self::SYMBOL]]"
Here is the core linter logic. xpath
is an XPath
expression for expressions matching the discouraged usage.
xpath
is saved inside the factory code (as opposed to
inside the linter itself) for efficiency. Often, the xpath
will be somewhat complicated / involve some assembly code using
paste()
or glue::glue()
[^See
infix_spaces_linter()
for an example of this], in which
case it is preferable to execute this code only once when creating the
linter; the cached XPath is then re-used on each expression in each file
where the linter is run.
Let’s examine the XPath a bit more closely:
//expr # global search (//) for 'expr' nodes (R expressions), at any nesting level
[ # node[...] looks for any 'node' satisfying conditions in ...
preceding-sibling:: # "siblings" are at the same nesting level in XML
SPECIAL[ # 'SPECIAL' is the parse token for infix operators like %% or %+%
text() = '%>%' # text() returns the string associated with this node
] #
and # combine conditions with 'and'
* # match any node
[1] # match the first such node
[self::SYMBOL] # match if the current node is a 'SYMBOL' (i.e., a 'name' in R)
] #
Taken together, that means we want to match expr
nodes
preceded by the %>%
infix operator whose first child
node is a name
. That maps pretty closely to the description
of what the pipe_call_linter
is looking for, but there is
subtlety in mapping between the R code you’re used to and how they show
up in the XML representation. expr
nodes in particular take
some practice to get accustomed to – use the plentiful XPath-based
linters in lintr
as a guide to get extra practice2. Note:
xml2
implements XPath 1.0, which lacks some helpful
features available in XPath 2.0.
This is the closure. It will be called on the
source_expression
variable that contains the top level
expressions in the file to be linted. The call to Linter()
gives this closure the class ‘linter’ (it also guesses the name of the
linter; see ?Linter
for more details).
The raw text of the expression is available from
source_file$content
. However, it is not generally possible
to implement linters from the raw text – consider
equals_na_linter
. If we just look for == NA
in
the text of the file, we’ll generate many false positives, e.g. in
comments (such as
# note: is.na() is the proper way to check == NA
) or inside
character literals (such as
warning("don't use == NA to check missingness")
). We’re
also likely to generate false negatives, for example when
==
and NA
appear on different lines! Working
around these issues using only the un-parsed text in every situation
amounts to re-implementing the parser.
Therefore it is recommended to work with the tokens from
source_file$parsed_content
or
source_file$xml_parsed_content
, as they are tokenized from
the R
parser. These tokens are obtained from
parse()
and utils::getParseData()
calls done
prior to calling the new linter. getParseData()
returns a
data.frame
with information from the source parse tree of
the file being linted. A list of tokens is available from r-source/src/main/gram.y.
source_file$xml_parsed_content
uses
xmlparsedata::xml_parse_data()
to convert the
getParseData()
output into an XML tree, which enables
writing linter logic in XPath, a
powerful language for expressing paths within the nested XML data
structure. Most linters in lintr
are built using XPath
because it is a powerful language for computation on the abstract syntax
tree / parse tree.
if (!is_lint_level(source_expression, "expression")) {
return(list())
}
Here, we return early if source_expression
is not the
expression-level object. get_source_expression()
returns an
object that parses the input file in two ways – once is done
expression-by-expression, the other contains all of the expressions in
the file. This is done to facilitate caching. Suppose your package has a
long source file (e.g., 100s of expressions) – rather than run linters
on every expression every time the file is updated, when caching is
activated lintr
will only run the linter again on
expressions that have changed.
Therefore, it is preferable to write expression-level linters
whenever possible. Two types of exceptions observed in
lintr
are (1) when several or all expressions are
required to ensure the linter logic applies (e.g.,
conjunct_test_linter
looks for consecutive calls to
stopifnot()
, which will typically appear on consecutive
expressions) or (2) when the linter logic is very simple & fast to
compute, so that the overhead of re-running the linter is low (e.g.,
single_quotes_linter
). In those cases, use
is_lint_level(source_expression, "file")
.
xml <- source_expression$xml_parsed_content
bad_expr <- xml2::xml_find_all(xml, xpath)
source_expression$xml_parsed_content
is copied to a
local variable (this is not strictly necessary but facilitates
debugging). Then xml2::xml_find_all()
is used to execute
the XPath on this particular expression. Keep in mind that it is
typically possible for a single expression to generate more than one
lint – for example, x %>% na.omit %>% sum
will
trigger the pipe_call_linter()
twice3.
xml_nodes_to_lints(
bad_expr,
source_expression = source_expression,
lint_message = "Use explicit calls in magrittr pipes, i.e., `a %>% foo` should be `a %>% foo()`.",
type = "warning"
)
Finally, we pass the matching XML node(s) to
xml_nodes_to_lints()
, which returns Lint
objects corresponding to any “bad” usages found in
source_expression
. See ?Lint
and
?xml_nodes_to_lints
for details about the arguments. Note
that here, the message
for the lint is always the same, but
for many linters, the message is customized to more closely match the
observed usage. In such cases, xml_nodes_to_lint()
can
conveniently accept a function in lint_message
which takes
a node as input and converts it to a customized message. See, for
example, seq_linter
.
Writing linter tests
(NB: this section uses the assignment_linter()
which has
simpler examples than pipe_continuation_linter()
.)
lintr works best inside the testthat
unit testing framework, in particular, lintr exports
lintr::expect_lint()
which is designed as a companion to
other testthat expectations.
You can define tests inside separate test_that
calls.
Most of the linters use the same default form.
You then test a series of expectations for the linter using
expect_lint
. Please see ?expect_lint
for a
full description of the parameters.
The main three aspects to test are:
- Linter returns no lints when there is nothing to lint, e.g.
expect_lint("blah", NULL, assignment_linter())
- Linter returns a lint when there is something to lint, e.g.
expect_lint("blah=1",
rex("Use <-, not =, for assignment."),
assignment_linter()
)
- As many edge cases as you can think of that might break it, e.g.
expect_lint("fun((blah = fun(1)))",
rex("Use <-, not =, for assignment."),
assignment_linter()
)
You may want to test that additional lint
attributes are
correct, such as the type, line number, column number, e.g.
expect_lint("blah=1",
list(message = "assignment", line_number = 1, column_number = 5, type = "style"),
assignment_linter()
)
Finally, it is a good idea to test that your linter reports multiple lints if needed, e.g.
expect_lint("blah=1; blah=2",
list(
list(line_number = 1, column_number = 5),
list(line_number = 1, column_number = 13),
)
assignment_linter()
)
It is always better to write too many tests rather than too few.
Other utilities for writing custom linters
Besides is_lint_level()
, lintr also
exports some other helpers generally useful for writing custom linters;
these are used a lot in the internals of our own helpers, and so they’ve
been tested and demonstrated their utility already.
-
get_r_string()
: Whenever your linter needs to examine the value of a character literal (e.g., whether an argument value is set to some string), use this to extract the string exactly as R will see it. This is especially important to make your logic robust to R-4-style raw strings likeR"-(hello)-"
, which is otherwise difficult to express, for example as an XPath.
Contributing to {lintr}
More details about writing tests for new {lintr}
linters
The lintr package uses testthat for testing. You
can run all of the currently available tests using
devtools::test()
. If you want to run only the tests in a
given file use the filter
argument to
devtools::test()
.
Linter tests should be put in the tests/testthat/
folder. The test filename should be the linter name prefixed by
test-
, e.g.
test-pipe_continuation_linter.R
.
Adding your linter to the default_linters
If your linter implements part of the tidyverse style guide you can
add it to default_linters
. This object is created in the
file zzz.R
(this name ensures that it will always run after
all the linters are defined). Add your linter name to the
default_linters
list before the NULL
at the
end, and add a corresponding test case to the test script
./tests/testthat/default_linter_testcode.R
.
Submit pull request
Push your changes to a branch of your fork of the lintr repository, and submit a pull request to get your linter merged into lintr!