Using nrow(subset(x, condition)) to count the rows where condition holds is inefficient: it builds a full subset of x just to count the rows of the result.
Several equivalent expressions avoid the full subset, e.g. with(x, sum(condition)) or, more generally, with(x, sum(condition, na.rm = TRUE)), which also handles missing values in condition.
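The equivalence described above can be checked with a minimal sketch. The data frame x and its column is_treatment are hypothetical and exist only for illustration. The sketch relies on base R's subset() dropping rows where the condition is NA, which is why na.rm = TRUE is needed for the two counts to agree.

```r
# Hypothetical data for illustration; `x` and `is_treatment` are assumed names.
x <- data.frame(
  id = 1:6,
  is_treatment = c(TRUE, FALSE, TRUE, NA, TRUE, FALSE)
)

# Builds the full subset first, then counts its rows.
# subset() treats an NA condition as FALSE, so the NA row is dropped.
n_subset <- nrow(subset(x, is_treatment))

# Counts directly from the logical vector without materializing a subset.
# na.rm = TRUE matches how subset() handles missing values.
n_sum <- with(x, sum(is_treatment, na.rm = TRUE))

# Without na.rm = TRUE, a single missing value makes the count NA.
n_sum_na <- with(x, sum(is_treatment))
```

Here n_subset and n_sum agree, while n_sum_na is NA, so the na.rm = TRUE form is the safer replacement whenever the condition may contain missing values.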
See also
linters for a complete list of linters available in lintr.
Examples
# will produce lints
lint(
  text = "nrow(subset(x, is_treatment))",
  linters = nrow_subset_linter()
)
#> <text>:1:1: warning: [nrow_subset_linter] Use arithmetic to count the number of rows satisfying a condition, rather than fully subsetting the data.frame and counting the resulting rows. For example, replace nrow(subset(x, is_treatment)) with sum(x$is_treatment). NB: use na.rm = TRUE if `is_treatment` has missing values.
#> nrow(subset(x, is_treatment))
#> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lint(
  text = "nrow(filter(x, is_treatment))",
  linters = nrow_subset_linter()
)
#> <text>:1:1: warning: [nrow_subset_linter] Use arithmetic to count the number of rows satisfying a condition, rather than fully subsetting the data.frame and counting the resulting rows. For example, replace nrow(subset(x, is_treatment)) with sum(x$is_treatment). NB: use na.rm = TRUE if `is_treatment` has missing values.
#> nrow(filter(x, is_treatment))
#> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lint(
  text = "x %>% filter(x, is_treatment) %>% nrow()",
  linters = nrow_subset_linter()
)
#> <text>:1:1: warning: [nrow_subset_linter] Use arithmetic to count the number of rows satisfying a condition, rather than fully subsetting the data.frame and counting the resulting rows. For example, replace nrow(subset(x, is_treatment)) with sum(x$is_treatment). NB: use na.rm = TRUE if `is_treatment` has missing values.
#> x %>% filter(x, is_treatment) %>% nrow()
#> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# okay
lint(
  text = "with(x, sum(is_treatment, na.rm = TRUE))",
  linters = nrow_subset_linter()
)
#> ℹ No lints found.