Fix for CRAN checks.
Better compatibility with rlang 1.0.0 errors. More to come soon.
Fix for CRAN checks.
tidyselect has been re-licensed as MIT (#217).
Predicate functions must now be wrapped with
We made this change to avoid puzzling error messages when a variable is unexpectedly missing from the data frame and there is a corresponding function in the environment:
# Attempts to invoke `data()` function
data.frame(x = 1) %>% select(data)
Now tidyselect will correctly complain about a missing variable rather than trying to invoke a function.
For compatibility, we will support predicate functions starting with "is" for one version.
Fixed an issue preventing repeated deprecation messages when tidyselect_verbosity is set to
This is the 1.0.0 release of tidyselect. It features a more solidly defined and implemented syntax, support for predicate functions, new boolean operators, and much more.
New "Get started" vignette for client packages. Read it with vignette("tidyselect") or at https://tidyselect.r-lib.org/articles/tidyselect.html.
The definition of the tidyselect language has been consolidated. A technical description is now available: https://tidyselect.r-lib.org/articles/syntax.html.
all_of() instead. Referring to contextual objects with a bare name is brittle because the name might be masked by a data frame column. Using all_of() is safe (#76).
tidyselect now uses vctrs for validating inputs. These changes may reveal programming errors that were previously silent. They may also cause failures if your unit tests make faulty assumptions about the content of error messages created in tidyselect:
Out-of-bounds errors are thrown when a name doesn’t exist or a location is too large for the input.
Logical vectors now fail properly.
Selected variables now must be unique. It was previously possible to return duplicate selections in some circumstances.
The input names can no longer contain
Note that we recommend testthat::verify_output() for monitoring error messages thrown from packages that you don’t control. Unlike
verify_output() does not cause CMD check failures when error messages have changed. See https://www.tidyverse.org/blog/2019/11/testthat-2-3-0/ for more information.
The boolean operators can now be used to create selections (#106).
! negates a selection.
| takes the union of two selections.
& takes the intersection of two selections.
Many thanks to Irene Steves (@isteves) for suggesting this UI.
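The following is a minimal sketch of the three operators, assuming tidyselect 1.0.0 or later. It calls eval_select() directly on the built-in iris data frame instead of going through a verb such as dplyr::select(). eval_select() takes a quoted selection and returns a named integer vector of column locations.

```r
library(tidyselect)

# `!` negates: every column except Species
not_species <- eval_select(quote(!Species), iris)

# `|` takes the union of two selections
petal_or_species <- eval_select(quote(starts_with("Petal") | Species), iris)

# `&` takes the intersection: names starting with "S" that don't end with "Width"
s_not_width <- eval_select(quote(starts_with("S") & !ends_with("Width")), iris)

names(not_species)      # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
names(petal_or_species) # "Petal.Length" "Petal.Width" "Species"
names(s_not_width)      # "Sepal.Length" "Species"
```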
You can now use predicate functions in selection contexts:
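For example, columns can be selected by type, or by any function that returns TRUE or FALSE for a column. This assumes tidyselect 1.1.0 or later, where the predicate is wrapped in where(); 1.0.0-era code passed the bare function instead. The anonymous function and its threshold of 3 are made up for illustration.

```r
library(tidyselect)

# Built-in type predicates, wrapped in where()
factor_cols  <- eval_select(quote(where(is.factor)), iris)
numeric_cols <- eval_select(quote(where(is.numeric)), iris)

# Any function returning a single TRUE/FALSE per column works too
big_mean <- eval_select(
  quote(where(function(x) is.numeric(x) && mean(x) > 3)),
  iris
)

factor_cols     # Species = 5
names(big_mean) # "Sepal.Length" "Sepal.Width" "Petal.Length"
```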
Improved support for named elements. It is now possible to assign the same name to multiple elements, if the input data structure doesn’t require unique names (i.e. anything but a data frame).
The selection engine has been rewritten to support a clearer separation between data-expressions (calls to c) and env-expressions (anything else). This means you can now safely use expressions of the type:
Even if the data frame named data contains a column also named data, the subexpression ncol(data) is still correctly evaluated. The data:ncol(data) expression is equivalent to
data is looked up in the relevant context without ambiguity:
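The sketch below illustrates that lookup rule. It assumes tidyselect 1.0.0 or later and uses a made-up three-column data frame. Inside data:ncol(data), the data on the left of : is a data-expression and resolves to the column. ncol(data) is an env-expression and resolves to the data frame itself.

```r
library(tidyselect)

# A data frame that has a column named after the data frame itself
data <- data.frame(foo = 1, data = 2, bar = 3)

# `data` (left of `:`) is the column at position 2;
# `ncol(data)` is evaluated in the environment and gives 3
sel <- eval_select(quote(data:ncol(data)), data)

sel # locations 2 and 3, named "data" and "bar"
```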
While this example above is a bit contrived, there are many realistic cases where these changes make it easier to write safe code:
The new selection helpers any_of() are strict variants of one_of(). The former always fails if some variables are unknown, while the latter does not.
all_of() is safer to use when you expect all selected variables to exist.
any_of() is useful in other cases, for instance to ensure variables are selected out:
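Here is a short sketch of the difference, assuming tidyselect 1.0.0 or later. The vector wanted and the name "not_a_col" are made up for illustration.

```r
library(tidyselect)

wanted <- c("cyl", "not_a_col")

# all_of() is strict: an unknown name raises an error
res <- tryCatch(
  eval_select(quote(all_of(wanted)), mtcars),
  error = function(e) "error"
)

# any_of() silently skips unknown names, so deselecting is safe
kept <- eval_select(quote(-any_of(wanted)), mtcars)

res         # "error"
names(kept) # every mtcars column except "cyl"
```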
Selection helpers like starts_with() are now available in all selection contexts, even when they haven’t been attached to the search path. The most visible consequence of this change is that it is now easier to use selection functions without attaching the host package:
# Before
dplyr::select(mtcars, dplyr::starts_with("c"))
# After
dplyr::select(mtcars, starts_with("c"))
It is still recommended to export the helpers from your package so that users can easily look up the documentation with
starts_with(c("a", "b"))
starts_with("a") | starts_with("b")
Better support for selecting with S3 vectors. For instance, factors are treated as characters.
Take the full data rather than just names. This makes it possible to use function predicates in selection context.
Return a numeric vector of locations rather than a vector of names. This makes it possible to use tidyselect with inputs that support duplicate names, like regular vectors.
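The following is a minimal sketch of a client verb built on eval_select(), assuming tidyselect 1.0.0 or later plus rlang. The verb name select_cols() is hypothetical. eval_select() receives the full data frame and returns locations. The names of those locations carry any renaming, and the caller does the subsetting.

```r
library(tidyselect)

# Hypothetical client verb built on eval_select()
select_cols <- function(data, ...) {
  # Wrap the captured arguments in a single c() selection
  sel_expr <- rlang::expr(c(...))
  loc <- eval_select(sel_expr, data)
  # `loc` holds column positions; its names are the output column names
  rlang::set_names(data[loc], names(loc))
}

out <- select_cols(mtcars, mpg, cylinders = cyl)
names(out) # "mpg" "cylinders"
```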
The .strict argument of vars_select() now works more robustly and consistently.
Using arithmetic operators in a selection context now fails more informatively (#84).
It is now possible to select columns in data frames containing duplicate variables (#94). However, the duplicates can’t be part of the final selection.
eval_rename() has better support for existing duplicates (but creating new duplicates is an error).
vars_pull() now includes the faulty expression in error messages.
tidyselect is now much faster with many columns, thanks to a performance fix in rlang::env_bind() as well as internal fixes.
The - operator now supports character vectors in addition to strings. This makes it easy to unquote column names to exclude from the set:
vars <- c("cyl", "am", "disp", "drat")
vars_select(names(mtcars), - !!vars)
last_col() now issues an error when the variable vector is empty.
The main point of this release is to revert a troublesome behaviour introduced in tidyselect 0.1.0. It also includes a few features.
The special evaluation semantics for selection have been changed back to the old behaviour because the new rules were causing too much trouble and confusion. From now on, data expressions (symbols and calls to c()) can refer both to registered variables and to objects from the context.
However, the semantics for context expressions (any calls other than to c()) remain the same. Those expressions are evaluated in the context only and cannot refer to registered variables.
If you’re writing functions that refer to contextual objects, it is still a good idea to avoid data expressions, since registered variables change as a function of user input and you never know whether your local objects might be shadowed by a variable. Consider:
n <- 2
vars_select(letters, 1:n)
Should that select up to the second element of letters or up to the 14th? Since variables take precedence in a data expression, this selects the first 14 letters. This can be made more robust by turning the data expression into a context expression:
vars_select(letters, seq(1, n))
You can also use quasiquotation since unquoted arguments are guaranteed to be evaluated without any user data in scope. While equivalent because of the special rules for context expressions, this may be clearer to the reader accustomed to tidy eval:
vars_select(letters, seq(1, !! n))
Finally, you may want to be more explicit in the opposite direction. If you expect a variable to be found in the data but not in the context, you can use the
vars_select(names(mtcars), .data$cyl : .data$drat)
The new select helper last_col() is helpful to select over a custom range:
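For instance, using vars_select() on the mtcars column names, the offset argument of last_col() counts back from the final variable:

```r
library(tidyselect)

# From the third variable up to the last one
from_third <- vars_select(names(mtcars), 3:last_col())

# Everything except the final two variables
all_but_two <- vars_select(names(mtcars), 1:last_col(offset = 2))
```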
- now handle strings as well. This makes it easy to unquote a column name:
(!!name) : last_col() or
vars_select() gains a .strict argument similar to rename_vars(). If set to FALSE, errors about unknown variables are ignored.
vars_select() now treats NULL as an empty input. This follows a trend in the tidyverse tools.
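Here is a short sketch, assuming vars_select() applied to the column names of mtcars:

```r
library(tidyselect)

# NULL contributes nothing to the selection
just_cyl <- vars_select(names(mtcars), cyl, NULL)

# A selection made only of NULL is empty rather than an error
nothing <- vars_select(names(mtcars), NULL)
```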
vars_rename() is now implemented with the tidy eval framework. As with vars_select(), expressions are evaluated without any user data in scope. In addition, a variable context is now established so you can write rename helpers. These should return a single whole number or a string (a variable position or a variable name).
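The following is a sketch of a rename helper. It assumes that peek_vars() returns the variables of the current context. The helper first_var() is hypothetical. It returns a string, which vars_rename() treats as a variable name.

```r
library(tidyselect)

# Hypothetical rename helper: the name of the first variable in context
first_var <- function() peek_vars()[[1]]

renamed <- vars_rename(names(mtcars), first = first_var(), cylinders = cyl)

# Names are the new variable names, values the original ones
renamed[1:3]
```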
The selection helpers are now exported in a list vars_select_helpers. This is intended for APIs that embed the helpers in the evaluation environment.
vars has been renamed to .vars to avoid spurious matching.
We took this opportunity to make a few changes to the API:
rename_vars() are now vars_rename(). This follows the tidyverse convention that prefixes correspond to the input type while suffixes indicate the output type. Similarly, select_var() is now
The arguments are now prefixed with dots to limit argument matching issues. While the dots help, it is still a good idea to splice a list of captured quosures to make sure dotted arguments are never matched to vars_select()’s named arguments:
vars_select(vars, !!! quos(...))
Error messages can now be customised. For consistency with dplyr, error messages refer to “columns” by default. This assumes that the variables being selected come from a data frame. If this is not appropriate for your DSL, you can now add an attribute vars_type to the .vars vector to specify alternative names. This must be a character vector of length 2 whose first component is the singular form and the second is the plural. For example,
tidyselect provides a few more ways of establishing a variable context:
scoped_vars() sets up a variable context along with an exit hook that automatically restores the previous variables. It is the preferred way of changing the variable context.
with_vars() takes variables and an expression and evaluates the latter in the context of the former.
poke_vars() establishes a new variable context. It returns the previous context invisibly and it is your responsibility to restore it after you are done. This is for expert use only.
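Here is a sketch of the two lower-level functions. It also uses peek_vars(), which returns the variables of the current context. The variable names are arbitrary.

```r
library(tidyselect)

# with_vars(): evaluate an expression inside a temporary variable context
in_context <- with_vars(c("a", "b", "c"), peek_vars())

# poke_vars(): set the context manually, then restore it yourself
old <- poke_vars(c("x", "y"))
current <- peek_vars()
poke_vars(old)
```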
The evaluation semantics for selecting verbs have changed. Symbols are now evaluated in a data-only context that is isolated from the calling environment. This means that you can no longer refer to local variables unless you are explicitly unquoting these variables with !!, which is mostly for expert use.
Note that since dplyr 0.7, helper calls (like starts_with()) obey the opposite behaviour and are evaluated in the calling context isolated from the data context. To sum up, symbols can only refer to data frame objects, while helpers can only refer to contextual objects. This differs from usual R evaluation semantics where both the data and the calling environment are in scope (with the former prevailing over the latter).