FAQ - Note: Using an external vector in selections is ambiguous
Ambiguity between columns and external variables
With selecting functions like dplyr::select() or tidyr::pivot_longer(), you can refer to variables by name:
mtcars %>% select(cyl, am, vs)
#> # A tibble: 32 x 3
#> cyl am vs
#> <dbl> <dbl> <dbl>
#> 1 6 1 0
#> 2 6 1 0
#> 3 4 1 1
#> 4 6 0 1
#> # i 28 more rows
mtcars %>% select(mpg:disp)
#> # A tibble: 32 x 3
#> mpg cyl disp
#> <dbl> <dbl> <dbl>
#> 1 21 6 160
#> 2 21 6 160
#> 3 22.8 4 108
#> 4 21.4 6 258
#> # i 28 more rows
For historical reasons, it is also possible to refer to an external vector of variable names. You get the correct result, but with a warning that selecting with an external variable is ambiguous: it is not clear whether you want a data frame column or an external object.
vars <- c("cyl", "am", "vs")
result <- mtcars %>% select(vars)
#> Warning: Using an external vector in selections was deprecated in tidyselect
#> 1.1.0.
#> i Please use `all_of()` or `any_of()` instead.
#> # Was:
#> data %>% select(vars)
#>
#> # Now:
#> data %>% select(all_of(vars))
#>
#> See
#> <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this
#> warning was generated.
We have decided to deprecate this particular approach to using external vectors because it introduces ambiguity. Imagine that the data frame contains a column with the same name as your external variable.
some_df <- mtcars[1:4, ]
some_df$vars <- 1:nrow(some_df)
These are very different objects, but it isn’t a problem if the context forces you to be specific about where to find vars:
vars
#> [1] "cyl" "am" "vs"
some_df$vars
#> [1] 1 2 3 4
In a selection context, however, the column wins:
some_df %>% select(vars)
#> # A tibble: 4 x 1
#> vars
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
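The practical consequence is that the same call can change meaning once a column is added. The following is a minimal sketch of that shift, assuming dplyr is attached and a tidyselect version in which the deprecated form still runs with a warning (suppressed here only to keep the output readable). The names plain_df, before, and after are illustrative and not part of the original example.

```r
library(dplyr)

vars <- c("cyl", "am", "vs")
plain_df <- mtcars[1:4, ]

# No column named `vars` exists yet, so the deprecated form
# falls back to the external character vector.
before <- names(suppressWarnings(select(plain_df, vars)))

# Add a column named `vars`: the identical call now picks the column.
some_df <- plain_df
some_df$vars <- seq_len(nrow(some_df))
after <- names(select(some_df, vars))

before
after
```

Here before should hold the three names taken from the external vector, while after holds only the name of the new column. Nothing in the calling code changed between the two selections.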
Fixing the ambiguity
To make your selection code more robust and silence the message, use all_of() to force the external vector:
some_df %>% select(all_of(vars))
#> # A tibble: 4 x 3
#> cyl am vs
#> <dbl> <dbl> <dbl>
#> 1 6 1 0
#> 2 6 1 0
#> 3 4 1 1
#> 4 6 0 1
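The deprecation warning also mentions any_of(). As a hedged sketch of how the two helpers differ: all_of() is strict and signals an error when a requested name is missing, whereas any_of() silently skips missing names. The vector wanted and the name "not_a_column" are made up for illustration.

```r
library(dplyr)

vars <- c("cyl", "am", "vs")
some_df <- mtcars[1:4, ]
some_df$vars <- seq_len(nrow(some_df))

# Hypothetical request that includes a name absent from the data.
wanted <- c(vars, "not_a_column")

# any_of() keeps the names that exist and ignores the rest.
lenient <- some_df %>% select(any_of(wanted))
names(lenient)

# all_of() requires every name to exist; capture the error message
# instead of stopping the script.
strict_error <- tryCatch(
  some_df %>% select(all_of(wanted)),
  error = function(e) conditionMessage(e)
)
is.character(strict_error)
```

Which helper fits depends on whether a missing column should be treated as a bug in the caller (all_of()) or as an acceptable case (any_of()).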
For more information, or if you have comments about this, please see the GitHub issue tracking the deprecation process.