This selection helper selects the variables for which a
function returns TRUE
.
Examples
Selection helpers can be used in functions like dplyr::select()
or tidyr::pivot_longer()
. Let's first attach the tidyverse:
where()
takes a function and returns all variables for which the
function returns TRUE
:
is.factor(iris[[4]])
#> [1] FALSE
is.factor(iris[[5]])
#> [1] TRUE
iris %>% select(where(is.factor))
#> # A tibble: 150 x 1
#> Species
#> <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> # i 146 more rows
is.numeric(iris[[4]])
#> [1] TRUE
is.numeric(iris[[5]])
#> [1] FALSE
iris %>% select(where(is.numeric))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # i 146 more rows
The formula shorthand
You can use purrr-like formulas as a shortcut for creating a function on the spot. These expressions are equivalent:
iris %>% select(where(is.numeric))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # i 146 more rows
iris %>% select(where(function(x) is.numeric(x)))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # i 146 more rows
iris %>% select(where(~ is.numeric(.x)))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # i 146 more rows
The shorthand is useful for adding logic inline. Here we select all numeric variables whose mean is greater than 3.5:
iris %>% select(where(~ is.numeric(.x) && mean(.x) > 3.5))
#> # A tibble: 150 x 2
#> Sepal.Length Petal.Length
#> <dbl> <dbl>
#> 1 5.1 1.4
#> 2 4.9 1.4
#> 3 4.7 1.3
#> 4 4.6 1.5
#> # i 146 more rows