Tidyverse selections implement a dialect of R where operators make it easy to select variables:
: for selecting a range of consecutive variables.
! for taking the complement of a set of variables.
& and | for selecting the intersection or the union of two
sets of variables.
c() for combining selections.
In addition, you can use selection helpers such as:
everything(): Matches all variables.
last_col(): Selects the last variable, possibly with an offset.
These helpers select variables based on their names:
starts_with(): Starts with a prefix.
ends_with(): Ends with a suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.
Helpers such as all_of() select variables from a character vector.
Here we show the usage for the basic selection operators. See the
specific help pages to learn about helpers like starts_with().
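As a quick taste of the helpers listed above, here is a minimal sketch. The data frame df and its column names are hypothetical, chosen only so that each helper has something to match. It relies on dplyr re-exporting these helpers, so attaching dplyr is enough.

```r
library(dplyr)

# Hypothetical data frame; the column names are made up for illustration.
df <- data.frame(id = 1:2, x01 = 0, x02 = 0, x03 = 0, wk_total = 5, note = "a")

by_prefix <- names(select(df, starts_with("x")))      # columns whose names start with "x"
by_suffix <- names(select(df, ends_with("total")))    # columns whose names end with "total"
by_substr <- names(select(df, contains("_")))         # columns containing a literal "_"
by_regex  <- names(select(df, matches("^x0[12]$")))   # columns matching a regular expression

# width = 2 pads the numbers with zeros, so 2:3 is looked up as x02, x03.
by_range <- names(select(df, num_range("x", 2:3, width = 2)))

# offset = 1 counts one column back from the last one.
second_to_last <- names(select(df, last_col(offset = 1)))

all_cols <- names(select(df, everything()))
```

Each result is a character vector of the selected column names, which makes it easy to check what a helper picked before using it in a longer pipeline.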
The selection language can be used in functions like
dplyr::select() or tidyr::pivot_longer(). Let's first attach
the tidyverse:
library(tidyverse)

# For better printing
iris <- as_tibble(iris)
Select variables by name:
starwars %>% select(height)
## # A tibble: 87 x 1
##   height
##    <int>
## 1    172
## 2    167
## 3     96
## 4    202
## # ... with 83 more rows
iris %>% pivot_longer(Sepal.Length)
## # A tibble: 150 x 6
##   Sepal.Width Petal.Length Petal.Width Species name         value
##         <dbl>        <dbl>       <dbl> <fct>   <chr>        <dbl>
## 1         3.5          1.4         0.2 setosa  Sepal.Length   5.1
## 2         3            1.4         0.2 setosa  Sepal.Length   4.9
## 3         3.2          1.3         0.2 setosa  Sepal.Length   4.7
## 4         3.1          1.5         0.2 setosa  Sepal.Length   4.6
## # ... with 146 more rows
Select multiple variables by separating them with commas. Note how the order of columns is determined by the order of inputs:
starwars %>% select(homeworld, height, mass)
## # A tibble: 87 x 3
##   homeworld height  mass
##   <chr>      <int> <dbl>
## 1 Tatooine     172    77
## 2 Tatooine     167    75
## 3 Naboo         96    32
## 4 Tatooine     202   136
## # ... with 83 more rows
Functions like tidyr::pivot_longer() don't accept selections through
... (dots); they take a single selection argument instead. In this case,
use c() to select multiple variables:
iris %>% pivot_longer(c(Sepal.Length, Petal.Length))
## # A tibble: 300 x 5
##   Sepal.Width Petal.Width Species name         value
##         <dbl>       <dbl> <fct>   <chr>        <dbl>
## 1         3.5         0.2 setosa  Sepal.Length   5.1
## 2         3.5         0.2 setosa  Petal.Length   1.4
## 3         3           0.2 setosa  Sepal.Length   4.9
## 4         3           0.2 setosa  Petal.Length   1.4
## # ... with 296 more rows
The : operator selects a range of consecutive variables:
starwars %>% select(name:mass)
## # A tibble: 87 x 3
##   name           height  mass
##   <chr>           <int> <dbl>
## 1 Luke Skywalker    172    77
## 2 C-3PO             167    75
## 3 R2-D2              96    32
## 4 Darth Vader       202   136
## # ... with 83 more rows
The ! operator negates a selection:
starwars %>% select(!(name:mass))
## # A tibble: 87 x 11
##   hair_color skin_color eye_color birth_year sex   gender homeworld species
##   <chr>      <chr>      <chr>          <dbl> <chr> <chr>  <chr>     <chr>
## 1 blond      fair       blue            19   male  mascu~ Tatooine  Human
## 2 <NA>       gold       yellow         112   none  mascu~ Tatooine  Droid
## 3 <NA>       white, bl~ red             33   none  mascu~ Naboo     Droid
## 4 none       white      yellow          41.9 male  mascu~ Tatooine  Human
## # ... with 83 more rows, and 3 more variables: films <list>, vehicles <list>,
## #   starships <list>
iris %>% select(!c(Sepal.Length, Petal.Length))
## # A tibble: 150 x 3
##   Sepal.Width Petal.Width Species
##         <dbl>       <dbl> <fct>
## 1         3.5         0.2 setosa
## 2         3           0.2 setosa
## 3         3.2         0.2 setosa
## 4         3.1         0.2 setosa
## # ... with 146 more rows
iris %>% select(!ends_with("Width"))
## # A tibble: 150 x 3
##   Sepal.Length Petal.Length Species
##          <dbl>        <dbl> <fct>
## 1          5.1          1.4 setosa
## 2          4.9          1.4 setosa
## 3          4.7          1.3 setosa
## 4          4.6          1.5 setosa
## # ... with 146 more rows
& and | take the intersection or the union of two selections:
iris %>% select(starts_with("Petal") & ends_with("Width"))
## # A tibble: 150 x 1
##   Petal.Width
##         <dbl>
## 1         0.2
## 2         0.2
## 3         0.2
## 4         0.2
## # ... with 146 more rows
iris %>% select(starts_with("Petal") | ends_with("Width"))
## # A tibble: 150 x 3
##   Petal.Length Petal.Width Sepal.Width
##          <dbl>       <dbl>       <dbl>
## 1          1.4         0.2         3.5
## 2          1.4         0.2         3
## 3          1.3         0.2         3.2
## 4          1.5         0.2         3.1
## # ... with 146 more rows
To take the difference between two selections, combine the & and
! operators:
iris %>% select(starts_with("Petal") & !ends_with("Width"))
## # A tibble: 150 x 1
##   Petal.Length
##          <dbl>
## 1          1.4
## 2          1.4
## 3          1.3
## 4          1.5
## # ... with 146 more rows
The order of selected columns is determined by the inputs.
all_of(c("foo", "bar")) selects "foo" first.
c(starts_with("c"), starts_with("d")) selects all columns
starting with "c" first, then all columns starting with "d".
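The two ordering rules above can be checked with a small sketch. The data frame df, its column names, and the vector wanted are hypothetical. The columns are deliberately stored in a different order from the one requested, so the effect of the inputs is visible.

```r
library(dplyr)

# Hypothetical data frame whose stored column order differs from the requests below.
df <- data.frame(d1 = 1, bar = 2, c1 = 3, foo = 4, c2 = 5)

# Selecting from a character vector: the result follows the vector's order,
# not the order of the columns in df.
wanted <- c("foo", "bar")
from_vector <- names(select(df, all_of(wanted)))

# Combining helpers with c(): all "c" columns come first, then the "d" columns.
by_prefix <- names(select(df, c(starts_with("c"), starts_with("d"))))
```

Here from_vector should hold "foo" then "bar", and by_prefix should hold "c1", "c2", then "d1". Note that all_of() raises an error if any requested name is missing from the data.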
Other selection helpers:
all_of(),
everything(),
starts_with()