These selection helpers match variables according to a given pattern.
starts_with()
: Starts with an exact prefix.
ends_with()
: Ends with an exact suffix.
contains()
: Contains a literal string.
matches()
: Matches a regular expression.
num_range()
: Matches a numerical range like x01, x02, x03.
Usage
starts_with(match, ignore.case = TRUE, vars = NULL)
ends_with(match, ignore.case = TRUE, vars = NULL)
contains(match, ignore.case = TRUE, vars = NULL)
matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL)
num_range(
  prefix,
  range,
  suffix = "",
  width = NULL,
  ...,
  cross = FALSE,
  vars = NULL
)
Arguments
- match
A character vector. If length > 1, the union of the matches is taken.
For starts_with(), ends_with(), and contains() this is an exact match. For matches() this is a regular expression, and can be a stringr pattern.
- ignore.case
If TRUE, the default, ignores case when matching names.
- vars
A character vector of variable names. If not supplied, the variables are taken from the current selection context (as established by functions like select() or pivot_longer()).
- perl
Should Perl-compatible regexps be used?
- prefix, suffix
A prefix/suffix added before/after the numeric range.
- range
A sequence of integers, like 1:5.
- width
Optionally, the "width" of the numeric range. For example, a width of 2 gives "01", a width of 3 gives "001", etc.
- ...
These dots are for future extensions and must be empty.
- cross
Whether to take the cartesian product of prefix, range, and suffix. If FALSE, the default, these arguments are recycled using tidyverse rules.
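As a rough illustration of the vars argument, the helpers can also be called outside a selection context by passing the candidate names explicitly. The sketch below assumes the tidyselect package is attached and uses the built-in iris data frame; the name sepal_pos is only illustrative.

```r
library(tidyselect)

# Supplying `vars` directly means no select() context is needed.
# The helper returns integer positions into `vars`.
sepal_pos <- starts_with("Sepal", vars = names(iris))

# Recover the matching names from the positions
names(iris)[sepal_pos]
```

In everyday use vars is left as NULL, so the names come from the enclosing select() or pivot_longer() call.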
Examples
Selection helpers can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:
library(tidyverse)
# For better printing
iris <- as_tibble(iris)
starts_with() selects all variables matching a prefix and ends_with() matches a suffix:
iris %>% select(starts_with("Sepal"))
#> # A tibble: 150 x 2
#> Sepal.Length Sepal.Width
#> <dbl> <dbl>
#> 1 5.1 3.5
#> 2 4.9 3
#> 3 4.7 3.2
#> 4 4.6 3.1
#> # i 146 more rows
iris %>% select(ends_with("Width"))
#> # A tibble: 150 x 2
#> Sepal.Width Petal.Width
#> <dbl> <dbl>
#> 1 3.5 0.2
#> 2 3 0.2
#> 3 3.2 0.2
#> 4 3.1 0.2
#> # i 146 more rows
You can supply multiple prefixes or suffixes. Note how the order of variables depends on the order of the suffixes and prefixes:
iris %>% select(starts_with(c("Petal", "Sepal")))
#> # A tibble: 150 x 4
#> Petal.Length Petal.Width Sepal.Length Sepal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1.4 0.2 5.1 3.5
#> 2 1.4 0.2 4.9 3
#> 3 1.3 0.2 4.7 3.2
#> 4 1.5 0.2 4.6 3.1
#> # i 146 more rows
iris %>% select(ends_with(c("Width", "Length")))
#> # A tibble: 150 x 4
#> Sepal.Width Petal.Width Sepal.Length Petal.Length
#> <dbl> <dbl> <dbl> <dbl>
#> 1 3.5 0.2 5.1 1.4
#> 2 3 0.2 4.9 1.4
#> 3 3.2 0.2 4.7 1.3
#> 4 3.1 0.2 4.6 1.5
#> # i 146 more rows
contains() selects columns whose names contain a literal string:
iris %>% select(contains("al"))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # i 146 more rows
starts_with(), ends_with(), and contains() do not use regular expressions. To select with a regexp use matches():
# [pt] is matched literally:
iris %>% select(contains("[pt]al"))
#> # A tibble: 150 x 0
# [pt] is interpreted as a regular expression
iris %>% select(matches("[pt]al"))
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <dbl> <dbl> <dbl> <dbl>
#> 1 5.1 3.5 1.4 0.2
#> 2 4.9 3 1.4 0.2
#> 3 4.7 3.2 1.3 0.2
#> 4 4.6 3.1 1.5 0.2
#> # i 146 more rows
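The ignore.case argument described above applies to all of these helpers. A minimal sketch, assuming dplyr is attached and using the built-in iris data (the names insensitive and sensitive are only illustrative):

```r
library(dplyr)

# Default (ignore.case = TRUE): a lowercase prefix still matches "Sepal.*"
insensitive <- iris %>% select(starts_with("sepal"))
names(insensitive)

# With ignore.case = FALSE the lowercase prefix matches no column names
sensitive <- iris %>% select(starts_with("sepal", ignore.case = FALSE))
ncol(sensitive)
```

Case-sensitive matching can help when a data frame holds columns that differ only by case.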
starts_with() selects all variables starting with a prefix. To select a range, use num_range(). Compare:
billboard %>% select(starts_with("wk"))
#> # A tibble: 317 x 76
#> wk1 wk2 wk3 wk4 wk5 wk6 wk7 wk8 wk9 wk10 wk11 wk12 wk13
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 87 82 72 77 87 94 99 NA NA NA NA NA NA
#> 2 91 87 92 NA NA NA NA NA NA NA NA NA NA
#> 3 81 70 68 67 66 57 54 53 51 51 51 51 47
#> 4 76 76 72 69 67 65 55 59 62 61 61 59 61
#> # i 313 more rows
#> # i 63 more variables: wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>,
#> # wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, ...
billboard %>% select(num_range("wk", 10:15))
#> # A tibble: 317 x 6
#> wk10 wk11 wk12 wk13 wk14 wk15
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 NA NA NA NA NA NA
#> 2 NA NA NA NA NA NA
#> 3 51 51 51 47 44 38
#> 4 61 61 59 61 66 72
#> # i 313 more rows
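The billboard columns are not zero-padded, so width is not needed there. For names like x01, x02, x03, the width argument pads the numbers before matching. A small sketch, assuming dplyr is attached; the data frame df and its columns are hypothetical and made up for illustration:

```r
library(dplyr)

# Hypothetical data frame with zero-padded column names
df <- tibble(x01 = 1, x02 = 2, x03 = 3, x10 = 10, y = 0)

# width = 2 pads each number to two digits, so this looks for
# "x01", "x02", and "x03"
padded <- df %>% select(num_range("x", 1:3, width = 2))
names(padded)
```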
See also
The selection language page, which includes links to other selection helpers.