Select variables that match a pattern

These selection helpers match variables according to a given pattern.

starts_with(): Starts with an exact prefix.
ends_with(): Ends with an exact suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.

Usage

starts_with(match, ignore.case = TRUE, vars = NULL)

ends_with(match, ignore.case = TRUE, vars = NULL)

contains(match, ignore.case = TRUE, vars = NULL)

matches(match, ignore.case = TRUE, perl = TRUE, vars = NULL)

num_range(
  prefix,
  range,
  suffix = "",
  width = NULL,
  ...,
  cross = FALSE,
  vars = NULL
)

Arguments

match

A character vector. If length > 1, the union of the matches is taken.

For starts_with(), ends_with(), and contains() this is an exact match. For matches() this is a regular expression, and can be a stringr pattern.

ignore.case

If TRUE, the default, ignores case when matching names.

vars

A character vector of variable names. If not supplied, the variables are taken from the current selection context (as established by functions like select() or pivot_longer()).

perl

Should Perl-compatible regexps be used?

prefix, suffix

A prefix/suffix added before/after the numeric range.

range

A sequence of integers, like 1:5.

width

Optionally, the "width" of the numeric range. For example, a range of 2 gives "01", a range of three "001", etc.

...

These dots are for future extensions and must be empty.

cross

Whether to take the cartesian product of prefix, range, and suffix. If FALSE, the default, these arguments are recycled using tidyverse rules.

Examples

Selection helpers can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:

library(tidyverse)

# For better printing
iris <- as_tibble(iris)

starts_with() selects all variables matching a prefix and ends_with() matches a suffix:

iris %>% select(starts_with("Sepal"))
#> # A tibble: 150 x 2
#>   Sepal.Length Sepal.Width
#>          <dbl>       <dbl>
#> 1          5.1         3.5
#> 2          4.9         3
#> 3          4.7         3.2
#> 4          4.6         3.1
#> # i 146 more rows

iris %>% select(ends_with("Width"))
#> # A tibble: 150 x 2
#>   Sepal.Width Petal.Width
#>         <dbl>       <dbl>
#> 1         3.5         0.2
#> 2         3           0.2
#> 3         3.2         0.2
#> 4         3.1         0.2
#> # i 146 more rows

You can supply multiple prefixes or suffixes. Note how the order of variables depends on the order of the suffixes and prefixes:

iris %>% select(starts_with(c("Petal", "Sepal")))
#> # A tibble: 150 x 4
#>   Petal.Length Petal.Width Sepal.Length Sepal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          1.4         0.2          5.1         3.5
#> 2          1.4         0.2          4.9         3
#> 3          1.3         0.2          4.7         3.2
#> 4          1.5         0.2          4.6         3.1
#> # i 146 more rows

iris %>% select(ends_with(c("Width", "Length")))
#> # A tibble: 150 x 4
#>   Sepal.Width Petal.Width Sepal.Length Petal.Length
#>         <dbl>       <dbl>        <dbl>        <dbl>
#> 1         3.5         0.2          5.1          1.4
#> 2         3           0.2          4.9          1.4
#> 3         3.2         0.2          4.7          1.3
#> 4         3.1         0.2          4.6          1.5
#> # i 146 more rows

contains() selects columns whose names contain a word:

iris %>% select(contains("al"))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

starts_with(), ends_with(), and contains() do not use regular expressions. To select with a regexp use matches():

# [pt] is matched literally:
iris %>% select(contains("[pt]al"))
#> # A tibble: 150 x 0

# [pt] is interpreted as a regular expression
iris %>% select(matches("[pt]al"))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

starts_with() selects all variables starting with a prefix. To select a range, use num_range(). Compare:

billboard %>% select(starts_with("wk"))
#> # A tibble: 317 x 76
#>     wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8   wk9  wk10  wk11  wk12  wk13
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    87    82    72    77    87    94    99    NA    NA    NA    NA    NA    NA
#> 2    91    87    92    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
#> 3    81    70    68    67    66    57    54    53    51    51    51    51    47
#> 4    76    76    72    69    67    65    55    59    62    61    61    59    61
#> # i 313 more rows
#> # i 63 more variables: wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>,
#> #   wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, ...

billboard %>% select(num_range("wk", 10:15))
#> # A tibble: 317 x 6
#>    wk10  wk11  wk12  wk13  wk14  wk15
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    NA    NA    NA    NA    NA    NA
#> 2    NA    NA    NA    NA    NA    NA
#> 3    51    51    51    47    44    38
#> 4    61    61    59    61    66    72
#> # i 313 more rows

Usage

Arguments

Examples

See also