Select variables that match a pattern

These selection helpers match variables according to a given pattern.

starts_with(): Starts with an exact prefix.
ends_with(): Ends with an exact suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.

Usage

starts_with(match, ignore.case = TRUE, vars = NULL)

ends_with(match, ignore.case = TRUE, vars = NULL)

contains(match, ignore.case = TRUE, vars = NULL)

matches(match, ignore.case = TRUE, perl = FALSE, vars = NULL)

num_range(prefix, range, suffix = "", width = NULL, vars = NULL)

Arguments

match

A character vector. If length > 1, the union of the matches is taken.

For starts_with(), ends_with(), and contains() this is an exact match. For matches() this is a regular expression, and can be a stringr pattern.

ignore.case

If TRUE, the default, ignores case when matching names.

vars

A character vector of variable names. If not supplied, the variables are taken from the current selection context (as established by functions like select() or pivot_longer()).

perl

Should Perl-compatible regexps be used?

prefix, suffix

A prefix/suffix added before/after the numeric range.

range

A sequence of integers, like 1:5.

width

Optionally, the "width" of the numeric range. For example, a range of 2 gives "01", a range of three "001", etc.

Examples

Selection helpers can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:

library(tidyverse)

# For better printing
iris <- as_tibble(iris)

starts_with() selects all variables matching a prefix and ends_with() matches a suffix:

iris %>% select(starts_with("Sepal"))
#> # A tibble: 150 x 2
#>   Sepal.Length Sepal.Width
#>          <dbl>       <dbl>
#> 1          5.1         3.5
#> 2          4.9         3  
#> 3          4.7         3.2
#> 4          4.6         3.1
#> # i 146 more rows

iris %>% select(ends_with("Width"))
#> # A tibble: 150 x 2
#>   Sepal.Width Petal.Width
#>         <dbl>       <dbl>
#> 1         3.5         0.2
#> 2         3           0.2
#> 3         3.2         0.2
#> 4         3.1         0.2
#> # i 146 more rows

You can supply multiple prefixes or suffixes. Note how the order of variables depends on the order of the suffixes and prefixes:

iris %>% select(starts_with(c("Petal", "Sepal")))
#> # A tibble: 150 x 4
#>   Petal.Length Petal.Width Sepal.Length Sepal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          1.4         0.2          5.1         3.5
#> 2          1.4         0.2          4.9         3  
#> 3          1.3         0.2          4.7         3.2
#> 4          1.5         0.2          4.6         3.1
#> # i 146 more rows

iris %>% select(ends_with(c("Width", "Length")))
#> # A tibble: 150 x 4
#>   Sepal.Width Petal.Width Sepal.Length Petal.Length
#>         <dbl>       <dbl>        <dbl>        <dbl>
#> 1         3.5         0.2          5.1          1.4
#> 2         3           0.2          4.9          1.4
#> 3         3.2         0.2          4.7          1.3
#> 4         3.1         0.2          4.6          1.5
#> # i 146 more rows

contains() selects columns whose names contain a word:

iris %>% select(contains("al"))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

starts_with(), ends_with(), and contains() do not use regular expressions. To select with a regexp use matches():

# [pt] is matched literally:
iris %>% select(contains("[pt]al"))
#> # A tibble: 150 x 0

# [pt] is interpreted as a regular expression
iris %>% select(matches("[pt]al"))
#> # A tibble: 150 x 4
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width
#>          <dbl>       <dbl>        <dbl>       <dbl>
#> 1          5.1         3.5          1.4         0.2
#> 2          4.9         3            1.4         0.2
#> 3          4.7         3.2          1.3         0.2
#> 4          4.6         3.1          1.5         0.2
#> # i 146 more rows

starts_with() selects all variables starting with a prefix. To select a range, use num_range(). Compare:

billboard %>% select(starts_with("wk"))
#> # A tibble: 317 x 76
#>     wk1   wk2   wk3   wk4   wk5   wk6   wk7   wk8   wk9  wk10  wk11  wk12  wk13
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    87    82    72    77    87    94    99    NA    NA    NA    NA    NA    NA
#> 2    91    87    92    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
#> 3    81    70    68    67    66    57    54    53    51    51    51    51    47
#> 4    76    76    72    69    67    65    55    59    62    61    61    59    61
#> # i 313 more rows
#> # i 63 more variables: wk14 <dbl>, wk15 <dbl>, wk16 <dbl>, wk17 <dbl>,
#> #   wk18 <dbl>, wk19 <dbl>, wk20 <dbl>, wk21 <dbl>, ...

billboard %>% select(num_range("wk", 10:15))
#> # A tibble: 317 x 6
#>    wk10  wk11  wk12  wk13  wk14  wk15
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1    NA    NA    NA    NA    NA    NA
#> 2    NA    NA    NA    NA    NA    NA
#> 3    51    51    51    47    44    38
#> 4    61    61    59    61    66    72
#> # i 313 more rows

Usage

Arguments

Examples

See also