Keep or drop columns using their names and types — select (2024)

Source: R/select.R

select.Rd

Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right) or type (e.g. where(is.numeric) selects all numeric columns).

Overview of selection features

Tidyverse selections implement a dialect of R where operators make it easy to select variables:

  • : for selecting a range of consecutive variables.

  • ! for taking the complement of a set of variables.

  • & and | for selecting the intersection or the union of two sets of variables.

  • c() for combining selections.

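In addition, you can use selection helpers. Some helpers select specific columns:

  • everything(): Matches all variables.

  • last_col(): Select last variable, possibly with an offset.

  • group_cols(): Select all grouping columns.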

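Other helpers select variables by matching patterns in their names:

  • starts_with(): Starts with a prefix.

  • ends_with(): Ends with a suffix.

  • contains(): Contains a literal string.

  • matches(): Matches a regular expression.

  • num_range(): Matches a numerical range like x01, x02, x03.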

Or from variables stored in a character vector:

  • all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

  • any_of(): Same as all_of(), except that no error is thrown for names that don't exist.

Or using a predicate function:

  • where(): Applies a function to all variables and selects those for which the function returns TRUE.
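The difference between all_of(), any_of() and where() can be sketched as follows. This is a minimal illustration that assumes dplyr is installed; the vector `wanted` and the name "not_a_column" are hypothetical and chosen only so that one requested name is missing from iris.

```r
library(dplyr)

# Hypothetical vector of requested columns; "not_a_column" is
# deliberately absent from iris.
wanted <- c("Sepal.Length", "Species", "not_a_column")

# any_of() silently skips names that do not exist.
subset_any <- iris %>% select(any_of(wanted))
names(subset_any)

# all_of() raises an error because one requested name is missing;
# tryCatch() turns that error into a marker string for inspection.
strict_result <- tryCatch(
  iris %>% select(all_of(wanted)),
  error = function(e) "error"
)

# where() keeps the columns for which the predicate returns TRUE.
numeric_cols <- iris %>% select(where(is.numeric))
names(numeric_cols)
```

Use all_of() when a missing column should stop the pipeline, and any_of() when the vector may legitimately list columns that are not always present.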

Usage

select(.data, ...)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<tidy-select> One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.
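As a small sketch of what "names as positions" means, the range name:mass picks the same columns as the integer range 1:3 in starwars, where name, height and mass are the first three columns. The object names `by_name` and `by_position` are illustrative only, and dplyr is assumed to be installed.

```r
library(dplyr)

# Select a range of columns by name ...
by_name <- starwars %>% select(name:mass)

# ... or the same range by integer position.
by_position <- starwars %>% select(1:3)

# Both calls pick the same columns in the same order.
identical(names(by_name), names(by_position))
```

Selecting by name is usually the more robust choice, since positions shift when columns are added or reordered.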

Value

An object of the same type as .data. The output has the following properties:

  • Rows are not affected.

  • Output columns are a subset of input columns, potentially with a different order. Columns will be renamed if new_name = old_name form is used.

  • Data frame attributes are preserved.

  • Groups are maintained; grouping variables cannot be dropped and are kept even when they are not selected.
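Two of these properties, renaming and the retention of grouping variables, can be illustrated with a short sketch. It assumes dplyr is installed; the new column name `character` and the object names are chosen for illustration only.

```r
library(dplyr)

# Rename while selecting, using the new_name = old_name form.
renamed <- starwars %>% select(character = name, height)
names(renamed)

# Grouping variables are kept even when they are not selected;
# dplyr emits a message and adds them back to the result.
grouped <- starwars %>%
  group_by(species) %>%
  select(height)
names(grouped)
group_vars(grouped)
```

To remove a grouping column, call ungroup() before select().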

Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame).

Examples

Here we show the usage for the basic selection operators. See the specific help pages to learn about helpers like starts_with().

The selection language can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse:

library(tidyverse)

# For better printing
iris <- as_tibble(iris)

Select variables by name:

starwars %>% select(height)
#> # A tibble: 87 x 1
#>   height
#>    <int>
#> 1    172
#> 2    167
#> 3     96
#> 4    202
#> # i 83 more rows

iris %>% pivot_longer(Sepal.Length)
#> # A tibble: 150 x 6
#>   Sepal.Width Petal.Length Petal.Width Species name         value
#>         <dbl>        <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5          1.4         0.2 setosa  Sepal.Length   5.1
#> 2         3            1.4         0.2 setosa  Sepal.Length   4.9
#> 3         3.2          1.3         0.2 setosa  Sepal.Length   4.7
#> 4         3.1          1.5         0.2 setosa  Sepal.Length   4.6
#> # i 146 more rows

Select multiple variables by separating them with commas. Note how the order of columns is determined by the order of inputs:

starwars %>% select(homeworld, height, mass)
#> # A tibble: 87 x 3
#>   homeworld height  mass
#>   <chr>      <int> <dbl>
#> 1 Tatooine     172    77
#> 2 Tatooine     167    75
#> 3 Naboo         96    32
#> 4 Tatooine     202   136
#> # i 83 more rows

Functions like tidyr::pivot_longer() take their selection as a single argument rather than through dots (...). In this case use c() to select multiple variables:

iris %>% pivot_longer(c(Sepal.Length, Petal.Length))
#> # A tibble: 300 x 5
#>   Sepal.Width Petal.Width Species name         value
#>         <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5         0.2 setosa  Sepal.Length   5.1
#> 2         3.5         0.2 setosa  Petal.Length   1.4
#> 3         3           0.2 setosa  Sepal.Length   4.9
#> 4         3           0.2 setosa  Petal.Length   1.4
#> # i 296 more rows

Operators:

The : operator selects a range of consecutive variables:

starwars %>% select(name:mass)
#> # A tibble: 87 x 3
#>   name           height  mass
#>   <chr>           <int> <dbl>
#> 1 Luke Skywalker    172    77
#> 2 C-3PO             167    75
#> 3 R2-D2              96    32
#> 4 Darth Vader       202   136
#> # i 83 more rows

The ! operator negates a selection:

starwars %>% select(!(name:mass))
#> # A tibble: 87 x 11
#>   hair_color skin_color  eye_color birth_year sex   gender    homeworld species
#>   <chr>      <chr>       <chr>          <dbl> <chr> <chr>     <chr>     <chr>
#> 1 blond      fair        blue            19   male  masculine Tatooine  Human
#> 2 <NA>       gold        yellow         112   none  masculine Tatooine  Droid
#> 3 <NA>       white, blue red             33   none  masculine Naboo     Droid
#> 4 none       white       yellow          41.9 male  masculine Tatooine  Human
#> # i 83 more rows
#> # i 3 more variables: films <list>, vehicles <list>, starships <list>

iris %>% select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#>   Sepal.Width Petal.Width Species
#>         <dbl>       <dbl> <fct>
#> 1         3.5         0.2 setosa
#> 2         3           0.2 setosa
#> 3         3.2         0.2 setosa
#> 4         3.1         0.2 setosa
#> # i 146 more rows

iris %>% select(!ends_with("Width"))
#> # A tibble: 150 x 3
#>   Sepal.Length Petal.Length Species
#>          <dbl>        <dbl> <fct>
#> 1          5.1          1.4 setosa
#> 2          4.9          1.4 setosa
#> 3          4.7          1.3 setosa
#> 4          4.6          1.5 setosa
#> # i 146 more rows

& and | take the intersection or the union of two selections:

iris %>% select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Width
#>         <dbl>
#> 1         0.2
#> 2         0.2
#> 3         0.2
#> 4         0.2
#> # i 146 more rows

iris %>% select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#>   Petal.Length Petal.Width Sepal.Width
#>          <dbl>       <dbl>       <dbl>
#> 1          1.4         0.2         3.5
#> 2          1.4         0.2         3
#> 3          1.3         0.2         3.2
#> 4          1.5         0.2         3.1
#> # i 146 more rows

To take the difference between two selections, combine the & and ! operators:

iris %>% select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Length
#>          <dbl>
#> 1          1.4
#> 2          1.4
#> 3          1.3
#> 4          1.5
#> # i 146 more rows

See also

Other single table verbs: arrange(), filter(), mutate(), reframe(), rename(), slice(), summarise()
