QUESTION:How do I select several columns from a dataframe using brackets [ ]?

Dataframe is commonly used in data analysis, and sometimes we need to select several columns of a dataframe. This can be done with []

Data

We’ll use the “palmerpenguins” packages (https://allisonhorst.github.io/palmerpenguins/) to address this question. You’ll need to install the package with install.packages(“palmerpenguins”) if you have not done so before, call library("“palmerpenguins”), and load the data with data(penguins)

#install.packages("palmerpenguins")
library(palmerpenguins)
## Warning: package 'palmerpenguins' was built under R version 4.1.2
data(penguins)
penguins
## # A tibble: 344 x 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ... with 334 more rows, and 2 more variables: sex <fct>, year <int>

Selecting one column using []

At this point, “penguins” is already a dataframe, and we can use [] to select one single column

penguins[3:4]
## # A tibble: 344 x 2
##    bill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18  
##  4           NA            NA  
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # ... with 334 more rows
penguins[,3:4]
## # A tibble: 344 x 2
##    bill_length_mm bill_depth_mm
##             <dbl>         <dbl>
##  1           39.1          18.7
##  2           39.5          17.4
##  3           40.3          18  
##  4           NA            NA  
##  5           36.7          19.3
##  6           39.3          20.6
##  7           38.9          17.8
##  8           39.2          19.6
##  9           34.1          18.1
## 10           42            20.2
## # ... with 334 more rows

[3:4] and [,3:4] gives the same output. The first one means “column #3 to #4” and the other is saying “every row in column 3 to 4”

if we are selecting seperate columns, use a vector of column indexes

penguins[c(3,5)]
## # A tibble: 344 x 2
##    bill_length_mm flipper_length_mm
##             <dbl>             <int>
##  1           39.1               181
##  2           39.5               186
##  3           40.3               195
##  4           NA                  NA
##  5           36.7               193
##  6           39.3               190
##  7           38.9               181
##  8           39.2               195
##  9           34.1               193
## 10           42                 190
## # ... with 334 more rows

Additional Reading

For more information on this topic, see https://dzone.com/articles/learn-r-how-extract-rows

Keywords

data.frame
[]
“,”
c()
palmerpenguins