Introduction to Data Frames and dplyr

In R, a data frame is a two-dimensional data structure, similar to a table in a database. The dplyr package provides a powerful and intuitive grammar for data manipulation.

# Create a data frame
my_dataframe <- data.frame(
  name = c("Charlie", "David", "Emily"),
  age = c(30, 35, 28),
  city = c("New York", "London", "Paris")
)
print(my_dataframe)
##      name age     city
## 1 Charlie  30 New York
## 2   David  35   London
## 3   Emily  28    Paris

Key dplyr verbs

  • select(): select columns
  • filter(): filter rows
  • mutate(): create new columns
  • arrange(): sort rows
  • summarise(): summarise data
  • group_by(): group data

Exercises

We will be using the mtcars dataset for many of these exercises.

  1. Load the mtcars dataset.
# Your code here
data(mtcars)
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
  1. Select the mpg, cyl, and hp columns from the mtcars dataset.
# Your code here
select(mtcars, mpg, cyl, hp)
##                      mpg cyl  hp
## Mazda RX4           21.0   6 110
## Mazda RX4 Wag       21.0   6 110
## Datsun 710          22.8   4  93
## Hornet 4 Drive      21.4   6 110
## Hornet Sportabout   18.7   8 175
## Valiant             18.1   6 105
## Duster 360          14.3   8 245
## Merc 240D           24.4   4  62
## Merc 230            22.8   4  95
## Merc 280            19.2   6 123
## Merc 280C           17.8   6 123
## Merc 450SE          16.4   8 180
## Merc 450SL          17.3   8 180
## Merc 450SLC         15.2   8 180
## Cadillac Fleetwood  10.4   8 205
## Lincoln Continental 10.4   8 215
## Chrysler Imperial   14.7   8 230
## Fiat 128            32.4   4  66
## Honda Civic         30.4   4  52
## Toyota Corolla      33.9   4  65
## Toyota Corona       21.5   4  97
## Dodge Challenger    15.5   8 150
## AMC Javelin         15.2   8 150
## Camaro Z28          13.3   8 245
## Pontiac Firebird    19.2   8 175
## Fiat X1-9           27.3   4  66
## Porsche 914-2       26.0   4  91
## Lotus Europa        30.4   4 113
## Ford Pantera L      15.8   8 264
## Ferrari Dino        19.7   6 175
## Maserati Bora       15.0   8 335
## Volvo 142E          21.4   4 109
  1. Filter the mtcars dataset to only include cars with more than 6 cylinders.
# Your code here
filter(mtcars, cyl>6)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
  1. Create a new column in the mtcars dataset called hp_per_cyl which is the horsepower per cylinder.
# Your code here
mutate(mtcars, hp_per_cyl = hp / cyl)
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
##                     hp_per_cyl
## Mazda RX4             18.33333
## Mazda RX4 Wag         18.33333
## Datsun 710            23.25000
## Hornet 4 Drive        18.33333
## Hornet Sportabout     21.87500
## Valiant               17.50000
## Duster 360            30.62500
## Merc 240D             15.50000
## Merc 230              23.75000
## Merc 280              20.50000
## Merc 280C             20.50000
## Merc 450SE            22.50000
## Merc 450SL            22.50000
## Merc 450SLC           22.50000
## Cadillac Fleetwood    25.62500
## Lincoln Continental   26.87500
## Chrysler Imperial     28.75000
## Fiat 128              16.50000
## Honda Civic           13.00000
## Toyota Corolla        16.25000
## Toyota Corona         24.25000
## Dodge Challenger      18.75000
## AMC Javelin           18.75000
## Camaro Z28            30.62500
## Pontiac Firebird      21.87500
## Fiat X1-9             16.50000
## Porsche 914-2         22.75000
## Lotus Europa          28.25000
## Ford Pantera L        33.00000
## Ferrari Dino          29.16667
## Maserati Bora         41.87500
## Volvo 142E            27.25000
  1. Arrange the mtcars dataset by the mpg column in descending order.
# Your code here
arrange(mtcars, desc(mpg))
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
  1. Calculate the average mpg for each cylinder group.
# Your code here
summarise(
  group_by(mtcars, cyl),
  avg_mpg = mean(mpg)
)
## # A tibble: 3 × 2
##     cyl avg_mpg
##   <dbl>   <dbl>
## 1     4    26.7
## 2     6    19.7
## 3     8    15.1
  1. Create an empty data frame named empty_df.
# Your code here
empty_df <- data.frame()
print(empty_df)
## data frame with 0 columns and 0 rows
  1. Create a data frame named students_df from the following vectors:
    • id = c(1, 2, 3)
    • name = c("Alice", "Bob", "Charlie")
    • score = c(85, 92, 78)
# Your code here
students_df <- data.frame(
  id = c(1, 2, 3),
  name = c("Alice", "Bob", "Charlie"),
  score = c(85, 92, 78)
)
print(students_df)
##   id    name score
## 1  1   Alice    85
## 2  2     Bob    92
## 3  3 Charlie    78
  1. Get the structure and a statistical summary of the students_df data frame.
# Your code here
str(students_df)
## 'data.frame':    3 obs. of  3 variables:
##  $ id   : num  1 2 3
##  $ name : chr  "Alice" "Bob" "Charlie"
##  $ score: num  85 92 78
summary(students_df)
##        id          name               score     
##  Min.   :1.0   Length:3           Min.   :78.0  
##  1st Qu.:1.5   Class :character   1st Qu.:81.5  
##  Median :2.0   Mode  :character   Median :85.0  
##  Mean   :2.0                      Mean   :85.0  
##  3rd Qu.:2.5                      3rd Qu.:88.5  
##  Max.   :3.0                      Max.   :92.0
  1. Extract the name column from students_df using the $ operator.
# Your code here
students_df$name
## [1] "Alice"   "Bob"     "Charlie"
  1. Extract the first two rows of students_df.
# Your code here
students_df[1:2, ]
##   id  name score
## 1  1 Alice    85
## 2  2   Bob    92
  1. Extract the third row and the name and score columns from students_df.
# Your code here
students_df[3, c("name", "score")]
##      name score
## 3 Charlie    78
  1. Add a new column grade to students_df where scores >= 90 are “A”, scores >= 80 are “B”, and others are “C”. (Hint: use mutate with case_when).
# Your code here
students_df <- mutate(
  students_df,
  grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE ~ "C"
  )
)
print(students_df)
##   id    name score grade
## 1  1   Alice    85     B
## 2  2     Bob    92     A
## 3  3 Charlie    78     C
  1. Remove the score column from students_df.
# Your code here
students_df <- select(students_df, -score)
print(students_df)
##   id    name grade
## 1  1   Alice     B
## 2  2     Bob     A
## 3  3 Charlie     C
  1. Rename the name column in students_df to student_name.
# Your code here
students_df <- rename(students_df, student_name = name)
print(students_df)
##   id student_name grade
## 1  1        Alice     B
## 2  2          Bob     A
## 3  3      Charlie     C
  1. Create another data frame courses_df with id = c(1, 2, 3) and course = c("Math", "Science", "History"). Merge students_df and courses_df by id.
# Your code here
courses_df <- data.frame(
  id = c(1, 2, 3),
  course = c("Math", "Science", "History")
)
merged_df <- left_join(students_df, courses_df, by = "id")
print(merged_df)
##   id student_name grade  course
## 1  1        Alice     B    Math
## 2  2          Bob     A Science
## 3  3      Charlie     C History
  1. Create two data frames, df1 and df2, with identical columns. Combine them row-wise.
# Your code here
df1 <- data.frame(A = 1:2, B = c("X", "Y"))
df2 <- data.frame(A = 3:4, B = c("Z", "W"))
combined_rows_df <- bind_rows(df1, df2)
print(df1)
##   A B
## 1 1 X
## 2 2 Y
print(df2)
##   A B
## 1 3 Z
## 2 4 W
print(combined_rows_df)
##   A B
## 1 1 X
## 2 2 Y
## 3 3 Z
## 4 4 W
  1. Create two data frames, df3 and df4, with identical rows. Combine them column-wise.
# Your code here
df3 <- data.frame(id = 1:2, C = c(10, 20))
df4 <- data.frame(id = 1:2, D = c("P", "Q"))
combined_cols_df <- bind_cols(df3, df4)
## New names:
## • `id` -> `id...1`
## • `id` -> `id...3`
print(df3)
##   id  C
## 1  1 10
## 2  2 20
print(df4)
##   id D
## 1  1 P
## 2  2 Q
print(combined_cols_df)
##   id...1  C id...3 D
## 1      1 10      1 P
## 2      2 20      2 Q
  1. Reshape the mtcars dataset from wide to long format, keeping cyl and vs as id variables and gathering mpg, disp, hp into a metric and value column.
# Your code here
mtcars_long <- mtcars %>%
  pivot_longer(
    cols = c(mpg, disp, hp),
    names_to = "metric",
    values_to = "value"
  )
print(head(mtcars_long))
## # A tibble: 6 × 10
##     cyl  drat    wt  qsec    vs    am  gear  carb metric value
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>  <dbl>
## 1     6   3.9  2.62  16.5     0     1     4     4 mpg       21
## 2     6   3.9  2.62  16.5     0     1     4     4 disp     160
## 3     6   3.9  2.62  16.5     0     1     4     4 hp       110
## 4     6   3.9  2.88  17.0     0     1     4     4 mpg       21
## 5     6   3.9  2.88  17.0     0     1     4     4 disp     160
## 6     6   3.9  2.88  17.0     0     1     4     4 hp       110
  1. Calculate the mean of the mpg column in mtcars.
# Your code here
mean(mtcars$mpg)
## [1] 20.09062
  1. Calculate the number of rows and columns in mtcars.
# Your code here
nrow(mtcars)
## [1] 32
ncol(mtcars)
## [1] 11
  1. Calculate the column means of mtcars.
# Your code here
colMeans(mtcars)
##        mpg        cyl       disp         hp       drat         wt       qsec 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
##         vs         am       gear       carb 
##   0.437500   0.406250   3.687500   2.812500
  1. Calculate the row sums of mtcars.
# Your code here
rowSums(mtcars)
##           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
##             328.980             329.795             259.580             426.135 
##   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
##             590.310             385.540             656.920             270.980 
##            Merc 230            Merc 280           Merc 280C          Merc 450SE 
##             299.570             350.460             349.660             510.740 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
##             511.500             509.850             728.560             726.644 
##   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
##             725.695             213.850             195.165             206.955 
##       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
##             273.775             519.650             506.085             646.280 
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
##             631.175             208.215             272.570             273.683 
##      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
##             670.690             379.590             694.710             288.890

Solutions

Click here for the solutions