Introduction

In this exercise document, we will explore the airquality and mtcars datasets, using the dplyr package. Our goal is to demonstrate the power of dplyr functions for data manipulation by applying various operations such as filtering, selecting, grouping, summarizing, and arranging on both datasets. We will dive into the datasets, extract insights, and make meaningful observations using these powerful functions.

Loading the Datasets

Let’s load both the airquality and mtcars datasets into R to begin.

# Load the Needed Libraries
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Load the 'airquality' dataset
data(airquality)

# Load the 'mtcars' dataset
data(mtcars)

# Check the data structure
str(mtcars)

## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

str(airquality)

## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

The Pipe Operator %>%

Before going the dplyr practice, we need to explain what the pipe operator is.

The pipe operator, represented by %>%, is a powerful tool introduced by the dplyr package in R. It allows you to chain multiple operations together in a more readable and efficient manner. With the pipe operator, you can pass the output of one function as the first argument to another function, reducing the need for intermediate assignments and making the code easier to follow.

Syntax:

The syntax of the pipe operator is as follows:

output <- input %>% function1() %>% function2() %>% ... %>% functionN()

Explanation:

input: This is the initial data or object that serves as the input to the first function in the chain.
%>%: The pipe operator takes the output from the left-hand side (LHS) and passes it as the first argument to the function on the right-hand side (RHS).
function1(): This is the first dplyr function that will be applied to the input data.
function2(): This is the second dplyr function, which takes the output of function1() as its input.
...: You can continue chaining as many dplyr functions as needed, and each function will operate on the output of the previous one.
functionN(): This is the last dplyr function in the chain, and its output will be stored in the output variable.

Now we can start with the Filter and Select functions in dplyr.

Filtering and Selecting Columns

Task: Use the filter function to extract data for the month of July from the ‘airquality’ dataset and select specific columns for analysis.

# Filter data for the month of July and select Ozone, Temp, Wind only
july_air_quality <- airquality %>%
  filter(Month == 7) %>%
  select(Ozone, Temp, Wind)

# Show the first 6 rows of the july_air_quality
head(july_air_quality)

##   Ozone Temp Wind
## 1   135   84  4.1
## 2    49   85  9.2
## 3    32   81  9.2
## 4    NA   84 10.9
## 5    64   83  4.6
## 6    40   83 10.9

Task: Filter the records with both month = 7 and temperature over 80.

# Do the same as previous, but for the temperatures over 80
july_air_quality <- airquality %>%
  filter(Month == 7 & Temp >80) %>%
  select(Ozone, Temp, Wind)

# Show the first 6 rows of the july_air_quality
head(july_air_quality)

##   Ozone Temp Wind
## 1   135   84  4.1
## 2    49   85  9.2
## 3    32   81  9.2
## 4    NA   84 10.9
## 5    64   83  4.6
## 6    40   83 10.9

Task: Oh, my bad. I was gonna say month = 7 OR temperature over 80. Sorry. Can you revise it?

# Do the same as previous, but with the difference of 'OR'
july_air_quality <- airquality %>%
  filter(Month == 7 | Temp >80) %>%
  select(Ozone, Temp, Wind)

# Show the first 6 rows of the july_air_quality
head(july_air_quality)

##   Ozone Temp Wind
## 1    45   81 14.9
## 2    NA   84  9.2
## 3    NA   85  8.6
## 4    29   82  9.7
## 5    NA   87  6.9
## 6    71   90 13.8

Perfect! Let’s do more filtering! But this time, on MTCARS!

Task: I’m more interested in Merc 240 and Mazda RX4 cars. Can you filter them in mtcars dataset? I believe they have Merc in their name. Also, you need to convert row names as column.

# 1 Convert row names to a new column 'Car_Name'
mtcars_with_car_name <- mtcars %>%
  mutate(Car_Name = rownames(.))

# Show the first 6 rows
head(mtcars_with_car_name)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
##                            Car_Name
## Mazda RX4                 Mazda RX4
## Mazda RX4 Wag         Mazda RX4 Wag
## Datsun 710               Datsun 710
## Hornet 4 Drive       Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant                     Valiant

# 2 Filter the cars with specific countries in the name
selected_brands <- c("Merc 240D", "Mazda RX4")
Merc_and_RX4 <- mtcars_with_car_name %>%
  filter(Car_Name %in% selected_brands)
print(Merc_and_RX4)

##            mpg cyl  disp  hp drat   wt  qsec vs am gear carb  Car_Name
## Mazda RX4 21.0   6 160.0 110 3.90 2.62 16.46  0  1    4    4 Mazda RX4
## Merc 240D 24.4   4 146.7  62 3.69 3.19 20.00  1  0    4    2 Merc 240D

top_n As A Filter

Task: I want to see the most powerful cars in the data! Filter only the top 3 cars by HP!

# Filter the most powerful ones as the_muscles
the_muscles <- mtcars %>%
  select(hp) %>% 
  top_n(3, hp) 

# Show "the_muscles"
the_muscles

##                 hp
## Duster 360     245
## Camaro Z28     245
## Ford Pantera L 264
## Maserati Bora  335

Using Arrange

Task: Also arrange the top 3 cars by hp!

# Arrange the_muscles ascending
the_muscles %>% arrange(hp)

##                 hp
## Duster 360     245
## Camaro Z28     245
## Ford Pantera L 264
## Maserati Bora  335

Task: No this is what I want! I want them to be arranged in DESCending order!

# Arrange the_muscles descending
the_muscles %>% arrange(desc(hp))

##                 hp
## Maserati Bora  335
## Ford Pantera L 264
## Duster 360     245
## Camaro Z28     245

Wow that was great! You are truly a DPLYR wizard! 🪄

But we are not done yet! I have still tasks for you…

Using Group By and Summarize

Task : Group the cars by the number of cylinders and calculate the average MPG for each group

# Group the cars by the number of cyls and calculate the avg MPG for each
grouped_avg_mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))
# Show grouped_avg_mpg_by_cyl
grouped_avg_mpg_by_cyl

## # A tibble: 3 × 2
##     cyl avg_mpg
##   <dbl>   <dbl>
## 1     4    26.7
## 2     6    19.7
## 3     8    15.1

Task: I know it doesn’t make a lot of sense but, I really wonder the total and average hp of small cars engine cars (4cyl) in our dataset. Can you calculate?

# Calculate and record as mtcars_hps
mtcars_hps <- mtcars %>% 
  filter(cyl == 4) %>% 
  summarise(total_hp=sum(hp), avg_hp = mean(hp))
# Show the result of mtcars_hps
mtcars_hps

##   total_hp   avg_hp
## 1      909 82.63636

Using Mutate

Did you realize that summarize created new columns but in a grouped way. Wha if I just need a new column based on an existing column?

Task: MPG is good for U.S but not too clear for Europeans. Let’s add a column for them and convert MPG to liters per 100kms.

(Use this formula = 235.215 / mpg)

# 5 Convert MPG to L/100km using mutate, record as mtcars_converted
mtcars_converted <- mtcars %>%
  mutate(L_per_100km = 235.215 / mpg)
# Show the mtcars_converted
head(mtcars_converted)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
##                   L_per_100km
## Mazda RX4            11.20071
## Mazda RX4 Wag        11.20071
## Datsun 710           10.31645
## Hornet 4 Drive       10.99136
## Hornet Sportabout    12.57834
## Valiant              12.99530

Nice! Much better for internationals! 👍😎

Using Rename

Yet, L_per_100km is still not so good, not seem so clear to me…

Task: Rename it as Liters_per_100km.

# Rename with the variable name mtcars_renamed
mtcars_renamed <- mtcars_converted %>%
  rename(Liters_per_100km = L_per_100km)
# Show the result
mtcars_renamed

##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
##                     Liters_per_100km
## Mazda RX4                  11.200714
## Mazda RX4 Wag              11.200714
## Datsun 710                 10.316447
## Hornet 4 Drive             10.991355
## Hornet Sportabout          12.578342
## Valiant                    12.995304
## Duster 360                 16.448601
## Merc 240D                   9.639959
## Merc 230                   10.316447
## Merc 280                   12.250781
## Merc 280C                  13.214326
## Merc 450SE                 14.342378
## Merc 450SL                 13.596243
## Merc 450SLC                15.474671
## Cadillac Fleetwood         22.616827
## Lincoln Continental        22.616827
## Chrysler Imperial          16.001020
## Fiat 128                    7.259722
## Honda Civic                 7.737336
## Toyota Corolla              6.938496
## Toyota Corona              10.940233
## Dodge Challenger           15.175161
## AMC Javelin                15.474671
## Camaro Z28                 17.685338
## Pontiac Firebird           12.250781
## Fiat X1-9                   8.615934
## Porsche 914-2               9.046731
## Lotus Europa                7.737336
## Ford Pantera L             14.887025
## Ferrari Dino               11.939848
## Maserati Bora              15.681000
## Volvo 142E                 10.991355

Using Count and Add Count

Task: We want to see the counts of observations in each month, can you count them without making a grouping?

# Add the number of observations for each Month as airquality_with_count
airquality_with_count <- airquality %>%
  add_count(Month)
# Show the first 6 rows of airquality_with_count
airquality_with_count

##     Ozone Solar.R Wind Temp Month Day  n
## 1      41     190  7.4   67     5   1 31
## 2      36     118  8.0   72     5   2 31
## 3      12     149 12.6   74     5   3 31
## 4      18     313 11.5   62     5   4 31
## 5      NA      NA 14.3   56     5   5 31
## 6      28      NA 14.9   66     5   6 31
## 7      23     299  8.6   65     5   7 31
## 8      19      99 13.8   59     5   8 31
## 9       8      19 20.1   61     5   9 31
## 10     NA     194  8.6   69     5  10 31
## 11      7      NA  6.9   74     5  11 31
## 12     16     256  9.7   69     5  12 31
## 13     11     290  9.2   66     5  13 31
## 14     14     274 10.9   68     5  14 31
## 15     18      65 13.2   58     5  15 31
## 16     14     334 11.5   64     5  16 31
## 17     34     307 12.0   66     5  17 31
## 18      6      78 18.4   57     5  18 31
## 19     30     322 11.5   68     5  19 31
## 20     11      44  9.7   62     5  20 31
## 21      1       8  9.7   59     5  21 31
## 22     11     320 16.6   73     5  22 31
## 23      4      25  9.7   61     5  23 31
## 24     32      92 12.0   61     5  24 31
## 25     NA      66 16.6   57     5  25 31
## 26     NA     266 14.9   58     5  26 31
## 27     NA      NA  8.0   57     5  27 31
## 28     23      13 12.0   67     5  28 31
## 29     45     252 14.9   81     5  29 31
## 30    115     223  5.7   79     5  30 31
## 31     37     279  7.4   76     5  31 31
## 32     NA     286  8.6   78     6   1 30
## 33     NA     287  9.7   74     6   2 30
## 34     NA     242 16.1   67     6   3 30
## 35     NA     186  9.2   84     6   4 30
## 36     NA     220  8.6   85     6   5 30
## 37     NA     264 14.3   79     6   6 30
## 38     29     127  9.7   82     6   7 30
## 39     NA     273  6.9   87     6   8 30
## 40     71     291 13.8   90     6   9 30
## 41     39     323 11.5   87     6  10 30
## 42     NA     259 10.9   93     6  11 30
## 43     NA     250  9.2   92     6  12 30
## 44     23     148  8.0   82     6  13 30
## 45     NA     332 13.8   80     6  14 30
## 46     NA     322 11.5   79     6  15 30
## 47     21     191 14.9   77     6  16 30
## 48     37     284 20.7   72     6  17 30
## 49     20      37  9.2   65     6  18 30
## 50     12     120 11.5   73     6  19 30
## 51     13     137 10.3   76     6  20 30
## 52     NA     150  6.3   77     6  21 30
## 53     NA      59  1.7   76     6  22 30
## 54     NA      91  4.6   76     6  23 30
## 55     NA     250  6.3   76     6  24 30
## 56     NA     135  8.0   75     6  25 30
## 57     NA     127  8.0   78     6  26 30
## 58     NA      47 10.3   73     6  27 30
## 59     NA      98 11.5   80     6  28 30
## 60     NA      31 14.9   77     6  29 30
## 61     NA     138  8.0   83     6  30 30
## 62    135     269  4.1   84     7   1 31
## 63     49     248  9.2   85     7   2 31
## 64     32     236  9.2   81     7   3 31
## 65     NA     101 10.9   84     7   4 31
## 66     64     175  4.6   83     7   5 31
## 67     40     314 10.9   83     7   6 31
## 68     77     276  5.1   88     7   7 31
## 69     97     267  6.3   92     7   8 31
## 70     97     272  5.7   92     7   9 31
## 71     85     175  7.4   89     7  10 31
## 72     NA     139  8.6   82     7  11 31
## 73     10     264 14.3   73     7  12 31
## 74     27     175 14.9   81     7  13 31
## 75     NA     291 14.9   91     7  14 31
## 76      7      48 14.3   80     7  15 31
## 77     48     260  6.9   81     7  16 31
## 78     35     274 10.3   82     7  17 31
## 79     61     285  6.3   84     7  18 31
## 80     79     187  5.1   87     7  19 31
## 81     63     220 11.5   85     7  20 31
## 82     16       7  6.9   74     7  21 31
## 83     NA     258  9.7   81     7  22 31
## 84     NA     295 11.5   82     7  23 31
## 85     80     294  8.6   86     7  24 31
## 86    108     223  8.0   85     7  25 31
## 87     20      81  8.6   82     7  26 31
## 88     52      82 12.0   86     7  27 31
## 89     82     213  7.4   88     7  28 31
## 90     50     275  7.4   86     7  29 31
## 91     64     253  7.4   83     7  30 31
## 92     59     254  9.2   81     7  31 31
## 93     39      83  6.9   81     8   1 31
## 94      9      24 13.8   81     8   2 31
## 95     16      77  7.4   82     8   3 31
## 96     78      NA  6.9   86     8   4 31
## 97     35      NA  7.4   85     8   5 31
## 98     66      NA  4.6   87     8   6 31
## 99    122     255  4.0   89     8   7 31
## 100    89     229 10.3   90     8   8 31
## 101   110     207  8.0   90     8   9 31
## 102    NA     222  8.6   92     8  10 31
## 103    NA     137 11.5   86     8  11 31
## 104    44     192 11.5   86     8  12 31
## 105    28     273 11.5   82     8  13 31
## 106    65     157  9.7   80     8  14 31
## 107    NA      64 11.5   79     8  15 31
## 108    22      71 10.3   77     8  16 31
## 109    59      51  6.3   79     8  17 31
## 110    23     115  7.4   76     8  18 31
## 111    31     244 10.9   78     8  19 31
## 112    44     190 10.3   78     8  20 31
## 113    21     259 15.5   77     8  21 31
## 114     9      36 14.3   72     8  22 31
## 115    NA     255 12.6   75     8  23 31
## 116    45     212  9.7   79     8  24 31
## 117   168     238  3.4   81     8  25 31
## 118    73     215  8.0   86     8  26 31
## 119    NA     153  5.7   88     8  27 31
## 120    76     203  9.7   97     8  28 31
## 121   118     225  2.3   94     8  29 31
## 122    84     237  6.3   96     8  30 31
## 123    85     188  6.3   94     8  31 31
## 124    96     167  6.9   91     9   1 30
## 125    78     197  5.1   92     9   2 30
## 126    73     183  2.8   93     9   3 30
## 127    91     189  4.6   93     9   4 30
## 128    47      95  7.4   87     9   5 30
## 129    32      92 15.5   84     9   6 30
## 130    20     252 10.9   80     9   7 30
## 131    23     220 10.3   78     9   8 30
## 132    21     230 10.9   75     9   9 30
## 133    24     259  9.7   73     9  10 30
## 134    44     236 14.9   81     9  11 30
## 135    21     259 15.5   76     9  12 30
## 136    28     238  6.3   77     9  13 30
## 137     9      24 10.9   71     9  14 30
## 138    13     112 11.5   71     9  15 30
## 139    46     237  6.9   78     9  16 30
## 140    18     224 13.8   67     9  17 30
## 141    13      27 10.3   76     9  18 30
## 142    24     238 10.3   68     9  19 30
## 143    16     201  8.0   82     9  20 30
## 144    13     238 12.6   64     9  21 30
## 145    23      14  9.2   71     9  22 30
## 146    36     139 10.3   81     9  23 30
## 147     7      49 10.3   69     9  24 30
## 148    14      20 16.6   63     9  25 30
## 149    30     193  6.9   70     9  26 30
## 150    NA     145 13.2   77     9  27 30
## 151    14     191 14.3   75     9  28 30
## 152    18     131  8.0   76     9  29 30
## 153    20     223 11.5   68     9  30 30

It was nice to add this. But I need also a pivot table with only observation count per month.

Task: Create a summary table which includes counts of observations and monthly average temp.

# Createa a pivot table as monthly_observation_count
monthly_observation_count <- airquality %>%
  group_by(Month) %>%
  count()
# Show the result of monthly_observation_count
monthly_observation_count

## # A tibble: 5 × 2
## # Groups:   Month [5]
##   Month     n
##   <int> <int>
## 1     5    31
## 2     6    30
## 3     7    31
## 4     8    31
## 5     9    30

Task: Can you put the average temp also to the table?

# Calculate and record as monthly_observation_count
monthly_observation_count <- airquality %>%
  group_by(Month) %>%
  summarise(Nr_of_Obs = n(), 
            Avg_Temp = mean(Temp))
# Show the result of monthly_observation_count
monthly_observation_count

## # A tibble: 5 × 3
##   Month Nr_of_Obs Avg_Temp
##   <int>     <int>    <dbl>
## 1     5        31     65.5
## 2     6        30     79.1
## 3     7        31     83.9
## 4     8        31     84.0
## 5     9        30     76.9

Some More Exercise!

Task: Group the July air quality data based on temperature category, calculate the average ozone levels and maximum wind speed for each category, and arrange the data based on average ozone levels in descending order.

# Group by temperature category, summarize average ozone levels and max wind speed, and arrange by average ozone levels as grouped_summarized_arranged_data
grouped_summarized_arranged_data <- july_air_quality %>%
  mutate(Temp_Category = ifelse(Temp >= 80, "Hot", "Moderate")) %>%
  group_by(Temp_Category) %>%
  summarise(avg_ozone = mean(Ozone, na.rm = TRUE), 
            max_wind = max(Wind, na.rm = TRUE)) %>%
  arrange(desc(avg_ozone))
# Show the result of grouped_summarized_arranged_data
grouped_summarized_arranged_data

## # A tibble: 2 × 3
##   Temp_Category avg_ozone max_wind
##   <chr>             <dbl>    <dbl>
## 1 Hot                63.1     15.5
## 2 Moderate           13       14.3

Task: Find the average hp by cylinder for the cars above mpg over 25 and arrange them in descending order.

# Filter cars with high MPG, group by number of cylinders, and calculate average horsepower as avg_hp_by_cyl
avg_hp_by_cyl <- mtcars %>%
  filter(mpg >= 25) %>%
  group_by(cyl) %>%
  summarise(avg_hp = mean(hp)) %>%
  arrange(desc(avg_hp))
# Show avg_hp_by_cyl
avg_hp_by_cyl

## # A tibble: 1 × 2
##     cyl avg_hp
##   <dbl>  <dbl>
## 1     4   75.5

Wrap Up and Conclusion

In this practice file, we practiced the fundamental dplyr functions which are quite useful for data summaries, filtering and transforming the data. Yet, these are not the only functions you can use. You can check this cheat sheet as a handy memory jogger and to see and practice the other functions. Also you should check the dplyr documentation in detail for more…

Why don’t you try those functions in another embedded R dataset called iris ?

Good luck!

Project Dplyr - Basic Dplyr Functions and Practices

Burak Demirtas

2023-06-05