In this exercise document, we will explore the airquality and mtcars datasets using the dplyr package. Our goal is to demonstrate how dplyr functions handle data manipulation by applying operations such as filtering, selecting, grouping, summarizing, and arranging to both datasets. We will dive into the datasets, extract insights, and make meaningful observations along the way.
Let’s load both the airquality and mtcars datasets into R to begin.
# Load the Needed Libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Load the 'airquality' dataset
data(airquality)
# Load the 'mtcars' dataset
data(mtcars)
# Check the data structure
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
str(airquality)
## 'data.frame': 153 obs. of 6 variables:
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
Before starting the dplyr practice, we need to explain what the pipe operator is.
The pipe operator, written %>%, comes from the magrittr package and is re-exported by dplyr, so it is available as soon as dplyr is loaded. It lets you chain multiple operations together in a readable way. The pipe passes the output of one function as the first argument to the next function, which reduces the need for intermediate assignments and makes the code easier to follow.
The syntax of the pipe operator is as follows:
output <- input %>% function1() %>% function2() %>% ... %>% functionN()
Explanation:
input: The initial data or object that serves as the input to the first function in the chain.
%>%: The pipe operator takes the output from the left-hand side (LHS) and passes it as the first argument to the function on the right-hand side (RHS).
function1(): The first dplyr function that will be applied to the input data.
function2(): The second dplyr function, which takes the output of function1() as its input.
...: You can continue chaining as many dplyr functions as needed, and each function will operate on the output of the previous one.
functionN(): The last dplyr function in the chain; its output will be stored in the output variable.
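To make the chaining idea concrete, here is a small sketch that writes the same operation twice: once as nested function calls and once with the pipe. The variable names nested and piped are only illustrative.

```r
library(dplyr)

# Nested calls: must be read from the inside out
nested <- head(select(filter(airquality, Month == 5), Ozone, Temp), 3)

# Piped calls: read from top to bottom, one step per line
piped <- airquality %>%
  filter(Month == 5) %>%
  select(Ozone, Temp) %>%
  head(3)

# Both versions should produce the same data frame
identical(nested, piped)
```

The piped version does exactly what the nested version does; only the reading order changes.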
Now we can start with the Filter and Select functions in dplyr.
Task: Use the filter function to extract data for the month of July from the ‘airquality’ dataset and select specific columns for analysis.
# Filter data for the month of July and select Ozone, Temp, Wind only
july_air_quality <- airquality %>%
filter(Month == 7) %>%
select(Ozone, Temp, Wind)
# Show the first 6 rows of the july_air_quality
head(july_air_quality)
## Ozone Temp Wind
## 1 135 84 4.1
## 2 49 85 9.2
## 3 32 81 9.2
## 4 NA 84 10.9
## 5 64 83 4.6
## 6 40 83 10.9
Task: Filter the records with both month = 7 and temperature over 80.
# Do the same as previous, but for the temperatures over 80
july_air_quality <- airquality %>%
filter(Month == 7 & Temp >80) %>%
select(Ozone, Temp, Wind)
# Show the first 6 rows of the july_air_quality
head(july_air_quality)
## Ozone Temp Wind
## 1 135 84 4.1
## 2 49 85 9.2
## 3 32 81 9.2
## 4 NA 84 10.9
## 5 64 83 4.6
## 6 40 83 10.9
Task: Oh, my bad. I was gonna say month = 7 OR temperature over 80. Sorry. Can you revise it?
# Do the same as previous, but with the difference of 'OR'
july_air_quality <- airquality %>%
filter(Month == 7 | Temp >80) %>%
select(Ozone, Temp, Wind)
# Show the first 6 rows of the july_air_quality
head(july_air_quality)
## Ozone Temp Wind
## 1 45 81 14.9
## 2 NA 84 9.2
## 3 NA 85 8.6
## 4 29 82 9.7
## 5 NA 87 6.9
## 6 71 90 13.8
Perfect! Let’s do more filtering! But this time, on MTCARS!
Task: I’m more interested in the Merc 240D and Mazda RX4 cars. Can you filter them in the mtcars dataset? I believe one of them has Merc in its name. Also, the car names are stored as row names, so you need to convert the row names to a column first.
# 1 Convert row names to a new column 'Car_Name'
mtcars_with_car_name <- mtcars %>%
mutate(Car_Name = rownames(.))
# Show the first 6 rows
head(mtcars_with_car_name)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## Car_Name
## Mazda RX4 Mazda RX4
## Mazda RX4 Wag Mazda RX4 Wag
## Datsun 710 Datsun 710
## Hornet 4 Drive Hornet 4 Drive
## Hornet Sportabout Hornet Sportabout
## Valiant Valiant
# 2 Filter the cars by specific model names
selected_brands <- c("Merc 240D", "Mazda RX4")
Merc_and_RX4 <- mtcars_with_car_name %>%
filter(Car_Name %in% selected_brands)
print(Merc_and_RX4)
## mpg cyl disp hp drat wt qsec vs am gear carb Car_Name
## Mazda RX4 21.0 6 160.0 110 3.90 2.62 16.46 0 1 4 4 Mazda RX4
## Merc 240D 24.4 4 146.7 62 3.69 3.19 20.00 1 0 4 2 Merc 240D
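If you only remember part of a name, such as "Merc", an exact match with %in% will not help. A minimal sketch of a pattern-based alternative, assuming base R's grepl() inside filter() (the variable name mercs is hypothetical):

```r
library(dplyr)

# Keep every car whose name starts with "Merc"
mercs <- mtcars %>%
  mutate(Car_Name = rownames(mtcars)) %>%
  filter(grepl("^Merc", Car_Name))

# Show the matching names
mercs$Car_Name
```

This returns all Merc models rather than only the Merc 240D, so it is a way to discover the exact names before filtering with %in%.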
Task: I want to see the most powerful cars in the data! Filter only the top 3 cars by HP!
# Filter the most powerful ones as the_muscles
# (top_n() keeps ties, so the tie at 245 hp returns four rows instead of three)
the_muscles <- mtcars %>%
select(hp) %>%
top_n(3, hp)
# Show "the_muscles"
the_muscles
## hp
## Duster 360 245
## Camaro Z28 245
## Ford Pantera L 264
## Maserati Bora 335
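The result above has four rows because two cars tie at 245 hp. If exactly three rows are required, one option is slice_max(), available in dplyr 1.0.0 and later, with with_ties = FALSE. This is a sketch rather than the only way to do it; the names top3_hp and Car_Name are illustrative, and which of the tied cars is kept depends on row order.

```r
library(dplyr)

# Keep exactly three rows, breaking the tie at 245 hp arbitrarily
top3_hp <- mtcars %>%
  mutate(Car_Name = rownames(mtcars)) %>%  # keep the car names as a column
  slice_max(hp, n = 3, with_ties = FALSE) %>%
  select(Car_Name, hp)

top3_hp
```

slice_max() also returns the rows already sorted from highest to lowest hp.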
Task: Also arrange the top 3 cars by hp!
# Arrange the_muscles ascending
the_muscles %>% arrange(hp)
## hp
## Duster 360 245
## Camaro Z28 245
## Ford Pantera L 264
## Maserati Bora 335
Task: No, this is not what I want! I want them to be arranged in DESCending order!
# Arrange the_muscles descending
the_muscles %>% arrange(desc(hp))
## hp
## Maserati Bora 335
## Ford Pantera L 264
## Duster 360 245
## Camaro Z28 245
Wow that was great! You are truly a DPLYR wizard! 🪄
But we are not done yet! I still have tasks for you…
Task: Group the cars by the number of cylinders and calculate the average MPG for each group.
# Group the cars by the number of cyls and calculate the avg MPG for each
grouped_avg_mpg_by_cyl <- mtcars %>%
group_by(cyl) %>%
summarize(avg_mpg = mean(mpg))
# Show grouped_avg_mpg_by_cyl
grouped_avg_mpg_by_cyl
## # A tibble: 3 × 2
## cyl avg_mpg
## <dbl> <dbl>
## 1 4 26.7
## 2 6 19.7
## 3 8 15.1
Task: I know it doesn’t make a lot of sense, but I really wonder about the total and average hp of the small-engine cars (4 cyl) in our dataset. Can you calculate them?
# Calculate and record as mtcars_hps
mtcars_hps <- mtcars %>%
filter(cyl == 4) %>%
summarise(total_hp=sum(hp), avg_hp = mean(hp))
# Show the result of mtcars_hps
mtcars_hps
## total_hp avg_hp
## 1 909 82.63636
Did you notice that summarize created new columns, but collapsed the rows into summary values? What if I just need a new column based on an existing column?
Task: MPG works well for the U.S. but is not very clear for Europeans. Let’s add a column for them and convert MPG to liters per 100 km.
(Use this formula: 235.215 / mpg)
# Convert MPG to L/100km using mutate, record as mtcars_converted
mtcars_converted <- mtcars %>%
mutate(L_per_100km = 235.215 / mpg)
# Show the mtcars_converted
head(mtcars_converted)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## L_per_100km
## Mazda RX4 11.20071
## Mazda RX4 Wag 11.20071
## Datsun 710 10.31645
## Hornet 4 Drive 10.99136
## Hornet Sportabout 12.57834
## Valiant 12.99530
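As a quick check on the formula, the constant 235.215 can be rebuilt from the unit definitions (1 US gallon = 3.785411784 liters, 1 mile = 1.609344 km). The helper name mpg_to_l100km below is hypothetical and only wraps the formula given in the task.

```r
# Liters per 100 km for a car that does exactly 1 mile per US gallon
conversion_constant <- 100 * 3.785411784 / 1.609344

# Hypothetical helper wrapping the task's formula
mpg_to_l100km <- function(mpg) 235.215 / mpg

# Mazda RX4 has 21 mpg; this should match the first row of the output above
mpg_to_l100km(21)

# Applying the same formula to the result converts back to mpg
235.215 / mpg_to_l100km(21)
```

Because the relationship is a reciprocal, higher MPG means lower liters per 100 km, and the same formula converts in both directions.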
Nice! Much better for internationals! 👍😎
Yet, the name L_per_100km is still not very clear to me…
Task: Rename it as Liters_per_100km.
# Rename with the variable name mtcars_renamed
mtcars_renamed <- mtcars_converted %>%
rename(Liters_per_100km = L_per_100km)
# Show the result
mtcars_renamed
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## Liters_per_100km
## Mazda RX4 11.200714
## Mazda RX4 Wag 11.200714
## Datsun 710 10.316447
## Hornet 4 Drive 10.991355
## Hornet Sportabout 12.578342
## Valiant 12.995304
## Duster 360 16.448601
## Merc 240D 9.639959
## Merc 230 10.316447
## Merc 280 12.250781
## Merc 280C 13.214326
## Merc 450SE 14.342378
## Merc 450SL 13.596243
## Merc 450SLC 15.474671
## Cadillac Fleetwood 22.616827
## Lincoln Continental 22.616827
## Chrysler Imperial 16.001020
## Fiat 128 7.259722
## Honda Civic 7.737336
## Toyota Corolla 6.938496
## Toyota Corona 10.940233
## Dodge Challenger 15.175161
## AMC Javelin 15.474671
## Camaro Z28 17.685338
## Pontiac Firebird 12.250781
## Fiat X1-9 8.615934
## Porsche 914-2 9.046731
## Lotus Europa 7.737336
## Ford Pantera L 14.887025
## Ferrari Dino 11.939848
## Maserati Bora 15.681000
## Volvo 142E 10.991355
Task: We want to see the counts of observations in each month, can you count them without making a grouping?
# Add the number of observations for each Month as airquality_with_count
airquality_with_count <- airquality %>%
add_count(Month)
# Show airquality_with_count (this prints all 153 rows)
airquality_with_count
## Ozone Solar.R Wind Temp Month Day n
## 1 41 190 7.4 67 5 1 31
## 2 36 118 8.0 72 5 2 31
## 3 12 149 12.6 74 5 3 31
## 4 18 313 11.5 62 5 4 31
## 5 NA NA 14.3 56 5 5 31
## 6 28 NA 14.9 66 5 6 31
## 7 23 299 8.6 65 5 7 31
## 8 19 99 13.8 59 5 8 31
## 9 8 19 20.1 61 5 9 31
## 10 NA 194 8.6 69 5 10 31
## 11 7 NA 6.9 74 5 11 31
## 12 16 256 9.7 69 5 12 31
## 13 11 290 9.2 66 5 13 31
## 14 14 274 10.9 68 5 14 31
## 15 18 65 13.2 58 5 15 31
## 16 14 334 11.5 64 5 16 31
## 17 34 307 12.0 66 5 17 31
## 18 6 78 18.4 57 5 18 31
## 19 30 322 11.5 68 5 19 31
## 20 11 44 9.7 62 5 20 31
## 21 1 8 9.7 59 5 21 31
## 22 11 320 16.6 73 5 22 31
## 23 4 25 9.7 61 5 23 31
## 24 32 92 12.0 61 5 24 31
## 25 NA 66 16.6 57 5 25 31
## 26 NA 266 14.9 58 5 26 31
## 27 NA NA 8.0 57 5 27 31
## 28 23 13 12.0 67 5 28 31
## 29 45 252 14.9 81 5 29 31
## 30 115 223 5.7 79 5 30 31
## 31 37 279 7.4 76 5 31 31
## 32 NA 286 8.6 78 6 1 30
## 33 NA 287 9.7 74 6 2 30
## 34 NA 242 16.1 67 6 3 30
## 35 NA 186 9.2 84 6 4 30
## 36 NA 220 8.6 85 6 5 30
## 37 NA 264 14.3 79 6 6 30
## 38 29 127 9.7 82 6 7 30
## 39 NA 273 6.9 87 6 8 30
## 40 71 291 13.8 90 6 9 30
## 41 39 323 11.5 87 6 10 30
## 42 NA 259 10.9 93 6 11 30
## 43 NA 250 9.2 92 6 12 30
## 44 23 148 8.0 82 6 13 30
## 45 NA 332 13.8 80 6 14 30
## 46 NA 322 11.5 79 6 15 30
## 47 21 191 14.9 77 6 16 30
## 48 37 284 20.7 72 6 17 30
## 49 20 37 9.2 65 6 18 30
## 50 12 120 11.5 73 6 19 30
## 51 13 137 10.3 76 6 20 30
## 52 NA 150 6.3 77 6 21 30
## 53 NA 59 1.7 76 6 22 30
## 54 NA 91 4.6 76 6 23 30
## 55 NA 250 6.3 76 6 24 30
## 56 NA 135 8.0 75 6 25 30
## 57 NA 127 8.0 78 6 26 30
## 58 NA 47 10.3 73 6 27 30
## 59 NA 98 11.5 80 6 28 30
## 60 NA 31 14.9 77 6 29 30
## 61 NA 138 8.0 83 6 30 30
## 62 135 269 4.1 84 7 1 31
## 63 49 248 9.2 85 7 2 31
## 64 32 236 9.2 81 7 3 31
## 65 NA 101 10.9 84 7 4 31
## 66 64 175 4.6 83 7 5 31
## 67 40 314 10.9 83 7 6 31
## 68 77 276 5.1 88 7 7 31
## 69 97 267 6.3 92 7 8 31
## 70 97 272 5.7 92 7 9 31
## 71 85 175 7.4 89 7 10 31
## 72 NA 139 8.6 82 7 11 31
## 73 10 264 14.3 73 7 12 31
## 74 27 175 14.9 81 7 13 31
## 75 NA 291 14.9 91 7 14 31
## 76 7 48 14.3 80 7 15 31
## 77 48 260 6.9 81 7 16 31
## 78 35 274 10.3 82 7 17 31
## 79 61 285 6.3 84 7 18 31
## 80 79 187 5.1 87 7 19 31
## 81 63 220 11.5 85 7 20 31
## 82 16 7 6.9 74 7 21 31
## 83 NA 258 9.7 81 7 22 31
## 84 NA 295 11.5 82 7 23 31
## 85 80 294 8.6 86 7 24 31
## 86 108 223 8.0 85 7 25 31
## 87 20 81 8.6 82 7 26 31
## 88 52 82 12.0 86 7 27 31
## 89 82 213 7.4 88 7 28 31
## 90 50 275 7.4 86 7 29 31
## 91 64 253 7.4 83 7 30 31
## 92 59 254 9.2 81 7 31 31
## 93 39 83 6.9 81 8 1 31
## 94 9 24 13.8 81 8 2 31
## 95 16 77 7.4 82 8 3 31
## 96 78 NA 6.9 86 8 4 31
## 97 35 NA 7.4 85 8 5 31
## 98 66 NA 4.6 87 8 6 31
## 99 122 255 4.0 89 8 7 31
## 100 89 229 10.3 90 8 8 31
## 101 110 207 8.0 90 8 9 31
## 102 NA 222 8.6 92 8 10 31
## 103 NA 137 11.5 86 8 11 31
## 104 44 192 11.5 86 8 12 31
## 105 28 273 11.5 82 8 13 31
## 106 65 157 9.7 80 8 14 31
## 107 NA 64 11.5 79 8 15 31
## 108 22 71 10.3 77 8 16 31
## 109 59 51 6.3 79 8 17 31
## 110 23 115 7.4 76 8 18 31
## 111 31 244 10.9 78 8 19 31
## 112 44 190 10.3 78 8 20 31
## 113 21 259 15.5 77 8 21 31
## 114 9 36 14.3 72 8 22 31
## 115 NA 255 12.6 75 8 23 31
## 116 45 212 9.7 79 8 24 31
## 117 168 238 3.4 81 8 25 31
## 118 73 215 8.0 86 8 26 31
## 119 NA 153 5.7 88 8 27 31
## 120 76 203 9.7 97 8 28 31
## 121 118 225 2.3 94 8 29 31
## 122 84 237 6.3 96 8 30 31
## 123 85 188 6.3 94 8 31 31
## 124 96 167 6.9 91 9 1 30
## 125 78 197 5.1 92 9 2 30
## 126 73 183 2.8 93 9 3 30
## 127 91 189 4.6 93 9 4 30
## 128 47 95 7.4 87 9 5 30
## 129 32 92 15.5 84 9 6 30
## 130 20 252 10.9 80 9 7 30
## 131 23 220 10.3 78 9 8 30
## 132 21 230 10.9 75 9 9 30
## 133 24 259 9.7 73 9 10 30
## 134 44 236 14.9 81 9 11 30
## 135 21 259 15.5 76 9 12 30
## 136 28 238 6.3 77 9 13 30
## 137 9 24 10.9 71 9 14 30
## 138 13 112 11.5 71 9 15 30
## 139 46 237 6.9 78 9 16 30
## 140 18 224 13.8 67 9 17 30
## 141 13 27 10.3 76 9 18 30
## 142 24 238 10.3 68 9 19 30
## 143 16 201 8.0 82 9 20 30
## 144 13 238 12.6 64 9 21 30
## 145 23 14 9.2 71 9 22 30
## 146 36 139 10.3 81 9 23 30
## 147 7 49 10.3 69 9 24 30
## 148 14 20 16.6 63 9 25 30
## 149 30 193 6.9 70 9 26 30
## 150 NA 145 13.2 77 9 27 30
## 151 14 191 14.3 75 9 28 30
## 152 18 131 8.0 76 9 29 30
## 153 20 223 11.5 68 9 30 30
It was nice to add this. But I also need a pivot table with only the observation count per month.
Task: Create a summary table with the count of observations per month.
# Create a pivot table as monthly_observation_count
monthly_observation_count <- airquality %>%
group_by(Month) %>%
count()
# Show the result of monthly_observation_count
monthly_observation_count
## # A tibble: 5 × 2
## # Groups: Month [5]
## Month n
## <int> <int>
## 1 5 31
## 2 6 30
## 3 7 31
## 4 8 31
## 5 9 30
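The output above is still grouped by Month (see the "# Groups" line). As a sketch of a shorter alternative, count() accepts the grouping column directly and returns an ungrouped result; the name monthly_counts is illustrative.

```r
library(dplyr)

# count(Month) groups, counts, and ungroups in one step
monthly_counts <- airquality %>%
  count(Month)

monthly_counts
```

Getting an ungrouped result avoids surprises when more verbs are chained afterwards.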
Task: Can you also add the average temp to the table?
# Calculate and record as monthly_observation_count
monthly_observation_count <- airquality %>%
group_by(Month) %>%
summarise(Nr_of_Obs = n(),
Avg_Temp = mean(Temp))
# Show the result of monthly_observation_count
monthly_observation_count
## # A tibble: 5 × 3
## Month Nr_of_Obs Avg_Temp
## <int> <int> <dbl>
## 1 5 31 65.5
## 2 6 30 79.1
## 3 7 31 83.9
## 4 8 31 84.0
## 5 9 30 76.9
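Task: Group the july_air_quality data based on temperature category, calculate the average ozone levels and maximum wind speed for each category, and arrange the data based on average ozone levels in descending order. Note that july_air_quality was last overwritten with the OR filter, so it holds every July day plus any other day with a temperature over 80.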
# Group by temperature category, summarize average ozone levels and max wind speed, and arrange by average ozone levels as grouped_summarized_arranged_data
grouped_summarized_arranged_data <- july_air_quality %>%
mutate(Temp_Category = ifelse(Temp >= 80, "Hot", "Moderate")) %>%
group_by(Temp_Category) %>%
summarise(avg_ozone = mean(Ozone, na.rm = TRUE),
max_wind = max(Wind, na.rm = TRUE)) %>%
arrange(desc(avg_ozone))
# Show the result of grouped_summarized_arranged_data
grouped_summarized_arranged_data
## # A tibble: 2 × 3
## Temp_Category avg_ozone max_wind
## <chr> <dbl> <dbl>
## 1 Hot 63.1 15.5
## 2 Moderate 13 14.3
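If the summary should cover July only, the filter can be rebuilt inside the same chain instead of reusing july_air_quality. This is a sketch under that assumption; the name july_only_summary and the extra n_days column are illustrative additions.

```r
library(dplyr)

# Same summary, restricted to July days only
july_only_summary <- airquality %>%
  filter(Month == 7) %>%
  mutate(Temp_Category = ifelse(Temp >= 80, "Hot", "Moderate")) %>%
  group_by(Temp_Category) %>%
  summarise(avg_ozone = mean(Ozone, na.rm = TRUE),
            max_wind = max(Wind, na.rm = TRUE),
            n_days = n()) %>%  # number of days in each category
  arrange(desc(avg_ozone))

july_only_summary
```

Adding n() to the summary shows how many days each category rests on, which helps when judging how much weight an average deserves.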
Task: Find the average hp by cylinder for cars with an mpg of at least 25 and arrange the results in descending order.
# Filter cars with high MPG, group by number of cylinders, and calculate average horsepower as avg_hp_by_cyl
avg_hp_by_cyl <- mtcars %>%
filter(mpg >= 25) %>%
group_by(cyl) %>%
summarise(avg_hp = mean(hp)) %>%
arrange(desc(avg_hp))
# Show avg_hp_by_cyl
avg_hp_by_cyl
## # A tibble: 1 × 2
## cyl avg_hp
## <dbl> <dbl>
## 1 4 75.5
In this practice file, we practiced the fundamental dplyr functions, which are quite useful for summarizing, filtering, and transforming data. Yet these are not the only functions you can use. You can check the dplyr cheat sheet as a handy memory jogger and to see and practice the other functions. You should also check the dplyr documentation in detail for more…
Why don’t you try those functions on another built-in R dataset called iris?
Good luck!