This homework has two parts. Part 1 uses base R to inspect a dataframe. Part 2 uses dplyr to wrangle a different dataset.
Download StudentSurvey.csv from the Datasets folder on Blackboard. Save it in the same folder as this Rmd and set your working directory to that folder.
# Load the file
survey <- read.csv("StudentSurvey.csv")
# Q1. Check the head of the dataset
head(survey)
## Year Sex Smoke Award HigherSAT Exercise TV Height Weight Siblings
## 1 Senior M No Olympic Math 10 1 71 180 4
## 2 Sophomore F Yes Academy Math 4 7 66 120 2
## 3 FirstYear M No Nobel Math 14 5 72 208 2
## 4 Junior M No Nobel Math 3 1 63 110 1
## 5 Sophomore F No Nobel Verbal 3 3 65 150 1
## 6 Sophomore F No Nobel Verbal 5 4 65 114 2
## BirthOrder VerbalSAT MathSAT SAT GPA Pulse Piercings
## 1 4 540 670 1210 3.13 54 0
## 2 2 520 630 1150 2.50 66 3
## 3 1 550 560 1110 2.55 130 0
## 4 1 490 630 1120 3.10 78 0
## 5 1 720 450 1170 2.70 40 6
## 6 2 600 550 1150 3.20 80 4
# Q2. Check the dimensions
dim(survey)
## [1] 362 17
# Q3. Create a table of students' sex and HigherSAT
# table() cross-tabulates the two columns; data.frame() would only print
# them side by side, one row per student.
# Rows with a blank HigherSAT (row 77, for example) may appear as an
# unlabeled category in the result.
table(survey$Sex, survey$HigherSAT)
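Q3 asks for a table rather than a row-by-row listing. The sketch below shows how `table()` counts each combination of two columns. The `toy` data frame is made up for illustration and is not the StudentSurvey data; converting blank strings to `NA` is one possible way to handle blank entries, not a requirement of the assignment.

```r
# Made-up data for illustration only; not the StudentSurvey values
toy <- data.frame(
  Sex       = c("M", "F", "F", "M", "F"),
  HigherSAT = c("Math", "Math", "Verbal", "", "Verbal")
)

# Treat blank strings as missing so they are left out of the counts
toy$HigherSAT[toy$HigherSAT == ""] <- NA

# Cross-tabulate: rows are Sex, columns are HigherSAT
counts <- table(toy$Sex, toy$HigherSAT)
counts
```

By default `table()` leaves out `NA` values; passing `useNA = "ifany"` would count them in a separate column.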
# Q4. Display summary statistics for VerbalSAT
summary(survey$VerbalSAT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 390.0 550.0 600.0 594.2 640.0 800.0
# Q5. Find the average GPA of students
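mean(survey$GPA)
## [1] NA
# The result is NA because GPA has missing values; na.rm = TRUE drops them
mean(survey$GPA, na.rm = TRUE)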
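A result of `NA` from `mean()` usually means the column contains missing values. The sketch below uses a made-up `gpa` vector (not the survey data) to show one way to count missing values and then average only the non-missing ones.

```r
# Made-up GPA values for illustration; one is missing
gpa <- c(3.13, 2.50, NA, 3.10)

sum(is.na(gpa))            # how many values are missing
mean(gpa)                  # NA, because one input is NA
mean(gpa, na.rm = TRUE)    # average of the three non-missing values
```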
# Q6. Create a new dataframe called column_df that contains students' weight
# and number of hours they exercise.
column_df <- survey[, c("Weight", "Exercise")]
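In `df[rows, cols]` the first slot selects rows and the second selects columns, so two column names need to go together in the second slot. The sketch below uses a made-up `toy` data frame (an assumption for illustration) to contrast the two forms.

```r
# Made-up data for illustration only
toy <- data.frame(
  Weight   = c(180, 120, 208),
  Exercise = c(10, 4, 14),
  Height   = c(71, 66, 72)
)

# Leave the row slot empty and pass the column names as one character vector
cols <- toy[, c("Weight", "Exercise")]
dim(cols)
names(cols)

# By contrast, toy["Weight", "Exercise"] asks for a row named "Weight",
# which does not exist, so it returns NA
toy["Weight", "Exercise"]
```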
# Q7. Access the fourth element in the first column of the StudentSurvey dataset.
survey[4,1]
## [1] "Junior"
Don’t change this chunk — it loads and filters the dataset.
olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
olympic_gymnasts <- olympics |>
filter(!is.na(age)) |>
filter(sport == "Gymnastics") |>
mutate(
medalist = case_when(
is.na(medal) ~ FALSE,
!is.na(medal) ~ TRUE
)
)
More info on the data: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
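The `medalist` column built in the chunk above is TRUE whenever `medal` is not missing. As a sketch for understanding only (the chunk itself should stay as given), the made-up `toy` tibble below suggests that the `case_when()` form and a plain `!is.na()` produce the same values.

```r
library(dplyr)

# Made-up medal values for illustration only
toy <- tibble(medal = c("Gold", NA, "Bronze", NA))

toy <- toy |>
  mutate(
    via_case_when = case_when(is.na(medal) ~ FALSE, !is.na(medal) ~ TRUE),
    via_negation  = !is.na(medal)
  )

# TRUE if both columns hold exactly the same logical values
identical(toy$via_case_when, toy$via_negation)
```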
# Q8. Create a subset dataframe with these columns only: name, sex, age, team, year, medalist.
# Call it df.
df <- olympic_gymnasts |> select(name, sex, age, team, year, medalist)
# Q9. From df, create df2 that only has the years 2008, 2012, and 2016.
df2 <- df |> filter(year %in% c(2008, 2012, 2016))
# Q10. Group by those three years and summarize the mean age in each group.
# df2 already holds only 2008, 2012, and 2016, so no second filter is needed
df2 |> group_by(year) |> summarize(mean(age))
## # A tibble: 3 × 2
## year `mean(age)`
## <dbl> <dbl>
## 1 2008 21.6
## 2 2012 21.9
## 3 2016 22.2
# Q11. Using the full olympic_gymnasts dataset, group by year and find the mean age
# for each year. Call this oly_year.
# (Bonus: find the minimum average age across years.)
oly_year <- olympic_gymnasts |> group_by(year) |> summarize(mean(age))
oly_year
## # A tibble: 29 × 2
## year `mean(age)`
## <dbl> <dbl>
## 1 1896 24.3
## 2 1900 22.2
## 3 1904 25.1
## 4 1906 24.7
## 5 1908 23.2
## 6 1912 24.2
## 7 1920 26.7
## 8 1924 27.6
## 9 1928 25.6
## 10 1932 23.9
## # ℹ 19 more rows
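For the bonus, one possible approach is to give the summary column a name and then take its minimum. The sketch below runs on a made-up `toy` tibble, and the names `oly_toy` and `mean_age` are assumptions for illustration; the same steps would apply to the real yearly summary.

```r
library(dplyr)

# Made-up rows for illustration only
toy <- tibble(
  year = c(2008, 2008, 2012, 2012, 2016),
  age  = c(20, 22, 19, 21, 24)
)

# Naming the summary column makes it easy to refer to afterwards
oly_toy <- toy |>
  group_by(year) |>
  summarize(mean_age = mean(age))

# Smallest yearly average, and the row it belongs to
min(oly_toy$mean_age)
oly_toy |> slice_min(mean_age, n = 1)
```

`min()` returns only the value, while `slice_min()` keeps the whole row, which also shows the year in which the minimum occurs.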
# Q12. Open-ended: come up with a question that requires at least TWO dplyr verbs.
# Write the question, then the code that answers it. Below the chunk, briefly
# explain why you chose this question.
olympic_gymnasts |> group_by(team) |> filter(medal %in% c("Bronze"))
## # A tibble: 675 × 16
## # Groups: team [33]
## id name sex age height weight team noc games year season city
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 17 Paavo J… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
## 2 17 Paavo J… M 32 175 64 Finl… FIN 1952… 1952 Summer Hels…
## 3 455 Denis M… M 19 161 62 Russ… RUS 2012… 2012 Summer Lond…
## 4 455 Denis M… M 24 161 62 Russ… RUS 2016… 2016 Summer Rio …
## 5 610 Ginko A… F 26 148 46 Japan JPN 1964… 1964 Summer Tokyo
## 6 627 Andreea… F 16 150 40 Roma… ROU 2008… 2008 Summer Beij…
## 7 1109 Lavinia… F 16 148 40 Roma… ROU 1984… 1984 Summer Los …
## 8 1485 Yutaka … M 21 156 55 Japan JPN 1992… 1992 Summer Barc…
## 9 3249 Maksim … M 21 166 63 Russ… RUS 2000… 2000 Summer Sydn…
## 10 3281 Simona … F 16 158 44 Roma… ROU 1996… 1996 Summer Atla…
## # ℹ 665 more rows
## # ℹ 4 more variables: sport <chr>, event <chr>, medal <chr>, medalist <lgl>
Your question and reflection: My question is: which teams won Bronze medals? I chose it because it pulls out a specific subset of the data and makes it possible to examine the composition of each team.
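One way to turn the Bronze question into a per-team answer is to filter first and then count rows per team. This is a sketch on a made-up `toy` tibble, and `bronze_by_team` is a hypothetical name; it is an alternative to the grouped filter above, not the required solution.

```r
library(dplyr)

# Made-up rows for illustration only
toy <- tibble(
  team  = c("Japan", "Romania", "Japan", "Russia", "Romania", "Japan"),
  medal = c("Bronze", "Bronze", "Gold", NA, "Silver", "Bronze")
)

# Two dplyr verbs: filter() keeps Bronze rows, count() tallies them per team
bronze_by_team <- toy |>
  filter(medal == "Bronze") |>
  count(team, sort = TRUE)

bronze_by_team
```

`filter()` drops rows where the condition is `NA`, so athletes without a medal are excluded automatically.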