This homework has two parts. Part 1 uses base R to inspect a dataframe. Part 2 uses dplyr to wrangle a different dataset.


Part 1 — Student Survey (dataframe basics)

Download StudentSurvey.csv from the Datasets folder on Blackboard. Save it next to this Rmd and set your working directory.

# Load the file
survey <- read.csv("StudentSurvey.csv")
# Q1. Check the head of the dataset
# Used the head function.
head(survey)
##        Year Sex Smoke   Award HigherSAT Exercise TV Height Weight Siblings
## 1    Senior   M    No Olympic      Math       10  1     71    180        4
## 2 Sophomore   F   Yes Academy      Math        4  7     66    120        2
## 3 FirstYear   M    No   Nobel      Math       14  5     72    208        2
## 4    Junior   M    No   Nobel      Math        3  1     63    110        1
## 5 Sophomore   F    No   Nobel    Verbal        3  3     65    150        1
## 6 Sophomore   F    No   Nobel    Verbal        5  4     65    114        2
##   BirthOrder VerbalSAT MathSAT  SAT  GPA Pulse Piercings
## 1          4       540     670 1210 3.13    54         0
## 2          2       520     630 1150 2.50    66         3
## 3          1       550     560 1110 2.55   130         0
## 4          1       490     630 1120 3.10    78         0
## 5          1       720     450 1170 2.70    40         6
## 6          2       600     550 1150 3.20    80         4
# Q2. Check the dimensions
# Used the dim function.
dim(survey)
## [1] 362  17
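head() and dim() have a few close relatives in base R that answer the same "what does this data look like" question. A minimal sketch on a small made-up data frame (the name `toy` and its values are illustrative and not taken from StudentSurvey.csv):

```r
# Toy data frame standing in for the survey (values are made up)
toy <- data.frame(
  Year = c("Senior", "Sophomore", "FirstYear", "Junior"),
  Sex = c("M", "F", "M", "M"),
  GPA = c(3.13, 2.50, 2.55, 3.10)
)

head(toy, 2)   # first two rows; n defaults to 6 when omitted
dim(toy)       # number of rows, then number of columns
nrow(toy)      # row count only
ncol(toy)      # column count only
str(toy)       # column types and a preview of each column
```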
# Q3. Create a table of students' sex and HigherSAT
# Kept all rows and selected the "Sex" and "HigherSAT" columns.
survey[, c("Sex", "HigherSAT")]
##     Sex HigherSAT
## 1     M      Math
## 2     F      Math
## 3     M      Math
## 4     M      Math
## 5     F    Verbal
## 6     F    Verbal
## 7     F      Math
## 8     M      Math
## 9     F    Verbal
## 10    F      Math
## 11    F      Math
## 12    M      Math
## 13    M      Math
## 14    F    Verbal
## 15    M    Verbal
## 16    F      Math
## 17    F    Verbal
## 18    M    Verbal
## 19    F      Math
## 20    F    Verbal
## 21    F      Math
## 22    F      Math
## 23    M      Math
## 24    M      Math
## 25    M      Math
## 26    M      Math
## 27    F    Verbal
## 28    M      Math
## 29    M      Math
## 30    M    Verbal
## 31    M      Math
## 32    F      Math
## 33    M      Math
## 34    M    Verbal
## 35    F      Math
## 36    F      Math
## 37    M      Math
## 38    F      Math
## 39    F    Verbal
## 40    F    Verbal
## 41    M    Verbal
## 42    F    Verbal
## 43    F    Verbal
## 44    F      Math
## 45    M      Math
## 46    M    Verbal
## 47    M    Verbal
## 48    M      Math
## 49    M      Math
## 50    F      Math
## 51    M    Verbal
## 52    F      Math
## 53    M      Math
## 54    M      Math
## 55    M    Verbal
## 56    M    Verbal
## 57    F      Math
## 58    F      Math
## 59    F      Math
## 60    F    Verbal
## 61    M      Math
## 62    M      Math
## 63    F    Verbal
## 64    M    Verbal
## 65    M      Math
## 66    F      Math
## 67    M    Verbal
## 68    M    Verbal
## 69    M    Verbal
## 70    M    Verbal
## 71    F      Math
## 72    F    Verbal
## 73    F    Verbal
## 74    F    Verbal
## 75    F      Math
## 76    F    Verbal
## 77    F          
## 78    F      Math
## 79    F      Math
## 80    M      Math
## 81    M      Math
## 82    F      Math
## 83    F      Math
## 84    F    Verbal
## 85    F      Math
## 86    F      Math
## 87    M    Verbal
## 88    F    Verbal
## 89    M    Verbal
## 90    M      Math
## 91    M      Math
## 92    F    Verbal
## 93    F    Verbal
## 94    M      Math
## 95    M      Math
## 96    M      Math
## 97    F    Verbal
## 98    M    Verbal
## 99    M      Math
## 100   M      Math
## 101   F    Verbal
## 102   F      Math
## 103   M      Math
## 104   M      Math
## 105   M      Math
## 106   F    Verbal
## 107   M      Math
## 108   M      Math
## 109   F      Math
## 110   M      Math
## 111   M      Math
## 112   M      Math
## 113   F    Verbal
## 114   M      Math
## 115   F    Verbal
## 116   F    Verbal
## 117   F    Verbal
## 118   M          
## 119   M      Math
## 120   F    Verbal
## 121   F    Verbal
## 122   M      Math
## 123   M    Verbal
## 124   M      Math
## 125   F    Verbal
## 126   M      Math
## 127   M    Verbal
## 128   M      Math
## 129   F    Verbal
## 130   M    Verbal
## 131   F      Math
## 132   F    Verbal
## 133   F      Math
## 134   F    Verbal
## 135   F      Math
## 136   F      Math
## 137   M      Math
## 138   F      Math
## 139   F    Verbal
## 140   F      Math
## 141   M      Math
## 142   F    Verbal
## 143   M      Math
## 144   M    Verbal
## 145   F    Verbal
## 146   F      Math
## 147   F      Math
## 148   M      Math
## 149   M      Math
## 150   F      Math
## 151   F      Math
## 152   M    Verbal
## 153   F      Math
## 154   M      Math
## 155   M    Verbal
## 156   F      Math
## 157   F      Math
## 158   F    Verbal
## 159   F    Verbal
## 160   F    Verbal
## 161   M    Verbal
## 162   F    Verbal
## 163   M      Math
## 164   M      Math
## 165   M    Verbal
## 166   F    Verbal
## 167   M    Verbal
## 168   M      Math
## 169   M    Verbal
## 170   M      Math
## 171   F    Verbal
## 172   F    Verbal
## 173   F      Math
## 174   F    Verbal
## 175   F    Verbal
## 176   M      Math
## 177   M      Math
## 178   F      Math
## 179   M      Math
## 180   F      Math
## 181   M      Math
## 182   F    Verbal
## 183   F      Math
## 184   F      Math
## 185   F    Verbal
## 186   M      Math
## 187   M      Math
## 188   F          
## 189   M      Math
## 190   M      Math
## 191   M      Math
## 192   F      Math
## 193   M    Verbal
## 194   F      Math
## 195   M      Math
## 196   M      Math
## 197   F      Math
## 198   M      Math
## 199   M      Math
## 200   F      Math
## 201   F    Verbal
## 202   M      Math
## 203   F      Math
## 204   F      Math
## 205   F    Verbal
## 206   F      Math
## 207   M      Math
## 208   M      Math
## 209   M      Math
## 210   M    Verbal
## 211   F      Math
## 212   M    Verbal
## 213   F      Math
## 214   F      Math
## 215   M      Math
## 216   F    Verbal
## 217   F      Math
## 218   F      Math
## 219   M      Math
## 220   M      Math
## 221   M      Math
## 222   M      Math
## 223   M      Math
## 224   M    Verbal
## 225   M      Math
## 226   F      Math
## 227   F      Math
## 228   M    Verbal
## 229   F    Verbal
## 230   F      Math
## 231   F    Verbal
## 232   M      Math
## 233   M    Verbal
## 234   M      Math
## 235   F    Verbal
## 236   F    Verbal
## 237   M    Verbal
## 238   M    Verbal
## 239   M      Math
## 240   M    Verbal
## 241   F    Verbal
## 242   M    Verbal
## 243   M      Math
## 244   F    Verbal
## 245   M    Verbal
## 246   F    Verbal
## 247   M      Math
## 248   M    Verbal
## 249   M      Math
## 250   M      Math
## 251   M      Math
## 252   M    Verbal
## 253   M      Math
## 254   M    Verbal
## 255   M    Verbal
## 256   F    Verbal
## 257   F    Verbal
## 258   M    Verbal
## 259   M      Math
## 260   M      Math
## 261   M    Verbal
## 262   M      Math
## 263   F    Verbal
## 264   M          
## 265   F      Math
## 266   F    Verbal
## 267   M      Math
## 268   M    Verbal
## 269   M    Verbal
## 270   F    Verbal
## 271   M      Math
## 272   F    Verbal
## 273   F      Math
## 274   F    Verbal
## 275   F    Verbal
## 276   F    Verbal
## 277   M      Math
## 278   M    Verbal
## 279   F      Math
## 280   M      Math
## 281   M    Verbal
## 282   F      Math
## 283   M      Math
## 284   F    Verbal
## 285   F      Math
## 286   F    Verbal
## 287   M      Math
## 288   F    Verbal
## 289   M      Math
## 290   M    Verbal
## 291   F    Verbal
## 292   M    Verbal
## 293   F      Math
## 294   F    Verbal
## 295   M      Math
## 296   F          
## 297   M      Math
## 298   F      Math
## 299   M    Verbal
## 300   M      Math
## 301   F    Verbal
## 302   M      Math
## 303   M      Math
## 304   M    Verbal
## 305   F    Verbal
## 306   M      Math
## 307   M    Verbal
## 308   F          
## 309   M      Math
## 310   F    Verbal
## 311   M      Math
## 312   M      Math
## 313   F    Verbal
## 314   F      Math
## 315   F      Math
## 316   F    Verbal
## 317   M    Verbal
## 318   F    Verbal
## 319   M    Verbal
## 320   F      Math
## 321   M      Math
## 322   M    Verbal
## 323   M    Verbal
## 324   F      Math
## 325   F      Math
## 326   M      Math
## 327   F      Math
## 328   M    Verbal
## 329   M      Math
## 330   M      Math
## 331   M      Math
## 332   M      Math
## 333   F      Math
## 334   M    Verbal
## 335   M    Verbal
## 336   M      Math
## 337   F    Verbal
## 338   M    Verbal
## 339   M          
## 340   M    Verbal
## 341   F    Verbal
## 342   M    Verbal
## 343   M      Math
## 344   M      Math
## 345   M      Math
## 346   F    Verbal
## 347   M      Math
## 348   M      Math
## 349   F    Verbal
## 350   M      Math
## 351   F      Math
## 352   M      Math
## 353   F      Math
## 354   M      Math
## 355   F      Math
## 356   F      Math
## 357   M    Verbal
## 358   F    Verbal
## 359   M      Math
## 360   F    Verbal
## 361   M    Verbal
## 362   F      Math
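Q3 asks for a table, and the subset above lists one row per student. If a count of each Sex and HigherSAT combination is what is wanted, base R's table() produces it. A minimal sketch on made-up values (the `toy` data frame is an assumption for illustration):

```r
# Made-up stand-in for the two survey columns
toy <- data.frame(
  Sex = c("M", "F", "M", "M", "F", "F"),
  HigherSAT = c("Math", "Math", "Math", "Verbal", "Verbal", "Verbal")
)

# Cross-tabulate: rows are Sex, columns are HigherSAT
counts <- table(toy$Sex, toy$HigherSAT)
counts
```

On the real data the equivalent call would be table(survey$Sex, survey$HigherSAT). Blank HigherSAT entries, such as the one in row 77, would likely show up as their own unlabeled column.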
# Q4. Display summary statistics for VerbalSAT
# Selected only the VerbalSAT column, then used the summary function.
summary(survey$VerbalSAT)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   390.0   550.0   600.0   594.2   640.0   800.0
# Q5. Find the average GPA of students
# Selected only the GPA column, then used the mean function with NA values removed and rounded to 2 decimal places.
round(mean(survey$GPA, na.rm = TRUE), 2)
## [1] 3.16
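The na.rm = TRUE argument matters because mean() returns NA as soon as any value is missing. A small sketch with made-up numbers (the `gpa` vector is illustrative):

```r
# Made-up GPA values with one missing entry
gpa <- c(3.13, 2.50, NA, 3.10)

mean(gpa)                           # NA, because one value is missing
mean(gpa, na.rm = TRUE)             # average of the three non-missing values
round(mean(gpa, na.rm = TRUE), 2)   # same average, rounded to 2 decimals
```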
# Q6. Create a new dataframe called column_df that contains students' weight
#     and number of hours they exercise.
# Created a new dataframe by keeping all rows and selecting the "Weight" and "Exercise" columns.
column_df <- survey[, c("Weight", "Exercise")]
column_df
##     Weight Exercise
## 1      180     10.0
## 2      120      4.0
## 3      208     14.0
## 4      110      3.0
## 5      150      3.0
## 6      114      5.0
## 7      128     10.0
## 8      235     13.0
## 9       NA      3.0
## 10     115     12.0
## 11     140     12.0
## 12     200     10.0
## 13     162     12.0
## 14     135      6.0
## 15     193      9.0
## 16     110     10.0
## 17      99      3.0
## 18     165      7.0
## 19     120      2.0
## 20     154     14.0
## 21     110     10.0
## 22     145     14.0
## 23     195     20.0
## 24     200      7.0
## 25     167     12.0
## 26     175     10.0
## 27     155      6.0
## 28     185     14.0
## 29     190     12.0
## 30     165     10.0
## 31     175      8.0
## 32     126      0.0
## 33     187     10.0
## 34     170      6.0
## 35     158      5.0
## 36     119     24.0
## 37     205      2.0
## 38     129     10.0
## 39     145      6.0
## 40     130      5.0
## 41     215      5.0
## 42     135     12.0
## 43     145      2.0
## 44      98      7.0
## 45     150     15.0
## 46     159      5.0
## 47     174      7.0
## 48     160     15.0
## 49     165      8.0
## 50     161     14.0
## 51     160     14.0
## 52     130      4.0
## 53     175     15.0
## 54     255      4.0
## 55     160     15.0
## 56     160      3.0
## 57      95      3.0
## 58     115     15.0
## 59     120     20.0
## 60     135      3.0
## 61     180      6.0
## 62     155     12.0
## 63     110      4.0
## 64     215     20.0
## 65     175     15.0
## 66     140     10.0
## 67     195     10.0
## 68     185      4.0
## 69     185      9.0
## 70     209     12.0
## 71     145      2.0
## 72     140     15.0
## 73     146     10.0
## 74     130      7.0
## 75     140      3.0
## 76     130      4.0
## 77     140     15.0
## 78     160      8.0
## 79     120      5.0
## 80     150     10.0
## 81     155     15.0
## 82     128      4.0
## 83     143      5.0
## 84     155      6.0
## 85     119     18.0
## 86     138     16.0
## 87     240      4.0
## 88     160      3.0
## 89     191     20.0
## 90     165      5.0
## 91     200     10.0
## 92     125      2.0
## 93     140      4.0
## 94     206     14.0
## 95     275      7.0
## 96     142     12.0
## 97     140     14.0
## 98     145      3.0
## 99     128      5.0
## 100    165     15.0
## 101    140      5.0
## 102    130      4.0
## 103    170      8.0
## 104    160     14.0
## 105    165      5.0
## 106    145     12.0
## 107    155     15.0
## 108    155      3.0
## 109    113     12.0
## 110    155     12.0
## 111    173      6.0
## 112    195      2.0
## 113    120      3.0
## 114    225     24.0
## 115    160      3.0
## 116    120      6.0
## 117    138     12.0
## 118    260     18.0
## 119    150      2.0
## 120    135     12.0
## 121    165      5.0
## 122    142      3.0
## 123    210      3.0
## 124    171     21.0
## 125    150     12.0
## 126    188      3.0
## 127    195     20.0
## 128    230      8.0
## 129    140      2.0
## 130    200     15.0
## 131    140      6.0
## 132    180      2.0
## 133    160      1.0
## 134    135     13.0
## 135    140     12.0
## 136    155     10.0
## 137    235     10.0
## 138    140     10.0
## 139    130      3.0
## 140    125      2.0
## 141    222     10.0
## 142    128      4.0
## 143    183      8.0
## 144    175     18.0
## 145    125      5.0
## 146    156      4.0
## 147    145     20.0
## 148    195     14.0
## 149    185      8.0
## 150    150      4.0
## 151    140      5.0
## 152    150     14.0
## 153    150      4.0
## 154    220      4.0
## 155    195     15.0
## 156    140     12.0
## 157    135     10.0
## 158    138      6.0
## 159    170     12.0
## 160    145     10.0
## 161    135     14.0
## 162    140      5.0
## 163    155     18.0
## 164    155     10.0
## 165    155      8.0
## 166    135      8.0
## 167    165      3.0
## 168    160      5.0
## 169    175      4.0
## 170    183      8.0
## 171    140      3.0
## 172    140     12.0
## 173    110      5.0
## 174    125      5.0
## 175    125     15.0
## 176    180     10.0
## 177    195     10.0
## 178    128     10.0
## 179    172     14.0
## 180    150      6.0
## 181    188      4.0
## 182    110     14.0
## 183    125      5.0
## 184    135     16.0
## 185    130     14.0
## 186    230     10.0
## 187    198     11.0
## 188    127      4.0
## 189    150     12.0
## 190    175      5.0
## 191    160      8.0
## 192    140      8.0
## 193    200      3.0
## 194    130      8.0
## 195    210     20.0
## 196    210     10.0
## 197    145      3.0
## 198    156      4.0
## 199    207      8.0
## 200    150     15.0
## 201    137     14.0
## 202    153     14.0
## 203    175     12.0
## 204    136      6.0
## 205    104      3.0
## 206    122     12.0
## 207    192      5.0
## 208    218      8.0
## 209    170      1.0
## 210    220     10.0
## 211    150      7.0
## 212    135      5.0
## 213    155     10.0
## 214    135     12.0
## 215    140     15.0
## 216    120     10.0
## 217    135      7.0
## 218    135       NA
## 219    165     15.0
## 220    180     10.0
## 221    165     15.0
## 222    175     12.0
## 223    200     10.0
## 224    138     12.0
## 225    265      5.0
## 226    105     14.0
## 227    130      3.0
## 228    165     12.0
## 229    145     15.0
## 230    138      3.0
## 231    193     25.0
## 232    170      5.0
## 233    155      6.0
## 234    155     27.0
## 235    140      8.0
## 236    130      3.0
## 237    180     17.0
## 238    140     40.0
## 239    210      5.0
## 240    150      0.0
## 241    115     27.0
## 242    225      3.0
## 243    170      3.0
## 244    120     11.0
## 245    170      2.0
## 246    130      9.0
## 247    182     12.0
## 248    138     15.0
## 249    180      2.0
## 250    230     20.0
## 251    222     12.0
## 252    145      5.0
## 253    150     10.0
## 254    190      2.0
## 255    195     12.0
## 256    115      5.0
## 257    140      3.0
## 258    155     12.0
## 259    150      8.0
## 260    155      8.0
## 261    130      5.0
## 262    210      6.0
## 263    105      8.0
## 264    192     10.0
## 265    220     18.0
## 266    134     12.0
## 267    160     10.0
## 268    135     10.0
## 269    140      5.0
## 270    135      7.0
## 271    145      3.0
## 272    140      7.0
## 273     NA      3.0
## 274    138      4.0
## 275     NA      8.0
## 276    130      7.0
## 277    210      7.0
## 278    233      3.0
## 279    180      7.0
## 280    185     14.0
## 281    189      5.0
## 282    127     14.0
## 283    135      5.0
## 284    140      5.0
## 285    162      2.0
## 286    215      1.0
## 287    200     15.0
## 288    120      5.0
## 289    165      5.0
## 290    175     15.0
## 291    160      5.0
## 292    145     20.0
## 293    180      3.0
## 294    170      7.0
## 295    235     20.0
## 296    155     10.0
## 297    140      6.0
## 298    145      6.0
## 299    175     10.0
## 300    210      9.0
## 301    130      1.5
## 302    195     20.0
## 303    165      8.0
## 304    180     21.0
## 305    145     10.0
## 306    163      8.0
## 307    160     10.0
## 308    150     18.0
## 309    170     15.0
## 310    110      8.0
## 311    170      6.0
## 312    145     12.0
## 313    160     10.0
## 314    130      7.0
## 315    155      5.0
## 316     NA      6.0
## 317    160      5.0
## 318    151      5.0
## 319    180     30.0
## 320    130      5.0
## 321    185     12.0
## 322    198      0.0
## 323    175     12.0
## 324    123      0.0
## 325    145     15.0
## 326    190     18.0
## 327    130     12.0
## 328    185      2.0
## 329    165     12.0
## 330    165      3.0
## 331    150      3.0
## 332    160      4.0
## 333    142     12.0
## 334    165     25.0
## 335    175      3.0
## 336    175      6.0
## 337    190      7.0
## 338    180      7.0
## 339    185      6.0
## 340     NA      6.0
## 341    135     13.0
## 342    195     25.0
## 343    175      8.0
## 344    165      5.0
## 345    135     11.0
## 346    140     18.0
## 347    182     10.0
## 348    155      6.0
## 349    180      2.0
## 350    170      5.0
## 351    135      5.0
## 352    165      6.0
## 353    137     10.0
## 354    147      4.0
## 355    150      5.0
## 356    155     17.0
## 357    160      7.0
## 358    130      2.0
## 359    180      8.0
## 360    150      1.0
## 361    205     14.0
## 362    115     12.0
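Printing column_df shows all 362 rows. A sketch of more compact ways to check a newly created data frame, using made-up values (`toy_df` is a hypothetical stand-in for column_df):

```r
# Made-up stand-in with one missing value in each column
toy_df <- data.frame(
  Weight = c(180, 120, NA, 110),
  Exercise = c(10, 4, 3, NA)
)

head(toy_df, 3)          # first three rows only
dim(toy_df)              # confirm the shape: rows, then columns
colSums(is.na(toy_df))   # number of missing values per column
```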
# Q7. Access the fourth element in the first column of the StudentSurvey dataset.
# Subset just row 4, column 1.
survey[4, 1]
## [1] "Junior"
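The [row, column] form used above is one of several ways to reach a single cell. A minimal sketch on made-up data (`toy` is illustrative) showing four spellings that return the same value:

```r
# Made-up stand-in whose first column is Year
toy <- data.frame(
  Year = c("Senior", "Sophomore", "FirstYear", "Junior"),
  GPA = c(3.13, 2.50, 2.55, 3.10)
)

toy[4, 1]        # row 4, column 1 by position
toy[4, "Year"]   # same cell, column chosen by name
toy$Year[4]      # pull the column as a vector, then take element 4
toy[[1]][4]      # same idea with [[ ]] and a column position
```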

Part 2 — Olympic Gymnasts (dplyr)

Don’t change this chunk — it loads and filters the dataset.

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics |>
  filter(!is.na(age)) |>
  filter(sport == "Gymnastics") |>
  mutate(
    medalist = case_when(
      is.na(medal) ~ FALSE,
      !is.na(medal) ~ TRUE
    )
  )

More info on the data: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
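The case_when() call in the chunk above labels a row TRUE when medal is present and FALSE when it is missing. A minimal sketch on a made-up tibble (the `toy` data and the `medalist_short` column are illustrative assumptions) suggesting that it matches the shorter !is.na(medal):

```r
library(dplyr)

# Made-up rows: one athlete without a medal, two with
toy <- tibble(
  name = c("A", "B", "C"),
  medal = c(NA, "Gold", "Bronze")
)

toy <- toy %>%
  mutate(
    # Same two-branch rule as the provided chunk
    medalist = case_when(
      is.na(medal) ~ FALSE,
      !is.na(medal) ~ TRUE
    ),
    # Shorter equivalent: TRUE wherever medal is not missing
    medalist_short = !is.na(medal)
  )

toy
```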

# Q8. Create a subset dataframe with these columns only: name, sex, age, team, year, medalist.
#     Call it df.
df <- olympics %>%
  select(name, sex, age, team, year, medal)

head(df)
## # A tibble: 6 × 6
##   name                     sex     age team            year medal
##   <chr>                    <chr> <dbl> <chr>          <dbl> <chr>
## 1 A Dijiang                M        24 China           1992 <NA> 
## 2 A Lamusi                 M        23 China           2012 <NA> 
## 3 Gunnar Nielsen Aaby      M        24 Denmark         1920 <NA> 
## 4 Edgar Lindenau Aabye     M        34 Denmark/Sweden  1900 Gold 
## 5 Christine Jacoba Aaftink F        21 Netherlands     1988 <NA> 
## 6 Christine Jacoba Aaftink F        21 Netherlands     1988 <NA>
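Q8 names the medalist column, which exists only in olympic_gymnasts, while the chunk above selects medal from the full olympics table. A sketch of the selection as the question words it, run on a made-up stand-in (`toy_gymnasts`, `df_sketch`, and all values are hypothetical):

```r
library(dplyr)

# Made-up stand-in with the same kinds of columns as olympic_gymnasts
toy_gymnasts <- tibble(
  name = c("A", "B"), sex = c("F", "M"), age = c(19, 24),
  team = c("X", "Y"), year = c(2008, 2012),
  sport = "Gymnastics", medal = c("Gold", NA), medalist = c(TRUE, FALSE)
)

# Keep only the six columns listed in the question
df_sketch <- toy_gymnasts %>%
  select(name, sex, age, team, year, medalist)

names(df_sketch)
```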
# Q9. From df, create df2 that only has the years 2008, 2012, and 2016.
df2 <- df %>%
  filter(year %in% c(2008, 2012, 2016))

df2
## # A tibble: 40,210 × 6
##    name                               sex     age team     year medal
##    <chr>                              <chr> <dbl> <chr>   <dbl> <chr>
##  1 A Lamusi                           M        23 China    2012 <NA> 
##  2 Ragnhild Margrethe Aamodt          F        27 Norway   2008 Gold 
##  3 Andreea Aanei                      F        22 Romania  2016 <NA> 
##  4 Jamale (Djamel-) Aarrass (Ahrass-) M        30 France   2012 <NA> 
##  5 Abdelhak Aatakni                   M        24 Morocco  2012 <NA> 
##  6 Moonika Aava                       F        28 Estonia  2008 <NA> 
##  7 Nstor Abad Sanjun                  M        23 Spain    2016 <NA> 
##  8 Nstor Abad Sanjun                  M        23 Spain    2016 <NA> 
##  9 Nstor Abad Sanjun                  M        23 Spain    2016 <NA> 
## 10 Nstor Abad Sanjun                  M        23 Spain    2016 <NA> 
## # ℹ 40,200 more rows
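filter() keeps the rows where its condition is TRUE, and %in% tests each year against the set of wanted years. The same test on a plain made-up vector (`years` is illustrative):

```r
# Made-up years, some inside the wanted set and some outside
years <- c(2004, 2008, 2010, 2012, 2016)

years %in% c(2008, 2012, 2016)          # one TRUE/FALSE per element
years[years %in% c(2008, 2012, 2016)]   # keep only the matching years
```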
# Q10. Group by those three years and summarize the mean age in each group.
df2 %>%
  group_by(year) %>%
  summarize(avg_age = round(mean(age, na.rm = TRUE), 2))
## # A tibble: 3 × 2
##    year avg_age
##   <dbl>   <dbl>
## 1  2008    25.7
## 2  2012    26.0
## 3  2016    26.2
# Q11. Using the full olympic_gymnasts dataset, group by year and find the mean age
#      for each year. Call this oly_year.
#      (Bonus: find the minimum average age across years.)
oly_year <- olympics %>%
  group_by(year) %>%
  summarize(avg_age = round(mean(age, na.rm = TRUE), 2)) %>%
  arrange(avg_age)
oly_year
## # A tibble: 35 × 2
##     year avg_age
##    <dbl>   <dbl>
##  1  1896    23.6
##  2  1980    23.7
##  3  1976    23.8
##  4  1984    23.9
##  5  1988    24.1
##  6  1968    24.2
##  7  1972    24.3
##  8  1992    24.3
##  9  1994    24.4
## 10  1996    24.9
## # ℹ 25 more rows
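Q11 refers to olympic_gymnasts, while the chunk above summarizes the full olympics table. A sketch of the grouped mean plus the bonus question, run on a made-up stand-in (`toy_gymnasts`, `oly_year_sketch`, and all values are hypothetical):

```r
library(dplyr)

# Made-up ages for three Games
toy_gymnasts <- tibble(
  year = c(2008, 2008, 2012, 2012, 2016),
  age  = c(18, 22, 25, 27, 21)
)

# Mean age per year
oly_year_sketch <- toy_gymnasts %>%
  group_by(year) %>%
  summarize(avg_age = mean(age, na.rm = TRUE))

# Bonus: the year whose average age is lowest
oly_year_sketch %>%
  slice_min(avg_age, n = 1)
```

slice_min() returns the whole row, so the year stays attached to the minimum. min(oly_year_sketch$avg_age) would return only the number.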
# Q12. Open-ended: come up with a question that requires at least TWO dplyr verbs.
#      Write the question, then the code that answers it. Below the chunk, briefly
#      explain why you chose this question.

olympics %>%
  group_by(medal) %>%
  summarize(avg_age = round(mean(age, na.rm = TRUE), 2)) %>%
  arrange(desc(avg_age))
## # A tibble: 4 × 2
##   medal  avg_age
##   <chr>    <dbl>
## 1 Silver    26  
## 2 Gold      25.9
## 3 Bronze    25.9
## 4 <NA>      25.5

Your question and reflection: Question: What is the average age of Olympic athletes by medal color (or no medal)?

This question allowed me to use the group_by, summarize, and arrange dplyr functions.

I chose this question because I am curious whether older athletes are more or less likely to win gold than younger athletes. I found almost no difference, which I thought was interesting.