dplyr basics
During ANLY 512 we will be studying the theory and practice of
data visualization. We will be using R and the
packages within R to assemble data and construct many
different types of visualizations. Before we begin studying data
visualizations we need to develop some data wrangling skills. We will
use these skills to wrangle our data into a form that we can use for
visualizations.
The objective of this assignment is to introduce you to RStudio,
R Markdown, the tidyverse, and, more specifically, the dplyr
package.
Each question is worth 5 points.
To submit this homework, create the document in RStudio, knit it with the knitr package (the Knit button in RStudio), and publish it to your RPubs account. Once it is uploaded, submit the link to that document on Canvas. Please make sure that the link is hyperlinked and that I can see the visualization and the code required to create it.
Question #1
Use the nycflights13 package and the flights data frame to answer the following questions:
a. What month had the highest proportion of cancelled flights? February had the highest proportion of cancelled flights.
b. What month had the lowest? October had the lowest proportion of cancelled flights.
library(nycflights13)
library(dplyr) # provides %>%, group_by(), summarize(), and arrange() used below
summary(flights)
## year month day dep_time sched_dep_time
## Min. :2013 Min. : 1.000 Min. : 1.00 Min. : 1 Min. : 106
## 1st Qu.:2013 1st Qu.: 4.000 1st Qu.: 8.00 1st Qu.: 907 1st Qu.: 906
## Median :2013 Median : 7.000 Median :16.00 Median :1401 Median :1359
## Mean :2013 Mean : 6.549 Mean :15.71 Mean :1349 Mean :1344
## 3rd Qu.:2013 3rd Qu.:10.000 3rd Qu.:23.00 3rd Qu.:1744 3rd Qu.:1729
## Max. :2013 Max. :12.000 Max. :31.00 Max. :2400 Max. :2359
## NA's :8255
## dep_delay arr_time sched_arr_time arr_delay
## Min. : -43.00 Min. : 1 Min. : 1 Min. : -86.000
## 1st Qu.: -5.00 1st Qu.:1104 1st Qu.:1124 1st Qu.: -17.000
## Median : -2.00 Median :1535 Median :1556 Median : -5.000
## Mean : 12.64 Mean :1502 Mean :1536 Mean : 6.895
## 3rd Qu.: 11.00 3rd Qu.:1940 3rd Qu.:1945 3rd Qu.: 14.000
## Max. :1301.00 Max. :2400 Max. :2359 Max. :1272.000
## NA's :8255 NA's :8713 NA's :9430
## carrier flight tailnum origin
## Length:336776 Min. : 1 Length:336776 Length:336776
## Class :character 1st Qu.: 553 Class :character Class :character
## Mode :character Median :1496 Mode :character Mode :character
## Mean :1972
## 3rd Qu.:3465
## Max. :8500
##
## dest air_time distance hour
## Length:336776 Min. : 20.0 Min. : 17 Min. : 1.00
## Class :character 1st Qu.: 82.0 1st Qu.: 502 1st Qu.: 9.00
## Mode :character Median :129.0 Median : 872 Median :13.00
## Mean :150.7 Mean :1040 Mean :13.18
## 3rd Qu.:192.0 3rd Qu.:1389 3rd Qu.:17.00
## Max. :695.0 Max. :4983 Max. :23.00
## NA's :9430
## minute time_hour
## Min. : 0.00 Min. :2013-01-01 05:00:00.00
## 1st Qu.: 8.00 1st Qu.:2013-04-04 13:00:00.00
## Median :29.00 Median :2013-07-03 10:00:00.00
## Mean :26.23 Mean :2013-07-03 05:22:54.64
## 3rd Qu.:44.00 3rd Qu.:2013-10-01 07:00:00.00
## Max. :59.00 Max. :2013-12-31 23:00:00.00
##
# 1a. and 1b.: a flight is counted as cancelled when arr_delay is missing
flights_cancelled <- flights %>%
  group_by(month) %>%
  summarize(cancelled_num = sum(is.na(arr_delay)),
            total_num = n(),
            cancelled_prop = cancelled_num / total_num) %>%
  arrange(desc(cancelled_prop))
flights_cancelled
## # A tibble: 12 × 4
## month cancelled_num total_num cancelled_prop
## <int> <int> <int> <dbl>
## 1 2 1340 24951 0.0537
## 2 6 1168 28243 0.0414
## 3 12 1115 28135 0.0396
## 4 7 1132 29425 0.0385
## 5 3 932 28834 0.0323
## 6 4 766 28330 0.0270
## 7 5 668 28796 0.0232
## 8 1 606 27004 0.0224
## 9 9 564 27574 0.0205
## 10 8 571 29327 0.0195
## 11 11 297 27268 0.0109
## 12 10 271 28889 0.00938
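The answer above treats a flight as cancelled when its arr_delay is missing. Because is.na() returns a logical vector, its mean is the proportion of missing values, so the proportion can also be computed directly. A minimal sketch on a small made-up data frame (toy_flights and its values are hypothetical, not taken from nycflights13):

```r
library(dplyr)

# Hypothetical toy data: NA in arr_delay marks a flight treated as cancelled
toy_flights <- tibble(
  month     = c(1, 1, 1, 1, 2, 2),
  arr_delay = c(5, NA, -3, 10, NA, NA)
)

toy_cancelled <- toy_flights %>%
  group_by(month) %>%
  summarize(
    cancelled_num  = sum(is.na(arr_delay)),   # count of missing arrival delays
    total_num      = n(),                     # all flights in the month
    cancelled_prop = mean(is.na(arr_delay))   # same value as cancelled_num / total_num
  ) %>%
  arrange(desc(cancelled_prop))

toy_cancelled
```

In this toy data, month 2 sorts first because both of its flights have a missing arr_delay.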
Question #2
Consider the following pipeline:
library(tidyverse)
mtcars %>%
group_by(cyl) %>%
summarize(avg_mpg = mean(mpg)) %>%
filter(am == 1)
summary(mtcars)
# Corrected pipeline: filter on am while the column still exists
mtcars %>%
  filter(am == 1) %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))
What is the problem with this pipeline?
In this pipeline, the filter should come first. summarize() returns only the grouping column (cyl) and the new summary column (avg_mpg), so the am column no longer exists by the time filter(am == 1) runs, and R raises an error because it cannot find am.
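One way to see the problem is to inspect the column names that survive summarize(). The sketch below uses the built-in mtcars data; the object names by_cyl and manual_by_cyl are chosen here for illustration only.

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))

# Only the grouping column and the summary column remain; am has been dropped
names(by_cyl)

# Filtering before summarizing keeps am available to filter()
manual_by_cyl <- mtcars %>%
  filter(am == 1) %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))

manual_by_cyl
```

The second pipeline returns one row per cylinder count, computed from manual-transmission cars only.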
Question #3
Define two new variables in the Teams data frame in the Lahman package:
batting average (BA). Batting average is the ratio of hits (H) to at-bats (AB).
slugging percentage (SLG). Slugging percentage is total bases divided by at-bats (AB). To compute total bases, you get 1 for a single, 2 for a double, 3 for a triple, and 4 for a home run.
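The total-bases rule can be worked through on a single made-up row. In Lahman's Teams table, H counts every hit, so singles are H minus the extra-base hits, and the weighted sum simplifies to H + X2B + 2*X3B + 3*HR. A minimal sketch, assuming hypothetical season totals (toy_team is not real data):

```r
library(dplyr)

# Hypothetical season totals for one made-up team
toy_team <- tibble(AB = 1000, H = 300, X2B = 50, X3B = 10, HR = 40)

toy_team <- toy_team %>%
  mutate(
    BA       = H / AB,
    singles  = H - X2B - X3B - HR,                    # hits that are not extra-base hits
    TB       = singles + 2 * X2B + 3 * X3B + 4 * HR,  # 1, 2, 3, 4 bases per hit type
    TB_short = H + X2B + 2 * X3B + 3 * HR,            # algebraically the same total
    SLG      = TB / AB
  )

toy_team
```

Both total-bases columns agree (490 here), which gives a slugging percentage of 0.49 for this made-up team.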
library(Lahman)
summary(Teams)
## yearID lgID teamID franchID divID
## Min. :1871 AA: 85 CHN : 146 ATL : 146 Length:2985
## 1st Qu.:1922 AL:1295 PHI : 139 CHC : 146 Class :character
## Median :1967 FL: 16 PIT : 135 CIN : 140 Mode :character
## Mean :1959 NA: 50 CIN : 132 PIT : 140
## 3rd Qu.:1997 NL:1519 SLN : 130 STL : 140
## Max. :2021 PL: 8 BOS : 121 PHI : 139
## UA: 12 (Other):2182 (Other):2134
## Rank G Ghome W
## Min. : 1.000 Min. : 6 Min. :24.00 Min. : 0.00
## 1st Qu.: 2.000 1st Qu.:154 1st Qu.:77.00 1st Qu.: 66.00
## Median : 4.000 Median :159 Median :81.00 Median : 77.00
## Mean : 4.039 Mean :150 Mean :78.05 Mean : 74.61
## 3rd Qu.: 6.000 3rd Qu.:162 3rd Qu.:81.00 3rd Qu.: 87.00
## Max. :13.000 Max. :165 Max. :84.00 Max. :116.00
## NA's :399
## L DivWin WCWin LgWin
## Min. : 4.00 Length:2985 Length:2985 Length:2985
## 1st Qu.: 65.00 Class :character Class :character Class :character
## Median : 76.00 Mode :character Mode :character Mode :character
## Mean : 74.61
## 3rd Qu.: 87.00
## Max. :134.00
##
## WSWin R AB H
## Length:2985 Min. : 24 Min. : 211 Min. : 33
## Class :character 1st Qu.: 614 1st Qu.:5135 1st Qu.:1299
## Mode :character Median : 691 Median :5402 Median :1390
## Mean : 681 Mean :5129 Mean :1339
## 3rd Qu.: 764 3rd Qu.:5519 3rd Qu.:1465
## Max. :1220 Max. :5781 Max. :1783
##
## X2B X3B HR BB
## Min. : 1.0 Min. : 0.00 Min. : 0.0 Min. : 1.0
## 1st Qu.:194.0 1st Qu.: 29.00 1st Qu.: 45.0 1st Qu.:425.8
## Median :234.0 Median : 40.00 Median :110.0 Median :494.0
## Mean :228.7 Mean : 45.67 Mean :105.9 Mean :473.6
## 3rd Qu.:272.0 3rd Qu.: 59.00 3rd Qu.:155.0 3rd Qu.:554.2
## Max. :376.0 Max. :150.00 Max. :307.0 Max. :835.0
## NA's :1
## SO SB CS HBP
## Min. : 3.0 Min. : 1.0 Min. : 3.00 Min. : 7.00
## 1st Qu.: 516.0 1st Qu.: 62.5 1st Qu.: 33.00 1st Qu.: 32.00
## Median : 761.0 Median : 93.0 Median : 44.00 Median : 43.00
## Mean : 762.1 Mean :109.4 Mean : 46.55 Mean : 45.82
## 3rd Qu.: 990.0 3rd Qu.:137.0 3rd Qu.: 56.00 3rd Qu.: 57.00
## Max. :1596.0 Max. :581.0 Max. :191.00 Max. :160.00
## NA's :16 NA's :126 NA's :832 NA's :1158
## SF RA ER ERA
## Min. : 7.00 Min. : 34 Min. : 23.0 Min. :1.220
## 1st Qu.:38.00 1st Qu.: 610 1st Qu.: 503.0 1st Qu.:3.370
## Median :44.00 Median : 689 Median : 594.0 Median :3.840
## Mean :44.11 Mean : 681 Mean : 573.4 Mean :3.841
## 3rd Qu.:50.00 3rd Qu.: 766 3rd Qu.: 671.0 3rd Qu.:4.330
## Max. :77.00 Max. :1252 Max. :1023.0 Max. :8.000
## NA's :1541
## CG SHO SV IPouts
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 162
## 1st Qu.: 9.00 1st Qu.: 6.000 1st Qu.:10.00 1st Qu.:4080
## Median : 41.00 Median : 9.000 Median :25.00 Median :4252
## Mean : 47.55 Mean : 9.588 Mean :24.42 Mean :4013
## 3rd Qu.: 76.00 3rd Qu.:12.000 3rd Qu.:39.00 3rd Qu.:4341
## Max. :148.00 Max. :32.000 Max. :68.00 Max. :4518
##
## HA HRA BBA SOA
## Min. : 49 Min. : 0.0 Min. : 1.0 Min. : 0.0
## 1st Qu.:1287 1st Qu.: 51.0 1st Qu.:429.0 1st Qu.: 511.0
## Median :1389 Median :113.0 Median :495.0 Median : 762.0
## Mean :1339 Mean :105.9 Mean :473.7 Mean : 761.6
## 3rd Qu.:1468 3rd Qu.:153.0 3rd Qu.:554.0 3rd Qu.: 997.0
## Max. :1993 Max. :305.0 Max. :827.0 Max. :1687.0
##
## E DP FP name
## Min. : 20.0 Min. : 0.0 Min. :0.7610 Length:2985
## 1st Qu.:111.0 1st Qu.:116.0 1st Qu.:0.9660 Class :character
## Median :141.0 Median :140.0 Median :0.9770 Mode :character
## Mean :180.8 Mean :132.6 Mean :0.9664
## 3rd Qu.:207.0 3rd Qu.:157.0 3rd Qu.:0.9810
## Max. :639.0 Max. :217.0 Max. :0.9910
##
## park attendance BPF PPF
## Length:2985 Min. : 0 Min. : 60.0 Min. : 60.0
## Class :character 1st Qu.: 538461 1st Qu.: 97.0 1st Qu.: 97.0
## Mode :character Median :1190886 Median :100.0 Median :100.0
## Mean :1376599 Mean :100.2 Mean :100.2
## 3rd Qu.:2066598 3rd Qu.:103.0 3rd Qu.:103.0
## Max. :4483350 Max. :129.0 Max. :141.0
## NA's :279
## teamIDBR teamIDlahman45 teamIDretro
## Length:2985 Length:2985 Length:2985
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
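Teams$BA=Teams$H/Teams$AB
# Caution: H already counts doubles, triples, and home runs, so the weights below
# credit every extra-base hit with one base too many. Total bases proper are
# H + X2B + 2*X3B + 3*HR, which yields lower SLG values than this formula.
Teams$SLG=(Teams$H+2*Teams$X2B+3*Teams$X3B+4*Teams$HR)/Teams$AB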
summary(Teams)
## yearID lgID teamID franchID divID
## Min. :1871 AA: 85 CHN : 146 ATL : 146 Length:2985
## 1st Qu.:1922 AL:1295 PHI : 139 CHC : 146 Class :character
## Median :1967 FL: 16 PIT : 135 CIN : 140 Mode :character
## Mean :1959 NA: 50 CIN : 132 PIT : 140
## 3rd Qu.:1997 NL:1519 SLN : 130 STL : 140
## Max. :2021 PL: 8 BOS : 121 PHI : 139
## UA: 12 (Other):2182 (Other):2134
## Rank G Ghome W
## Min. : 1.000 Min. : 6 Min. :24.00 Min. : 0.00
## 1st Qu.: 2.000 1st Qu.:154 1st Qu.:77.00 1st Qu.: 66.00
## Median : 4.000 Median :159 Median :81.00 Median : 77.00
## Mean : 4.039 Mean :150 Mean :78.05 Mean : 74.61
## 3rd Qu.: 6.000 3rd Qu.:162 3rd Qu.:81.00 3rd Qu.: 87.00
## Max. :13.000 Max. :165 Max. :84.00 Max. :116.00
## NA's :399
## L DivWin WCWin LgWin
## Min. : 4.00 Length:2985 Length:2985 Length:2985
## 1st Qu.: 65.00 Class :character Class :character Class :character
## Median : 76.00 Mode :character Mode :character Mode :character
## Mean : 74.61
## 3rd Qu.: 87.00
## Max. :134.00
##
## WSWin R AB H
## Length:2985 Min. : 24 Min. : 211 Min. : 33
## Class :character 1st Qu.: 614 1st Qu.:5135 1st Qu.:1299
## Mode :character Median : 691 Median :5402 Median :1390
## Mean : 681 Mean :5129 Mean :1339
## 3rd Qu.: 764 3rd Qu.:5519 3rd Qu.:1465
## Max. :1220 Max. :5781 Max. :1783
##
## X2B X3B HR BB
## Min. : 1.0 Min. : 0.00 Min. : 0.0 Min. : 1.0
## 1st Qu.:194.0 1st Qu.: 29.00 1st Qu.: 45.0 1st Qu.:425.8
## Median :234.0 Median : 40.00 Median :110.0 Median :494.0
## Mean :228.7 Mean : 45.67 Mean :105.9 Mean :473.6
## 3rd Qu.:272.0 3rd Qu.: 59.00 3rd Qu.:155.0 3rd Qu.:554.2
## Max. :376.0 Max. :150.00 Max. :307.0 Max. :835.0
## NA's :1
## SO SB CS HBP
## Min. : 3.0 Min. : 1.0 Min. : 3.00 Min. : 7.00
## 1st Qu.: 516.0 1st Qu.: 62.5 1st Qu.: 33.00 1st Qu.: 32.00
## Median : 761.0 Median : 93.0 Median : 44.00 Median : 43.00
## Mean : 762.1 Mean :109.4 Mean : 46.55 Mean : 45.82
## 3rd Qu.: 990.0 3rd Qu.:137.0 3rd Qu.: 56.00 3rd Qu.: 57.00
## Max. :1596.0 Max. :581.0 Max. :191.00 Max. :160.00
## NA's :16 NA's :126 NA's :832 NA's :1158
## SF RA ER ERA
## Min. : 7.00 Min. : 34 Min. : 23.0 Min. :1.220
## 1st Qu.:38.00 1st Qu.: 610 1st Qu.: 503.0 1st Qu.:3.370
## Median :44.00 Median : 689 Median : 594.0 Median :3.840
## Mean :44.11 Mean : 681 Mean : 573.4 Mean :3.841
## 3rd Qu.:50.00 3rd Qu.: 766 3rd Qu.: 671.0 3rd Qu.:4.330
## Max. :77.00 Max. :1252 Max. :1023.0 Max. :8.000
## NA's :1541
## CG SHO SV IPouts
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 162
## 1st Qu.: 9.00 1st Qu.: 6.000 1st Qu.:10.00 1st Qu.:4080
## Median : 41.00 Median : 9.000 Median :25.00 Median :4252
## Mean : 47.55 Mean : 9.588 Mean :24.42 Mean :4013
## 3rd Qu.: 76.00 3rd Qu.:12.000 3rd Qu.:39.00 3rd Qu.:4341
## Max. :148.00 Max. :32.000 Max. :68.00 Max. :4518
##
## HA HRA BBA SOA
## Min. : 49 Min. : 0.0 Min. : 1.0 Min. : 0.0
## 1st Qu.:1287 1st Qu.: 51.0 1st Qu.:429.0 1st Qu.: 511.0
## Median :1389 Median :113.0 Median :495.0 Median : 762.0
## Mean :1339 Mean :105.9 Mean :473.7 Mean : 761.6
## 3rd Qu.:1468 3rd Qu.:153.0 3rd Qu.:554.0 3rd Qu.: 997.0
## Max. :1993 Max. :305.0 Max. :827.0 Max. :1687.0
##
## E DP FP name
## Min. : 20.0 Min. : 0.0 Min. :0.7610 Length:2985
## 1st Qu.:111.0 1st Qu.:116.0 1st Qu.:0.9660 Class :character
## Median :141.0 Median :140.0 Median :0.9770 Mode :character
## Mean :180.8 Mean :132.6 Mean :0.9664
## 3rd Qu.:207.0 3rd Qu.:157.0 3rd Qu.:0.9810
## Max. :639.0 Max. :217.0 Max. :0.9910
##
## park attendance BPF PPF
## Length:2985 Min. : 0 Min. : 60.0 Min. : 60.0
## Class :character 1st Qu.: 538461 1st Qu.: 97.0 1st Qu.: 97.0
## Mode :character Median :1190886 Median :100.0 Median :100.0
## Mean :1376599 Mean :100.2 Mean :100.2
## 3rd Qu.:2066598 3rd Qu.:103.0 3rd Qu.:103.0
## Max. :4483350 Max. :129.0 Max. :141.0
## NA's :279
## teamIDBR teamIDlahman45 teamIDretro BA
## Length:2985 Length:2985 Length:2985 Min. :0.1564
## Class :character Class :character Class :character 1st Qu.:0.2494
## Mode :character Mode :character Mode :character Median :0.2600
## Mean :0.2607
## 3rd Qu.:0.2708
## Max. :0.3498
##
## SLG
## Min. :0.1659
## 1st Qu.:0.4192
## Median :0.4596
## Mean :0.4561
## 3rd Qu.:0.4950
## Max. :0.6093
##
head(select(Teams,BA,SLG))
## BA SLG
## 1 0.3104956 0.5021866
## 2 0.2700669 0.4431438
## 3 0.2765599 0.4603710
## 4 0.2386059 0.3324397
## 5 0.2870370 0.3960114
## 6 0.3200625 0.5144418
Question #4
Using the Teams data frame in the Lahman package, display the top 5 teams ranked in
terms of slugging percentage (SLG) in Major League Baseball history.
Repeat this using teams since 1969. Slugging percentage is total bases
divided by at-bats. To compute total bases, you get 1 for a single, 2 for
a double, 3 for a triple, and 4 for a home run.
library(Lahman)
summary(Teams)
## yearID lgID teamID franchID divID
## Min. :1871 AA: 85 CHN : 146 ATL : 146 Length:2985
## 1st Qu.:1922 AL:1295 PHI : 139 CHC : 146 Class :character
## Median :1967 FL: 16 PIT : 135 CIN : 140 Mode :character
## Mean :1959 NA: 50 CIN : 132 PIT : 140
## 3rd Qu.:1997 NL:1519 SLN : 130 STL : 140
## Max. :2021 PL: 8 BOS : 121 PHI : 139
## UA: 12 (Other):2182 (Other):2134
## Rank G Ghome W
## Min. : 1.000 Min. : 6 Min. :24.00 Min. : 0.00
## 1st Qu.: 2.000 1st Qu.:154 1st Qu.:77.00 1st Qu.: 66.00
## Median : 4.000 Median :159 Median :81.00 Median : 77.00
## Mean : 4.039 Mean :150 Mean :78.05 Mean : 74.61
## 3rd Qu.: 6.000 3rd Qu.:162 3rd Qu.:81.00 3rd Qu.: 87.00
## Max. :13.000 Max. :165 Max. :84.00 Max. :116.00
## NA's :399
## L DivWin WCWin LgWin
## Min. : 4.00 Length:2985 Length:2985 Length:2985
## 1st Qu.: 65.00 Class :character Class :character Class :character
## Median : 76.00 Mode :character Mode :character Mode :character
## Mean : 74.61
## 3rd Qu.: 87.00
## Max. :134.00
##
## WSWin R AB H
## Length:2985 Min. : 24 Min. : 211 Min. : 33
## Class :character 1st Qu.: 614 1st Qu.:5135 1st Qu.:1299
## Mode :character Median : 691 Median :5402 Median :1390
## Mean : 681 Mean :5129 Mean :1339
## 3rd Qu.: 764 3rd Qu.:5519 3rd Qu.:1465
## Max. :1220 Max. :5781 Max. :1783
##
## X2B X3B HR BB
## Min. : 1.0 Min. : 0.00 Min. : 0.0 Min. : 1.0
## 1st Qu.:194.0 1st Qu.: 29.00 1st Qu.: 45.0 1st Qu.:425.8
## Median :234.0 Median : 40.00 Median :110.0 Median :494.0
## Mean :228.7 Mean : 45.67 Mean :105.9 Mean :473.6
## 3rd Qu.:272.0 3rd Qu.: 59.00 3rd Qu.:155.0 3rd Qu.:554.2
## Max. :376.0 Max. :150.00 Max. :307.0 Max. :835.0
## NA's :1
## SO SB CS HBP
## Min. : 3.0 Min. : 1.0 Min. : 3.00 Min. : 7.00
## 1st Qu.: 516.0 1st Qu.: 62.5 1st Qu.: 33.00 1st Qu.: 32.00
## Median : 761.0 Median : 93.0 Median : 44.00 Median : 43.00
## Mean : 762.1 Mean :109.4 Mean : 46.55 Mean : 45.82
## 3rd Qu.: 990.0 3rd Qu.:137.0 3rd Qu.: 56.00 3rd Qu.: 57.00
## Max. :1596.0 Max. :581.0 Max. :191.00 Max. :160.00
## NA's :16 NA's :126 NA's :832 NA's :1158
## SF RA ER ERA
## Min. : 7.00 Min. : 34 Min. : 23.0 Min. :1.220
## 1st Qu.:38.00 1st Qu.: 610 1st Qu.: 503.0 1st Qu.:3.370
## Median :44.00 Median : 689 Median : 594.0 Median :3.840
## Mean :44.11 Mean : 681 Mean : 573.4 Mean :3.841
## 3rd Qu.:50.00 3rd Qu.: 766 3rd Qu.: 671.0 3rd Qu.:4.330
## Max. :77.00 Max. :1252 Max. :1023.0 Max. :8.000
## NA's :1541
## CG SHO SV IPouts
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 162
## 1st Qu.: 9.00 1st Qu.: 6.000 1st Qu.:10.00 1st Qu.:4080
## Median : 41.00 Median : 9.000 Median :25.00 Median :4252
## Mean : 47.55 Mean : 9.588 Mean :24.42 Mean :4013
## 3rd Qu.: 76.00 3rd Qu.:12.000 3rd Qu.:39.00 3rd Qu.:4341
## Max. :148.00 Max. :32.000 Max. :68.00 Max. :4518
##
## HA HRA BBA SOA
## Min. : 49 Min. : 0.0 Min. : 1.0 Min. : 0.0
## 1st Qu.:1287 1st Qu.: 51.0 1st Qu.:429.0 1st Qu.: 511.0
## Median :1389 Median :113.0 Median :495.0 Median : 762.0
## Mean :1339 Mean :105.9 Mean :473.7 Mean : 761.6
## 3rd Qu.:1468 3rd Qu.:153.0 3rd Qu.:554.0 3rd Qu.: 997.0
## Max. :1993 Max. :305.0 Max. :827.0 Max. :1687.0
##
## E DP FP name
## Min. : 20.0 Min. : 0.0 Min. :0.7610 Length:2985
## 1st Qu.:111.0 1st Qu.:116.0 1st Qu.:0.9660 Class :character
## Median :141.0 Median :140.0 Median :0.9770 Mode :character
## Mean :180.8 Mean :132.6 Mean :0.9664
## 3rd Qu.:207.0 3rd Qu.:157.0 3rd Qu.:0.9810
## Max. :639.0 Max. :217.0 Max. :0.9910
##
## park attendance BPF PPF
## Length:2985 Min. : 0 Min. : 60.0 Min. : 60.0
## Class :character 1st Qu.: 538461 1st Qu.: 97.0 1st Qu.: 97.0
## Mode :character Median :1190886 Median :100.0 Median :100.0
## Mean :1376599 Mean :100.2 Mean :100.2
## 3rd Qu.:2066598 3rd Qu.:103.0 3rd Qu.:103.0
## Max. :4483350 Max. :129.0 Max. :141.0
## NA's :279
## teamIDBR teamIDlahman45 teamIDretro BA
## Length:2985 Length:2985 Length:2985 Min. :0.1564
## Class :character Class :character Class :character 1st Qu.:0.2494
## Mode :character Mode :character Mode :character Median :0.2600
## Mean :0.2607
## 3rd Qu.:0.2708
## Max. :0.3498
##
## SLG
## Min. :0.1659
## 1st Qu.:0.4192
## Median :0.4596
## Mean :0.4561
## 3rd Qu.:0.4950
## Max. :0.6093
##
#All Teams
Teams %>%
group_by(teamID)%>%
arrange(desc(SLG))%>%
head(5)
## # A tibble: 5 × 50
## # Groups: teamID [5]
## yearID lgID teamID franchID divID Rank G Ghome W L DivWin WCWin
## <int> <fct> <fct> <fct> <chr> <int> <int> <int> <int> <int> <chr> <chr>
## 1 2019 AL HOU HOU W 1 162 81 107 55 Y N
## 2 2019 AL MIN MIN C 1 162 81 101 61 Y N
## 3 2003 AL BOS BOS E 2 162 81 95 67 N Y
## 4 2019 AL NYA NYY E 1 162 81 103 59 Y N
## 5 2020 NL ATL ATL E 1 60 30 35 25 Y N
## # … with 38 more variables: LgWin <chr>, WSWin <chr>, R <int>, AB <int>,
## # H <int>, X2B <int>, X3B <int>, HR <int>, BB <int>, SO <int>, SB <int>,
## # CS <int>, HBP <int>, SF <int>, RA <int>, ER <int>, ERA <dbl>, CG <int>,
## # SHO <int>, SV <int>, IPouts <int>, HA <int>, HRA <int>, BBA <int>,
## # SOA <int>, E <int>, DP <int>, FP <dbl>, name <chr>, park <chr>,
## # attendance <int>, BPF <int>, PPF <int>, teamIDBR <chr>,
## # teamIDlahman45 <chr>, teamIDretro <chr>, BA <dbl>, SLG <dbl>
#Teams since 1969
Teams %>%
filter(yearID>=1969)%>%
group_by(teamID)%>%
arrange(desc(SLG))%>%
head(5)
## # A tibble: 5 × 50
## # Groups: teamID [5]
## yearID lgID teamID franchID divID Rank G Ghome W L DivWin WCWin
## <int> <fct> <fct> <fct> <chr> <int> <int> <int> <int> <int> <chr> <chr>
## 1 2019 AL HOU HOU W 1 162 81 107 55 Y N
## 2 2019 AL MIN MIN C 1 162 81 101 61 Y N
## 3 2003 AL BOS BOS E 2 162 81 95 67 N Y
## 4 2019 AL NYA NYY E 1 162 81 103 59 Y N
## 5 2020 NL ATL ATL E 1 60 30 35 25 Y N
## # … with 38 more variables: LgWin <chr>, WSWin <chr>, R <int>, AB <int>,
## # H <int>, X2B <int>, X3B <int>, HR <int>, BB <int>, SO <int>, SB <int>,
## # CS <int>, HBP <int>, SF <int>, RA <int>, ER <int>, ERA <dbl>, CG <int>,
## # SHO <int>, SV <int>, IPouts <int>, HA <int>, HRA <int>, BBA <int>,
## # SOA <int>, E <int>, DP <int>, FP <dbl>, name <chr>, park <chr>,
## # attendance <int>, BPF <int>, PPF <int>, teamIDBR <chr>,
## # teamIDlahman45 <chr>, teamIDretro <chr>, BA <dbl>, SLG <dbl>
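The printouts above are 50 columns wide, so the SLG column itself is hidden. One possible alternative, sketched here under the assumption that dplyr 1.0.0 or later is installed, uses slice_max() to take the top rows and select() to keep only the identifying columns. The data frame toy_teams and its values are hypothetical stand-ins for Teams.

```r
library(dplyr)

# Hypothetical stand-in for Teams with SLG already computed
toy_teams <- tibble(
  yearID = c(1927, 1950, 1969, 1994, 2003, 2019, 2019),
  name   = c("Team A", "Team B", "Team C", "Team D", "Team E", "Team F", "Team G"),
  SLG    = c(0.489, 0.464, 0.410, 0.484, 0.491, 0.495, 0.494)
)

# Top 5 over all seasons, highest SLG first
top_all <- toy_teams %>%
  slice_max(SLG, n = 5) %>%
  select(yearID, name, SLG)

# Top 5 restricted to seasons from 1969 onward
top_since_1969 <- toy_teams %>%
  filter(yearID >= 1969) %>%
  slice_max(SLG, n = 5) %>%
  select(yearID, name, SLG)

top_all
top_since_1969
```

No group_by() is needed here, since the ranking is over individual team seasons rather than within groups.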
Question #5
Use the Batting, Pitching, and People tables in the Lahman package to answer the following questions.
a. Name every player in baseball history who has accumulated at least 300 home runs (HR) and at least 300 stolen bases (SB). You can find the first and last name of the player in the People data frame (named Master in older releases of Lahman). Join this to your result along with the total home runs and total bases stolen for each of these elite players.
b. Similarly, name every pitcher in baseball history who has accumulated at least 300 wins (W) and at least 3,000 strikeouts (SO).
c. Identify the name and year of every player who has hit at least 50 home runs in a single season. Which player had the lowest batting average in that season? Pete Alonso had the lowest batting average among these 50-home-run seasons.
library(Lahman)
summary(Batting)
## playerID yearID stint teamID lgID
## Length:110495 Min. :1871 Min. :1.00 CHN : 5129 AA: 1893
## Class :character 1st Qu.:1938 1st Qu.:1.00 PHI : 5026 AL:50965
## Mode :character Median :1977 Median :1.00 PIT : 4984 FL: 472
## Mean :1968 Mean :1.08 SLN : 4904 NA: 737
## 3rd Qu.:2002 3rd Qu.:1.00 CIN : 4786 NL:55945
## Max. :2021 Max. :5.00 CLE : 4731 PL: 149
## (Other):80935 UA: 334
## G AB R H
## Min. : 1.00 Min. : 0.0 Min. : 0.0 Min. : 0.00
## 1st Qu.: 12.00 1st Qu.: 3.0 1st Qu.: 0.0 1st Qu.: 0.00
## Median : 34.00 Median : 45.0 Median : 4.0 Median : 8.00
## Mean : 50.61 Mean :138.6 Mean : 18.4 Mean : 36.18
## 3rd Qu.: 78.00 3rd Qu.:222.0 3rd Qu.: 26.0 3rd Qu.: 55.00
## Max. :165.00 Max. :716.0 Max. :198.0 Max. :262.00
##
## X2B X3B HR RBI
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 1.000 Median : 0.000 Median : 0.00 Median : 3.00
## Mean : 6.177 Mean : 1.234 Mean : 2.86 Mean : 16.72
## 3rd Qu.: 9.000 3rd Qu.: 1.000 3rd Qu.: 2.00 3rd Qu.: 24.00
## Max. :67.000 Max. :36.000 Max. :73.00 Max. :191.00
## NA's :756
## SB CS BB SO
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 1.00
## Median : 0.000 Median : 0.000 Median : 2.00 Median : 9.00
## Mean : 2.893 Mean : 1.162 Mean : 12.79 Mean : 20.62
## 3rd Qu.: 2.000 3rd Qu.: 1.000 3rd Qu.: 18.00 3rd Qu.: 29.00
## Max. :138.000 Max. :42.000 Max. :232.00 Max. :223.00
## NA's :2368 NA's :23541 NA's :2100
## IBB HBP SH SF
## Min. : 0.00 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 0.00 Median : 0.000 Median : 0.000 Median : 0.00
## Mean : 1.04 Mean : 1.061 Mean : 2.169 Mean : 1.01
## 3rd Qu.: 1.00 3rd Qu.: 1.000 3rd Qu.: 3.000 3rd Qu.: 1.00
## Max. :120.00 Max. :51.000 Max. :67.000 Max. :19.00
## NA's :36650 NA's :2816 NA's :6068 NA's :36103
## GIDP
## Min. : 0.000
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 2.875
## 3rd Qu.: 4.000
## Max. :36.000
## NA's :25441
summary(Pitching)
## playerID yearID stint teamID lgID
## Length:49430 Min. :1871 Min. :1.000 PHI : 2277 AA: 657
## Class :character 1st Qu.:1946 1st Qu.:1.000 CHN : 2225 AL:23279
## Mode :character Median :1984 Median :1.000 PIT : 2192 FL: 173
## Mean :1973 Mean :1.082 SLN : 2187 NA: 132
## 3rd Qu.:2006 3rd Qu.:1.000 CIN : 2119 NL:25035
## Max. :2021 Max. :5.000 CLE : 2096 PL: 58
## (Other):36334 UA: 96
## W L G GS
## Min. : 0.000 Min. : 0.000 Min. : 1.00 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 7.00 1st Qu.: 0.000
## Median : 2.000 Median : 3.000 Median : 21.00 Median : 2.000
## Mean : 4.504 Mean : 4.504 Mean : 23.42 Mean : 9.059
## 3rd Qu.: 7.000 3rd Qu.: 7.000 3rd Qu.: 34.00 3rd Qu.:16.000
## Max. :60.000 Max. :48.000 Max. :106.00 Max. :75.000
##
## CG SHO SV IPouts
## Min. : 0.000 Min. : 0.0000 Min. : 0.000 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 46.0
## Median : 0.000 Median : 0.0000 Median : 0.000 Median : 155.0
## Mean : 2.871 Mean : 0.4104 Mean : 1.475 Mean : 242.4
## 3rd Qu.: 2.000 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 370.0
## Max. :75.000 Max. :16.0000 Max. :62.000 Max. :2040.0
##
## H ER HR BB
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 17.00 1st Qu.: 9.00 1st Qu.: 1.000 1st Qu.: 7.00
## Median : 51.00 Median : 23.00 Median : 4.000 Median : 20.00
## Mean : 80.87 Mean : 34.67 Mean : 6.393 Mean : 28.61
## 3rd Qu.:125.00 3rd Qu.: 55.00 3rd Qu.: 9.000 3rd Qu.: 43.00
## Max. :772.00 Max. :291.00 Max. :50.000 Max. :289.00
##
## SO BAOpp ERA IBB
## Min. : 0.00 Min. :0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 8.00 1st Qu.:0.238 1st Qu.: 3.150 1st Qu.: 0.000
## Median : 30.00 Median :0.265 Median : 4.160 Median : 1.000
## Mean : 45.99 Mean :0.311 Mean : 5.152 Mean : 2.205
## 3rd Qu.: 67.00 3rd Qu.:0.300 3rd Qu.: 5.580 3rd Qu.: 3.000
## Max. :513.00 Max. :9.990 Max. :189.000 Max. :23.000
## NA's :4441 NA's :97 NA's :14578
## WP HBP BK BFP
## Min. : 0.000 Min. : 0.000 Min. : 0.0000 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 70.0
## Median : 1.000 Median : 1.000 Median : 0.0000 Median : 225.0
## Mean : 2.559 Mean : 2.351 Mean : 0.2896 Mean : 345.8
## 3rd Qu.: 4.000 3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.: 529.0
## Max. :83.000 Max. :54.000 Max. :16.0000 Max. :2906.0
## NA's :734 NA's :3
## GF R SH SF
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 10.00 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 2.000 Median : 26.00 Median : 1.000 Median : 1.000
## Mean : 6.184 Mean : 41.12 Mean : 2.595 Mean : 2.106
## 3rd Qu.: 8.000 3rd Qu.: 64.00 3rd Qu.: 4.000 3rd Qu.: 3.000
## Max. :84.000 Max. :519.00 Max. :27.000 Max. :17.000
## NA's :19187 NA's :19187
## GIDP
## Min. : 0.000
## 1st Qu.: 1.000
## Median : 3.000
## Mean : 5.664
## 3rd Qu.: 8.000
## Max. :47.000
## NA's :20318
#HR and SB
Batting%>%
group_by(playerID)%>%
summarize(Homerun=sum(HR),stolenbases=sum(SB))%>%
filter(Homerun>=300 & stolenbases >=300)%>%
inner_join(People, by=c("playerID"="playerID"))%>%
select(nameFirst,nameLast,nameGiven,Homerun,stolenbases)
## # A tibble: 8 × 5
## nameFirst nameLast nameGiven Homerun stolenbases
## <chr> <chr> <chr> <int> <int>
## 1 Carlos Beltran Carlos Ivan 435 312
## 2 Barry Bonds Barry Lamar 762 514
## 3 Bobby Bonds Bobby Lee 332 461
## 4 Andre Dawson Andre Nolan 438 314
## 5 Steve Finley Steven Allen 304 320
## 6 Willie Mays Willie Howard 660 338
## 7 Alex Rodriguez Alexander Enmanuel 696 329
## 8 Reggie Sanders Reginald Laverne 305 304
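The aggregate, filter, and join pattern used above can be sketched on toy data. sum() returns NA when any value is missing unless na.rm = TRUE, and the Batting summary shows that SB contains NA values, so this sketch sets na.rm = TRUE. That is a choice made here, not something the original code does. The tables toy_batting and toy_people and all their values are hypothetical.

```r
library(dplyr)

# Hypothetical season rows; bbb01 has a missing SB value in one season
toy_batting <- tibble(
  playerID = c("aaa01", "aaa01", "bbb01", "bbb01"),
  yearID   = c(2000, 2001, 2000, 2001),
  HR       = c(200, 150, 310, 20),
  SB       = c(180, 160, NA, 40)
)

# Hypothetical name lookup table keyed by playerID
toy_people <- tibble(
  playerID  = c("aaa01", "bbb01"),
  nameFirst = c("Ann", "Bob"),
  nameLast  = c("Alpha", "Beta")
)

elite <- toy_batting %>%
  group_by(playerID) %>%
  summarize(
    homeruns    = sum(HR, na.rm = TRUE),   # career home runs
    stolenbases = sum(SB, na.rm = TRUE)    # career stolen bases, ignoring missing seasons
  ) %>%
  filter(homeruns >= 300, stolenbases >= 300) %>%
  inner_join(toy_people, by = "playerID") %>%
  select(nameFirst, nameLast, homeruns, stolenbases)

elite
```

Only the first toy player clears both thresholds, so the result has a single row.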
#Wins and Strikeouts
Pitching%>%
group_by(playerID)%>%
summarize(wins=sum(W),strikeouts=sum(SO))%>%
filter(wins>=300 & strikeouts >=3000)%>%
inner_join(People, by=c("playerID"="playerID"))%>%
select(nameFirst,nameLast,nameGiven,wins,strikeouts)
## # A tibble: 10 × 5
## nameFirst nameLast nameGiven wins strikeouts
## <chr> <chr> <chr> <int> <int>
## 1 Steve Carlton Steven Norman 329 4136
## 2 Roger Clemens William Roger 354 4672
## 3 Randy Johnson Randall David 303 4875
## 4 Walter Johnson Walter Perry 417 3509
## 5 Greg Maddux Gregory Alan 355 3371
## 6 Phil Niekro Philip Henry 318 3342
## 7 Gaylord Perry Gaylord Jackson 314 3534
## 8 Nolan Ryan Lynn Nolan 324 5714
## 9 Tom Seaver George Thomas 311 3640
## 10 Don Sutton Donald Howard 324 3574
#Homeruns in a single season
Batting%>%
group_by(playerID,yearID)%>%
summarize(homeruns=sum(HR),batAvg=sum(H)/sum(AB))%>%
filter(homeruns >=50)%>%
inner_join(People, by=c("playerID"="playerID"))%>%
select(nameFirst,nameLast,nameGiven,homeruns,batAvg)%>%
arrange(batAvg)
## # A tibble: 46 × 6
## # Groups: playerID [30]
## playerID nameFirst nameLast nameGiven homeruns batAvg
## <chr> <chr> <chr> <chr> <int> <dbl>
## 1 alonspe01 Pete Alonso Peter Morgan 53 0.260
## 2 bautijo02 Jose Bautista Jose Antonio 54 0.260
## 3 jonesan01 Andruw Jones Andruw Rudolf 51 0.263
## 4 marisro01 Roger Maris Roger Eugene 61 0.269
## 5 vaughgr01 Greg Vaughn Gregory Lamont 50 0.272
## 6 mcgwima01 Mark McGwire Mark David 58 0.274
## 7 fieldce01 Cecil Fielder Cecil Grant 51 0.277
## 8 mcgwima01 Mark McGwire Mark David 65 0.278
## 9 stantmi03 Giancarlo Stanton Giancarlo Cruz-Michael 59 0.281
## 10 judgeaa01 Aaron Judge Aaron James 52 0.284
## # … with 36 more rows
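The question asks for the year as well as the name, and the printout above does not show yearID. A possible variant, sketched on hypothetical toy data, keeps yearID in select() and passes .groups = "drop" to summarize() (available in dplyr 1.0.0 and later) so the result is no longer grouped. Summing H and AB before dividing combines multiple stints within one season.

```r
library(dplyr)

# Hypothetical rows: ccc01 played two stints in 2010, ddd01 one stint in 2011
toy_batting <- tibble(
  playerID = c("ccc01", "ccc01", "ddd01"),
  yearID   = c(2010, 2010, 2011),
  stint    = c(1, 2, 1),
  AB       = c(300, 250, 600),
  H        = c(80, 60, 180),
  HR       = c(30, 22, 55)
)

# Hypothetical name lookup table keyed by playerID
toy_people <- tibble(
  playerID  = c("ccc01", "ddd01"),
  nameFirst = c("Cal", "Dee"),
  nameLast  = c("Gamma", "Delta")
)

sluggers <- toy_batting %>%
  group_by(playerID, yearID) %>%
  summarize(
    homeruns = sum(HR),            # season total across stints
    batAvg   = sum(H) / sum(AB),   # season batting average across stints
    .groups  = "drop"
  ) %>%
  filter(homeruns >= 50) %>%
  inner_join(toy_people, by = "playerID") %>%
  select(nameFirst, nameLast, yearID, homeruns, batAvg) %>%
  arrange(batAvg)

sluggers
```

The first row of the result is the 50-home-run season with the lowest batting average, together with its year.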