Reading the data (here from a local copy, MLB_cleaned.csv):
library(dplyr)  # provides %>%, arrange(), slice_head(), select(), group_by(), summarize_all()
mlb <- read.csv("MLB_cleaned.csv")
head(mlb)
## First.Name Last.Name Team Position Height.inches. Weight.pounds. Age
## 1 Jeff Mathis ANA Catcher 72 180 23.92
## 2 Mike Napoli ANA Catcher 72 205 25.33
## 3 Jose Molina ANA Catcher 74 220 31.74
## 4 Howie Kendrick ANA First Baseman 70 180 23.64
## 5 Kendry Morales ANA First Baseman 73 220 23.70
## 6 Casey Kotchman ANA First Baseman 75 210 24.02
knitr::kable(summary(mlb))
| First.Name | Last.Name | Team | Position | Height.inches. | Weight.pounds. | Age |
|---|---|---|---|---|---|---|
| Length:1034 | Length:1034 | Length:1034 | Length:1034 | Min. :67.0 | Min. :150.0 | Min. :20.90 |
| Class :character | Class :character | Class :character | Class :character | 1st Qu.:72.0 | 1st Qu.:187.0 | 1st Qu.:25.44 |
| Mode :character | Mode :character | Mode :character | Mode :character | Median :74.0 | Median :200.0 | Median :27.93 |
| NA | NA | NA | NA | Mean :73.7 | Mean :201.7 | Mean :28.74 |
| NA | NA | NA | NA | 3rd Qu.:75.0 | 3rd Qu.:215.0 | 3rd Qu.:31.23 |
| NA | NA | NA | NA | Max. :83.0 | Max. :290.0 | Max. :48.52 |
plotly::plot_ly(mlb, x = ~Height.inches., type = "histogram")
The histogram of player heights (in inches) shows that most players are about 72 to 76 inches tall. The two shortest players, represented by the leftmost bar, are 67 in. tall, while the tallest player is 83 in. tall.
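The reading of the histogram can also be checked numerically. The following is a minimal sketch that uses only the six rows printed by `head(mlb)` above as a stand-in; the name `mlb_sample` is hypothetical, and on the full data the same expressions would be applied to `mlb` instead.

```r
# Stand-in data: heights of the six players printed by head(mlb).
# `mlb_sample` is a hypothetical name, not part of the original analysis.
mlb_sample <- data.frame(
  Last.Name = c("Mathis", "Napoli", "Molina", "Kendrick", "Morales", "Kotchman"),
  Height.inches. = c(72, 72, 74, 70, 73, 75)
)

# Share of players whose height falls in the 72-76 inch band
in_band <- mlb_sample$Height.inches. >= 72 & mlb_sample$Height.inches. <= 76
share_in_band <- mean(in_band)

# Shortest height in the sample and how many players share it
shortest <- min(mlb_sample$Height.inches.)
n_shortest <- sum(mlb_sample$Height.inches. == shortest)
```

Because the sample holds only six players, its numbers will not match the full-league figures quoted above; the point is the pattern of `mean()` over a logical vector and `sum()` over an equality test.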
Number of players:
playerCount <- mlb$Team %>% table %>% data.frame
colnames(playerCount) <- c("Team", "Number.Players")
playerCount
## Team Number.Players
## 1 ANA 35
## 2 ARZ 28
## 3 ATL 37
## 4 BAL 35
## 5 BOS 36
## 6 CHC 36
## 7 CIN 36
## 8 CLE 35
## 9 COL 35
## 10 CWS 33
## 11 DET 37
## 12 FLA 32
## 13 HOU 34
## 14 KC 35
## 15 LA 33
## 16 MIN 33
## 17 MLW 35
## 18 NYM 38
## 19 NYY 32
## 20 OAK 37
## 21 PHI 36
## 22 PIT 35
## 23 SD 33
## 24 SEA 34
## 25 SF 34
## 26 STL 32
## 27 TB 33
## 28 TEX 35
## 29 TOR 34
## 30 WAS 36
plotly::plot_ly(playerCount, x = ~Team, y = ~Number.Players, type = "bar")
Texas has 35 players in total, which places it near the middle of the distribution of team sizes.
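One way to back up the "middle of the distribution" observation is to compare the Texas count with the league median. This sketch copies the counts from the `playerCount` output above into a named vector; the names `counts`, `team_median`, and `share_at_or_below_tex` are illustrative, not part of the original analysis.

```r
# Player counts per team, copied from the playerCount table above
counts <- c(
  ANA = 35, ARZ = 28, ATL = 37, BAL = 35, BOS = 36, CHC = 36,
  CIN = 36, CLE = 35, COL = 35, CWS = 33, DET = 37, FLA = 32,
  HOU = 34, KC = 35, LA = 33, MIN = 33, MLW = 35, NYM = 38,
  NYY = 32, OAK = 37, PHI = 36, PIT = 35, SD = 33, SEA = 34,
  SF = 34, STL = 32, TB = 33, TEX = 35, TOR = 34, WAS = 36
)

# Median roster size across the 30 teams
team_median <- median(counts)

# Fraction of teams whose roster is no larger than Texas's
share_at_or_below_tex <- mean(counts <= counts[["TEX"]])
```

With these counts the median roster size is 35, the same as Texas, and 21 of the 30 teams have a roster no larger than Texas's.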
Youngest player:
TEX <- mlb %>% subset(Team == "TEX")
# Sort the Texas roster by age, then show the five youngest players
TEX <- TEX %>% arrange(Age)
knitr::kable(head(TEX, n = 5))
| First.Name | Last.Name | Team | Position | Height.inches. | Weight.pounds. | Age |
|---|---|---|---|---|---|---|
| Joaquin | Arias | TEX | Shortstop | 73 | 160 | 22.44 |
| Brandon | McCarthy | TEX | Starting Pitcher | 79 | 190 | 23.65 |
| Edinson | Volquez | TEX | Starting Pitcher | 73 | 190 | 23.66 |
| Scott | Feldman | TEX | Relief Pitcher | 77 | 210 | 24.06 |
| Wes | Littleton | TEX | Relief Pitcher | 74 | 210 | 24.49 |
plotly::plot_ly(TEX, x = ~Last.Name, y = ~Age, type = "bar")
TEX %>% slice_head(n = 1) %>% knitr::kable()
| First.Name | Last.Name | Team | Position | Height.inches. | Weight.pounds. | Age |
|---|---|---|---|---|---|---|
| Joaquin | Arias | TEX | Shortstop | 73 | 160 | 22.44 |
The youngest player is Joaquin Arias, who is 22.44 years old.
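The youngest player can also be located without sorting the whole roster. The sketch below assumes a stand-in data frame (hypothetically named `tex_sample`) built from the five rows shown in the table above, and uses base R's `which.min()` to pick the row with the smallest age.

```r
# Stand-in data: the five youngest Texas players from the table above.
# `tex_sample` is a hypothetical name used only for this illustration.
tex_sample <- data.frame(
  First.Name = c("Joaquin", "Brandon", "Edinson", "Scott", "Wes"),
  Last.Name = c("Arias", "McCarthy", "Volquez", "Feldman", "Littleton"),
  Age = c(22.44, 23.65, 23.66, 24.06, 24.49)
)

# which.min() returns the index of the first minimum, so no sort is needed
youngest <- tex_sample[which.min(tex_sample$Age), ]
```

If several players tied for the minimum age, `which.min()` would return only the first of them; filtering with `Age == min(Age)` would keep all of them.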
Players per position:
positions <- TEX$Position %>% table %>% data.frame
colnames(positions) <- c("Position", "Number.Players")
positions
## Position Number.Players
## 1 Catcher 4
## 2 Designated Hitter 1
## 3 First Baseman 1
## 4 Outfielder 6
## 5 Relief Pitcher 11
## 6 Second Baseman 1
## 7 Shortstop 2
## 8 Starting Pitcher 8
## 9 Third Baseman 1
plotly::plot_ly(positions, x = ~Position, y = ~Number.Players, type = "bar")
Age distribution of players:
# Age frequency table and bar graph
ageFrequency <- TEX$Age %>% floor %>% table %>% data.frame
colnames(ageFrequency) <- c("Age", "Number.Players")
ageFrequency
## Age Number.Players
## 1 22 1
## 2 23 2
## 3 24 6
## 4 25 3
## 5 26 6
## 6 27 3
## 7 29 4
## 8 30 2
## 9 31 1
## 10 32 3
## 11 35 2
## 12 38 1
## 13 39 1
plotly::plot_ly(ageFrequency, x = ~Age, y = ~Number.Players, type = "bar")
# Age box plot that displays min and max
ageDistribution <- floor(TEX$Age)
plotly::plot_ly(y = ageDistribution, type = "box")
# Age range
ageDistribution %>% min
## [1] 22
ageDistribution %>% max
## [1] 39
After rounding ages down to whole years, the Texas players range from 22 to 39 years old.
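The same range can be recovered from the frequency table alone. This sketch rebuilds the vector of floored ages from the `ageFrequency` output above; the names `age_years`, `n_players`, and `floored_ages` are illustrative. It then uses `range()` to get the minimum and maximum in a single call.

```r
# Whole-year ages and their counts, copied from the ageFrequency table above
age_years <- c(22, 23, 24, 25, 26, 27, 29, 30, 31, 32, 35, 38, 39)
n_players <- c(1, 2, 6, 3, 6, 3, 4, 2, 1, 3, 2, 1, 1)

# Expand the table back into one floored age per player
floored_ages <- rep(age_years, times = n_players)

# range() returns c(min, max); diff() gives the width of that range
age_range <- range(floored_ages)
age_span <- diff(age_range)
```

Note that this works on floored ages, so the result describes whole years only; applying `range()` to the unrounded `TEX$Age` column would give the exact minimum and maximum.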
Height and weight analysis across different positions:
library(ggplot2)
# Average height and weight of each position for Team TEX
physique <-
TEX %>%
select(Position, Height.inches., Weight.pounds.) %>%
group_by(Position) %>%
summarize_all(mean)
physique
## # A tibble: 9 x 3
## Position Height.inches. Weight.pounds.
## <chr> <dbl> <dbl>
## 1 Catcher 74.2 204.
## 2 Designated Hitter 77 250
## 3 First Baseman 75 220
## 4 Outfielder 72.3 201.
## 5 Relief Pitcher 74.5 204.
## 6 Second Baseman 72 175
## 7 Shortstop 73 175
## 8 Starting Pitcher 74.9 204
## 9 Third Baseman 73 200
physiquePlot <- ggplot(physique, aes(x = Height.inches., y = Weight.pounds., color = Position)) + geom_point()
physiquePlot
# Average height and weight of each position across MLB
physique2 <-
mlb %>%
select(Position, Height.inches., Weight.pounds.) %>%
group_by(Position) %>%
summarize_all(mean)
physique2
## # A tibble: 9 x 3
## Position Height.inches. Weight.pounds.
## <chr> <dbl> <dbl>
## 1 Catcher 72.7 204.
## 2 Designated Hitter 74.2 221.
## 3 First Baseman 74 213.
## 4 Outfielder 73.0 199.
## 5 Relief Pitcher 74.4 204.
## 6 Second Baseman 71.4 184.
## 7 Shortstop 71.9 183.
## 8 Starting Pitcher 74.7 205.
## 9 Third Baseman 73.0 201.
physiquePlot2 <- ggplot(physique2, aes(x = Height.inches., y = Weight.pounds., color = Position)) + geom_point()
physiquePlot2
Analysis: The average height and weight were plotted for each position on Team TEX. Because a single team provides too little data for a reliable analysis (several TEX positions have only one player), the same averages were also computed for each position across the entire league and compared with those of TEX. The results varied, but both plots show that designated hitters are among the tallest and heaviest players on average, while second basemen and shortstops sit at the lower end of the distribution. In TEX, outfielders and third basemen had similar average weights of around 200 pounds, and catchers, relief pitchers, and starting pitchers had similar average weights of around 204 pounds. Positions varied less in height: every position average in TEX fell between 72 and 77 inches. All in all, comparing average height and weight by position within TEX alone is not enough to establish a general relationship between position and physique.
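The comparison between TEX and the league can be made explicit by joining the two summary tables and taking differences. The following is a sketch only: it re-enters the rounded averages printed in the `physique` and `physique2` outputs above (so the differences are approximate), and the data frame and column names (`tex`, `league`, `comparison`, `Height.diff`, `Weight.diff`) are hypothetical.

```r
positions <- c("Catcher", "Designated Hitter", "First Baseman", "Outfielder",
               "Relief Pitcher", "Second Baseman", "Shortstop",
               "Starting Pitcher", "Third Baseman")

# Rounded position averages for TEX, copied from the physique output
tex <- data.frame(
  Position = positions,
  Height = c(74.2, 77, 75, 72.3, 74.5, 72, 73, 74.9, 73),
  Weight = c(204, 250, 220, 201, 204, 175, 175, 204, 200)
)

# Rounded position averages for the whole league, copied from physique2
league <- data.frame(
  Position = positions,
  Height = c(72.7, 74.2, 74, 73.0, 74.4, 71.4, 71.9, 74.7, 73.0),
  Weight = c(204, 221, 213, 199, 204, 184, 183, 205, 201)
)

# Join on Position and compute TEX minus league for each measure
comparison <- merge(tex, league, by = "Position", suffixes = c(".TEX", ".MLB"))
comparison$Height.diff <- comparison$Height.TEX - comparison$Height.MLB
comparison$Weight.diff <- comparison$Weight.TEX - comparison$Weight.MLB

# Position where TEX deviates most from the league in average weight
largest_weight_gap <- comparison$Position[which.max(abs(comparison$Weight.diff))]
```

With these rounded inputs the largest gap in both height and weight is at designated hitter, a position where TEX has a single player, which illustrates why one team's averages are a weak basis for general conclusions.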