---
title: "2015 MLB Batting Stats in Relation to Salary"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(duckdb)
library(DBI)
library(Lahman)
library(scales)
library(ggthemes)
knitr::opts_chunk$set(echo = TRUE)
```

### Top 10 Home Run Hitters and Their Salaries

```{r echo=FALSE}
# Join batting stats to salaries (same player, year, and team), then add player names.
combined_data <- Batting %>%
  left_join(Salaries, by = c("playerID", "yearID", "teamID")) %>%
  left_join(People %>% select(playerID, nameFirst, nameLast), by = "playerID") %>%
  mutate(name = paste(nameFirst, nameLast))

# Group by playerID as well as name so different players who share a name stay separate.
top_hr_2015 <- combined_data %>%
  filter(yearID == 2015, !is.na(salary)) %>%
  group_by(playerID, name) %>%
  summarise(
    total_HR = sum(HR, na.rm = TRUE),
    salary = sum(salary, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  slice_max(total_HR, n = 10) %>%
  mutate(label = paste0(name, " (", total_HR, ")"))

ggplot(top_hr_2015, aes(x = reorder(label, total_HR), y = salary)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title = "Top 10 Home Run Hitters and Their Salaries (2015)",
    x = "Player (Home Runs)",
    y = "Salary (USD)"
  ) +
  theme_minimal()
```

***
- This horizontal bar chart shows the top 10 home run hitters of the 2015 season, with each player's home run total in parentheses, and how much each was paid.

- Salaries varied widely among players with similar home run totals.

- Note: Nolan Arenado earned 512,500 (USD), close to the league minimum, while Albert Pujols earned 24,000,000 (USD), roughly 47 times as much.


### Salary vs Batting Average

```{r echo=FALSE}
# combined_data is built in the first chunk and reused here.
# Group by playerID as well as name so different players who share a name stay separate.
batting_salary_avg_2015 <- combined_data %>%
  filter(yearID == 2015, !is.na(salary), AB > 50) %>%
  group_by(playerID, name) %>%
  summarise(
    total_AB = sum(AB, na.rm = TRUE),
    total_H = sum(H, na.rm = TRUE),
    salary = sum(salary, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(AVG = total_H / total_AB) %>%
  filter(!is.na(AVG))

ggplot(batting_salary_avg_2015, aes(x = salary, y = AVG)) +
  geom_point(alpha = 0.6, color = "darkblue") +
  geom_smooth(method = "lm", se = TRUE, color = "red") +  
  scale_x_continuous(labels = label_comma()) +
  labs(
    title = "Salary vs Batting Average (2015)",
    x = "Salary (USD)",
    y = "Batting Average (AVG)"
  ) +
  theme_minimal()
```

***
- This scatter plot compares batting average with salary for players with more than 50 at-bats.

- The trend line suggests that higher-paid players tended to hit for a higher average.

- However, some of the highest-paid players did not hit well. Some of these may be star National League pitchers. In 2015 they batted regularly, unlike American League pitchers, so they accumulated enough at-bats for their batting averages to appear in the plot.
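
To put a number on the trend line, the slope that `geom_smooth(method = "lm")` draws can be read from a plain `lm()` fit. The sketch below uses a small made-up data frame (`toy`, with hypothetical salaries and averages), not the Lahman data, so its values are illustrative only.

```r
# Hypothetical data for illustration only; not taken from Lahman.
toy <- data.frame(
  salary_millions = c(1, 2, 3, 4, 5),
  avg = c(0.240, 0.250, 0.255, 0.270, 0.275)
)

# The same kind of model geom_smooth(method = "lm") fits:
# batting average as a linear function of salary.
fit <- lm(avg ~ salary_millions, data = toy)

# Change in batting average per additional million dollars of salary.
slope <- unname(coef(fit)["salary_millions"])

# Correlation gives a unit-free sense of how tight the relationship is.
r <- cor(toy$salary_millions, toy$avg)

print(slope)
print(r)
```

Running the same two calls on `batting_salary_avg_2015` (with `salary` in place of `salary_millions` and `AVG` in place of `avg`) would show how strong the relationship in the plot actually is.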

### Salary Distribution by Home Run Groups

```{r echo=FALSE}
# combined_data is built in the first chunk and reused here.
# Group by playerID as well as name so different players who share a name stay separate.
hr_salary_2015 <- combined_data %>%
  filter(yearID == 2015, !is.na(salary)) %>%
  group_by(playerID, name) %>%
  summarise(
    total_HR = sum(HR, na.rm = TRUE),
    salary = sum(salary, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Bin each player's season home run total into a labelled range.
  mutate(
    HR_group = case_when(
      total_HR == 0 ~ "0 HR",
      total_HR <= 10 ~ "1-10 HR",
      total_HR <= 20 ~ "11-20 HR",
      total_HR <= 30 ~ "21-30 HR",
      TRUE ~ "31+ HR"
    )
  )

ggplot(hr_salary_2015, aes(x = HR_group, y = salary)) +
  geom_boxplot(fill = "lightblue") +
  scale_y_continuous(labels = label_comma()) +
  labs(
    title = "Salary Distribution by Home Run Groups (2015)",
    x = "Home Run Group",
    y = "Salary (USD)"
  ) +
  theme_minimal()
```

***
- This box plot shows how the salary distribution changes across home run ranges.

- Many of the outliers may be pitchers, who are not expected to produce offensively.

- However, the 1-10 HR range may include the rare pitchers who happened to hit a home run, as well as hitters who were expected to hit more but fell short because of injury or a bad season.
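
The home run ranges can also be expressed with base R's `cut()`, which returns an ordered factor so the groups keep their intended order on an axis. This is a sketch of an alternative to the `case_when()` call, not the code the chart uses; `hr_group` is a hypothetical helper name and the input vector is made up.

```r
# Hypothetical helper: bin home run totals into the same ranges as the box plot.
hr_group <- function(total_hr) {
  cut(
    total_hr,
    # Right-closed intervals: (-Inf, 0], (0, 10], (10, 20], (20, 30], (30, Inf)
    breaks = c(-Inf, 0, 10, 20, 30, Inf),
    labels = c("0 HR", "1-10 HR", "11-20 HR", "21-30 HR", "31+ HR"),
    right = TRUE,
    ordered_result = TRUE
  )
}

# Made-up totals that sit on and around each boundary.
example_hr <- c(0, 1, 10, 11, 20, 21, 30, 31, 47)
groups <- hr_group(example_hr)

# Number of players in each range.
print(table(groups))
```

Checking the boundary values (10, 20, 30) is worthwhile, because an off-by-one there would move players into the wrong box.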

### Total Team Salaries

```{r echo=FALSE}
# combined_data is built in the first chunk and reused here.

team_salary_2015 <- combined_data %>%
  filter(yearID == 2015, !is.na(salary)) %>%
  group_by(teamID) %>%
  summarise(total_salary = sum(salary, na.rm = TRUE)) %>%
  mutate(teamID = case_when(
    teamID == "LAN" ~ "LAD",
    teamID == "NYA" ~ "NYY",
    teamID == "SFN" ~ "SFG",
    teamID == "WAS" ~ "WSH",
    teamID == "SLN" ~ "STL",
    teamID == "SDN" ~ "SDP",
    teamID == "CHN" ~ "CHC",
    teamID == "CHA" ~ "CHW",
    teamID == "KCA" ~ "KCR",
    teamID == "NYN" ~ "NYM",
    teamID == "TBA" ~ "TBR",
    TRUE ~ teamID
  ))

ggplot(team_salary_2015, aes(x = reorder(teamID, total_salary), y = total_salary)) +
  geom_segment(aes(x = reorder(teamID, total_salary), xend = reorder(teamID, total_salary),
                   y = 0, yend = total_salary), color = "grey") +
  geom_point(color = "darkgreen", size = 3) +
  coord_flip() +
  scale_y_continuous(labels = scales::label_comma()) +
  labs(
    title = "Total Team Salaries (2015)",
    x = "Team",
    y = "Total Salary (USD)"
  ) +
  theme_minimal()
```

***
- This lollipop chart shows how much each team spent on player salaries.

- The LA Dodgers and NY Yankees had the highest payrolls by a wide margin, each well over 200,000,000 (USD).

- Only four other teams spent more than 150,000,000 (USD).

### New York Mets Payroll

```{r echo=FALSE}
# combined_data is built in the first chunk and reused here.
mets_2015 <- combined_data %>%
  filter(yearID == 2015, teamID == "NYN", !is.na(salary)) %>%
  group_by(playerID, name) %>%
  summarise(salary = sum(salary, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(salary)) %>%
  mutate(
    # Named share rather than percent so the column does not shadow scales::percent().
    share = salary / sum(salary),
    label = paste0(name, "\n$", comma(salary), " (", percent(share), ")")
  )

total_salary <- sum(mets_2015$salary)

ggplot(mets_2015, aes(x = "", y = salary, fill = label)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  labs(
    title = paste("New York Mets Payroll (2015)\nTotal: $", comma(total_salary), sep = ""),
    fill = "Player (Salary & % of Payroll)"
  ) +
  theme_void() +
  theme(
    legend.text = element_text(size = 8),
    plot.title = element_text(hjust = 0.5)
  )
```

***
- This pie chart shows how much each player on the Mets was paid that season.

- Some notable players are missing from the season payroll because they were traded to the team or called up from the minor leagues in the middle of the season. These include Yoenis Cespedes, Noah Syndergaard, and Steven Matz.

- The highest-paid players were David Wright, Curtis Granderson, Bartolo Colon, and Daniel Murphy, all league veterans at the time. Together they made up 56.8% of the entire payroll.

- Some young stars of the pitching staff were paid far less: Jacob deGrom and Matt Harvey made only 556,875 (USD) and 614,125 (USD), respectively.
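
A payroll share like the 56.8% figure is the sum of the largest salaries divided by the total. The sketch below shows that arithmetic on a made-up payroll; `top_share` is a hypothetical helper and the salaries are not the Mets' actual figures.

```r
# Hypothetical helper: share of total payroll taken by the n highest salaries.
top_share <- function(salary, n) {
  salary <- sort(salary, decreasing = TRUE)
  sum(head(salary, n)) / sum(salary)
}

# Made-up salaries in millions of USD, for illustration only.
toy_payroll <- c(A = 20, B = 15, C = 10, D = 5, E = 30, F = 20)

# The top four salaries are 30, 20, 20, and 15, out of a total of 100.
share_top4 <- top_share(toy_payroll, 4)
print(share_top4)
```

Applied to the chart's data, `top_share(mets_2015$salary, 4)` would be the equivalent calculation.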


### Top 20 Best Cost-Per-Hit Players

```{r echo=FALSE}
# combined_data is built in the first chunk and reused here.
# Group by playerID as well as name so different players who share a name stay separate.
value_hit <- combined_data %>%
  filter(yearID == 2015, !is.na(salary), H > 0) %>%
  group_by(playerID, name) %>%
  summarise(
    total_salary = sum(salary, na.rm = TRUE),
    total_hits = sum(H, na.rm = TRUE),
    salary_per_hit = total_salary / total_hits,
    .groups = "drop"
  ) %>%
  # Keep the 20 players with the lowest cost per hit.
  slice_min(salary_per_hit, n = 20)

ggplot(value_hit, aes(x = reorder(name, salary_per_hit), y = salary_per_hit)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title = "Top 20 Best Cost-Per-Hit Players (2015)",
    x = "Player",
    y = "Salary per Hit (USD)"
  ) +
  theme_minimal()
```

***
- This bar chart shows the 20 players with the lowest salary per hit, a measure of how cost-efficient they were on offense.

- The lower the salary per hit, the better: the player produced more hits for the amount spent on them.

- A chart like this can help show which young players are on the rise and which highly paid stars are not hitting well.
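
The metric itself is simply salary divided by hits. The sketch below works it through on four made-up players to show how the ranking falls out and why players with no hits have to be handled separately; `salary_per_hit` is a hypothetical helper and none of the figures come from Lahman.

```r
# Hypothetical helper: cost per hit, undefined (NA) for players with no hits.
salary_per_hit <- function(salary, hits) {
  ifelse(hits > 0, salary / hits, NA_real_)
}

# Made-up players for illustration only.
toy <- data.frame(
  player = c("Player A", "Player B", "Player C", "Player D"),
  salary = c(500000, 20000000, 3000000, 600000),
  hits = c(150, 160, 100, 0)
)

toy$salary_per_hit <- salary_per_hit(toy$salary, toy$hits)

# Lowest cost per hit first; order() places NA values last.
ranked <- toy[order(toy$salary_per_hit), ]
print(ranked)
```

Player B has the most hits but ranks behind Players A and C, because the metric rewards hits relative to pay rather than raw production.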

### Team Total Runs vs Total Salary

```{r echo=FALSE}
# combined_data is built in the first chunk and reused here.
team_summary_2015 <- combined_data %>%
  filter(yearID == 2015, !is.na(salary)) %>%
  group_by(teamID) %>%
  summarise(
    total_salary = sum(salary, na.rm = TRUE),
    total_runs = sum(R, na.rm = TRUE)
  ) %>%
  # Replace Lahman team codes with the more familiar abbreviations.
  mutate(teamID = recode(teamID,
                         "LAN" = "LAD",
                         "NYA" = "NYY",
                         "SFN" = "SFG",
                         "WAS" = "WSH",
                         "SLN" = "STL",
                         "SDN" = "SDP",
                         "CHN" = "CHC",
                         "CHA" = "CHW",
                         "KCA" = "KCR",
                         "NYN" = "NYM",
                         "TBA" = "TBR"
  ))

ggplot(team_summary_2015, aes(x = total_runs, y = total_salary)) +
  geom_point(color = "steelblue", size = 4, alpha = 0.7) +
  geom_text(aes(label = teamID), vjust = -1, size = 3) +  
  scale_y_continuous(labels = dollar_format()) +
  labs(
    title = "Team Total Runs vs Total Salary (2015)",
    x = "Total Runs Scored",
    y = "Total Salary (USD)"
  ) +
  theme_minimal()
```

***
- This scatter plot compares each team's total runs scored over the season with its total salary.

- Teams in the top right scored the most and spent a lot, which suggests that their high payroll translated into offense. The NY Yankees are the clearest example.

- Teams in the bottom left, such as the Miami Marlins, Atlanta Braves, and Tampa Bay Rays, spent little compared with other teams. This could be why they did not score much.

- Teams near the top left spent a lot but struggled on offense. Either their star hitters did not produce or most of their spending went to pitching.

- The bottom right indicates great value: these teams did well on offense without spending heavily. A notable example is the Pittsburgh Pirates.
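
The quadrant reading above can be made explicit by splitting teams at the median of each axis. Using the medians as cut-offs is an assumption made for this sketch (the commentary reads the quadrants by eye), and `classify_quadrant` and the four teams are hypothetical.

```r
# Hypothetical helper: label each team by its position relative to the medians.
classify_quadrant <- function(runs, salary) {
  high_runs <- runs >= median(runs)
  high_salary <- salary >= median(salary)
  ifelse(
    high_salary,
    ifelse(high_runs, "top right", "top left"),
    ifelse(high_runs, "bottom right", "bottom left")
  )
}

# Made-up teams for illustration only; salary is in millions of USD.
toy <- data.frame(
  team = c("T1", "T2", "T3", "T4"),
  runs = c(750, 600, 620, 720),
  salary = c(200, 180, 70, 90)
)

# Median runs is 670 and median salary is 135 for this toy data.
toy$quadrant <- classify_quadrant(toy$runs, toy$salary)
print(toy)
```

Here T4 lands in the bottom right (high scoring on a low payroll) and T2 in the top left (high payroll, low scoring), matching how the chart is read.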