Data wrangling with loops

with an introduction to Quarto

Author

Sam

Introduction

I used MLB batting data from the Lahman Baseball Database, which records information on both teams and players throughout the history of Major League Baseball, going back to the earliest years of the sport. Each row is one season for one MLB player, and each column describes some aspect of that player or his performance. The data file can be downloaded at this link: https://myxavier-my.sharepoint.com/:x:/g/personal/rushs2_xavier_edu/EUQ_GBXB6CNBsxxfizKKLp0BWriMSAL3CUHN5buXNPerjw?download=1

#| include: false
library(tidyverse) # Data wrangling and plotting (dplyr, ggplot2, readr)
library(skimr)     # For better summary statistics
# Read the batting data directly from the shared CSV file
batting <- read_csv('https://myxavier-my.sharepoint.com/:x:/g/personal/rushs2_xavier_edu/EUQ_GBXB6CNBsxxfizKKLp0BWriMSAL3CUHN5buXNPerjw?download=1')

Research Question # 1

Which team in the history of baseball (post-1949) has hit the most home runs? I am genuinely curious, as many different teams have gone on big runs over the years. If I had to guess, I would say the New York Yankees have probably hit the most home runs.

Running Code

#| label: analysis
batting_1950 <- batting %>%
  filter(yearID > 1949)

batting_1950 %>% 
  ggplot(aes(x = teamID, y = HR)) +
  geom_col(fill = "navy") +  
  labs(title = "Home Runs by Team",
       x = "Team",
       y = "Home Runs")

These results support my hypothesis that the Yankees have hit the most home runs. Their nickname is the “Bronx Bombers”, and with a nickname like that, you have to hit a lot of home runs. This evidence could be useful to someone who is purely interested in power-hitting teams, and it suggests the Yankees are one of those teams.
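Reading the tallest bar off a chart with many teams can be hard, so the totals can also be computed directly. The following is a minimal sketch, assuming the same `teamID` and `HR` columns used in the plot; `toy_batting` and its values are made up for illustration and stand in for `batting_1950`.

```r
library(dplyr)

# Toy stand-in for batting_1950: made-up numbers, for illustration only.
toy_batting <- data.frame(
  teamID = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  yearID = c(1950, 1951, 1950, 1951, 1950),
  HR     = c(10, 15, 30, NA, 5)
)

# Sum home runs per team (ignoring missing values), then sort
# so the team with the most home runs comes first.
team_hr <- toy_batting %>%
  group_by(teamID) %>%
  summarise(total_HR = sum(HR, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_HR))

team_hr
```

Swapping `toy_batting` for `batting_1950` should give one row per team, with the top row being the team that hit the most home runs since 1950.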

Research Question # 2

Do hits have a relationship with RBIs? If so, do extra-base hits (XBH) have any impact? There definitely has to be a correlation between hits and RBIs. I think having more extra-base hits will help as well: the more bases you get, the more likely someone is to score.

#| label: analysis-2

batting_1950 <- batting_1950 %>%
  mutate(xbh = `2B` + `3B` + HR)

batting_1950 %>% 
  ggplot(aes(x = H, y = RBI, color=xbh)) +
  geom_point() +
  labs(title = "Relationship Between Hits and RBIs (Colored by Extra-Base Hits)",
       x = "Hits",
       y = "RBIs")

This graph has an upward slope, which confirms my suspicion that hits are related to the number of RBIs a batter records. If you look closely, the color gets lighter as you go higher on the slope. This means that the more extra-base hits a batter gets, the more RBIs he tends to get as well.
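The strength of the relationship seen in the graph can also be put into a number with a correlation coefficient. This is a minimal sketch using base R's `cor()`, assuming the `H`, `RBI`, and `xbh` columns from the plot; `toy_batting` and its values are made up for illustration and stand in for `batting_1950`.

```r
# Toy stand-in for batting_1950: made-up numbers, for illustration only.
toy_batting <- data.frame(
  H   = c(50, 100, 150, 200),
  RBI = c(20, 45, 70, 110),
  xbh = c(10, 25, 45, 70)
)

# Pearson correlation between hits and RBIs.
# use = "complete.obs" drops rows with missing values.
r_hits <- cor(toy_batting$H, toy_batting$RBI, use = "complete.obs")

# Same measure for extra-base hits and RBIs.
r_xbh <- cor(toy_batting$xbh, toy_batting$RBI, use = "complete.obs")

c(hits = r_hits, xbh = r_xbh)
```

A value close to 1 indicates a strong positive linear relationship. Running the same two calls on `batting_1950` would show whether hits or extra-base hits track RBIs more closely.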
