Introduction to Baseball Statistics

In this data set, which I scraped myself, I look at Major League Baseball players and their hitting statistics from the 2022 season. The analysis is limited to hitters who qualified for the stat WAR. I try to find correlations between different statistics, both for individual players and by team. I want to find out whether there are relationships among key statistics such as WAR, walk percentage, strikeout percentage, and more, both individually and in relation to the teams.

I retrieved this data from the website FanGraphs (https://www.fangraphs.com/leaders.aspx?pos=all&stats=bat&lg=all&qual=y&type=8&season=2022&month=0&season1=2022&ind=0&page=1_50). FanGraphs is one of the many websites that take a deep look into advanced baseball statistics and charts.

What is the purpose of baseball statistics? In today’s game, analytics drives decision making and roster moves within front offices. Teams increasingly base decisions on data to try to get an edge over other teams. This helps teams save money and get more bang for their buck. Data has created new strategies within the game and will continue to do so as front offices evolve.

For players who walk frequently, do we see more home runs from them?

[Figure: home runs and walk percentage for each player, with a loess smoothing line]

HR’s to Walking

There are always discussions about walks and strikeouts in relation to home runs. The graph shows the relationship between home runs and walks. Most of the players who hit 20-40 home runs in the 2022 season fall in the 5-10% walk percentage range. Players who are trying to hit lots of home runs tend to be aggressive during their at bats, which means they are less likely to walk. For players who have 40 or more home runs, we see an increase in walk percentage. The factors behind this are how often they get intentionally walked and whether pitchers give them pitches to hit. A pitcher might rather walk a hitter than risk a home run.
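A chart like this one could be built roughly as follows. This is only a sketch: the data frame `hitters`, its column names `walk_pct` and `home_runs`, the axis labels, and every value in it are made up for illustration and are not the FanGraphs numbers. The loess smoother is the method `geom_smooth()` reported for the chart.

```r
library(ggplot2)

# Toy values for illustration only; these are NOT the real 2022 figures.
hitters <- data.frame(
  walk_pct  = c(5.1, 6.4, 7.8, 8.3, 9.0, 9.9, 10.7, 11.5, 12.8, 15.7),
  home_runs = c(12, 18, 22, 25, 30, 28, 33, 36, 40, 46)
)

# One point per hitter plus a loess smoothing line.
# labs() is added with `+` so the labels attach to the plot.
walk_plot <- ggplot(hitters, aes(x = walk_pct, y = home_runs)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(
    x = "Walk Percentage (in %)",
    y = "Home Runs",
    title = "Walk Percentage in Comparison to Home Runs"
  )
```

Printing `walk_plot` draws the chart. Passing `method` and `formula` explicitly also keeps ggplot2 from printing its message about the defaults it chose.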

Do teams that walk more make championship runs?

Getting on base is the key to scoring runs. If you get on base, you have a higher chance of scoring. In this chart, I averaged player walk percentage by team to see if the good teams walk a lot. In fact, good teams do have a high walk percentage. Of the top 5 teams, 4 made the playoffs. Arizona was the only team in the top 5 not to make the playoffs.
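The team-level averaging can be sketched with dplyr as below. The player names, team codes, column names, and walk percentages are all hypothetical placeholders, and the sketch keeps the top 2 teams only because the toy data has just three.

```r
library(dplyr)

# Toy rows for illustration only; not the real FanGraphs numbers.
hitters <- tibble(
  Name     = c("Player A", "Player B", "Player C", "Player D", "Player E", "Player F"),
  Team     = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  walk_pct = c(12.0, 10.0, 8.0, 6.0, 9.0, 11.0)
)

# Average walk percentage per team, highest first, top teams only.
team_walks <- hitters %>%
  group_by(Team) %>%
  summarise(`Average Walk %` = mean(walk_pct)) %>%
  arrange(desc(`Average Walk %`)) %>%
  slice_head(n = 2)
```

With the real data, `slice_head(n = 5)` would give the top 5 teams discussed above.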

Home Runs to Strikeouts

[Figure: Strikeout Percentage in Comparison to Home Runs. x-axis: Strike Out Percentage (in %); y-axis: Home Runs; loess smoothing line]

Strikeouts in Major League Baseball are increasing. Some say this is due to a couple of factors. First, pitchers are getting more talented: they are throwing harder and have lots of movement on their pitches. Second, hitters are trying to hit more home runs. Based on the graph, there isn’t a correlation between home runs and strikeouts. As the scatter plot shows, players strike out at about the same rate regardless of how many home runs they hit. That said, the two leaders in home runs are in the upper quartile for strikeout percentage, at around 26% and 29% respectively.
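A visual reading like this could be backed up with a correlation coefficient. The sketch below assumes hypothetical columns `k_pct` and `home_runs` and uses made-up values, so the numbers it produces say nothing about the real 2022 data.

```r
# Toy values for illustration only; not the real FanGraphs numbers.
hitters <- data.frame(
  k_pct     = c(14.2, 18.5, 21.0, 22.4, 25.1, 26.0, 19.3, 29.0, 16.8, 23.7),
  home_runs = c(15, 30, 12, 41, 22, 46, 27, 62, 35, 18)
)

# Pearson measures linear association; Spearman uses ranks, so it is
# less sensitive to a few extreme home run totals.
pearson_r    <- cor(hitters$k_pct, hitters$home_runs)
spearman_rho <- cor(hitters$k_pct, hitters$home_runs, method = "spearman")
```

A value near 0 would agree with the "no correlation" reading, while a value closer to 1 or -1 would suggest the scatter plot is hiding a trend.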

Are the best players on the best teams?

## # A tibble: 10 × 2
##    Team  `Average Team WAR`
##    <chr>              <dbl>
##  1 STL                 6.7 
##  2 LAD                 4.34
##  3 HOU                 4.33
##  4 NYM                 4.29
##  5 CLE                 4.18
##  6 SDP                 3.96
##  7 TEX                 3.88
##  8 NYY                 3.78
##  9 TOR                 3.67
## 10 ATL                 3.34

WAR is a stat that defines a player’s value to the team. It stands for Wins Above Replacement. FanGraphs’ definition states that WAR “summarize(s) a player’s total contributions to their team in one statistic.” To calculate WAR, you total the player’s Batting Runs + Base Running Runs + Fielding Runs + Positional Adjustment + League Adjustment + Replacement Runs and divide the sum by runs per win. Although it is a confusing stat to understand, it produces a single number that shows how important each player is to the team.
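The arithmetic in that formula can be written out as a small function. The function name `position_player_war`, its argument names, and the example inputs are invented for illustration. In practice each run component comes from FanGraphs’ own calculations rather than from anything computed here.

```r
# Sum the six run components, then convert runs to wins.
position_player_war <- function(batting, base_running, fielding,
                                positional, league, replacement,
                                runs_per_win) {
  total_runs <- batting + base_running + fielding +
    positional + league + replacement
  total_runs / runs_per_win
}

# Hypothetical player: 41.5 total runs at 10 runs per win.
example_war <- position_player_war(
  batting = 30, base_running = 2, fielding = -5,
  positional = -7.5, league = 2, replacement = 20,
  runs_per_win = 10
)
```

Here the components sum to 41.5 runs, so at 10 runs per win the hypothetical player is worth 4.15 wins above replacement.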

For this table, I wanted to calculate the average WAR on each team to see if the best players are playing on the best teams. The table shows the top 10 teams by average player WAR. The best players are playing on the best teams: 9 of the 10 teams in the table made the playoffs, including the World Series winning Houston Astros. The team whose players average the highest WAR is the St. Louis Cardinals.
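The playoff count can be checked directly against the table. In the sketch below, the averages are copied from the table above. The `playoff_teams` vector is an addition: it lists the twelve 2022 postseason teams using FanGraphs-style abbreviations, and the variable names are illustrative.

```r
# Average team WAR, copied from the table above.
team_war <- data.frame(
  Team    = c("STL", "LAD", "HOU", "NYM", "CLE", "SDP", "TEX", "NYY", "TOR", "ATL"),
  avg_war = c(6.7, 4.34, 4.33, 4.29, 4.18, 3.96, 3.88, 3.78, 3.67, 3.34)
)

# 2022 postseason field (added here; not part of the scraped data).
playoff_teams <- c("HOU", "NYY", "CLE", "TOR", "SEA", "TBR",
                   "LAD", "ATL", "STL", "NYM", "SDP", "PHI")

team_war$made_playoffs <- team_war$Team %in% playoff_teams

playoff_count <- sum(team_war$made_playoffs)              # 9
missed        <- team_war$Team[!team_war$made_playoffs]   # "TEX"
```

Texas is the one team in the top 10 that missed the postseason.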

Does a player’s offense help his defense?

[Figure: offensive WAR and defensive WAR for each player, with a loess smoothing line]

As the results show, there are a lot of bad fielders in MLB. Around half of the players in this data set have a negative WAR in fielding. But there could be a small correlation suggesting that offense helps defense. Of the players who have an offensive WAR over 30, 53% have a positive defensive WAR. Although this is a small sample size, there could still be a correlation.
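The two shares quoted above could be computed along these lines. The column names `Off` and `Def` are an assumption about how the offensive and defensive values are stored, and the rows are invented, so the results will not match the 53% from the real data.

```r
# Toy values for illustration only; not the real FanGraphs numbers.
hitters <- data.frame(
  Off = c(45.2, 38.0, 31.5, 12.3, -4.0, 33.1),
  Def = c(3.1, -6.4, 1.2, 5.0, -2.2, -0.5)
)

# Share of all players with a negative defensive value.
share_negative_def <- mean(hitters$Def < 0)

# Among the strongest hitters (offensive value over 30),
# the share with a positive defensive value.
top_bats       <- hitters[hitters$Off > 30, ]
share_positive <- mean(top_bats$Def > 0)
```

Taking the mean of a logical vector gives the proportion of `TRUE` values, which is why no explicit counting is needed. Checking `nrow(top_bats)` alongside the share shows how small the sample behind the percentage is.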