The following synopsis of information analyzes and explores the unlike qualities of the Procrastination Fantasy Point system. Its intention is to find player correlations that can benefit selection and predictability of players.
Step 1 is to create a point’s column representative of our leagues unique characteristics. Not all characteristics can easily be identifiable due to non-specific information at this time. However, hopefully future investigation and exploration will help lead to further resources and tools to more accurately reflect our league’s specific characteristics. This information may also, in the future, help to visualize baseball trends, characteristics, and future point changes in a productive and data driven direction.
In order to use our dataset we must first retrieve useable data. I set the following parameter to ensure a beginning place that is representative of our league. This required limiting data from after 1905, from only the “AL” and “NL” Divisions, and retrieving data from players who, pitched, batted, or fielded in at least one game. Player Positions were also turned into a categorical factor to enable further investigation based on playing position in the future.
The Dead Ball Era (1901 - 1920)
World War 2 (1941 - 1945)
Segregation Era (1901 - 1947ish)
Post-War Era/Yankees Era (1945 - late 50s/early 60s)
Westward Expansion (1953 - 1961)
Dead Ball 2 (The Sixties, roughly)
Designated Hitter Era (1973 - current, AL only)
Free Agency/Arbitration Era (1975 - current)
Steroid Era (unknown, but late 80s - 2005 seems likely)
Wild Card Era (1994 - current)
# Batting Data Filtering
BatTotal <- Batting %>% filter(yearID > 1905, lgID == "AL" | lgID == "NL", G > 0) %>%
replace(., is.na(.), 0)
# Pitching Data Filtering
PitTotal <- Pitching %>%
filter(yearID > 1905, lgID == "AL" | lgID == "NL", G > 0) %>%
replace(., is.na(.), 0)
# Fielding Data Filtering
FieldTotal <- Fielding %>%
filter(yearID > 1905, lgID == "AL" | lgID == "NL", G > 0) %>%
replace(., is.na(.), 0)
# Fielding Position Facotring
FieldTotal$POS <- factor(FieldTotal$POS, levels=c("C", "1B", "2B", "3B", "P", "SS", "OF"))
# Master Data Filtering
MasterTotal <- Master
Create Fantasy Batting Point Totals
Runs Scored (R) = 1
Home Runs (HR) = 2
Total Bases (TB) = 1
Runs Batted In (RBI) = 1
Walks (BB) = 1
Intentional Walks (IBB) = 1
Strikeouts (K) = -1
Stolen Bases (SB) = 1
Caught Stealing (CS) = -1
Hitting for the Cycle (CYC) = 20 – not included
Grand Slam Home Runs (GSHR) = 4 – not included
Errors (E) = -1 – not included – (In Fielding Dataset - included in Combined)
### Batting points are calculated in the following manner:
PointsB = R + RBI + BB + IBB - SO + SB - CS + (H - X2B - X3B - HR) + (2 * X2B) + (3 * X3B) + (4 * HR) + (HR 2)
### Batting Average (AVG) has also been calculated and added to the dataset for further use. AVG = ( H / AB )
### Slugging Percentage (SLG) has also been calculated and added to the dataset for further use.
SLG = ( ( (H - X2B - X3B - HR) + (2 X2B) + (3 * X3B) + (4 * HR) ) / AB)
### On Base Percentage (OBP) has also been calculated and added to the dataset for further use.
OBP = (Hits + Walks + Hit by Pitch) / (At Bats + Walks + Hit by Pitch + Sacrifice Flies)
### On-Base Plus Slugging (OPS) has also been calculated and added to the dataset for further use.
OPS = ( (H+BB+HBP) / (AB+BB+SF+HBP)) + (((1 × B) + (2 × D) + (3 × T) + (4 × HR)) / AB)
### Isolated Power (ISO) has also been calculated and added to the dataset for further use.
ISO = SLG - AVG
# Create Fantasy Batting Point Totals
# Runs Scored (R) = 1
# Home Runs (HR) = 2
# Total Bases (TB) = 1
# Runs Batted In (RBI) = 1
# Walks (BB) = 1
# Intentional Walks (IBB) = 1
# Strikeouts (K) = -1
# Stolen Bases (SB) = 1
# Caught Stealing (CS) = -1
# Hitting for the Cycle (CYC) = 20 -- not included
# Grand Slam Home Runs (GSHR) = 4 -- not included
# Errors (E) = -1 -- not included (In Fielding Dataset - included in Combined)
# IBB count as BB???
# adding OPS:
#Formula:
#OPS = ( (H+BB+HBP) / (AB+BB+SF+HBP)) + (((1 × B) + (2 × D) + (3 × T) + (4 × HR)) / AB)
#Where,
#OPS = On-Base Plus Slugging,
#H = Hits,
#BB = Walks,
#HBP = Hits By Pitch,
#AB = At Bats,
#SF = Sacrifice Flies
#B = Singles,
#D = Doubles,
#T = Triples,
#HR = Homeruns,
# calculating Batting Points based on current Procrastination League Rules
# also includes adding a players Batting AVG rounded to 3 decimal points
BatTotal <- BatTotal %>%
mutate(PointsB = (R + RBI + BB + IBB - SO + SB - CS + (H - X2B - X3B - HR) + (2 * X2B) + (3 * X3B) + (4 * HR) + (HR *2) ) ) %>%
mutate(AVG = round( (H/AB), 3) ) %>%
mutate(SLG = round( ( (H - X2B - X3B - HR) + (2 * X2B) + (3 * X3B) + (4 * HR) ), 3) ) %>%
mutate(OBP = round( ( (H+BB+HBP) / (AB+BB+SF+HBP) ), 3) ) %>%
mutate(OPS = round( ( (H + BB + HBP) / (AB + BB + SF + HBP) ) + ( ( (1 * (H - X2B - X3B - HR) ) + (2 * X2B) + (3 * X3B) + (4 * HR) ) / AB), 3) ) %>%
mutate(ISO = round( (SLG - AVG), 3) )
# Joining the Batting statistics to the Master dataset for use in Batting specific Calculations
BatTotalCol <- BatTotal %>%
left_join(MasterTotal, by = c("playerID")) %>%
select(nameFirst, nameLast, playerID, yearID, teamID, lgID, PointsB, AVG, SLG, OBP, OPS, ISO) %>%
arrange(-PointsB)
#Displaying the Batting Dataset sorted by Descending Points
#head(BatTotalCol)
BatTotalCol
Create Fantasy Pitching Point Totals
Innings Pitched (IP) = 1
Hits Allowed (H) = -1
Earned Runs (ER) = -2
Walks Issued (BB) = -1
Hit Batsmen (HB) = -1
Strikeouts (K) = 1
Wild Pitches (WP) = -1
Balks (B) = -1
Pick Offs (PKO) = 2 – not included
Quality Starts (QS) = 5 – not included
Complete Games (CG) = 10
Shutouts (SO) = 5
No Hitters (NH) = 15 – not included
Perfect Games (PG) = 20 – not included
Wins (W) = 5
Losses (L) = -5
Saves (SV) = 5
Blown Saves (BS) = -5 – not included
### Pitching points are calculated in the following manner:
PointsP = (W * 5) - (L * 5) + (SV * 5) + (CG * 10) - WP - BK + SO - HBP - BB - IBB - (ER * 2) - H + (IPouts/3) + (SHO * 5)
### Earned Run Average (ERA) has also been calculated and added to the dataset for further use. ERA = (ER / (IPouts/3) ) *9
### WHIP has also been calculated and added to the dataset for further use. WHIP = (Number of hits allowed + Number of walks allowed) / Number of Innings Pitched
### Fielding Independant Pitching (FIP) has also been calculated and added to the dataset for further use. FIP = ((13HR)+(3(BB+HBP))-(2*K))/IP + FIP Constant (about 3.10)
maybe one day I can add in the accurate FIP Constant
# Create Fantasy Pitching Point Totals
# Innings Pitched (IP) = 1
# Hits Allowed (H) = -1
# Earned Runs (ER) = -2
# Walks Issued (BB) = -1
# Hit Batsmen (HB) = -1
# Strikeouts (K) = 1
# Wild Pitches (WP) = -1
# Balks (B) = -1
# Pick Offs (PKO) = 2 -- not included
# Quality Starts (QS) = 5 -- not included
# Complete Games (CG) = 10
# Shutouts (SO) = 5
# No Hitters (NH) = 15 -- not included
# Perfect Games (PG) = 20 -- not included
# Wins (W) = 5
# Losses (L) = -5
# Saves (SV) = 5
# Blown Saves (BS) = -5 -- not included
# calculating Pitching Points based on current Procrastination League Rules
# also includes adding a players Pitching ERA rounded to 2 decimal points
PitTotal <- PitTotal %>%
mutate(PointsP = ((W * 5) - (L * 5) + (SV * 5) + (CG * 10) - WP - BK + SO - HBP - BB - IBB - (ER * 2) - H + (IPouts/3) + (SHO * 5) )) %>%
mutate(ERA = round( (ER/(IPouts/3))*9, 2) ) %>%
mutate(WHIP = round( (H + BB) / (IPouts/3), 2)) %>%
mutate(FIP = round( ( ( (13 * HR)+(3 * (BB + HBP) ) - (2 * SO) )/(IPouts/3) + 3.10 ), 2) )
# Joining the Pitching statistics to the Master dataset for use in Pitching specific Calculations
PitTotalCol <- PitTotal %>%
left_join(MasterTotal, by = c("playerID")) %>%
select(nameFirst, nameLast, playerID, yearID, teamID, lgID, PointsP, ERA, WHIP, FIP) %>%
arrange(-PointsP)
#Displaying the Pitching Dataset sorted by Descending Points
#head(PitTotalCol)
PitTotalCol
Create Fantasy Fielding Point Totals
Error (IP) = -1
Errors effect Player’s batting statistics. However this information is stored in a seperate dataset and has to be combined seperately. For now I have point totals calculated in each field (Batting, Pitching, and Fielding) and are then will be added together to get a players total for the year.
# calculating Fielding Points based on current Procrastination League Rules
FieldTotal <- FieldTotal %>%
mutate(PointsF = (-E))
Create One Complete Dataset to calculate player’s total contributions in one year
This was completed by first joining all 4 datasets together into one file. Then Creating a new variable called “Points” that added together a players Batting, Pitching, and Fielding totals for that specific year.
The following table will give a synopsis of the complete dataset for future reference.
# calculating FULL Points based on current Procrastination League Rules
# also includes Combinign all datasets point totals into one variable
PointsCombine <- MasterTotal %>%
left_join(BatTotal, by = c("playerID")) %>%
left_join(PitTotal, by = c("playerID", "yearID")) %>%
left_join(FieldTotal, by = c("playerID", "yearID"))
ColNA <- c(32:53,57:58,60:84,90:101)
PointsCombine[ , ColNA][is.na(PointsCombine[ , ColNA] ) ] = 0
PointsCombine <- PointsCombine %>%
mutate(Points = (PointsB + PointsP + PointsF)) %>%
filter(playerID != "NA") %>%
arrange(-Points) %>%
mutate(yearIDEra = PointsCombine$yearID)
#Creates Era Breaks
EraLabel <- c("Dead Ball Era (1901-1919)", "Live Ball Era (1920-1941)", "Integration Era (1942-1960)", "Expansion Era (1961-1976)", "Free Agency Era (1977-1993)", "Long Ball/Steroid Era (1994-2005)", "Post Steroid Era (2006-Current)" )
EraBreaks <- c(1905,1919,1941,1960,1976,1993,2005,2018)
PointsCombine$yearIDEra <- cut(PointsCombine$yearIDEra, breaks = EraBreaks, labels = EraLabel)
summary(PointsCombine)
## playerID birthYear birthMonth birthDay
## Length:152737 Min. :1820 Min. : 1.000 Min. : 1.00
## Class :character 1st Qu.:1919 1st Qu.: 4.000 1st Qu.: 8.00
## Mode :character Median :1951 Median : 7.000 Median :16.00
## Mean :1944 Mean : 6.637 Mean :15.77
## 3rd Qu.:1971 3rd Qu.:10.000 3rd Qu.:23.00
## Max. :1996 Max. :12.000 Max. :31.00
## NA's :132 NA's :304 NA's :451
## birthCountry birthState birthCity deathYear
## Length:152737 Length:152737 Length:152737 Min. :1872
## Class :character Class :character Class :character 1st Qu.:1962
## Mode :character Mode :character Mode :character Median :1981
## Mean :1979
## 3rd Qu.:2001
## Max. :2017
## NA's :96981
## deathMonth deathDay deathCountry deathState
## Min. : 1.00 Min. : 1.0 Length:152737 Length:152737
## 1st Qu.: 3.00 1st Qu.: 8.0 Class :character Class :character
## Median : 7.00 Median :15.0 Mode :character Mode :character
## Mean : 6.54 Mean :15.5
## 3rd Qu.:10.00 3rd Qu.:23.0
## Max. :12.00 Max. :31.0
## NA's :96982 NA's :96984
## deathCity nameFirst nameLast
## Length:152737 Length:152737 Length:152737
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## nameGiven weight height bats
## Length:152737 Min. : 65.0 Min. :43.00 B :12934
## Class :character 1st Qu.:175.0 1st Qu.:71.00 L :40315
## Mode :character Median :185.0 Median :73.00 R :98209
## Mean :188.6 Mean :72.58 NA's: 1279
## 3rd Qu.:200.0 3rd Qu.:74.00
## Max. :320.0 Max. :83.00
## NA's :1013 NA's :863
## throws debut finalGame retroID
## L : 28136 Length:152737 Length:152737 Length:152737
## R :123612 Class :character Class :character Class :character
## S : 9 Mode :character Mode :character Mode :character
## NA's: 980
##
##
##
## bbrefID deathDate birthDate
## Length:152737 Min. :1872-03-17 Min. :1820-04-17
## Class :character 1st Qu.:1962-11-29 1st Qu.:1920-02-26
## Mode :character Median :1981-10-17 Median :1951-02-07
## Mean :1979-05-21 Mean :1944-04-22
## 3rd Qu.:2001-04-01 3rd Qu.:1971-03-12
## Max. :2017-02-19 Max. :1996-08-12
## NA's :96984 NA's :451
## yearID stint.x teamID.x lgID.x
## Min. :1906 Min. :1.000 SLN : 7164 NL :75725
## 1st Qu.:1951 1st Qu.:1.000 CLE : 7057 AL :74323
## Median :1980 Median :1.000 NYA : 6923 AA : 0
## Mean :1974 Mean :1.179 CHN : 6909 FL : 0
## 3rd Qu.:2001 3rd Qu.:1.000 PHI : 6907 NA : 0
## Max. :2016 Max. :4.000 (Other):115088 (Other): 0
## NA's :2689 NA's :2689 NA's : 2689 NA's : 2689
## G.x AB R.x H.x
## Min. : 1.00 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 13.00 1st Qu.: 3.0 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 34.00 Median : 45.0 Median : 4.00 Median : 8.00
## Mean : 50.87 Mean :132.6 Mean : 16.91 Mean : 34.45
## 3rd Qu.: 80.00 3rd Qu.:208.0 3rd Qu.: 24.00 3rd Qu.: 52.00
## Max. :165.00 Max. :716.0 Max. :177.00 Max. :262.00
## NA's :2689
## X2B X3B HR.x RBI
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 1.000 Median : 0.000 Median : 0.000 Median : 3.00
## Mean : 5.971 Mean : 1.052 Mean : 2.784 Mean : 15.59
## 3rd Qu.: 9.000 3rd Qu.: 1.000 3rd Qu.: 2.000 3rd Qu.: 22.00
## Max. :67.000 Max. :36.000 Max. :73.000 Max. :191.00
##
## SB.x CS.x BB.x SO.x
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.0
## Median : 0.000 Median : 0.000 Median : 3.00 Median : 8.0
## Mean : 2.337 Mean : 1.014 Mean : 12.47 Mean : 19.5
## 3rd Qu.: 2.000 3rd Qu.: 1.000 3rd Qu.: 18.00 3rd Qu.: 28.0
## Max. :130.000 Max. :42.000 Max. :232.00 Max. :223.0
##
## IBB.x HBP.x SH.x SF.x
## Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. : 0.0000
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 0.0000 Median : 0.000 Median : 0.000 Median : 0.0000
## Mean : 0.7416 Mean : 0.939 Mean : 2.081 Mean : 0.7549
## 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 3.000 3rd Qu.: 1.0000
## Max. :120.0000 Max. :50.000 Max. :67.000 Max. :19.0000
##
## GIDP.x PointsB AVG SLG
## Min. : 0.000 Min. :-52.00 Min. :0.0000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.:0.0000 1st Qu.: 0.00
## Median : 0.000 Median : 13.00 Median :0.2080 Median : 11.00
## Mean : 2.449 Mean : 83.99 Mean :0.1727 Mean : 50.88
## 3rd Qu.: 3.000 3rd Qu.:118.00 3rd Qu.:0.2640 3rd Qu.: 74.00
## Max. :36.000 Max. :991.00 Max. :1.0000 Max. :457.00
##
## OBP OPS ISO stint.y
## Min. :0.0000 Min. :0.0000 Min. : 0.00 Min. :1.00
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 0.00 1st Qu.:1.00
## Median :0.2680 Median :0.5470 Median : 10.81 Median :1.00
## Mean :0.2212 Mean :0.4626 Mean : 50.71 Mean :1.25
## 3rd Qu.:0.3320 3rd Qu.:0.7070 3rd Qu.: 73.76 3rd Qu.:1.00
## Max. :1.0000 Max. :5.0000 Max. :456.62 Max. :4.00
## NA's :89349
## teamID.y lgID.y W L
## SLN : 3074 NL :32046 Min. : 0.0 Min. : 0.00
## PHI : 3043 AL :31342 1st Qu.: 0.0 1st Qu.: 0.00
## NYA : 3007 AA : 0 Median : 0.0 Median : 0.00
## CLE : 2956 FL : 0 Mean : 1.5 Mean : 1.57
## BOS : 2904 NA : 0 3rd Qu.: 1.0 3rd Qu.: 1.00
## (Other):48404 (Other): 0 Max. :40.0 Max. :27.00
## NA's :89349 NA's :89349
## G.y GS.x CG SHO
## Min. : 1.00 Min. : 0.000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.: 6.00 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median : 17.00 Median : 0.000 Median : 0.0000 Median : 0.0000
## Mean : 21.01 Mean : 3.032 Mean : 0.7411 Mean : 0.1331
## 3rd Qu.: 32.00 3rd Qu.: 0.000 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :106.00 Max. :49.000 Max. :42.0000 Max. :16.0000
## NA's :89349
## SV IPouts H.y ER
## Min. : 0.0000 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.0000 Median : 0.00 Median : 0.00 Median : 0.00
## Mean : 0.5497 Mean : 82.89 Mean : 27.83 Mean : 12.24
## 3rd Qu.: 0.0000 3rd Qu.: 81.00 3rd Qu.: 29.00 3rd Qu.: 14.00
## Max. :62.0000 Max. :1392.00 Max. :381.00 Max. :186.00
##
## HR.y BB.y SO.y BAOpp
## Min. : 0.000 Min. : 0.00 Min. : 0.00 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:0.0000
## Median : 0.000 Median : 0.00 Median : 0.00 Median :0.0000
## Mean : 2.369 Mean : 10.17 Mean : 15.87 Mean :0.1101
## 3rd Qu.: 2.000 3rd Qu.: 12.00 3rd Qu.: 15.00 3rd Qu.:0.2500
## Max. :50.000 Max. :208.00 Max. :383.00 Max. :1.0000
##
## ERA IBB.y WP.x HBP.y
## Min. :0.00 Min. : 0.0000 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.00 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :0.00 Median : 0.0000 Median : 0.0000 Median : 0.0000
## Mean : Inf Mean : 0.6333 Mean : 0.8325 Mean : 0.7614
## 3rd Qu.:3.84 3rd Qu.: 0.0000 3rd Qu.: 1.0000 3rd Qu.: 1.0000
## Max. : Inf Max. :23.0000 Max. :27.0000 Max. :26.0000
##
## BK BFP GF R.y
## Min. : 0.0000 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.0000 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.0000 Median : 0.0 Median : 0.00 Median : 0.00
## Mean : 0.1112 Mean : 118.4 Mean : 2.46 Mean : 13.93
## 3rd Qu.: 0.0000 3rd Qu.: 119.0 3rd Qu.: 2.00 3rd Qu.: 16.00
## Max. :16.0000 Max. :1799.0 Max. :84.00 Max. :205.00
##
## SH.y SF.y GIDP.y PointsP
## Min. : 0.0000 Min. : 0.0000 Min. : 0.0000 Min. :-387.00
## 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: -12.67
## Median : 0.0000 Median : 0.0000 Median : 0.0000 Median : 0.00
## Mean : 0.2231 Mean : 0.2002 Mean : 0.5452 Mean : -10.83
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.00
## Max. :21.0000 Max. :14.0000 Max. :40.0000 Max. : 797.00
##
## WHIP FIP stint teamID
## Min. :0.00 Min. :-2.90 Min. :1.000 SLN : 7102
## 1st Qu.:0.00 1st Qu.: 0.00 1st Qu.:1.000 CLE : 6979
## Median :0.00 Median : 0.00 Median :1.000 PHI : 6904
## Mean : Inf Mean : Inf Mean :1.183 CHN : 6854
## 3rd Qu.:1.38 3rd Qu.: 4.16 3rd Qu.:1.000 NYA : 6831
## Max. : Inf Max. : Inf Max. :4.000 (Other):114215
## NA's :3852 NA's : 3852
## lgID POS G GS.y
## NL :75184 P :61491 Min. : 1.00 Min. : 0.00
## AL :73701 OF :27052 1st Qu.: 5.00 1st Qu.: 0.00
## AA : 0 3B :13680 Median : 18.00 Median : 1.00
## FL : 0 1B :13386 Mean : 32.86 Mean : 16.27
## NA : 0 2B :12239 3rd Qu.: 42.00 3rd Qu.: 15.00
## (Other): 0 (Other):21037 Max. :165.00 Max. :164.00
## NA's : 3852 NA's : 3852 NA's :3852
## InnOuts PO A E
## Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.0 1st Qu.: 1.00 1st Qu.: 1.00 1st Qu.: 0.000
## Median : 69.0 Median : 6.00 Median : 6.00 Median : 1.000
## Mean : 438.2 Mean : 69.87 Mean : 29.32 Mean : 2.465
## 3rd Qu.: 362.0 3rd Qu.: 49.00 3rd Qu.: 20.00 3rd Qu.: 2.000
## Max. :4469.0 Max. :1846.00 Max. :641.00 Max. :80.000
##
## DP PB WP.y SB.y
## Min. : 0.000 Min. : 0.0000 Min. : 0.0000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 0.000
## Median : 1.000 Median : 0.0000 Median : 0.0000 Median : 0.000
## Mean : 6.355 Mean : 0.2103 Mean : 0.3426 Mean : 1.063
## 3rd Qu.: 3.000 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :194.000 Max. :35.0000 Max. :69.0000 Max. :155.000
##
## CS.y ZR PointsF Points
## Min. : 0.0000 Min. : 0.00000 Min. :-80.000 Min. :-389.00
## 1st Qu.: 0.0000 1st Qu.: 0.00000 1st Qu.: -2.000 1st Qu.: -12.33
## Median : 0.0000 Median : 0.00000 Median : -1.000 Median : 18.00
## Mean : 0.5105 Mean : 0.02036 Mean : -2.465 Mean : 70.70
## 3rd Qu.: 0.0000 3rd Qu.: 0.00000 3rd Qu.: 0.000 3rd Qu.: 125.00
## Max. :89.0000 Max. :15.00000 Max. : 0.000 Max. : 971.00
##
## yearIDEra
## Free Agency Era (1977-1993) :27768
## Post Steroid Era (2006-Current) :26424
## Long Ball/Steroid Era (1994-2005):26331
## Expansion Era (1961-1976) :22241
## Integration Era (1942-1960) :18174
## (Other) :29110
## NA's : 2689
# Joining the FULL statistics to the Master dataset for specific Calculations
PointsCol <- PointsCombine %>%
select(nameFirst, nameLast, yearID, teamID, lgID, Points, POS, AVG, OPS, ERA, yearIDEra) %>%
arrange(-Points)
#Displaying the FULL Dataset sorted by Descending Points
print("The following table gives a summary of the reduced table that will be frequently used")
## [1] "The following table gives a summary of the reduced table that will be frequently used"
summary(PointsCol)
## nameFirst nameLast yearID teamID
## Length:152737 Length:152737 Min. :1906 SLN : 7102
## Class :character Class :character 1st Qu.:1951 CLE : 6979
## Mode :character Mode :character Median :1980 PHI : 6904
## Mean :1974 CHN : 6854
## 3rd Qu.:2001 NYA : 6831
## Max. :2016 (Other):114215
## NA's :2689 NA's : 3852
## lgID Points POS AVG
## NL :75184 Min. :-389.00 P :61491 Min. :0.0000
## AL :73701 1st Qu.: -12.33 OF :27052 1st Qu.:0.0000
## AA : 0 Median : 18.00 3B :13680 Median :0.2080
## FL : 0 Mean : 70.70 1B :13386 Mean :0.1727
## NA : 0 3rd Qu.: 125.00 2B :12239 3rd Qu.:0.2640
## (Other): 0 Max. : 971.00 (Other):21037 Max. :1.0000
## NA's : 3852 NA's : 3852
## OPS ERA yearIDEra
## Min. :0.0000 Min. :0.00 Free Agency Era (1977-1993) :27768
## 1st Qu.:0.0000 1st Qu.:0.00 Post Steroid Era (2006-Current) :26424
## Median :0.5470 Median :0.00 Long Ball/Steroid Era (1994-2005):26331
## Mean :0.4626 Mean : Inf Expansion Era (1961-1976) :22241
## 3rd Qu.:0.7070 3rd Qu.:3.84 Integration Era (1942-1960) :18174
## Max. :5.0000 Max. : Inf (Other) :29110
## NA's : 2689
print("The following table allows you to investigate the entire dataset sorted by point totals in decending order.")
## [1] "The following table allows you to investigate the entire dataset sorted by point totals in decending order."
PointsCol
Exploring the data and trends
First step is to create year’s average player. This helps to create a baseline for a player’s performance level comparative to the league average, Team average, and Position average. This allows us to compare differences between Teams, Players, Era, and Leagues. Haven’t you always wanted to know which league was faster? Or hits more homeruns?
TeamCol <- c(ColNA,102)
PlayerAVGlg <- PointsCombine %>%
group_by(lgID) %>%
filter(POS != "lgID") %>%
summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))
print("Average Player by League")
## [1] "Average Player by League"
PlayerAVGlg
PlayerAVGteam <- PointsCombine %>%
group_by(teamID) %>%
filter(teamID != "NA") %>%
summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))
print("Average Player by Team")
## [1] "Average Player by Team"
PlayerAVGteam
PlayerAVGpos <- PointsCombine %>%
group_by(POS) %>%
filter(POS != "NA") %>%
summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))
print("Average Player by Position")
## [1] "Average Player by Position"
PlayerAVGpos
PlayerAVGYear <- PointsCombine %>%
group_by(yearID) %>%
filter(yearID != "NA") %>%
summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))
print("Average Player by Year")
## [1] "Average Player by Year"
PlayerAVGYear
PlayerAVGEra <- PointsCombine %>%
group_by(yearIDEra) %>%
filter(yearIDEra != "NA") %>%
summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))
print("Average Player by Era")
## [1] "Average Player by Era"
PlayerAVGEra
Exploring the Batting Points and trends connected exclusively with the Procrastination League Characteristics.
In this section will briefly explore some characteristics of the Batting Dataset. This will compare only statistics from hitters (however if will included Pitchers that batted)
ggplot(data = BatTotal, aes(x = PointsB), colour = barlines, fill = barfill) +
geom_histogram(aes(fill = ..count..), bins = 40) +
labs(title = "Batting Points") +
scale_fill_gradient("Count", low = "Orange", high = "Blue")
## Points by Team StackedBar
ggplot(BatTotal, aes(PointsB, fill = teamID)) +
geom_bar() +
labs(title = "Batting Points by Team", x = "Team ID", y = "Count") +
coord_cartesian(ylim=c(-5,150))
## Pect. points by Team
ggplot(data = BatTotal, aes(PointsB, fill = teamID)) +
geom_bar(position="fill")+
labs(title = "Points by Team", x = "Points", y = "Percentage")
ggplot(BatTotal, aes(x = yearID, y = PointsB)) +
geom_point(aes(color = teamID)) +
scale_x_continuous("Year")+
scale_y_continuous("Batting Points") +
theme_bw() + labs(title="Batting Points by Year Scatterplot") + facet_wrap( ~ lgID) +
geom_smooth(method = "lm", aes(x = yearID, y = PointsB))
# to remove NA and Below 0 AVGs
BatTotalAvg <- BatTotal %>%
filter(AVG != "NA" & AVG > 0)
ggplot(BatTotalAvg, aes(x = AVG, y = PointsB)) +
geom_point(aes(color = teamID)) +
scale_x_continuous("Batting AVG")+
scale_y_continuous("Points") +
theme_bw() + labs(title="Batting Points by AVG Scatterplot") + facet_wrap( ~ lgID) +
geom_smooth(method = "lm", aes(x = AVG, y = PointsB))
#avg and total Number of HR by year
SumYrHR <- BatTotal %>%
group_by(yearID) %>%
filter(HR != "NA" & HR > 0) %>%
summarise(HRsum = sum(HR), HRavg = mean(HR) ) %>%
select(yearID, HRavg, HRsum)
#avg and total Number of SB by year
SumYrSB <- BatTotal %>%
group_by(yearID) %>%
filter(SB != "NA" & SB > 0) %>%
summarise(SBsum = sum(SB), SBavg = mean(SB) ) %>%
select(yearID, SBavg, SBsum)
#avg and total Number of SO by year
SumYrSO <- BatTotal %>%
group_by(yearID) %>%
filter(SO != "NA" & SO > 0) %>%
summarise(SOsum = sum(SO), SOavg = mean(SO) ) %>%
select(yearID, SOavg, SOsum)
#needs to remove lines that contain NA not replace with 0...
#avg of AVG by year
SumYrAVG <- BatTotal %>%
group_by(yearID) %>%
filter(AVG != "NA" & AVG > 0 & AVG != "Inf") %>%
summarise(AVGavg = mean(AVG) ) %>%
select(yearID, AVGavg)
# Combine SUm and Avg Datasets together
SumYr <- SumYrHR %>%
left_join(SumYrSB, by = c("yearID")) %>%
left_join(SumYrSO, by = c("yearID")) %>%
left_join(SumYrAVG, by = c("yearID"))
ggplot(SumYr, aes(x = yearID, y = HRsum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of HR") +
theme_bw() + labs(title="Sum of Homeruns by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = HRsum))
ggplot(SumYr, aes(x = yearID, y = HRavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Avg Numberof HR") +
theme_bw() + labs(title="Avg Number of Homeruns by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = HRavg))
ggplot(SumYr, aes(x = yearID, y = SBsum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of SB") +
theme_bw() + labs(title="Sum of Stolen Bases by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = SBsum))
ggplot(SumYr, aes(x = yearID, y = SBavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Avg Numberof SB") +
theme_bw() + labs(title="Avg Number of Stolen Bases by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = SBavg))
ggplot(SumYr, aes(x = yearID, y = SOsum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of SO") +
theme_bw() + labs(title="Sum of Strike Outs by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = SOsum))
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 4 rows containing missing values (geom_point).
ggplot(SumYr, aes(x = yearID, y = SOavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Avg Numberof SO") +
theme_bw() + labs(title="Avg Number of Strike Outs by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = SOavg))
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 4 rows containing missing values (geom_point).
ggplot(SumYr, aes(x = yearID, y = AVGavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("League AVG") +
theme_bw() + labs(title="Average player's Average by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = AVGavg))
ggplot(SumYr, aes(x = HRsum, y = SOsum)) +
geom_point() +
scale_x_continuous("Sum of HR")+
scale_y_continuous("Sum of SO") +
theme_bw() + labs(title="Sum of Homeruns vs Sum of Strike Outs by Year") +
geom_smooth(method = "lm", aes(x = HRsum, y = SOsum))
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 4 rows containing missing values (geom_point).
Exploring the Pitching Points and trends connected exclusively with the Procrastination League Characteristics.
ggplot(data = PitTotal, aes(x = PointsP), colour = barlines, fill = barfill) +
geom_histogram(aes(fill = ..count..), bins = 40) +
labs(title = "Pitching Points") +
scale_fill_gradient("Count", low = "Orange", high = "Blue")
ggplot(PitTotal, aes(x = yearID, y = PointsP)) +
geom_point(aes(color = teamID)) +
scale_x_continuous("Year")+
scale_y_continuous("Points") +
theme_bw() + labs(title="Pitching Points by Year Scatterplot") + facet_wrap( ~ lgID) +
geom_smooth(method = "lm", aes(x = yearID, y = PointsP))
# to remove NA and below 0
PitTotalEra <- PitTotal %>%
filter(ERA != "NA" & ERA > 0)
ggplot(PitTotalEra, aes(x = ERA, y = PointsP)) +
geom_point(aes(color = teamID)) +
scale_x_continuous("ERA")+
scale_y_continuous("Points") +
theme_bw() + labs(title="Pitching Points by ERA Scatterplot") + facet_wrap( ~ lgID) +
coord_cartesian(ylim=c(-10,600))
#avg and total Number of GS by year
SumYrGS <- PitTotal %>%
group_by(yearID) %>%
filter(GS != "NA" & GS > 0) %>%
summarise(GSsum = sum(GS), GSavg = mean(GS) ) %>%
select(yearID, GSavg, GSsum)
#avg and total Number of CG by year
SumYrCG <- PitTotal %>%
group_by(yearID) %>%
filter(CG != "NA" & CG > 0) %>%
summarise(CGsum = sum(CG), CGavg = mean(CG) ) %>%
select(yearID, CGavg, CGsum)
#avg and total Number of IPouts by year
SumYrIP <- PitTotal %>%
group_by(yearID) %>%
filter(IPouts != "NA" & IPouts > 0) %>%
summarise(IPsum = sum(IPouts/3), IPavg = mean(IPouts/3) ) %>%
select(yearID, IPavg, IPsum)
#avg and total Number of K by year
SumYrK <- PitTotal %>%
group_by(yearID) %>%
filter(SO != "NA" & SO > 0) %>%
summarise(Ksum = sum(SO), Kavg = mean(SO) ) %>%
select(yearID, Kavg, Ksum)
#needs to remove lines that contain NA not replace with 0...
#avg of ERA by year
SumYrERA <- PitTotal %>%
group_by(yearID) %>%
filter(ERA != "NA" & ERA > 0 & ERA != "Inf") %>%
summarise(ERAavg = mean(ERA) ) %>%
select(yearID, ERAavg)
# Combine SUm and Avg Datasets together
SumYrP <- SumYrGS %>%
left_join(SumYrCG, by = c("yearID")) %>%
left_join(SumYrIP, by = c("yearID")) %>%
left_join(SumYrK, by = c("yearID")) %>%
left_join(SumYrERA, by = c("yearID"))
ggplot(SumYrP, aes(x = yearID, y = GSsum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of Games Started") +
theme_bw() + labs(title="Sum of Games Started by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = GSsum))
ggplot(SumYrP, aes(x = yearID, y = GSavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Avg Number of GS") +
theme_bw() + labs(title="Avg Number of Games Started by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = GSavg))
ggplot(SumYrP, aes(x = yearID, y = CGsum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of CG") +
theme_bw() + labs(title="Sum of Complete Games by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = CGsum))
ggplot(SumYrP, aes(x = yearID, y = CGavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Avg Numberof CG") +
theme_bw() + labs(title="Avg Number of Complete Games by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = CGavg))
ggplot(SumYrP, aes(x = yearID, y = IPsum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of Innings Pitched") +
theme_bw() + labs(title="Sum of Innings Pitched by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = IPsum))
ggplot(SumYrP, aes(x = yearID, y = IPavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Avg Number of Innings Pitched") +
theme_bw() + labs(title="Avg Number of Innings Pitched by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = IPavg))
ggplot(SumYrP, aes(x = yearID, y = Ksum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of Pitching Strike Outs Pitched") +
theme_bw() + labs(title="Sum of Pitching Strike Outs Pitched by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = Ksum))
ggplot(SumYrP, aes(x = yearID, y = Kavg)) +
geom_point() +
scale_x_continuous("Year") +
scale_y_continuous("Avg Number of Pitching Strike Outs Pitched") +
theme_bw() + labs(title="Avg Number of Pitching Strike Outs by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = Kavg))
ggplot(SumYrP, aes(x = yearID, y = ERAavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("League ERA") +
theme_bw() + labs(title="Average player's ERA by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = ERAavg))
ggplot(SumYrP, aes(x = GSsum, y = IPsum)) +
geom_point() +
scale_x_continuous("Sum of GS")+
scale_y_continuous("Sum of Innings Pitched") +
theme_bw() + labs(title="Sum of Games Started vs Sum of Innings Pitched by Year") +
geom_smooth(method = "lm", aes(x = GSsum, y = IPsum))
Exploring the Total Points and trends connected exclusively with the Procrastination League Characteristics.
#avg and total Number of IPouts/3 (Innings) by year
SumYrPts <- PointsCombine %>%
group_by(yearID) %>%
filter(Points != "NA" & Points > 0 & Points != "Inf") %>%
summarise(Pointssum = sum(Points), Pointsavg = mean(Points) ) %>%
select(yearID, Pointsavg, Pointssum)
#needs to remove lines that contain NA not replace with 0...
#avg of ERA by year
SumYrError <- PointsCombine %>%
group_by(yearID) %>%
filter(E != "NA" & E > 0 & E != "Inf") %>%
summarise(Errorsum = sum(E), Erroravg = mean(E) ) %>%
select(yearID, Erroravg, Errorsum)
# Combine SUm and Avg Datasets together
SumYrPC <- SumYrPts %>%
left_join(SumYrError, by = c("yearID"))
ggplot(SumYrPC, aes(x = yearID, y = Pointsavg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("League Points Avg") +
theme_bw() + labs(title="Average Points by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = Pointsavg))
ggplot(SumYrPC, aes(x = yearID, y = Pointssum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of Points") +
theme_bw() + labs(title="Sum of Points by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = Pointssum))
ggplot(SumYrPC, aes(x = yearID, y = Erroravg)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("League Error's Average") +
theme_bw() + labs(title="League's Error Average by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = Erroravg))
ggplot(SumYrPC, aes(x = yearID, y = Errorsum)) +
geom_point() +
scale_x_continuous("Year")+
scale_y_continuous("Sum of Errors") +
theme_bw() + labs(title="Sum of Errors by Year") +
geom_smooth(method = "lm", aes(x = yearID, y = Errorsum))
Exploring what Characteristics have a greater correlation to a players total point accumulations during a year.
We will explore the impact on player Batting Avg, OPS, Strikeout Rate, etc.. And its projected correlation to a players point totals for a year. In order to speed up this process and focus on players who are the major contributors I narrowed the players down to the top 30% point totals of all time. This should still give an accurate reading of the characteristics that are best exhibited by the elite players throughout history.
A positive correlation means that a higher number in that category will have a positive impact on that variable. And of course a negative value will indicate a negative relationship. Also, The closer to 1.0 a number is (of the darker blue the ellipse appears) indicates how strong of a relationship that characteristic has. This creates a matrix that allows you to compare all the supplied variables against each other to see which has a greatest impact on each other. For our purposes, the Points Column/row will help to indicate which player features are most likely to result in positive growth or exceeding expectations.
PointsCombineFac <- PointsCombine
PointsCombineFac <- PointsCombineFac %>%
select("yearID", "lgID", "Points", "SLG", "OBP", "OPS", "ISO", "AVG", "ERA", "WHIP", "FIP") %>%
mutate(Points = round(Points, 0)) %>%
mutate(OPS = round((OBP * 1000), 0)) %>%
mutate(AVG = round((SLG * 1000), 0)) %>%
mutate(OPS = round((OPS * 1000), 0)) %>%
mutate(ISO = round((ISO * 1000), 0)) %>%
mutate(AVG = round((AVG * 1000), 0)) %>%
mutate(ERA = round((ERA * 100), 0)) %>%
mutate(OPS = round((WHIP * 1000), 0)) %>%
mutate(AVG = round((FIP * 1000), 0)) %>%
replace(., is.na(.), 0) %>%
filter(ERA != "Inf", OPS != "Inf", AVG != "Inf", ISO != "Inf", WHIP != "Inf", FIP != "Inf", OBP != "Inf", SLG != "Inf") %>%
filter(Points > quantile(Points, .7)) # Selects only the top 30% of Point totals
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = 0): invalid factor level,
## NA generated
summary(PointsCombineFac)
## yearID lgID Points SLG
## Min. :1906 NL :22885 Min. : 94.0 Min. : 0.0
## 1st Qu.:1945 AL :22637 1st Qu.:145.0 1st Qu.: 86.0
## Median :1977 AA : 0 Median :221.0 Median :135.0
## Mean :1971 FL : 0 Mean :253.1 Mean :145.1
## 3rd Qu.:1998 NA : 0 3rd Qu.:333.0 3rd Qu.:198.0
## Max. :2016 (Other): 0 Max. :971.0 Max. :457.0
## NA's : 103
## OBP OPS ISO AVG
## Min. :0.0000 Min. : 0 Min. : 0 Min. :-2900.0
## 1st Qu.:0.3040 1st Qu.: 0 1st Qu.: 85758 1st Qu.: 0.0
## Median :0.3300 Median : 0 Median :134730 Median : 0.0
## Mean :0.3245 Mean : 102 Mean :144792 Mean : 315.1
## 3rd Qu.:0.3580 3rd Qu.: 0 3rd Qu.:197682 3rd Qu.: 0.0
## Max. :1.0000 Max. :12000 Max. :456622 Max. :60100.0
##
## ERA WHIP FIP
## Min. : 0.00 Min. : 0.000 Min. :-2.9000
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.0000
## Median : 0.00 Median : 0.000 Median : 0.0000
## Mean : 26.24 Mean : 0.102 Mean : 0.3151
## 3rd Qu.: 0.00 3rd Qu.: 0.000 3rd Qu.: 0.0000
## Max. :6750.00 Max. :12.000 Max. :60.1000
##
PairCol <- c(3,1,4:11)
# creates scatterplots of all numeric variables
pairs(PointsCombineFac[,PairCol], pch=16, lower.panel=panel.smooth)
# Creates Correlation Matrix of all Numeric Variables
CorTable <- cor(PointsCombineFac[,PairCol])
# Do not modify the following code:
# displays selection on Markdown File
knitr::kable(round(CorTable,4), format = "markdown")
| Points | yearID | SLG | OBP | OPS | ISO | AVG | ERA | WHIP | FIP | |
|---|---|---|---|---|---|---|---|---|---|---|
| Points | 1.0000 | -0.0494 | 0.9006 | 0.3773 | -0.1169 | 0.9007 | -0.1079 | -0.0779 | -0.1169 | -0.1079 |
| yearID | -0.0494 | 1.0000 | 0.0536 | -0.0690 | -0.0459 | 0.0537 | -0.0330 | -0.0242 | -0.0459 | -0.0330 |
| SLG | 0.9006 | 0.0536 | 1.0000 | 0.5004 | -0.3207 | 1.0000 | -0.2819 | -0.1884 | -0.3207 | -0.2819 |
| OBP | 0.3773 | -0.0690 | 0.5004 | 1.0000 | -0.4197 | 0.5000 | -0.3643 | -0.2473 | -0.4197 | -0.3643 |
| OPS | -0.1169 | -0.0459 | -0.3207 | -0.4197 | 1.0000 | -0.3206 | 0.8537 | 0.8153 | 1.0000 | 0.8537 |
| ISO | 0.9007 | 0.0537 | 1.0000 | 0.5000 | -0.3206 | 1.0000 | -0.2818 | -0.1883 | -0.3206 | -0.2818 |
| AVG | -0.1079 | -0.0330 | -0.2819 | -0.3643 | 0.8537 | -0.2818 | 1.0000 | 0.7126 | 0.8537 | 1.0000 |
| ERA | -0.0779 | -0.0242 | -0.1884 | -0.2473 | 0.8153 | -0.1883 | 0.7126 | 1.0000 | 0.8153 | 0.7126 |
| WHIP | -0.1169 | -0.0459 | -0.3207 | -0.4197 | 1.0000 | -0.3206 | 0.8537 | 0.8153 | 1.0000 | 0.8537 |
| FIP | -0.1079 | -0.0330 | -0.2819 | -0.3643 | 0.8537 | -0.2818 | 1.0000 | 0.7126 | 0.8537 | 1.0000 |
# Visually displays the Correlation Matrix
## The narrower the width of the ellipse and darker the color the stronger the degree of correlation
## Blue indicates Positive Correlation
## Red Negative Correlation
corrplot(cor(PointsCombineFac[,PairCol]), method="ellipse",
addCoef.col = "black", # Add coefficient of correlation
tl.col="black", tl.srt=45, number.cex = 0.7) #Text label color and rotation
Through this brief overview several trends and analysis information can be gained. From an overall trend, the number of strikeouts and Homeruns continues to increase while the number of Innings Pitched continues to fall. In relationship to the Procrastination Baseball Points perspective, the number of points per player continues to fall. This is likely due to the decrease in the amount of time players spend in the field, and the continued rise of Strikeouts in favor of the long ball. This increased focus on hitting homeruns at the sacrifice of increasing number of strikeouts has given rise to a continued decline of a player’s Batting Average. If the trend continues players that contribute consistent high point totals will continue to dwindle in favor of a team’s approach of spreading out the workload on more players. This is most obviously seen in Pitching where teams are more careful with player injuries.
#First Remove all the Objects from the Data Set Up
remove(BatTotal,PitTotal,FieldTotal,MasterTotal,BatTotalCol,PitTotalCol,PointsCombine,PointsCol,ColNA,EraLabel,EraBreaks,TeamCol,PlayerAVGEra,PlayerAVGYear,PlayerAVGpos,PlayerAVGteam,PlayerAVGlg)
#now remove all Objects from the Correlation Section
remove(CorTable,PairCol,PointsCombineFac)
#now remove all Objects from the Batting/Pitching Exploration Section Section
remove(SumYr,SumYrAVG,SumYrCG,SumYrERA,SumYrGS,SumYrHR,SumYrP,SumYrSB,SumYrSO,BatTotalAvg,PitTotalEra,SumYrError,SumYrIP,SumYrPC,SumYrPts)