Procrastination Fantasy Baseball

 

The following synopsis of information analyzes and explores the unlike qualities of the Procrastination Fantasy Point system. Its intention is to find player correlations that can benefit selection and predictability of players.

Creating Simulated Point Totals

Step 1 is to create a point’s column representative of our leagues unique characteristics. Not all characteristics can easily be identifiable due to non-specific information at this time. However, hopefully future investigation and exploration will help lead to further resources and tools to more accurately reflect our league’s specific characteristics. This information may also, in the future, help to visualize baseball trends, characteristics, and future point changes in a productive and data driven direction.

Data Cleaning


In order to use our dataset we must first retrieve useable data. I set the following parameter to ensure a beginning place that is representative of our league. This required limiting data from after 1905, from only the “AL” and “NL” Divisions, and retrieving data from players who, pitched, batted, or fielded in at least one game. Player Positions were also turned into a categorical factor to enable further investigation based on playing position in the future.

Baseball Eras


The Dead Ball Era (1901 - 1920)
World War 2 (1941 - 1945)
Segregation Era (1901 - 1947ish)
Post-War Era/Yankees Era (1945 - late 50s/early 60s)
Westward Expansion (1953 - 1961)
Dead Ball 2 (The Sixties, roughly)
Designated Hitter Era (1973 - current, AL only)
Free Agency/Arbitration Era (1975 - current)
Steroid Era (unknown, but late 80s - 2005 seems likely)
Wild Card Era (1994 - current)

# Batting Data Filtering
BatTotal <- Batting  %>% filter(yearID > 1905, lgID == "AL" | lgID == "NL", G > 0) %>% 
  replace(., is.na(.), 0)


# Pitching Data Filtering
PitTotal <- Pitching  %>% 
  filter(yearID > 1905, lgID == "AL" | lgID == "NL", G > 0) %>% 
  replace(., is.na(.), 0)

# Fielding Data Filtering
FieldTotal <- Fielding  %>% 
  filter(yearID > 1905, lgID == "AL" | lgID == "NL", G > 0) %>% 
  replace(., is.na(.), 0)

# Fielding Position Facotring
FieldTotal$POS <- factor(FieldTotal$POS, levels=c("C", "1B", "2B", "3B", "P", "SS", "OF"))

# Master Data Filtering
MasterTotal <- Master  

 

Creating Batting Points

 

Create Fantasy Batting Point Totals

 

Runs Scored (R) = 1
Home Runs (HR) = 2
Total Bases (TB) = 1
Runs Batted In (RBI) = 1
Walks (BB) = 1
Intentional Walks (IBB) = 1
Strikeouts (K) = -1
Stolen Bases (SB) = 1
Caught Stealing (CS) = -1
Hitting for the Cycle (CYC) = 20 – not included
Grand Slam Home Runs (GSHR) = 4 – not included
Errors (E) = -1 – not included – (In Fielding Dataset - included in Combined)


### Batting points are calculated in the following manner:
PointsB = R + RBI + BB + IBB - SO + SB - CS + (H - X2B - X3B - HR) + (2 * X2B) + (3 * X3B) + (4 * HR) + (HR 2)

### Batting Average (AVG) has also been calculated and added to the dataset for further use. AVG = ( H / AB )


### Slugging Percentage (SLG) has also been calculated and added to the dataset for further use.
SLG = ( ( (H - X2B - X3B - HR) + (2
X2B) + (3 * X3B) + (4 * HR) ) / AB)


### On Base Percentage (OBP) has also been calculated and added to the dataset for further use.
OBP = (Hits + Walks + Hit by Pitch) / (At Bats + Walks + Hit by Pitch + Sacrifice Flies)


### On-Base Plus Slugging (OPS) has also been calculated and added to the dataset for further use.
OPS = ( (H+BB+HBP) / (AB+BB+SF+HBP)) + (((1 × B) + (2 × D) + (3 × T) + (4 × HR)) / AB)


### Isolated Power (ISO) has also been calculated and added to the dataset for further use.
ISO = SLG - AVG

# Create Fantasy Batting Point Totals

# Runs Scored (R) = 1
# Home Runs (HR) = 2
# Total Bases (TB) = 1
# Runs Batted In (RBI) = 1
# Walks (BB) = 1
# Intentional Walks (IBB) = 1
# Strikeouts (K) = -1
# Stolen Bases (SB) = 1
# Caught Stealing (CS) = -1

# Hitting for the Cycle (CYC) = 20 -- not included
# Grand Slam Home Runs (GSHR) = 4 -- not included
# Errors (E) = -1 -- not included (In Fielding Dataset - included in Combined)


# IBB count as BB???


# adding OPS:
#Formula:

#OPS = ( (H+BB+HBP) / (AB+BB+SF+HBP)) + (((1 × B) + (2 × D) + (3 × T) + (4 × HR)) / AB)

#Where,
#OPS = On-Base Plus Slugging,
#H = Hits,
#BB = Walks,
#HBP = Hits By Pitch,
#AB = At Bats,
#SF = Sacrifice Flies
#B = Singles,
#D = Doubles,
#T = Triples,
#HR = Homeruns,


# calculating Batting Points based on current Procrastination League Rules

# also includes adding a players Batting AVG rounded to 3 decimal points

BatTotal <- BatTotal %>% 
  mutate(PointsB = (R + RBI + BB + IBB - SO + SB - CS + (H - X2B - X3B - HR) + (2 * X2B) + (3 * X3B) + (4 * HR) + (HR *2) ) ) %>%
  mutate(AVG = round( (H/AB), 3) )  %>%
  mutate(SLG = round( ( (H - X2B - X3B - HR) + (2 * X2B) + (3 * X3B) + (4 * HR) ), 3) ) %>%
  mutate(OBP = round( ( (H+BB+HBP) / (AB+BB+SF+HBP) ), 3) ) %>%
  mutate(OPS = round( ( (H + BB + HBP) / (AB + BB + SF + HBP) ) + ( ( (1 * (H - X2B - X3B - HR) ) + (2 * X2B) + (3 * X3B) + (4 * HR) ) / AB), 3) ) %>%
  mutate(ISO = round( (SLG - AVG), 3) )


# Joining the Batting statistics to the Master dataset for use in Batting specific Calculations

BatTotalCol <- BatTotal %>% 
  left_join(MasterTotal, by = c("playerID")) %>%
  select(nameFirst, nameLast, playerID, yearID, teamID, lgID, PointsB, AVG, SLG, OBP, OPS, ISO) %>%
  arrange(-PointsB)

#Displaying the Batting Dataset sorted by Descending Points

#head(BatTotalCol)
BatTotalCol

Creating Pitching Points

 

Create Fantasy Pitching Point Totals

 

Innings Pitched (IP) = 1
Hits Allowed (H) = -1
Earned Runs (ER) = -2
Walks Issued (BB) = -1
Hit Batsmen (HB) = -1
Strikeouts (K) = 1
Wild Pitches (WP) = -1
Balks (B) = -1
Pick Offs (PKO) = 2 – not included
Quality Starts (QS) = 5 – not included
Complete Games (CG) = 10
Shutouts (SO) = 5
No Hitters (NH) = 15 – not included
Perfect Games (PG) = 20 – not included
Wins (W) = 5
Losses (L) = -5
Saves (SV) = 5
Blown Saves (BS) = -5 – not included


### Pitching points are calculated in the following manner:
PointsP = (W * 5) - (L * 5) + (SV * 5) + (CG * 10) - WP - BK + SO - HBP - BB - IBB - (ER * 2) - H + (IPouts/3) + (SHO * 5)

### Earned Run Average (ERA) has also been calculated and added to the dataset for further use. ERA = (ER / (IPouts/3) ) *9



### WHIP has also been calculated and added to the dataset for further use. WHIP = (Number of hits allowed + Number of walks allowed) / Number of Innings Pitched


### Fielding Independant Pitching (FIP) has also been calculated and added to the dataset for further use. FIP = ((13HR)+(3(BB+HBP))-(2*K))/IP + FIP Constant (about 3.10)


maybe one day I can add in the accurate FIP Constant

# Create Fantasy Pitching Point Totals

# Innings Pitched (IP) = 1
# Hits Allowed (H) = -1
# Earned Runs (ER) = -2
# Walks Issued (BB) = -1
# Hit Batsmen (HB) = -1
# Strikeouts (K) = 1
# Wild Pitches (WP) = -1
# Balks (B) = -1
# Pick Offs (PKO) = 2   -- not included
# Quality Starts (QS) = 5   -- not included
# Complete Games (CG) = 10
# Shutouts (SO) = 5
# No Hitters (NH) = 15  -- not included
# Perfect Games (PG) = 20  -- not included
# Wins (W) = 5
# Losses (L) = -5
# Saves (SV) = 5
# Blown Saves (BS) = -5  -- not included


# calculating Pitching Points based on current Procrastination League Rules

# also includes adding a players Pitching ERA rounded to 2 decimal points


PitTotal <- PitTotal %>% 
  mutate(PointsP = ((W * 5) - (L * 5) + (SV * 5) + (CG * 10) - WP - BK + SO - HBP - BB - IBB - (ER * 2) - H + (IPouts/3) + (SHO * 5) )) %>%
  mutate(ERA = round( (ER/(IPouts/3))*9, 2) ) %>%
  mutate(WHIP = round( (H + BB) / (IPouts/3), 2))  %>%
  mutate(FIP = round( ( ( (13 * HR)+(3 * (BB + HBP) ) - (2 * SO) )/(IPouts/3) + 3.10 ), 2) )

# Joining the Pitching statistics to the Master dataset for use in Pitching specific Calculations


PitTotalCol <- PitTotal %>% 
  left_join(MasterTotal, by = c("playerID")) %>%
  select(nameFirst, nameLast, playerID, yearID, teamID, lgID, PointsP, ERA, WHIP, FIP) %>%
  arrange(-PointsP)

#Displaying the Pitching Dataset sorted by Descending Points

#head(PitTotalCol)
PitTotalCol

Creating Fielding Points

 

Create Fantasy Fielding Point Totals

 

Error (IP) = -1

Errors effect Player’s batting statistics. However this information is stored in a seperate dataset and has to be combined seperately. For now I have point totals calculated in each field (Batting, Pitching, and Fielding) and are then will be added together to get a players total for the year.

# calculating Fielding Points based on current Procrastination League Rules



FieldTotal <- FieldTotal %>% 
  mutate(PointsF = (-E))

Merging Each Dataset into one

 

Create One Complete Dataset to calculate player’s total contributions in one year

 

This was completed by first joining all 4 datasets together into one file. Then Creating a new variable called “Points” that added together a players Batting, Pitching, and Fielding totals for that specific year.

The following table will give a synopsis of the complete dataset for future reference.

# calculating FULL Points based on current Procrastination League Rules

# also includes Combinign all datasets point totals into one variable


PointsCombine <- MasterTotal %>% 
  left_join(BatTotal, by = c("playerID")) %>%
  left_join(PitTotal, by = c("playerID", "yearID")) %>%  
  left_join(FieldTotal, by = c("playerID", "yearID"))

ColNA <- c(32:53,57:58,60:84,90:101)

PointsCombine[ , ColNA][is.na(PointsCombine[ , ColNA] ) ] = 0 

PointsCombine <- PointsCombine %>% 
  mutate(Points = (PointsB + PointsP + PointsF)) %>%
  filter(playerID != "NA") %>%
  arrange(-Points) %>%
  mutate(yearIDEra = PointsCombine$yearID)
  

#Creates Era Breaks

EraLabel <- c("Dead Ball Era (1901-1919)", "Live Ball Era (1920-1941)", "Integration Era (1942-1960)", "Expansion Era (1961-1976)", "Free Agency Era (1977-1993)", "Long Ball/Steroid Era (1994-2005)", "Post Steroid Era (2006-Current)" )


EraBreaks <- c(1905,1919,1941,1960,1976,1993,2005,2018)

PointsCombine$yearIDEra <- cut(PointsCombine$yearIDEra, breaks = EraBreaks, labels = EraLabel) 


summary(PointsCombine)
##    playerID           birthYear      birthMonth        birthDay    
##  Length:152737      Min.   :1820   Min.   : 1.000   Min.   : 1.00  
##  Class :character   1st Qu.:1919   1st Qu.: 4.000   1st Qu.: 8.00  
##  Mode  :character   Median :1951   Median : 7.000   Median :16.00  
##                     Mean   :1944   Mean   : 6.637   Mean   :15.77  
##                     3rd Qu.:1971   3rd Qu.:10.000   3rd Qu.:23.00  
##                     Max.   :1996   Max.   :12.000   Max.   :31.00  
##                     NA's   :132    NA's   :304      NA's   :451    
##  birthCountry        birthState         birthCity           deathYear    
##  Length:152737      Length:152737      Length:152737      Min.   :1872   
##  Class :character   Class :character   Class :character   1st Qu.:1962   
##  Mode  :character   Mode  :character   Mode  :character   Median :1981   
##                                                           Mean   :1979   
##                                                           3rd Qu.:2001   
##                                                           Max.   :2017   
##                                                           NA's   :96981  
##    deathMonth       deathDay     deathCountry        deathState       
##  Min.   : 1.00   Min.   : 1.0    Length:152737      Length:152737     
##  1st Qu.: 3.00   1st Qu.: 8.0    Class :character   Class :character  
##  Median : 7.00   Median :15.0    Mode  :character   Mode  :character  
##  Mean   : 6.54   Mean   :15.5                                         
##  3rd Qu.:10.00   3rd Qu.:23.0                                         
##  Max.   :12.00   Max.   :31.0                                         
##  NA's   :96982   NA's   :96984                                        
##   deathCity          nameFirst           nameLast        
##  Length:152737      Length:152737      Length:152737     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   nameGiven             weight          height        bats      
##  Length:152737      Min.   : 65.0   Min.   :43.00   B   :12934  
##  Class :character   1st Qu.:175.0   1st Qu.:71.00   L   :40315  
##  Mode  :character   Median :185.0   Median :73.00   R   :98209  
##                     Mean   :188.6   Mean   :72.58   NA's: 1279  
##                     3rd Qu.:200.0   3rd Qu.:74.00               
##                     Max.   :320.0   Max.   :83.00               
##                     NA's   :1013    NA's   :863                 
##   throws          debut            finalGame           retroID         
##  L   : 28136   Length:152737      Length:152737      Length:152737     
##  R   :123612   Class :character   Class :character   Class :character  
##  S   :     9   Mode  :character   Mode  :character   Mode  :character  
##  NA's:   980                                                           
##                                                                        
##                                                                        
##                                                                        
##    bbrefID            deathDate            birthDate         
##  Length:152737      Min.   :1872-03-17   Min.   :1820-04-17  
##  Class :character   1st Qu.:1962-11-29   1st Qu.:1920-02-26  
##  Mode  :character   Median :1981-10-17   Median :1951-02-07  
##                     Mean   :1979-05-21   Mean   :1944-04-22  
##                     3rd Qu.:2001-04-01   3rd Qu.:1971-03-12  
##                     Max.   :2017-02-19   Max.   :1996-08-12  
##                     NA's   :96984        NA's   :451         
##      yearID        stint.x         teamID.x          lgID.x     
##  Min.   :1906   Min.   :1.000   SLN    :  7164   NL     :75725  
##  1st Qu.:1951   1st Qu.:1.000   CLE    :  7057   AL     :74323  
##  Median :1980   Median :1.000   NYA    :  6923   AA     :    0  
##  Mean   :1974   Mean   :1.179   CHN    :  6909   FL     :    0  
##  3rd Qu.:2001   3rd Qu.:1.000   PHI    :  6907   NA     :    0  
##  Max.   :2016   Max.   :4.000   (Other):115088   (Other):    0  
##  NA's   :2689   NA's   :2689    NA's   :  2689   NA's   : 2689  
##       G.x               AB             R.x              H.x        
##  Min.   :  1.00   Min.   :  0.0   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 13.00   1st Qu.:  3.0   1st Qu.:  0.00   1st Qu.:  0.00  
##  Median : 34.00   Median : 45.0   Median :  4.00   Median :  8.00  
##  Mean   : 50.87   Mean   :132.6   Mean   : 16.91   Mean   : 34.45  
##  3rd Qu.: 80.00   3rd Qu.:208.0   3rd Qu.: 24.00   3rd Qu.: 52.00  
##  Max.   :165.00   Max.   :716.0   Max.   :177.00   Max.   :262.00  
##  NA's   :2689                                                      
##       X2B              X3B              HR.x             RBI        
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   :  0.00  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:  0.00  
##  Median : 1.000   Median : 0.000   Median : 0.000   Median :  3.00  
##  Mean   : 5.971   Mean   : 1.052   Mean   : 2.784   Mean   : 15.59  
##  3rd Qu.: 9.000   3rd Qu.: 1.000   3rd Qu.: 2.000   3rd Qu.: 22.00  
##  Max.   :67.000   Max.   :36.000   Max.   :73.000   Max.   :191.00  
##                                                                     
##       SB.x              CS.x             BB.x             SO.x      
##  Min.   :  0.000   Min.   : 0.000   Min.   :  0.00   Min.   :  0.0  
##  1st Qu.:  0.000   1st Qu.: 0.000   1st Qu.:  0.00   1st Qu.:  0.0  
##  Median :  0.000   Median : 0.000   Median :  3.00   Median :  8.0  
##  Mean   :  2.337   Mean   : 1.014   Mean   : 12.47   Mean   : 19.5  
##  3rd Qu.:  2.000   3rd Qu.: 1.000   3rd Qu.: 18.00   3rd Qu.: 28.0  
##  Max.   :130.000   Max.   :42.000   Max.   :232.00   Max.   :223.0  
##                                                                     
##      IBB.x              HBP.x             SH.x             SF.x        
##  Min.   :  0.0000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.0000  
##  1st Qu.:  0.0000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.0000  
##  Median :  0.0000   Median : 0.000   Median : 0.000   Median : 0.0000  
##  Mean   :  0.7416   Mean   : 0.939   Mean   : 2.081   Mean   : 0.7549  
##  3rd Qu.:  0.0000   3rd Qu.: 1.000   3rd Qu.: 3.000   3rd Qu.: 1.0000  
##  Max.   :120.0000   Max.   :50.000   Max.   :67.000   Max.   :19.0000  
##                                                                        
##      GIDP.x          PointsB            AVG              SLG        
##  Min.   : 0.000   Min.   :-52.00   Min.   :0.0000   Min.   :  0.00  
##  1st Qu.: 0.000   1st Qu.:  0.00   1st Qu.:0.0000   1st Qu.:  0.00  
##  Median : 0.000   Median : 13.00   Median :0.2080   Median : 11.00  
##  Mean   : 2.449   Mean   : 83.99   Mean   :0.1727   Mean   : 50.88  
##  3rd Qu.: 3.000   3rd Qu.:118.00   3rd Qu.:0.2640   3rd Qu.: 74.00  
##  Max.   :36.000   Max.   :991.00   Max.   :1.0000   Max.   :457.00  
##                                                                     
##       OBP              OPS              ISO            stint.y     
##  Min.   :0.0000   Min.   :0.0000   Min.   :  0.00   Min.   :1.00   
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:  0.00   1st Qu.:1.00   
##  Median :0.2680   Median :0.5470   Median : 10.81   Median :1.00   
##  Mean   :0.2212   Mean   :0.4626   Mean   : 50.71   Mean   :1.25   
##  3rd Qu.:0.3320   3rd Qu.:0.7070   3rd Qu.: 73.76   3rd Qu.:1.00   
##  Max.   :1.0000   Max.   :5.0000   Max.   :456.62   Max.   :4.00   
##                                                     NA's   :89349  
##     teamID.y         lgID.y            W              L        
##  SLN    : 3074   NL     :32046   Min.   : 0.0   Min.   : 0.00  
##  PHI    : 3043   AL     :31342   1st Qu.: 0.0   1st Qu.: 0.00  
##  NYA    : 3007   AA     :    0   Median : 0.0   Median : 0.00  
##  CLE    : 2956   FL     :    0   Mean   : 1.5   Mean   : 1.57  
##  BOS    : 2904   NA     :    0   3rd Qu.: 1.0   3rd Qu.: 1.00  
##  (Other):48404   (Other):    0   Max.   :40.0   Max.   :27.00  
##  NA's   :89349   NA's   :89349                                 
##       G.y              GS.x              CG               SHO         
##  Min.   :  1.00   Min.   : 0.000   Min.   : 0.0000   Min.   : 0.0000  
##  1st Qu.:  6.00   1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.: 0.0000  
##  Median : 17.00   Median : 0.000   Median : 0.0000   Median : 0.0000  
##  Mean   : 21.01   Mean   : 3.032   Mean   : 0.7411   Mean   : 0.1331  
##  3rd Qu.: 32.00   3rd Qu.: 0.000   3rd Qu.: 0.0000   3rd Qu.: 0.0000  
##  Max.   :106.00   Max.   :49.000   Max.   :42.0000   Max.   :16.0000  
##  NA's   :89349                                                        
##        SV              IPouts             H.y               ER        
##  Min.   : 0.0000   Min.   :   0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 0.0000   1st Qu.:   0.00   1st Qu.:  0.00   1st Qu.:  0.00  
##  Median : 0.0000   Median :   0.00   Median :  0.00   Median :  0.00  
##  Mean   : 0.5497   Mean   :  82.89   Mean   : 27.83   Mean   : 12.24  
##  3rd Qu.: 0.0000   3rd Qu.:  81.00   3rd Qu.: 29.00   3rd Qu.: 14.00  
##  Max.   :62.0000   Max.   :1392.00   Max.   :381.00   Max.   :186.00  
##                                                                       
##       HR.y             BB.y             SO.y            BAOpp       
##  Min.   : 0.000   Min.   :  0.00   Min.   :  0.00   Min.   :0.0000  
##  1st Qu.: 0.000   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:0.0000  
##  Median : 0.000   Median :  0.00   Median :  0.00   Median :0.0000  
##  Mean   : 2.369   Mean   : 10.17   Mean   : 15.87   Mean   :0.1101  
##  3rd Qu.: 2.000   3rd Qu.: 12.00   3rd Qu.: 15.00   3rd Qu.:0.2500  
##  Max.   :50.000   Max.   :208.00   Max.   :383.00   Max.   :1.0000  
##                                                                     
##       ERA           IBB.y              WP.x             HBP.y        
##  Min.   :0.00   Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.0000  
##  1st Qu.:0.00   1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 0.0000  
##  Median :0.00   Median : 0.0000   Median : 0.0000   Median : 0.0000  
##  Mean   : Inf   Mean   : 0.6333   Mean   : 0.8325   Mean   : 0.7614  
##  3rd Qu.:3.84   3rd Qu.: 0.0000   3rd Qu.: 1.0000   3rd Qu.: 1.0000  
##  Max.   : Inf   Max.   :23.0000   Max.   :27.0000   Max.   :26.0000  
##                                                                      
##        BK               BFP               GF             R.y        
##  Min.   : 0.0000   Min.   :   0.0   Min.   : 0.00   Min.   :  0.00  
##  1st Qu.: 0.0000   1st Qu.:   0.0   1st Qu.: 0.00   1st Qu.:  0.00  
##  Median : 0.0000   Median :   0.0   Median : 0.00   Median :  0.00  
##  Mean   : 0.1112   Mean   : 118.4   Mean   : 2.46   Mean   : 13.93  
##  3rd Qu.: 0.0000   3rd Qu.: 119.0   3rd Qu.: 2.00   3rd Qu.: 16.00  
##  Max.   :16.0000   Max.   :1799.0   Max.   :84.00   Max.   :205.00  
##                                                                     
##       SH.y              SF.y             GIDP.y           PointsP       
##  Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.0000   Min.   :-387.00  
##  1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: -12.67  
##  Median : 0.0000   Median : 0.0000   Median : 0.0000   Median :   0.00  
##  Mean   : 0.2231   Mean   : 0.2002   Mean   : 0.5452   Mean   : -10.83  
##  3rd Qu.: 0.0000   3rd Qu.: 0.0000   3rd Qu.: 0.0000   3rd Qu.:   0.00  
##  Max.   :21.0000   Max.   :14.0000   Max.   :40.0000   Max.   : 797.00  
##                                                                         
##       WHIP           FIP            stint           teamID      
##  Min.   :0.00   Min.   :-2.90   Min.   :1.000   SLN    :  7102  
##  1st Qu.:0.00   1st Qu.: 0.00   1st Qu.:1.000   CLE    :  6979  
##  Median :0.00   Median : 0.00   Median :1.000   PHI    :  6904  
##  Mean   : Inf   Mean   :  Inf   Mean   :1.183   CHN    :  6854  
##  3rd Qu.:1.38   3rd Qu.: 4.16   3rd Qu.:1.000   NYA    :  6831  
##  Max.   : Inf   Max.   :  Inf   Max.   :4.000   (Other):114215  
##                                 NA's   :3852    NA's   :  3852  
##       lgID            POS              G               GS.y       
##  NL     :75184   P      :61491   Min.   :  1.00   Min.   :  0.00  
##  AL     :73701   OF     :27052   1st Qu.:  5.00   1st Qu.:  0.00  
##  AA     :    0   3B     :13680   Median : 18.00   Median :  1.00  
##  FL     :    0   1B     :13386   Mean   : 32.86   Mean   : 16.27  
##  NA     :    0   2B     :12239   3rd Qu.: 42.00   3rd Qu.: 15.00  
##  (Other):    0   (Other):21037   Max.   :165.00   Max.   :164.00  
##  NA's   : 3852   NA's   : 3852   NA's   :3852                     
##     InnOuts             PO                A                E         
##  Min.   :   0.0   Min.   :   0.00   Min.   :  0.00   Min.   : 0.000  
##  1st Qu.:   0.0   1st Qu.:   1.00   1st Qu.:  1.00   1st Qu.: 0.000  
##  Median :  69.0   Median :   6.00   Median :  6.00   Median : 1.000  
##  Mean   : 438.2   Mean   :  69.87   Mean   : 29.32   Mean   : 2.465  
##  3rd Qu.: 362.0   3rd Qu.:  49.00   3rd Qu.: 20.00   3rd Qu.: 2.000  
##  Max.   :4469.0   Max.   :1846.00   Max.   :641.00   Max.   :80.000  
##                                                                      
##        DP                PB               WP.y              SB.y        
##  Min.   :  0.000   Min.   : 0.0000   Min.   : 0.0000   Min.   :  0.000  
##  1st Qu.:  0.000   1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.:  0.000  
##  Median :  1.000   Median : 0.0000   Median : 0.0000   Median :  0.000  
##  Mean   :  6.355   Mean   : 0.2103   Mean   : 0.3426   Mean   :  1.063  
##  3rd Qu.:  3.000   3rd Qu.: 0.0000   3rd Qu.: 0.0000   3rd Qu.:  0.000  
##  Max.   :194.000   Max.   :35.0000   Max.   :69.0000   Max.   :155.000  
##                                                                         
##       CS.y               ZR              PointsF            Points       
##  Min.   : 0.0000   Min.   : 0.00000   Min.   :-80.000   Min.   :-389.00  
##  1st Qu.: 0.0000   1st Qu.: 0.00000   1st Qu.: -2.000   1st Qu.: -12.33  
##  Median : 0.0000   Median : 0.00000   Median : -1.000   Median :  18.00  
##  Mean   : 0.5105   Mean   : 0.02036   Mean   : -2.465   Mean   :  70.70  
##  3rd Qu.: 0.0000   3rd Qu.: 0.00000   3rd Qu.:  0.000   3rd Qu.: 125.00  
##  Max.   :89.0000   Max.   :15.00000   Max.   :  0.000   Max.   : 971.00  
##                                                                          
##                              yearIDEra    
##  Free Agency Era (1977-1993)      :27768  
##  Post Steroid Era (2006-Current)  :26424  
##  Long Ball/Steroid Era (1994-2005):26331  
##  Expansion Era (1961-1976)        :22241  
##  Integration Era (1942-1960)      :18174  
##  (Other)                          :29110  
##  NA's                             : 2689
# Joining the FULL statistics to the Master dataset for specific Calculations


PointsCol <- PointsCombine %>% 
  select(nameFirst, nameLast, yearID, teamID, lgID, Points, POS, AVG, OPS, ERA, yearIDEra) %>%
  arrange(-Points) 

#Displaying the FULL Dataset sorted by Descending Points
print("The following table gives a summary of the reduced table that will be frequently used")
## [1] "The following table gives a summary of the reduced table that will be frequently used"
summary(PointsCol)
##   nameFirst           nameLast             yearID         teamID      
##  Length:152737      Length:152737      Min.   :1906   SLN    :  7102  
##  Class :character   Class :character   1st Qu.:1951   CLE    :  6979  
##  Mode  :character   Mode  :character   Median :1980   PHI    :  6904  
##                                        Mean   :1974   CHN    :  6854  
##                                        3rd Qu.:2001   NYA    :  6831  
##                                        Max.   :2016   (Other):114215  
##                                        NA's   :2689   NA's   :  3852  
##       lgID           Points             POS             AVG        
##  NL     :75184   Min.   :-389.00   P      :61491   Min.   :0.0000  
##  AL     :73701   1st Qu.: -12.33   OF     :27052   1st Qu.:0.0000  
##  AA     :    0   Median :  18.00   3B     :13680   Median :0.2080  
##  FL     :    0   Mean   :  70.70   1B     :13386   Mean   :0.1727  
##  NA     :    0   3rd Qu.: 125.00   2B     :12239   3rd Qu.:0.2640  
##  (Other):    0   Max.   : 971.00   (Other):21037   Max.   :1.0000  
##  NA's   : 3852                     NA's   : 3852                   
##       OPS              ERA                                   yearIDEra    
##  Min.   :0.0000   Min.   :0.00   Free Agency Era (1977-1993)      :27768  
##  1st Qu.:0.0000   1st Qu.:0.00   Post Steroid Era (2006-Current)  :26424  
##  Median :0.5470   Median :0.00   Long Ball/Steroid Era (1994-2005):26331  
##  Mean   :0.4626   Mean   : Inf   Expansion Era (1961-1976)        :22241  
##  3rd Qu.:0.7070   3rd Qu.:3.84   Integration Era (1942-1960)      :18174  
##  Max.   :5.0000   Max.   : Inf   (Other)                          :29110  
##                                  NA's                             : 2689
print("The following table allows you to investigate the entire dataset sorted by point totals in decending order.")
## [1] "The following table allows you to investigate the entire dataset sorted by point totals in decending order."
PointsCol

Data Exploration

 

Exploring the data and trends

 

First step is to create year’s average player. This helps to create a baseline for a player’s performance level comparative to the league average, Team average, and Position average. This allows us to compare differences between Teams, Players, Era, and Leagues. Haven’t you always wanted to know which league was faster? Or hits more homeruns?

TeamCol <- c(ColNA,102)

PlayerAVGlg <- PointsCombine  %>%
  group_by(lgID) %>%
  filter(POS != "lgID") %>%
  summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))

print("Average Player by League")
## [1] "Average Player by League"
PlayerAVGlg
PlayerAVGteam <- PointsCombine  %>%
  group_by(teamID) %>%
  filter(teamID != "NA") %>%
  summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))
  
print("Average Player by Team")
## [1] "Average Player by Team"
PlayerAVGteam
PlayerAVGpos <- PointsCombine  %>%
  group_by(POS) %>%
  filter(POS != "NA") %>%
  summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))

print("Average Player by Position")
## [1] "Average Player by Position"
PlayerAVGpos
PlayerAVGYear <- PointsCombine  %>%
  group_by(yearID) %>%
  filter(yearID != "NA") %>%
  summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))

print("Average Player by Year")
## [1] "Average Player by Year"
PlayerAVGYear
PlayerAVGEra <- PointsCombine  %>%
  group_by(yearIDEra) %>%
  filter(yearIDEra != "NA") %>%
  summarise_at(vars(TeamCol), funs(mean(., na.rm=TRUE)))

print("Average Player by Era")
## [1] "Average Player by Era"
PlayerAVGEra

Batting Point Exploration

 

Exploring the Batting Points and trends connected exclusively with the Procrastination League Characteristics.

 

In this section will briefly explore some characteristics of the Batting Dataset. This will compare only statistics from hitters (however if will included Pitchers that batted)

ggplot(data = BatTotal, aes(x = PointsB), colour = barlines, fill = barfill) +
  geom_histogram(aes(fill = ..count..), bins = 40) +
  labs(title = "Batting Points") +
  scale_fill_gradient("Count", low = "Orange", high = "Blue")

## Points by Team StackedBar
ggplot(BatTotal, aes(PointsB, fill = teamID)) + 
  geom_bar() + 
  labs(title = "Batting Points by Team", x = "Team ID", y = "Count") +
  coord_cartesian(ylim=c(-5,150))

## Pect. points by Team
ggplot(data = BatTotal, aes(PointsB, fill = teamID)) + 
  geom_bar(position="fill")+ 
  labs(title = "Points by Team", x = "Points", y = "Percentage")

ggplot(BatTotal, aes(x = yearID, y = PointsB)) + 
  geom_point(aes(color = teamID)) + 
  scale_x_continuous("Year")+
  scale_y_continuous("Batting Points") + 
  theme_bw() + labs(title="Batting Points by Year Scatterplot") + facet_wrap( ~ lgID) +
  geom_smooth(method = "lm", aes(x = yearID, y = PointsB))

# to remove NA and Below 0 AVGs
BatTotalAvg <- BatTotal %>%
  filter(AVG != "NA" & AVG > 0)

ggplot(BatTotalAvg, aes(x = AVG, y = PointsB)) + 
  geom_point(aes(color = teamID)) + 
  scale_x_continuous("Batting AVG")+
  scale_y_continuous("Points") + 
  theme_bw() + labs(title="Batting Points by AVG Scatterplot") + facet_wrap( ~ lgID) +
  geom_smooth(method = "lm", aes(x = AVG, y = PointsB))

#avg and total Number of HR by year
SumYrHR <- BatTotal %>% 
  group_by(yearID) %>%
  filter(HR != "NA" & HR > 0) %>%
  summarise(HRsum = sum(HR), HRavg = mean(HR) ) %>%
  select(yearID, HRavg, HRsum)

#avg and total Number of SB by year
SumYrSB <- BatTotal %>% 
  group_by(yearID) %>%
  filter(SB != "NA" & SB > 0) %>%
  summarise(SBsum = sum(SB), SBavg = mean(SB) ) %>%
  select(yearID, SBavg, SBsum)

#avg and total Number of SO by year
SumYrSO <- BatTotal %>% 
  group_by(yearID) %>%
  filter(SO != "NA" & SO > 0) %>%
  summarise(SOsum = sum(SO), SOavg = mean(SO) ) %>%
  select(yearID, SOavg, SOsum)

#needs to remove lines that contain NA not replace with 0...

#avg of AVG by year
SumYrAVG <- BatTotal %>% 
  group_by(yearID) %>% 
  filter(AVG != "NA" & AVG > 0 & AVG != "Inf") %>%
  summarise(AVGavg = mean(AVG) ) %>%
  select(yearID, AVGavg)

# Combine SUm and Avg Datasets together
SumYr <- SumYrHR %>% 
  left_join(SumYrSB, by = c("yearID")) %>%  
  left_join(SumYrSO, by = c("yearID")) %>%  
  left_join(SumYrAVG, by = c("yearID")) 

ggplot(SumYr, aes(x = yearID, y = HRsum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of HR") + 
  theme_bw() + labs(title="Sum of Homeruns by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = HRsum))

ggplot(SumYr, aes(x = yearID, y = HRavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Avg Numberof HR") + 
  theme_bw() + labs(title="Avg Number of Homeruns by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = HRavg))

ggplot(SumYr, aes(x = yearID, y = SBsum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of SB") + 
  theme_bw() + labs(title="Sum of Stolen Bases by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = SBsum))

ggplot(SumYr, aes(x = yearID, y = SBavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Avg Numberof SB") + 
  theme_bw() + labs(title="Avg Number of Stolen Bases by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = SBavg))

ggplot(SumYr, aes(x = yearID, y = SOsum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of SO") + 
  theme_bw() + labs(title="Sum of Strike Outs by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = SOsum))
## Warning: Removed 4 rows containing non-finite values (stat_smooth).
## Warning: Removed 4 rows containing missing values (geom_point).

ggplot(SumYr, aes(x = yearID, y = SOavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Avg Numberof SO") + 
  theme_bw() + labs(title="Avg Number of Strike Outs by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = SOavg))
## Warning: Removed 4 rows containing non-finite values (stat_smooth).

## Warning: Removed 4 rows containing missing values (geom_point).

ggplot(SumYr, aes(x = yearID, y = AVGavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("League AVG") + 
  theme_bw() + labs(title="Average player's Average by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = AVGavg))

ggplot(SumYr, aes(x = HRsum, y = SOsum)) + 
  geom_point() + 
  scale_x_continuous("Sum of HR")+
  scale_y_continuous("Sum of SO") + 
  theme_bw() + labs(title="Sum of Homeruns vs Sum of Strike Outs by Year") +
  geom_smooth(method = "lm", aes(x = HRsum, y = SOsum))
## Warning: Removed 4 rows containing non-finite values (stat_smooth).

## Warning: Removed 4 rows containing missing values (geom_point).

Pitching Point Exploration

 

Exploring the Pitching Points and trends connected exclusively with the Procrastination League Characteristics.

 

ggplot(data = PitTotal, aes(x = PointsP), colour = barlines, fill = barfill) +
  geom_histogram(aes(fill = ..count..), bins = 40) +
  labs(title = "Pitching Points") +
  scale_fill_gradient("Count", low = "Orange", high = "Blue")

ggplot(PitTotal, aes(x = yearID, y = PointsP)) + 
  geom_point(aes(color = teamID)) + 
  scale_x_continuous("Year")+
  scale_y_continuous("Points") + 
  theme_bw() + labs(title="Pitching Points by Year Scatterplot") + facet_wrap( ~ lgID) +
  geom_smooth(method = "lm", aes(x = yearID, y = PointsP))

# to remove NA and below 0
PitTotalEra <- PitTotal %>%
  filter(ERA != "NA" & ERA > 0)

ggplot(PitTotalEra, aes(x = ERA, y = PointsP)) + 
  geom_point(aes(color = teamID)) + 
  scale_x_continuous("ERA")+
  scale_y_continuous("Points") + 
  theme_bw() + labs(title="Pitching Points by ERA Scatterplot") + facet_wrap( ~ lgID) +
  coord_cartesian(ylim=c(-10,600))

#avg and total Number of GS by year
SumYrGS <- PitTotal %>% 
  group_by(yearID) %>%
  filter(GS != "NA" & GS > 0) %>%
  summarise(GSsum = sum(GS), GSavg = mean(GS) ) %>%
  select(yearID, GSavg, GSsum)

#avg and total Number of CG by year
SumYrCG <- PitTotal %>% 
  group_by(yearID) %>%
  filter(CG != "NA" & CG > 0) %>%
  summarise(CGsum = sum(CG), CGavg = mean(CG) ) %>%
  select(yearID, CGavg, CGsum)

#avg and total Number of IPouts by year
SumYrIP <- PitTotal %>% 
  group_by(yearID) %>%
  filter(IPouts != "NA" & IPouts > 0) %>%
  summarise(IPsum = sum(IPouts/3), IPavg = mean(IPouts/3) ) %>%
  select(yearID, IPavg, IPsum)

#avg and total Number of K by year
SumYrK <- PitTotal %>% 
  group_by(yearID) %>%
  filter(SO != "NA" & SO > 0) %>%
  summarise(Ksum = sum(SO), Kavg = mean(SO) ) %>%
  select(yearID, Kavg, Ksum)

#needs to remove lines that contain NA not replace with 0...

#avg of ERA by year
SumYrERA <- PitTotal %>% 
  group_by(yearID) %>% 
  filter(ERA != "NA" & ERA > 0 & ERA != "Inf") %>%
  summarise(ERAavg = mean(ERA) ) %>%
  select(yearID, ERAavg)

# Combine SUm and Avg Datasets together
SumYrP <- SumYrGS %>% 
  left_join(SumYrCG, by = c("yearID")) %>%  
  left_join(SumYrIP, by = c("yearID")) %>%  
  left_join(SumYrK, by = c("yearID")) %>%  
  left_join(SumYrERA, by = c("yearID")) 

ggplot(SumYrP, aes(x = yearID, y = GSsum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of Games Started") + 
  theme_bw() + labs(title="Sum of Games Started by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = GSsum))

ggplot(SumYrP, aes(x = yearID, y = GSavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Avg Number of GS") + 
  theme_bw() + labs(title="Avg Number of Games Started by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = GSavg))

ggplot(SumYrP, aes(x = yearID, y = CGsum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of CG") + 
  theme_bw() + labs(title="Sum of Complete Games by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = CGsum))

ggplot(SumYrP, aes(x = yearID, y = CGavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Avg Numberof CG") + 
  theme_bw() + labs(title="Avg Number of Complete Games by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = CGavg))

ggplot(SumYrP, aes(x = yearID, y = IPsum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of Innings Pitched") + 
  theme_bw() + labs(title="Sum of Innings Pitched by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = IPsum))

ggplot(SumYrP, aes(x = yearID, y = IPavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Avg Number of Innings Pitched") + 
  theme_bw() + labs(title="Avg Number of Innings Pitched by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = IPavg))

ggplot(SumYrP, aes(x = yearID, y = Ksum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of Pitching Strike Outs Pitched") + 
  theme_bw() + labs(title="Sum of Pitching Strike Outs Pitched by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = Ksum))

ggplot(SumYrP, aes(x = yearID, y = Kavg)) + 
  geom_point() +
  scale_x_continuous("Year") +
  scale_y_continuous("Avg Number of Pitching Strike Outs Pitched") + 
  theme_bw() + labs(title="Avg Number of Pitching Strike Outs by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = Kavg))

ggplot(SumYrP, aes(x = yearID, y = ERAavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("League ERA") + 
  theme_bw() + labs(title="Average player's ERA by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = ERAavg))

ggplot(SumYrP, aes(x = GSsum, y = IPsum)) + 
  geom_point() + 
  scale_x_continuous("Sum of GS")+
  scale_y_continuous("Sum of Innings Pitched") + 
  theme_bw() + labs(title="Sum of Games Started vs Sum of Innings Pitched by Year") +
  geom_smooth(method = "lm", aes(x = GSsum, y = IPsum))

Total Point Exploration

 

Exploring the Total Points and trends connected exclusively with the Procrastination League Characteristics.

 

#avg and total Number of IPouts/3 (Innings) by year
SumYrPts <- PointsCombine %>% 
  group_by(yearID) %>%
  filter(Points != "NA" & Points > 0 & Points != "Inf") %>%
  summarise(Pointssum = sum(Points), Pointsavg = mean(Points) ) %>%
  select(yearID, Pointsavg, Pointssum)


#needs to remove lines that contain NA not replace with 0...

#avg of ERA by year
SumYrError <- PointsCombine %>% 
  group_by(yearID) %>%
  filter(E != "NA" & E > 0 & E != "Inf") %>%
  summarise(Errorsum = sum(E), Erroravg = mean(E) ) %>%
  select(yearID, Erroravg, Errorsum)

# Combine SUm and Avg Datasets together
SumYrPC <- SumYrPts %>% 
  left_join(SumYrError, by = c("yearID"))


ggplot(SumYrPC, aes(x = yearID, y = Pointsavg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("League Points Avg") + 
  theme_bw() + labs(title="Average Points by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = Pointsavg))

ggplot(SumYrPC, aes(x = yearID, y = Pointssum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of Points") + 
  theme_bw() + labs(title="Sum of Points by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = Pointssum))

ggplot(SumYrPC, aes(x = yearID, y = Erroravg)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("League Error's Average") + 
  theme_bw() + labs(title="League's Error Average by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = Erroravg))

ggplot(SumYrPC, aes(x = yearID, y = Errorsum)) + 
  geom_point() + 
  scale_x_continuous("Year")+
  scale_y_continuous("Sum of Errors") + 
  theme_bw() + labs(title="Sum of Errors by Year") +
  geom_smooth(method = "lm", aes(x = yearID, y = Errorsum))

Point Correlation

 

Exploring what Characteristics have a greater correlation to a players total point accumulations during a year.

 

We will explore the impact on player Batting Avg, OPS, Strikeout Rate, etc.. And its projected correlation to a players point totals for a year. In order to speed up this process and focus on players who are the major contributors I narrowed the players down to the top 30% point totals of all time. This should still give an accurate reading of the characteristics that are best exhibited by the elite players throughout history.


A positive correlation means that a higher number in that category will have a positive impact on that variable. And of course a negative value will indicate a negative relationship. Also, The closer to 1.0 a number is (of the darker blue the ellipse appears) indicates how strong of a relationship that characteristic has. This creates a matrix that allows you to compare all the supplied variables against each other to see which has a greatest impact on each other. For our purposes, the Points Column/row will help to indicate which player features are most likely to result in positive growth or exceeding expectations.

PointsCombineFac <- PointsCombine 

PointsCombineFac <- PointsCombineFac %>%
  select("yearID", "lgID", "Points", "SLG", "OBP", "OPS", "ISO", "AVG", "ERA", "WHIP", "FIP") %>%
  mutate(Points = round(Points, 0)) %>%
  mutate(OPS = round((OBP * 1000), 0)) %>%
  mutate(AVG = round((SLG * 1000), 0)) %>%
  mutate(OPS = round((OPS * 1000), 0)) %>%
  mutate(ISO = round((ISO * 1000), 0)) %>%
  mutate(AVG = round((AVG * 1000), 0)) %>%
  mutate(ERA = round((ERA * 100), 0)) %>%
  mutate(OPS = round((WHIP * 1000), 0)) %>%
  mutate(AVG = round((FIP * 1000), 0)) %>%
  replace(., is.na(.), 0) %>%
  filter(ERA != "Inf", OPS != "Inf", AVG != "Inf", ISO != "Inf", WHIP != "Inf", FIP != "Inf", OBP != "Inf", SLG != "Inf") %>%
  filter(Points > quantile(Points, .7)) # Selects only the top 30% of Point totals
## Warning in `[<-.factor`(`*tmp*`, thisvar, value = 0): invalid factor level,
## NA generated
summary(PointsCombineFac)
##      yearID          lgID           Points           SLG       
##  Min.   :1906   NL     :22885   Min.   : 94.0   Min.   :  0.0  
##  1st Qu.:1945   AL     :22637   1st Qu.:145.0   1st Qu.: 86.0  
##  Median :1977   AA     :    0   Median :221.0   Median :135.0  
##  Mean   :1971   FL     :    0   Mean   :253.1   Mean   :145.1  
##  3rd Qu.:1998   NA     :    0   3rd Qu.:333.0   3rd Qu.:198.0  
##  Max.   :2016   (Other):    0   Max.   :971.0   Max.   :457.0  
##                 NA's   :  103                                  
##       OBP              OPS             ISO              AVG         
##  Min.   :0.0000   Min.   :    0   Min.   :     0   Min.   :-2900.0  
##  1st Qu.:0.3040   1st Qu.:    0   1st Qu.: 85758   1st Qu.:    0.0  
##  Median :0.3300   Median :    0   Median :134730   Median :    0.0  
##  Mean   :0.3245   Mean   :  102   Mean   :144792   Mean   :  315.1  
##  3rd Qu.:0.3580   3rd Qu.:    0   3rd Qu.:197682   3rd Qu.:    0.0  
##  Max.   :1.0000   Max.   :12000   Max.   :456622   Max.   :60100.0  
##                                                                     
##       ERA               WHIP             FIP         
##  Min.   :   0.00   Min.   : 0.000   Min.   :-2.9000  
##  1st Qu.:   0.00   1st Qu.: 0.000   1st Qu.: 0.0000  
##  Median :   0.00   Median : 0.000   Median : 0.0000  
##  Mean   :  26.24   Mean   : 0.102   Mean   : 0.3151  
##  3rd Qu.:   0.00   3rd Qu.: 0.000   3rd Qu.: 0.0000  
##  Max.   :6750.00   Max.   :12.000   Max.   :60.1000  
## 
PairCol <- c(3,1,4:11)

# creates scatterplots of all numeric variables
pairs(PointsCombineFac[,PairCol], pch=16, lower.panel=panel.smooth)

# Creates Correlation Matrix of all Numeric Variables
CorTable <- cor(PointsCombineFac[,PairCol])

# Do not modify the following code:
# displays selection on Markdown File
knitr::kable(round(CorTable,4), format = "markdown")
Points yearID SLG OBP OPS ISO AVG ERA WHIP FIP
Points 1.0000 -0.0494 0.9006 0.3773 -0.1169 0.9007 -0.1079 -0.0779 -0.1169 -0.1079
yearID -0.0494 1.0000 0.0536 -0.0690 -0.0459 0.0537 -0.0330 -0.0242 -0.0459 -0.0330
SLG 0.9006 0.0536 1.0000 0.5004 -0.3207 1.0000 -0.2819 -0.1884 -0.3207 -0.2819
OBP 0.3773 -0.0690 0.5004 1.0000 -0.4197 0.5000 -0.3643 -0.2473 -0.4197 -0.3643
OPS -0.1169 -0.0459 -0.3207 -0.4197 1.0000 -0.3206 0.8537 0.8153 1.0000 0.8537
ISO 0.9007 0.0537 1.0000 0.5000 -0.3206 1.0000 -0.2818 -0.1883 -0.3206 -0.2818
AVG -0.1079 -0.0330 -0.2819 -0.3643 0.8537 -0.2818 1.0000 0.7126 0.8537 1.0000
ERA -0.0779 -0.0242 -0.1884 -0.2473 0.8153 -0.1883 0.7126 1.0000 0.8153 0.7126
WHIP -0.1169 -0.0459 -0.3207 -0.4197 1.0000 -0.3206 0.8537 0.8153 1.0000 0.8537
FIP -0.1079 -0.0330 -0.2819 -0.3643 0.8537 -0.2818 1.0000 0.7126 0.8537 1.0000
# Visually displays the Correlation Matrix
## The narrower the width of the ellipse and darker the color the stronger the degree of correlation
## Blue indicates Positive Correlation
## Red Negative Correlation
corrplot(cor(PointsCombineFac[,PairCol]), method="ellipse", 
         addCoef.col = "black", # Add coefficient of correlation
         tl.col="black", tl.srt=45, number.cex = 0.7) #Text label color and rotation

Conculsion

 

Through this brief overview several trends and analysis information can be gained. From an overall trend, the number of strikeouts and Homeruns continues to increase while the number of Innings Pitched continues to fall. In relationship to the Procrastination Baseball Points perspective, the number of points per player continues to fall. This is likely due to the decrease in the amount of time players spend in the field, and the continued rise of Strikeouts in favor of the long ball. This increased focus on hitting homeruns at the sacrifice of increasing number of strikeouts has given rise to a continued decline of a player’s Batting Average. If the trend continues players that contribute consistent high point totals will continue to dwindle in favor of a team’s approach of spreading out the workload on more players. This is most obviously seen in Pitching where teams are more careful with player injuries.

 

#First Remove all the Objects from the Data Set Up
remove(BatTotal,PitTotal,FieldTotal,MasterTotal,BatTotalCol,PitTotalCol,PointsCombine,PointsCol,ColNA,EraLabel,EraBreaks,TeamCol,PlayerAVGEra,PlayerAVGYear,PlayerAVGpos,PlayerAVGteam,PlayerAVGlg)

#now remove all Objects from the Correlation Section
remove(CorTable,PairCol,PointsCombineFac)

#now remove all Objects from the Batting/Pitching Exploration Section Section
remove(SumYr,SumYrAVG,SumYrCG,SumYrERA,SumYrGS,SumYrHR,SumYrP,SumYrSB,SumYrSO,BatTotalAvg,PitTotalEra,SumYrError,SumYrIP,SumYrPC,SumYrPts)