R write-up
The dataset I analyzed contains information about professional baseball teams from 1871 to 2006, with 17 integer columns and 2,276 observations. One column, TEAM_BATTING_HBP, has a significant amount of missing data. Some columns had a large number of outliers, particularly TEAM_PITCHING_H, which records hits allowed. The response variable TARGET_WINS was approximately normally distributed, while some of the other variables were skewed. The initial correlation matrix revealed high correlation between some variables, such as TEAM_BATTING_HR and TEAM_PITCHING_HR, but most of the matrix could not be computed because of missing data. After cleaning the data, I noticed high correlation between some predictors and therefore avoided including them together in the regression model. Most of the variables, however, showed little correlation with the response TARGET_WINS.
I deleted the columns TEAM_BATTING_HBP and TEAM_BASERUN_CS, since a large share of their observations were missing (2,085 and 772 of 2,276, respectively). For the other columns with missing data, I used the MICE package in R to impute the missing values. Specifically, I used a mix of predictive mean matching and classification and regression trees on TEAM_FIELDING_DP and TEAM_BASERUN_SB, since they had the most missing values after removing the other columns. I then dropped the remaining observations with missing values, as well as observations where TARGET_WINS was zero, since I wanted to perform a Box-Cox transformation, which requires a strictly positive response.
I created five linear regression models. The first model regressed the response on all predictor variables. Then, I used stepwise selection to remove insignificant predictors. Next, I applied the Box-Cox procedure and raised the response to the power 1.3536, the value that maximized the log-likelihood of the transformed data; this improved the model slightly. The coefficients of the model have both positive and negative signs, since some predictors increase a team's chance of winning while others decrease it. For instance, in my final model TEAM_BATTING_H had a slope of 0.263, meaning that each additional base hit increases the transformed response, TARGET_WINS^1.3536, by 0.263 when the other predictors are held fixed. A positive slope was expected, as a hit increases a team's chances of scoring and ultimately of winning the game.
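Because the final model is fit to TARGET_WINS^1.3536, a slope such as 0.263 is expressed in transformed units rather than in wins. A rough conversion back to wins can be sketched with the derivative of the power transformation. The baseline of 80 wins is an assumption chosen for illustration (it is close to the sample mean of TARGET_WINS), and the result is only a local approximation.

```r
# Approximate effect of one extra hit on the original wins scale.
# If z = y^lambda and dz/dx = beta, then dy/dx = beta / (lambda * y^(lambda - 1)).
lambda <- 1.3536   # power used in the write-up
beta_h <- 0.263    # reported slope for TEAM_BATTING_H
wins   <- 80       # illustrative baseline, an assumption

wins_per_hit <- beta_h / (lambda * wins^(lambda - 1))
print(round(wins_per_hit, 3))
```

Under these assumptions the effect is roughly 0.04 wins per additional hit, which is the same order of magnitude as the TEAM_BATTING_H slope in the untransformed models in the appendix.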
For my final model, I selected the model with the Box-Cox transformation. This model included only significant variables and had the lowest root-mean-squared error (RMSE) of the models compared, with a value of 12.43907. The diagnostic checks for this model suggested that the assumptions were reasonably met: the residuals were randomly scattered with no distinct pattern in the plot, and the QQ plot was close to normal. Additionally, the F-statistic for the model was 158 and the adjusted R-squared value was 0.33.
The equation of the model is:
Y^1.3536 = 83.63 + 0.263 * TEAM_BATTING_H + 0.49 * TEAM_BATTING_HR + 0.0709 * TEAM_BATTING_BB
- 0.09 * TEAM_BATTING_SO + 0.293 * TEAM_BASERUN_SB - 0.188 * TEAM_FIELDING_E - 0.693 * TEAM_FIELDING_DP
To make predictions with the model, I had to apply the inverse of the power transformation to get predicted values on the original TARGET_WINS scale, which are easier to interpret. That is, I raised each prediction to the power 1/1.3536.
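The back-transformation step can be sketched as follows. This is a minimal illustration on simulated data, not the baseball data: the variables `x` and `y` and the simulated relationship are assumptions, and only base R is used.

```r
# Fit on the power-transformed response, then undo the transformation.
set.seed(1)
lambda <- 1.3536
x <- runif(200, min = 10, max = 50)
y <- 40 + 1.2 * x + rnorm(200, sd = 5)   # strictly positive response
dat <- data.frame(x = x, y = y)

fit <- lm(I(y^lambda) ~ x, data = dat)   # model on the transformed scale
pred_transformed <- predict(fit, newdata = dat)

# The inverse of y^lambda is raising to the power 1/lambda.
# A negative fitted value would give NaN here, so check first.
stopifnot(all(pred_transformed > 0))
pred_original_scale <- pred_transformed^(1 / lambda)

# RMSE computed on the original scale of the response.
rmse <- sqrt(mean((pred_original_scale - dat$y)^2))
print(rmse)
```

Computing the RMSE after the back-transformation keeps it comparable with the RMSE of models fit on the untransformed response.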
Here are some websites that helped me with my analysis and the data imputation:
Wu, Songhao. “Multi-Collinearity in Regression.” Medium, Towards Data Science, 5 June 2021, https://towardsdatascience.com/multi-collinearity-in-regression-fe7a2c1467ea.
“Imputation in R: Top 3 Ways for Imputing Missing Data.” Machine Learning, R Programming, 8 Oct. 2021, https://appsilon.com/imputation-in-r/.
Here is my R code stored as an appendix:
The training dataset contains 17 columns and 2,276 observations about professional baseball teams from 1871 to 2006.
## Step 1 call in your libraries and import the data from csv and read it into R
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(reshape2)
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(corrplot)
## corrplot 0.92 loaded
training <- read.csv('https://raw.githubusercontent.com/AldataSci/Baseball-Data/main/moneyball-training-data.csv')
Looking at the structure of the dataset, we can see that all columns are integers and that one of them, TEAM_BATTING_HBP, shows only NA values in the first rows of the data.
str(training)
## 'data.frame': 2276 obs. of 17 variables:
## $ INDEX : int 1 2 3 4 5 6 7 8 11 12 ...
## $ TARGET_WINS : int 39 70 86 70 82 75 80 85 86 76 ...
## $ TEAM_BATTING_H : int 1445 1339 1377 1387 1297 1279 1244 1273 1391 1271 ...
## $ TEAM_BATTING_2B : int 194 219 232 209 186 200 179 171 197 213 ...
## $ TEAM_BATTING_3B : int 39 22 35 38 27 36 54 37 40 18 ...
## $ TEAM_BATTING_HR : int 13 190 137 96 102 92 122 115 114 96 ...
## $ TEAM_BATTING_BB : int 143 685 602 451 472 443 525 456 447 441 ...
## $ TEAM_BATTING_SO : int 842 1075 917 922 920 973 1062 1027 922 827 ...
## $ TEAM_BASERUN_SB : int NA 37 46 43 49 107 80 40 69 72 ...
## $ TEAM_BASERUN_CS : int NA 28 27 30 39 59 54 36 27 34 ...
## $ TEAM_BATTING_HBP: int NA NA NA NA NA NA NA NA NA NA ...
## $ TEAM_PITCHING_H : int 9364 1347 1377 1396 1297 1279 1244 1281 1391 1271 ...
## $ TEAM_PITCHING_HR: int 84 191 137 97 102 92 122 116 114 96 ...
## $ TEAM_PITCHING_BB: int 927 689 602 454 472 443 525 459 447 441 ...
## $ TEAM_PITCHING_SO: int 5456 1082 917 928 920 973 1062 1033 922 827 ...
## $ TEAM_FIELDING_E : int 1011 193 175 164 138 123 136 112 127 131 ...
## $ TEAM_FIELDING_DP: int NA 155 153 156 168 149 186 136 169 159 ...
A quick glance at the summary statistics of the columns.
## One of the columns, TEAM_BATTING_HBP, has 2,085 missing values out of 2,276 rows.
## TEAM_BATTING_HBP is the column for batters hit by pitch (I may have to remove this column).
summary(training)
## INDEX              TARGET_WINS        TEAM_BATTING_H     TEAM_BATTING_2B
## Min.   :   1.0     Min.   :  0.00     Min.   : 891       Min.   : 69.0
## 1st Qu.: 630.8     1st Qu.: 71.00     1st Qu.:1383       1st Qu.:208.0
## Median :1270.5     Median : 82.00     Median :1454       Median :238.0
## Mean   :1268.5     Mean   : 80.79     Mean   :1469       Mean   :241.2
## 3rd Qu.:1915.5     3rd Qu.: 92.00     3rd Qu.:1537       3rd Qu.:273.0
## Max.   :2535.0     Max.   :146.00     Max.   :2554       Max.   :458.0
##
## TEAM_BATTING_3B    TEAM_BATTING_HR    TEAM_BATTING_BB    TEAM_BATTING_SO
## Min.   :  0.00     Min.   :  0.00     Min.   :  0.0      Min.   :   0.0
## 1st Qu.: 34.00     1st Qu.: 42.00     1st Qu.:451.0      1st Qu.: 548.0
## Median : 47.00     Median :102.00     Median :512.0      Median : 750.0
## Mean   : 55.25     Mean   : 99.61     Mean   :501.6      Mean   : 735.6
## 3rd Qu.: 72.00     3rd Qu.:147.00     3rd Qu.:580.0      3rd Qu.: 930.0
## Max.   :223.00     Max.   :264.00     Max.   :878.0      Max.   :1399.0
##                                                          NA's   :102
## TEAM_BASERUN_SB    TEAM_BASERUN_CS    TEAM_BATTING_HBP   TEAM_PITCHING_H
## Min.   :  0.0      Min.   :  0.0      Min.   :29.00      Min.   : 1137
## 1st Qu.: 66.0      1st Qu.: 38.0      1st Qu.:50.50      1st Qu.: 1419
## Median :101.0      Median : 49.0      Median :58.00      Median : 1518
## Mean   :124.8      Mean   : 52.8      Mean   :59.36      Mean   : 1779
## 3rd Qu.:156.0      3rd Qu.: 62.0      3rd Qu.:67.00      3rd Qu.: 1682
## Max.   :697.0      Max.   :201.0      Max.   :95.00      Max.   :30132
## NA's   :131        NA's   :772        NA's   :2085
## TEAM_PITCHING_HR   TEAM_PITCHING_BB   TEAM_PITCHING_SO   TEAM_FIELDING_E
## Min.   :  0.0      Min.   :   0.0     Min.   :    0.0    Min.   :  65.0
## 1st Qu.: 50.0      1st Qu.: 476.0     1st Qu.:  615.0    1st Qu.: 127.0
## Median :107.0      Median : 536.5     Median :  813.5    Median : 159.0
## Mean   :105.7      Mean   : 553.0     Mean   :  817.7    Mean   : 246.5
## 3rd Qu.:150.0      3rd Qu.: 611.0     3rd Qu.:  968.0    3rd Qu.: 249.2
## Max.   :343.0      Max.   :3645.0     Max.   :19278.0    Max.   :1898.0
##                                       NA's   :102
## TEAM_FIELDING_DP
## Min.   : 52.0
## 1st Qu.:131.0
## Median :149.0
## Mean   :146.4
## 3rd Qu.:164.0
## Max.   :228.0
## NA's   :286
We can see that TEAM_BATTING_HBP contains 2,085 missing values, followed by TEAM_BASERUN_CS with 772, so I may have to omit those columns from the dataset.
## Easier to see all the missing values
sapply(training,function(x) sum(is.na(x)))
## INDEX TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B
## 0 0 0 0
## TEAM_BATTING_3B TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO
## 0 0 0 102
## TEAM_BASERUN_SB TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_H
## 131 772 2085 0
## TEAM_PITCHING_HR TEAM_PITCHING_BB TEAM_PITCHING_SO TEAM_FIELDING_E
## 0 0 102 0
## TEAM_FIELDING_DP
## 286
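One way to make the "too many missing values" decision explicit is to compute the share of missing values per column and drop the columns above a cutoff. The sketch below uses a small made-up data frame with hypothetical column names. The 30% cutoff is an assumption for illustration, although applied to the counts above it would drop TEAM_BATTING_HBP (2085 of 2276) and TEAM_BASERUN_CS (772 of 2276) and keep TEAM_FIELDING_DP (286 of 2276).

```r
# Toy data frame; column names are hypothetical.
toy <- data.frame(
  some_missing   = c(1, NA, 3, 4),
  mostly_missing = c(NA, NA, NA, 4),
  complete       = c(10, 20, 30, 40)
)

# Share of missing values in each column.
na_frac <- colMeans(is.na(toy))
print(na_frac)

# Keep only the columns at or below the cutoff (an assumed threshold).
cutoff  <- 0.3
keep    <- names(na_frac)[na_frac <= cutoff]
cleaned <- toy[, keep, drop = FALSE]
print(names(cleaned))
```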
The boxplot shows that TEAM_PITCHING_H has a lot of outliers. I may consider removing this column from the model so that it does not distort the fit.
## Let's try the ggplot method and melt-method..
data_long <- melt(training)
## No id variables; using all as measure variables
## Plot boxplots with ggplot; there are a lot of outliers in TEAM_PITCHING_H
gg <- ggplot(data_long, aes(x = variable, y = value)) + geom_boxplot(fill = "red") + coord_flip() + xlab("Columns")
gg
gg + coord_cartesian(ylim = c(0,2000)) + theme(axis.text.x = element_text(angle = 45, hjust = 1))
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
data_gathered <- training %>%
gather(variable,value)
The histograms show a variety of distributions: the response variable TARGET_WINS is approximately normally distributed, but some of the others, such as TEAM_FIELDING_E, are skewed.
## Each panel can have its own scale when we use scales = "free"
histograms <- ggplot(data_gathered, aes(x = value)) + geom_histogram() +
  facet_wrap(~variable, scales = "free")
histograms
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The correlation matrix shows a lot of question marks, which indicate missing data in the columns.
## Let's create a correlation matrix with our data..
sum(is.na(training))
## [1] 3478
## There is a lot of missing data in these columns, so I will have to remove some of them.
corrplot(cor(training))
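The question marks appear because `cor()` returns NA for any pair of columns that contains a missing value under its default `use = "everything"`. An alternative to dropping columns before plotting, sketched here on a toy data frame with made-up column names, is to ask `cor()` to use only the rows that are complete for each pair of columns.

```r
# Toy data with a few missing values; names are hypothetical.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 4, NA, 8, 10, 12),
  c = c(6, 5, 4, NA, 2, 1)
)

# Default behaviour: any NA in a pair makes that correlation NA.
default_cor <- cor(toy)

# Pairwise deletion: each correlation uses the rows complete for that pair.
pairwise_cor <- cor(toy, use = "pairwise.complete.obs")

print(default_cor)
print(pairwise_cor)
```

Pairwise deletion means each entry may be based on a different subset of rows, so it is better suited to exploration than to modelling.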
I removed the TEAM_BATTING_HBP and TEAM_BASERUN_CS columns since they contained a lot of missing values.
## Clean the data and impute some of it. I remove TEAM_BATTING_HBP (batters hit by pitch) and TEAM_BASERUN_CS (caught stealing), which rarely happened and have a lot of missing data, and I will impute the missing values in the remaining columns.
Training <- training %>%
dplyr::select(-c(TEAM_BATTING_HBP,TEAM_BASERUN_CS))
sapply(Training,function(x) sum(is.na(x)))
## INDEX TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B
## 0 0 0 0
## TEAM_BATTING_3B TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO
## 0 0 0 102
## TEAM_BASERUN_SB TEAM_PITCHING_H TEAM_PITCHING_HR TEAM_PITCHING_BB
## 131 0 0 0
## TEAM_PITCHING_SO TEAM_FIELDING_E TEAM_FIELDING_DP
## 102 0 286
I am going to try imputing the missing values with the MICE package. I will try predictive mean matching (pmm), classification and regression trees (cart), and lasso linear regression (lasso.norm). For each, I will check which imputation method most closely reproduces the distribution of the observed data and choose that method to impute the missing values.
## Now I will impute the data with the mice package.
library(mice)
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
mice_imputed <- data.frame(
original = Training$TEAM_FIELDING_DP,
imp_pmm = complete(mice(Training,method ="pmm"))$TEAM_FIELDING_DP,
imp_cart = complete(mice(Training,method ="cart"))$TEAM_FIELDING_DP,
imp_lasso = complete(mice(Training,method ="lasso.norm"))$TEAM_FIELDING_DP
)
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
head(mice_imputed)
I am going to compare the distribution of each imputed column with the original and work out which one resembles the original most closely.
## Compare the distributions of the imputations and see which one resembles the original the most.
## I think imp_cart looks similar to the original histogram, so I will use those values.
par(mfrow=c(2,2))
hist(mice_imputed$original)
hist(mice_imputed$imp_pmm)
hist(mice_imputed$imp_cart)
hist(mice_imputed$imp_lasso)
## Replace the column with the imputed values.
Training$TEAM_FIELDING_DP <- mice_imputed$imp_cart
## Now I will impute the rest of the columns with the same approach.
sapply(Training,function(x) sum(is.na(x)))
## INDEX TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B
## 0 0 0 0
## TEAM_BATTING_3B TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO
## 0 0 0 102
## TEAM_BASERUN_SB TEAM_PITCHING_H TEAM_PITCHING_HR TEAM_PITCHING_BB
## 131 0 0 0
## TEAM_PITCHING_SO TEAM_FIELDING_E TEAM_FIELDING_DP
## 102 0 0
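The visual comparison of histograms can be complemented with a numeric check of how each imputation changes the mean and spread of a column. The sketch below is only an illustration on simulated data: it does not use mice, and the two stand-in methods (sampling from the observed values and filling with the mean) are assumptions chosen because they run in base R.

```r
set.seed(42)
x <- rnorm(500, mean = 146, sd = 26)   # simulated column, not the real data
x[sample(500, 60)] <- NA               # knock out 60 values at random
observed <- x[!is.na(x)]

# Stand-in 1: fill each gap with a randomly drawn observed value.
imp_sample <- x
imp_sample[is.na(x)] <- sample(observed, sum(is.na(x)), replace = TRUE)

# Stand-in 2: fill every gap with the observed mean.
imp_mean <- x
imp_mean[is.na(x)] <- mean(observed)

# Compare mean and standard deviation of each version.
compare <- sapply(
  list(original = observed, sampled = imp_sample, mean_fill = imp_mean),
  function(v) c(mean = mean(v), sd = sd(v))
)
print(round(compare, 2))
```

Mean filling always shrinks the standard deviation, which is one reason to prefer a method whose imputed values follow the shape of the observed distribution.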
## I will impute TEAM_BASERUN_SB, which is stolen bases.
mice_imputed2 <- data.frame(
original = Training$TEAM_BASERUN_SB,
imp_pmm = complete(mice(Training,method ="pmm"))$TEAM_BASERUN_SB,
imp_cart = complete(mice(Training,method ="cart"))$TEAM_BASERUN_SB,
imp_lasso = complete(mice(Training,method ="lasso.norm"))$TEAM_BASERUN_SB
)
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
head(mice_imputed2)
## I will impute that column with imp_pmm since it resembles the original histogram.
par(mfrow=c(2,2))
hist(mice_imputed2$original)
hist(mice_imputed2$imp_pmm)
hist(mice_imputed2$imp_cart)
hist(mice_imputed2$imp_lasso)
## Impute TEAM_BASERUN_SB with these values since the distributions look similar.
Training$TEAM_BASERUN_SB <- mice_imputed2$imp_pmm
## looking at the empty values again I think i should be fine with it this time..
sapply(Training,function(x) sum(is.na(x)))
## INDEX TARGET_WINS TEAM_BATTING_H TEAM_BATTING_2B
## 0 0 0 0
## TEAM_BATTING_3B TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO
## 0 0 0 102
## TEAM_BASERUN_SB TEAM_PITCHING_H TEAM_PITCHING_HR TEAM_PITCHING_BB
## 0 0 0 0
## TEAM_PITCHING_SO TEAM_FIELDING_E TEAM_FIELDING_DP
## 102 0 0
## Now I want to look at the correlation matrix again and see if I can glean any valuable information.
Training <- na.omit(Training)
corrplot(cor(Training),method = "color")
## I am going to split the training data set into training and testing datasets...
## 70% in Training and 30% in Testing..
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
set.seed(123)
index <- createDataPartition(Training$TARGET_WINS,p=0.7,list = FALSE)
Ttraining <- Training[index,]
Ttest <- Training[-index,]
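For readers without caret, the idea of a 70/30 split can be sketched in base R with a simple random sample of row indices. This is a simplified stand-in: as far as I understand, `createDataPartition()` additionally tries to keep the distribution of the outcome similar in both parts, which plain `sample()` does not do. The toy data frame and its column names are made up.

```r
set.seed(123)
n <- 100
toy <- data.frame(id = seq_len(n), wins = rnorm(n, mean = 80, sd = 15))

# Draw 70% of the row indices for training; the rest form the test set.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

print(c(train = nrow(toy_train), test = nrow(toy_test)))
```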
## It went up only a little bit.. but that's fine..
mod1 <- lm(TARGET_WINS ~ .-INDEX,data=Ttraining)
summary(mod1)
##
## Call:
## lm(formula = TARGET_WINS ~ . - INDEX, data = Ttraining)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.598 -8.275 -0.002 8.180 65.562
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.2154867 6.2560068 5.309 1.26e-07 ***
## TEAM_BATTING_H 0.0442560 0.0042495 10.414 < 2e-16 ***
## TEAM_BATTING_2B -0.0276487 0.0109659 -2.521 0.01179 *
## TEAM_BATTING_3B 0.0611680 0.0197807 3.092 0.00202 **
## TEAM_BATTING_HR 0.0642616 0.0300527 2.138 0.03265 *
## TEAM_BATTING_BB 0.0108344 0.0065623 1.651 0.09895 .
## TEAM_BATTING_SO -0.0150882 0.0029569 -5.103 3.77e-07 ***
## TEAM_BASERUN_SB 0.0412726 0.0048644 8.485 < 2e-16 ***
## TEAM_PITCHING_H -0.0001039 0.0004585 -0.227 0.82071
## TEAM_PITCHING_HR 0.0270565 0.0262724 1.030 0.30325
## TEAM_PITCHING_BB -0.0014802 0.0045355 -0.326 0.74419
## TEAM_PITCHING_SO 0.0033969 0.0010182 3.336 0.00087 ***
## TEAM_FIELDING_E -0.0348440 0.0031879 -10.930 < 2e-16 ***
## TEAM_FIELDING_DP -0.1184901 0.0158941 -7.455 1.51e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.59 on 1510 degrees of freedom
## Multiple R-squared: 0.3657, Adjusted R-squared: 0.3602
## F-statistic: 66.96 on 13 and 1510 DF, p-value: < 2.2e-16
## I will get rid of the insignificant variables TEAM_PITCHING_H, TEAM_PITCHING_HR and TEAM_PITCHING_BB. The adjusted R-squared goes up slightly (0.3602 to 0.361), and since all remaining predictors are significant I will look at the diagnostics.
mod2 <- lm(TARGET_WINS ~ .-INDEX-TEAM_PITCHING_H-TEAM_PITCHING_HR-TEAM_PITCHING_BB,data=Ttraining)
summary(mod2)
##
## Call:
## lm(formula = TARGET_WINS ~ . - INDEX - TEAM_PITCHING_H - TEAM_PITCHING_HR -
## TEAM_PITCHING_BB, data = Ttraining)
##
## Residuals:
## Min 1Q Median 3Q Max
## -57.450 -8.196 -0.005 8.102 65.939
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.2438320 6.1261755 5.263 1.62e-07 ***
## TEAM_BATTING_H 0.0446899 0.0041885 10.670 < 2e-16 ***
## TEAM_BATTING_2B -0.0282364 0.0109156 -2.587 0.009780 **
## TEAM_BATTING_3B 0.0646486 0.0192677 3.355 0.000812 ***
## TEAM_BATTING_HR 0.0927076 0.0112779 8.220 4.32e-16 ***
## TEAM_BATTING_BB 0.0090047 0.0037847 2.379 0.017471 *
## TEAM_BATTING_SO -0.0146237 0.0028015 -5.220 2.04e-07 ***
## TEAM_BASERUN_SB 0.0414658 0.0046268 8.962 < 2e-16 ***
## TEAM_PITCHING_SO 0.0030944 0.0005988 5.168 2.68e-07 ***
## TEAM_FIELDING_E -0.0350488 0.0025508 -13.740 < 2e-16 ***
## TEAM_FIELDING_DP -0.1173787 0.0158475 -7.407 2.14e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.58 on 1513 degrees of freedom
## Multiple R-squared: 0.3652, Adjusted R-squared: 0.361
## F-statistic: 87.04 on 10 and 1513 DF, p-value: < 2.2e-16
plot(fitted(mod2),residuals(mod2),xlab="Fitted",ylab="Residuals")
## attempt a box-cox transformation..
Ttraining <- Ttraining %>%
filter(TARGET_WINS != 0)
Ttest <- Ttest %>%
filter(TARGET_WINS != 0)
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
set.seed(123)
bcox <-boxcox(mod2,plotit = T)
val <- cbind(bcox$x,bcox$y)
## Sort the values by descending log-likelihood; the lambda value that maximizes the log-likelihood of the transformed data is about 1.3535
head(val[order(-bcox$y),])
## [,1] [,2]
## [1,] 1.353535 -2769.905
## [2,] 1.393939 -2769.937
## [3,] 1.313131 -2770.073
## [4,] 1.434343 -2770.166
## [5,] 1.272727 -2770.447
## [6,] 1.474747 -2770.588
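The lambda with the highest log-likelihood can also be extracted directly with `which.max()` instead of sorting. The sketch below uses simulated data in which the square of the response is linear in the predictor, so the selected lambda should land near 2. The data, the variable names and the lambda grid are assumptions, and it relies on the MASS package being installed.

```r
library(MASS)  # provides boxcox()

set.seed(123)
x <- runif(500, min = 1, max = 10)
y <- sqrt(10 + 3 * x + rnorm(500, sd = 0.5))  # y^2 is linear in x by construction
fit <- lm(y ~ x)

# Evaluate the profile log-likelihood on a grid of lambda values, without plotting.
bc <- boxcox(fit, lambda = seq(-2, 4, by = 0.05), plotit = FALSE)

# Grid value with the largest log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

The selected value is always a point on the evaluated grid, so a value such as 1.353535 in the output above reflects the grid spacing; a finer `lambda` grid would refine it.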
## Let's use the lambda value in our model to see if it improves the model, even if only a little.
bmod3 <- lm(TARGET_WINS ^(1.3536) ~ .-INDEX-TEAM_PITCHING_H-TEAM_PITCHING_HR-TEAM_PITCHING_BB,data=Ttraining)
summary(bmod3)
##
## Call:
## lm(formula = TARGET_WINS^(1.3536) ~ . - INDEX - TEAM_PITCHING_H -
## TEAM_PITCHING_HR - TEAM_PITCHING_BB, data = Ttraining)
##
## Residuals:
## Min 1Q Median 3Q Max
## -312.67 -53.02 -1.40 51.29 444.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 79.192463 39.101153 2.025 0.043010 *
## TEAM_BATTING_H 0.282125 0.026966 10.462 < 2e-16 ***
## TEAM_BATTING_2B -0.181950 0.069413 -2.621 0.008848 **
## TEAM_BATTING_3B 0.403303 0.121614 3.316 0.000934 ***
## TEAM_BATTING_HR 0.599837 0.071339 8.408 < 2e-16 ***
## TEAM_BATTING_BB 0.059411 0.023887 2.487 0.012984 *
## TEAM_BATTING_SO -0.095918 0.017734 -5.409 7.37e-08 ***
## TEAM_BASERUN_SB 0.250584 0.029311 8.549 < 2e-16 ***
## TEAM_PITCHING_SO 0.019955 0.003795 5.258 1.66e-07 ***
## TEAM_FIELDING_E -0.204932 0.016580 -12.360 < 2e-16 ***
## TEAM_FIELDING_DP -0.756543 0.099965 -7.568 6.55e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79.35 on 1512 degrees of freedom
## Multiple R-squared: 0.3456, Adjusted R-squared: 0.3413
## F-statistic: 79.87 on 10 and 1512 DF, p-value: < 2.2e-16
## it looks a bit better
plot(fitted(mod2),residuals(mod2),xlab="Fitted",ylab="Residuals")
plot(fitted(bmod3),residuals(bmod3),xlab="Fitted",ylab="Residuals")
## This looks good, I think. I removed another of the less significant variables, TEAM_BATTING_3B. This model is fit on Training rather than Ttraining.
bmod4 <- lm(TARGET_WINS ^(1.3536) ~ .-INDEX-TEAM_PITCHING_H-TEAM_PITCHING_HR-TEAM_PITCHING_BB-TEAM_BATTING_3B,data=Training)
summary(bmod4)
##
## Call:
## lm(formula = TARGET_WINS^(1.3536) ~ . - INDEX - TEAM_PITCHING_H -
## TEAM_PITCHING_HR - TEAM_PITCHING_BB - TEAM_BATTING_3B, data = Training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -317.89 -53.97 0.03 51.63 428.18
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 65.983524 33.139593 1.991 0.046598 *
## TEAM_BATTING_H 0.298351 0.021164 14.097 < 2e-16 ***
## TEAM_BATTING_2B -0.132452 0.056678 -2.337 0.019534 *
## TEAM_BATTING_HR 0.484318 0.056101 8.633 < 2e-16 ***
## TEAM_BATTING_BB 0.076188 0.019604 3.886 0.000105 ***
## TEAM_BATTING_SO -0.097033 0.015011 -6.464 1.25e-10 ***
## TEAM_BASERUN_SB 0.266067 0.023915 11.125 < 2e-16 ***
## TEAM_PITCHING_SO 0.018152 0.003661 4.959 7.65e-07 ***
## TEAM_FIELDING_E -0.193911 0.013677 -14.178 < 2e-16 ***
## TEAM_FIELDING_DP -0.750365 0.084105 -8.922 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79.71 on 2164 degrees of freedom
## Multiple R-squared: 0.3362, Adjusted R-squared: 0.3334
## F-statistic: 121.8 on 9 and 2164 DF, p-value: < 2.2e-16
## Here I removed two more variables, TEAM_BATTING_2B and TEAM_PITCHING_SO, to see how the model changes.
bmod5 <- lm(TARGET_WINS ^(1.3536) ~ .-INDEX-TEAM_PITCHING_H-TEAM_PITCHING_HR-TEAM_PITCHING_BB-TEAM_BATTING_3B-TEAM_BATTING_2B-TEAM_PITCHING_SO,data=Training)
summary(bmod5)
##
## Call:
## lm(formula = TARGET_WINS^(1.3536) ~ . - INDEX - TEAM_PITCHING_H -
## TEAM_PITCHING_HR - TEAM_PITCHING_BB - TEAM_BATTING_3B - TEAM_BATTING_2B -
## TEAM_PITCHING_SO, data = Training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -314.47 -54.48 -0.52 51.95 413.36
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 83.89418 31.68828 2.647 0.008168 **
## TEAM_BATTING_H 0.26447 0.01575 16.796 < 2e-16 ***
## TEAM_BATTING_HR 0.45287 0.05604 8.081 1.06e-15 ***
## TEAM_BATTING_BB 0.07403 0.01970 3.758 0.000176 ***
## TEAM_BATTING_SO -0.07785 0.01349 -5.770 9.07e-09 ***
## TEAM_BASERUN_SB 0.25831 0.02374 10.881 < 2e-16 ***
## TEAM_FIELDING_E -0.17418 0.01322 -13.176 < 2e-16 ***
## TEAM_FIELDING_DP -0.74332 0.08451 -8.796 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 80.17 on 2166 degrees of freedom
## Multiple R-squared: 0.3278, Adjusted R-squared: 0.3257
## F-statistic: 150.9 on 7 and 2166 DF, p-value: < 2.2e-16
I think the model meets the regression assumptions, although there are some outliers here and there in the Cook's distance chart.
par(mfrow=c(2,2))
plot(bmod5)
I calculated the root-mean-squared error (RMSE) in this section and compared it across the models I found interesting. I chose bmod5 because it had a lower RMSE than the others.
## I will then use mod1, mod2, bmod4 and bmod5 and compare their RMSEs
## import the caret library..
library(caret)
predictions_1 <- predict(mod1,Ttest)
head(predictions_1)
## 1 2 3 4 5 6
## 62.93778 75.29586 67.27069 66.65399 69.42688 86.56737
rmse <- RMSE(predictions_1,Ttest$TARGET_WINS)
rmse
## [1] 12.57593
## create the next predictions with mod2
predictions_2 <- predict(mod2,Ttest)
head(predictions_2)
## 1 2 3 4 5 6
## 61.46682 75.30854 67.27508 66.78664 69.27285 86.69152
rmse2 <- RMSE(predictions_2,Ttest$TARGET_WINS)
rmse2
## [1] 12.57123
## predict with bmod4, which was fitted on the transformed response
predictions_3 <- predict(bmod4,Ttest)
## make sure to inverse the box-cox transformation
inv_box_pred <- predictions_3 ^(1/1.3536)
rmse3 <- RMSE(inv_box_pred,Ttest$TARGET_WINS)
head(inv_box_pred)
## 1 2 3 4 5 6
## 64.98320 76.10651 67.83114 64.94323 71.38245 87.60343
rmse3
## [1] 12.51824
predictions_4 <- predict(bmod5,Ttest)
## make sure to inverse the box-cox transformation
inv_box_pred2 <- predictions_4 ^(1/1.3536)
rmse4 <- RMSE(inv_box_pred2,Ttest$TARGET_WINS)
head(inv_box_pred2)
rmse4
## [1] 12.50728
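The comparison above rests on two steps: undoing the power transformation, and computing the RMSE on the original wins scale. Here is a minimal sketch of both on simulated data. The data frame, the `hits` column and the hand-written `rmse()` helper are hypothetical; `caret::RMSE(pred, obs)` computes the same quantity.

```r
# RMSE = sqrt(mean((predicted - observed)^2))
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

set.seed(42)
lambda <- 1.3536                      # exponent used in the write-up
n <- 300
sim <- data.frame(hits = rnorm(n, 1450, 120))
sim$wins <- 80 + 0.05 * (sim$hits - 1450) + rnorm(n, sd = 10)

train <- sim[1:200, ]
test  <- sim[201:300, ]

fit_raw   <- lm(wins ~ hits, data = train)
fit_power <- lm(wins^lambda ~ hits, data = train)

pred_raw <- predict(fit_raw, test)
# Predictions from the transformed model live on the wins^lambda scale,
# so raise them to 1/lambda before comparing against observed wins.
pred_power <- predict(fit_power, test)^(1 / lambda)

results <- c(raw = rmse(pred_raw, test$wins),
             power = rmse(pred_power, test$wins))
results
```

Collecting the values in one named vector makes it easier to see which model wins than printing each rmse separately. Note that a negative prediction on the transformed scale would turn into NaN under the fractional power, so it is worth checking for that before back-transforming.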
I then cleaned the testing dataset in a manner similar to the training dataset: I deleted the mostly empty columns, imputed some of the others, and omitted the remaining rows with missing values.
## Load the evaluation data; the predictions will come from the chosen model (bmod5)..
Test <- read.csv("https://raw.githubusercontent.com/AldataSci/Baseball-Data/main/moneyball-evaluation-data.csv")
## before predicting I have to clean the test data for the linear regression model, in the same way as the training set
str(Test)
## 'data.frame': 259 obs. of 16 variables:
## $ INDEX : int 9 10 14 47 60 63 74 83 98 120 ...
## $ TEAM_BATTING_H : int 1209 1221 1395 1539 1445 1431 1430 1385 1259 1397 ...
## $ TEAM_BATTING_2B : int 170 151 183 309 203 236 219 158 177 212 ...
## $ TEAM_BATTING_3B : int 33 29 29 29 68 53 55 42 78 42 ...
## $ TEAM_BATTING_HR : int 83 88 93 159 5 10 37 33 23 58 ...
## $ TEAM_BATTING_BB : int 447 516 509 486 95 215 568 356 466 452 ...
## $ TEAM_BATTING_SO : int 1080 929 816 914 416 377 527 609 689 584 ...
## $ TEAM_BASERUN_SB : int 62 54 59 148 NA NA 365 185 150 52 ...
## $ TEAM_BASERUN_CS : int 50 39 47 57 NA NA NA NA NA NA ...
## $ TEAM_BATTING_HBP: int NA NA NA 42 NA NA NA NA NA NA ...
## $ TEAM_PITCHING_H : int 1209 1221 1395 1539 3902 2793 1544 1626 1342 1489 ...
## $ TEAM_PITCHING_HR: int 83 88 93 159 14 20 40 39 25 62 ...
## $ TEAM_PITCHING_BB: int 447 516 509 486 257 420 613 418 497 482 ...
## $ TEAM_PITCHING_SO: int 1080 929 816 914 1123 736 569 715 734 622 ...
## $ TEAM_FIELDING_E : int 140 135 156 124 616 572 490 328 226 184 ...
## $ TEAM_FIELDING_DP: int 156 164 153 154 130 105 NA 104 132 145 ...
## count the missing values per column before removing the HBP column again and imputing the rest
sapply(Test,function(x) sum(is.na(x)))
## INDEX TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B
## 0 0 0 0
## TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO TEAM_BASERUN_SB
## 0 0 18 13
## TEAM_BASERUN_CS TEAM_BATTING_HBP TEAM_PITCHING_H TEAM_PITCHING_HR
## 87 240 0 0
## TEAM_PITCHING_BB TEAM_PITCHING_SO TEAM_FIELDING_E TEAM_FIELDING_DP
## 0 18 0 31
## remove hbp and Cs
Test <- Test %>%
dplyr::select(-c(TEAM_BATTING_HBP,TEAM_BASERUN_CS))
sapply(Test,function(x) sum(is.na(x)))
## INDEX TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B
## 0 0 0 0
## TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO TEAM_BASERUN_SB
## 0 0 18 13
## TEAM_PITCHING_H TEAM_PITCHING_HR TEAM_PITCHING_BB TEAM_PITCHING_SO
## 0 0 0 18
## TEAM_FIELDING_E TEAM_FIELDING_DP
## 0 31
## now we impute..
library(mice)
mice_imputed3 <- data.frame(
original = Test$TEAM_FIELDING_DP,
imp_pmm = complete(mice(Test,method ="pmm"))$TEAM_FIELDING_DP,
imp_cart = complete(mice(Test,method ="cart"))$TEAM_FIELDING_DP,
imp_lasso = complete(mice(Test,method ="lasso.norm"))$TEAM_FIELDING_DP
)
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## Warning: Number of logged events: 13
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO TEAM_FIELDING_DP
head(mice_imputed3)
par(mfrow=c(2,2))
hist(mice_imputed3$original)
hist(mice_imputed3$imp_pmm)
hist(mice_imputed3$imp_cart)
hist(mice_imputed3$imp_lasso)
## Since imp_cart looks most similar to the original distribution, I will use that one..
Test$TEAM_FIELDING_DP <- mice_imputed3$imp_cart
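Comparing histograms by eye can be backed up with a few numbers. The sketch below is only an illustration on simulated data: the two "completed" columns are made-up stand-ins for the imp_* columns, and using the mean, standard deviation and Kolmogorov-Smirnov statistic as the yardstick is my assumption, not part of the original analysis.

```r
set.seed(7)
original <- round(rnorm(250, mean = 146, sd = 26))
original[sample(250, 30)] <- NA          # pretend 30 values are missing

observed <- original[!is.na(original)]

# Two hypothetical completed columns: one draws replacements from the
# observed values, the other fills every gap with the observed mean.
cand_draw <- original
cand_draw[is.na(original)] <- sample(observed, 30, replace = TRUE)
cand_mean <- original
cand_mean[is.na(original)] <- mean(observed)

# Summaries of a completed column, plus its KS distance to the observed values
compare <- function(completed) {
  ks <- suppressWarnings(ks.test(completed, observed))
  c(mean = mean(completed), sd = sd(completed), ks_D = unname(ks$statistic))
}

rbind(draw = compare(cand_draw), mean_fill = compare(cand_mean))
```

A smaller `ks_D` and a standard deviation close to that of the observed values point to an imputation that preserves the shape of the original distribution; mean filling, for example, always shrinks the spread.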
## now we impute the next column, which is TEAM_BASERUN_SB
mice_imputed4 <- data.frame(
original = Test$TEAM_BASERUN_SB,
imp_pmm = complete(mice(Test,method ="pmm"))$TEAM_BASERUN_SB,
imp_cart = complete(mice(Test,method ="cart"))$TEAM_BASERUN_SB,
imp_lasso = complete(mice(Test,method ="lasso.norm"))$TEAM_BASERUN_SB
)
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
##
## iter imp variable
## 1 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 1 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 2 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 3 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 4 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 1 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 2 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 3 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 4 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
## 5 5 TEAM_BATTING_SO TEAM_BASERUN_SB TEAM_PITCHING_SO
head(mice_imputed4)
par(mfrow=c(2,2))
hist(mice_imputed4$original)
hist(mice_imputed4$imp_pmm)
hist(mice_imputed4$imp_cart)
hist(mice_imputed4$imp_lasso)
## I will use imp_pmm for this column and replace it with the imputed values..
Test$TEAM_BASERUN_SB <- mice_imputed4$imp_pmm
sapply(Test,function(x) sum(is.na(x)))
## INDEX TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B
## 0 0 0 0
## TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO TEAM_BASERUN_SB
## 0 0 18 0
## TEAM_PITCHING_H TEAM_PITCHING_HR TEAM_PITCHING_BB TEAM_PITCHING_SO
## 0 0 0 18
## TEAM_FIELDING_E TEAM_FIELDING_DP
## 0 0
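Each mice() call above runs a fresh, unseeded imputation and prints its iteration log, so the imputed values change from run to run. A minimal sketch of a reproducible, quiet run follows. It uses the small `nhanes` example data shipped with mice instead of the baseball data, and the seed value is arbitrary.

```r
completed <- NULL
same <- NA

if (requireNamespace("mice", quietly = TRUE)) {
  # seed fixes the random draws; printFlag = FALSE silences the iteration log
  imp_pmm <- mice::mice(mice::nhanes, method = "pmm", m = 5,
                        seed = 123, printFlag = FALSE)
  imp_again <- mice::mice(mice::nhanes, method = "pmm", m = 5,
                          seed = 123, printFlag = FALSE)

  completed <- mice::complete(imp_pmm, 1)   # first of the m completed data sets
  same <- identical(completed, mice::complete(imp_again, 1))

  print(colSums(is.na(completed)))          # no missing values should remain
  print(same)                               # the two seeded runs should agree
}
```

Storing the mice object once per method and calling `complete()` on it also avoids re-running the imputation for every column that needs filling.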
## Then I drop the remaining rows with missing values (TEAM_BATTING_SO and TEAM_PITCHING_SO), since the other columns have been imputed..
Testt <- na.omit(Test)
sapply(Testt,function(x) sum(is.na(x)))
## INDEX TEAM_BATTING_H TEAM_BATTING_2B TEAM_BATTING_3B
## 0 0 0 0
## TEAM_BATTING_HR TEAM_BATTING_BB TEAM_BATTING_SO TEAM_BASERUN_SB
## 0 0 0 0
## TEAM_PITCHING_H TEAM_PITCHING_HR TEAM_PITCHING_BB TEAM_PITCHING_SO
## 0 0 0 0
## TEAM_FIELDING_E TEAM_FIELDING_DP
## 0 0
Finally, I used bmod5 to create predictions for the cleaned test dataset.
set.seed(123)
pred <- predict(bmod5,newdata=Testt)
## I have to invert the power transformation to get back to the wins scale..
actual_predictions <- pred ^ (1/1.3536)
actual_predictions
## 1 2 3 4 5 6 7 8
## 61.84930 64.43498 74.71737 88.19905 71.15099 76.57790 85.16479 76.95295
## 9 10 11 12 13 14 15 16
## 69.17214 74.36038 70.32894 82.04584 80.99146 83.66965 86.00930 77.66413
## 17 18 20 21 22 23 24 25
## 74.94783 79.61878 91.46610 82.44225 85.31416 80.32776 73.51264 83.33830
## 26 27 28 29 30 31 32 33
## 88.81144 63.09041 75.91467 85.16738 77.44445 91.25169 85.65728 82.64912
## 34 35 36 37 38 39 40 41
## 84.42701 79.44160 87.46804 76.48140 88.93335 85.42981 91.18396 85.64544
## 42 43 44 45 46 47 48 49
## 91.76689 23.64848 100.51423 89.60579 92.80141 97.23755 77.01990 69.03631
## 50 51 52 53 54 55 56 57
## 80.01837 77.64821 86.72308 76.39701 73.34806 76.14399 78.68639 90.99818
## 58 61 62 63 64 65 66 67
## 75.93369 87.12887 73.28574 88.79488 86.77272 84.82156 101.04718 74.03182
## 68 70 71 72 73 74 75 76
## 79.35332 86.07246 82.53252 70.80341 77.58168 88.99942 81.46213 83.78643
## 77 78 81 82 83 84 85 86
## 81.71207 84.26288 87.11174 87.83810 96.43844 75.29771 84.43779 81.95886
## 87 88 89 90 91 92 93 97
## 83.73404 83.53460 89.73692 91.45050 81.27488 85.39572 74.67256 86.79716
## 98 99 100 101 102 103 104 105
## 99.80526 85.64957 85.61635 79.25594 75.83839 84.33216 84.14362 79.56430
## 106 107 108 109 110 111 112 113
## 75.79221 61.47419 78.44368 87.36180 59.25517 84.82429 86.60988 93.01689
## 114 115 116 117 118 119 120 121
## 91.15159 81.01480 79.54025 85.31510 81.82653 75.25450 79.85416 92.94476
## 125 126 127 128 129 130 131 132
## 67.46779 87.05470 89.59379 76.34769 92.82324 90.90268 86.60316 81.53925
## 133 134 135 136 137 138 139 140
## 81.64148 84.19961 86.66708 77.09248 73.83076 78.15329 88.35539 81.91686
## 141 143 144 145 146 147 148 149
## 65.21392 89.82173 73.32767 72.57789 72.25746 77.83180 79.70360 79.06384
## 150 151 152 153 154 155 156 157
## 83.91487 82.32363 81.53671 46.85312 69.74669 77.09588 70.44905 90.07318
## 158 159 161 162 163 164 165 166
## 78.88035 89.68833 100.24991 104.58684 93.14933 101.64941 96.43412 88.21536
## 167 168 169 170 172 173 174 175
## 80.37345 81.87318 73.94455 82.34513 87.99642 81.13663 93.90529 84.45892
## 176 177 178 179 180 181 182 183
## 73.66354 78.85652 70.54919 74.54723 79.60696 88.59108 88.99611 86.38980
## 184 185 186 187 188 189 190 193
## 85.76187 86.42747 93.43328 86.39272 55.91878 69.65853 112.76496 77.21880
## 194 195 196 197 198 199 200 201
## 78.42843 81.81643 69.99719 79.60943 84.32617 79.51418 83.04248 73.85928
## 202 203 204 205 206 207 208 209
## 78.57583 72.45275 89.65209 82.31471 83.37808 78.56086 78.62720 82.68337
## 210 211 212 213 214 215 216 217
## 69.77133 104.77259 94.23390 79.46385 65.61489 67.69880 82.34726 77.13837
## 218 219 220 221 222 223 224 225
## 92.84837 78.25119 78.75911 78.51747 74.94712 82.58438 73.03858 78.85815
## 226 227 228 229 230 232 233 234
## 74.54286 82.42076 79.83949 82.24726 70.94105 90.70332 78.27249 89.05509
## 235 236 237 238 239 240 241 242
## 80.57293 75.23115 83.21572 76.73696 89.69549 71.04698 87.70732 86.29062
## 243 244 245 246 247 248 249 250
## 84.21911 81.98194 61.94749 88.47766 81.58155 85.63590 73.41545 83.97253
## 251 252 253 254 255 256 257 258
## 81.32226 65.04006 88.95430 28.93765 69.74016 77.79276 83.41562 85.06883
## 259
## 78.92672
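The labels in the output above are row names, and they skip some numbers (for example 19, 59 and 60) because na.omit() dropped those rows. A minimal sketch of how the predictions could be tied back to the INDEX column follows. The small data frame, the prediction values and the `P_TARGET_WINS` column name are all hypothetical.

```r
# Hypothetical stand-ins for Test, Testt and the back-transformed predictions
test_raw <- data.frame(INDEX = c(9, 10, 14, 47, 60),
                       TEAM_BATTING_SO = c(1080, NA, 816, 914, 416))
test_clean <- na.omit(test_raw)          # drops row 2, keeps row names 1, 3, 4, 5

preds <- c(61.8, 74.7, 88.2, 71.2)       # made-up values, one per kept row
names(preds) <- rownames(test_clean)

# Pair each prediction with the team's INDEX rather than its row name
submission <- data.frame(INDEX = test_clean$INDEX,
                         P_TARGET_WINS = round(unname(preds), 1))

# Rows of the original file that received no prediction
dropped <- setdiff(test_raw$INDEX, submission$INDEX)

submission
dropped
```

Listing the dropped INDEX values makes it explicit which teams in the evaluation file have no predicted wins.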
## And that is all!! done...