This is an R Markdown
Notebook. When you execute code within the notebook, the results appear
beneath the code.
Try executing this chunk by clicking the Run button within
the chunk or by placing your cursor inside it and pressing
Ctrl+Shift+Enter.
**Import the data**
# Load the data
baseball = read.csv("C:/Users/duway/Downloads/baseball.csv")
str(baseball)# check the structure of the data
getwd()# check the working directory
# keep only the seasons before 2002 (this filters rows; all columns are kept)
moneyball = subset(baseball,Year<2002)
str(moneyball)
902 rows and 15 columns
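Since `subset()` with a condition filters rows rather than columns, a small illustration may help. This is a minimal sketch on a made-up data frame; `toy` and its values are hypothetical and are not taken from baseball.csv.

```r
# Hypothetical toy data, only to illustrate subset(); not the real baseball.csv
toy <- data.frame(
  Team = c("A", "B", "C", "D"),
  Year = c(1999, 2001, 2002, 2005),
  W    = c(90, 85, 103, 70)
)

# A logical condition filters rows; every column is kept
rows_only <- subset(toy, Year < 2002)
dim(rows_only)   # 2 rows, 3 columns

# The select argument is what picks columns
rows_and_cols <- subset(toy, Year < 2002, select = c(Team, W))
names(rows_and_cols)
```

The same pattern applies to `moneyball`: the `Year<2002` condition drops later seasons but leaves all 15 columns in place.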
# Compute the run differential (runs scored minus runs allowed) and add it to the data frame as a new column, RD
moneyball$RD = moneyball$RS - moneyball$RA
str(moneyball)
# plot rd and wins
plot(moneyball$RD, moneyball$W)
#Regression model to predict wins
WinsReg = lm(W ~ RD, data = moneyball)
summary(WinsReg)
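For each one-run increase in RD, predicted wins increase by 0.1057661.
The slope is measured in wins per run, not in percent, so roughly 9.5 extra runs of differential are worth one additional win.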
# use the model to compute the predicted wins for a run differential of 100
Wins<-predict(WinsReg, data.frame(RD=100))
# by hand: W = 81 + 0.1057661*100 (intercept rounded to 81)
Wins
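To make the link between `predict()` and the hand formula explicit, here is a minimal sketch on synthetic data. The data frame `toy`, the seed, and the generating values (81 and 0.1) are assumptions chosen only for illustration; they are not the fitted Moneyball coefficients.

```r
# Hypothetical data: wins roughly linear in run differential
set.seed(1)
toy <- data.frame(RD = seq(-200, 200, by = 20))
toy$W <- 81 + 0.1 * toy$RD + rnorm(nrow(toy), sd = 3)

fit <- lm(W ~ RD, data = toy)

# predict() evaluates intercept + slope * RD for the new row
by_predict <- unname(predict(fit, newdata = data.frame(RD = 100)))
by_hand    <- unname(coef(fit)["(Intercept)"] + coef(fit)["RD"] * 100)
all.equal(by_predict, by_hand)   # TRUE

# The slope is in wins per run, so its reciprocal is runs per extra win
runs_per_win <- 1 / unname(coef(fit)["RD"])
runs_per_win
```

Applied to the slope reported for `WinsReg`, 1 / 0.1057661 is about 9.45 runs per additional win.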
RunsReg = lm(RS ~ OBP + SLG + BA, data = moneyball)
summary(RunsReg)
A coefficient with an unexpected negative sign can signal a problem with multicollinearity.
Check the correlation between BA and OBP:
cor(moneyball$BA, moneyball$OBP)
# vif() comes from the car package; install it once if it is missing
#install.packages("car")
library(car)
# check for multicollinearity using the variance inflation factor
vif(RunsReg)
A VIF greater than 5 is a common rule-of-thumb sign of multicollinearity.
Here it reflects the high correlation between BA and OBP.
Each of these features is also highly correlated with runs scored.
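The variance inflation factor can also be computed from scratch in base R: for predictor j, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the others. The sketch below uses synthetic predictors; `vif_by_hand`, `toy`, and the variables `x1`, `x2`, `x3` are hypothetical names, and `x2` is deliberately built to be correlated with `x1`.

```r
# Hypothetical predictors: x2 is built to be strongly correlated with x1
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors
vif_by_hand <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

vif_values <- vif_by_hand(toy)
vif_values
```

For the runs model, passing `moneyball[, c("OBP", "SLG", "BA")]` to this helper should give values comparable to `vif(RunsReg)`, assuming the model has only plain numeric predictors.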
# refit the runs model without BA to reduce multicollinearity
RunsReg = lm(RS ~ OBP + SLG , data = moneyball)
summary(RunsReg)
Almost 92% of the variance in runs scored is explained by the model
that uses only OBP and SLG.
Create a regression model to predict runs allowed
RAReg = lm(RA ~ OOBP + OSLG, data = moneyball)
summary(RAReg)
OOBP (the opponents’ on-base percentage) and OSLG (the opponents’
slugging percentage) are both significant predictors of runs allowed.
As a rule of thumb, a coefficient whose absolute t-value is greater than 2 is
significant: here the t-values are 9.979 for OOBP and 8.632 for OSLG.
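The t-values quoted above can be read programmatically from the coefficient table of a fitted model. This is a minimal sketch on synthetic data; `toy`, `x1`, `x2`, and `y` are hypothetical, and `y` is generated to depend on `x1` only.

```r
# Hypothetical data: y depends on x1 but not on x2
set.seed(7)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$y <- 2 * toy$x1 + rnorm(100)

fit <- lm(y ~ x1 + x2, data = toy)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs  <- summary(fit)$coefficients
t_vals <- coefs[, "t value"]

# Each t value is the estimate divided by its standard error
all.equal(t_vals, coefs[, "Estimate"] / coefs[, "Std. Error"])

# Rule of thumb: |t| > 2 is roughly significant at the 5% level
abs(t_vals) > 2
```

Running the same two lines on `summary(RAReg)$coefficients` would list the t-values for OOBP and OSLG without reading them off the printed summary.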
vif(RAReg)# check for multicollinearity
In-class activity seven
If a baseball team scores 763 runs and allows 614 runs, how many
games do we expect the team to win?
Using the linear regression model constructed during the lecture,
enter the number of games we expect the team to win:
# use the model to compute the predicted wins
Num_wins=80.88 + 0.1057661*(763-614) # by hand, with an intercept of about 80.88 (roughly 81)
Wins<-predict(WinsReg, data.frame(RD=763-614))
# w=81 + 0.1057661*149
Wins
The team is expected to win about 97 games: 763 runs scored and 614
runs allowed give a run differential of 149.
In-class activity eight
Exercise 1
If a baseball team’s OBP is 0.361, SLG is 0.409, and BA is
0.257, how many runs do we expect the team to score? Using the linear
regression model constructed during the lecture (the one that uses OBP,
SLG, and BA as independent variables), find the number of runs we expect
the team to score:
#If a baseball team’s OBP is 0.361, SLG is 0.409, and BA is 0.257, how many runs do we expect the team to score?
# this formula has only an intercept, an OBP term, and an SLG term, so BA (0.257) does not enter the calculation
ExpectedRuns=-804.63 + 2737.77*(0.361) + 1584.91*(0.409)
ExpectedRuns
[1] 831.9332
About 832 runs are expected for a team with OBP 0.361 and SLG
0.409; the BA of 0.257 is not used by this formula.
Exercise 2
If a baseball team’s opponents’ OBP (OOBP) is 0.267 and opponents’ SLG
(OSLG) is 0.392, how many runs do we expect the team to allow? Using the
linear regression model discussed during the lecture (the one on the
last slide of the previous video), find the number of runs we expect the
team to allow.
#If a baseball team’s opponents OBP (OOBP) is 0.267 and opponents SLG (OSLG) is 0.392, how many runs do we expect the team to allow?
ExpectedRunsAllowed=-837.38 + 2913.60*(0.267) + 1514.29*(0.392)
ExpectedRunsAllowed
[1] 534.1529
About 534 runs are expected to be allowed by a team with OOBP 0.267 and
OSLG 0.392.
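Both exercises apply the same arithmetic: an intercept plus the sum of each coefficient times its input. A minimal sketch of that pattern follows; the helper `linear_prediction` is a hypothetical name introduced here, and the coefficients are the ones quoted in the exercises above.

```r
# Hypothetical helper: intercept plus the sum of coefficient * value
linear_prediction <- function(intercept, coefs, values) {
  stopifnot(length(coefs) == length(values))
  intercept + sum(coefs * values)
}

# Runs scored, using the OBP and SLG coefficients from Exercise 1
expected_runs <- linear_prediction(
  -804.63,
  c(OBP = 2737.77, SLG = 1584.91),
  c(OBP = 0.361,   SLG = 0.409)
)

# Runs allowed, using the OOBP and OSLG coefficients from Exercise 2
expected_runs_allowed <- linear_prediction(
  -837.38,
  c(OOBP = 2913.60, OSLG = 1514.29),
  c(OOBP = 0.267,   OSLG = 0.392)
)

round(expected_runs, 4)          # 831.9332
round(expected_runs_allowed, 4)  # 534.1529
```

When the fitted model objects are still in the session, `predict(RunsReg, data.frame(OBP=0.361, SLG=0.409))` and `predict(RAReg, data.frame(OOBP=0.267, OSLG=0.392))` avoid retyping rounded coefficients.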