Today we covered model building and residual analysis. Specifically, we went over diagnostic plots, transformations of variables, and outliers. Diagnostic plots are the main tool we use for residual analysis.

Diagnostic Plot
In class, we discussed the four plots that R produces when we call plot() on a linear model. See the chunk that fits mod1 below for an example. The most important plot to focus on is residuals vs. fitted values. We shouldn’t see any trend, because a trend suggests the mean function is wrong, and the spread should be roughly equal across the fitted values, because unequal spread suggests non-constant variance.
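As a minimal sketch of the residuals vs. fitted check, the following uses R's built-in cars data as a stand-in so it runs without alr3. The data set choice and the object name fit are assumptions for illustration, not what we used in class.

```r
# Built-in cars data (speed, dist) stands in for the class data sets (assumption).
fit <- lm(dist ~ speed, data = cars)

# Residuals vs. fitted values drawn by hand.
# Look for a trend (mean function problem) or a funnel shape (unequal spread).
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# The same panel as drawn by plot() on an lm object: which = 1 selects it.
plot(fit, which = 1)

# All four default diagnostic panels on one page.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

Setting par(mfrow = c(2, 2)) first avoids paging through the four plots one at a time.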

Transformation of variables
We transform variables when the diagnostic plots show that the residuals follow a trend, or when Cook’s distance flags some observations as more influential than others. We discussed three common transformations (square root, log, and inverse) that we try in hopes of reducing heteroscedasticity. See the R chunks below for demonstrations.

Outliers
Last but not least is the concept of outliers. The big question is how we should deal with them, and the answer really depends on the situation. An outlier in the x direction might not have an effect if it still follows the linear model. An outlier in the y direction has to be examined case by case. We shouldn’t exclude an outlier unless it is clearly an error or we can justify why we exclude it.
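To make the x-direction vs. y-direction distinction concrete, here is a small sketch on simulated data. Everything in it (the true line y = 2 + 3x, the two added points, and all object names) is made up for illustration and is not from the class data.

```r
set.seed(1)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)           # simulated points around y = 2 + 3x
base_fit <- lm(y ~ x)

# Outlier in the x direction that still follows the line: (40, 122).
x_a <- c(x, 40)
y_a <- c(y, 2 + 3 * 40)
fit_x <- lm(y_a ~ x_a)

# Outlier in the y direction: (20, 120), far above the line.
x_b <- c(x, 20)
y_b <- c(y, 120)
fit_y <- lm(y_b ~ x_b)

# Compare the fitted slopes across the three fits.
slopes <- c(base      = coef(base_fit)[[2]],
            x_outlier = coef(fit_x)[[2]],
            y_outlier = coef(fit_y)[[2]])
print(slopes)

# Case-by-case check: find the most influential point and refit without it.
worst <- unname(which.max(cooks.distance(fit_y)))
refit <- lm(y_b[-worst] ~ x_b[-worst])
print(rbind(with_point = coef(fit_y), without_point = coef(refit)))
```

Refitting without the point only shows how much it matters. Whether it should actually be dropped still needs a justification, such as a recording error.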

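library(alr3)    # provides the brains data set
data(brains)
head(brains)
attach(brains)   # puts BrainWt and BodyWt on the search path
# Fit brain weight on body weight on the raw scale
mod1 <- lm(BrainWt ~ BodyWt, data = brains)
mod1
plot(mod1)       # draws the four diagnostic plots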

The first plot is the most important. We have to make sure the point prediction/mean function is correct. Now we should try to improve the mean function. Note: with(brains, plot(BrainWt~BodyWt)) is an alternative to the attach command. R takes the data frame given as the first argument and pulls the variables from it.
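A short sketch of the with() alternative, again using the built-in cars data as a stand-in (an assumption, so that it runs without alr3):

```r
# with() evaluates an expression using the columns of a data frame,
# without adding the data frame to the search path the way attach() does.
with(cars, plot(dist ~ speed))
mean_dist <- with(cars, mean(dist))

# Modelling functions such as lm() take a data argument that does the same job.
fit <- lm(dist ~ speed, data = cars)
```

Because nothing stays attached, columns with the same name in two data sets cannot mask each other.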

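# Square root of both variables
smod <- lm(sqrt(BrainWt) ~ sqrt(BodyWt), data = brains)
plot(smod)
# Log of both variables
lmod <- lm(log(BrainWt) ~ log(BodyWt), data = brains)
plot(lmod)
# Same log-log fit as lmod
l1mod <- lm(log(BrainWt) ~ log(BodyWt), data = brains)
plot(l1mod)
# Inverse of both variables; I() keeps the arithmetic inside the formula
imod <- lm(I(1/BrainWt) ~ I(1/BodyWt), data = brains)
plot(imod)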

We can use different transformations to try to help with heteroscedasticity. We can transform one or both of the variables. The goal is to find the best transformation. Here the log transformation is the best because all of the points fall inside the dotted Cook’s distance contours. None of the points now has much leverage.
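The Cook's distance check can also be done numerically rather than by eye. The sketch below uses simulated, right-skewed data as a stand-in for brains. The data, the column names body_wt and brain_wt, and the object names are all assumptions for illustration. The dotted contours that plot() draws on the residuals vs. leverage panel sit at Cook's distances of 0.5 and 1 by default.

```r
set.seed(42)
n <- 60
body_wt  <- exp(rnorm(n, mean = 1, sd = 2))             # right-skewed, like body weights
brain_wt <- 10 * body_wt^0.75 * exp(rnorm(n, sd = 0.5)) # multiplicative noise
sim <- data.frame(body_wt, brain_wt)

raw_fit <- lm(brain_wt ~ body_wt, data = sim)
log_fit <- lm(log(brain_wt) ~ log(body_wt), data = sim)

# Largest Cook's distance under each fit, to compare against 0.5 and 1.
cd_raw <- cooks.distance(raw_fit)
cd_log <- cooks.distance(log_fit)
print(c(raw = max(cd_raw), log = max(cd_log)))

# Observations beyond the inner contour in the log-log fit, if any.
print(which(cd_log > 0.5))

# Leverage values (hat values) for the log-log fit.
print(range(hatvalues(log_fit)))
```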

data(stopping)
names(stopping)
attach(stopping)
# Raw scale
mod2 <- lm(Distance ~ Speed, data = stopping)
plot(mod2)
# Square root of both variables
mod3 <- lm(sqrt(Distance) ~ sqrt(Speed), data = stopping)
plot(mod3)
# Square root of the predictor only
mod3a <- lm(Distance ~ sqrt(Speed), data = stopping)
plot(mod3a)
# Square root of the response only
mod3b <- lm(sqrt(Distance) ~ Speed, data = stopping)
plot(mod3b) # this is the best transformation
# Log of both variables
mod4 <- lm(log(Distance) ~ log(Speed), data = stopping)
plot(mod4)
# Log of the predictor only
mod5 <- lm(Distance ~ log(Speed), data = stopping)
plot(mod5) # taking the log of one isn't better than doing it for both
# Inverse of both variables
mod6 <- lm(I(1/Distance) ~ I(1/Speed), data = stopping)
plot(mod6)
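Paging through four plots per model makes the candidates hard to compare. One possible approach, sketched here on the built-in cars data as a stand-in for stopping (an assumption, as are the names candidates and fits), is to draw only the residuals vs. fitted panel for each candidate on one page.

```r
# Candidate transformations, keyed by a readable label.
candidates <- list(
  "dist ~ speed"             = dist ~ speed,
  "sqrt(dist) ~ speed"       = sqrt(dist) ~ speed,
  "sqrt(dist) ~ sqrt(speed)" = sqrt(dist) ~ sqrt(speed),
  "log(dist) ~ log(speed)"   = log(dist) ~ log(speed)
)

# Fit every candidate on the same data.
fits <- lapply(candidates, function(f) lm(f, data = cars))

# One residuals vs. fitted panel per candidate, on a 2 x 2 page.
op <- par(mfrow = c(2, 2))
for (nm in names(fits)) {
  plot(fitted(fits[[nm]]), resid(fits[[nm]]),
       main = nm, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
}
par(op)
```

The panels are compared by shape only. The residuals are on different scales once the response is transformed, so their sizes are not directly comparable across models.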