This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code. The analysis uses a small data set named artificial with two columns, x and y.
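
The data-definition chunk is not shown here. A minimal sketch, assuming the seven observations below; they are an assumption in the sense that they are the values that reproduce the intercept (4.5160) and slope (0.5867) reported in the regression summary further down:

```r
# Assumed data: these values reproduce the reported least-squares fit
x <- c(1, 5, 6, 7, 8, 9, 10)
y <- c(1, 11.7, 11.8, 9.7, 8.5, 8.6, 7.3)
artificial <- data.frame(x, y)
artificial
```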

ggplot based scatter plot
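
A sketch of how this plot could be produced, assuming the ggplot2 package is installed and assuming the data values shown (they reproduce the fit reported in this notebook). The object name `artificial_scatter_plot` is illustrative:

```r
library(ggplot2)  # assumption: ggplot2 is installed

# Assumed data (reproduces the reported regression coefficients)
artificial <- data.frame(x = c(1, 5, 6, 7, 8, 9, 10),
                         y = c(1, 11.7, 11.8, 9.7, 8.5, 8.6, 7.3))

# Scatter plot with the least-squares line overlaid and no confidence band
artificial_scatter_plot <- ggplot(artificial, aes(x = x, y = y)) +
  geom_point() +
  labs(x = "X", y = "Y (units)") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
artificial_scatter_plot
```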

R based scatter plot
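
The same plot in base graphics, as a minimal sketch under the same assumed data values. The formula interface is used so that `data =` is honoured; `line_fit` is an illustrative name:

```r
# Assumed data (reproduces the reported regression coefficients)
artificial <- data.frame(x = c(1, 5, 6, 7, 8, 9, 10),
                         y = c(1, 11.7, 11.8, 9.7, 8.5, 8.6, 7.3))

# Scatter plot of y against x
plot(y ~ x, data = artificial, xlab = "X", ylab = "Y",
     main = "Scatter plot of Y vs X")

# Overlay the least-squares line
line_fit <- lm(y ~ x, data = artificial)
abline(line_fit)
```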

Fit the data using simple linear regression, then create residual plots to check the assumptions. Is it a good fit?

artificial_fit <- lm(y~x, data = artificial)
summary(artificial_fit)

Call:
lm(formula = y ~ x, data = artificial)

Residuals:
      1       2       3       4       5       6       7 
-4.1027  4.2505  3.7638  1.0771 -0.7096 -1.1963 -3.0830 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.5160     3.4172   1.322    0.244
x             0.5867     0.4792   1.224    0.275

Residual standard error: 3.512 on 5 degrees of freedom
Multiple R-squared:  0.2307,    Adjusted R-squared:  0.07681 
F-statistic: 1.499 on 1 and 5 DF,  p-value: 0.2753

The fit is poor: R-squared is only 0.23 and the slope is not significant (p = 0.275), so the straight line explains little of the variation in y.

QQplot of residuals

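qqnorm(residuals(artificial_fit), main = "QQ plot of residuals")
qqline(residuals(artificial_fit))  # reference line for judging normality
plot(residuals(artificial_fit) ~ fitted(artificial_fit), ylab = "Residuals", xlab = "Fitted values", main = "Residuals vs fitted values")
abline(h = 0)  # zero line: residuals should scatter evenly around it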

For the model in part (b), compute the raw residuals and the externally studentized residuals with R. Compare the two types of residuals.

#Raw residuals
residuals_artificial <- residuals(artificial_fit)
residuals_artificial
         1          2          3          4          5          6          7 
-4.1026596  4.2505319  3.7638298  1.0771277 -0.7095745 -1.1962766 -3.0829787 
# Externally studentized residuals: studres() from the MASS package and
# rstudent() from base R (stats) are two equivalent ways to compute them
library(MASS)
externally_studntized_res <- studres(artificial_fit)
student <- rstudent(artificial_fit)
# Data frame holding both kinds of residuals
residuals_dataframe <- data.frame(residuals_artificial, externally_studntized_res)
names(residuals_dataframe) <- c("Raw", "Ext. Studentized")
residuals_dataframe
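
To make the comparison concrete: the externally studentized residual rescales each raw residual e_i by a leave-one-out estimate of its standard deviation, t_i = e_i / (s_(i) * sqrt(1 - h_i)), where h_i is the leverage and s_(i) is the residual standard error of the fit without observation i. The sketch below computes this by hand and checks it against `rstudent()`. The data values are an assumption (they reproduce the fit above), and the names `h`, `s_del`, and `t_manual` are illustrative:

```r
# Assumed data (reproduces the reported regression coefficients)
artificial <- data.frame(x = c(1, 5, 6, 7, 8, 9, 10),
                         y = c(1, 11.7, 11.8, 9.7, 8.5, 8.6, 7.3))
artificial_fit <- lm(y ~ x, data = artificial)

e <- residuals(artificial_fit)       # raw residuals e_i
h <- hatvalues(artificial_fit)       # leverages h_i
n <- nrow(artificial)
p <- length(coef(artificial_fit))    # number of coefficients (2)

# Leave-one-out residual standard error s_(i), obtained without refitting
sse   <- sum(e^2)
s_del <- sqrt((sse - e^2 / (1 - h)) / (n - p - 1))

# Externally studentized residuals computed from the formula
t_manual <- e / (s_del * sqrt(1 - h))

# Agreement with the built-in function
all.equal(t_manual, rstudent(artificial_fit))

# Side-by-side comparison
data.frame(Raw = e, Leverage = h, Ext.Studentized = t_manual)
```

On the raw scale observation 2 has the largest residual in absolute value, but after studentizing, observation 1 stands out by a wide margin: its high leverage pulls the fitted line toward it and shrinks its raw residual.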

Identify a possible outlier

Reduced data set:
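
The chunk that builds the reduced data set is not shown. A minimal sketch, assuming it simply drops the first observation (x = 1, y = 1); with these assumed values the fit reproduces the coefficients 16.6286 and -0.9371 reported below:

```r
# Assumed reduced data: the original observations without the first one
xred <- c(5, 6, 7, 8, 9, 10)
yred <- c(11.7, 11.8, 9.7, 8.5, 8.6, 7.3)
red <- data.frame(xred, yred)
red
```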

Fit the model with the reduced data set

mod_red <- lm(yred~xred, data = red)
summary(mod_red)

Call:
lm(formula = yred ~ xred, data = red)

Residuals:
       1        2        3        4        5        6 
-0.24286  0.79429 -0.36857 -0.63143  0.40571  0.04286 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  16.6286     1.0843  15.336 0.000105 ***
xred         -0.9371     0.1410  -6.648 0.002658 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5897 on 4 degrees of freedom
Multiple R-squared:  0.917, Adjusted R-squared:  0.8963 
F-statistic:  44.2 on 1 and 4 DF,  p-value: 0.002658

QQ plot and residuals vs fitted values
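
A sketch of the two diagnostic plots for the reduced model, drawn side by side. The data values are an assumption (they reproduce the reduced fit above), and `old_par` is an illustrative name used to restore the graphics settings afterwards:

```r
# Assumed reduced data (reproduces the reported reduced fit)
red <- data.frame(xred = c(5, 6, 7, 8, 9, 10),
                  yred = c(11.7, 11.8, 9.7, 8.5, 8.6, 7.3))
mod_red <- lm(yred ~ xred, data = red)

old_par <- par(mfrow = c(1, 2))  # two panels in one row

# Normal QQ plot of the residuals with a reference line
qqnorm(residuals(mod_red), main = "QQ plot of residuals")
qqline(residuals(mod_red))

# Residuals against fitted values with a zero line
plot(residuals(mod_red) ~ fitted(mod_red), ylab = "Residuals",
     xlab = "Fitted values", main = "Residuals vs fitted values")
abline(h = 0)

par(old_par)  # restore the previous layout
```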

Leverage and Cook’s distance

Which observations can be considered potentially influential and/or actually influential? Leverage measures “potential influence”. Remember that a point is potentially influential if h_i > 2p/n, where p is the number of regression coefficients (here p = 2, intercept and slope), that is, h_i > 2(2/n). For this data set n = 7, so leverage values (hat values, h_i) greater than 4/7 ≈ 0.57 are cause for concern. Hence the first point is potentially influential. For Cook’s distance, values greater than 1 are cause for concern, so the first observation is actually influential as well.
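
The two screening rules above can be checked directly. A minimal sketch, assuming the seven data values below (they reproduce the fit reported earlier); `influence_table` and its column names are illustrative, and `hatvalues()` is used here to obtain the leverages:

```r
# Assumed data (reproduces the reported regression coefficients)
artificial <- data.frame(x = c(1, 5, 6, 7, 8, 9, 10),
                         y = c(1, 11.7, 11.8, 9.7, 8.5, 8.6, 7.3))
artificial_fit <- lm(y ~ x, data = artificial)

n <- nrow(artificial)
p <- length(coef(artificial_fit))        # 2 coefficients: intercept and slope

leverage <- hatvalues(artificial_fit)    # hat values h_i
cook <- cooks.distance(artificial_fit)   # Cook's distances

# Flag observations that exceed each rule of thumb
influence_table <- data.frame(
  leverage = leverage,
  cook = cook,
  high_leverage = leverage > 2 * p / n,  # h_i > 4/7
  high_cook = cook > 1
)
influence_table
```

With these values only the first observation is flagged, and it is flagged by both rules.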

Transformations

The motivation for a transformation is to correct violations of the model assumptions, namely nonlinearity of the data, unequal variance, or nonnormality of the residuals. The most commonly used transformations are the square root and the log (the base of the log doesn’t matter much); less common are inverse and other power transformations. Both y and x can be transformed. Transformations of y tend to affect both linearity and variability, whereas transformations of x tend to affect only linearity. Transformations can be motivated by theoretical arguments, empirical evidence, or both. It is not necessary to find a transformation that provides “perfectly” equal variance or normality, but linearity is quite important.
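
A sketch of how candidate transformations can be compared, first visually and then with a rough numeric summary. The values of `xt` and `yt` are an assumed illustrative data set, and `linearity` is an illustrative name. The squared correlation is only a crude measure of how straight each scatter is and does not replace residual plots:

```r
# Assumed illustrative data with a curved relationship between x and y
xt <- c(2, 4, 9, 16, 25, 36, 49, 64, 80, 103)
yt <- c(4.3, 5.6, 7.5, 9.1, 10.9, 13.3, 15.8, 17.0, 19.0, 21.4)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels
plot(yt ~ xt, main = "y vs x")
plot(log(yt) ~ log(xt), main = "log(y) vs log(x)")
plot(sqrt(yt) ~ xt, main = "sqrt(y) vs x")
plot(yt ~ sqrt(xt), main = "y vs sqrt(x)")
par(old_par)                     # restore the previous layout

# Squared correlation of each transformed pair
linearity <- c(
  "y ~ x"           = cor(yt, xt)^2,
  "log(y) ~ log(x)" = cor(log(yt), log(xt))^2,
  "sqrt(y) ~ x"     = cor(sqrt(yt), xt)^2,
  "y ~ sqrt(x)"     = cor(yt, sqrt(xt))^2
)
round(linearity, 4)
```

For these values, taking the square root of x gives a visibly straighter scatter than the untransformed plot.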
