library(faraway)
Error in library(faraway) : there is no package called ‘faraway’

The goal of this article is to present a brief overview of simple linear regression and its assumptions. We will be using simulated data to run properly sepcified regression, look how their diagnostic tests look like ie: assumptions are met. I then proceed to adjust the model such that one of the assumptions is no longer met and look at how that reflects on the diagnostics plots and overall regression result.

The general regression assumptions are subject to a lot of debate, for the purposes of this post, the regression assumptions we will be assuming and testing are the following:

In the example below we simulate a regression between two variables perfectly linear i.e no error component

In the example below we simulate a regression between two variables with a noise component ie error

set.seed(1994)
obs <- 5000
x2 <- runif(obs, min=-20,max=20)
e1 <- rnorm(obs, mean=1,sd=10)
y2 <- 2*x2 + e1
plot(x2,y2)

model2 <- lm(y2~x2)
summary(model2)

Call:
lm(formula = y2 ~ x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-36.324  -6.689   0.055   6.620  36.585 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.63324    0.14020   4.517 6.42e-06 ***
x2           1.99326    0.01206 165.238  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.909 on 4998 degrees of freedom
Multiple R-squared:  0.8453,    Adjusted R-squared:  0.8452 
F-statistic: 2.73e+04 on 1 and 4998 DF,  p-value: < 2.2e-16
plot(model2)

In the example below we simulate a mispecified regression. We achieve this by using a log function, with a normally distributed error component

set.seed(1994) #set seed for reproducible random number generation
obs <- 5000 #number of observatoins to generate 
x3 <- runif(obs, min=1,max=20) #Randomly generate 5000 obs from a uniform distribution
e1 <- rnorm(obs, mean=0,sd=0.2)
y3 <- log(x3) + e1 #create a quadratic function from the the data variable 
plot(x3,y3) #plot the graph of the relationship of Y with X

model3 <- lm(y3~x3) #run a linear regression of the quadratic function Y on the varible X
summary(model3) #Summary stats of previous regression model 

Call:
lm(formula = y3 ~ x3)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.42171 -0.17977  0.02925  0.21195  0.83003 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.8545813  0.0092315   92.57   <2e-16 ***
x3          0.1227817  0.0007882  155.77   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3075 on 4998 degrees of freedom
Multiple R-squared:  0.8292,    Adjusted R-squared:  0.8292 
F-statistic: 2.426e+04 on 1 and 4998 DF,  p-value: < 2.2e-16
plot(model3) #diagnostic plots for the regression 

In the example below, we simulate a regression with heteroskedasticity of the error term.

set.seed(1994)
obs <- 5000
x4 <- runif(obs, min=0,max=20)
e1 <- rnorm(obs, mean=1,sd=sqrt(x4)) 
y4 <- 2*x4 + e1
plot(x4,y4)

model4 <- lm(y4~x4)
summary(model4)

Call:
lm(formula = y4 ~ x4)

Residuals:
     Min       1Q   Median       3Q      Max 
-16.2464  -1.7825   0.0146   1.6955  14.9166 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.981376   0.086711   11.32   <2e-16 ***
x4          1.990039   0.007599  261.87   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.121 on 4998 degrees of freedom
Multiple R-squared:  0.9321,    Adjusted R-squared:  0.9321 
F-statistic: 6.858e+04 on 1 and 4998 DF,  p-value: < 2.2e-16
plot(model4)

The following example introduces collinearity between the variables x1 and x2 and x3. We simulated two variables x1 and x2 and we summed the two with a noise component to get x3.

vif(model1)
Error in vif(model1) : could not find function "vif"

Logistic Regression

References:

https://stats.stackexchange.com/questions/258485/simulate-linear-regression-with-heteroscedasticity How to simulate heteroscedasdity of the error term in R
https://aosmith.rbind.io/2018/08/29/getting-started-simulating-data/ How to generate random data from various distributions
https://www.r-bloggers.com/simulating-endogeneity/ Simluating endogeneity http://people.duke.edu/~rnau/testing.htm Regression assumptions https://math.stackexchange.com/questions/1530571/linear-regression-model-assumptions Regression Assumptions https://stats.stackexchange.com/questions/276831/assumptions-of-multiple-regression-how-is-normality-assumption-different-from-c/276888 Regression Asusmptions https://nwfsc-timeseries.github.io/atsa-labs/sec-tslab-random-walks-rw.html Simulating a Random Walk https://daviddalpiaz.github.io/appliedstats/collinearity.html simulating multicolinearity in R https://www.econometrics-with-r.org/6-4-ols-assumptions-in-multiple-regression.html simulating multicolinearity in R (imperfect multi colinearity)

https://stats.stackexchange.com/questions/46523/how-to-simulate-artificial-data-for-logistic-regression/46525 Simulating logistic regression data

https://daviddalpiaz.github.io/appliedstats/logistic-regression.html#binary-response

https://stats.stackexchange.com/questions/321770/simulating-data-for-an-ordered-logit-model simulate ordinal logistic regression

LS0tDQp0aXRsZTogIlNpbXVsYXRlZCBSZWdyZXNzaW9uIHRvIEluc3BlY3QgQXNzdW1wdGlvbnMiDQpvdXRwdXQ6DQogIGh0bWxfbm90ZWJvb2s6IGRlZmF1bHQNCiAgcGRmX2RvY3VtZW50OiBkZWZhdWx0DQotLS0NCg0KYGBge3J9DQpsaWJyYXJ5KGZhcmF3YXkpDQpgYGANCg0KDQpUaGUgZ29hbCBvZiB0aGlzIGFydGljbGUgaXMgdG8gcHJlc2VudCBhIGJyaWVmIG92ZXJ2aWV3IG9mIHNpbXBsZSBsaW5lYXIgcmVncmVzc2lvbiBhbmQgaXRzIGFzc3VtcHRpb25zLiBXZSB3aWxsIGJlIHVzaW5nIHNpbXVsYXRlZCBkYXRhIHRvIHJ1biBwcm9wZXJseSBzZXBjaWZpZWQgcmVncmVzc2lvbiwgbG9vayBob3cgdGhlaXIgZGlhZ25vc3RpYyB0ZXN0cyBsb29rIGxpa2UgaWU6IGFzc3VtcHRpb25zIGFyZSBtZXQuIEkgdGhlbiBwcm9jZWVkIHRvIGFkanVzdCB0aGUgbW9kZWwgc3VjaCB0aGF0IG9uZSBvZiB0aGUgYXNzdW1wdGlvbnMgaXMgbm8gbG9uZ2VyIG1ldCBhbmQgbG9vayBhdCBob3cgdGhhdCByZWZsZWN0cyBvbiB0aGUgZGlhZ25vc3RpY3MgcGxvdHMgYW5kIG92ZXJhbGwgcmVncmVzc2lvbiByZXN1bHQuIA0KDQpUaGUgZ2VuZXJhbCByZWdyZXNzaW9uIGFzc3VtcHRpb25zIGFyZSBzdWJqZWN0IHRvIGEgbG90IG9mIGRlYmF0ZSwgZm9yIHRoZSBwdXJwb3NlcyBvZiB0aGlzIHBvc3QsIHRoZSByZWdyZXNzaW9uIGFzc3VtcHRpb25zIHdlIHdpbGwgYmUgYXNzdW1pbmcgYW5kIHRlc3RpbmcgYXJlIHRoZSBmb2xsb3dpbmc6IA0KDQoqIExpbmVhciByZWxhdGlvbnNoaXAgYmV0d2VlbiBZIGFuZCBYIA0KKiBIb21vc2NlZGFzdGljaXR5IG9mIHRoZSBlcnJvciB0ZXJtIA0KKiBFeG9nZW5laXR5IG9mIHRoZSBlcnJvci4gTWVhbiBvZiB0aGUgZXJyb3IgdGVybSBpcyB6ZXJvDQoNCg0KDQoqKkluIHRoZSBleGFtcGxlIGJlbG93IHdlIHNpbXVsYXRlIGEgcmVncmVzc2lvbiBiZXR3ZWVuIHR3byB2YXJpYWJsZXMgcGVyZmVjdGx5IGxpbmVhciBpLmUgbm8gZXJyb3IgY29tcG9uZW50KiogDQoNCmBgYHtyfQ0Kc2V0LnNlZWQoMTk5NCkgI3NldCBzZWVkIGZvciByZXByb2R1Y2libGUgcmFuZG9tIG51bWJlciBnZW5lcmF0aW9uDQpvYnMgPC0gMTAwICNudW1iZXIgb2Ygb2JzZXJ2YXRvaW5zIHRvIGdlbmVyYXRlIA0KeDEgPC0gcnVuaWYob2JzLCBtaW49LTIwLG1heD0yMCkgI1JhbmRvbWx5IGdlbmVyYXRlIDUwMDAgb2JzIGZyb20gYSB1bmlmb3JtIGRpc3RyaWJ1dGlvbg0KeTEgPC0geDENCnBsb3QoeDEseTEpDQptb2RlbDEgPC0gbG0oeTF+eDEpDQpzdW1tYXJ5KG1vZGVsMSkNCnBsb3QobW9kZWwxKQ0KYGBgDQoNCioqSW4gdGhlIGV4YW1wbGUgYmVsb3cgd2Ugc2ltdWxhdGUgYSByZWdyZXNzaW9uIGJldHdlZW4gdHdvIHZhcmlhYmxlcyB3aXRoIGEgbm9pc2UgY29tcG9uZW50IGllIGVycm9yKioNCg0KYGBge3J9DQpzZXQuc2VlZCgxOTk0KQ0Kb2JzIDwtIDUwMDANCngyIDwtIHJ1bmlmKG9icywgbWluPS0yMCxtYXg9MjApDQplMSA8LSBybm9ybShvYnMsIG1lYW49MSxzZD0xMCkNCnkyIDwtIDIqeDIgKyBlMQ0KcGxvdCh4Mix5MikNCm1vZGVsMiA8LSBsbSh5Mn54MikNCnN1bW1hcnkobW9kZWwyKQ0KcGxvdChtb2RlbDIpDQpgYGANCg0KDQoqKkluIHRoZSBleGFtcGxlIGJlbG93IHdlIHNpbXVsYXRlIGEgbWlzcGVjaWZpZWQgcmVncmVzc2lvbi4gV2UgYWNoaWV2ZSB0aGlzIGJ5IHVzaW5nIGEgbG9nIGZ1bmN0aW9uLCB3aXRoIGEgbm9ybWFsbHkgZGlzdHJpYnV0ZWQgZXJyb3IgY29tcG9uZW50KiogDQoNCmBgYHtyfQ0Kc2V0LnNlZWQoMTk5NCkgI3NldCBzZWVkIGZvciByZXByb2R1Y2libGUgcmFuZG9tIG51bWJlciBnZW5lcmF0aW9uDQpvYnMgPC0gNTAwMCAjbnVtYmVyIG9mIG9ic2VydmF0b2lucyB0byBnZW5lcmF0ZSANCngzIDwtIHJ1bmlmKG9icywgbWluPTEsbWF4PTIwKSAjUmFuZG9tbHkgZ2VuZXJhdGUgNTAwMCBvYnMgZnJvbSBhIHVuaWZvcm0gZGlzdHJpYnV0aW9uDQplMSA8LSBybm9ybShvYnMsIG1lYW49MCxzZD0wLjIpDQp5MyA8LSBsb2coeDMpICsgZTEgI2NyZWF0ZSBhIHF1YWRyYXRpYyBmdW5jdGlvbiBmcm9tIHRoZSB0aGUgZGF0YSB2YXJpYWJsZSANCnBsb3QoeDMseTMpICNwbG90IHRoZSBncmFwaCBvZiB0aGUgcmVsYXRpb25zaGlwIG9mIFkgd2l0aCBYDQptb2RlbDMgPC0gbG0oeTN+eDMpICNydW4gYSBsaW5lYXIgcmVncmVzc2lvbiBvZiB0aGUgcXVhZHJhdGljIGZ1bmN0aW9uIFkgb24gdGhlIHZhcmlibGUgWA0Kc3VtbWFyeShtb2RlbDMpICNTdW1tYXJ5IHN0YXRzIG9mIHByZXZpb3VzIHJlZ3Jlc3Npb24gbW9kZWwgDQpwbG90KG1vZGVsMykgI2RpYWdub3N0aWMgcGxvdHMgZm9yIHRoZSByZWdyZXNzaW9uIA0KYGBgDQoNCioqSW4gdGhlIGV4YW1wbGUgYmVsb3csIHdlIHNpbXVsYXRlIGEgcmVncmVzc2lvbiB3aXRoIGhldGVyb3NrZWRhc3RpY2l0eSBvZiB0aGUgZXJyb3IgdGVybS4qKiANCg0KYGBge3J9DQpzZXQuc2VlZCgxOTk0KQ0Kb2JzIDwtIDUwMDANCng0IDwtIHJ1bmlmKG9icywgbWluPTAsbWF4PTIwKQ0KZTEgPC0gcm5vcm0ob2JzLCBtZWFuPTEsc2Q9c3FydCh4NCkpIA0KeTQgPC0gMip4NCArIGUxDQpwbG90KHg0LHk0KQ0KbW9kZWw0IDwtIGxtKHk0fng0KQ0Kc3VtbWFyeShtb2RlbDQpDQpwbG90KG1vZGVsNCkNCmBgYA0KDQoNCg0KVGhlIGZvbGxvd2luZyBleGFtcGxlIGludHJvZHVjZXMgY29sbGluZWFyaXR5IGJldHdlZW4gdGhlIHZhcmlhYmxlcyB4MSBhbmQgeDIgYW5kIHgzLiBXZSBzaW11bGF0ZWQgdHdvIHZhcmlhYmxlcyB4MSBhbmQgeDIgYW5kIHdlIHN1bW1lZCB0aGUgdHdvIHdpdGggYSBub2lzZSANCmNvbXBvbmVudCB0byBnZXQgeDMuIA0KDQpgYGB7cn0NCnNldC5zZWVkKDE5OTQpDQpvYnMgPC0gMTAwMA0KeCA8LSBydW5pZihvYnMsIDEwLDEwMCkNCg0KeDEgPC0gMip4DQp4MiA8LSAzKnggKyAzICsgZQ0KeDMgPC0geDEgKyBlMQ0KDQpkYXRhIDwtIGRhdGEuZnJhbWUoeDEseDIseDMpDQoNCmUgPC0gcm5vcm0ob2JzLDEwLDMwKQ0KZTEgPC0gcm5vcm0ob2JzLCA1LCA0NSkNCg0KeSA8LSB4MSArIHgyICsgeDMgKyBlICANCg0KbW9kZWwxIDwtIGxtKHkgfiB4MSArIHgyKQ0Kc3VtbWFyeShtb2RlbDEpDQp2aWYobW9kZWwxKQ0KcGxvdChtb2RlbDEpDQoNCg0KbW9kZWwyIDwtIGxtKHkgfiB4MSArIHgyICsgeDMpDQpzdW1tYXJ5KG1vZGVsMikNCnBsb3QobW9kZWwyKQ0KDQpwYWlycyhkYXRhKQ0KICAgICAgICAgICAgIA0KICANCnBsb3QoeDIseDEpDQpgYGANCg0KIyBMb2dpc3RpYyBSZWdyZXNzaW9uIA0KDQpgYGB7cn0NCnNldC5zZWVkKDY2NikNCngxID0gcm5vcm0oMTAwMCkgICAgICAgICAgICMgc29tZSBjb250aW51b3VzIHZhcmlhYmxlcyANCngyID0gcm5vcm0oMTAwMCkNCnogPSAxICsgMip4MSArIDMqeDIgICAgICAgICMgbGluZWFyIGNvbWJpbmF0aW9uIHdpdGggYSBiaWFzDQpwciA9IDEvKDErZXhwKC16KSkgICAgICAgICAjIHBhc3MgdGhyb3VnaCBhbiBpbnYtbG9naXQgZnVuY3Rpb24NCnkgPSByYmlub20oMTAwMCwxLHByKSANCnBsb3Qoeix5KQ0Kc3VtbWFyeShsbSh5fngxK3gyKSkNCmBgYA0KDQoNCg0KDQoqKlJlZmVyZW5jZXM6KioNCg0KaHR0cHM6Ly9zdGF0cy5zdGFja2V4Y2hhbmdlLmNvbS9xdWVzdGlvbnMvMjU4NDg1L3NpbXVsYXRlLWxpbmVhci1yZWdyZXNzaW9uLXdpdGgtaGV0ZXJvc2NlZGFzdGljaXR5IEhvdyB0byBzaW11bGF0ZSBoZXRlcm9zY2VkYXNkaXR5IG9mIHRoZSBlcnJvciB0ZXJtIGluIFIgIA0KaHR0cHM6Ly9hb3NtaXRoLnJiaW5kLmlvLzIwMTgvMDgvMjkvZ2V0dGluZy1zdGFydGVkLXNpbXVsYXRpbmctZGF0YS8gSG93IHRvIGdlbmVyYXRlIHJhbmRvbSBkYXRhIGZyb20gdmFyaW91cyBkaXN0cmlidXRpb25zICANCmh0dHBzOi8vd3d3LnItYmxvZ2dlcnMuY29tL3NpbXVsYXRpbmctZW5kb2dlbmVpdHkvIFNpbWx1YXRpbmcgZW5kb2dlbmVpdHkgDQpodHRwOi8vcGVvcGxlLmR1a2UuZWR1L35ybmF1L3Rlc3RpbmcuaHRtIFJlZ3Jlc3Npb24gYXNzdW1wdGlvbnMgDQpodHRwczovL21hdGguc3RhY2tleGNoYW5nZS5jb20vcXVlc3Rpb25zLzE1MzA1NzEvbGluZWFyLXJlZ3Jlc3Npb24tbW9kZWwtYXNzdW1wdGlvbnMgUmVncmVzc2lvbiBBc3N1bXB0aW9ucyANCmh0dHBzOi8vc3RhdHMuc3RhY2tleGNoYW5nZS5jb20vcXVlc3Rpb25zLzI3NjgzMS9hc3N1bXB0aW9ucy1vZi1tdWx0aXBsZS1yZWdyZXNzaW9uLWhvdy1pcy1ub3JtYWxpdHktYXNzdW1wdGlvbi1kaWZmZXJlbnQtZnJvbS1jLzI3Njg4OCBSZWdyZXNzaW9uIEFzdXNtcHRpb25zIA0KaHR0cHM6Ly9ud2ZzYy10aW1lc2VyaWVzLmdpdGh1Yi5pby9hdHNhLWxhYnMvc2VjLXRzbGFiLXJhbmRvbS13YWxrcy1ydy5odG1sIFNpbXVsYXRpbmcgYSBSYW5kb20gV2FsayANCmh0dHBzOi8vZGF2aWRkYWxwaWF6LmdpdGh1Yi5pby9hcHBsaWVkc3RhdHMvY29sbGluZWFyaXR5Lmh0bWwgc2ltdWxhdGluZyBtdWx0aWNvbGluZWFyaXR5IGluIFIgDQpodHRwczovL3d3dy5lY29ub21ldHJpY3Mtd2l0aC1yLm9yZy82LTQtb2xzLWFzc3VtcHRpb25zLWluLW11bHRpcGxlLXJlZ3Jlc3Npb24uaHRtbCBzaW11bGF0aW5nIG11bHRpY29saW5lYXJpdHkgaW4gUiAoaW1wZXJmZWN0IG11bHRpIGNvbGluZWFyaXR5KQ0KDQpodHRwczovL3N0YXRzLnN0YWNrZXhjaGFuZ2UuY29tL3F1ZXN0aW9ucy80NjUyMy9ob3ctdG8tc2ltdWxhdGUtYXJ0aWZpY2lhbC1kYXRhLWZvci1sb2dpc3RpYy1yZWdyZXNzaW9uLzQ2NTI1IFNpbXVsYXRpbmcgbG9naXN0aWMgcmVncmVzc2lvbiBkYXRhDQoNCmh0dHBzOi8vZGF2aWRkYWxwaWF6LmdpdGh1Yi5pby9hcHBsaWVkc3RhdHMvbG9naXN0aWMtcmVncmVzc2lvbi5odG1sI2JpbmFyeS1yZXNwb25zZQ0KDQpodHRwczovL3N0YXRzLnN0YWNrZXhjaGFuZ2UuY29tL3F1ZXN0aW9ucy8zMjE3NzAvc2ltdWxhdGluZy1kYXRhLWZvci1hbi1vcmRlcmVkLWxvZ2l0LW1vZGVsIHNpbXVsYXRlIG9yZGluYWwgbG9naXN0aWMgcmVncmVzc2lvbiANCg0KDQoNCg==