- This means a one-unit increase in x is associated with an increase of beta2 in y
- beta1 is the intercept: the expected value of y when x is zero (and if beta2 is zero, y equals beta1 for every x)
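
A minimal sketch of this interpretation, assuming simulated data (the values 2 and 0.5 and the names `fit`, `pred0` and `step` are made up for illustration):

```r
# Simulated data: the true intercept (beta1) is 2, the true slope (beta2) is 0.5.
set.seed(1)
x <- runif(100, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(100, sd = 1)

fit <- lm(y ~ x)
beta <- coef(fit)   # named vector with elements "(Intercept)" and "x"

# The prediction at x = 0 is the estimated intercept.
pred0 <- predict(fit, newdata = data.frame(x = 0))

# Moving x by one unit (3 -> 4) changes the prediction by the estimated slope.
step <- diff(predict(fit, newdata = data.frame(x = c(3, 4))))
```

The estimates in `beta` will be close to, but not exactly, 2 and 0.5 because of the simulated noise.
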
- Variable type: outcome must be continuous. Predictors can be continuous or dichotomous
- Non-zero variance: predictors must not have zero variance
- Independent errors: for any pair of observations, the error terms are uncorrelated
- Linear relationship between x and y (check with scatterplot)
- No or little multicollinearity: predictors x1, x2, ..., xn must not be highly correlated; check with cor()
- Homoscedasticity: for each value of the predictors, the variance of the error term should be constant
- Normally distributed errors
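
Two of these checks (non-zero variance and pairwise correlation of predictors) can be sketched in a few lines. This assumes the built-in `mtcars` data and an arbitrary choice of three predictors, purely for illustration:

```r
# Hypothetical predictor set taken from the built-in mtcars data.
predictors <- mtcars[, c("wt", "hp", "disp")]

# Non-zero variance: every predictor should vary.
variances <- sapply(predictors, var)
any(variances == 0)

# Multicollinearity: look for large off-diagonal correlations.
cor_mat <- cor(predictors)
round(cor_mat, 2)
```
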
A note about sample size:
In linear regression, the rule of thumb is that the analysis requires at least 20 or 30 cases (observations) per independent variable.
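
As a rough sketch of applying the rule, assuming the built-in `mtcars` data and an illustrative two-predictor model (counting coefficients this way only works when there are no factors or interactions):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nobs(fit)                 # number of observations used in the fit: 32
k <- length(coef(fit)) - 1     # number of predictors, excluding the intercept: 2

cases_per_predictor <- n / k   # 16
cases_per_predictor >= 20      # FALSE: below the rule of thumb
```
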
- Correlation matrices: cor()
- Tolerance statistic: the proportion of variance in a predictor that cannot be accounted for by the other predictors (smaller values indicate that the predictor is redundant). It should be above 0.1 or 0.2
- Variance Inflation Factor (VIF): the inverse of the tolerance statistic (higher values indicate that a predictor is redundant). It should be less than 10 or 5, which correspond to tolerances of 0.1 and 0.2. In R, vif(lmodel) is in the car package
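
Tolerance and VIF can also be computed by hand in base R, which makes the definitions concrete. A sketch, assuming the built-in `mtcars` data and an illustrative model (the variable names are made up here):

```r
# Illustrative model with three predictors.
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Tolerance for wt: 1 - R^2 from regressing wt on the other predictors.
r2_wt  <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
tol_wt <- 1 - r2_wt

# VIF is the inverse of the tolerance.
vif_wt <- 1 / tol_wt

# All VIFs at once: the diagonal of the inverse predictor correlation matrix.
vif_all <- diag(solve(cor(mtcars[, c("wt", "hp", "disp")])))
```

If the car package is installed, `car::vif(fit)` should give the same values as `vif_all`.
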
- Standardized residuals:
  - about 95% of standardized residuals should lie between -2 and +2
  - about 99% should lie between -2.5 and +2.5
  - standardized residuals with an absolute value of 3 or more are likely outliers
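
These proportions can be checked directly with `rstandard()`. A sketch, assuming the built-in `mtcars` data and an illustrative model:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Standardized (internally studentized) residuals.
z <- rstandard(fit)

prop_within_2  <- mean(abs(z) <= 2)     # should be roughly 0.95
prop_within_25 <- mean(abs(z) <= 2.5)   # should be roughly 0.99

# Cases flagged as potential outliers (may be empty).
outliers <- which(abs(z) >= 3)
```

With only 32 cases the proportions are coarse, so small deviations from 95% and 99% are not informative on their own.
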
- Cook's distance: measures the influence of a single case on the model as a whole
  - values higher than 4/n (n = number of observations) may be cause for concern
plot(lmmodel, which = 4, id.n = 5)
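
The same 4/n cutoff can be applied numerically instead of reading it off the plot. A sketch, assuming the built-in `mtcars` data and an illustrative model:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

d      <- cooks.distance(fit)   # one value per observation
cutoff <- 4 / nobs(fit)         # 4 / 32 = 0.125

# Cases above the cutoff, most influential first.
flagged <- sort(d[d > cutoff], decreasing = TRUE)
flagged
```
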
- Homoscedasticity/independence of errors:
  - residuals versus fitted plot: plots the residuals against the fitted (predicted) values
plot(lmmodel, which = 1)
which = 1 selects the residuals vs fitted plot
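
To see what this plot contains, it can be rebuilt by hand from `fitted()` and `resid()`. A sketch, assuming the built-in `mtcars` data and an illustrative model (unlike `plot.lm`, this version adds no smoother and no case labels):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

fv <- fitted(fit)   # predicted values
rs <- resid(fit)    # raw residuals

# Residuals against fitted values, with a reference line at zero.
plot(fv, rs, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```
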
plot(x, y)
The correlation for this plot is zero, which is why we need to plot the data and not just calculate the correlation.
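
The original x and y are not shown in these notes, so the following is a hypothetical dataset that reproduces the same effect: y depends entirely on x, yet the linear correlation is essentially zero.

```r
# Hypothetical data: a perfect but non-linear (quadratic) relationship.
x <- seq(-3, 3, by = 0.1)
y <- x^2

r <- cor(x, y)   # essentially zero (up to floating-point error)

# The plot shows the U-shape that the correlation coefficient misses.
plot(x, y)
```
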