- Ordinary Least Squares Regression
- You want to explain something: a quantitative dependent variable (outcome, response)
- You fit a model with one or more independent variables (predictors, explanatory variables)
Henk Harmsen
December 23, 2015
There are underlying assumptions for using OLS, which are often violated:
Can you explain the maize acreage from other variables?
You may just type:
fit = lm(maize.acres ~ bags.store.2013 + savings.2013 + cows + solar.light, data = df2)
Why is this not a good idea?
Always visualize your dataset before you work with it!
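A minimal sketch of such a first look, in base R. The real df2 is not reproduced here, so the data frame below is simulated and only borrows the column names from the lm() call; the distributions and effect sizes are assumptions made for illustration.

```r
# Simulated stand-in for df2 (assumption: the real survey data is not available here)
set.seed(1)
n <- 200
df2 <- data.frame(
  bags.store.2013 = rpois(n, 3),
  savings.2013    = round(rexp(n, 1 / 500)),
  cows            = factor(rbinom(n, 1, 0.4)),
  solar.light     = factor(rbinom(n, 1, 0.3))
)
# Right-skewed, strictly positive outcome
df2$maize.acres <- exp(rnorm(n, -0.3 + 0.5 * (df2$solar.light == "1"), 0.6))

summary(df2)                                   # ranges, zeros, factor counts
hist(df2$maize.acres, main = "maize.acres",    # skewness of the outcome
     xlab = "acres")
pairs(df2[, c("maize.acres",                   # pairwise scatterplots of the
              "bags.store.2013",               # numeric variables
              "savings.2013")])
boxplot(maize.acres ~ solar.light, data = df2) # outcome by factor level
```

Skewed histograms, clumps of zeros and curved scatterplots at this stage are early warnings that the OLS assumptions may not hold.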
##                      Estimate   Std. Error    t value     Pr(>|t|)
## (Intercept)      6.857085e-01 7.974795e-02  8.5984468 1.686361e-15
## bags.store.2013  1.453749e-02 1.494103e-02  0.9729909 3.316512e-01
## savings.2013    -1.490552e-05 5.672169e-05 -0.2627834 7.929692e-01
## cows1            1.007916e-01 1.083842e-01  0.9299478 3.534411e-01
## solar.light1     5.816559e-01 1.027453e-01  5.6611421 4.777664e-08
Standard R offers diagnostic plots that you get by typing:
plot(fit)
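By default plot() on an lm object draws four plots one after another. A small sketch of showing them on one page follows; the data are simulated and the names d, x and y are placeholders, not the variables of the maize model.

```r
# Simulated data (assumption: stands in for the real model)
set.seed(1)
x <- rnorm(100)
d <- data.frame(x = x, y = 1 + 2 * x + rnorm(100))
fit <- lm(y ~ x, data = d)

op <- par(mfrow = c(2, 2))  # 2 x 2 grid, remember the old settings
plot(fit)                   # Residuals vs Fitted, Normal Q-Q,
                            # Scale-Location, Residuals vs Leverage
par(op)                     # restore the previous layout
```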
A detailed summary is obtained by typing:
summary(fit)
Always check the p-value of the F-statistic first! It tests whether the model as a whole explains more variation than could be expected by chance alone (null hypothesis: all slope coefficients are zero). With p > 0.05 there is little point in examining the individual coefficients any further.
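summary() prints this p-value but does not store it directly. One way to recover it, sketched on simulated data (d, x1, x2 and y are made-up names), is to take the stored F-statistic with its degrees of freedom and pass them to pf():

```r
# Simulated data (assumption): y depends on x1 only
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- 1 + 2 * d$x1 + rnorm(100)
fit <- lm(y ~ x1 + x2, data = d)

fs <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
# Upper tail of the F distribution gives the overall p-value
p.overall <- unname(pf(fs["value"], fs["numdf"], fs["dendf"],
                       lower.tail = FALSE))
p.overall
```

Only when p.overall is small is it worth moving on to the individual coefficients.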
The normality assumption is violated; this can be remedied with a so-called Box-Cox transformation of the response variable.
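A minimal sketch of a Box-Cox transformation, assuming the MASS package (it ships with standard R installations) and a strictly positive response. The data are simulated; d, x and y are placeholder names, not the maize variables.

```r
library(MASS)  # provides boxcox()

# Simulated data (assumption): log-normal response, so lambda near 0 is plausible
set.seed(1)
d <- data.frame(x = runif(200, 1, 10))
d$y <- exp(0.2 * d$x + rnorm(200, sd = 0.3))
fit <- lm(y ~ x, data = d, y = TRUE)  # y = TRUE stores the response in the fit

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda.hat <- bc$x[which.max(bc$y)]   # lambda with the highest likelihood

# Apply the transformation; lambda = 0 corresponds to the log
d$y.bc <- if (abs(lambda.hat) < 1e-8) log(d$y) else
  (d$y^lambda.hat - 1) / lambda.hat
fit.bc <- lm(y.bc ~ x, data = d)      # refit and re-check the diagnostics
```

In practice lambda.hat is often rounded to a convenient value (0 for log, 0.5 for square root) so that the transformed outcome stays interpretable.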
The linearity assumption is violated; this can be remedied with a so-called Box-Tidwell transformation of the predictor variables.
The command is shown but not run for technical reasons:
library(car)
# dfx2 = dfx2[bags.store.2013 != 0 & savings.2013 != 0
#             & cows != 0 & solar.light != 0,]
# boxTidwell(maize.acres ~., data = dfx2)
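Since the car command is not run, the idea behind it can be sketched by hand in base R. This is only the first step of the Box-Tidwell procedure, not what car::boxTidwell does in full (it iterates and reports a score test). The data are simulated, the names d, x and y are made up, and the predictor must be strictly positive for log(x) to exist.

```r
# Simulated data (assumption): the true relation is in sqrt(x), not x
set.seed(1)
d <- data.frame(x = runif(300, 1, 20))
d$y <- 2 + 3 * sqrt(d$x) + rnorm(300, sd = 0.2)

fit.lin <- lm(y ~ x, data = d)                  # plain linear model
fit.aug <- lm(y ~ x + I(x * log(x)), data = d)  # add the x*log(x) term

b <- coef(fit.lin)[2]                     # slope of x in the linear model
g <- coef(fit.aug)[3]                     # coefficient of x*log(x)
p.nonlin <- coef(summary(fit.aug))[3, 4]  # its p-value: evidence of nonlinearity
lambda1 <- unname(1 + g / b)              # one-step estimate of the power of x
lambda1
```

A small p.nonlin signals that x enters nonlinearly, and a lambda1 below 1 points to a concave transformation such as a square root or a log.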
The plot on the next slide shows many of the things discussed in one plot. In particular:
You can conclude that: