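We start the analysis by installing and loading the packages “lmtest” and “tseries”, which provide several useful diagnostics, for example the Breusch-Pagan test for heteroscedasticity (bptest, from lmtest) and the Jarque-Bera normality test (jarque.bera.test, from tseries).
Load the data: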
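The setup step is not shown in the commands that follow, so here is a minimal sketch of it. It only checks whether the two packages are available and attaches them if they are; the variable names `pkgs` and `not_installed` are introduced here for illustration.

```r
# Packages used later for bptest() and jarque.bera.test()
pkgs <- c("lmtest", "tseries")

# Which of them are not yet installed on this machine?
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(not_installed) > 0) {
  # Installation needs network access, so it is left as an explicit manual step
  message("Run install.packages() for: ", paste(not_installed, collapse = ", "))
} else {
  # Attach both packages so their functions can be called directly
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

Installing once with `install.packages(c("lmtest", "tseries"))` and then calling `library(lmtest)` and `library(tseries)` at the top of the script has the same effect.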
data = read.csv("C:/CAMPUS/PROFESSIONAL PREPARATION/Portfolio/RPubs/SIMPLE LINEAR REGRESSION/Region,GVA,Labor Productivity,Busin.txt")
data = data.frame(data)
data
## Region GVA Labor.Productivity Business.Birth.Rate
## 1 Wales 3.6 81.5 9.3
## 2 Scotland 8.3 96.9 10.9
## 3 Northern Ireland 2.3 82.9 6.5
## 4 North of England 3.2 86.2 11.2
## 5 North West England 9.5 88.6 11.1
## 6 Yorkshire & Humberside 6.9 84.7 10.5
## 7 East Midlands 6.2 89.2 10.3
## 8 West Midlands 7.3 89.1 10.5
## 9 East Anglia 8.7 96.8 10.5
## 10 Greater London 21.6 139.7 14.6
## 11 South East England 14.7 108.3 10.8
## 12 South West England 7.7 89.8 9.6
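The file path above is specific to the author's machine. For readers who do not have the file, the same table can be rebuilt by hand. The sketch below copies the values from the printed output and assumes the printed column names are the ones the later commands rely on.

```r
# Rebuild the data set from the printed listing (12 UK regions)
data <- data.frame(
  Region = c("Wales", "Scotland", "Northern Ireland", "North of England",
             "North West England", "Yorkshire & Humberside", "East Midlands",
             "West Midlands", "East Anglia", "Greater London",
             "South East England", "South West England"),
  GVA = c(3.6, 8.3, 2.3, 3.2, 9.5, 6.9, 6.2, 7.3, 8.7, 21.6, 14.7, 7.7),
  Labor.Productivity = c(81.5, 96.9, 82.9, 86.2, 88.6, 84.7, 89.2, 89.1,
                         96.8, 139.7, 108.3, 89.8),
  Business.Birth.Rate = c(9.3, 10.9, 6.5, 11.2, 11.1, 10.5, 10.3, 10.5,
                          10.5, 14.6, 10.8, 9.6)
)

# Quick structural check: 12 rows, 4 columns
str(data)
```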
Estimate the model that explains GVA as a function of labor productivity.
Select the variables for the analysis. We use only GVA as the dependent variable (y) and Labor Productivity as the independent variable (x); Business Birth Rate is left for a separate publication on multiple linear regression.
y = data$GVA
x = data$Labor.Productivity
Plot the data on a scatter diagram
plot(x,y)
Estimate a simple regression model and add the fitted regression line to the scatter plot
model.1 <- lm(y ~ x)
plot(x, y)
abline(model.1, col = "blue", lwd = 2)
Print the summary of the model, which contains point estimates of the parameters, their significance, the R-squared, the adjusted R-squared, and the F-test.
summary(model.1)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5300 -0.8375 -0.4193 1.0386 3.0149
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -21.38866 3.19237 -6.700 5.36e-05 ***
## x 0.31460 0.03335 9.432 2.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.773 on 10 degrees of freedom
## Multiple R-squared: 0.899, Adjusted R-squared: 0.8889
## F-statistic: 88.97 on 1 and 10 DF, p-value: 2.708e-06
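To see where the point estimates in the summary come from, the sketch below recomputes the slope, the intercept, and the R-squared by hand, using the standard simple-regression formulas (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)). The vectors are copied from the data listing; `b0`, `b1`, and `r2` are names chosen here.

```r
# Labor productivity (x) and GVA (y), copied from the data listing
x <- c(81.5, 96.9, 82.9, 86.2, 88.6, 84.7, 89.2, 89.1, 96.8, 139.7, 108.3, 89.8)
y <- c(3.6, 8.3, 2.3, 3.2, 9.5, 6.9, 6.2, 7.3, 8.7, 21.6, 14.7, 7.7)

# Closed-form OLS estimates for a single regressor
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept

# In simple regression, R-squared is the squared correlation of x and y
r2 <- cor(x, y)^2

# Compare with lm()
fit <- lm(y ~ x)
rbind(by_hand = c(b0, b1), lm = unname(coef(fit)))
r2
```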
AIC(model.1)
## [1] 51.61663
BIC(model.1)
## [1] 53.07135
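Lower AIC and BIC values indicate a better trade-off between goodness of fit and model complexity. The values are only meaningful when compared with those of other models fitted to the same data.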
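Both criteria are penalized versions of the log-likelihood: AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), where k counts the estimated parameters (here the intercept, the slope, and the error variance). A minimal sketch that reproduces the two numbers, with the data copied from the listing:

```r
x <- c(81.5, 96.9, 82.9, 86.2, 88.6, 84.7, 89.2, 89.1, 96.8, 139.7, 108.3, 89.8)
y <- c(3.6, 8.3, 2.3, 3.2, 9.5, 6.9, 6.2, 7.3, 8.7, 21.6, 14.7, 7.7)
fit <- lm(y ~ x)

ll <- logLik(fit)        # Gaussian log-likelihood of the fitted model
k  <- attr(ll, "df")     # number of estimated parameters
n  <- nobs(fit)          # number of observations

aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

c(AIC = aic_manual, BIC = bic_manual)
```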
confint returns the 95% confidence intervals for the parameters; we use it to check that the interval for the slope does not contain zero.
confint(model.1)
## 2.5 % 97.5 %
## (Intercept) -28.5016956 -14.2756203
## x 0.2402859 0.3889174
The 95% confidence interval for the slope (0.240 to 0.389) excludes zero, so the effect of x on y is statistically significant and positive.
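The intervals are built as estimate ± t-quantile × standard error, with the t-quantile taken at the residual degrees of freedom (10 here). The following sketch rebuilds them from the coefficient table; `ci_manual` is a name introduced for illustration.

```r
x <- c(81.5, 96.9, 82.9, 86.2, 88.6, 84.7, 89.2, 89.1, 96.8, 139.7, 108.3, 89.8)
y <- c(3.6, 8.3, 2.3, 3.2, 9.5, 6.9, 6.2, 7.3, 8.7, 21.6, 14.7, 7.7)
fit <- lm(y ~ x)

coefs <- coef(summary(fit))       # estimates and standard errors
est   <- coefs[, "Estimate"]
se    <- coefs[, "Std. Error"]

# Two-sided 95% critical value of the t distribution with n - 2 df
tcrit <- qt(0.975, df = df.residual(fit))

ci_manual <- cbind(lower = est - tcrit * se, upper = est + tcrit * se)
ci_manual
```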
Breusch-Pagan heteroscedasticity test.
H0: There is no heteroscedasticity (the error variance is constant)
H1: There is heteroscedasticity
bptest(model.1)
##
## studentized Breusch-Pagan test
##
## data: model.1
## BP = 0.52669, df = 1, p-value = 0.468
With a p-value of 0.468 we fail to reject H0 at the 5% level, so there is no evidence against the assumption of constant variance.
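The studentized version of the test, which is what bptest reports by default, can be written as n times the R-squared of an auxiliary regression of the squared residuals on the regressors. The sketch below uses only base R and the data from the listing; `aux`, `bp_stat`, and `bp_p` are illustrative names.

```r
x <- c(81.5, 96.9, 82.9, 86.2, 88.6, 84.7, 89.2, 89.1, 96.8, 139.7, 108.3, 89.8)
y <- c(3.6, 8.3, 2.3, 3.2, 9.5, 6.9, 6.2, 7.3, 8.7, 21.6, 14.7, 7.7)
fit <- lm(y ~ x)

n  <- nobs(fit)
e2 <- resid(fit)^2          # squared OLS residuals

# Auxiliary regression: do the squared residuals vary with x?
aux <- lm(e2 ~ x)

# Test statistic and chi-squared p-value (df = number of regressors = 1)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(BP = bp_stat, p.value = bp_p)
```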
Jarque-Bera normality test for the regression model residuals.
H0: The distribution of residuals is normal
H1: The distribution of residuals is not normal
jarque.bera.test(model.1$residuals)
##
## Jarque Bera Test
##
## data: model.1$residuals
## X-squared = 0.37587, df = 2, p-value = 0.8287
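The Jarque-Bera statistic combines the sample skewness S and kurtosis K of the residuals as JB = n/6 × (S² + (K - 3)²/4) and is compared with a chi-squared distribution with 2 degrees of freedom. A minimal base-R sketch, with the data copied from the listing and moments computed with divisor n:

```r
x <- c(81.5, 96.9, 82.9, 86.2, 88.6, 84.7, 89.2, 89.1, 96.8, 139.7, 108.3, 89.8)
y <- c(3.6, 8.3, 2.3, 3.2, 9.5, 6.9, 6.2, 7.3, 8.7, 21.6, 14.7, 7.7)
e <- resid(lm(y ~ x))
n <- length(e)

# Central sample moments of the residuals (divisor n)
d  <- e - mean(e)
m2 <- mean(d^2)
m3 <- mean(d^3)
m4 <- mean(d^4)

skew <- m3 / m2^1.5     # 0 for a normal distribution
kurt <- m4 / m2^2       # 3 for a normal distribution

jb_stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
jb_p    <- pchisq(jb_stat, df = 2, lower.tail = FALSE)

c(JB = jb_stat, p.value = jb_p)
```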
You can also use other normality tests, such as the Kolmogorov-Smirnov test with the function ks.test().
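As a sketch of that alternative, the code below applies ks.test to the residuals against a normal distribution whose mean and standard deviation are estimated from the same residuals. Because of that estimation step the Kolmogorov-Smirnov p-value is only approximate, so the Shapiro-Wilk test (shapiro.test, also in base R) is shown alongside as a second check; adding it is a choice made here, not part of the original analysis.

```r
x <- c(81.5, 96.9, 82.9, 86.2, 88.6, 84.7, 89.2, 89.1, 96.8, 139.7, 108.3, 89.8)
y <- c(3.6, 8.3, 2.3, 3.2, 9.5, 6.9, 6.2, 7.3, 8.7, 21.6, 14.7, 7.7)
e <- resid(lm(y ~ x))

# Kolmogorov-Smirnov test against N(mean(e), sd(e)); approximate p-value
ks <- ks.test(e, "pnorm", mean = mean(e), sd = sd(e))

# Shapiro-Wilk test, often preferred for small samples
sw <- shapiro.test(e)

c(ks = ks$p.value, shapiro = sw$p.value)
```

For both tests, a p-value above the chosen significance level means the null hypothesis of normality is not rejected.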
So the model we have found is: y = -21.38866 + 0.31460x + e. All parameters are significant according to the F-test (overall) and the t-tests (individual). The R-squared is 0.899 (adjusted R-squared 0.8889), which is close to 1 and means the model explains about 90% of the variation in GVA. There is no evidence of heteroscedasticity, and the residuals are consistent with a normal distribution.
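Once the equation is estimated, it can be used for prediction. The sketch below evaluates the fitted line at a labor productivity of 100, a hypothetical value chosen only for illustration, and asks predict for a 95% confidence interval around the fitted mean.

```r
x <- c(81.5, 96.9, 82.9, 86.2, 88.6, 84.7, 89.2, 89.1, 96.8, 139.7, 108.3, 89.8)
y <- c(3.6, 8.3, 2.3, 3.2, 9.5, 6.9, 6.2, 7.3, 8.7, 21.6, 14.7, 7.7)
fit <- lm(y ~ x)

# Hypothetical region with labor productivity equal to 100
new_x <- data.frame(x = 100)

# Fitted GVA with a 95% confidence interval for the mean response
pred <- predict(fit, newdata = new_x, interval = "confidence")
pred

# The same point estimate from the equation: intercept + slope * 100
unname(coef(fit)[1] + coef(fit)[2] * 100)
```

Predictions far outside the observed range of x (roughly 81 to 140) should be treated with caution, since the line was fitted on only 12 regions.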
Reference: Piras, Gianfranco, and Giuseppe Arbia. A Primer for Spatial Econometrics: With Applications in R. Palgrave Macmillan, 2021.