education <- read.csv(url("https://www.dropbox.com/s/n3h7asxcdqai63z/time-series-educ-spending-and-gdp.csv?dl=1"), header = TRUE)
str(education)
## 'data.frame': 76 obs. of 13 variables:
## $ Year : int 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 ...
## $ GDP.Growth: num 3.68 4.57 3.72 3.94 2.79 2.99 2.19 2.21 3.13 3.31 ...
## $ Spending0 : num 8.71 10.37 8.96 8.54 16.25 ...
## $ Spending1 : num 14.02 8.71 10.37 8.96 8.54 ...
## $ Spending2 : num 22.37 14.02 8.71 10.37 8.96 ...
## $ Spending3 : num 16.29 22.37 14.02 8.71 10.37 ...
## $ Spending4 : num 22.2 16.29 22.37 14.02 8.71 ...
## $ Spending5 : num 18.6 22.2 16.3 22.4 14 ...
## $ Spending6 : num 18.1 18.6 22.2 16.3 22.4 ...
## $ Spending7 : num 9.67 18.09 18.59 22.2 16.29 ...
## $ Spending8 : num 11.3 9.67 18.09 18.59 22.2 ...
## $ Spending9 : num 9.9 11.3 9.67 18.09 18.59 ...
## $ Spending10: num 7.12 9.9 11.3 9.67 18.09 ...
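The dataset contains 76 yearly observations of 13 variables: Year, GDP.Growth, and Spending0 through Spending10, where SpendingK is the education spending per child K years before the given year.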
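The lag structure of the spending columns can be illustrated with a minimal sketch. The helper `lag_vec` and the small data frame `lagged` are hypothetical names introduced here, and the 1919 to 1924 spending values are read off the first entries of Spending1 and Spending0 in the `str()` output above, assuming that Spending1 in 1920 is the 1919 spending.

```r
# Hypothetical helper (not part of the original analysis):
# shift x down by k positions, padding the first k entries with NA.
lag_vec <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA_real_, k), head(x, -k))
}

# Spending for 1919-1924, assumed from the str() output above.
spending <- c(14.02, 8.71, 10.37, 8.96, 8.54, 16.25)

lagged <- data.frame(
  Year      = 1919:1924,
  Spending0 = lag_vec(spending, 0),  # same-year spending
  Spending1 = lag_vec(spending, 1),  # spending one year earlier
  Spending2 = lag_vec(spending, 2)   # spending two years earlier
)
print(lagged)
```

From 1920 onward, the Spending1 column of this sketch reproduces the first Spending1 values shown in the `str()` output.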
model01 <- lm(GDP.Growth ~ .-Year, education)
summary(model01)
##
## Call:
## lm(formula = GDP.Growth ~ . - Year, data = education)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.72661 -0.20794 -0.00384 0.27241 0.73019
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4903699 0.3153620 1.555 0.124891
## Spending0 0.0052824 0.0109759 0.481 0.631964
## Spending1 0.0083359 0.0120813 0.690 0.492699
## Spending2 0.0002438 0.0119456 0.020 0.983780
## Spending3 0.0229749 0.0113929 2.017 0.047938 *
## Spending4 0.0454640 0.0111519 4.077 0.000129 ***
## Spending5 0.0536340 0.0112994 4.747 1.2e-05 ***
## Spending6 0.0430536 0.0111948 3.846 0.000279 ***
## Spending7 0.0164127 0.0112916 1.454 0.150962
## Spending8 -0.0168607 0.0116929 -1.442 0.154190
## Spending9 0.0162755 0.0116351 1.399 0.166692
## Spending10 -0.0086728 0.0105007 -0.826 0.411916
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3602 on 64 degrees of freedom
## Multiple R-squared: 0.7443, Adjusted R-squared: 0.7003
## F-statistic: 16.93 on 11 and 64 DF, p-value: 5.076e-15
A multiple linear regression model, denoted model01, was fitted to model GDP growth as a function of education spending per child (in dollars) in the current year and in each of the previous ten years. The regression was highly significant, F\((11,64) = 16.93\), p\(<0.001\). The multiple R\(^2\) and adjusted R\(^2\) of the model are 0.7443 and 0.7003, respectively, which means that the model accounts for 74.43% of the variance in GDP growth. However, most of the coefficients are not significant at the 5% level; only Spending3 through Spending6 are. This suggests that the model may be simplified by removing the non-significant variables.
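The relation between the two R\(^2\) values can be checked by hand. The sketch below uses the standard adjustment \(1 - (1 - R^2)(n - 1)/(n - p - 1)\) with the rounded figures from the summary above; `adj_r2` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: adjusted R-squared from R-squared,
# sample size n, and number of predictors p (excluding the intercept).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# model01: R-squared 0.7443, 76 observations, 11 predictors.
adj_model01 <- adj_r2(r2 = 0.7443, n = 76, p = 11)
print(adj_model01)
```

Because the input R\(^2\) is itself rounded, the result agrees with the reported 0.7003 only up to rounding in the fourth decimal place.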
model02 <- lm(GDP.Growth ~ Spending3 + Spending4 + Spending5 + Spending6, education)
summary(model02)
##
## Call:
## lm(formula = GDP.Growth ~ Spending3 + Spending4 + Spending5 +
## Spending6, data = education)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.75774 -0.25407 0.00984 0.27048 0.71564
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.712811 0.192566 3.702 0.000420 ***
## Spending3 0.028370 0.009984 2.841 0.005858 **
## Spending4 0.037798 0.010477 3.608 0.000571 ***
## Spending5 0.058398 0.010558 5.531 4.99e-07 ***
## Spending6 0.045468 0.009848 4.617 1.69e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.361 on 71 degrees of freedom
## Multiple R-squared: 0.715, Adjusted R-squared: 0.6989
## F-statistic: 44.53 on 4 and 71 DF, p-value: < 2.2e-16
A multiple regression model, denoted model02, was fitted to model GDP growth as a function of education spending per child three, four, five, and six years earlier. The regression was highly significant, F\((4,71) = 44.53\), p\(<0.001\). The multiple R\(^2\) and adjusted R\(^2\) of the model are 0.715 and 0.6989, respectively, which means that the model accounts for 71.5% of the variance in GDP growth. The adjusted R\(^2\) is nearly unchanged from that of model01 (0.7003), so little explanatory power is lost by dropping the other seven variables.
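Whether the seven dropped variables matter jointly can be assessed with a partial F-test. The sketch below reconstructs that test from the rounded residual standard errors and degrees of freedom printed in the two summaries, so the result is approximate; with the fitted objects at hand, `anova(model02, model01)` would give the exact version.

```r
# Values copied from the summary() output of model01 (full) and model02 (reduced).
rse_full <- 0.3602; df_full <- 64
rse_red  <- 0.361;  df_red  <- 71

# Residual sum of squares = (residual standard error)^2 * residual df.
rss_full <- rse_full^2 * df_full
rss_red  <- rse_red^2  * df_red

# Partial F statistic for the 7 dropped predictors.
q      <- df_red - df_full
f_stat <- ((rss_red - rss_full) / q) / (rss_full / df_full)
p_val  <- pf(f_stat, q, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

The F statistic comes out close to 1 and the p-value is well above 0.05, which is consistent with dropping the seven variables, subject to the rounding of the inputs.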
The parameters associated with the intercept (\(\hat{\beta}_0 = 0.71\), p\(<0.001\)), spending on education three years ago (\(\hat{\beta}_3 = 0.03\), p\(=0.006\)), four years ago (\(\hat{\beta}_4 = 0.04\), p\(<0.001\)), five years ago (\(\hat{\beta}_5 = 0.06\), p\(<0.001\)), and six years ago (\(\hat{\beta}_6 = 0.05\), p\(<0.001\)) are all significant. With that, the fitted model is given by
\[ \text{GDP Growth} = 0.71 + 0.03 \text{Spending}_3 + 0.04 \text{Spending}_4 + 0.06 \text{Spending}_5 + 0.05 \text{ Spending}_6 \]
which means that for every one-dollar increase in education spending per child, GDP growth three years later is expected to increase by 0.03 percent, on average, holding spending in the other years fixed. Likewise, GDP growth is expected to increase by 0.04 percent four years later, 0.06 percent five years later, and 0.05 percent six years later, on average.
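As a worked illustration of the fitted equation, the sketch below plugs hypothetical spending values into the rounded coefficients. The spending level of 10 dollars in each lagged year is an assumption chosen only for the arithmetic; with the fitted object available, `predict(model02, newdata = ...)` would use the unrounded coefficients instead.

```r
# Rounded coefficients from the fitted equation above.
coefs <- c(Intercept = 0.71, Spending3 = 0.03, Spending4 = 0.04,
           Spending5 = 0.06, Spending6 = 0.05)

# Hypothetical spending per child (dollars) 3, 4, 5, and 6 years ago.
spend <- c(Spending3 = 10, Spending4 = 10, Spending5 = 10, Spending6 = 10)

# Predicted GDP growth: intercept plus the sum of coefficient * spending.
pred <- coefs[["Intercept"]] + sum(coefs[names(spend)] * spend)

# Raising Spending5 by one dollar changes the prediction by its coefficient.
spend_plus <- spend
spend_plus["Spending5"] <- spend_plus["Spending5"] + 1
pred_plus <- coefs[["Intercept"]] + sum(coefs[names(spend_plus)] * spend_plus)

print(c(prediction = pred, change = pred_plus - pred))
```

The second quantity shows the coefficient interpretation directly: a one-dollar change in a single lag moves the prediction by exactly that lag's coefficient when the other lags are held fixed.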
par(mfrow=c(2,2))
plot(model02)
Residual analysis was performed to assess model02. The Residuals vs Fitted plot shows no funneling or other pattern, which suggests that the residuals are randomly scattered with roughly constant variance. The normal Q-Q plot, however, suggests slightly non-normal behavior. Cook’s distance suggests that no overly influential observations are present.
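The visual checks can be complemented with numerical ones. The sketch below is not part of the original analysis: it fits a model to simulated data (the data frame `sim` and the object `fit` are stand-ins for `education` and `model02`) and applies a Shapiro-Wilk normality test and the common rule of thumb that flags Cook's distances above 4/n.

```r
set.seed(1)
n <- 76

# Simulated stand-in data; replace `fit` with model02 in the real analysis.
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x1 + 0.3 * sim$x2 + rnorm(n, sd = 0.4)
fit <- lm(y ~ x1 + x2, data = sim)

# Shapiro-Wilk test: a small p-value indicates departure from normality.
sw <- shapiro.test(residuals(fit))

# Cook's distance: flag observations above the 4/n rule of thumb.
cd <- cooks.distance(fit)
flagged <- which(cd > 4 / n)

print(sw$p.value)
print(flagged)
```

The 4/n cutoff is only one convention; flagged observations deserve a closer look rather than automatic removal.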
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
Acf(model02$residuals)
An autocorrelation plot was used to assess whether there is serial correlation in the residuals. Since the autocorrelations at all lags fall within the confidence bounds, there is no evidence of serial correlation in the residuals.
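The bounds on the autocorrelation plot and a formal test can be sketched as follows. The approximate 95% bounds are \(\pm 1.96/\sqrt{n}\), which is about \(\pm 0.22\) for the 76 observations here. The vector `res` is simulated white noise standing in for `model02$residuals`, and the choice of 10 lags for the Ljung-Box test is an assumption.

```r
set.seed(2)
n <- 76

# Simulated stand-in for model02$residuals.
res <- rnorm(n)

# Approximate 95% bounds drawn on an ACF plot for n observations.
bound <- qnorm(0.975) / sqrt(n)

# Ljung-Box test: the null hypothesis is no autocorrelation up to lag 10.
lb <- Box.test(res, lag = 10, type = "Ljung-Box")

print(bound)
print(lb$p.value)
```

A large p-value from the Ljung-Box test is consistent with the reading of the plot, namely no evidence of serial correlation.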
library(car)
## Loading required package: carData
vif(model02)
## Spending3 Spending4 Spending5 Spending6
## 1.219276 1.399989 1.416191 1.242929
Variance inflation factors (VIF) were used to assess whether multicollinearity is present among the predictors. Since all VIFs are less than 5 (the largest is about 1.42), there is no indication of problematic multicollinearity.
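For a predictor j, the VIF is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on the remaining ones. The sketch below computes this by hand on simulated predictors; the data frame `X` and its column names are hypothetical and stand in for Spending3 through Spending6.

```r
set.seed(3)
n <- 76

# Simulated predictors; x3 is deliberately correlated with x1.
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
X$x3 <- 0.5 * X$x1 + rnorm(n)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the others.
manual_vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

print(manual_vif)
```

Read this way, the largest reported VIF of about 1.42 corresponds to an \(R_j^2\) of roughly 0.29, meaning less than a third of that predictor's variance is explained by the other three.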
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Philippines.1252 LC_CTYPE=English_Philippines.1252
## [3] LC_MONETARY=English_Philippines.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Philippines.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] car_3.0-10 carData_3.0-4 forecast_8.13
##
## loaded via a namespace (and not attached):
## [1] zoo_1.8-8 tidyselect_1.1.0 xfun_0.18 purrr_0.3.4
## [5] urca_1.3-0 haven_2.3.1 lattice_0.20-41 colorspace_1.4-1
## [9] vctrs_0.3.4 generics_0.0.2 htmltools_0.5.0 yaml_2.2.1
## [13] rlang_0.4.8 pillar_1.4.6 foreign_0.8-80 glue_1.4.2
## [17] TTR_0.24.2 readxl_1.3.1 lifecycle_0.2.0 quantmod_0.4.17
## [21] stringr_1.4.0 timeDate_3043.102 munsell_0.5.0 gtable_0.3.0
## [25] cellranger_1.1.0 zip_2.1.1 evaluate_0.14 knitr_1.30
## [29] rio_0.5.16 tseries_0.10-47 forcats_0.5.0 lmtest_0.9-38
## [33] parallel_4.0.3 curl_4.3 xts_0.12.1 Rcpp_1.0.5
## [37] scales_1.1.1 abind_1.4-5 fracdiff_1.5-1 ggplot2_3.3.2
## [41] hms_0.5.3 digest_0.6.27 stringi_1.5.3 openxlsx_4.2.2
## [45] dplyr_1.0.2 grid_4.0.3 quadprog_1.5-8 tools_4.0.3
## [49] magrittr_1.5 tibble_3.0.4 crayon_1.3.4 pkgconfig_2.0.3
## [53] ellipsis_0.3.1 data.table_1.13.2 rmarkdown_2.5 R6_2.5.0
## [57] nnet_7.3-14 nlme_3.1-149 compiler_4.0.3