Education Spending and GDP Growth

Dataset

education <- read.csv(url("https://www.dropbox.com/s/n3h7asxcdqai63z/time-series-educ-spending-and-gdp.csv?dl=1"), header = TRUE)
str(education)

## 'data.frame':    76 obs. of  13 variables:
##  $ Year      : int  1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 ...
##  $ GDP.Growth: num  3.68 4.57 3.72 3.94 2.79 2.99 2.19 2.21 3.13 3.31 ...
##  $ Spending0 : num  8.71 10.37 8.96 8.54 16.25 ...
##  $ Spending1 : num  14.02 8.71 10.37 8.96 8.54 ...
##  $ Spending2 : num  22.37 14.02 8.71 10.37 8.96 ...
##  $ Spending3 : num  16.29 22.37 14.02 8.71 10.37 ...
##  $ Spending4 : num  22.2 16.29 22.37 14.02 8.71 ...
##  $ Spending5 : num  18.6 22.2 16.3 22.4 14 ...
##  $ Spending6 : num  18.1 18.6 22.2 16.3 22.4 ...
##  $ Spending7 : num  9.67 18.09 18.59 22.2 16.29 ...
##  $ Spending8 : num  11.3 9.67 18.09 18.59 22.2 ...
##  $ Spending9 : num  9.9 11.3 9.67 18.09 18.59 ...
##  $ Spending10: num  7.12 9.9 11.3 9.67 18.09 ...

The dataset contains the following variables:

Year: The year the data were recorded.
GDP.Growth: The GDP growth for that year.
Spending0: Education spending per child (in dollars) in that year.
Spending1: Education spending per child one year earlier.
Spending2: Education spending per child two years earlier.
Spending3: Education spending per child three years earlier.
Spending4: Education spending per child four years earlier.
Spending5: Education spending per child five years earlier.
Spending6: Education spending per child six years earlier.
Spending7: Education spending per child seven years earlier.
Spending8: Education spending per child eight years earlier.
Spending9: Education spending per child nine years earlier.
Spending10: Education spending per child ten years earlier.
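Each SpendingK column is the Spending0 series shifted back K years, so Spending1 in one row equals Spending0 in the row before (compare the first values in the str() output above). A minimal sketch of building such lag columns with base R's embed(), using a short synthetic spending vector (hypothetical values, not the real data):

```r
# Synthetic spending series (hypothetical values, not the real dataset)
spending <- c(5, 7, 6, 9, 8, 10)

# embed(x, d) returns a matrix whose rows are x[t], x[t-1], ..., x[t-d+1];
# the first d - 1 observations are lost because their lags are unknown
lags <- embed(spending, 3)
colnames(lags) <- paste0("Spending", 0:2)
print(lags)

# Spending1 in each row equals Spending0 in the previous row
stopifnot(all(lags[-1, "Spending1"] == lags[-nrow(lags), "Spending0"]))
```

On the real data, all(education$Spending1[-1] == education$Spending0[-nrow(education)]) checks the same relationship and should return TRUE if the columns are exact one-year lags.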

Fit a Linear Model using All the Independent Variables

model01 <- lm(GDP.Growth ~ .-Year, education)
summary(model01)

## 
## Call:
## lm(formula = GDP.Growth ~ . - Year, data = education)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.72661 -0.20794 -0.00384  0.27241  0.73019 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.4903699  0.3153620   1.555 0.124891    
## Spending0    0.0052824  0.0109759   0.481 0.631964    
## Spending1    0.0083359  0.0120813   0.690 0.492699    
## Spending2    0.0002438  0.0119456   0.020 0.983780    
## Spending3    0.0229749  0.0113929   2.017 0.047938 *  
## Spending4    0.0454640  0.0111519   4.077 0.000129 ***
## Spending5    0.0536340  0.0112994   4.747  1.2e-05 ***
## Spending6    0.0430536  0.0111948   3.846 0.000279 ***
## Spending7    0.0164127  0.0112916   1.454 0.150962    
## Spending8   -0.0168607  0.0116929  -1.442 0.154190    
## Spending9    0.0162755  0.0116351   1.399 0.166692    
## Spending10  -0.0086728  0.0105007  -0.826 0.411916    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3602 on 64 degrees of freedom
## Multiple R-squared:  0.7443, Adjusted R-squared:  0.7003 
## F-statistic: 16.93 on 11 and 64 DF,  p-value: 5.076e-15

A multiple linear regression model, model01, was fitted to model GDP growth using education spending per child (in dollars) in the current year and in each of the previous ten years. The regression is highly significant, \(F(11, 64) = 16.93\), \(p < 0.001\). The multiple \(R^2\) and adjusted \(R^2\) are 0.7443 and 0.7003, respectively, which means the model accounts for 74.43% of the variance in GDP growth. However, only the coefficients of Spending3 through Spending6 are significant at the 5% level; the intercept and the remaining seven spending terms (Spending0 to Spending2 and Spending7 to Spending10) are not. This suggests that the model may be simplified by removing the insignificant variables.
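As a sanity check on these figures, the overall F-statistic can be recovered from \(R^2\) with the standard formula \(F = (R^2/p) / ((1 - R^2)/(n - p - 1))\). The sketch below plugs in the rounded values printed in the summary; f_from_r2 is a hypothetical helper, not part of any package, so the result matches only up to rounding.

```r
# Overall F-statistic of a regression with p predictors and n observations
f_from_r2 <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

# Values reported in the model01 summary: R^2 = 0.7443, n = 76, p = 11
f01 <- f_from_r2(0.7443, n = 76, p = 11)
p01 <- pf(f01, df1 = 11, df2 = 76 - 11 - 1, lower.tail = FALSE)

print(round(f01, 2))  # close to the reported 16.93 (R^2 was rounded)
print(p01 < 0.001)
```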

Fit a Linear Model using All the Significant Variables

model02 <- lm(GDP.Growth ~ Spending3 + Spending4 + Spending5 + Spending6, education)
summary(model02)

## 
## Call:
## lm(formula = GDP.Growth ~ Spending3 + Spending4 + Spending5 + 
##     Spending6, data = education)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.75774 -0.25407  0.00984  0.27048  0.71564 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.712811   0.192566   3.702 0.000420 ***
## Spending3   0.028370   0.009984   2.841 0.005858 ** 
## Spending4   0.037798   0.010477   3.608 0.000571 ***
## Spending5   0.058398   0.010558   5.531 4.99e-07 ***
## Spending6   0.045468   0.009848   4.617 1.69e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.361 on 71 degrees of freedom
## Multiple R-squared:  0.715,  Adjusted R-squared:  0.6989 
## F-statistic: 44.53 on 4 and 71 DF,  p-value: < 2.2e-16

A multiple regression model, model02, was fitted to model GDP growth using education spending per child three, four, five, and six years earlier. The regression is highly significant, \(F(4, 71) = 44.53\), \(p < 0.001\). The multiple \(R^2\) and adjusted \(R^2\) are 0.715 and 0.6989, respectively, which means the model accounts for 71.5% of the variance in GDP growth. Although \(R^2\) drops from 0.7443, the adjusted \(R^2\) is nearly unchanged (0.6989 versus 0.7003) even though seven predictors were removed, so the simpler model fits almost as well.
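The comparison between the two models can be made explicit. The sketch below recomputes both adjusted \(R^2\) values from the printed \(R^2\), and approximates a partial F-test of the seven dropped terms by rebuilding each residual sum of squares as \(\text{RSE}^2 \times \text{df}\) from the summaries. Because the inputs are rounded, the results are approximate; adj_r2 is a hypothetical helper.

```r
# Adjusted R^2 penalises R^2 for the number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
n <- 76
print(round(c(model01 = adj_r2(0.7443, n, 11),
              model02 = adj_r2(0.715,  n, 4)), 3))

# Approximate partial F-test for the 7 terms dropped from model01,
# using RSS = (residual standard error)^2 * residual df
rss01 <- 0.3602^2 * 64
rss02 <- 0.361^2 * 71
f_partial <- ((rss02 - rss01) / (71 - 64)) / (rss01 / 64)
p_partial <- pf(f_partial, df1 = 71 - 64, df2 = 64, lower.tail = FALSE)
print(round(c(F = f_partial, p = p_partial), 3))
```

The approximation gives an F close to 1 with a p-value well above 0.05, which is consistent with dropping the seven terms. With both fitted objects available, anova(model02, model01) performs the exact version of this nested-model test.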

In addition, the parameters associated with the intercept (\(\hat{\beta}_0 = 0.71\), \(p < 0.001\)) and with education spending three years earlier (\(\hat{\beta}_3 = 0.03\), \(p = 0.006\)), four years earlier (\(\hat{\beta}_4 = 0.04\), \(p < 0.001\)), five years earlier (\(\hat{\beta}_5 = 0.06\), \(p < 0.001\)), and six years earlier (\(\hat{\beta}_6 = 0.05\), \(p < 0.001\)) are all significant. The fitted model is therefore

\[ \widehat{\text{GDP Growth}} = 0.71 + 0.03\,\text{Spending}_3 + 0.04\,\text{Spending}_4 + 0.06\,\text{Spending}_5 + 0.05\,\text{Spending}_6 \]

which means that, holding the other lagged spending terms constant, each additional dollar of education spending per child is associated with GDP growth that is higher, on average, by about 0.03 percentage points three years later, 0.04 percentage points four years later, 0.06 percentage points five years later, and 0.05 percentage points six years later.
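The coefficient table also yields confidence intervals and predictions. The sketch below copies the estimates and standard errors from the model02 summary, builds 95% intervals as estimate \(\pm t_{0.975,\,71} \times \text{SE}\), and predicts GDP growth for a hypothetical scenario of 15 dollars per child in each lagged year (the scenario is an assumption for illustration). It also shows that plugging in the two-decimal coefficients from the equation above shifts the prediction noticeably.

```r
# Coefficients and standard errors copied from the model02 summary above
est <- c("(Intercept)" = 0.712811, Spending3 = 0.028370, Spending4 = 0.037798,
         Spending5 = 0.058398, Spending6 = 0.045468)
se  <- c(0.192566, 0.009984, 0.010477, 0.010558, 0.009848)

# 95% confidence intervals: estimate +/- t(0.975, df = 71) * SE
t_crit <- qt(0.975, df = 71)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 4))

# Hypothetical scenario: 15 dollars per child in each lagged year
x <- c(1, 15, 15, 15, 15)  # leading 1 multiplies the intercept
pred_full    <- sum(est * x)
pred_rounded <- sum(round(est, 2) * x)
print(c(full = pred_full, rounded = pred_rounded))
```

With the fitted object, confint(model02) and predict(model02, newdata = data.frame(Spending3 = 15, Spending4 = 15, Spending5 = 15, Spending6 = 15)) should give the same results up to the rounding of the printed coefficients, without copying numbers by hand.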

Diagnostic Checking

par(mfrow=c(2,2))
plot(model02)

Residual analysis was performed to assess model02. The Residuals vs Fitted plot shows no funneling or other systematic pattern, suggesting that the residuals have roughly constant variance and that the linear form is adequate. The normal Q-Q plot, however, shows a slight departure from normality. Cook's distance, shown in the Residuals vs Leverage plot, suggests that no observation is unduly influential.
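The visual impressions from the Q-Q plot and Cook's distance can be backed by numbers. A minimal sketch on a synthetic data frame (hypothetical stand-in data; the 4/n cut-off is one common rule of thumb, and a threshold of 1 is another); on the real model the same calls would be shapiro.test(residuals(model02)) and cooks.distance(model02).

```r
set.seed(123)
# Synthetic stand-in data (hypothetical; not the education dataset)
n <- 76
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 0.5 * sim$x1 - 0.3 * sim$x2 + rnorm(n, sd = 0.4)
fit <- lm(y ~ x1 + x2, data = sim)

# Shapiro-Wilk test of residual normality (H0: residuals are normal)
sw <- shapiro.test(residuals(fit))
print(sw$p.value)

# Cook's distance per observation, flagged against the 4/n rule of thumb
cd <- cooks.distance(fit)
print(which(cd > 4 / n))
print(max(cd))
```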

library(forecast)

Acf(model02$residuals)

An autocorrelation function (ACF) plot of the residuals was used to check for serial correlation. Since the autocorrelations at all lags fall within the approximate 95% confidence bounds (the dashed lines), there is no evidence of serial correlation in the residuals.
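The dashed bounds in the ACF plot are approximately \(\pm 1.96/\sqrt{n}\), which for the 76 observations here is about \(\pm 0.225\). A formal counterpart to reading the plot is the Ljung-Box test from base R's Box.test(). The sketch below uses synthetic white noise as a hypothetical stand-in for residuals(model02); the choice of 10 lags is an assumption.

```r
# The dashed ACF bounds are approximately +/- qnorm(0.975) / sqrt(n)
n <- 76
bound <- qnorm(0.975) / sqrt(n)
print(round(bound, 3))

# Synthetic white-noise residuals (hypothetical stand-in for model02's residuals)
set.seed(42)
e <- rnorm(n)
r <- acf(e, lag.max = 10, plot = FALSE)$acf[-1]  # drop lag 0
print(any(abs(r) > bound))

# Ljung-Box portmanteau test over the first 10 lags (H0: no autocorrelation)
lb <- Box.test(e, lag = 10, type = "Ljung-Box")
print(lb$p.value)
```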

library(car)

vif(model02)

## Spending3 Spending4 Spending5 Spending6 
##  1.219276  1.399989  1.416191  1.242929

Variance inflation factors (VIFs) were used to check for multicollinearity among the predictors. All VIFs are below 5 (the largest is 1.42, for Spending5), so multicollinearity does not appear to be a concern.
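For a model with only numeric, single-coefficient predictors, the VIF of predictor \(j\) is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing that predictor on all the others. The largest VIF above, 1.42, therefore corresponds to \(R_j^2 \approx 0.29\). A minimal base-R sketch of the formula on synthetic predictors (vif_manual and the data are hypothetical, for illustration only):

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on all the other predictors
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

# Synthetic predictors (hypothetical): x2 is built to correlate with x1
set.seed(7)
n <- 76
x1 <- rnorm(n)
X <- data.frame(x1 = x1, x2 = 0.5 * x1 + rnorm(n), x3 = rnorm(n))
v <- vif_manual(X)
print(round(v, 3))
```

Applied to education[, c("Spending3", "Spending4", "Spending5", "Spending6")], this should reproduce the values from car::vif(model02).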

Session Info for Reproducibility

sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 18363)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Philippines.1252  LC_CTYPE=English_Philippines.1252   
## [3] LC_MONETARY=English_Philippines.1252 LC_NUMERIC=C                        
## [5] LC_TIME=English_Philippines.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] car_3.0-10    carData_3.0-4 forecast_8.13
## 
## loaded via a namespace (and not attached):
##  [1] zoo_1.8-8         tidyselect_1.1.0  xfun_0.18         purrr_0.3.4      
##  [5] urca_1.3-0        haven_2.3.1       lattice_0.20-41   colorspace_1.4-1 
##  [9] vctrs_0.3.4       generics_0.0.2    htmltools_0.5.0   yaml_2.2.1       
## [13] rlang_0.4.8       pillar_1.4.6      foreign_0.8-80    glue_1.4.2       
## [17] TTR_0.24.2        readxl_1.3.1      lifecycle_0.2.0   quantmod_0.4.17  
## [21] stringr_1.4.0     timeDate_3043.102 munsell_0.5.0     gtable_0.3.0     
## [25] cellranger_1.1.0  zip_2.1.1         evaluate_0.14     knitr_1.30       
## [29] rio_0.5.16        tseries_0.10-47   forcats_0.5.0     lmtest_0.9-38    
## [33] parallel_4.0.3    curl_4.3          xts_0.12.1        Rcpp_1.0.5       
## [37] scales_1.1.1      abind_1.4-5       fracdiff_1.5-1    ggplot2_3.3.2    
## [41] hms_0.5.3         digest_0.6.27     stringi_1.5.3     openxlsx_4.2.2   
## [45] dplyr_1.0.2       grid_4.0.3        quadprog_1.5-8    tools_4.0.3      
## [49] magrittr_1.5      tibble_3.0.4      crayon_1.3.4      pkgconfig_2.0.3  
## [53] ellipsis_0.3.1    data.table_1.13.2 rmarkdown_2.5     R6_2.5.0         
## [57] nnet_7.3-14       nlme_3.1-149      compiler_4.0.3