ADEC 7320: Module 1 Discussion

Author

Will Brewster

I. Introduction

I chose the data set “World Development Indicators, 2011” (OpenIntro), which is composed of data from the World Bank. It is cross-sectional because it records indicators for the world’s countries at a single point in time. Some of the variables included in the data set are:

  • infant mortality (deaths per 1000 live births)
  • life expectancy (years)
  • sanitation access (% of population)
  • adolescent fertility rate (per 1000 women)
  • birth rate (per 1000 people)
  • adult literacy rate
  • GDP per capita (U.S. dollars)
Rows: 165
Columns: 13
$ country.name        <chr> "Afghanistan", "Albania", "Algeria", "Angola", "Ar…
$ inf.mort            <dbl> 73.4, 14.3, 22.8, 106.8, 12.7, 15.3, 3.8, 3.4, 32.…
$ life.expect         <dbl> 59.32795, 77.24059, 74.07000, 51.05932, 75.64905, …
$ sanit.access        <dbl> 29.9, 91.5, 86.8, 47.6, 95.2, 89.5, 100.0, 100.0, …
$ adol.fert           <dbl> 93.7132, 20.3584, 11.0822, 178.4360, 63.2980, 26.0…
$ edu.expend          <dbl> 4.08791, NA, NA, NA, 4.98632, 3.14385, 5.10608, 5.…
$ adult.lit           <dbl> 31.74112, 96.84530, NA, NA, NA, 99.74442, NA, NA, …
$ prim.edu.fem        <dbl> NA, NA, 95.26551, 39.77358, 110.72836, NA, NA, 96.…
$ birth.rate          <dbl> 37.636, 12.651, 24.921, 47.018, 18.018, 13.652, 13…
$ log.gdp.per.capita  <dbl> 6.433550, 8.397917, 8.602894, 8.527884, 9.502481, …
$ sanit.access.factor <chr> "low", "high", "high", "low", "high", "high", "hig…
$ sanit.access.num    <int> 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1,…
$ gdp.per.capita      <dbl> 622.3797, 4437.8120, 5447.4040, 5053.7386, 13392.9…

II. Data selection

The dependent variable for this analysis is GDP per capita. Because the education variables (adult literacy, education spending, female education) have many missing values, I will leave them out of the analysis and instead use infant mortality, life expectancy, sanitation access, and birth rate as the independent variables.

\[ GDP\;per\;capita = \beta_0\; + \;\beta_1Infant\;Mortality \; +\;\beta_2Life\;Expectancy \;+\;\beta_3Sanitation\;+\;\beta_4Birth\;Rate\ \]
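A quick way to see which variables are incomplete is to count the missing values in each column. The sketch below is only an illustration: `wdi_toy` is a hypothetical data frame built from the first five rows of four columns shown in the output above (rounded), not the full data set.

```r
# Hypothetical toy data: first five rows of four columns from the
# glimpse output above, rounded. Not the full WDI data set.
wdi_toy <- data.frame(
  inf.mort    = c(73.4, 14.3, 22.8, 106.8, 12.7),
  life.expect = c(59.3, 77.2, 74.1, 51.1, 75.6),
  edu.expend  = c(4.09, NA, NA, NA, 4.99),
  adult.lit   = c(31.7, 96.8, NA, NA, NA)
)

# Count the missing values in each column
na_counts <- colSums(is.na(wdi_toy))
print(na_counts)

# Names of the columns that have no missing values in this toy sample
complete_vars <- names(na_counts)[na_counts == 0]
print(complete_vars)
```

Running the same `colSums(is.na(...))` call on the full data frame would show how many countries are missing each indicator.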

III. Multivariate Regression

Applying the lm() command and backward elimination, we have the following:


Results
===========================================================================================================================================
                                                                        gdp.per.capita                                                     
                              (1)                     (2)                     (3)                     (4)                     (5)          
-------------------------------------------------------------------------------------------------------------------------------------------
inf.mort                   284.579*                                        -105.535                259.093*                261.475*        
                           (152.933)                                       (127.354)               (148.969)               (143.158)       
                                                                                                                                           
life.expect              1,741.744***            1,269.412***                                    1,853.415***            1,777.310***      
                           (416.314)               (332.525)                                       (397.974)               (407.258)       
                                                                                                                                           
sanit.access                80.077                  52.967                 178.673*                                         91.518         
                           (89.955)                (89.452)                (91.208)                                        (85.850)        
                                                                                                                                           
birth.rate                 -119.287                 55.960                 -342.102                -175.747                                
                           (272.698)               (257.893)               (280.984)               (257.280)                               
                                                                                                                                           
Constant                -118,563.700***         -79,607.960***            12,132.810            -118,752.000***         -123,880.300***    
                         (32,888.890)            (25,561.190)            (10,805.920)            (32,576.400)            (30,482.930)      
                                                                                                                                           
Observations                  163                     163                     163                     165                     163          
R2                           0.375                   0.361                   0.306                   0.376                   0.374         
Adjusted R2                  0.359                   0.349                   0.292                   0.365                   0.362         
Residual Std. Error  16,551.800 (df = 158)   16,679.490 (df = 159)   17,389.600 (df = 159)   16,454.950 (df = 161)   16,509.660 (df = 159) 
F Statistic         23.681*** (df = 4; 158) 29.957*** (df = 3; 159) 23.320*** (df = 3; 159) 32.364*** (df = 3; 161) 31.672*** (df = 3; 159)
-------------------------------------------------------------------------------------------------------------------------------------------
Notes:              ***Significant at the 1 percent level.                                                                                 
                    **Significant at the 5 percent level.                                                                                  
                    *Significant at the 10 percent level.                                                                                  

Results
===================================================================================================================
                                                            gdp.per.capita                                         
                              (1)                     (2)                     (3)                     (4)          
-------------------------------------------------------------------------------------------------------------------
inf.mort                   259.093*                                        -218.418*                217.520        
                           (148.969)                                       (114.762)               (135.746)       
                                                                                                                   
life.expect              1,853.415***            1,377.004***                                    1,933.981***      
                           (397.974)               (290.505)                                       (379.470)       
                                                                                                                   
birth.rate                 -175.747                  7.062                -530.835**                               
                           (257.280)               (236.293)               (260.941)                               
                                                                                                                   
Constant                -118,752.000***         -82,271.740***           32,121.100***          -127,153.800***    
                         (32,576.400)            (25,080.720)             (3,633.609)            (30,115.540)      
                                                                                                                   
Observations                  165                     165                     165                     165          
R2                           0.376                   0.364                   0.292                   0.374         
Adjusted R2                  0.365                   0.357                   0.283                   0.367         
Residual Std. Error  16,454.950 (df = 161)   16,557.470 (df = 162)   17,474.100 (df = 162)   16,427.840 (df = 162) 
F Statistic         32.364*** (df = 3; 161) 46.452*** (df = 2; 162) 33.432*** (df = 2; 162) 48.472*** (df = 2; 162)
-------------------------------------------------------------------------------------------------------------------
Notes:              ***Significant at the 1 percent level.                                                         
                    **Significant at the 5 percent level.                                                          
                    *Significant at the 10 percent level.                                                          

Call:
lm(formula = gdp.per.capita ~ inf.mort + life.expect, data = WDI2011)

Residuals:
   Min     1Q Median     3Q    Max 
-20901 -11028  -3787   6640  83373 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -127153.8    30115.5  -4.222 4.02e-05 ***
inf.mort        217.5      135.7   1.602    0.111    
life.expect    1934.0      379.5   5.097 9.54e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16430 on 162 degrees of freedom
Multiple R-squared:  0.3744,    Adjusted R-squared:  0.3667 
F-statistic: 48.47 on 2 and 162 DF,  p-value: < 2.2e-16

Results
===========================================================================================
                                                gdp.per.capita                             
                              (1)                     (2)                     (3)          
-------------------------------------------------------------------------------------------
inf.mort                    217.520                                       -424.810***      
                           (135.746)                                       (54.152)        
                                                                                           
life.expect              1,933.981***            1,369.430***                              
                           (379.470)               (141.641)                               
                                                                                           
Constant                -127,153.800***         -81,584.130***           26,016.640***     
                         (30,115.540)             (9,957.242)             (2,068.678)      
                                                                                           
Observations                  165                     165                     165          
R2                           0.374                   0.364                   0.274         
Adjusted R2                  0.367                   0.361                   0.270         
Residual Std. Error  16,427.840 (df = 162)   16,506.650 (df = 163)   17,641.530 (df = 163) 
F Statistic         48.472*** (df = 2; 162) 93.477*** (df = 1; 163) 61.540*** (df = 1; 163)
-------------------------------------------------------------------------------------------
Notes:              ***Significant at the 1 percent level.                                 
                    **Significant at the 5 percent level.                                  
                    *Significant at the 10 percent level.                                  

None of the remaining specifications improves on the adjusted \(R^2\) of 0.367 obtained with infant mortality and life expectancy, so the best regression equation that we have is:

\[ GDP\;per\;capita = \beta_0\;+\; \beta_1Infant\;Mortality \; +\;\beta_2Life\;Expectancy \]
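One step of the elimination procedure used in this section can be sketched as follows. This is a minimal illustration on simulated data, not the code that produced the tables above: the data frame `sim`, its coefficients, and the noise levels are all made up, and only the column names mirror the data set. Each candidate model drops one predictor from the full model with `update()`, and the adjusted \(R^2\) values are then compared.

```r
set.seed(1)
n <- 165

# Simulated stand-in data (assumption: values are invented for illustration)
sim <- data.frame(
  inf.mort   = runif(n, 2, 110),
  birth.rate = runif(n, 8, 50)
)
sim$life.expect    <- 80 - 0.25 * sim$inf.mort + rnorm(n, sd = 3)
sim$gdp.per.capita <- -80000 + 1400 * sim$life.expect + rnorm(n, sd = 15000)

# Full model with all three predictors
full <- lm(gdp.per.capita ~ inf.mort + life.expect + birth.rate, data = sim)

# Refit the model once per predictor, dropping that predictor each time
drop_terms <- c("inf.mort", "life.expect", "birth.rate")
candidates <- lapply(drop_terms, function(v) {
  update(full, as.formula(paste(". ~ . -", v)))
})
names(candidates) <- paste0("drop_", drop_terms)

# Compare adjusted R-squared across the full and reduced models
adj_r2 <- c(
  full = summary(full)$adj.r.squared,
  sapply(candidates, function(m) summary(m)$adj.r.squared)
)
print(round(adj_r2, 3))
```

The reduced model with the highest adjusted \(R^2\) becomes the starting point for the next round, and the process stops when dropping any further variable lowers it.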

IV. Matrix Algebra Application

Keeping only the Infant Mortality and Life Expectancy variables, the following confirms the lm() estimates using the formula \(\hat{\beta} = (X^TX)^{-1}X^TY\):

                 Betas Linear.Model
int         -127153.78   -127153.78
inf.mort        217.52       217.52
life.expect    1933.98      1933.98
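The comparison above can be reproduced along these lines. This is a sketch on simulated data rather than the original code: the variables `x1`, `x2`, and `y` are hypothetical, and the design matrix `X` is built by hand with a column of ones for the intercept.

```r
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)   # made-up true coefficients

# Design matrix: a column of ones for the intercept plus the predictors
X <- cbind(int = 1, x1 = x1, x2 = x2)

# Coefficients from the matrix formula (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# The same model fitted with lm()
fit <- lm(y ~ x1 + x2)

comparison <- data.frame(
  Betas        = as.numeric(beta_hat),
  Linear.Model = unname(coef(fit)),
  row.names    = colnames(X)
)
print(comparison)
```

The two columns should agree up to floating-point error, since lm() solves the same least-squares problem (using a QR decomposition rather than an explicit inverse).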

V. Estimating standard error

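Applying the R code from our RPubs document to my data set, we have the following. The output shows, in order, the number of observations \(n\), the estimated error variance \(\hat{\sigma}^2\), the matrix \(\hat{\sigma}^2(X^TX)^{-1}\), and the square roots of its diagonal entries: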

[1] 165
[1] 269873795
                  int inf.mort life.expect
int         906945928 -3860413   -11404547
inf.mort     -3860413    18427       47826
life.expect -11404547    47826      143998
        int    inf.mort life.expect 
 30115.5430    135.7464    379.4702 

The standard errors of the coefficients match what the lm() function provides. So far we’ve learned that the slope of a simple linear regression, \(\beta_1\), equals \(cov(x,y)/var(x)\). In calculating the standard errors for the additional variables, I have tried to think of each variable as having its own slope. Given the formulas for matrix multiplication, it would make sense that the error variance of the whole equation needs to be spread among the variables that make up the equation. I’m not sure, however, whether this reasoning is correct. Taking \((X^{T}X)^{-1}\) would summarize how the variables in the regression relate to one another. Would multiplying this matrix by the variance term \(\sigma^2\), and then taking the square roots of the diagonal entries, give each coefficient’s standard error? What are your thoughts?
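One way to check this numerically is to build the standard errors by hand and compare them with lm(). The sketch below uses simulated data (the variables `x1`, `x2`, and `y` are hypothetical) and assumes the usual estimate of the error variance, the residual sum of squares divided by \(n - k\), where \(k\) is the number of estimated coefficients including the intercept.

```r
set.seed(42)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)   # made-up true coefficients

X <- cbind(int = 1, x1 = x1, x2 = x2)  # design matrix with intercept column
k <- ncol(X)                           # number of estimated coefficients

# Coefficients and residuals from the matrix formula
XtX_inv  <- solve(t(X) %*% X)
beta_hat <- XtX_inv %*% t(X) %*% y
res      <- y - X %*% beta_hat

# Estimated error variance: residual sum of squares over (n - k)
sigma2 <- sum(res^2) / (n - k)

# Variance-covariance matrix of the coefficients, then standard errors
vcov_manual <- sigma2 * XtX_inv
se_manual   <- sqrt(diag(vcov_manual))

# Standard errors reported by lm() for the same model
fit   <- lm(y ~ x1 + x2)
se_lm <- summary(fit)$coefficients[, "Std. Error"]

print(cbind(manual = se_manual, lm = se_lm))
```

In this sketch, multiplying \((X^TX)^{-1}\) by the variance estimate gives the variances and covariances of the coefficients, and it is the square roots of the diagonal entries that match the standard errors from lm().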