Introduction

This project explores which factors explain fertility rates, using the other variables in the Human Development data set from the Human Development Report curated by the United Nations Development Programme.

Background and Importance

Understanding fertility data is important for monitoring population growth and for developing policies and procedures that care for citizens and improve quality of life. Fertility data helps to determine governmental budgeting, welfare programs, and more.

Data

This data comes from the United Nations’ Human Development data set, part of the Human Development Report curated by the United Nations Development Programme.

Methods

What human development indicators explain a rise or fall in fertility rates?

To answer this question, I begin by performing a linear regression of the Total Fertility Rate from 1995-2000 (FERTILIT) against all other human development indicators, to identify the indicators that are significant to the fertility rate.

Once the significant variables have been identified, I will run a linear regression using only those variables to isolate the indicators that impact fertility, and compare the two models.
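The screen-then-refit procedure described above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the analysis itself: the data frame `toy`, the columns `x1` to `x4` and `y`, and the 0.05 cutoff are all assumptions made for the example.

```r
# Illustrative sketch on simulated data; `toy`, `x1`..`x4`, and `y` are
# hypothetical stand-ins, not columns from the Human Development data set.
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$y <- 2 + 1.5 * toy$x1 - 0.8 * toy$x2 + rnorm(n)  # x3 and x4 are pure noise

# Step 1: regress the outcome on every candidate predictor.
full_model <- lm(y ~ ., data = toy)

# Step 2: keep the predictors whose p-value falls below the assumed cutoff.
p_values <- summary(full_model)$coefficients[-1, "Pr(>|t|)"]  # drop intercept row
keep <- names(p_values)[p_values < 0.05]

# Step 3: refit using only the retained predictors.
reduced_model <- lm(reformulate(keep, response = "y"), data = toy)

# Step 4: compare the two fits.
anova(reduced_model, full_model)
c(full = summary(full_model)$adj.r.squared,
  reduced = summary(reduced_model)$adj.r.squared)
```

Screening on p-values from a single large regression is only one possible selection rule. A noise variable can pass the cutoff by chance, so the retained set is best treated as a starting point.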

Results

Eleven indicators have been identified as significant in relation to the rise and fall of fertility measures. They are as follows:

Public Telephones per 1,000 people 1996-1998 (PUB_PHON)
Industry as % GDP 1998 (IND98)
ODA as % of GNP 1998 (AID92)
ODA per capita US$ 1998 (AID_CAP9)
ODA as % of GNP 1992 (AID92)
Annual Population Growth Rate 1975-1998 (PRATE759)
Annual Population Growth Rate 1998-2015 (PRATE98_)
Population age 65 and above 1998 (POP65_98)
Daily per capita supply of calories 1997 (CALOR97)
Public expenditure on Education as % of GNP 1995-1997 (EDUC95_97)
Life expectancy at birth average 1990-2000 (LIFE_EXP_95_00)

I have decided to split the variables into two subsections: structural explanations and ‘people’ explanations. Structural explanations cover issues like infrastructure and include variables 1-5 (public telephones, industry, aid, etc.); ‘people’ explanations relate to human behavior and include variables 6-11 (population growth, calorie supply, etc.).

Linear Regression Model 1: Structural Factors

## 
## Call:
## lm(formula = FERTILIT ~ PUB_PHON + IND98 + AID_CAP9 + AID92, 
##     data = HDR_sub[, 3:78])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0721 -1.0945 -0.1800  0.8116  3.7887 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.4760352  0.2663526  13.051  < 2e-16 ***
## PUB_PHON    -0.3037579  0.0535765  -5.670 6.47e-08 ***
## IND98        0.0006216  0.0076964   0.081    0.936    
## AID_CAP9     0.0025065  0.0020250   1.238    0.218    
## AID92        0.0368125  0.0087644   4.200 4.40e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.395 on 161 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.3624, Adjusted R-squared:  0.3466 
## F-statistic: 22.88 on 4 and 161 DF,  p-value: 5.571e-15

Controlling for Variance Inflation for Model 1 - Structural Factors

Durbin-Watson Test (testing for presence of auto-correlation) for Model 1 - Structural Factors

##  lag Autocorrelation D-W Statistic p-value
##    1       0.3389241      1.312965       0
##  Alternative hypothesis: rho != 0

par(mfrow=c(2, 2)) # Set the plot window to display a 2x2 matrix of plots. 
plot(model1)

par(mfrow=c(1, 1))

Linear Regression Model 2: “People” Factors

## 
## Call:
## lm(formula = FERTILIT ~ PRATE98_ + POP65_98 + CALOR97 + EDUC95_97 + 
##     LIFE_EXP_95_00, data = HDR_sub[, 3:78])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83940 -0.20683 -0.02074  0.17748  1.23108 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.723e+00  2.754e-01  20.777  < 2e-16 ***
## PRATE98_        1.383e+00  5.612e-02  24.644  < 2e-16 ***
## POP65_98        7.969e-02  1.244e-02   6.405  1.6e-09 ***
## CALOR97        -1.131e-04  4.391e-05  -2.577   0.0109 *  
## EDUC95_97       2.511e-02  1.337e-02   1.878   0.0622 .  
## LIFE_EXP_95_00 -6.920e-02  3.421e-03 -20.230  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3635 on 160 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.957,  Adjusted R-squared:  0.9556 
## F-statistic: 711.5 on 5 and 160 DF,  p-value: < 2.2e-16

dwt(model2)

##  lag Autocorrelation D-W Statistic p-value
##    1      -0.1185452      2.211776   0.164
##  Alternative hypothesis: rho != 0

par(mfrow=c(2, 2)) # Set the plot window to display a 2x2 matrix of plots. 
plot(model2)

par(mfrow=c(1, 1))

Interpretation

Model 2 is a considerably better predictor of fertility than Model 1.

Comparing the R-squared statistics:

Model 2 (“People Factors”) explains nearly 96% of the variation in fertility (multiple R-squared of 0.957) using the five factors in its formula:

Annual Population Growth Rate 1998-2015 (PRATE98_)
Population age 65 and above 1998 (POP65_98)
Daily per capita supply of calories 1997 (CALOR97)
Public expenditure on Education as % of GNP 1995-1997 (EDUC95_97)
Life expectancy at birth average 1990-2000 (LIFE_EXP_95_00)

By contrast, only 36% of the variation in fertility is explained by Model 1 (Structural Factors).
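To make the "share of variation explained" reading concrete, here is a small sketch that computes R-squared and adjusted R-squared by hand and checks them against `summary()`. The data are simulated, and `x`, `y`, and `fit` are hypothetical names rather than anything from the Human Development data.

```r
# Simulated data; `x` and `y` are hypothetical, not Human Development variables.
set.seed(2)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)   # variation left unexplained by the model
ss_tot <- sum((y - mean(y))^2)    # total variation in the outcome
r2_manual <- 1 - ss_res / ss_tot

n <- length(y)
p <- 1                            # number of predictors in the model
adj_r2_manual <- 1 - (1 - r2_manual) * (n - 1) / (n - p - 1)

# Both comparisons should report TRUE.
all.equal(r2_manual, summary(fit)$r.squared)
all.equal(adj_r2_manual, summary(fit)$adj.r.squared)
```

The same adjustment applied to Model 1's reported figures (R-squared 0.3624, 166 observations, 4 predictors) gives 1 - 0.6376 × 165 / 161 ≈ 0.3466, which matches the adjusted R-squared in its output.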

Assessing the plots:

Additionally, when comparing the residual plots of Models 1 and 2, Model 1 displays considerably more heteroskedasticity than Model 2, although the Q-Q plot of Model 1 is more linear than that of Model 2.

Assessing VIF:

No variable in either model has a VIF above 10, so multicollinearity among the predictors is not a serious concern.

vif(model1)

## PUB_PHON    IND98 AID_CAP9    AID92 
## 1.105846 1.023149 1.355676 1.446694

vif(model2)

##       PRATE98_       POP65_98        CALOR97      EDUC95_97 LIFE_EXP_95_00 
##       3.972260       4.056760       1.393828       1.233753       1.806295
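As a rough illustration of what the VIF measures, the sketch below computes it by hand in base R: each predictor is regressed on the others, and the VIF is 1 / (1 - R-squared) from that auxiliary regression. The predictors `z1`, `z2`, and `z3` are simulated and hypothetical; this is not the `vif()` call used above, only the definition behind it.

```r
# Simulated predictors; `z1`, `z2`, and `z3` are hypothetical names.
set.seed(3)
n <- 150
z1 <- rnorm(n)
z2 <- 0.8 * z1 + rnorm(n, sd = 0.6)  # deliberately correlated with z1
z3 <- rnorm(n)                       # unrelated to z1 and z2
preds <- data.frame(z1, z2, z3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
# predictor j on all of the other predictors.
vif_manual <- sapply(names(preds), function(v) {
  others <- setdiff(names(preds), v)
  aux_fit <- lm(reformulate(others, response = v), data = preds)
  1 / (1 - summary(aux_fit)$r.squared)
})
vif_manual
```

Read in reverse, the largest value reported above (about 4.06 for POP65_98) corresponds to an auxiliary R-squared of 1 - 1/4.06 ≈ 0.75. That is below the cutoff of 10, but it still indicates that roughly three quarters of that variable's variation is shared with the other predictors in Model 2.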

Durbin Watson Test

Ideally, the D-W statistic falls between 1.5 and 2.5. Only Model 2 does (2.21, p-value 0.164), which indicates no significant autocorrelation in its residuals. Model 1 falls below this range (1.31, reported p-value of 0), which indicates positive autocorrelation in its residuals.
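For reference, the D-W statistic itself can be computed directly from a model's residuals. The sketch below does this in base R on simulated data with deliberately autocorrelated errors. The names `x`, `e`, `y`, and `fit` are hypothetical, and this is the textbook formula rather than the internals of `dwt()`.

```r
# Simulated regression with AR(1) errors; all names here are hypothetical.
set.seed(4)
n <- 200
x <- rnorm(n)
e <- numeric(n)
e[1] <- rnorm(1)
for (i in 2:n) e[i] <- 0.6 * e[i - 1] + rnorm(1)  # each error depends on the last
y <- 1 + 2 * x + e
fit <- lm(y ~ x)

res <- residuals(fit)

# D-W statistic: squared successive differences over squared residuals.
dw <- sum(diff(res)^2) / sum(res^2)

# Lag-1 residual autocorrelation, and the usual approximation DW ~ 2 * (1 - r1).
r1 <- sum(res[-1] * res[-n]) / sum(res^2)
c(dw = dw, approx = 2 * (1 - r1))
```

Values near 2 mean little lag-1 autocorrelation, values toward 0 mean positive autocorrelation, and values toward 4 mean negative autocorrelation. The approximation roughly recovers the figures reported earlier: 2 × (1 - 0.339) ≈ 1.32 for Model 1 (reported 1.31) and 2 × (1 + 0.119) ≈ 2.24 for Model 2 (reported 2.21).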

Assignment 1

Nona-Marie Jones

3/11/2020