This project explores the factors that explain fertility rates when compared against other variables in the Human Development Data set from the Human Development Report curated by the United Nations Development Programme.
Understanding fertility data is essential to monitoring population growth and to developing policies and procedures that care for citizens and improve quality of life. Fertility data helps to determine governmental budgeting, welfare programs, and more.
The data come from the Human Development Data set in the Human Development Report, curated by the United Nations Development Programme.
What human development indicators explain a rise or fall in fertility rates?
To answer this question, I begin by performing a linear regression of the Total Fertility Rate for 1995-2000 (FERTILIT) on all other human development indicators, in order to identify the indicators that are significant to the fertility rate.
Once the significant variables have been identified, I will run a linear regression using only those variables to isolate the indicators that impact fertility, and compare the two models.
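The two-step procedure described above can be sketched as follows. This is only an illustration on simulated data: the data frame `sim`, the predictors `x1` to `x4`, and the 0.05 cutoff are assumptions made for the example, not the HDR variables or the exact selection rule used in this project.

```r
# Simulated stand-in for the HDR data; all names and values are hypothetical.
set.seed(42)
n <- 150
sim <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = rnorm(n),
  x4 = rnorm(n)
)
# In this toy example only x1 and x2 truly drive the response.
sim$y <- 3 + 1.5 * sim$x1 - 0.8 * sim$x2 + rnorm(n, sd = 0.5)

# Step 1: regress the response on every candidate indicator.
full_model <- lm(y ~ ., data = sim)
coef_table <- summary(full_model)$coefficients

# Step 2: keep predictors whose p-value is below 0.05 (intercept excluded).
p_values <- coef_table[-1, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]

# Step 3: refit using only the significant predictors and compare the fits.
reduced_model <- lm(reformulate(significant, response = "y"), data = sim)
comparison <- c(
  full = summary(full_model)$adj.r.squared,
  reduced = summary(reduced_model)$adj.r.squared
)
print(significant)
print(comparison)
```

Comparing adjusted R-squared rather than plain R-squared is a deliberate choice here, since plain R-squared can never decrease when predictors are added.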
Eleven indicators have been identified as significant in relation to the rise and fall of fertility rates. They are as follows:
Public Telephones per 1,000 people 1996-1998 (PUB_PHON)
Industry as % GDP 1998 (IND98)
ODA as % of GNP 1998 (AID92)
ODA per capita US$ 1998 (AID_CAP9)
ODA as % of GNP 1992 (AID92)
Annual Population Growth Rate 1975-1998 (PRATE759)
Annual Population Growth Rate 1998-2015 (PRATE98)
Population age 65 and above 1998 (POP65_98)
Daily per capita supply of calories 1997 (CALOR97)
Public expenditure on Education as % of GNP 1995-1997 (EDUCA95)
Life expectancy at birth average 1990-2000 (LIFE_EXP_95_00)
I have decided to split the variables into two subsections: structural explanations and ‘people’ explanations. Structural explanations cover issues such as infrastructure and include variables 1-5 (public telephones, industry, aid, etc.); ‘people’ explanations relate to human behavior and include variables 6-11 (population growth, calorie supply, etc.).
##
## Call:
## lm(formula = FERTILIT ~ PUB_PHON + IND98 + AID_CAP9 + AID92,
## data = HDR_sub[, 3:78])
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.0721 -1.0945 -0.1800  0.8116  3.7887
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.4760352  0.2663526  13.051  < 2e-16 ***
## PUB_PHON    -0.3037579  0.0535765  -5.670 6.47e-08 ***
## IND98        0.0006216  0.0076964   0.081    0.936
## AID_CAP9     0.0025065  0.0020250   1.238    0.218
## AID92        0.0368125  0.0087644   4.200 4.40e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.395 on 161 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.3624, Adjusted R-squared: 0.3466
## F-statistic: 22.88 on 4 and 161 DF, p-value: 5.571e-15
dwt(model1)
##  lag Autocorrelation D-W Statistic p-value
##    1       0.3389241      1.312965       0
##  Alternative hypothesis: rho != 0
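To make the Durbin-Watson output easier to read, the statistic can be computed by hand from the residuals. The sketch below uses simulated data with hypothetical AR(1) errors (the coefficient 0.5 and all variable names are assumptions for illustration). It also shows the approximation D-W ≈ 2(1 - rho), which is consistent with the output reported here: 2 × (1 - 0.3389) ≈ 1.32, close to the reported 1.313.

```r
# Simulated regression with autocorrelated errors; all values are hypothetical.
set.seed(1)
n <- 200
x <- rnorm(n)
e <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))  # AR(1) errors
y <- 1 + 2 * x + e

fit <- lm(y ~ x)
res <- residuals(fit)

# Durbin-Watson statistic: squared successive differences over squared residuals.
dw_stat <- sum(diff(res)^2) / sum(res^2)

# Lag-1 autocorrelation of the residuals.
rho_hat <- sum(res[-1] * res[-n]) / sum(res^2)

print(c(dw = dw_stat, rho = rho_hat, approx = 2 * (1 - rho_hat)))
```

A value near 2 suggests little first-order autocorrelation; values well below 2 point to positive autocorrelation, and values well above 2 to negative autocorrelation. The p-value printed by `dwt()` is not reproduced in this sketch.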
par(mfrow=c(2, 2)) # Set the plot window to display a 2x2 matrix of plots.
plot(model1)
par(mfrow=c(1, 1))
##
## Call:
## lm(formula = FERTILIT ~ PRATE98_ + POP65_98 + CALOR97 + EDUC95_97 +
## LIFE_EXP_95_00, data = HDR_sub[, 3:78])
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.83940 -0.20683 -0.02074  0.17748  1.23108
##
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)     5.723e+00  2.754e-01  20.777  < 2e-16 ***
## PRATE98_        1.383e+00  5.612e-02  24.644  < 2e-16 ***
## POP65_98        7.969e-02  1.244e-02   6.405  1.6e-09 ***
## CALOR97        -1.131e-04  4.391e-05  -2.577   0.0109 *
## EDUC95_97       2.511e-02  1.337e-02   1.878   0.0622 .
## LIFE_EXP_95_00 -6.920e-02  3.421e-03 -20.230  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3635 on 160 degrees of freedom
## (8 observations deleted due to missingness)
## Multiple R-squared: 0.957, Adjusted R-squared: 0.9556
## F-statistic: 711.5 on 5 and 160 DF, p-value: < 2.2e-16
dwt(model2)
##  lag Autocorrelation D-W Statistic p-value
##    1      -0.1185452      2.211776   0.164
##  Alternative hypothesis: rho != 0
par(mfrow=c(2, 2)) # Set the plot window to display a 2x2 matrix of plots.
plot(model2)
par(mfrow=c(1, 1))
Model 2 is a considerably better predictor of fertility than model 1.
Model 2 (“People Factors”) explains about 96% of the variance in fertility rates (R-squared = 0.957) using five factors:
Annual Population Growth Rate 1998-2015 (PRATE98)
Population age 65 and above 1998 (POP65_98)
Daily per capita supply of calories 1997 (CALOR97)
Public expenditure on Education as % of GNP 1995-1997 (EDUCA95)
Life expectancy at birth average 1990-2000 (LIFE_EXP_95_00)
By contrast, Model 1 (“Structural Factors”) explains only about 36% of the variance in fertility rates (R-squared = 0.3624).
Additionally, when comparing the residual plots of the two models, model 1 displays noticeably more heteroskedasticity than model 2, although the Q-Q plot of model 1 is more linear than that of model 2.
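The visual impression of heteroskedasticity could be backed up with a formal check. The sketch below computes a studentized Breusch-Pagan (Koenker) statistic by hand in base R on simulated data. This test is not part of the original analysis; the data, the variable names, and the choice of test are all assumptions made for illustration.

```r
# Simulated data whose error spread grows with x; all values are hypothetical.
set.seed(3)
n <- 300
x <- runif(n, min = 1, max = 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.3 * x)

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the original predictor(s).
sq_res <- residuals(fit)^2
aux <- lm(sq_res ~ x)

# Koenker's statistic is n * R-squared of the auxiliary regression,
# compared against a chi-squared with df = number of predictors.
bp_stat <- n * summary(aux)$r.squared
bp_df <- length(coef(aux)) - 1
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(statistic = bp_stat, df = bp_df, p.value = bp_p))
```

A small p-value would indicate that the residual variance changes with the predictors, which is what a fanning pattern in the residuals-versus-fitted plot suggests.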
No variable in either model has a VIF above 10 (the largest is about 4.06, for POP65_98), so multicollinearity is not a serious concern in either model.
vif(model1)
## PUB_PHON IND98 AID_CAP9 AID92
## 1.105846 1.023149 1.355676 1.446694
vif(model2)
## PRATE98_ POP65_98 CALOR97 EDUC95_97 LIFE_EXP_95_00
## 3.972260 4.056760 1.393828 1.233753 1.806295
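For context on what the VIF values above measure, the sketch below computes them by hand: each predictor is regressed on the remaining predictors, and the VIF is 1 / (1 - R-squared) of that auxiliary regression. The data are simulated and the names `a`, `b`, and `c3` are hypothetical; this is an illustration of the formula, not a reproduction of the `vif()` results above.

```r
# Simulated predictors; a and b are deliberately correlated. Values are hypothetical.
set.seed(7)
n <- 200
a <- rnorm(n)
b <- 0.8 * a + rnorm(n, sd = 0.6)
c3 <- rnorm(n)
sim <- data.frame(a = a, b = b, c3 = c3)

# VIF for each predictor: regress it on all the others, then 1 / (1 - R^2).
vif_manual <- sapply(names(sim), function(v) {
  others <- setdiff(names(sim), v)
  aux <- lm(reformulate(others, response = v), data = sim)
  1 / (1 - summary(aux)$r.squared)
})

# Cross-check: the same values are the diagonal of the inverse correlation matrix.
vif_from_cor <- diag(solve(cor(sim)))

print(round(vif_manual, 3))
print(round(vif_from_cor, 3))
```

A VIF of 1 means a predictor is uncorrelated with the others, and the value grows as the predictor becomes more predictable from the rest. The threshold of 10 used above is a common rule of thumb.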