DATA 606 Final Project

Introduction

As human populations grow, human demands for resources like water, land, trees, and energy also grow. Unfortunately, the price of all this “growth” is paid for by other endangered plants and animals and an increasingly volatile and dangerous climate.The population is one of the important factors which helps to balance the environment, the population should in a balance with the means and resources. If the population will be balanced, then all the needs and demand of the people can be easily fulfilled, which helps to preserve the environment of the country. I decided to study how the population has changed over time, and this is important as it allows us to study how changes to the population, such as the aging population phenomenon we are witnessing, can lead to a decrease in GDP and also an increase in mechanisation.

Research question

My research question is to investigate how has fertility rate changed over the past 9 years around the world,the most important reasons which caused the change, and if the result has lead to an aging/younger population.

Cases

What are the cases, and how many are there? In this dataset there are 47 groups based on Geographical area and economic status.

Data collection

Describe the method of data collection. Data is collected from The World Bank databank World Development Indicators.

Type of study

This is an observational study

Data Source

Data can be found at: (https://databank.worldbank.org/source/world-development-indicators).

Dependent Variable

The dependent variables is Fertility rate and it’s numerical.

Independent Variable

The Independent Variable are Geographical area and economic status.(categorical)

Data Preparation

#load fertility data
fertility = read.csv("https://raw.githubusercontent.com/ErindaB/Data-606/master/Fertility.csv",header = TRUE, sep = ",")

#Rename column
names(fertility)<- c("Country","Country.Code","Series.Name","Series.Code","YR1970","YR1980","YR1990","YR2000","YR2010","YR2017")

fertility <- fertility[,c('Country','Series.Code','YR1970','YR1980','YR1990','YR2000','YR2010','YR2017')]
fertility <- fertility %>% 
  mutate_all(na_if,"..")%>%
  select(-Series.Code)%>%
   mutate_if(is.factor, as.character)%>%
    mutate(Country = as.factor(Country))%>%
mutate_if(is.character,as.numeric)
datatable(fertility)

#load population young age data
young_pop_raw = read.csv("https://raw.githubusercontent.com/ErindaB/Data-606/master/Young_pop.csv",header = TRUE, sep = ",")
#rename column
names(young_pop_raw)<- c("Country","Country.Code","Series.Name","Series.Code","YR1970","YR1980","YR1990","YR2000","YR2010","YR2017")

young_pop <- young_pop_raw[,c('Country','Series.Code','YR1970','YR1980','YR1990','YR2000','YR2010','YR2017')]
young_pop <- young_pop %>% 
  mutate_all(na_if,"..")%>%
  select(-Series.Code)%>%
   mutate_if(is.factor, as.character)%>%
    mutate(Country = as.factor(Country))%>%

    mutate_if(is.character,as.numeric)

fertility$Code<-NULL#remove code column
fert <- gather(fertility, "Year","Rate",-Country)
fert_US <- fert %>% filter (Country == "North America")
fert_EU <- fert %>% filter (Country == "European Union")
fert_W <- fert %>% filter (Country == "World")
F_High_income <- fert %>% filter (Country == "High income")
F_low_income <- fert %>% filter (Country == "Low income")

young_pop <- gather(young_pop, "Year","Rate",-Country)

Exploratory Analysis

I decided to study the fertility rate in USA and Europe.

ggplot(fert_US, aes(Year, Rate, group = 1, color = Rate)) +
         geom_point() +
         geom_line() +
         labs(x = "Year", y = " Fertility Rate in US", 
         title = " Fertility Rate in US from 1970-2017") +
         theme(plot.title = element_text(hjust = 0.5))

As the plot shows there is a sharp decrease of the fertility rate from 1970-2017.Eventhough we see a slight increase during 1990-2000 the trend went down again.

ggplot(fert_EU, aes(Year, Rate, group = 1, color = Rate)) +
         geom_point() +
         geom_line() +
         labs(x = "Year", y = "Fertility Rate in EU", 
         title = "Fertility Rate in EU from 1970-2017") +
         theme(plot.title = element_text(hjust = 0.5))

According to the plot there is a sharp decline in the fertility rate in EU from 1970-2010 but surprisingly after 2010 the trend is going up.

Let’s have a look on how incomes may have an impact on the fertility rate.

ggplot(F_High_income, aes(Year, Rate, group = 1, color = Rate)) +
         geom_point() +
         geom_line() +
         labs(x = "Year", y = " World Fertility Rate  ", 
         title = " World Fertility Rate  from 1970-2017 for women with high incomes") +
         theme(plot.title = element_text(hjust = 0.5))

According to the plot there is a decrease of families having babies with high incomes from 1970-2010 and going forward this trend is steady.

ggplot(F_low_income, aes(Year, Rate, group = 1, color = Rate)) +
         geom_point() +
         geom_line() +
         labs(x = "Year", y = " World  Fertility Rate in US", 
         title = "World Fertility Rate  from 1970-2017 for women with low incomes ") +
         theme(plot.title = element_text(hjust = 0.5))

Surprisingly for the families with low incomes we notice a rise from 1970-1990 and then the trend goes down sharply.

Linear Regression

For the Regression model I want to see if incomes(low&high) have an impact on the world fertility rate.

H0:β1−A1=0 No Difference between lower incomes and world fertility rate

H0:β1−A1≠0 Difference exists between higher incomes and world fertility rate

all_data <- cbind(F_low_income,fert_W$Rate)
names(all_data) <- c("categ", "Year", "low_income_rate","world_tot_rate")

wide_data <- gather(all_data,"type", "value", 3,4)
wide_data$categ<-NULL
wide_data

##      Year            type    value
## 1  YR1970 low_income_rate 6.544771
## 2  YR1980 low_income_rate 6.661355
## 3  YR1990 low_income_rate 6.614926
## 4  YR2000 low_income_rate 6.369016
## 5  YR2010 low_income_rate 5.858910
## 6  YR2017 low_income_rate 5.144005
## 7  YR1970  world_tot_rate 4.979307
## 8  YR1980  world_tot_rate 4.777407
## 9  YR1990  world_tot_rate 3.713659
## 10 YR2000  world_tot_rate 3.248356
## 11 YR2010  world_tot_rate 2.695863
## 12 YR2017  world_tot_rate 2.516273

ggplot(data = wide_data, aes(x = Year, y = value, fill = type)) +
                 geom_bar(stat="identity", position="dodge") +
                 theme(legend.position = "dodge") +
                 xlab("Year") + ylab("Low income Rate and World fertility  Rate ") + 
                 ggtitle("Low income Rate and World fertility  Rate from  1970-2017") +
                 theme(plot.title = element_text(hjust = 0.5))

model <- lm(low_income_rate ~ world_tot_rate, data = all_data)
summary(model)

## 
## Call:
## lm(formula = low_income_rate ~ world_tot_rate, data = all_data)
## 
## Residuals:
##        1        2        3        4        5        6 
## -0.25732 -0.04875  0.38944  0.35551  0.09710 -0.53598 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      4.5336     0.6518   6.956  0.00224 **
## world_tot_rate   0.4556     0.1726   2.640  0.05761 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.401 on 4 degrees of freedom
## Multiple R-squared:  0.6353, Adjusted R-squared:  0.5441 
## F-statistic: 6.967 on 1 and 4 DF,  p-value: 0.05761

plot(all_data$low_income_rate ~ all_data$world_tot_rate)
abline(model)

all_data_h <- cbind(F_High_income,fert_W$Rate)
names(all_data_h) <- c("categ", "Year", "high_income_rate","world_tot_rate")

wide_data_h <- gather(all_data_h,"type", "value", 3,4)
wide_data_h$categ<-NULL
wide_data_h

##      Year             type    value
## 1  YR1970 high_income_rate 3.034193
## 2  YR1980 high_income_rate 2.535259
## 3  YR1990 high_income_rate 1.970714
## 4  YR2000 high_income_rate 1.851859
## 5  YR2010 high_income_rate 1.708876
## 6  YR2017 high_income_rate 1.705173
## 7  YR1970   world_tot_rate 4.979307
## 8  YR1980   world_tot_rate 4.777407
## 9  YR1990   world_tot_rate 3.713659
## 10 YR2000   world_tot_rate 3.248356
## 11 YR2010   world_tot_rate 2.695863
## 12 YR2017   world_tot_rate 2.516273

ggplot(data = wide_data_h, aes(x = Year, y = value, fill = type)) +
                 geom_bar(stat="identity", position="dodge") +
                 theme(legend.position = "dodge") +
                 xlab("Year") + ylab("High income Rate and World fertility  Rate ") + 
                 ggtitle("High income Rate and World fertility  Rate from  1970-2017") +
                 theme(plot.title = element_text(hjust = 0.5))

model_h <- lm(high_income_rate ~ world_tot_rate, data = all_data_h)
summary(model)

## 
## Call:
## lm(formula = low_income_rate ~ world_tot_rate, data = all_data)
## 
## Residuals:
##        1        2        3        4        5        6 
## -0.25732 -0.04875  0.38944  0.35551  0.09710 -0.53598 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      4.5336     0.6518   6.956  0.00224 **
## world_tot_rate   0.4556     0.1726   2.640  0.05761 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.401 on 4 degrees of freedom
## Multiple R-squared:  0.6353, Adjusted R-squared:  0.5441 
## F-statistic: 6.967 on 1 and 4 DF,  p-value: 0.05761

plot(all_data_h$high_income_rate ~ all_data_h$world_tot_rate)
abline(model_h)

The linear regression gave a low R2 of 0.5441 and a very low p-value ,and also from the shape of the plot it looks like the variable income (low or high) does have a significant impact on the fertility rate.It has a considerable good relationship with the fertility rate, therefore the null hypothesis falls down.

Is the world population ageing?

ggplot(young_pop, aes(Year, Rate, group = 1, color = Rate)) +
         geom_point() +
         geom_line() +
         labs(x = "Year", y = " World  young population rate", 
         title = "World  young population rate  from 1970-2017") +
         theme(plot.title = element_text(hjust = 0.5))

It is evident that world population is ageing as the young population rate has a sharp decline from 1970-2017.That means the world older population continues to grow at an unprecedented rate.

Conclusions

-The feritlity rate has a sharp decline that leads to a slowly population growth.

-Using the linear Regression we can conclude that the income variable(low or high) does have an impact on the fertility rate.

-The younger population rate continues to decrease at a very fast speed that means the world’s population is getting older.