As human populations grow, human demands for resources like water, land, trees, and energy also grow. Unfortunately, the price of all this “growth” is paid for by other endangered plants and animals and an increasingly volatile and dangerous climate.The population is one of the important factors which helps to balance the environment, the population should in a balance with the means and resources. If the population will be balanced, then all the needs and demand of the people can be easily fulfilled, which helps to preserve the environment of the country. I decided to study how the population has changed over time, and this is important as it allows us to study how changes to the population, such as the aging population phenomenon we are witnessing, can lead to a decrease in GDP and also an increase in mechanisation.
My research question is to investigate how has fertility rate changed over the past 9 years around the world,the most important reasons which caused the change, and if the result has lead to an aging/younger population.
What are the cases, and how many are there? In this dataset there are 47 groups based on Geographical area and economic status.
Describe the method of data collection. Data is collected from The World Bank databank World Development Indicators.
This is an observational study
Data can be found at: (https://databank.worldbank.org/source/world-development-indicators).
The dependent variables is Fertility rate and it’s numerical.
The Independent Variable are Geographical area and economic status.(categorical)
#load fertility data
fertility = read.csv("https://raw.githubusercontent.com/ErindaB/Data-606/master/Fertility.csv",header = TRUE, sep = ",")
#Rename column
names(fertility)<- c("Country","Country.Code","Series.Name","Series.Code","YR1970","YR1980","YR1990","YR2000","YR2010","YR2017")
fertility <- fertility[,c('Country','Series.Code','YR1970','YR1980','YR1990','YR2000','YR2010','YR2017')]
fertility <- fertility %>%
mutate_all(na_if,"..")%>%
select(-Series.Code)%>%
mutate_if(is.factor, as.character)%>%
mutate(Country = as.factor(Country))%>%
mutate_if(is.character,as.numeric)
datatable(fertility)
#load population young age data
young_pop_raw = read.csv("https://raw.githubusercontent.com/ErindaB/Data-606/master/Young_pop.csv",header = TRUE, sep = ",")
#rename column
names(young_pop_raw)<- c("Country","Country.Code","Series.Name","Series.Code","YR1970","YR1980","YR1990","YR2000","YR2010","YR2017")
young_pop <- young_pop_raw[,c('Country','Series.Code','YR1970','YR1980','YR1990','YR2000','YR2010','YR2017')]
young_pop <- young_pop %>%
mutate_all(na_if,"..")%>%
select(-Series.Code)%>%
mutate_if(is.factor, as.character)%>%
mutate(Country = as.factor(Country))%>%
mutate_if(is.character,as.numeric)
fertility$Code<-NULL#remove code column
fert <- gather(fertility, "Year","Rate",-Country)
fert_US <- fert %>% filter (Country == "North America")
fert_EU <- fert %>% filter (Country == "European Union")
fert_W <- fert %>% filter (Country == "World")
F_High_income <- fert %>% filter (Country == "High income")
F_low_income <- fert %>% filter (Country == "Low income")
young_pop <- gather(young_pop, "Year","Rate",-Country)
I decided to study the fertility rate in USA and Europe.
ggplot(fert_US, aes(Year, Rate, group = 1, color = Rate)) +
geom_point() +
geom_line() +
labs(x = "Year", y = " Fertility Rate in US",
title = " Fertility Rate in US from 1970-2017") +
theme(plot.title = element_text(hjust = 0.5))
As the plot shows there is a sharp decrease of the fertility rate from 1970-2017.Eventhough we see a slight increase during 1990-2000 the trend went down again.
ggplot(fert_EU, aes(Year, Rate, group = 1, color = Rate)) +
geom_point() +
geom_line() +
labs(x = "Year", y = "Fertility Rate in EU",
title = "Fertility Rate in EU from 1970-2017") +
theme(plot.title = element_text(hjust = 0.5))
According to the plot there is a sharp decline in the fertility rate in EU from 1970-2010 but surprisingly after 2010 the trend is going up.
Let’s have a look on how incomes may have an impact on the fertility rate.
ggplot(F_High_income, aes(Year, Rate, group = 1, color = Rate)) +
geom_point() +
geom_line() +
labs(x = "Year", y = " World Fertility Rate ",
title = " World Fertility Rate from 1970-2017 for women with high incomes") +
theme(plot.title = element_text(hjust = 0.5))
According to the plot there is a decrease of families having babies with high incomes from 1970-2010 and going forward this trend is steady.
ggplot(F_low_income, aes(Year, Rate, group = 1, color = Rate)) +
geom_point() +
geom_line() +
labs(x = "Year", y = " World Fertility Rate in US",
title = "World Fertility Rate from 1970-2017 for women with low incomes ") +
theme(plot.title = element_text(hjust = 0.5))
Surprisingly for the families with low incomes we notice a rise from 1970-1990 and then the trend goes down sharply.
For the Regression model I want to see if incomes(low&high) have an impact on the world fertility rate.
H0:β1−A1=0 No Difference between lower incomes and world fertility rate
H0:β1−A1≠0 Difference exists between higher incomes and world fertility rate
all_data <- cbind(F_low_income,fert_W$Rate)
names(all_data) <- c("categ", "Year", "low_income_rate","world_tot_rate")
wide_data <- gather(all_data,"type", "value", 3,4)
wide_data$categ<-NULL
wide_data
## Year type value
## 1 YR1970 low_income_rate 6.544771
## 2 YR1980 low_income_rate 6.661355
## 3 YR1990 low_income_rate 6.614926
## 4 YR2000 low_income_rate 6.369016
## 5 YR2010 low_income_rate 5.858910
## 6 YR2017 low_income_rate 5.144005
## 7 YR1970 world_tot_rate 4.979307
## 8 YR1980 world_tot_rate 4.777407
## 9 YR1990 world_tot_rate 3.713659
## 10 YR2000 world_tot_rate 3.248356
## 11 YR2010 world_tot_rate 2.695863
## 12 YR2017 world_tot_rate 2.516273
ggplot(data = wide_data, aes(x = Year, y = value, fill = type)) +
geom_bar(stat="identity", position="dodge") +
theme(legend.position = "dodge") +
xlab("Year") + ylab("Low income Rate and World fertility Rate ") +
ggtitle("Low income Rate and World fertility Rate from 1970-2017") +
theme(plot.title = element_text(hjust = 0.5))
model <- lm(low_income_rate ~ world_tot_rate, data = all_data)
summary(model)
##
## Call:
## lm(formula = low_income_rate ~ world_tot_rate, data = all_data)
##
## Residuals:
## 1 2 3 4 5 6
## -0.25732 -0.04875 0.38944 0.35551 0.09710 -0.53598
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5336 0.6518 6.956 0.00224 **
## world_tot_rate 0.4556 0.1726 2.640 0.05761 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.401 on 4 degrees of freedom
## Multiple R-squared: 0.6353, Adjusted R-squared: 0.5441
## F-statistic: 6.967 on 1 and 4 DF, p-value: 0.05761
plot(all_data$low_income_rate ~ all_data$world_tot_rate)
abline(model)
all_data_h <- cbind(F_High_income,fert_W$Rate)
names(all_data_h) <- c("categ", "Year", "high_income_rate","world_tot_rate")
wide_data_h <- gather(all_data_h,"type", "value", 3,4)
wide_data_h$categ<-NULL
wide_data_h
## Year type value
## 1 YR1970 high_income_rate 3.034193
## 2 YR1980 high_income_rate 2.535259
## 3 YR1990 high_income_rate 1.970714
## 4 YR2000 high_income_rate 1.851859
## 5 YR2010 high_income_rate 1.708876
## 6 YR2017 high_income_rate 1.705173
## 7 YR1970 world_tot_rate 4.979307
## 8 YR1980 world_tot_rate 4.777407
## 9 YR1990 world_tot_rate 3.713659
## 10 YR2000 world_tot_rate 3.248356
## 11 YR2010 world_tot_rate 2.695863
## 12 YR2017 world_tot_rate 2.516273
ggplot(data = wide_data_h, aes(x = Year, y = value, fill = type)) +
geom_bar(stat="identity", position="dodge") +
theme(legend.position = "dodge") +
xlab("Year") + ylab("High income Rate and World fertility Rate ") +
ggtitle("High income Rate and World fertility Rate from 1970-2017") +
theme(plot.title = element_text(hjust = 0.5))
model_h <- lm(high_income_rate ~ world_tot_rate, data = all_data_h)
summary(model)
##
## Call:
## lm(formula = low_income_rate ~ world_tot_rate, data = all_data)
##
## Residuals:
## 1 2 3 4 5 6
## -0.25732 -0.04875 0.38944 0.35551 0.09710 -0.53598
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.5336 0.6518 6.956 0.00224 **
## world_tot_rate 0.4556 0.1726 2.640 0.05761 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.401 on 4 degrees of freedom
## Multiple R-squared: 0.6353, Adjusted R-squared: 0.5441
## F-statistic: 6.967 on 1 and 4 DF, p-value: 0.05761
plot(all_data_h$high_income_rate ~ all_data_h$world_tot_rate)
abline(model_h)
The linear regression gave a low R2 of 0.5441 and a very low p-value ,and also from the shape of the plot it looks like the variable income (low or high) does have a significant impact on the fertility rate.It has a considerable good relationship with the fertility rate, therefore the null hypothesis falls down.
ggplot(young_pop, aes(Year, Rate, group = 1, color = Rate)) +
geom_point() +
geom_line() +
labs(x = "Year", y = " World young population rate",
title = "World young population rate from 1970-2017") +
theme(plot.title = element_text(hjust = 0.5))
It is evident that world population is ageing as the young population rate has a sharp decline from 1970-2017.That means the world older population continues to grow at an unprecedented rate.
-The feritlity rate has a sharp decline that leads to a slowly population growth.
-Using the linear Regression we can conclude that the income variable(low or high) does have an impact on the fertility rate.
-The younger population rate continues to decrease at a very fast speed that means the world’s population is getting older.