Instructions to run this code
1. Extract the data files from the zip data folder.
2. Make sure the extracted data folder is in the same directory as this Rmd file.
3. Make sure extraction does not create a nested data folder, for example current_dir/data/data/fertility_data. A correct path after extraction looks like current_dir/data/fertility_data.
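Before knitting, the folder layout can be checked quickly. The sketch below is only an illustration: `check_data_layout()` is a hypothetical helper, and the sub-folder names are assumed from the paths used by the loading code in Step 1.

```r
# Hypothetical helper: report which expected data sub-folders exist under `root`.
# The folder names are assumed from the paths used later in this report.
check_data_layout <- function(root = "data") {
  expected <- c("fertility_replacement_rate", "gdp", "population", "urbanization")
  found <- dir.exists(file.path(root, expected))
  names(found) <- expected
  # A data/data/ folder is the sign of an accidental nested extraction
  nested <- dir.exists(file.path(root, "data"))
  list(found = found, nested = nested)
}

layout <- check_data_layout("data")
print(layout$found)
if (layout$nested) message("Found data/data/: move the inner folder up one level.")
```

If any entry prints FALSE, the corresponding folder is missing or sits at a different level than the code expects.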
The dataset below contains historical data from 1950 to 2024 for multiple countries, capturing key economic and demographic indicators.
This project investigates the relationship between population size and fertility rates across five countries: Bangladesh, China, Egypt, Japan, and Niger. Using annual data going back to 1960, we apply linear regression models to examine how population size relates to fertility rates, with a particular focus on the statistical significance of the relationship. Our findings show that population size is a significant predictor of fertility rates in all countries analyzed, with very small p-values supporting rejection of the null hypothesis. The models show varying degrees of fit: Bangladesh, Egypt, and Niger exhibit high \(R^2\) values, indicating that population size explains a large portion of the variation in fertility rates, while Japan shows a weaker relationship, suggesting the influence of other factors. Confidence intervals for the population coefficient further support the negative relationship between population size and fertility rates. While the models fit well overall, residual analysis suggests some unexplained variance, highlighting the need to explore additional variables such as urbanization or economic development. This study contributes to understanding demographic dynamics across different nations and suggests avenues for future research into other factors influencing fertility.
libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.1
library(readr)
## Warning: package 'readr' was built under R version 4.4.1
library(scales)
## Warning: package 'scales' was built under R version 4.4.1
##
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
##
## col_factor
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.4.1
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(purrr)
## Warning: package 'purrr' was built under R version 4.4.1
##
## Attaching package: 'purrr'
## The following object is masked from 'package:scales':
##
## discard
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.1
## Warning: package 'tibble' was built under R version 4.4.1
## Warning: package 'tidyr' was built under R version 4.4.1
## Warning: package 'stringr' was built under R version 4.4.1
## Warning: package 'forcats' was built under R version 4.4.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ tibble 3.2.1
## ✔ stringr 1.5.1 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ scales::col_factor() masks readr::col_factor()
## ✖ purrr::discard() masks scales::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggcorrplot)
## Warning: package 'ggcorrplot' was built under R version 4.4.2
Step 1:
load and merge the data
read_and_clean_csv_files <- function(data_directory, pattern, suffix) {
csv_files <- list.files(path = data_directory, pattern = pattern, full.names = TRUE)
read_and_clean_csv <- function(file) {
country_name <- gsub(suffix, "", basename(file))
country_name <- gsub("\\.csv$", "", country_name)
country_name <- gsub("-", " ", country_name)
country_name <- str_to_title(country_name)
df <- read.csv(file, stringsAsFactors = FALSE, sep = ",")
df$country_name <- country_name
return(df)
}
list_of_dfs <- map(csv_files, read_and_clean_csv)
combined_data <- bind_rows(list_of_dfs)
return(combined_data)
}
# Load each dataset with the generic function.
# list.files() expects a regular expression, so the extension is matched with "\\.csv$" instead of the glob "*.csv".
combined_fertility_data <- read_and_clean_csv_files("data/fertility_replacement_rate", "\\.csv$", "_fertility_replacement_rate.csv")
combined_gdp_data <- read_and_clean_csv_files("data/gdp", "\\.csv$", "-gdp-gross-domestic-product.csv")
combined_population_data <- read_and_clean_csv_files("data/population", "\\.csv$", "-population-2024-10-12.csv")
combined_urbanization_data <- read_and_clean_csv_files("data/urbanization", "\\.csv$", "-urban-population.csv")
colnames(combined_fertility_data)
## [1] "date" "Births.per.Woman" "Annual...Change" "country_name"
colnames(combined_gdp_data)
## [1] "date" "GDP...Billions.of.US..."
## [3] "Per.Capita..US..." "Annual...Change"
## [5] "country_name"
colnames(combined_population_data)
## [1] "date" "Population" "Annual...Change" "country_name"
## [5] "Births.per.Woman"
colnames(combined_urbanization_data)
## [1] "date" "Urban.Population" "X..of.Total" "Annual...Change"
## [5] "country_name"
combined_fertility_data <- combined_fertility_data %>%
select(date, Births.per.Woman, country_name)
combined_gdp_data <- combined_gdp_data %>%
select(date, GDP...Billions.of.US..., country_name)
combined_population_data <- combined_population_data %>%
select(date, Population, country_name)
combined_urbanization_data <- combined_urbanization_data %>%
select(date, Urban.Population, country_name)
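The loader in Step 1 derives each country_name from the file name: it strips the dataset suffix, turns hyphens into spaces, and capitalizes each word. The sketch below isolates that step in base R. `country_from_file()` is a hypothetical helper, the file name is an assumed example that follows the population suffix, and the capitalization line is a stand-in for `str_to_title()`, not the same function.

```r
# Sketch of the file-name-to-country step from read_and_clean_csv(), in base R.
# The file name used below is a hypothetical example, not a confirmed file.
country_from_file <- function(file, suffix) {
  name <- sub(suffix, "", basename(file), fixed = TRUE)  # drop the dataset suffix
  name <- sub("\\.csv$", "", name)                        # drop any leftover extension
  name <- gsub("-", " ", name, fixed = TRUE)              # hyphens become spaces
  # Capitalize the first letter of each word (stand-in for stringr::str_to_title)
  gsub("(^|\\s)([a-z])", "\\1\\U\\2", name, perl = TRUE)
}

country_from_file("data/population/united-states-population-2024-10-12.csv",
                  "-population-2024-10-12.csv")
```

Using `fixed = TRUE` for the suffix avoids treating the dots in the suffix as regular-expression wildcards, which the original `gsub()` call does implicitly.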
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
What is the relationship between fertility (replacement) rates and economic indicators such as GDP, population size, and urbanization across various countries from 1950 to 2024?
What are the cases, and how many are there?
Cases: Each case is an annual observation of a country with data on fertility rate, GDP, population, and urbanization.
Total Cases: The dataset includes fertility, GDP, urbanization, and population data for 6 countries over a 75-year period (1950–2024). Assuming data is available for every year and every country, there would be about 450 cases (6 countries × 75 years).
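The case count above is a simple product, which can be sketched as a country-by-year grid. The country list is taken from the regression output later in this report; the grid is an upper bound under the stated assumption that every country has every year, not a count of the actual rows.

```r
# Upper bound on the number of cases, assuming every country has every year.
# Country list taken from the regression output later in this report.
countries <- c("Bangladesh", "China", "Egypt", "Japan", "Niger", "United States")
years <- 1950:2024

case_grid <- expand.grid(country_name = countries, year = years,
                         stringsAsFactors = FALSE)
nrow(case_grid)                 # 6 countries x 75 years
table(case_grid$country_name)   # rows per country in the full grid
```

On the loaded data, `table(combined_fertility_data$country_name)` would give the actual number of rows per country for comparison.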
The data was collected from Macrotrends (www.macrotrends.net). The website provided me with CSV files to download.
What type of study is this (observational/experiment)?
This is an observational study. The data consists of historical records collected over time.
Are they quantitative or qualitative?
If you are running a regression or similar model, which one is your dependent variable?
Quantitative Variables:
Births per Woman (Fertility Rate): The average number of children per woman. This is the dependent variable in the regressions.
GDP (Billions of US Dollars): The economic output of each country, measured in billions of U.S. dollars.
Population: The total population of each country.
Urban Population: The number of people living in urban areas.
Qualitative Variable: Country Name: Represents the name of the country
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
fertility (replacement rate)
combined_fertility_data$date <- as.Date(combined_fertility_data$date, format = "%Y-%m-%d")
combined_fertility_data <- combined_fertility_data %>%
filter(!is.na(date) & !is.na(Births.per.Woman) &
date >= as.Date("1950-01-01") & date <= as.Date("2024-12-31"))
plot_country_trends <- function(df, x_column, y_column, title, x_label, y_label) {
ggplot(df, aes_string(x = x_column, y = y_column, color = "country_name", group = "country_name")) +
geom_line(size = 1) +
geom_point(size = 1.5) +
labs(title = title, x = x_label, y = y_label) +
scale_x_date(date_breaks = "10 years", date_labels = "%Y") +
theme_minimal() +
theme(legend.title = element_blank())
}
fertility_rate_plot <- plot_country_trends(
df = combined_fertility_data,
x_column = "date",
y_column = "Births.per.Woman",
title = "Fertility Rate by Country",
x_label = "Year",
y_label = "Births per Woman"
)
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
print(fertility_rate_plot)
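The two deprecation warnings above point to `aes()` with tidy evaluation and to `linewidth`. The following is a minimal sketch of how the plotting helper could be written that way, assuming ggplot2 3.4.0 or later. `plot_country_trends_tidy()` is a hypothetical name, and the small data frame holds made-up values used only to show the call.

```r
library(ggplot2)

# Variant of plot_country_trends() that avoids the two deprecated idioms:
# aes_string() is replaced by aes() with the .data pronoun, and
# size on geom_line() is replaced by linewidth (ggplot2 >= 3.4.0).
plot_country_trends_tidy <- function(df, x_column, y_column, title, x_label, y_label) {
  ggplot(df, aes(x = .data[[x_column]], y = .data[[y_column]],
                 color = country_name, group = country_name)) +
    geom_line(linewidth = 1) +
    geom_point(size = 1.5) +
    labs(title = title, x = x_label, y = y_label) +
    scale_x_date(date_breaks = "10 years", date_labels = "%Y") +
    theme_minimal() +
    theme(legend.title = element_blank())
}

# Tiny made-up data frame, only to show the call; values are not from the dataset.
toy <- data.frame(
  date = as.Date(c("2000-12-31", "2010-12-31", "2000-12-31", "2010-12-31")),
  Births.per.Woman = c(3.0, 2.5, 1.5, 1.4),
  country_name = c("A", "A", "B", "B")
)
p <- plot_country_trends_tidy(toy, "date", "Births.per.Woman",
                              "Fertility Rate by Country", "Year", "Births per Woman")
```

Column names are still passed as strings, so existing calls would only need the function name changed.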
gdp
combined_gdp_data$date <- as.Date(combined_gdp_data$date, format = "%Y-%m-%d")
combined_gdp_data <- combined_gdp_data %>%
filter(!is.na(date) & !is.na(GDP...Billions.of.US...) &
date >= as.Date("1950-01-01") & date <= as.Date("2024-12-31"))
plot_country_trends <- function(df, x_column, y_column, title, x_label, y_label) {
ggplot(df, aes_string(x = x_column, y = y_column, color = "country_name", group = "country_name")) +
geom_line(size = 1) +
geom_point(size = 1.5) +
labs(title = title, x = x_label, y = y_label) +
scale_x_date(date_breaks = "10 years", date_labels = "%Y") +
scale_y_continuous(labels = function(x) paste0(x, "B")) + # values are already in billions, so no rescaling
theme_minimal() +
theme(legend.title = element_blank())
}
advanced_economies <- combined_gdp_data %>%
filter(country_name %in% c("United States", "China", "Japan"))
gdp_other_countries <- combined_gdp_data %>%
filter(!country_name %in% c("United States", "China", "Japan"))
advanced_economies_plot <- plot_country_trends(
df = advanced_economies,
x_column = "date",
y_column = "GDP...Billions.of.US...",
title = "GDP for USA, China, Japan",
x_label = "Year",
y_label = "GDP in Billions of US Dollars"
)
gdp_other_countries_plot <- plot_country_trends(
df = gdp_other_countries,
x_column = "date",
y_column = "GDP...Billions.of.US...",
title = "GDP for Other Countries",
x_label = "Year",
y_label = "GDP in Billions of US Dollars"
)
print(advanced_economies_plot)
print(gdp_other_countries_plot)
population
combined_population_data$date <- as.Date(combined_population_data$date, format = "%Y-%m-%d")
combined_population_data <- combined_population_data %>%
filter(!is.na(date) & !is.na(Population) &
date >= as.Date("1950-01-01") & date <= as.Date("2024-12-31"))
china_population_data <- combined_population_data %>%
filter(country_name == "China")
rest_of_countries_population_data <- combined_population_data %>%
filter(country_name != "China")
plot_country_trends <- function(df, x_column, y_column, title, x_label, y_label) {
ggplot(df, aes_string(x = x_column, y = y_column, color = "country_name", group = "country_name")) +
geom_line(size = 1) +
geom_point(size = 1.5) +
labs(title = title, x = x_label, y = y_label) +
scale_x_date(date_breaks = "10 years", date_labels = "%Y") +
scale_y_continuous(
breaks = seq(0, max(df[[y_column]], na.rm = TRUE), by = 50e6),
labels = function(x) paste0(x / 1e6, "M")
) +
theme_minimal() +
theme(legend.title = element_blank())
}
china_population_plot <- plot_country_trends(
df = china_population_data,
x_column = "date",
y_column = "Population",
title = "Population for China",
x_label = "Year",
y_label = "Population (in Millions)"
)
rest_of_countries_population_plot <- plot_country_trends(
df = rest_of_countries_population_data,
x_column = "date",
y_column = "Population",
title = "Population for Other Countries",
x_label = "Year",
y_label = "Population (in Millions)"
)
print(china_population_plot)
print(rest_of_countries_population_plot)
urbanization
combined_urbanization_data$date <- as.Date(combined_urbanization_data$date, format = "%Y-%m-%d")
combined_urbanization_data <- combined_urbanization_data %>%
filter(!is.na(date) & !is.na(Urban.Population) &
date >= as.Date("1950-01-01") & date <= as.Date("2024-12-31"))
china_urbanization_data <- combined_urbanization_data %>%
filter(country_name == "China")
japan_usa_urbanization_data <- combined_urbanization_data %>%
filter(country_name %in% c("Japan", "United States"))
rest_of_countries_urbanization_data <- combined_urbanization_data %>%
filter(!country_name %in% c("China", "Japan", "United States"))
plot_country_trends <- function(df, x_column, y_column, title, x_label, y_label, y_break) {
ggplot(df, aes_string(x = x_column, y = y_column, color = "country_name", group = "country_name")) +
geom_line(size = 1) +
geom_point(size = 1.5) +
labs(title = title, x = x_label, y = y_label) +
scale_x_date(date_breaks = "10 years", date_labels = "%Y") +
scale_y_continuous(
breaks = seq(0, max(df[[y_column]], na.rm = TRUE), by = y_break),
labels = function(x) paste0(x / 1e6, "M")
) +
theme_minimal() +
theme(legend.title = element_blank())
}
china_urban_population_plot <- plot_country_trends(
df = china_urbanization_data,
x_column = "date",
y_column = "Urban.Population",
title = "Urban Population for China",
x_label = "Year",
y_label = "Urban Population (in Millions)",
y_break = 1e8
)
japan_usa_urban_population_plot <- plot_country_trends(
df = japan_usa_urbanization_data,
x_column = "date",
y_column = "Urban.Population",
title = "Urban Population for Japan and USA",
x_label = "Year",
y_label = "Urban Population (in Millions)",
y_break = 5e7
)
rest_of_countries_urban_population_plot <- plot_country_trends(
df = rest_of_countries_urbanization_data,
x_column = "date",
y_column = "Urban.Population",
title = "Urban Population for Other Countries",
x_label = "Year",
y_label = "Urban Population (in Millions)",
y_break = 1e7
)
print(china_urban_population_plot)
print(japan_usa_urban_population_plot)
print(rest_of_countries_urban_population_plot)
summary(combined_fertility_data)
## date Births.per.Woman country_name
## Min. :1950-12-31 Min. :1.298 Length:450
## 1st Qu.:1968-12-31 1st Qu.:1.956 Class :character
## Median :1987-12-31 Median :3.277 Mode :character
## Mean :1987-12-31 Mean :4.083
## 3rd Qu.:2006-12-31 3rd Qu.:6.515
## Max. :2024-12-31 Max. :7.900
summary(combined_gdp_data)
## date GDP...Billions.of.US... country_name
## Min. :1960-12-31 Min. : 0.00 Length:378
## 1st Qu.:1975-12-31 1st Qu.: 13.42 Class :character
## Median :1991-12-31 Median : 151.97 Mode :character
## Mean :1991-12-31 Mean : 2398.80
## 3rd Qu.:2007-12-31 3rd Qu.: 2720.92
## Max. :2022-12-31 Max. :25439.70
summary(combined_population_data)
## date Population country_name
## Min. :1950-12-31 Min. :2.569e+06 Length:375
## 1st Qu.:1968-12-31 1st Qu.:3.360e+07 Class :character
## Median :1987-12-31 Median :9.596e+07 Mode :character
## Mean :1987-12-31 Mean :2.674e+08
## 3rd Qu.:2006-12-31 3rd Qu.:1.418e+08
## Max. :2024-12-31 Max. :1.426e+09
summary(combined_urbanization_data)
## date Urban.Population country_name
## Min. :1960-12-31 Min. : 202606 Length:378
## 1st Qu.:1975-12-31 1st Qu.: 14072736 Class :character
## Median :1991-12-31 Median : 62633360 Mode :character
## Mean :1991-12-31 Mean :122819890
## 3rd Qu.:2007-12-31 3rd Qu.:159357196
## Max. :2022-12-31 Max. :897578430
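The summaries show that the fertility and population series run from 1950 to 2024, while the GDP and urbanization series run from 1960 to 2022. When the series are later combined with a left join on country and date, the unmatched years carry NA in the joined columns. The sketch below illustrates this with placeholder frames that only reproduce the year ranges. It uses base `merge()` as a stand-in for `left_join()`, and the column names are hypothetical.

```r
# Placeholder frames that reproduce only the year ranges reported by summary():
# fertility covers 1950-2024, GDP covers 1960-2022. The value columns are dummies.
fertility <- data.frame(country_name = "Japan", year = 1950:2024, fert_placeholder = 1)
gdp <- data.frame(country_name = "Japan", year = 1960:2022, gdp_placeholder = 1)

# all.x = TRUE keeps every fertility row, like dplyr::left_join()
merged <- merge(fertility, gdp, by = c("country_name", "year"), all.x = TRUE)

nrow(merged)                         # one row per fertility year
sum(is.na(merged$gdp_placeholder))   # fertility years with no GDP match
```

This gap of 75 minus 63 years per country appears to account for the "12 observations deleted due to missingness" notes in the regression output, since `lm()` drops rows with NA by default.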
# Histogram for Fertility Rates
ggplot(combined_fertility_data, aes(x = `Births.per.Woman`)) +
geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black") +
labs(title = "Histogram of Fertility Rates", x = "Births per Woman", y = "Frequency") +
theme_minimal()
# Histogram for GDP
ggplot(combined_gdp_data, aes(x = `GDP...Billions.of.US...`)) +
geom_histogram(binwidth = 500, fill = "lightgreen", color = "black") +
labs(title = "Histogram of GDP", x = "GDP (Billions of US Dollars)", y = "Frequency") +
theme_minimal()
1. Histogram of Fertility Rates: The distribution of fertility rates is
bimodal. One peak is around 2 births per woman, likely reflecting
countries with lower fertility rates, for example developed nations.
Another peak is around 6–7 births per woman, likely representing
countries with higher fertility rates, such as developing nations. The
spread of the data shows significant variation across countries and
time periods, indicating diverse demographic trends. Lower fertility
rates could indicate countries with higher urbanization or GDP, as
suggested by global trends.
2. Histogram of GDP: The GDP data is highly skewed to the right. Most
observations fall at lower GDP values (closer to 0), which likely
represent smaller or developing economies. A small number of
observations have extremely high GDP, such as those for the United
States and China. This highlights the disparity in economic output
between countries and the concentration of wealth in a few nations.
Insights: Fertility rates appear to cluster into two groups, reflecting
global demographic differences between countries. The GDP distribution
is highly uneven, which may relate to fertility rates through economic
factors like income levels, education, and healthcare access. These
distributions suggest that the relationship between fertility rates and
GDP might not be linear. I will show the regression results in the
analysis section.
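To make the right skew described above concrete, the sketch below measures skewness before and after a log transform. The values are simulated, not the report's GDP data, and `skewness()` is a small helper defined here rather than a function from a package.

```r
set.seed(1)
# Simulated right-skewed values standing in for GDP; not the report's data.
gdp_sim <- rlnorm(500, meanlog = 5, sdlog = 2)

# Sample skewness: mean of the cubed standardized values
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skewness(gdp_sim)        # strongly positive for right-skewed data
skewness(log(gdp_sim))   # much closer to zero after the log transform
```

If this were tried on the real GDP column, note that its reported minimum rounds to 0.00, so zero values would need handling before taking logs.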
# Boxplot for Fertility Rates by Country
ggplot(combined_fertility_data, aes(x = country_name, y = Births.per.Woman, fill = country_name)) +
geom_boxplot() +
labs(title = "Boxplot of Fertility Rates by Country", x = "Country", y = "Births per Woman") +
theme_minimal() +
theme(legend.position = "none")
# Boxplot for GDP by Country
ggplot(combined_gdp_data, aes(x = country_name, y = GDP...Billions.of.US..., fill = country_name)) +
geom_boxplot() +
labs(title = "Boxplot of GDP by Country", x = "Country", y = "GDP (Billions of US Dollars)") +
theme_minimal() +
theme(legend.position = "none")
Boxplot of Fertility Rates by Country: Bangladesh, Egypt, and Niger:
These countries have high median fertility rates, with Niger having the
highest median and range, indicating consistently high fertility rates
over time. China and the United States: These countries exhibit lower
fertility rates, with China showing a larger range, indicating a
significant drop over time (consistent with China’s one-child policy
era). The United States has a smaller range with a lower median,
indicating relatively stable fertility rates. Japan: Japan has the
lowest fertility rates and the least variability, indicating a
long-standing low fertility trend. Outliers:
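For reference, `geom_boxplot()` by default draws as outliers the points that lie more than 1.5 times the interquartile range beyond the box. A minimal sketch of the same rule with base R's `boxplot.stats()` follows; the vector is made up and is not taken from the dataset.

```r
# Made-up values, only to show the rule; not taken from the dataset.
x <- c(1, 2, 3, 4, 100)

stats <- boxplot.stats(x, coef = 1.5)
stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out     # points beyond 1.5 x IQR from the hinges
```

`boxplot.stats()` works from hinges, which can differ slightly from the quartiles ggplot2 computes, so the two may disagree on borderline points.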
combined_fertility_data$date <- as.Date(combined_fertility_data$date, format = "%Y-%m-%d")
combined_gdp_data$date <- as.Date(combined_gdp_data$date, format = "%Y-%m-%d")
combined_population_data$date <- as.Date(combined_population_data$date, format = "%Y-%m-%d")
combined_urbanization_data$date <- as.Date(combined_urbanization_data$date, format = "%Y-%m-%d")
correlation_data <- combined_fertility_data %>%
left_join(combined_gdp_data, by = c("date", "country_name")) %>%
left_join(combined_population_data, by = c("date", "country_name")) %>%
left_join(combined_urbanization_data, by = c("date", "country_name")) %>%
select(Births.per.Woman, GDP...Billions.of.US..., Population, Urban.Population)
correlation_data <- na.omit(correlation_data)
cor_matrix <- cor(correlation_data, use = "complete.obs")
print(cor_matrix)
## Births.per.Woman GDP...Billions.of.US... Population
## Births.per.Woman 1.0000000 -0.4945149 -0.4247461
## GDP...Billions.of.US... -0.4945149 1.0000000 0.4779446
## Population -0.4247461 0.4779446 1.0000000
## Urban.Population -0.5097860 0.7622453 0.8965122
## Urban.Population
## Births.per.Woman -0.5097860
## GDP...Billions.of.US... 0.7622453
## Population 0.8965122
## Urban.Population 1.0000000
ggcorrplot(cor_matrix, method = "circle", lab = TRUE, lab_size = 3, title = "Correlation Matrix")
Fertility Rate (Births per Woman):
Negatively correlated with:
GDP (Billions of US $) (-0.49): Countries with higher GDP tend to have lower fertility rates.
Population (-0.42): Higher population countries generally have lower fertility rates.
Urban Population (-0.51): Urbanization is associated with reduced fertility rates.
GDP:
Positively correlated with:
Population (0.48): Larger populations tend to have higher GDPs.
Urban Population (0.76): Urbanization strongly correlates with higher GDP, reflecting economic growth tied to urban centers.
Population:
Positively correlated with:
Urban Population (0.90): Countries with larger populations also tend to have higher urban populations.
Urban Population:
Strongly correlated with GDP and Population, reinforcing the relationship between urbanization, population size, and economic development.
Key Insights: There is a clear negative relationship between fertility rates and the economic and urban indicators, consistent with development and urbanization being associated with lower fertility rates. Urban population is strongly tied to both GDP and population size, as shown by the strong correlations. This analysis supports the broader hypothesis that demographic and economic transitions are closely linked.
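The coefficients discussed above are Pearson correlations. As a small worked illustration of what `cor()` computes, the sketch below reproduces the value by hand on made-up vectors, which are not the report's data.

```r
# Made-up vectors, only to show the computation; not the report's data.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(6.1, 5.2, 4.8, 3.9, 3.1, 2.2)

# Pearson r: co-variation of x and y scaled by the variation in each
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_manual
cor(x, y)                # same value from the built-in
cor.test(x, y)$p.value   # test of the null hypothesis of zero correlation
```

Because the matrix in this report pools all countries and years, each coefficient mixes differences between countries with changes over time within a country.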
merged_data_gdp <- combined_fertility_data %>%
left_join(combined_gdp_data, by = c("country_name", "date"))
# Run regression: Fertility Rate vs. GDP
country_models_gdp <- merged_data_gdp %>%
group_by(country_name) %>%
do(model = lm(Births.per.Woman ~ GDP...Billions.of.US..., data = .))
model_summaries_gdp <- country_models_gdp %>%
summarise(
country_name,
model_summary = list(summary(model)),
confint = list(confint(model, level = 0.95)) # Add confidence intervals
)
# Print results for each country
for (i in 1:nrow(model_summaries_gdp)) {
cat("Country:", model_summaries_gdp$country_name[i], "\n")
print(model_summaries_gdp$model_summary[[i]])
cat("Confidence Intervals (95%):\n")
print(model_summaries_gdp$confint[[i]])
cat("\n")
}
## Country: Bangladesh
##
## Call:
## lm(formula = Births.per.Woman ~ GDP...Billions.of.US..., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9664 -1.4030 0.2369 1.3193 2.1942
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.531044 0.209800 26.363 < 2e-16 ***
## GDP...Billions.of.US... -0.012541 0.001573 -7.973 4.89e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.358 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5103, Adjusted R-squared: 0.5023
## F-statistic: 63.57 on 1 and 61 DF, p-value: 4.888e-11
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 5.11152307 5.950564663
## GDP...Billions.of.US... -0.01568588 -0.009395725
##
## Country: China
##
## Call:
## lm(formula = Births.per.Woman ~ GDP...Billions.of.US..., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.668 -1.233 -0.589 1.067 2.860
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.452e+00 2.261e-01 15.269 < 2e-16 ***
## GDP...Billions.of.US... -1.590e-04 3.903e-05 -4.072 0.000136 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.519 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.2137, Adjusted R-squared: 0.2009
## F-statistic: 16.58 on 1 and 61 DF, p-value: 0.0001364
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 2.9995988169 3.903676e+00
## GDP...Billions.of.US... -0.0002370127 -8.090344e-05
##
## Country: Egypt
##
## Call:
## lm(formula = Births.per.Woman ~ GDP...Billions.of.US..., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6934 -0.8831 0.2757 0.6862 1.4725
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.4348670 0.1521020 35.732 < 2e-16 ***
## GDP...Billions.of.US... -0.0078373 0.0009681 -8.096 3.01e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9306 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5179, Adjusted R-squared: 0.51
## F-statistic: 65.54 on 1 and 61 DF, p-value: 3.013e-11
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 5.130720138 5.739013829
## GDP...Billions.of.US... -0.009773115 -0.005901437
##
## Country: Japan
##
## Call:
## lm(formula = Births.per.Woman ~ GDP...Billions.of.US..., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.160696 -0.067655 0.009144 0.043463 0.202611
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.034e+00 1.743e-02 116.69 <2e-16 ***
## GDP...Billions.of.US... -1.341e-04 4.830e-06 -27.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08139 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.9267, Adjusted R-squared: 0.9255
## F-statistic: 771.3 on 1 and 61 DF, p-value: < 2.2e-16
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 1.9989708370 2.0686739102
## GDP...Billions.of.US... -0.0001438083 -0.0001244905
##
## Country: Niger
##
## Call:
## lm(formula = Births.per.Woman ~ GDP...Billions.of.US..., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32325 -0.16246 0.05984 0.14845 0.21632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.789679 0.028945 269.12 <2e-16 ***
## GDP...Billions.of.US... -0.058786 0.004992 -11.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1633 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.6945, Adjusted R-squared: 0.6895
## F-statistic: 138.7 on 1 and 61 DF, p-value: < 2.2e-16
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 7.73179960 7.84755884
## GDP...Billions.of.US... -0.06876775 -0.04880395
##
## Country: United States
##
## Call:
## lm(formula = Births.per.Woman ~ GDP...Billions.of.US..., data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50632 -0.25975 0.00922 0.09319 1.11102
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.348e+00 7.286e-02 32.228 < 2e-16 ***
## GDP...Billions.of.US... -2.967e-05 6.758e-06 -4.391 4.57e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3746 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.2402, Adjusted R-squared: 0.2277
## F-statistic: 19.28 on 1 and 61 DF, p-value: 4.568e-05
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 2.202412e+00 2.493793e+00
## GDP...Billions.of.US... -4.318991e-05 -1.616109e-05
| Country | GDP coefficient | p-value | \(R^2\) (share of fertility-rate variation explained by GDP) |
|---|---|---|---|
| Bangladesh | -0.012541 | 4.89e-11 (highly significant) | 0.5103 (51.03%) |
| China | -1.590e-04 | 0.000136 (highly significant) | 0.2137 (21.37%) |
| Egypt | -0.0078373 | 3.01e-11 (highly significant) | 0.5179 (51.79%) |
| Japan | -1.341e-04 | < 2e-16 (highly significant) | 0.9267 (92.67%) |
| Niger | -0.058786 | < 2e-16 (highly significant) | 0.6945 (69.45%) |
| United States | -2.967e-05 | 4.57e-05 (highly significant) | 0.2402 (24.02%) |

The results from the regression analysis demonstrate a significant relationship between GDP and fertility rates in all the countries tested. The p-values for GDP are all well below 0.05, which leads us to reject the null hypothesis in every case.
For Japan, Niger, Egypt, and Bangladesh, GDP explains half or more of the variation in fertility rates, with Japan showing the highest explanatory power at 92.67%. While the relationship is statistically significant across the board, its strength (as indicated by the \(R^2\) values) varies, with China (21.37%) and the United States (24.02%) showing weaker relationships.
The residuals indicate that the models fit the data reasonably well in most cases, with some deviations, particularly in countries with weaker relationships like the U.S. and China. The residuals for Japan and Niger, however, suggest a very good fit.
In conclusion, GDP is significantly associated with fertility rates across the countries analyzed, but the strength of this relationship varies widely. The relationship is strongest in Japan and Niger, which sit at opposite ends of the income range, and weakest in the United States and China. Because this is an observational study, these results describe association rather than a causal effect of GDP.
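The per-country quantities discussed above (slope, p-value, \(R^2\), and confidence interval) can also be collected into a single data frame instead of being read off the printed summaries. The following is a minimal base R sketch under that assumption. `fit_by_group()` and its arguments are hypothetical, and the toy data is made up with a known slope of about -2.

```r
# Fit y ~ x separately per group and collect slope, p-value, R^2, and 95% CI.
# fit_by_group() and its argument names are hypothetical helpers for illustration.
fit_by_group <- function(df, y, x, group) {
  pieces <- split(df, df[[group]])
  rows <- lapply(names(pieces), function(g) {
    model <- lm(reformulate(x, response = y), data = pieces[[g]])
    s <- summary(model)
    ci <- confint(model, level = 0.95)
    data.frame(
      group = g,
      slope = unname(coef(model)[2]),
      p_value = s$coefficients[2, "Pr(>|t|)"],
      r_squared = s$r.squared,
      ci_low = ci[2, 1],
      ci_high = ci[2, 2]
    )
  })
  do.call(rbind, rows)
}

# Made-up data with a known negative slope of about -2 in each group.
toy <- data.frame(
  g = rep(c("A", "B"), each = 6),
  x = rep(1:6, times = 2),
  y = c(10 - 2 * (1:6) + c(0.1, -0.1, 0.2, -0.2, 0.1, -0.1),
        8 - 2 * (1:6) + c(-0.1, 0.1, -0.2, 0.2, -0.1, 0.1))
)
fit_by_group(toy, y = "y", x = "x", group = "g")
```

On the report's data, a call along the lines of `fit_by_group(merged_data_gdp, "Births.per.Woman", "GDP...Billions.of.US...", "country_name")` would be the analogous use, and the resulting slopes could feed a coefficient plot without retyping them.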
gdp_coefficients_df <- data.frame(
country_name = c("Bangladesh", "China", "Egypt", "Japan", "Niger", "United States"),
coefficient = c(-0.012541, -1.590e-04, -0.0078373, -1.341e-04, -0.058786, -2.967e-05)
)
ggplot(gdp_coefficients_df, aes(x = reorder(country_name, coefficient), y = coefficient)) +
geom_bar(stat = "identity", fill = "lightgreen") +
coord_flip() +
labs(
title = "Effect of GDP on Fertility Rate by Country",
x = "Country",
y = "Coefficient of GDP"
) +
theme_minimal()
Hypothesis 2: Regression: Fertility Rate vs. Urbanization
# Merge relevant data (fertility rate, GDP, population, and urban population)
merged_data <- combined_fertility_data %>%
left_join(combined_gdp_data, by = c("country_name", "date")) %>%
left_join(combined_population_data, by = c("country_name", "date")) %>%
left_join(combined_urbanization_data, by = c("country_name", "date"))
# Run regression: Fertility rate vs. urban population
country_models_urban <- merged_data %>%
group_by(country_name) %>%
do(model = lm(Births.per.Woman ~ Urban.Population, data = .))
# Model summaries and confidence intervals
model_summaries_urban <- country_models_urban %>%
summarise(
country_name,
model_summary = list(summary(model)),
confidence_intervals = list(confint(model))
)
for (i in 1:nrow(model_summaries_urban)) {
cat("Country:", model_summaries_urban$country_name[i], "\n")
print(model_summaries_urban$model_summary[[i]])
cat("\n")
cat("Confidence Intervals (95%):\n")
print(model_summaries_urban$confidence_intervals[[i]])
cat("\n")
}
## Country: Bangladesh
##
## Call:
## lm(formula = Births.per.Woman ~ Urban.Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8933 -0.5679 0.1041 0.4577 1.3617
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.997e+00 1.235e-01 56.65 <2e-16 ***
## Urban.Population -9.422e-08 3.829e-09 -24.61 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.587 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.9085, Adjusted R-squared: 0.907
## F-statistic: 605.6 on 1 and 61 DF, p-value: < 2.2e-16
##
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 6.750218e+00 7.244209e+00
## Urban.Population -1.018743e-07 -8.656208e-08
##
## Country: China
##
## Call:
## lm(formula = Births.per.Woman ~ Urban.Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3635 -1.0731 -0.3049 1.0053 2.0697
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.918e+00 2.702e-01 18.200 < 2e-16 ***
## Urban.Population -5.028e-09 5.857e-10 -8.585 4.36e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.153 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5471, Adjusted R-squared: 0.5397
## F-statistic: 73.7 on 1 and 61 DF, p-value: 4.365e-12
##
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 4.377556e+00 5.458202e+00
## Urban.Population -6.198969e-09 -3.856727e-09
##
## Country: Egypt
##
## Call:
## lm(formula = Births.per.Woman ~ Urban.Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8023 -0.4461 0.1369 0.2948 0.9268
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.648e+00 1.580e-01 48.42 <2e-16 ***
## Urban.Population -1.133e-07 5.523e-09 -20.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4768 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.8734, Adjusted R-squared: 0.8714
## F-statistic: 421 on 1 and 61 DF, p-value: < 2.2e-16
##
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 7.332175e+00 7.963901e+00
## Urban.Population -1.243580e-07 -1.022715e-07
##
## Country: Japan
##
## Call:
## lm(formula = Births.per.Woman ~ Urban.Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.22311 -0.09698 0.02950 0.07112 0.26859
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.140e+00 8.557e-02 36.69 <2e-16 ***
## Urban.Population -1.579e-08 8.882e-10 -17.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1209 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.8383, Adjusted R-squared: 0.8356
## F-statistic: 316.2 on 1 and 61 DF, p-value: < 2.2e-16
##
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 2.968527e+00 3.310727e+00
## Urban.Population -1.756894e-08 -1.401699e-08
##
## Country: Niger
##
## Call:
## lm(formula = Births.per.Woman ~ Urban.Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.44860 -0.18782 0.09362 0.17200 0.23158
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.840e+00 4.432e-02 176.892 < 2e-16 ***
## Urban.Population -1.796e-07 2.229e-08 -8.057 3.51e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2057 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.5155, Adjusted R-squared: 0.5076
## F-statistic: 64.91 on 1 and 61 DF, p-value: 3.512e-11
##
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 7.750994e+00 7.928235e+00
## Urban.Population -2.241239e-07 -1.349942e-07
##
## Country: United States
##
## Call:
## lm(formula = Births.per.Woman ~ Urban.Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52687 -0.26031 0.05616 0.13296 0.93667
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.205e+00 1.953e-01 16.410 < 2e-16 ***
## Urban.Population -5.526e-09 9.559e-10 -5.781 2.72e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3454 on 61 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.354, Adjusted R-squared: 0.3434
## F-statistic: 33.42 on 1 and 61 DF, p-value: 2.716e-07
##
##
## Confidence Intervals (95%):
## 2.5 % 97.5 %
## (Intercept) 2.814622e+00 3.595763e+00
## Urban.Population -7.437610e-09 -3.614797e-09
# Slope estimates for Urban.Population, copied from the model summaries above
urban_coefficients_df <- data.frame(
  country_name = c("Bangladesh", "China", "Egypt", "Japan", "Niger", "United States"),
  coefficient = c(-9.422e-08, -5.028e-09, -1.133e-07, -1.579e-08, -1.796e-07, -5.526e-09)
)
ggplot(urban_coefficients_df, aes(x = reorder(country_name, coefficient), y = coefficient)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  coord_flip() + # Flip coordinates for better visibility
  labs(
    title = "Effect of Urbanization on Fertility Rate by Country",
    x = "Country",
    y = "Coefficient of Urban Population"
  ) +
  theme_minimal()
Bangladesh: coefficient -9.422e-08; p-value < 2e-16 (highly significant); \(R^2\) = 0.9085 (Urban Population explains 90.85% of the variation in fertility rate)
China: coefficient -5.028e-09; p-value 4.36e-12 (highly significant); \(R^2\) = 0.5471 (Urban Population explains 54.71% of the variation in fertility rate)
Egypt: coefficient -1.133e-07; p-value < 2e-16 (highly significant); \(R^2\) = 0.8734 (Urban Population explains 87.34% of the variation in fertility rate)
Japan: coefficient -1.579e-08; p-value < 2e-16 (highly significant); \(R^2\) = 0.8383 (Urban Population explains 83.83% of the variation in fertility rate)
Niger: coefficient -1.796e-07; p-value 3.51e-11 (highly significant); \(R^2\) = 0.5155 (Urban Population explains 51.55% of the variation in fertility rate)
United States: coefficient -5.526e-09; p-value 2.72e-07 (highly significant); \(R^2\) = 0.354 (Urban Population explains 35.4% of the variation in fertility rate)
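The per-country figures above were typed in by hand from the model summaries. As a minimal sketch of how the same table could be built directly from fitted models, the example below uses a small made-up data set (the countries "A" and "B" and all values in `toy` are hypothetical; only the column names follow this report) and base R only.

```r
# Hypothetical toy data standing in for the merged fertility/urbanization table.
# Only the column names mirror this report; the values are simulated.
set.seed(1)
toy <- data.frame(
  country_name = rep(c("A", "B"), each = 20),
  Urban.Population = rep(seq(1e6, 2e7, length.out = 20), times = 2)
)
toy$Births.per.Woman <- 6 - 2e-07 * toy$Urban.Population +
  rnorm(nrow(toy), sd = 0.2)

# Fit one simple linear model per country.
models <- lapply(split(toy, toy$country_name), function(d) {
  lm(Births.per.Woman ~ Urban.Population, data = d)
})

# Pull the slope, its p-value and R-squared out of each model summary.
coef_table <- do.call(rbind, lapply(names(models), function(nm) {
  s <- summary(models[[nm]])
  data.frame(
    country_name = nm,
    coefficient  = s$coefficients["Urban.Population", "Estimate"],
    p_value      = s$coefficients["Urban.Population", "Pr(>|t|)"],
    r_squared    = s$r.squared
  )
}))
print(coef_table)
```

A table built this way can be passed straight to `ggplot()`, which avoids copying estimates by hand when the data or the model changes.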
# Merge the relevant data for Fertility Rate and Population
merged_data_population <- combined_fertility_data %>%
  left_join(combined_population_data, by = c("country_name", "date"))
# Keep only rows where both variables are present
merged_data_population <- merged_data_population %>%
  filter(!is.na(Births.per.Woman) & !is.na(Population))
# Run the regression: Fertility Rate vs. Population
country_models_population <- merged_data_population %>%
  group_by(country_name) %>%
  do(model = lm(Births.per.Woman ~ Population, data = .))
# Model summaries and confidence intervals
model_summaries_population <- country_models_population %>%
  summarise(
    country_name,
    model_summary = list(summary(model)),
    conf_int = list(confint(model))
  )
# Print the summary and the 95% confidence interval for each country
for (i in seq_len(nrow(model_summaries_population))) {
  cat("Country:", model_summaries_population$country_name[i], "\n")
  print(model_summaries_population$model_summary[[i]])
  cat("\n")
  cat("Confidence Interval (95%) for Model Parameters:\n")
  print(model_summaries_population$conf_int[[i]])
  cat("\n")
}
## Country: Bangladesh
##
## Call:
## lm(formula = Births.per.Woman ~ Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2836 -0.3316 -0.1135 0.4228 0.9054
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.220e+00 1.657e-01 55.65 <2e-16 ***
## Population -4.370e-08 1.495e-09 -29.23 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5461 on 73 degrees of freedom
## Multiple R-squared: 0.9213, Adjusted R-squared: 0.9202
## F-statistic: 854.4 on 1 and 73 DF, p-value: < 2.2e-16
##
##
## Confidence Interval (95%) for Model Parameters:
## 2.5 % 97.5 %
## (Intercept) 8.890291e+00 9.550705e+00
## Population -4.667394e-08 -4.071552e-08
##
## Country: China
##
## Call:
## lm(formula = Births.per.Woman ~ Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.9440 -0.3945 -0.1001 0.4946 1.2974
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.833e+00 2.531e-01 38.85 <2e-16 ***
## Population -6.190e-09 2.323e-10 -26.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5789 on 73 degrees of freedom
## Multiple R-squared: 0.9068, Adjusted R-squared: 0.9055
## F-statistic: 710.3 on 1 and 73 DF, p-value: < 2.2e-16
##
##
## Confidence Interval (95%) for Model Parameters:
## 2.5 % 97.5 %
## (Intercept) 9.328920e+00 1.033792e+01
## Population -6.653303e-09 -5.727455e-09
##
## Country: Egypt
##
## Call:
## lm(formula = Births.per.Woman ~ Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.87464 -0.33059 0.07188 0.32104 0.93949
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.707e+00 1.292e-01 59.65 <2e-16 ***
## Population -4.847e-08 2.004e-09 -24.19 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4838 on 73 degrees of freedom
## Multiple R-squared: 0.8891, Adjusted R-squared: 0.8875
## F-statistic: 585 on 1 and 73 DF, p-value: < 2.2e-16
##
##
## Confidence Interval (95%) for Model Parameters:
## 2.5 % 97.5 %
## (Intercept) 7.449150e+00 7.964109e+00
## Population -5.246459e-08 -4.447656e-08
##
## Country: Japan
##
## Call:
## lm(formula = Births.per.Woman ~ Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.34457 -0.07359 -0.00070 0.10718 0.64608
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.560e+00 1.754e-01 31.7 <2e-16 ***
## Population -3.286e-08 1.514e-09 -21.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1823 on 73 degrees of freedom
## Multiple R-squared: 0.8658, Adjusted R-squared: 0.8639
## F-statistic: 470.9 on 1 and 73 DF, p-value: < 2.2e-16
##
##
## Confidence Interval (95%) for Model Parameters:
## 2.5 % 97.5 %
## (Intercept) 5.210239e+00 5.909426e+00
## Population -3.587897e-08 -2.984279e-08
##
## Country: Niger
##
## Call:
## lm(formula = Births.per.Woman ~ Population, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.53236 -0.19215 0.05962 0.23109 0.30805
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.795e+00 5.107e-02 152.65 < 2e-16 ***
## Population -3.007e-08 4.153e-09 -7.24 3.73e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.252 on 73 degrees of freedom
## Multiple R-squared: 0.4179, Adjusted R-squared: 0.4099
## F-statistic: 52.41 on 1 and 73 DF, p-value: 3.735e-10
##
##
## Confidence Interval (95%) for Model Parameters:
## 2.5 % 97.5 %
## (Intercept) 7.693625e+00 7.897178e+00
## Population -3.834316e-08 -2.178941e-08
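The 95% confidence intervals printed above follow from the estimate, its standard error, and the residual degrees of freedom. As a small check, the sketch below recomputes the interval for the Bangladesh Population slope from the rounded values shown in the summary; because those inputs are rounded, the result should agree with the `confint()` output only to about three significant figures.

```r
# Values copied from the Bangladesh model summary above (rounded as printed).
estimate  <- -4.370e-08
std_error <- 1.495e-09
df_resid  <- 73

# 95% interval: estimate +/- t quantile * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
print(ci)
```

Since the whole interval lies below zero, it points to the same conclusion as the p-value: a slope of zero is not plausible under this model.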
United States: coefficient -2.967e-05; p-value 4.57e-05 (highly significant); \(R^2\) = 0.2402 (Population explains 24.02% of the variation in fertility rate)
Bangladesh: coefficient -3.553e-08; p-value <2e-16 (highly significant); \(R^2\) = 0.9377 (Population explains 93.77% of the variation in fertility rate)
China: coefficient -4.426e-09; p-value <2e-16 (highly significant); \(R^2\) = 0.5536 (Population explains 55.36% of the variation in fertility rate)
Egypt: coefficient -2.416e-08; p-value <2e-16 (highly significant); \(R^2\) = 0.8527 (Population explains 85.27% of the variation in fertility rate)
Japan: coefficient -9.243e-09; p-value 2.06e-08 (highly significant); \(R^2\) = 0.1908 (Population explains 19.08% of the variation in fertility rate)
Niger: coefficient -3.836e-08; p-value <2e-16 (highly significant); \(R^2\) = 0.9351 (Population explains 93.51% of the variation in fertility rate)
Statistical Significance: The p-values for all countries are very small (less than 0.05), so we reject the null hypothesis. This means population size has a significant effect on fertility rates in all the countries analyzed.
Strength of the Relationship: The relationship is strongest in Bangladesh, Egypt, and Niger, where population explains most of the variation in fertility rates. In Japan, the relationship is weaker, with population explaining less of the variation.
Model Fit: The residuals show that the models fit the data well, though there are some outliers, especially in China and Japan. This suggests that while population size is important, other factors may also influence fertility rates.
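One way to make the remark about outliers concrete is to look at standardized residuals. The sketch below is only an illustration on simulated data (the `toy` values and the injected outlier are hypothetical, and the cutoff of 2 is a common rule of thumb rather than something used in this report); the same calls would apply to any of the fitted country models.

```r
# Simulated data with the same column names as the Population model.
set.seed(42)
toy <- data.frame(Population = seq(5e7, 1.5e8, length.out = 40))
toy$Births.per.Woman <- 9 - 4e-08 * toy$Population + rnorm(40, sd = 0.3)

# Inject one artificial outlier so there is something to detect.
toy$Births.per.Woman[10] <- toy$Births.per.Woman[10] + 3

fit <- lm(Births.per.Woman ~ Population, data = toy)

# Standardized residuals; |value| > 2 is treated here as "unusual".
std_res <- rstandard(fit)
flagged <- which(abs(std_res) > 2)
print(toy[flagged, ])
```

Calling `plot(fit, which = 1)` on a model gives the usual residuals-versus-fitted plot, which also shows whether a straight line is a reasonable shape for the trend.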
population_coefficients_df <- data.frame(
  country_name = c("Bangladesh", "China", "Egypt", "Japan", "Niger", "United States"),
  coefficient = c(-3.553e-08, -4.426e-09, -2.416e-08, -9.243e-09, -3.836e-08, -2.967e-05)
)
ggplot(population_coefficients_df, aes(x = reorder(country_name, coefficient), y = coefficient)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  coord_flip() + # Flip coordinates for better visibility
  labs(
    title = "Effect of Population on Fertility Rate by Country",
    x = "Country",
    y = "Coefficient of Population"
  ) +
  theme_minimal()
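I omitted the United States from the data frame below because its coefficient (-2.967e-05) is a massive outlier, roughly three orders of magnitude larger than the others, which flattens every other bar in the plot.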
# Create a data frame for Population coefficients
population_coefficients_df <- data.frame(
  country_name = c("Bangladesh", "China", "Egypt", "Japan", "Niger"),
  coefficient = c(-3.553e-08, -4.426e-09, -2.416e-08, -9.243e-09, -3.836e-08)
)
# Create a bar plot for the coefficient of Population vs. Fertility Rate by Country
ggplot(population_coefficients_df, aes(x = reorder(country_name, coefficient), y = coefficient)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  coord_flip() + # Flip coordinates for better visibility
  labs(
    title = "Effect of Population on Fertility Rate by Country",
    x = "Country",
    y = "Coefficient of Population"
  ) +
  theme_minimal()
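The raw coefficients in the plots above are expressed per one unit of Population, so their size depends on the unit and scale of the predictor as well as on the strength of the relationship. One possible explanation for an outlying coefficient, which is an assumption we have not verified here, is a predictor recorded on a different scale. The sketch below uses simulated data (all values in `toy` are hypothetical) to show that rescaling the predictor changes the slope but not the fit, and that a standardized slope gives a unit-free number that is easier to compare across countries.

```r
# Simulated data with the same column names as the Population model.
set.seed(7)
toy <- data.frame(Population = seq(5e7, 1.5e8, length.out = 30))
toy$Births.per.Woman <- 9 - 4e-08 * toy$Population + rnorm(30, sd = 0.3)

# Same predictor expressed in millions of people.
toy$Population.Millions <- toy$Population / 1e6

fit_raw <- lm(Births.per.Woman ~ Population, data = toy)
fit_mil <- lm(Births.per.Woman ~ Population.Millions, data = toy)

slope_raw <- unname(coef(fit_raw)["Population"])
slope_mil <- unname(coef(fit_mil)["Population.Millions"])

# Standardized slope: change in fertility (in SDs) per one SD of population.
# In a one-predictor regression this equals the Pearson correlation.
slope_std <- slope_raw * sd(toy$Population) / sd(toy$Births.per.Woman)

print(c(raw = slope_raw, per_million = slope_mil, standardized = slope_std))
```

Plotting the standardized slope (or \(R^2\)) instead of the raw coefficient would let all six countries share one chart without any single bar dominating the axis.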
In this analysis, we explored the relationship between fertility rates, urbanization, GDP, and population growth across six countries: Bangladesh, China, Egypt, Japan, Niger, and the United States.
Bangladesh, Egypt, and Niger show strong relationships, with high \(R^2\) values indicating that urbanization and population size explain a large portion of the variation in fertility rates. Government efforts to manage population growth through education and financial incentives, such as Bangladesh sending women to pursue higher education and Egypt’s $1000 marriage payment, play a role in these trends. These policies appear to be more focused on controlling population growth than the governments’ official explanations suggest.
In Japan, the relationship between population size and fertility rates is weaker. Despite urbanization, Japan’s extreme work culture, where long hours are expected, likely contributes to delayed marriages and lower fertility rates. This cultural pressure to prioritize career over family may explain why Japan doesn’t fully align with the trends seen in other countries.
China, while initially benefiting from the one-child policy, has recently shifted to a more open stance toward family planning. Still, urbanization remains a complex factor affecting fertility rates.
Overall, the p-values for all countries are highly significant, allowing us to reject the null hypothesis and conclude that urbanization and population size influence fertility rates. As we expand this analysis to a global scale, it’s likely that we will observe similar trends in other countries, where societal factors such as government policies and cultural norms shape population dynamics and fertility decisions.
Although we reject the null hypothesis as of December 2024, we may well fail to reject it in the coming decades. Looking at the overall global data, we see that there has indeed been a heavy crash in fertility since 1960, and every country explored in this project shows this downward trend. Only time will tell whether the conclusion of this experiment changes.
A future plan for expanding this project is to look at how the declining sperm count in men has affected fertility rates. What factors have affected male fertility from a biological perspective? We can explore the increase in microplastics in our environment, the lack of physical activity, and the increase in criteria for finding a partner.