A total fertility rate is the rate at which children are being born in a given country, estimating the total average number of children a woman will have over her lifetime. According to the World Population Prospects report from the United Nations, a total fertility rate of 2.1 births per woman is generally considered the threshold needed to maintain a stable population, without growth or decline. This is so each couple can replace themselves, with the extra .01 to account for external factors such as infant mortality and the fact that not everyone will have kids. When a population’s total fertility rate falls below 2.1, it is considered to be a decreasing population, and therefore risky for the long-term survival of such a population. In the Republic of China (Taiwan), data from the country’s statistical bureau measures the total fertility rate for 2023 as 0.865, well below 2.1, and one of the lowest total fertility rates in the world. This brings into question Taiwan’s relationship to societal factors that influence childbirth, such as education and nuptiality. In general, education, especially higher education, has been linked to delayed childbirth and lower fertility rates (World Economic Forum, 2023). At the same time, a lower rate of marriage can also negatively impact family planning. Is Taiwan’s fertility rate impacted by increasing levels of participation of women in higher education and declining marriage rates?
The declining fertility rate in Taiwan poses many issues for the country, including problems regarding the workforce and economic growth. A decreasing population due a decline in fertility rate can cause productivity to lessen and prevent economic expansion. Therefore, it is crucial to understand the role in which higher education and marriage rates play in the trend. While higher education empowers women and opens up career opportunities, it may also delay family planning and contribute to total fertility rates. Similarly, decreasing marriage rates further exacerbate the problem, as marriage is traditionally linked to family formation.
Our primary objective is to use linear regression to determine the effect women’s participation in higher education, and marriage rates, have on Taiwan’s fertility rate. Our secondary objective is to use time series forecasting to develop a model to illustrate the relationship between said factors, and forecast future fertility rates.
library("readxl") #import excel files
library(dplyr) #data manipulation, mutate()
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(zoo) #working with time series data
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(forecast) #time series forecasting
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
regression <- read_excel("C:/Users/nikol/Documents/Statistical Analysis/Project/regression.xlsx")
summary(regression)
## Year MDR EduRate TFR
## Min. :1997 Min. :0.4824 Min. :0.07898 Min. :0.865
## 1st Qu.:2004 1st Qu.:0.4974 1st Qu.:0.22025 1st Qu.:1.050
## Median :2010 Median :0.5120 Median :0.41527 Median :1.115
## Mean :2010 Mean :0.5226 Mean :0.44003 Mean :1.177
## 3rd Qu.:2016 3rd Qu.:0.5469 3rd Qu.:0.63624 3rd Qu.:1.252
## Max. :2023 Max. :0.5842 Max. :0.95019 Max. :1.770
pairs(~MDR + TFR + EduRate, data = regression) # Scatter Plot Correlation Matrix
corr <- cor(regression[ , -which(names(regression) == "Year")])
corr
## MDR EduRate TFR
## MDR 1.0000000 -0.9455417 0.8536562
## EduRate -0.9455417 1.0000000 -0.7603049
## TFR 0.8536562 -0.7603049 1.0000000
r_squared = corr^2
r_squared # We can see that our independent variables Marriage Rate and Education Rate are collinear. We will drop Education Attainment.
## MDR EduRate TFR
## MDR 1.0000000 0.8940491 0.7287289
## EduRate 0.8940491 1.0000000 0.5780636
## TFR 0.7287289 0.5780636 1.0000000
modelMDR <- lm(TFR ~ MDR, data = regression)
summary(modelMDR) # Coefficient = 6.24, P = 0.00
##
## Call:
## lm(formula = TFR ~ MDR, data = regression)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.21578 -0.10201 -0.02720 0.09028 0.22079
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.0870 0.3990 -5.231 2.05e-05 ***
## MDR 6.2454 0.7621 8.195 1.51e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1221 on 25 degrees of freedom
## Multiple R-squared: 0.7287, Adjusted R-squared: 0.7179
## F-statistic: 67.16 on 1 and 25 DF, p-value: 1.51e-08
# Regression Equation: TFR = -2.09 + 6.25x1
# Model Accuracy - R-Square is 0.73 - Model is significant.
forecasting <- read_excel("Forecasting.xlsx")
plot(forecasting$Year, forecasting$TFR, type = "o", col = "blue",
xlab = "Year", ylab = "Total Fertility Rate",
main = "Total Fertility Rate Over Time")
Interpretation: the time series plot exhibits a Trend Pattern as it is gradually decreasing.
tfr <- forecasting$TFR # Dependent variable
years <- forecasting$Year # Independent variable
#MSE
forecast_a <- tfr[-length(tfr)] # Exclude the last TFR
actual_a <- tfr[-1] # Exclude the first TFR
mse_a <- mean((actual_a - forecast_a)^2)
cat("Mean Square Error (MSE):", round(mse_a, 2), "\n")
## Mean Square Error (MSE): 0.01
# MAPE
mape_a <- mean(abs((actual_a - forecast_a) / actual_a)) * 100
cat("Mean Absolute Percentage Error (MAPE):", round(mape_a, 2), "%\n")
## Mean Absolute Percentage Error (MAPE): 6.43 %
# MAE
mae_a <- mean(abs(actual_a - forecast_a))
cat("Mean Absolute Error (MAE):", round(mae_a, 2), "\n")
## Mean Absolute Error (MAE): 0.09
#MSE
cumulative_average <- cumsum(tfr[-length(tfr)]) / (1:(length(tfr) - 1))
cumulative_average
## [1] 2.455000 2.387500 2.315000 2.250000 2.176000 2.093333 2.037143 2.014375
## [9] 1.977222 1.960500 1.938636 1.921250 1.908846 1.897857 1.889667 1.881563
## [17] 1.875000 1.852222 1.836579 1.828750 1.808333 1.787045 1.763043 1.738750
## [25] 1.713800 1.690769 1.668889 1.646786 1.625517 1.601167 1.583871 1.574062
## [33] 1.558636 1.547059 1.536429 1.526250 1.515405 1.503421 1.491795 1.479250
## [41] 1.466951 1.452738
forecast_b <- cumulative_average
actual_b <- tfr[-1] # Exclude the first TFR
mse_b <- mean((actual_b - forecast_b)^2)
cat("Mean Square Error (MSE):", round(mse_b, 2), "\n")
## Mean Square Error (MSE): 0.18
# MAPE
mape_b <- mean(abs((actual_b - forecast_b) / actual_b)) * 100
cat("Mean Absolute Percentage Error (MAPE):", round(mape_b, 2), "%\n")
## Mean Absolute Percentage Error (MAPE): 32.49 %
# MAE
mae_b <- mean(abs(actual_b - forecast_b))
cat("Mean Absolute Error (MAE):", round(mae_b, 2), "\n")
## Mean Absolute Error (MAE): 0.39
#Extract TFR and Year
forecasting1 <- data.frame(Year = forecasting$Year, TFR = forecasting$TFR)
forecasting1$avg_tfr3 <- c(
NA, NA, NA,
(forecasting1$TFR[1] + forecasting1$TFR[2] + forecasting1$TFR[3]) / 3,
(forecasting1$TFR[2] + forecasting1$TFR[3] + forecasting1$TFR[4]) / 3,
(forecasting1$TFR[3] + forecasting1$TFR[4] + forecasting1$TFR[5]) / 3,
(forecasting1$TFR[4] + forecasting1$TFR[5] + forecasting1$TFR[6]) / 3,
(forecasting1$TFR[5] + forecasting1$TFR[6] + forecasting1$TFR[7]) / 3,
(forecasting1$TFR[6] + forecasting1$TFR[7] + forecasting1$TFR[8]) / 3,
(forecasting1$TFR[7] + forecasting1$TFR[8] + forecasting1$TFR[9]) / 3,
(forecasting1$TFR[8] + forecasting1$TFR[9] + forecasting1$TFR[10]) / 3,
(forecasting1$TFR[9] + forecasting1$TFR[10] + forecasting1$TFR[11]) / 3,
(forecasting1$TFR[10] + forecasting1$TFR[11] + forecasting1$TFR[12]) / 3,
(forecasting1$TFR[11] + forecasting1$TFR[12] + forecasting1$TFR[13]) / 3,
(forecasting1$TFR[12] + forecasting1$TFR[13] + forecasting1$TFR[14]) / 3,
(forecasting1$TFR[13] + forecasting1$TFR[14] + forecasting1$TFR[15]) / 3,
(forecasting1$TFR[14] + forecasting1$TFR[15] + forecasting1$TFR[16]) / 3,
(forecasting1$TFR[15] + forecasting1$TFR[16] + forecasting1$TFR[17]) / 3,
(forecasting1$TFR[16] + forecasting1$TFR[17] + forecasting1$TFR[18]) / 3,
(forecasting1$TFR[17] + forecasting1$TFR[18] + forecasting1$TFR[19]) / 3,
(forecasting1$TFR[18] + forecasting1$TFR[19] + forecasting1$TFR[20]) / 3,
(forecasting1$TFR[19] + forecasting1$TFR[20] + forecasting1$TFR[21]) / 3,
(forecasting1$TFR[20] + forecasting1$TFR[21] + forecasting1$TFR[22]) / 3,
(forecasting1$TFR[21] + forecasting1$TFR[22] + forecasting1$TFR[23]) / 3,
(forecasting1$TFR[22] + forecasting1$TFR[23] + forecasting1$TFR[24]) / 3,
(forecasting1$TFR[23] + forecasting1$TFR[24] + forecasting1$TFR[25]) / 3,
(forecasting1$TFR[24] + forecasting1$TFR[25] + forecasting1$TFR[26]) / 3,
(forecasting1$TFR[25] + forecasting1$TFR[26] + forecasting1$TFR[27]) / 3,
(forecasting1$TFR[26] + forecasting1$TFR[27] + forecasting1$TFR[28]) / 3,
(forecasting1$TFR[27] + forecasting1$TFR[28] + forecasting1$TFR[29]) / 3,
(forecasting1$TFR[28] + forecasting1$TFR[29] + forecasting1$TFR[30]) / 3,
(forecasting1$TFR[29] + forecasting1$TFR[30] + forecasting1$TFR[31]) / 3,
(forecasting1$TFR[30] + forecasting1$TFR[31] + forecasting1$TFR[32]) / 3,
(forecasting1$TFR[31] + forecasting1$TFR[32] + forecasting1$TFR[33]) / 3,
(forecasting1$TFR[32] + forecasting1$TFR[33] + forecasting1$TFR[34]) / 3,
(forecasting1$TFR[33] + forecasting1$TFR[34] + forecasting1$TFR[35]) / 3,
(forecasting1$TFR[34] + forecasting1$TFR[35] + forecasting1$TFR[36]) / 3,
(forecasting1$TFR[35] + forecasting1$TFR[36] + forecasting1$TFR[37]) / 3,
(forecasting1$TFR[36] + forecasting1$TFR[37] + forecasting1$TFR[38]) / 3,
(forecasting1$TFR[37] + forecasting1$TFR[38] + forecasting1$TFR[39]) / 3,
(forecasting1$TFR[38] + forecasting1$TFR[39] + forecasting1$TFR[40]) / 3,
(forecasting1$TFR[39] + forecasting1$TFR[40] + forecasting1$TFR[41]) / 3,
(forecasting1$TFR[40] + forecasting1$TFR[41] + forecasting1$TFR[42]) / 3
)
# Calculate the squared errors (only for years where moving average is available)
forecasting1 <- forecasting1 %>%
mutate(
squared_error = ifelse(is.na(avg_tfr3), NA, (TFR - avg_tfr3)^2)
)
# Compute MSE (excluding the initial years with NA)
mse_ma <- mean(forecasting1$squared_error, na.rm = TRUE)
cat("Mean Square Error (MSE)", round(mse_ma,2), "\n")
## Mean Square Error (MSE) 0.02
#MAPE
mape_ma <- mean(abs((forecasting1$TFR[!is.na(forecasting1$avg_tfr3)] - forecasting1$avg_tfr3[!is.na(forecasting1$avg_tfr3)]) / forecasting1$TFR[!is.na(forecasting1$avg_tfr3)])) * 100
cat("Mean Absolute Percentage Error (MAPE):", round(mape_ma ,2), "%\n")
## Mean Absolute Percentage Error (MAPE): 7.95 %
# MAE
mae_ma <- mean(abs(forecasting1$TFR[!is.na(forecasting1$avg_tfr3)] - forecasting1$avg_tfr3[!is.na(forecasting1$avg_tfr3)]))
cat("Mean Absolute Error (MAE):", round(mae_ma, 2), "\n")
## Mean Absolute Error (MAE): 0.11
arima_model <- auto.arima(tfr)
summary(arima_model)
## Series: tfr
## ARIMA(0,1,0) with drift
##
## Coefficients:
## drift
## -0.0379
## s.e. 0.0173
##
## sigma^2 = 0.0129: log likelihood = 32.27
## AIC=-60.55 AICc=-60.24 BIC=-57.07
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 5.797339e-05 0.1109015 0.08460947 0.03704928 6.018981 0.9501598
## ACF1
## Training set -0.1651491
rmse_value <- accuracy(arima_model)["Training set", "RMSE"]
cat("Mean Square Error (MSE):", rmse_value^2, "\n")
## Mean Square Error (MSE): 0.01229915
# ETS
tfr_data <- c(tfr)
tfr_ts <- ts(tfr_data, start = 1981, frequency = 1)
# Model
ets_model <- ets(tfr_ts)
summary(ets_model)
## ETS(A,Ad,N)
##
## Call:
## ets(y = tfr_ts)
##
## Smoothing parameters:
## alpha = 0.2898
## beta = 0.2898
## phi = 0.8076
##
## Initial states:
## l = 2.7882
## b = -0.3214
##
## sigma: 0.109
##
## AIC AICc BIC
## -22.19980 -19.86646 -11.63260
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01209323 0.1024611 0.07186762 -1.282984 5.334802 0.8070695
## ACF1
## Training set 0.1204511
rmse_value_ets <- accuracy(ets_model)["Training set", "RMSE"]
cat("Mean Square Error (MSE):", rmse_value_ets^2, "\n")
## Mean Square Error (MSE): 0.01049828
plot(ets_model)
# Residual Independence:
Box.test(residuals(ets_model), lag = 10, type = "Ljung-Box") #P-value above 0.05 means we can fail to reject the null and conclude the residuals are independent.
##
## Box-Ljung test
##
## data: residuals(ets_model)
## X-squared = 13.053, df = 10, p-value = 0.2207
# Residual Normality:
shapiro.test(residuals(ets_model)) # P-value below 0.05 means that we reject the null and conclude that the residuals are not normally distributed. However, visual inspection of the data through a histogram shows rough normality.
##
## Shapiro-Wilk normality test
##
## data: residuals(ets_model)
## W = 0.93537, p-value = 0.01784
hist(residuals(ets_model), main = "Histogram of Residuals", xlab = "Residuals")
qqnorm(residuals(ets_model))
qqline(residuals(ets_model))
# Constant Variance:
plot(residuals(ets_model), main = "Residuals Over Time", ylab = "Residuals") # Since we can see the residuals are centered around 0 and fluctuate symmetrically we can confirm the residuals do have constant variance.
ets_forecast <- forecast(ets_model, h = 10)
summary(ets_forecast)
##
## Forecast method: ETS(A,Ad,N)
##
## Model Information:
## ETS(A,Ad,N)
##
## Call:
## ets(y = tfr_ts)
##
## Smoothing parameters:
## alpha = 0.2898
## beta = 0.2898
## phi = 0.8076
##
## Initial states:
## l = 2.7882
## b = -0.3214
##
## sigma: 0.109
##
## AIC AICc BIC
## -22.19980 -19.86646 -11.63260
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01209323 0.1024611 0.07186762 -1.282984 5.334802 0.8070695
## ACF1
## Training set 0.1204511
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2024 0.8285385 0.6888575 0.9682196 0.61491478 1.042162
## 2025 0.7953304 0.6376436 0.9530173 0.55416918 1.036492
## 2026 0.7685120 0.5820170 0.9550069 0.48329255 1.053731
## 2027 0.7468537 0.5246006 0.9691068 0.40694692 1.086760
## 2028 0.7293628 0.4676902 0.9910353 0.32916916 1.129556
## 2029 0.7152373 0.4126100 1.0178645 0.25240886 1.178066
## 2030 0.7038297 0.3599771 1.0476823 0.17795255 1.229707
## 2031 0.6946171 0.3100011 1.0792331 0.10639772 1.282837
## 2032 0.6871771 0.2626748 1.1116795 0.03795677 1.336398
## 2033 0.6811687 0.2178812 1.1444562 -0.02736840 1.389706
plot(ets_forecast)
fitted_values <- fitted(ets_model)
ets_model$states
## Time Series:
## Start = 1980
## End = 2023
## Frequency = 1
## l b
## 1980 2.7881927 -0.3214076111
## 1981 2.5072896 -0.2809030861
## 1982 2.2919020 -0.2153876880
## 1983 2.1330400 -0.1588619085
## 1984 2.0193096 -0.1137304566
## 1985 1.9137071 -0.1056024567
## 1986 1.7854089 -0.1282982754
## 1987 1.6870723 -0.0983365890
## 1988 1.6793399 -0.0077323985
## 1989 1.6750964 -0.0042435197
## 1990 1.7117592 0.0366628764
## 1991 1.7351751 0.0234158666
## 1992 1.7471052 0.0119301093
## 1993 1.7576847 0.0105794506
## 1994 1.7629743 0.0052896827
## 1995 1.7694934 0.0065190212
## 1996 1.7704810 0.0009876235
## 1997 1.7709080 0.0004270460
## 1998 1.6824971 -0.0884109012
## 1999 1.5948398 -0.0876573081
## 2000 1.5692454 -0.0255944455
## 2001 1.5055166 -0.0637287609
## 2002 1.4209970 -0.0845196407
## 2003 1.3186175 -0.1023794772
## 2004 1.2197258 -0.0988916484
## 2005 1.1326568 -0.0870690346
## 2006 1.0776022 -0.0550546495
## 2007 1.0525173 -0.0250848293
## 2008 1.0374006 -0.0151166829
## 2009 1.0265858 -0.0108148056
## 2010 0.9822480 -0.0443378749
## 2011 0.9808010 -0.0014469689
## 2012 1.0637845 0.0829834578
## 2013 1.1117310 0.0479465337
## 2014 1.1546682 0.0429372100
## 2015 1.1851868 0.0305185672
## 2016 1.1982891 0.0131022821
## 2017 1.1845637 -0.0137253785
## 2018 1.1405916 -0.0439720924
## 2019 1.0891173 -0.0514742433
## 2020 1.0308694 -0.0582478867
## 2021 0.9812704 -0.0495990778
## 2022 0.9205759 -0.0606945020
## 2023 0.8696586 -0.0509172242
# Phi Value
ets_model$par
## alpha beta phi l b
## 0.2898120 0.2898120 0.8075877 2.7881927 -0.3214076
# Interpretation: Taiwan's fertility rate has steadily declined from 1980 to 2020, with forecasts suggesting stabilization at a low level and increasing uncertainty in future trends.
Our analysis confirms a significant relationship between marriage rates and fertility in Taiwan. Time series forecasting shows a steady decline from 1980 to 2020, with predictions suggesting stabilization at historically low levels. These findings provide valuable insights for policymakers, indicating that addressing marriage trends could be crucial in managing Taiwan’s demographic challenges, while carefully balancing social progress and women’s advancement in modern society.
We recommend the Taiwanese government to implement the following:
Effect workplace policies that support married couples, such as more flexible work arrangements and paid parental leave. This would make family building more compatible with career advancement.
Develop housing assistance programs specifically for young married couples, easing one of the biggest barriers to family building.
Create incentive programs for companies that achieve high rates of employees who are married and/or have children. This encourages employers to have a more family-friendly corporate culture.