Introduction:

A total fertility rate is the rate at which children are being born in a given country, estimating the total average number of children a woman will have over her lifetime. According to the World Population Prospects report from the United Nations, a total fertility rate of 2.1 births per woman is generally considered the threshold needed to maintain a stable population, without growth or decline. This is so each couple can replace themselves, with the extra .01 to account for external factors such as infant mortality and the fact that not everyone will have kids. When a population’s total fertility rate falls below 2.1, it is considered to be a decreasing population, and therefore risky for the long-term survival of such a population. In the Republic of China (Taiwan), data from the country’s statistical bureau measures the total fertility rate for 2023 as 0.865, well below 2.1, and one of the lowest total fertility rates in the world. This brings into question Taiwan’s relationship to societal factors that influence childbirth, such as education and nuptiality. In general, education, especially higher education, has been linked to delayed childbirth and lower fertility rates (World Economic Forum, 2023). At the same time, a lower rate of marriage can also negatively impact family planning. Is Taiwan’s fertility rate impacted by increasing levels of participation of women in higher education and declining marriage rates?

Problem Statement:

The declining fertility rate in Taiwan poses many issues for the country, including problems regarding the workforce and economic growth. A decreasing population due a decline in fertility rate can cause productivity to lessen and prevent economic expansion. Therefore, it is crucial to understand the role in which higher education and marriage rates play in the trend. While higher education empowers women and opens up career opportunities, it may also delay family planning and contribute to total fertility rates. Similarly, decreasing marriage rates further exacerbate the problem, as marriage is traditionally linked to family formation.

Objective:

Our primary objective is to use linear regression to determine the effect women’s participation in higher education, and marriage rates, have on Taiwan’s fertility rate. Our secondary objective is to use time series forecasting to develop a model to illustrate the relationship between said factors, and forecast future fertility rates.

Regression

library("readxl") #import excel files
library(dplyr) #data manipulation, mutate()
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(zoo) #working with time series data
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(forecast) #time series forecasting
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
regression <- read_excel("C:/Users/nikol/Documents/Statistical Analysis/Project/regression.xlsx")

Descriptive Statistics and Data Visualization

summary(regression)
##       Year           MDR            EduRate             TFR       
##  Min.   :1997   Min.   :0.4824   Min.   :0.07898   Min.   :0.865  
##  1st Qu.:2004   1st Qu.:0.4974   1st Qu.:0.22025   1st Qu.:1.050  
##  Median :2010   Median :0.5120   Median :0.41527   Median :1.115  
##  Mean   :2010   Mean   :0.5226   Mean   :0.44003   Mean   :1.177  
##  3rd Qu.:2016   3rd Qu.:0.5469   3rd Qu.:0.63624   3rd Qu.:1.252  
##  Max.   :2023   Max.   :0.5842   Max.   :0.95019   Max.   :1.770
pairs(~MDR + TFR + EduRate, data = regression) # Scatter Plot Correlation Matrix

corr <- cor(regression[ , -which(names(regression) == "Year")])
corr
##                MDR    EduRate        TFR
## MDR      1.0000000 -0.9455417  0.8536562
## EduRate -0.9455417  1.0000000 -0.7603049
## TFR      0.8536562 -0.7603049  1.0000000
r_squared =  corr^2
r_squared # We can see that our independent variables Marriage Rate and Education Rate are collinear. We will drop Education Attainment.
##               MDR   EduRate       TFR
## MDR     1.0000000 0.8940491 0.7287289
## EduRate 0.8940491 1.0000000 0.5780636
## TFR     0.7287289 0.5780636 1.0000000

Building the Regression Model

modelMDR <- lm(TFR ~ MDR, data = regression)
summary(modelMDR) # Coefficient = 6.24, P = 0.00
## 
## Call:
## lm(formula = TFR ~ MDR, data = regression)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21578 -0.10201 -0.02720  0.09028  0.22079 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.0870     0.3990  -5.231 2.05e-05 ***
## MDR           6.2454     0.7621   8.195 1.51e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1221 on 25 degrees of freedom
## Multiple R-squared:  0.7287, Adjusted R-squared:  0.7179 
## F-statistic: 67.16 on 1 and 25 DF,  p-value: 1.51e-08
# Regression Equation: TFR = -2.09 + 6.25x1

# Model Accuracy - R-Square is 0.73 - Model is significant.

Forecasting

forecasting <- read_excel("Forecasting.xlsx")

Time series plot of TFR

plot(forecasting$Year, forecasting$TFR, type = "o", col = "blue", 
     xlab = "Year", ylab = "Total Fertility Rate", 
     main = "Total Fertility Rate Over Time")

Interpretation: the time series plot exhibits a Trend Pattern as it is gradually decreasing.

Naive Forecast

tfr <- forecasting$TFR  # Dependent variable
years <- forecasting$Year  # Independent variable

Part A: Most Recent Value as Forecast

#MSE
forecast_a <- tfr[-length(tfr)]  # Exclude the last TFR
actual_a <- tfr[-1]  # Exclude the first TFR
mse_a <- mean((actual_a - forecast_a)^2)  
cat("Mean Square Error (MSE):", round(mse_a, 2), "\n")
## Mean Square Error (MSE): 0.01
# MAPE
mape_a <- mean(abs((actual_a - forecast_a) / actual_a)) * 100 
cat("Mean Absolute Percentage Error (MAPE):", round(mape_a, 2), "%\n") 
## Mean Absolute Percentage Error (MAPE): 6.43 %
# MAE
mae_a <- mean(abs(actual_a - forecast_a))
cat("Mean Absolute Error (MAE):", round(mae_a, 2), "\n")  
## Mean Absolute Error (MAE): 0.09

Part B: Average of All Data as Forecast

#MSE
cumulative_average <- cumsum(tfr[-length(tfr)]) / (1:(length(tfr) - 1))
cumulative_average  
##  [1] 2.455000 2.387500 2.315000 2.250000 2.176000 2.093333 2.037143 2.014375
##  [9] 1.977222 1.960500 1.938636 1.921250 1.908846 1.897857 1.889667 1.881563
## [17] 1.875000 1.852222 1.836579 1.828750 1.808333 1.787045 1.763043 1.738750
## [25] 1.713800 1.690769 1.668889 1.646786 1.625517 1.601167 1.583871 1.574062
## [33] 1.558636 1.547059 1.536429 1.526250 1.515405 1.503421 1.491795 1.479250
## [41] 1.466951 1.452738
forecast_b <- cumulative_average  
actual_b <- tfr[-1]  # Exclude the first TFR
mse_b <- mean((actual_b - forecast_b)^2) 
cat("Mean Square Error (MSE):", round(mse_b, 2), "\n")
## Mean Square Error (MSE): 0.18
# MAPE
mape_b <- mean(abs((actual_b - forecast_b) / actual_b)) * 100  
cat("Mean Absolute Percentage Error (MAPE):", round(mape_b, 2), "%\n")
## Mean Absolute Percentage Error (MAPE): 32.49 %
# MAE
mae_b <- mean(abs(actual_b - forecast_b))  
cat("Mean Absolute Error (MAE):", round(mae_b, 2), "\n") 
## Mean Absolute Error (MAE): 0.39

Moving Average(3-Year)

#Extract TFR and Year
forecasting1 <- data.frame(Year = forecasting$Year, TFR = forecasting$TFR)
forecasting1$avg_tfr3 <- c(
  NA, NA, NA,  
  (forecasting1$TFR[1] + forecasting1$TFR[2] + forecasting1$TFR[3]) / 3,
  (forecasting1$TFR[2] + forecasting1$TFR[3] + forecasting1$TFR[4]) / 3,
  (forecasting1$TFR[3] + forecasting1$TFR[4] + forecasting1$TFR[5]) / 3,
  (forecasting1$TFR[4] + forecasting1$TFR[5] + forecasting1$TFR[6]) / 3,
  (forecasting1$TFR[5] + forecasting1$TFR[6] + forecasting1$TFR[7]) / 3,
  (forecasting1$TFR[6] + forecasting1$TFR[7] + forecasting1$TFR[8]) / 3,
  (forecasting1$TFR[7] + forecasting1$TFR[8] + forecasting1$TFR[9]) / 3,
  (forecasting1$TFR[8] + forecasting1$TFR[9] + forecasting1$TFR[10]) / 3,
  (forecasting1$TFR[9] + forecasting1$TFR[10] + forecasting1$TFR[11]) / 3, 
  (forecasting1$TFR[10] + forecasting1$TFR[11] + forecasting1$TFR[12]) / 3,
  (forecasting1$TFR[11] + forecasting1$TFR[12] + forecasting1$TFR[13]) / 3,
  (forecasting1$TFR[12] + forecasting1$TFR[13] + forecasting1$TFR[14]) / 3,
  (forecasting1$TFR[13] + forecasting1$TFR[14] + forecasting1$TFR[15]) / 3,
  (forecasting1$TFR[14] + forecasting1$TFR[15] + forecasting1$TFR[16]) / 3,
  (forecasting1$TFR[15] + forecasting1$TFR[16] + forecasting1$TFR[17]) / 3,
  (forecasting1$TFR[16] + forecasting1$TFR[17] + forecasting1$TFR[18]) / 3,
  (forecasting1$TFR[17] + forecasting1$TFR[18] + forecasting1$TFR[19]) / 3,
  (forecasting1$TFR[18] + forecasting1$TFR[19] + forecasting1$TFR[20]) / 3,
  (forecasting1$TFR[19] + forecasting1$TFR[20] + forecasting1$TFR[21]) / 3,
  (forecasting1$TFR[20] + forecasting1$TFR[21] + forecasting1$TFR[22]) / 3,
  (forecasting1$TFR[21] + forecasting1$TFR[22] + forecasting1$TFR[23]) / 3,
  (forecasting1$TFR[22] + forecasting1$TFR[23] + forecasting1$TFR[24]) / 3,
  (forecasting1$TFR[23] + forecasting1$TFR[24] + forecasting1$TFR[25]) / 3,
  (forecasting1$TFR[24] + forecasting1$TFR[25] + forecasting1$TFR[26]) / 3,
  (forecasting1$TFR[25] + forecasting1$TFR[26] + forecasting1$TFR[27]) / 3,
  (forecasting1$TFR[26] + forecasting1$TFR[27] + forecasting1$TFR[28]) / 3,
  (forecasting1$TFR[27] + forecasting1$TFR[28] + forecasting1$TFR[29]) / 3,
  (forecasting1$TFR[28] + forecasting1$TFR[29] + forecasting1$TFR[30]) / 3,
  (forecasting1$TFR[29] + forecasting1$TFR[30] + forecasting1$TFR[31]) / 3,
  (forecasting1$TFR[30] + forecasting1$TFR[31] + forecasting1$TFR[32]) / 3,
  (forecasting1$TFR[31] + forecasting1$TFR[32] + forecasting1$TFR[33]) / 3,
  (forecasting1$TFR[32] + forecasting1$TFR[33] + forecasting1$TFR[34]) / 3,
  (forecasting1$TFR[33] + forecasting1$TFR[34] + forecasting1$TFR[35]) / 3,
  (forecasting1$TFR[34] + forecasting1$TFR[35] + forecasting1$TFR[36]) / 3,
  (forecasting1$TFR[35] + forecasting1$TFR[36] + forecasting1$TFR[37]) / 3,
  (forecasting1$TFR[36] + forecasting1$TFR[37] + forecasting1$TFR[38]) / 3,
  (forecasting1$TFR[37] + forecasting1$TFR[38] + forecasting1$TFR[39]) / 3,
  (forecasting1$TFR[38] + forecasting1$TFR[39] + forecasting1$TFR[40]) / 3,
  (forecasting1$TFR[39] + forecasting1$TFR[40] + forecasting1$TFR[41]) / 3,
  (forecasting1$TFR[40] + forecasting1$TFR[41] + forecasting1$TFR[42]) / 3
)

# Calculate the squared errors (only for years where moving average is available)
forecasting1 <- forecasting1 %>%
  mutate(
    squared_error = ifelse(is.na(avg_tfr3), NA, (TFR - avg_tfr3)^2)
  )
# Compute MSE (excluding the initial years with NA)
mse_ma <- mean(forecasting1$squared_error, na.rm = TRUE)
cat("Mean Square Error (MSE)", round(mse_ma,2), "\n")  
## Mean Square Error (MSE) 0.02
#MAPE
mape_ma <- mean(abs((forecasting1$TFR[!is.na(forecasting1$avg_tfr3)] - forecasting1$avg_tfr3[!is.na(forecasting1$avg_tfr3)]) / forecasting1$TFR[!is.na(forecasting1$avg_tfr3)])) * 100
cat("Mean Absolute Percentage Error (MAPE):", round(mape_ma ,2), "%\n") 
## Mean Absolute Percentage Error (MAPE): 7.95 %
# MAE 
mae_ma <- mean(abs(forecasting1$TFR[!is.na(forecasting1$avg_tfr3)] - forecasting1$avg_tfr3[!is.na(forecasting1$avg_tfr3)]))  
cat("Mean Absolute Error (MAE):", round(mae_ma, 2), "\n") 
## Mean Absolute Error (MAE): 0.11

ARIMA

arima_model <- auto.arima(tfr)
summary(arima_model)
## Series: tfr 
## ARIMA(0,1,0) with drift 
## 
## Coefficients:
##         drift
##       -0.0379
## s.e.   0.0173
## 
## sigma^2 = 0.0129:  log likelihood = 32.27
## AIC=-60.55   AICc=-60.24   BIC=-57.07
## 
## Training set error measures:
##                        ME      RMSE        MAE        MPE     MAPE      MASE
## Training set 5.797339e-05 0.1109015 0.08460947 0.03704928 6.018981 0.9501598
##                    ACF1
## Training set -0.1651491
rmse_value <- accuracy(arima_model)["Training set", "RMSE"]
cat("Mean Square Error (MSE):", rmse_value^2, "\n")
## Mean Square Error (MSE): 0.01229915

Exponential Smoothing

# ETS
tfr_data <- c(tfr)
tfr_ts <- ts(tfr_data, start = 1981, frequency = 1)
# Model
ets_model <- ets(tfr_ts)
summary(ets_model)
## ETS(A,Ad,N) 
## 
## Call:
## ets(y = tfr_ts)
## 
##   Smoothing parameters:
##     alpha = 0.2898 
##     beta  = 0.2898 
##     phi   = 0.8076 
## 
##   Initial states:
##     l = 2.7882 
##     b = -0.3214 
## 
##   sigma:  0.109
## 
##       AIC      AICc       BIC 
## -22.19980 -19.86646 -11.63260 
## 
## Training set error measures:
##                       ME      RMSE        MAE       MPE     MAPE      MASE
## Training set -0.01209323 0.1024611 0.07186762 -1.282984 5.334802 0.8070695
##                   ACF1
## Training set 0.1204511
rmse_value_ets <- accuracy(ets_model)["Training set", "RMSE"]
cat("Mean Square Error (MSE):", rmse_value_ets^2, "\n")
## Mean Square Error (MSE): 0.01049828
plot(ets_model)

Testing our Assumptions

# Residual Independence:
Box.test(residuals(ets_model), lag = 10, type = "Ljung-Box") #P-value above 0.05 means we can fail to reject the null and conclude the residuals are independent.
## 
##  Box-Ljung test
## 
## data:  residuals(ets_model)
## X-squared = 13.053, df = 10, p-value = 0.2207
# Residual Normality:
shapiro.test(residuals(ets_model)) # P-value below 0.05 means that we reject the null and conclude that the residuals are not normally distributed. However, visual inspection of the data through a histogram shows rough normality.
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(ets_model)
## W = 0.93537, p-value = 0.01784
hist(residuals(ets_model), main = "Histogram of Residuals", xlab = "Residuals")

qqnorm(residuals(ets_model))
qqline(residuals(ets_model))

# Constant Variance:
plot(residuals(ets_model), main = "Residuals Over Time", ylab = "Residuals") # Since we can see the residuals are centered around 0 and fluctuate symmetrically we can confirm the residuals do have constant variance. 

Forecast (Forecast 10-Year forecast 2025-2035)

ets_forecast <- forecast(ets_model, h = 10)
summary(ets_forecast)
## 
## Forecast method: ETS(A,Ad,N)
## 
## Model Information:
## ETS(A,Ad,N) 
## 
## Call:
## ets(y = tfr_ts)
## 
##   Smoothing parameters:
##     alpha = 0.2898 
##     beta  = 0.2898 
##     phi   = 0.8076 
## 
##   Initial states:
##     l = 2.7882 
##     b = -0.3214 
## 
##   sigma:  0.109
## 
##       AIC      AICc       BIC 
## -22.19980 -19.86646 -11.63260 
## 
## Error measures:
##                       ME      RMSE        MAE       MPE     MAPE      MASE
## Training set -0.01209323 0.1024611 0.07186762 -1.282984 5.334802 0.8070695
##                   ACF1
## Training set 0.1204511
## 
## Forecasts:
##      Point Forecast     Lo 80     Hi 80       Lo 95    Hi 95
## 2024      0.8285385 0.6888575 0.9682196  0.61491478 1.042162
## 2025      0.7953304 0.6376436 0.9530173  0.55416918 1.036492
## 2026      0.7685120 0.5820170 0.9550069  0.48329255 1.053731
## 2027      0.7468537 0.5246006 0.9691068  0.40694692 1.086760
## 2028      0.7293628 0.4676902 0.9910353  0.32916916 1.129556
## 2029      0.7152373 0.4126100 1.0178645  0.25240886 1.178066
## 2030      0.7038297 0.3599771 1.0476823  0.17795255 1.229707
## 2031      0.6946171 0.3100011 1.0792331  0.10639772 1.282837
## 2032      0.6871771 0.2626748 1.1116795  0.03795677 1.336398
## 2033      0.6811687 0.2178812 1.1444562 -0.02736840 1.389706
plot(ets_forecast)

fitted_values <- fitted(ets_model)
ets_model$states
## Time Series:
## Start = 1980 
## End = 2023 
## Frequency = 1 
##              l             b
## 1980 2.7881927 -0.3214076111
## 1981 2.5072896 -0.2809030861
## 1982 2.2919020 -0.2153876880
## 1983 2.1330400 -0.1588619085
## 1984 2.0193096 -0.1137304566
## 1985 1.9137071 -0.1056024567
## 1986 1.7854089 -0.1282982754
## 1987 1.6870723 -0.0983365890
## 1988 1.6793399 -0.0077323985
## 1989 1.6750964 -0.0042435197
## 1990 1.7117592  0.0366628764
## 1991 1.7351751  0.0234158666
## 1992 1.7471052  0.0119301093
## 1993 1.7576847  0.0105794506
## 1994 1.7629743  0.0052896827
## 1995 1.7694934  0.0065190212
## 1996 1.7704810  0.0009876235
## 1997 1.7709080  0.0004270460
## 1998 1.6824971 -0.0884109012
## 1999 1.5948398 -0.0876573081
## 2000 1.5692454 -0.0255944455
## 2001 1.5055166 -0.0637287609
## 2002 1.4209970 -0.0845196407
## 2003 1.3186175 -0.1023794772
## 2004 1.2197258 -0.0988916484
## 2005 1.1326568 -0.0870690346
## 2006 1.0776022 -0.0550546495
## 2007 1.0525173 -0.0250848293
## 2008 1.0374006 -0.0151166829
## 2009 1.0265858 -0.0108148056
## 2010 0.9822480 -0.0443378749
## 2011 0.9808010 -0.0014469689
## 2012 1.0637845  0.0829834578
## 2013 1.1117310  0.0479465337
## 2014 1.1546682  0.0429372100
## 2015 1.1851868  0.0305185672
## 2016 1.1982891  0.0131022821
## 2017 1.1845637 -0.0137253785
## 2018 1.1405916 -0.0439720924
## 2019 1.0891173 -0.0514742433
## 2020 1.0308694 -0.0582478867
## 2021 0.9812704 -0.0495990778
## 2022 0.9205759 -0.0606945020
## 2023 0.8696586 -0.0509172242
# Phi Value
ets_model$par
##      alpha       beta        phi          l          b 
##  0.2898120  0.2898120  0.8075877  2.7881927 -0.3214076
# Interpretation: Taiwan's fertility rate has steadily declined from 1980 to 2020, with forecasts suggesting stabilization at a low level and increasing uncertainty in future trends.

Conclusion and Reccomendations:

Our analysis confirms a significant relationship between marriage rates and fertility in Taiwan. Time series forecasting shows a steady decline from 1980 to 2020, with predictions suggesting stabilization at historically low levels. These findings provide valuable insights for policymakers, indicating that addressing marriage trends could be crucial in managing Taiwan’s demographic challenges, while carefully balancing social progress and women’s advancement in modern society.

We recommend the Taiwanese government to implement the following:

  1. Effect workplace policies that support married couples, such as more flexible work arrangements and paid parental leave. This would make family building more compatible with career advancement.

  2. Develop housing assistance programs specifically for young married couples, easing one of the biggest barriers to family building.

  3. Create incentive programs for companies that achieve high rates of employees who are married and/or have children. This encourages employers to have a more family-friendly corporate culture.