logo

Introduction:

In this document, I will be analysing the ‘birth’ dataset from ‘astsa’. The birth dataset contains U.S. Monthly Live Births in thousands for the United States between 1948-1979. I will begin by plotting the data to spot trends, seasonality and to see how birth rates evolved over 30 years. I will then run a linear regression, followed by forecasting the data for different time lengths, and finally, splitting the dataset into components and plotting this.

Below is a summary of the birth dataset from astsa:

summary(birth)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   238.0   284.0   310.0   310.9   336.0   399.0

We can see that highest number of births was 399 (thousand) and the lowest at 238 (thousand).

Plotting the dataset:

Below is a plot of births:

plot(birth, type = "l", col="pink",main="Monthly Live Births for the United States 1948-1979", xlab="Years",ylab="Births (Thousands)")

In this plot, we can clear identify the maximum of 399 births between 1960-1965. We can also see the minimum of 238 births in later years between 1970-1975.

Whilst it looks like there is a steady increase (aside from the seasonal changes) between 1949 and the early 1960s, there seems to be a fall in monthly births after the early 1960s until the late 1960s, where we see a sudden increase around the early 1970s.

Linear Regression Model:

Here we run a linear regression model on the dataset:

birth.df = data.frame(
    ds=zoo::as.yearmon(time(birth)),
    y=birth)
model <- lm(y~ds, data = birth.df)
summary(model)
## 
## Call:
## lm(formula = y ~ ds, data = birth.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -78.810 -22.411  -1.172  21.602  82.170 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4297.3111   342.2257   12.56   <2e-16 ***
## ds            -2.0303     0.1743  -11.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.2 on 371 degrees of freedom
## Multiple R-squared:  0.2678, Adjusted R-squared:  0.2658 
## F-statistic: 135.7 on 1 and 371 DF,  p-value: < 2.2e-16
  • The median residual at -1.172 is close to zero, suggesting that the predicted values are not far from the actual values.
  • From the initial plot of the dataset, we can see that births do decrease over time (aside the fluctuations), and in this summary, our slope coefficient sits at -2.0303 suggesting a slow fall in monthly live births.
  • The p-values are very small implying high significance among the coefficients.
  • Whilst the model may be statistically significant, the \(R^2\) and adjusted \(R^2\) suggest that only 27% of variability is explained. There may be other factors to consider such as policies or the economy etc.

Forecasting:

Below we generate a forecast for the next 12 months after 1979:

birth.df = data.frame(
    ds=zoo::as.yearmon(time(birth)),
    y=birth)
m = prophet::prophet(birth.df, weekly.seasonality = TRUE, daily.seasonality = FALSE)
f = prophet::make_future_dataframe(m, periods=12, freq="month")
p = predict(m, f)
plot(m,p,xlab="Years",ylab="Births (Thousands)")

The plot above forecasts monthly live births for the next 12 months after 1979. We can see the same seasonality as with the previous plotting of the data, with no sudden increases or decreases.

Now what if we forecast for an extended period of time to see if we observe the same fluctuations as in the past decades.

Above, we forecast the dataset for the next 5 years.

birth.df = data.frame(
    ds=zoo::as.yearmon(time(birth)),
    y=birth)
m = prophet::prophet(birth.df, weekly.seasonality = TRUE, daily.seasonality = FALSE)
f = prophet::make_future_dataframe(m, periods=5, freq="years")
p = predict(m, f)
plot(m,p,xlab="Years",ylab="Births (Thousands)")

Below, we forecast the dataset for the next 10 years.

birth.df = data.frame(
    ds=zoo::as.yearmon(time(birth)),
    y=birth)
m = prophet::prophet(birth.df, weekly.seasonality = TRUE, daily.seasonality = FALSE)
f = prophet::make_future_dataframe(m, periods=10, freq="years")
p = predict(m, f)
plot(m,p,xlab="Years",ylab="Births (Thousands)")

The Components:

birth.df = data.frame(
    ds=zoo::as.yearmon(time(birth)),
    y=birth)
m = prophet::prophet(birth.df, weekly.seasonality = TRUE, daily.seasonality = FALSE)
f = prophet::make_future_dataframe(m, periods=12, freq="month")
forecast <- predict(m,f)
#plot(m, forecast)
prophet_plot_components(m, forecast)

From looking at the yearly component, we can clearly see that the plot peaks around September/October, but dips around February/March.

The most common day of the week appears to be Tuesday, whilst the least common appears to be Friday.