In this document, I will be analysing the ‘birth’ dataset from ‘astsa’. The birth dataset contains U.S. Monthly Live Births in thousands for the United States between 1948-1979. I will begin by plotting the data to spot trends, seasonality and to see how birth rates evolved over 30 years. I will then run a linear regression, followed by forecasting the data for different time lengths, and finally, splitting the dataset into components and plotting this.
Below is a summary of the birth dataset from astsa:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 238.0 284.0 310.0 310.9 336.0 399.0
We can see that highest number of births was 399 (thousand) and the lowest at 238 (thousand).
Below is a plot of births:
plot(birth, type = "l", col="pink",main="Monthly Live Births for the United States 1948-1979", xlab="Years",ylab="Births (Thousands)")In this plot, we can clear identify the maximum of 399 births between 1960-1965. We can also see the minimum of 238 births in later years between 1970-1975.
Whilst it looks like there is a steady increase (aside from the seasonal changes) between 1949 and the early 1960s, there seems to be a fall in monthly births after the early 1960s until the late 1960s, where we see a sudden increase around the early 1970s.
Here we run a linear regression model on the dataset:
birth.df = data.frame(
ds=zoo::as.yearmon(time(birth)),
y=birth)
model <- lm(y~ds, data = birth.df)
summary(model)##
## Call:
## lm(formula = y ~ ds, data = birth.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -78.810 -22.411 -1.172 21.602 82.170
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4297.3111 342.2257 12.56 <2e-16 ***
## ds -2.0303 0.1743 -11.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.2 on 371 degrees of freedom
## Multiple R-squared: 0.2678, Adjusted R-squared: 0.2658
## F-statistic: 135.7 on 1 and 371 DF, p-value: < 2.2e-16
Below we generate a forecast for the next 12 months after 1979:
birth.df = data.frame(
ds=zoo::as.yearmon(time(birth)),
y=birth)
m = prophet::prophet(birth.df, weekly.seasonality = TRUE, daily.seasonality = FALSE)
f = prophet::make_future_dataframe(m, periods=12, freq="month")
p = predict(m, f)
plot(m,p,xlab="Years",ylab="Births (Thousands)")The plot above forecasts monthly live births for the next 12 months after 1979. We can see the same seasonality as with the previous plotting of the data, with no sudden increases or decreases.
Now what if we forecast for an extended period of time to see if we observe the same fluctuations as in the past decades.
Above, we forecast the dataset for the next 5 years.
birth.df = data.frame(
ds=zoo::as.yearmon(time(birth)),
y=birth)
m = prophet::prophet(birth.df, weekly.seasonality = TRUE, daily.seasonality = FALSE)
f = prophet::make_future_dataframe(m, periods=5, freq="years")
p = predict(m, f)
plot(m,p,xlab="Years",ylab="Births (Thousands)")Below, we forecast the dataset for the next 10 years.
birth.df = data.frame(
ds=zoo::as.yearmon(time(birth)),
y=birth)
m = prophet::prophet(birth.df, weekly.seasonality = TRUE, daily.seasonality = FALSE)
f = prophet::make_future_dataframe(m, periods=10, freq="years")
p = predict(m, f)
plot(m,p,xlab="Years",ylab="Births (Thousands)")birth.df = data.frame(
ds=zoo::as.yearmon(time(birth)),
y=birth)
m = prophet::prophet(birth.df, weekly.seasonality = TRUE, daily.seasonality = FALSE)
f = prophet::make_future_dataframe(m, periods=12, freq="month")
forecast <- predict(m,f)
#plot(m, forecast)
prophet_plot_components(m, forecast)From looking at the yearly component, we can clearly see that the plot peaks around September/October, but dips around February/March.
The most common day of the week appears to be Tuesday, whilst the least common appears to be Friday.