# Fit the Prophet model
m = prophet::prophet(death_statistics, weekly.seasonality = TRUE, daily.seasonality = FALSE)# Make future predictions
f = prophet::make_future_dataframe(m, periods=28, freq="month")
p = predict(m, f)# Extract forecast data (make sure 'death_statistics' and 'p' are defined above)
forecast_data <- p[p$ds >= max(death_statistics$ds), ]
forecast_table <- forecast_data[, c("ds", "yhat", "yhat_lower", "yhat_upper")]
# Name the columns of the forecast table
colnames(forecast_table) <- c("Date", "Forecast (yhat)", "Lower Bound (yhat_lower)", "Upper Bound (yhat_upper)")
# Print the forecast table
print(forecast_table)## Date Forecast (yhat) Lower Bound (yhat_lower)
## 269 2025-02-23 11441.885 9272.385
## 270 2025-03-23 10105.087 8099.229
## 271 2025-04-23 13085.501 11095.152
## 272 2025-05-23 10627.674 8534.373
## 273 2025-06-23 10158.710 8085.413
## 274 2025-07-23 9942.582 7818.954
## 275 2025-08-23 9946.602 7885.250
## 276 2025-09-23 10347.539 8132.547
## 277 2025-10-23 11106.651 9158.449
## 278 2025-11-23 11029.027 8983.023
## 279 2025-12-23 11498.100 9467.176
## 280 2026-01-23 14935.430 12809.901
## 281 2026-02-23 11874.784 9913.900
## 282 2026-03-23 10547.062 8521.364
## 283 2026-04-23 12936.465 10870.480
## 284 2026-05-23 10492.847 8507.801
## 285 2026-06-23 10009.817 8049.739
## 286 2026-07-23 9789.775 7861.544
## 287 2026-08-23 9223.760 7265.953
## 288 2026-09-23 10188.275 8042.201
## 289 2026-10-23 10942.766 8830.488
## 290 2026-11-23 11453.473 9371.775
## 291 2026-12-23 11360.872 9242.463
## 292 2027-01-23 14782.928 12723.218
## 293 2027-02-23 11723.908 9612.612
## 294 2027-03-23 10405.837 8276.791
## 295 2027-04-23 12786.620 10764.240
## 296 2027-05-23 9774.305 7722.387
## 297 2027-06-23 9860.866 7642.428
## Upper Bound (yhat_upper)
## 269 13349.99
## 270 12304.25
## 271 15095.39
## 272 12767.31
## 273 12063.00
## 274 12018.99
## 275 12000.71
## 276 12483.93
## 277 13199.76
## 278 13155.55
## 279 13649.39
## 280 16920.40
## 281 13967.40
## 282 12661.17
## 283 15124.81
## 284 12474.24
## 285 11999.26
## 286 11939.35
## 287 11284.13
## 288 12099.77
## 289 12939.88
## 290 13575.86
## 291 13445.68
## 292 16832.42
## 293 13824.32
## 294 12462.28
## 295 14864.13
## 296 11910.16
## 297 11854.94
# Create a time series object for the 'death_statistics' data.
# Assuming 'dd.df' is the data frame with dates and death counts
dd.df <- data.frame(t = as.Date(death_statistics$ds), x = death_statistics$y)
# Convert to time series object with weekly frequency (frequency=52 for weekly data)
ts_data_for_decompose <- ts(dd.df$x, start = c(2020, 1), frequency = 52)
# Perform the decomposition
decomposed_data <- decompose(ts_data_for_decompose)
# Plot the decomposition
plot(decomposed_data)## # A tibble: 269 × 2
## ds y
## <dttm> <dbl>
## 1 2020-01-05 00:00:00 12254
## 2 2020-01-12 00:00:00 14058
## 3 2020-01-19 00:00:00 12990
## 4 2020-01-26 00:00:00 11856
## 5 2020-02-02 00:00:00 11612
## 6 2020-02-09 00:00:00 10986
## 7 2020-02-16 00:00:00 10944
## 8 2020-02-23 00:00:00 10841
## 9 2020-03-01 00:00:00 10816
## 10 2020-03-08 00:00:00 10895
## # ℹ 259 more rows
The data above shows the deaths in England and Wales over the period from January 2020 to March 2025 in weekly intervals, the date on the ds axis is the final day of the week in measure.
I have decided to undertake this project because my family have long time been in the funeral industry and as a result I have been doing lots of data interpretation with funerals, deaths and and hospitals since I came to university and in other statistical modules to get a better understanding for my parents business and potentially help, as far as I am aware they do not do any kind of modelling like this.
## Date Forecast (yhat) Lower Bound (yhat_lower)
## 269 2025-02-23 11441.885 9272.385
## 270 2025-03-23 10105.087 8099.229
## 271 2025-04-23 13085.501 11095.152
## 272 2025-05-23 10627.674 8534.373
## 273 2025-06-23 10158.710 8085.413
## 274 2025-07-23 9942.582 7818.954
## 275 2025-08-23 9946.602 7885.250
## 276 2025-09-23 10347.539 8132.547
## 277 2025-10-23 11106.651 9158.449
## 278 2025-11-23 11029.027 8983.023
## 279 2025-12-23 11498.100 9467.176
## 280 2026-01-23 14935.430 12809.901
## 281 2026-02-23 11874.784 9913.900
## 282 2026-03-23 10547.062 8521.364
## 283 2026-04-23 12936.465 10870.480
## 284 2026-05-23 10492.847 8507.801
## 285 2026-06-23 10009.817 8049.739
## 286 2026-07-23 9789.775 7861.544
## 287 2026-08-23 9223.760 7265.953
## 288 2026-09-23 10188.275 8042.201
## 289 2026-10-23 10942.766 8830.488
## 290 2026-11-23 11453.473 9371.775
## 291 2026-12-23 11360.872 9242.463
## 292 2027-01-23 14782.928 12723.218
## 293 2027-02-23 11723.908 9612.612
## 294 2027-03-23 10405.837 8276.791
## 295 2027-04-23 12786.620 10764.240
## 296 2027-05-23 9774.305 7722.387
## 297 2027-06-23 9860.866 7642.428
## Upper Bound (yhat_upper)
## 269 13349.99
## 270 12304.25
## 271 15095.39
## 272 12767.31
## 273 12063.00
## 274 12018.99
## 275 12000.71
## 276 12483.93
## 277 13199.76
## 278 13155.55
## 279 13649.39
## 280 16920.40
## 281 13967.40
## 282 12661.17
## 283 15124.81
## 284 12474.24
## 285 11999.26
## 286 11939.35
## 287 11284.13
## 288 12099.77
## 289 12939.88
## 290 13575.86
## 291 13445.68
## 292 16832.42
## 293 13824.32
## 294 12462.28
## 295 14864.13
## 296 11910.16
## 297 11854.94
Graph:
The graph above shows the historic data taken from the imported death statistics and an additional forecast for 28 periods after that in months. the graph in the historical part seems to be repeating annually with the same effect meaning one cycle is a year. From this we can see it could be described as a spiking upward sine function.
I obtained this graph by doing the following code:
First i intalled the correct packages being ‘remotes’ which helps me
install packages from ‘GitHub through which i installed ’Prophet’
package, the i loaded libraries i may need (Rcpp, prophet(TS
forecasting), astsa (statistictial analysis not used here) and
readxl(for reading my excel file)). Next i import my data
‘death_statistics.xlsx’ using the ‘readxl’. this i move to forecasting
using ‘prophet’ to fit my data where i had to specify my data was weekly
not daily, i stored this as ‘m’. I then forcast the data in the future
using ‘make_future_dataframe’ for 28 periods with frequency of months, I
store this as ‘f’. I can then use ‘predict’ for ‘m’ and ‘f’ to finally
plot the predicted in ‘p’ against me TS model ‘m’.
Table:
I wanted to see the values for the predicted values in the the graph as it is difficult to see. maily to see exactly what months these spikes occur as i could also see a small spike near the start of the year. From this table i could see that small spike in in the 4th month out of the yeah likely being near the 23rd due to the symmetry of the 3rd and 5th month. i also found the trough is in 7th and 8th month likely around the 9th of the 8th month through symmetry again.
I produced the table though only considering the future values by taking the dates after the most recent in the origional data using
Then I can take the relevant data from the columns of the forecast such as ‘yhat_lower’ ‘yhat_upper’ and ‘yhat’ and put it into a table before naming the columns.
Above is my decomposition of data for the period we can see the trend there is a spike at the start probably due to COVID but he seasonality is very similar to what we observed from our previous table meaning this data is reliant alone on its seasonal aspect not so much its trend later on after 2021. The seasonality as what might be expected suggest more deaths in the winter at the end of the year perhaps from illness like influenza and other viruses that are more prominent in the colder months.
I created this by assigning values to a time series for frequency of 52 due to it being in weeks starting at the start of 2020. and then plotting using the decompose of the time series into the plot.
plot(dd.df$t, dd.df$x, type='l',xlab='Time',ylab='Deaths',main='Weekly Death Statistics')
model <-lm(x~t,data=dd.df)
lines(dd.df$t,fitted(model),col='green')The graph above shows a plot of observations from the data frame against time in the data frame. I then put onto the line of linear regression for the time which i don’t believe is very helpful due to the cyclical nature of the data. The line of linear regression is going down but very shallow gradient. I don’t believe wide spread immortality in humans is very soon, this downward nature is likely due to the spike in deaths during COVID.
I coded this graph by simply plotting from the data frame using the code in the R box above labelling as well. then making a linear model from the data from the data frame and fitting that as a line over the top of the other graph in green to keep it light hearted and not morbid.
Above i have used two other time frames, day and quarter. The day
requires me to have many periods, in this example 1000 which would can
take a lot longer to load and it unnecessary to have make it a day due
to the randomness of deaths in a day period and autocorrelation of the
percent movements in day to day date being low I would consider this
overkill especially as the original data is in weeks.
The other being every quarter does not give enough information, the
users of this information will not be looking for 100 years into the
future only likely 5 years making periods 20 is well within the power of
R studios. also the graph does not look particularly helpful in
identifying the individual peeks in January and April.