logo
# Fit the Prophet model
m = prophet::prophet(death_statistics, weekly.seasonality = TRUE, daily.seasonality = FALSE)
# Make future predictions
f = prophet::make_future_dataframe(m, periods=28, freq="month")
p = predict(m, f)
# Plot the forecast results
plot(m, p)

# Extract forecast data (make sure 'death_statistics' and 'p' are defined above)
forecast_data <- p[p$ds >= max(death_statistics$ds), ]
forecast_table <- forecast_data[, c("ds", "yhat", "yhat_lower", "yhat_upper")]

# Name the columns of the forecast table
colnames(forecast_table) <- c("Date", "Forecast (yhat)", "Lower Bound (yhat_lower)", "Upper Bound (yhat_upper)")

# Print the forecast table
print(forecast_table)
##           Date Forecast (yhat) Lower Bound (yhat_lower)
## 269 2025-02-23       11441.885                 9272.385
## 270 2025-03-23       10105.087                 8099.229
## 271 2025-04-23       13085.501                11095.152
## 272 2025-05-23       10627.674                 8534.373
## 273 2025-06-23       10158.710                 8085.413
## 274 2025-07-23        9942.582                 7818.954
## 275 2025-08-23        9946.602                 7885.250
## 276 2025-09-23       10347.539                 8132.547
## 277 2025-10-23       11106.651                 9158.449
## 278 2025-11-23       11029.027                 8983.023
## 279 2025-12-23       11498.100                 9467.176
## 280 2026-01-23       14935.430                12809.901
## 281 2026-02-23       11874.784                 9913.900
## 282 2026-03-23       10547.062                 8521.364
## 283 2026-04-23       12936.465                10870.480
## 284 2026-05-23       10492.847                 8507.801
## 285 2026-06-23       10009.817                 8049.739
## 286 2026-07-23        9789.775                 7861.544
## 287 2026-08-23        9223.760                 7265.953
## 288 2026-09-23       10188.275                 8042.201
## 289 2026-10-23       10942.766                 8830.488
## 290 2026-11-23       11453.473                 9371.775
## 291 2026-12-23       11360.872                 9242.463
## 292 2027-01-23       14782.928                12723.218
## 293 2027-02-23       11723.908                 9612.612
## 294 2027-03-23       10405.837                 8276.791
## 295 2027-04-23       12786.620                10764.240
## 296 2027-05-23        9774.305                 7722.387
## 297 2027-06-23        9860.866                 7642.428
##     Upper Bound (yhat_upper)
## 269                 13349.99
## 270                 12304.25
## 271                 15095.39
## 272                 12767.31
## 273                 12063.00
## 274                 12018.99
## 275                 12000.71
## 276                 12483.93
## 277                 13199.76
## 278                 13155.55
## 279                 13649.39
## 280                 16920.40
## 281                 13967.40
## 282                 12661.17
## 283                 15124.81
## 284                 12474.24
## 285                 11999.26
## 286                 11939.35
## 287                 11284.13
## 288                 12099.77
## 289                 12939.88
## 290                 13575.86
## 291                 13445.68
## 292                 16832.42
## 293                 13824.32
## 294                 12462.28
## 295                 14864.13
## 296                 11910.16
## 297                 11854.94
# Create a time series object for the 'death_statistics' data.
# Assuming 'dd.df' is the data frame with dates and death counts
dd.df <- data.frame(t = as.Date(death_statistics$ds), x = death_statistics$y)

# Convert to time series object with weekly frequency (frequency=52 for weekly data)
ts_data_for_decompose <- ts(dd.df$x, start = c(2020, 1), frequency = 52)

# Perform the decomposition
decomposed_data <- decompose(ts_data_for_decompose)

# Plot the decomposition
plot(decomposed_data)

Introduction

1.1 The times series data of deaths in England and Wales (Jan 2020 - Mar 2025, weekly)

death_statistics
## # A tibble: 269 × 2
##    ds                      y
##    <dttm>              <dbl>
##  1 2020-01-05 00:00:00 12254
##  2 2020-01-12 00:00:00 14058
##  3 2020-01-19 00:00:00 12990
##  4 2020-01-26 00:00:00 11856
##  5 2020-02-02 00:00:00 11612
##  6 2020-02-09 00:00:00 10986
##  7 2020-02-16 00:00:00 10944
##  8 2020-02-23 00:00:00 10841
##  9 2020-03-01 00:00:00 10816
## 10 2020-03-08 00:00:00 10895
## # ℹ 259 more rows

The data above shows the deaths in England and Wales over the period from January 2020 to March 2025 in weekly intervals, the date on the ds axis is the final day of the week in measure.

1.2 The purpose of this project.

I have decided to undertake this project because my family have long time been in the funeral industry and as a result I have been doing lots of data interpretation with funerals, deaths and and hospitals since I came to university and in other statistical modules to get a better understanding for my parents business and potentially help, as far as I am aware they do not do any kind of modelling like this.

1.3 Graph plots and tables

1.3.1 forecasting

plot(m,p)

print(forecast_table)
##           Date Forecast (yhat) Lower Bound (yhat_lower)
## 269 2025-02-23       11441.885                 9272.385
## 270 2025-03-23       10105.087                 8099.229
## 271 2025-04-23       13085.501                11095.152
## 272 2025-05-23       10627.674                 8534.373
## 273 2025-06-23       10158.710                 8085.413
## 274 2025-07-23        9942.582                 7818.954
## 275 2025-08-23        9946.602                 7885.250
## 276 2025-09-23       10347.539                 8132.547
## 277 2025-10-23       11106.651                 9158.449
## 278 2025-11-23       11029.027                 8983.023
## 279 2025-12-23       11498.100                 9467.176
## 280 2026-01-23       14935.430                12809.901
## 281 2026-02-23       11874.784                 9913.900
## 282 2026-03-23       10547.062                 8521.364
## 283 2026-04-23       12936.465                10870.480
## 284 2026-05-23       10492.847                 8507.801
## 285 2026-06-23       10009.817                 8049.739
## 286 2026-07-23        9789.775                 7861.544
## 287 2026-08-23        9223.760                 7265.953
## 288 2026-09-23       10188.275                 8042.201
## 289 2026-10-23       10942.766                 8830.488
## 290 2026-11-23       11453.473                 9371.775
## 291 2026-12-23       11360.872                 9242.463
## 292 2027-01-23       14782.928                12723.218
## 293 2027-02-23       11723.908                 9612.612
## 294 2027-03-23       10405.837                 8276.791
## 295 2027-04-23       12786.620                10764.240
## 296 2027-05-23        9774.305                 7722.387
## 297 2027-06-23        9860.866                 7642.428
##     Upper Bound (yhat_upper)
## 269                 13349.99
## 270                 12304.25
## 271                 15095.39
## 272                 12767.31
## 273                 12063.00
## 274                 12018.99
## 275                 12000.71
## 276                 12483.93
## 277                 13199.76
## 278                 13155.55
## 279                 13649.39
## 280                 16920.40
## 281                 13967.40
## 282                 12661.17
## 283                 15124.81
## 284                 12474.24
## 285                 11999.26
## 286                 11939.35
## 287                 11284.13
## 288                 12099.77
## 289                 12939.88
## 290                 13575.86
## 291                 13445.68
## 292                 16832.42
## 293                 13824.32
## 294                 12462.28
## 295                 14864.13
## 296                 11910.16
## 297                 11854.94

Graph:

The graph above shows the historic data taken from the imported death statistics and an additional forecast for 28 periods after that in months. the graph in the historical part seems to be repeating annually with the same effect meaning one cycle is a year. From this we can see it could be described as a spiking upward sine function.

I obtained this graph by doing the following code:
First i intalled the correct packages being ‘remotes’ which helps me install packages from ‘GitHub through which i installed ’Prophet’ package, the i loaded libraries i may need (Rcpp, prophet(TS forecasting), astsa (statistictial analysis not used here) and readxl(for reading my excel file)). Next i import my data ‘death_statistics.xlsx’ using the ‘readxl’. this i move to forecasting using ‘prophet’ to fit my data where i had to specify my data was weekly not daily, i stored this as ‘m’. I then forcast the data in the future using ‘make_future_dataframe’ for 28 periods with frequency of months, I store this as ‘f’. I can then use ‘predict’ for ‘m’ and ‘f’ to finally plot the predicted in ‘p’ against me TS model ‘m’.

Table:

I wanted to see the values for the predicted values in the the graph as it is difficult to see. maily to see exactly what months these spikes occur as i could also see a small spike near the start of the year. From this table i could see that small spike in in the 4th month out of the yeah likely being near the 23rd due to the symmetry of the 3rd and 5th month. i also found the trough is in 7th and 8th month likely around the 9th of the 8th month through symmetry again.

I produced the table though only considering the future values by taking the dates after the most recent in the origional data using

forecast_data <- p[p$ds >= max(death_statistics$ds), ]

Then I can take the relevant data from the columns of the forecast such as ‘yhat_lower’ ‘yhat_upper’ and ‘yhat’ and put it into a table before naming the columns.

1.3.2 Data decomposition

plot(decompose(ts_data_for_decompose))

Above is my decomposition of data for the period we can see the trend there is a spike at the start probably due to COVID but he seasonality is very similar to what we observed from our previous table meaning this data is reliant alone on its seasonal aspect not so much its trend later on after 2021. The seasonality as what might be expected suggest more deaths in the winter at the end of the year perhaps from illness like influenza and other viruses that are more prominent in the colder months.

I created this by assigning values to a time series for frequency of 52 due to it being in weeks starting at the start of 2020. and then plotting using the decompose of the time series into the plot.

plot(dd.df$t, dd.df$x, type='l',xlab='Time',ylab='Deaths',main='Weekly Death Statistics')
model <-lm(x~t,data=dd.df)
lines(dd.df$t,fitted(model),col='green')

The graph above shows a plot of observations from the data frame against time in the data frame. I then put onto the line of linear regression for the time which i don’t believe is very helpful due to the cyclical nature of the data. The line of linear regression is going down but very shallow gradient. I don’t believe wide spread immortality in humans is very soon, this downward nature is likely due to the spike in deaths during COVID.

I coded this graph by simply plotting from the data frame using the code in the R box above labelling as well. then making a linear model from the data from the data frame and fitting that as a line over the top of the other graph in green to keep it light hearted and not morbid.

1.3.3 Other time frames

f =prophet::make_future_dataframe(m, periods=1000, freq="day")
p =predict(m, f)
plot(m,p)

f =prophet::make_future_dataframe(m, periods=12, freq="quarter")
p =predict(m, f)
plot(m,p)

Above i have used two other time frames, day and quarter. The day requires me to have many periods, in this example 1000 which would can take a lot longer to load and it unnecessary to have make it a day due to the randomness of deaths in a day period and autocorrelation of the percent movements in day to day date being low I would consider this overkill especially as the original data is in weeks.
The other being every quarter does not give enough information, the users of this information will not be looking for 100 years into the future only likely 5 years making periods 20 is well within the power of R studios. also the graph does not look particularly helpful in identifying the individual peeks in January and April.