A large medical clinic would like to forecast daily patient visits for purposes of staffing.
The time series is too short for a data-driven method, so a model-based method is needed. Since one month alone can’t reflect seasonality, a model could combine this data with data obtained elsewhere and adjust it accordingly.
There are several caveats here, and much is unclear: whether the other hospital has enough data to gauge seasonality, whether that hospital is comparable in capacity and clientele, and whether it collects its data the same way our hospital does. If the answers are all affirmative, that outside data could be valuable for forecasting fluctuations in patient counts from our hospital’s meager data.
Based on the limited data she has, using the count from the same day of the previous week may be the easiest available way to plan for the number of patients coming through the door today. However, last week’s data comes with no context, so she can’t tell objectively whether it was a busy, average or slow time at the hospital.
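The same-day-last-week heuristic described above can be written in a few lines. This is only a minimal sketch: the function name `same_day_last_week`, the dictionary layout and the visit counts are all made up for illustration and do not come from the clinic's records.

```python
from datetime import date, timedelta

def same_day_last_week(visits, target):
    """Return the visit count recorded 7 days before `target`, or None if missing."""
    return visits.get(target - timedelta(days=7))

# Hypothetical counts, for illustration only.
visits = {
    date(2024, 3, 4): 112,  # a Monday
    date(2024, 3, 5): 98,   # a Tuesday
}

# Forecast for the following Monday is simply last Monday's count.
forecast = same_day_last_week(visits, date(2024, 3, 11))
# No record exists 7 days before this date, so no forecast can be made.
missing = same_day_last_week(visits, date(2024, 3, 13))
```

The sketch makes the weakness plain: the forecast is a single past observation, with nothing to indicate whether that day was typical.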
Model-based methods are harder to automate than data-driven methods, but this seems to be a task that requires at least a moderate level of automation. The forecast must be produced every day to evaluate staffing levels, and the hospital doesn’t seem to have much forecasting expertise. In this case, a model-based method would require regular checking to see whether it’s tracking observed patterns reasonably well.
1. Simple weighted averaging: This could be the start of a model that transforms our hospital’s data for the past week by weighting it according to fluctuations in the other hospital’s figures for the same time of year, going back as many years as possible. That would create a series of predicted values, one per past year for each day, that could be averaged to estimate visits day by day into the future. It’s slightly more advanced than the current heuristic method, but it’s still highly speculative because we don’t know our own past visit figures.
2. A more advanced multi-level analysis: This would use the weighted averages from the previous method as a jumping-off point, comparing the predicted value for each past day with the value actually observed on that day. The differences, or errors, could be fitted with a second model so that the first can be adjusted to bring predictions closer to observed values than weighted averaging alone could. It’s a better approach that should improve as more time passes and more data accumulates.
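One possible reading of the two approaches above is sketched below. It assumes the "weighting" is a week-over-week ratio taken from the other hospital for each past year, and that the error adjustment is a plain mean-error shift. Both are illustrative choices, not something the answer specifies, and every name and number is hypothetical.

```python
from statistics import mean

def scaled_forecast(our_last_week, other_ratios):
    """Scale our last-week count by each past year's ratio, then average.

    `other_ratios` holds one week-over-week ratio per past year, taken
    from the other hospital's figures (assumed layout).
    """
    return mean(our_last_week * r for r in other_ratios)

def bias_adjusted(forecast, past_forecasts, past_actuals):
    """Shift a forecast by the mean past error (actual minus forecast)."""
    errors = [a - f for f, a in zip(past_forecasts, past_actuals)]
    return forecast + mean(errors) if errors else forecast

# Hypothetical inputs, for illustration only.
base = scaled_forecast(100, [1.5, 1.0, 1.25])            # approach 1
adjusted = bias_adjusted(base, [100, 120], [110, 126])   # approach 2
```

With these made-up inputs the first step averages 150, 100 and 125, and the second step raises the result by the mean past error of 8. A real second-stage model could be far richer than a constant shift.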
Persistence is data-driven, since it uses past speeds at a certain time to predict future speeds.
Physical is model-based, since it creates speed data based on current atmospheric conditions.
Statistical is data-driven. The giveaway here is that it isn’t based on any predefined model and relies on patterns in immediate past wind speeds. Data-driven methods are the ones that learn such patterns directly from the data.
Hybrid is a combination of the two, since it mixes physical and statistical methods.
Persistence is based on correlation, since it links future speeds to past speeds.
Physical is based on causal modeling, since it assumes that other atmospheric conditions cause changes in wind speeds.
Statistical is based on extrapolation, since it uses past training data as a base for modeling wind speeds.
Hybrid is based on a combination of causal modeling and extrapolation, since it — again — mixes physical and statistical methods.
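To make the distinctions above concrete, here is a rough sketch of three of the four method families. The one-lag regression stands in for "statistical" and the weighted blend for "hybrid"; both are assumptions chosen for brevity, not the methods any particular wind forecaster uses. The physical forecast is just a number passed in, since a real one would come from an atmospheric model.

```python
def persistence_forecast(speeds):
    """Persistence: the next speed is assumed equal to the last observed one."""
    return speeds[-1]

def ar1_forecast(speeds):
    """Statistical stand-in: least-squares fit of speed[t] = a + b * speed[t-1]."""
    x, y = speeds[:-1], speeds[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # no variation in the history: fall back to persistence
        return speeds[-1]
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a + b * speeds[-1]

def hybrid_forecast(physical, statistical, weight=0.5):
    """Hybrid: weighted blend of a physical and a statistical forecast."""
    return weight * physical + (1 - weight) * statistical

# Hypothetical wind speeds in m/s, for illustration only.
speeds = [5.0, 6.0, 7.0, 8.0]
p = persistence_forecast(speeds)
s = ar1_forecast(speeds)
h = hybrid_forecast(10.0, s)  # 10.0 is a made-up physical-model output
```

On this steadily rising series, persistence repeats the last value while the fitted regression continues the trend. That difference is what separates holding a value from learning a pattern.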
A major advantage of the hybrid approach is that it is likely to be more accurate and robust than either the physical or the statistical model alone, although this is not guaranteed. However, it is harder to do: it requires people who are experts in both kinds of methods, which increases costs.