Question 1

A large medical clinic would like to forecast daily patient visits for the purpose of staffing.


A): If data is available only for the last month, how does this affect the choice of model-based vs. data-driven methods?

This would push the analyst toward a model-based approach. Model-based methods are especially useful when little data are available, whereas data-driven methods tend to require more data. However, the choice also depends on the time series and on the particular method. A naive forecast, which is a data-driven method, requires very little data, especially if there is no significant seasonality.


B): The clinic has access to the admissions data of a nearby hospital. Under what conditions will including the hospital information be potentially useful for forecasting the clinic’s daily visits?

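If the nearby hospital's admissions are correlated with the clinic's daily patient visits, and the hospital data for the relevant days are available by the time the clinic's forecast must be made (for example, because hospital admissions lead clinic visits by a day or more), then the hospital data could be useful.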
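One way to check this condition is to correlate hospital admissions on earlier days with clinic visits on later days. The sketch below is a minimal illustration, not a prescribed method: it assumes two equal-length, date-aligned daily series with no missing values, and the function name `lagged_correlation` and the numbers are made up. A lag of 1 or more pairs hospital admissions with clinic visits that many days later, which is the relationship that matters if hospital data only become available after the fact.

```python
import numpy as np


def lagged_correlation(hospital, clinic, lag):
    """Pearson correlation between hospital admissions on day t - lag and
    clinic visits on day t (hypothetical helper for illustration)."""
    hospital = np.asarray(hospital, dtype=float)
    clinic = np.asarray(clinic, dtype=float)
    if len(hospital) != len(clinic):
        raise ValueError("series must be aligned and of equal length")
    if lag < 0 or lag > len(clinic) - 2:
        raise ValueError("lag must leave at least two overlapping days")
    # Pair hospital[t - lag] with clinic[t] for every t where both exist
    x = hospital[: len(hospital) - lag]
    y = clinic[lag:]
    return float(np.corrcoef(x, y)[0, 1])


# Made-up numbers: clinic visits track hospital admissions one day earlier
hospital = [10, 12, 9, 15, 11, 14, 13, 16]
clinic = [20, 25, 29, 23, 35, 27, 33, 31]
for lag in range(0, 3):
    print(lag, round(lagged_correlation(hospital, clinic, lag), 3))
```

A high lagged correlation suggests the hospital series may carry predictive information; on its own it does not prove that including it will improve out-of-sample forecasts.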


C): Thus far, the clinic administrator takes a heuristic approach, using visit numbers the same day of the previous week as a forecast. What is the advantage of this approach? What is the disadvantage?

The advantage of this approach is its simplicity: it is a seasonal naïve forecast in which the seasons are the days of the week, so the administrator can produce it quickly and easily without needing an experienced analyst. The disadvantage is that the administrator may be missing systematic patterns (such as a trend) that a more sophisticated model could capture, leading to better predictions.
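The administrator's heuristic can be written down directly. A minimal Python sketch, assuming the visit history is a list of daily counts ordered from oldest to newest with no gaps; `seasonal_naive_forecast` and the visit counts are made up for illustration.

```python
def seasonal_naive_forecast(visits, season_length=7, horizon=1):
    """Forecast each of the next `horizon` days with the count observed on
    the same weekday in the most recent full week of history."""
    if len(visits) < season_length:
        raise ValueError("need at least one full season (week) of history")
    n = len(visits)
    forecasts = []
    for h in range(1, horizon + 1):
        # Future day n - 1 + h shares its weekday with observed day
        # n - season_length + (h - 1) % season_length
        forecasts.append(visits[n - season_length + (h - 1) % season_length])
    return forecasts


# Two weeks of made-up daily visit counts (oldest first)
visits = [120, 135, 128, 140, 150, 90, 60,
          118, 132, 130, 145, 148, 95, 58]
print(seasonal_naive_forecast(visits, horizon=3))  # [118, 132, 130]
```

Because the forecast is simply a copied past value, it needs only one week of history, which fits the one-month data limitation in part A.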


D): What level of automation appears to be required for this task?

A high level of automation appears to be required. Because the clinic needs a new forecast of patient visits every day, the process should run with little or no manual intervention so that a fresh forecast can easily be generated each day.


E): Describe two approaches for improving the current heuristic (naive) forecasting approach using ensembles.

In addition to the naïve approach currently used, I would suggest creating two other models:

  1. Using only the clinic’s daily patient visit data, a linear regression model whose predictors are the number of visits on the same day last week, a trend term capturing any general trend, and the number of visits on the most recent day.
  2. A model that uses both the clinic’s daily patient visit data and the nearby hospital admissions data.
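Model 1 is an ordinary linear regression and can be fit by least squares. The sketch below is one possible way to do it, assuming a gap-free daily series; the helper names are hypothetical, and the trend is represented as a simple day index t.

```python
import numpy as np


def fit_lag_trend_model(visits):
    """Least-squares fit of
    visits[t] = b0 + b1 * t + b2 * visits[t - 1] + b3 * visits[t - 7]."""
    y = np.asarray(visits, dtype=float)
    n = len(y)
    if n < 11:  # need at least 4 usable rows for 4 coefficients
        raise ValueError("need at least 11 days of history")
    t = np.arange(7, n)  # first day with a 7-day lag available is t = 7
    X = np.column_stack([np.ones(len(t)), t, y[t - 1], y[t - 7]])
    coef, *_ = np.linalg.lstsq(X, y[t], rcond=None)
    return coef


def forecast_next_day(visits, coef):
    """One-step-ahead forecast for day n, where n = len(visits)."""
    y = np.asarray(visits, dtype=float)
    n = len(y)
    x_new = np.array([1.0, n, y[n - 1], y[n - 7]])
    return float(x_new @ coef)
```

With one month of data this leaves only about 23 usable rows for four coefficients, so the estimates will be noisy; that is one reason to combine this model with others rather than rely on it alone.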

Then combine the forecasts of these three models. Unless there is strong evidence that one model is much better than the others, I would weight the forecasts from each model equally when combining them.
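The combination step itself is a weighted average. A minimal sketch, assuming each model has already produced a point forecast for the same day; `combine_forecasts` is a hypothetical helper, and the three forecast values in the example are made up.

```python
def combine_forecasts(forecasts, weights=None):
    """Weighted average of component point forecasts (equal weights by default)."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    if weights is None:
        weights = [1.0] * len(forecasts)  # equal weighting
    if len(weights) != len(forecasts):
        raise ValueError("need one weight per forecast")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * f for w, f in zip(weights, forecasts)) / total


# Made-up next-day forecasts from the naive model, model 1, and model 2
print(combine_forecasts([118, 126, 122]))  # 122.0
```

Passing unequal weights, e.g. `combine_forecasts([118, 126, 122], [1, 2, 1])`, would implement non-equal weighting, which is only justified when one model is clearly better.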



Question 2

The ability to scale up renewable energy, and in particular wind power and speed, is dependent on the ability to forecast its short-term availability. Soman et al. (2010) describe different methods for wind power forecasting (the quote is slightly edited for brevity):

Persistence Method: This method is also known as ‘Naive Predictor’. It is assumed that the wind speed at time t + δt will be the same as it was at time t. Unbelievably, it is more accurate than most of the physical and statistical methods for very-short to short term forecasts…

Physical Approach: Physical systems use parametrizations based on a detailed physical description of the atmosphere…

Statistical Approach: The statistical approach is based on training with measurement data and uses differences between the predicted and the actual wind speeds in immediate past to tune model parameters. It is easy to model, inexpensive, and provides timely predictions. It is not based on any predefined mathematical model and is rather based on patterns…

Hybrid Approach: In general, the combination of different approaches such as mixing physical and statistical approaches or combining short term and medium-term models, etc., is referred to as a hybrid approach.


A): For each of the four types of methods, describe whether it is model-based, data-driven, or a combination.

Persistence Method: data-driven - naïve forecasts are the most basic types of data-driven models

Physical Approach: model-based - It assumes a relationship between physical characteristics of the atmosphere and wind speed. The model parameters in this case would likely be coefficients relating the values of the atmospheric features to future wind speed.

Statistical Approach: data-driven - Soman et al. state that the approach “is not based on any predefined mathematical model and is rather based on patterns,” which is essentially the definition of a data-driven approach.

Hybrid Approach: combination - it combines multiple approaches, such as model-based physical methods and data-driven statistical methods.


B): For each of the four types of methods, describe whether it is based on extrapolation, causal modeling, correlation modeling, or a combination.

Persistence Method: extrapolation - forecasts of a time series based only on its own history are extrapolations.

Physical Approach: this is based on either correlation or causation. It is unclear whether the modelers assume that the atmospheric features actually affect wind speed, and thus have a causal relationship with it, or are merely correlated with it.

Statistical Approach: extrapolation - it forecasts future wind speeds from historical wind speed data.

Hybrid Approach: combination - it combines several different methods creating an ensemble forecast


C): Describe the advantages and disadvantages of the hybrid approach.

Advantages: Combined forecasts tend to have smaller errors than the component forecasts that make up the combination. When the combined forecast is a simple average, its error (measured, for example, by MAE or RMSE) can never exceed the average error of its component forecasts, and it is often smaller than the error of even the most accurate component forecast. If you don’t know which method is best, the hybrid approach is a good option because it combines them all.
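The claim about averages follows from the triangle inequality: in every period, the absolute error of the averaged forecast is at most the average of the component absolute errors, because errors of opposite sign partly cancel. The small Python check below illustrates this for MAE with two made-up forecasts, one that tends to overforecast and one that tends to underforecast; the helper names are hypothetical.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def combination_check(actual, component_forecasts):
    """Return (MAE of the equal-weighted average, average of component MAEs)."""
    k = len(component_forecasts)
    # Average the component forecasts period by period
    combined = [sum(values) / k for values in zip(*component_forecasts)]
    mean_component_mae = sum(mae(actual, f) for f in component_forecasts) / k
    return mae(actual, combined), mean_component_mae


actual = [10, 12, 11, 13]
over = [12, 13, 12, 15]   # overforecasts: MAE 1.5
under = [9, 10, 11, 12]   # underforecasts: MAE 1.0
print(combination_check(actual, [over, under]))  # (0.5, 1.25)
```

Here the combination beats even the better component (0.5 versus 1.0). If both components erred in the same direction in every period, the combined MAE would equal the average MAE instead of falling below it.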

Disadvantages: It requires creating several different forecasts to combine and is therefore inherently more costly. It is much more complicated than the persistence method and requires one or more forecasters who are familiar with multiple methods and with how to combine forecasts in a structured way. If a rule for combining forecasts is not set in advance, a forecaster may (perhaps inadvertently) combine forecasts in a biased way.