Homework 3: Questions 1 & 2 from Chapter 4

Problem 1: A large medical clinic would like to forecast daily patient visits for purposes of staffing.

A. If data is only available for the last month, how does this affect the choice of model-based vs. data-driven methods?

By definition, model-based methods are especially advantageous when the series at hand is very short, whereas data-driven methods are advantageous when model assumptions are likely to be violated or when the structure of the time series changes over time. Since data is available only for the last month, a model-based method is the logical choice: there is not enough data for a data-driven method to learn patterns from.

B. The clinic has access to the admissions data of a nearby hospital. Under what conditions will including the hospital information be potentially useful for forecasting the clinic’s daily visits?

The clinic and the hospital need to be comparable in the scope and type of work, the number of personnel and admissions, the general cost of services, and their data collection methods. Even geographic location has the potential to skew the data. However, data from a nearby hospital can be very useful for determining seasonal fluctuations in patient admissions.

C. Thus far, the clinic administrator takes a heuristic approach, using visit numbers from the same day of the previous week as a forecast. What is the advantage of this approach? What is the disadvantage?

It is clear that the clinic administrator is doing the best she can to forecast patient admissions given the limited data she has. This relatively simple approach gives an idea of how many staff will be needed on a particular day of the week. Being a data-driven method, the naive forecast requires less user input and is therefore a good candidate for automation. This heuristic approach, however, does not prepare the clinic administrator for the fluctuations from seasonal changes that the clinic will most likely experience in the future. Additionally, not being able to compare the data to that of the previous year puts the administrator at a disadvantage: she cannot tell whether today is busier or quieter than the same day last month, quarter, or year, which makes it harder to schedule the right number of staff.
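The administrator's heuristic can be sketched in a few lines of Python. This is only an illustration: the function name and the visit counts below are hypothetical and are not taken from the clinic's data.

```python
def seasonal_naive_forecast(history, season_length=7, horizon=7):
    """Forecast each future day with the value observed one season earlier.

    With season_length=7 this is the "same day of the previous week" rule.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Repeat the last observed week for as many days as requested.
    return [last_season[h % season_length] for h in range(horizon)]


# Hypothetical daily visit counts for two weeks (Mon..Sun), for illustration only.
visits = [120, 115, 110, 118, 125, 60, 40,
          130, 112, 108, 121, 127, 65, 42]

next_week = seasonal_naive_forecast(visits)
print(next_week)  # [130, 112, 108, 121, 127, 65, 42]
```

The forecast simply repeats the most recent week, which is why the method needs no tuning but also cannot anticipate changes that play out over months or years.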

D. What level of automation appears to be required for this task? Explain.

As stated in my previous response, the naive forecast, being a data-driven method, is an excellent candidate for automation.

However, because of the type of data at hand, a model-based method was advised, which may be harder to automate than a data-driven method. More automation is usually required when many series are to be forecasted on a continuous basis, which is desirable in this case. Models that rely on many assumptions to produce adequate forecasts are less suited to automation, as they require constant evaluation of whether the assumptions are met. For instance, in the clinic’s case, we need to take into consideration not only the number of patients and staff, but also the time it takes a nurse to do particular tasks, how well the nurse is trained, and whether or not he or she is qualified to do the more complex tasks. That said, even with an automated system in place, it is advisable to monitor the forecasts and forecast errors it produces, occasionally re-examine their suitability, and update the system accordingly.
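One way to monitor an automated system is to compare its recent forecast error with a baseline error level. The sketch below is an assumption about how that might look, not a prescribed procedure: the function names, the tolerance factor of 1.5, and the numbers are all hypothetical.

```python
def mean_absolute_error(actuals, forecasts):
    """Average absolute difference between actual and forecasted values."""
    if len(actuals) != len(forecasts) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def needs_review(actuals, forecasts, baseline_mae, tolerance=1.5):
    """Flag the system when recent error exceeds the baseline by a chosen factor."""
    return mean_absolute_error(actuals, forecasts) > tolerance * baseline_mae


# Hypothetical recent visits and the forecasts that were issued for them.
recent_actuals = [100, 110, 90]
recent_forecasts = [95, 100, 100]

recent_mae = mean_absolute_error(recent_actuals, recent_forecasts)
print(round(recent_mae, 2))                                            # 8.33
print(needs_review(recent_actuals, recent_forecasts, baseline_mae=5.0))  # True
```

A flag like this does not fix the forecast; it only signals that someone should re-examine whether the method still suits the data.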

Based on all the factors presented above, I recommend automating the system to at least some degree.

E. Describe two approaches for improving the current heuristic (naive) forecasting approach using ensembles.

Approach one: Combining multiple forecasting methods. This can lead to improved performance compared to the current approach. Combining can be done via two-level methods, where the first level uses the original time series to generate forecasts of future values, and the second level uses the forecast errors from the first level to generate forecasts of future forecast errors, thereby “correcting” the first-level forecasts. This can be done with the data on hand as well as the data that will be received from the nearby hospital.
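A minimal sketch of the two-level idea follows. It assumes the first level is the same-day-last-week rule and the second level forecasts the next error as the mean of recent first-level errors; both choices, the function name, and the data are illustrative assumptions only.

```python
def two_level_forecast(history, season_length=7, error_window=7):
    """One-step-ahead forecast: seasonal naive value plus a predicted error."""
    # Level 1: in-sample seasonal naive forecasts and their errors.
    errors = [history[t] - history[t - season_length]
              for t in range(season_length, len(history))]
    if not errors:
        raise ValueError("need more than one season of history")
    # Level 2: forecast the next error as the mean of the most recent errors.
    recent = errors[-error_window:]
    predicted_error = sum(recent) / len(recent)
    # Correct the level-1 forecast with the predicted error.
    return history[-season_length] + predicted_error


# Hypothetical daily visit counts for two weeks (Mon..Sun), for illustration only.
visits = [120, 115, 110, 118, 125, 60, 40,
          130, 112, 108, 121, 127, 65, 42]

corrected = two_level_forecast(visits)
print(round(corrected, 2))  # 132.43
```

Here the first level alone would predict 130 visits for next Monday; the second level adds the average week-over-week change of about 2.43.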

Approach two: Ensemble approach. In this case, multiple methods are applied to the time series, each generating separate forecasts. The resulting forecasts are then averaged in some way to produce the final forecast. Combining methods can take advantage of the capability of different forecasting methods to capture different aspects of the time series. Averaging across multiple methods can lead to forecasts that are more robust and of higher precision. In the case of the clinic, we can average the current forecasting approach with the seasonal naive forecast that we will get from the data received from the hospital. We can even use different methods for forecasting different horizons or periods, for example, when one method performs better for one-step-ahead forecasting and another for two or more periods ahead, or when one method is better at forecasting weekends and another at weekdays. Another approach is to use different series measuring admissions and take an average of the resulting forecasts. As in the clinic example, when a series of interest is available from multiple sources, each with slightly different numbers or precision, we can create ensembles of forecasts, each based on a different series. This would mean getting data from the intake sheet that the nurses fill out manually as well as from electronic records of patients being admitted.
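The averaging step can be sketched as follows. The three component methods (last value, same day last week, and a 7-day moving average) are assumptions chosen for illustration, as are the function names and the data; any set of forecasting methods could be substituted.

```python
def naive_forecast(history):
    """Forecast the next value as the most recent observation."""
    return history[-1]


def seasonal_naive_forecast(history, season_length=7):
    """Forecast the next value as the observation one season earlier."""
    return history[-season_length]


def moving_average_forecast(history, window=7):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def ensemble_forecast(history, weights=None):
    """Weighted average of the component forecasts (equal weights by default)."""
    forecasts = [naive_forecast(history),
                 seasonal_naive_forecast(history),
                 moving_average_forecast(history)]
    if weights is None:
        weights = [1.0] * len(forecasts)
    if len(weights) != len(forecasts) or sum(weights) == 0:
        raise ValueError("need one weight per method, not all zero")
    return sum(w * f for w, f in zip(weights, forecasts)) / sum(weights)


# Hypothetical daily visit counts for two weeks (Mon..Sun), for illustration only.
visits = [120, 115, 110, 118, 125, 60, 40,
          130, 112, 108, 121, 127, 65, 42]

combined = ensemble_forecast(visits)
print(round(combined, 2))  # 90.9
```

Unequal weights let the ensemble lean on whichever method has performed better, for instance `ensemble_forecast(visits, weights=[0, 1, 0])` reduces to the seasonal naive forecast alone.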

Problem 2: Renewable energy.

A. For each of the four types of methods for predicting wind power and speed (persistence, physical, statistical and hybrid), describe whether they’re model-based, data-driven or a combination.

Persistence Method: Data-driven: this is a naive forecast that uses past speeds to predict future speeds.

Physical Approach: Model-based: it uses parameterizations based on the current atmospheric conditions.

Statistical Approach: Data-driven: it isn’t based on any predefined model and relies on patterns in past wind speeds.

Hybrid Approach: Combination: it draws on the other methods described above.

B. For each method, describe whether it’s based on extrapolation, causal modeling, correlation or a combination.

Persistence Method: It is clearly an extrapolation method by definition: “the action of estimating or concluding something by assuming that existing trends will continue or a current method will remain applicable.”

Physical Approach: I think it is a causal method, as it relates predetermined parameters to atmospheric conditions. The model expresses more than correlation: correlation does not imply causation, whereas this model describes how atmospheric conditions produce wind.

Statistical Approach: I think it is based on correlation, since it uses the difference between the predicted and the actual wind speeds.

Hybrid Approach: A combination method combining two or more of the above to create an ensemble forecast.

C. Describe the advantages and disadvantages of the hybrid approach.

The main advantage of the hybrid approach is its ability to generate a more accurate forecast than a physical or statistical model alone. Its major disadvantage is complexity: it requires an advanced skill level to carry out successfully and is therefore more expensive.

Assignment 3

Viacheslav Tomenko

February 13, 2017