For this assignment I will be completing the assigned problems (Problems 1 and 2) from Chapter 4 of the Practical Time Series Forecasting with R textbook by Galit Shmueli and Kenneth C. Lichtendahl Jr.
A large medical clinic would like to forecast daily patient visits for purposes of staffing.
The amount of data available plays a large role in deciding which type of method to use for the time series analysis. Model-based methods are helpful when the series is short because, as the book mentions, with an assumed underlying model, few data points are needed to estimate the model parameters. If only one month’s worth of data is available, a model-based method would be preferred. Data-driven methods are more helpful for longer time series because they are useful when the model assumptions are likely to be violated or when the structure of the time series changes over time, which becomes more likely the longer the series is. Additionally, data-driven forecasting methods require less user input. This has its benefits, but one disadvantage is that such methods need more data (i.e., a longer time series) in order to learn adequately.
The clinic could use that external data in its forecasting analysis if a few conditions are met. First, the analyst would want to determine how strongly the clinic’s daily visits are correlated with the hospital’s admissions. Second, and most importantly, the analyst must make sure that whatever external data is being integrated (in this case, the admissions data of the nearby hospital) is available at the time of prediction. In practice, this means the forecast can only use admissions values (or estimates) from past weeks, not from the period being forecast.
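The availability requirement can be made concrete with a minimal Python sketch. It assumes both series are stored as plain lists of daily counts and uses a lag of 7 days as a stand-in for "past weeks"; the helper names `lag_series`, `pearson`, and `lagged_correlation` are hypothetical, and the numbers are made up. The point is that visits on day t are only ever paired with admissions from day t - 7, which would already be known when the forecast is made.

```python
import math


def lag_series(values, lag):
    """Shift a series forward by `lag` steps (assumes 0 <= lag < len(values)).

    The first `lag` entries become None because no earlier value exists yet.
    """
    if lag == 0:
        return list(values)
    return [None] * lag + list(values[:-lag])


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lagged_correlation(visits, admissions, lag=7):
    """Correlate visits on day t with admissions on day t - lag only,
    i.e. with admissions that would already be known when forecasting day t."""
    lagged = lag_series(admissions, lag)
    pairs = [(v, a) for v, a in zip(visits, lagged) if a is not None]
    return pearson([v for v, _ in pairs], [a for _, a in pairs])


# Hypothetical two weeks of daily counts, for illustration only.
admissions = [10, 12, 9, 14, 11, 13, 8, 15, 10, 12, 9, 16, 11, 14]
visits = [30, 31, 29, 35, 28, 20, 18, 25, 29, 23, 33, 27, 31, 21]
print(lagged_correlation(visits, admissions, lag=7))
```

Because the made-up second week of visits was built as 2 times the admissions one week earlier plus 5, this prints a value very close to 1.0; with real data the lagged correlation is what would tell the analyst whether the external series is worth including.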
The advantage of the seasonal naive approach, which uses the visit count from the same day of the previous week as the forecast, is that if the data show day-of-week seasonality, the forecast will include that same seasonality. This makes it a simple way to start the analysis and provides a benchmark for comparing other models/methods later. The disadvantage is that if last week was not representative of a “normal” week, or if the seasonality is tied not to the day of the week but to something else (such as the time of day), the forecast will not capture it. For example, visit and admission patterns may differ by location, by population, or by time of year, and seasonality tied to one of these factors would be missed.
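A minimal Python sketch of the seasonal naive benchmark described above, assuming daily visit counts stored in a plain list and a season length of 7 days. The function names and the visit counts are hypothetical; the book itself works in R.

```python
def seasonal_naive(history, season_length=7, horizon=7):
    """Seasonal naive forecast: each future day gets the value observed
    exactly one season earlier (the same weekday last week when
    season_length=7 and the data are daily)."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Horizons beyond one season simply cycle through last_season again.
    return [last_season[h % season_length] for h in range(horizon)]


def mae(actual, forecast):
    """Mean absolute error, useful for comparing later models to this benchmark."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical two weeks of daily visit counts (Mon..Sun, Mon..Sun).
history = [120, 135, 130, 128, 140, 90, 60, 118, 137, 129, 131, 142, 88, 62]
print(seasonal_naive(history, horizon=3))  # [118, 137, 129]
```

Any later model can be scored with the same `mae` on held-out days and compared against this benchmark; if it cannot beat the seasonal naive forecast, the extra complexity is hard to justify.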
The level of automation required depends on how often the forecast will be needed (i.e., ongoing forecasting or a one-off), the expertise available, how many series are involved, etc. For this problem, it would make sense to automate the forecast because daily patient visits will likely continue to be used for staffing. However, as noted in part (a), the clinic only has last month’s data, so an automated process would not be ideal yet, because automated processes generally need larger data sets. That said, the amount of available data should grow over time. If an automated process were set up to update continually as new data arrive, its forecasts could improve. Even so, unless all the characteristics of the series (seasonality, trend) are correctly specified, the data-driven model used for the automated process may not perform very well.
Two approaches to improving the heuristic (seasonal naive) forecasting approach using ensembles would be:
1. Using more than one seasonal naive forecast, each based on a different seasonal indicator. Instead of using only the day of the week as the seasonal indicator, the clinic could also develop forecasts based on the time of year or the time of day and then average them. Different methods could also be used for different forecast horizons and periods.
2. Similar to the Amtrak example in the book, the clinic could use different series measuring the phenomenon of interest. The clinic likely has more than one way of tracking admissions and daily patient visits (for example, an electronic tracking system and a manual one), and each series can be used to produce its own forecast. The performance of the ensemble would be evaluated by comparing the forecasts against the actual values. The improvement from combining is greatest when the forecast errors are negatively correlated, or at least uncorrelated.
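Why negatively correlated errors matter can be sketched in a few lines of Python. This is a minimal illustration with made-up numbers: the two member forecasts are labeled as coming from a hypothetical electronic tracking series and a hypothetical manual one, and they are combined with a simple equal-weight average.

```python
def average_forecasts(*forecasts):
    """Equal-weight ensemble: average the member forecasts period by period."""
    return [sum(values) / len(values) for values in zip(*forecasts)]


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Made-up actual visits and two forecasts, e.g. one built from an electronic
# tracking series and one from a manual tracking series (hypothetical).
actual = [100, 110, 105, 120]
forecast_electronic = [105, 104, 110, 115]  # errors: +5, -6, +5, -5
forecast_manual = [96, 115, 101, 124]       # errors: -4, +5, -4, +4

ensemble = average_forecasts(forecast_electronic, forecast_manual)
print(mae(actual, forecast_electronic))  # 5.25
print(mae(actual, forecast_manual))      # 4.25
print(mae(actual, ensemble))             # 0.5
```

Because the two members' errors have opposite signs in every period here, they largely cancel in the average. If both forecasts tended to miss in the same direction (positively correlated errors), averaging would reduce the error much less.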
The ability to scale up renewable energy, and in particular wind power and speed, is dependent on the ability to forecast its short-term availability. Soman et al. (2010) describe different methods for wind power forecasting (the quote is slightly edited for brevity); see pages 77 and 78 for the 4 approaches.
Persistence Method (Naive Predictor): This approach is data-driven because it is a naive forecast: it uses the last point in the series as the forecast.
Physical Approach: This approach is model-based since it uses parameterizations based on a detailed physical description of the atmosphere to generate forecasts.
Statistical Approach: This approach seems data-driven. As stated, the approach does not base the forecast on any predefined mathematical model but on patterns, which is indicative of a data-driven approach. The description also indicates that the model is easy and inexpensive, which we’ve seen in data-driven methods as well. Additionally, the description mentions that the approach uses actual wind speeds in the immediate past to tune model parameters, which is another indication that it is data-driven.
Hybrid Approach: This approach is a combination of model-based and data-driven methods; more specifically, it uses the three methods previously explained: the persistence method, the physical approach, and the statistical approach.
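To make the data-driven nature of the persistence method concrete, here is a minimal Python sketch: the forecast needs nothing but the series itself. The function names and the wind-speed values are hypothetical.

```python
def persistence_forecast(history, horizon=1):
    """Persistence (naive) method: every future value equals the last observation."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


def one_step_persistence_errors(series):
    """Errors of rolling one-step persistence forecasts: the forecast for
    series[t] is series[t - 1], so the error is series[t] - series[t - 1]."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical wind speeds in m/s, for illustration only.
wind_speed = [7.2, 7.8, 8.1, 6.9, 7.4]
print(persistence_forecast(wind_speed, horizon=3))  # [7.4, 7.4, 7.4]
```

The one-step errors are simply the period-to-period changes in the series, which is why persistence tends to do well at very short horizons and degrades as the wind changes more between the last observation and the target time.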
Persistence Method (Naive Predictor): This approach is an extrapolation because it assumes the current wind speed will persist and uses it to predict future wind speed values. It therefore forecasts from the series’ own historical values, as naive forecasts do.
Physical Approach: This approach appears to be correlation modeling because it uses external information (the physical description of the atmosphere) to generate the forecast.
Statistical Approach: This approach is an extrapolation because it uses historical values to forecast. It focuses on using the difference between predicted and actual wind speeds in the immediate past to tune model parameters; because no external data is incorporated into the forecast, this approach most closely resembles extrapolation.
Hybrid Approach: This approach is a combination of extrapolation and correlation modeling because it uses the methods previously mentioned, which include both, to generate the forecast.
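The statistical approach's idea of using the difference between predicted and actual values in the immediate past to tune model parameters can be illustrated with simple exponential smoothing. It is swapped in here as one example of such an error-correcting method and is not necessarily the model Soman et al. describe. In this minimal Python sketch (hypothetical function names and data), each forecast error nudges the level, and the smoothing parameter `alpha` is chosen by grid search on past one-step errors. Only the series' own history is used, which is consistent with classifying the approach as extrapolation.

```python
def ses_forecasts(series, alpha):
    """Simple exponential smoothing in error-correction form.

    Returns the one-step-ahead forecasts for series[1:] and the forecast
    for the next, not-yet-observed value.
    """
    level = series[0]
    forecasts = []
    for actual in series[1:]:
        forecasts.append(level)        # forecast made before seeing `actual`
        error = actual - level         # difference between actual and predicted
        level = level + alpha * error  # move the level toward the new observation
    return forecasts, level


def tune_alpha(series, grid=None):
    """Choose alpha by minimizing the sum of squared one-step errors on the
    recent history (a simple grid search)."""
    if grid is None:
        grid = [i / 10 for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0

    def sse(alpha):
        forecasts, _ = ses_forecasts(series, alpha)
        return sum((a - f) ** 2 for a, f in zip(series[1:], forecasts))

    return min(grid, key=sse)


# Hypothetical recent wind speeds (m/s).
recent = [6.8, 7.1, 7.5, 7.2, 7.9, 8.3]
alpha = tune_alpha(recent)
_, next_forecast = ses_forecasts(recent, alpha)
print(alpha, next_forecast)
```

With `alpha` equal to 1 the update reduces to the persistence method, since the level jumps all the way to each new observation; smaller values smooth out short-lived fluctuations.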
The advantage of the hybrid approach is that it combines the other approaches and therefore draws on both model-based and data-driven methods. Combining methods has been shown to lead to more accurate forecasts. As Armstrong put it, “to improve forecasting accuracy, combine forecasts derived from methods that differ substantially and draw from different sources of information. When feasible, use five or more methods.” Because the hybrid combines several approaches, it may therefore lead to a more accurate forecast. Armstrong also noted that, instead of trying to choose the single best forecasting method, an analyst using a hybrid (or combining forecasts in general) can frame the problem by asking which methods would help improve forecast accuracy (Armstrong, 2001).
The disadvantages of the hybrid are the same as those of combining forecasting models in general: it is not the simplest way to forecast, it takes more time, it requires the analyst to know how to perform more methods, and the analyst needs to develop a formal procedure for combining the forecasts so that judgmental weighting can be avoided. Armstrong recommends using equal weights when combining forecasts unless you have strong evidence to support unequal weighting.
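A formal, non-judgmental combination rule can be as simple as the following minimal Python sketch, which uses equal weights by default in line with Armstrong's recommendation and accepts explicit weights only when they are supplied. The function name and the per-method wind-speed forecasts are hypothetical.

```python
def combine_forecasts(forecasts, weights=None):
    """Combine forecasts of the same periods from several methods.

    forecasts: list of equal-length lists, one per method.
    weights:   optional per-method weights; equal weights are used by default.
    """
    k = len(forecasts)
    if weights is None:
        weights = [1.0] * k
    if len(weights) != k:
        raise ValueError("need exactly one weight per method")
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so the weights sum to 1
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]


# Hypothetical wind speed forecasts (m/s) for the next two periods.
persistence = [6.0, 6.0]
physical = [9.0, 9.0]
statistical = [9.0, 12.0]

print(combine_forecasts([persistence, physical, statistical]))  # equal weights
print(combine_forecasts([persistence, physical, statistical], weights=[2, 1, 1]))  # [7.5, 8.25]
```

Fixing the rule in code before the forecasts are seen is one way to keep the weighting from being adjusted by judgment after the fact.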