Task: Predicting sales at the cafe

This task involves building several predictive models of daily sales at the cafe, using the data that you constructed last week. (If you did not do your homework last week, then please feel free to copy one of your colleagues’ code for the data construction.)

Your task:

  1. Load the invoices.csv data.
  2. Construct a dataset with two columns: the date and the daily sales in dollars.
  3. Construct some features that will help you predict the sales for the next day. These could include the day of the week, the previous day’s sales, the sales on the same day last week, the weather, etc. At a bare minimum, try using day-of-the-week dummy variables and a time trend.
  4. Subset the data so that you estimate your models only on sales up to the end of October 2014. (Keep a separate dataset for November, which we’ll use for testing.)
  4a. Use the subset data to train some predictive models. You know how to fit linear models using lm(), regularised linear models using glmnet(), regression trees using rpart() and random forests using randomForest(). If you’re feeling keen, you can try some others!
  5. Use the fitted models to predict the sales on every day in November. What is the RMSE for each model? Plot your predictions against the actual sales.
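
As a rough illustration of steps 3 to 5, here is a minimal sketch using only lm(). It does not read invoices.csv: the `daily` data frame is simulated, and its column names (`date`, `sales`) and the numbers inside it are assumptions made purely so the code runs on its own. Swap in your real daily sales table from step 2.

```r
# Simulated stand-in for the daily sales table from step 2.
# (Hypothetical values; replace with your own data.)
set.seed(1)
daily <- data.frame(
  date = seq(as.Date("2014-01-01"), as.Date("2014-11-30"), by = "day")
)
wday <- as.POSIXlt(daily$date)$wday            # 0 = Sunday, ..., 6 = Saturday
daily$sales <- 500 + 0.3 * seq_len(nrow(daily)) +
  50 * (wday %in% c(0, 6)) + rnorm(nrow(daily), sd = 30)

# Step 3: day-of-the-week factor (lm() turns it into dummies) and a time trend.
daily$dow <- factor(wday, levels = 0:6,
                    labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
daily$trend <- seq_len(nrow(daily))

# Step 4: train on data up to the end of October 2014, test on November.
train <- daily[daily$date <= as.Date("2014-10-31"), ]
test  <- daily[daily$date >= as.Date("2014-11-01") &
               daily$date <= as.Date("2014-11-30"), ]

# Step 4a: fit a linear model on the training data.
fit <- lm(sales ~ dow + trend, data = train)

# Step 5: predict November and compute the root mean squared error.
pred <- predict(fit, newdata = test)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_lm <- rmse(test$sales, pred)
print(rmse_lm)

# Plot the predictions against the actual sales.
plot(test$date, test$sales, type = "l",
     xlab = "Date", ylab = "Daily sales ($)")
lines(test$date, pred, col = "red", lty = 2)
legend("topleft", legend = c("Actual", "lm prediction"),
       col = c("black", "red"), lty = c(1, 2))
```

The same `train`/`test` split and `rmse()` helper can be reused for the glmnet(), rpart() and randomForest() fits, so that all the models are scored on the same November days.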

Note that the task will be easier if you don’t include any features that are constructed from historical sales: with such features your forecasts will have to be “rolling”, because each day’s prediction depends on sales from earlier days. However, your model probably won’t work as well without them…
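
To make the “rolling” idea concrete, here is one possible reading of it, sketched on simulated data (again, the `daily` data frame and its values are assumptions, not the cafe's data). The model uses the previous day's sales as a feature (`lag1`, a name chosen here for illustration), and the November forecasts are built one day at a time, feeding each prediction back in as the next day's lag. If you instead assume that each day's actual sales are known before forecasting the next day, you would use the observed lag rather than the predicted one.

```r
# Simulated stand-in for the daily sales table (hypothetical values).
set.seed(1)
daily <- data.frame(
  date = seq(as.Date("2014-01-01"), as.Date("2014-11-30"), by = "day")
)
n <- nrow(daily)
daily$trend <- seq_len(n)
daily$sales <- 500 + 0.3 * daily$trend + rnorm(n, sd = 30)

# Feature built from historical sales: the previous day's sales.
daily$lag1 <- c(NA, head(daily$sales, -1))

train <- daily[daily$date <= as.Date("2014-10-31"), ]
test  <- daily[daily$date >= as.Date("2014-11-01"), ]

# lm() drops the first row, where lag1 is NA.
fit <- lm(sales ~ lag1 + trend, data = train)

# Roll forward through November. The lag for 1 November is the last
# observed October value; after that, each lag is the previous prediction.
pred <- numeric(nrow(test))
prev <- tail(train$sales, 1)
for (i in seq_len(nrow(test))) {
  newrow  <- data.frame(lag1 = prev, trend = test$trend[i])
  pred[i] <- predict(fit, newdata = newrow)
  prev    <- pred[i]
}

rmse_rolling <- sqrt(mean((test$sales - pred)^2))
print(rmse_rolling)
```

The loop is what makes the task harder: predict() can no longer be called once on the whole November dataset, because the lagged feature for each day is not available until the previous day has been forecast.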