Question 1

Half-hourly electricity demand for Victoria, Australia is contained in vic_elec. Extract the January 2014 electricity demand, and aggregate this data to daily with daily total demands and maximum temperatures.

  1. Plot the data, produce a scatter plot of demand against temperature, and fit a regression model for Demand with temperature as a predictor variable. Why is there a positive relationship?

  2. Produce a residual plot. Is the model adequate? Are there any outliers or influential observations?

  3. Use the model to forecast the electricity demand that you would expect for the next day if the maximum temperature was \(15^\circ\text{C}\) and compare it with the forecast if the maximum temperature was \(35^\circ\text{C}\). Do you believe these forecasts?

  4. Give prediction intervals for your forecasts.

  5. Plot Demand vs Temperature for all of the available data in vic_elec aggregated to daily total demand and maximum temperature. What does this say about your model?
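
The daily aggregation described in this question (total demand and maximum temperature per day) can be sketched as follows. This is a minimal illustration in plain Python, not the course's intended workflow: the function name `aggregate_daily`, the tuple layout, and the sample values are all assumptions made up for the example and are not taken from vic_elec.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_daily(records):
    """Collapse half-hourly (timestamp, demand, temperature) tuples
    into one (date, total demand, maximum temperature) tuple per day."""
    totals = defaultdict(float)
    max_temp = {}
    for ts, demand, temp in records:
        day = ts.date()
        totals[day] += demand  # daily total demand
        max_temp[day] = max(temp, max_temp.get(day, temp))  # daily maximum
    return [(day, totals[day], max_temp[day]) for day in sorted(totals)]


# Made-up readings for illustration only (not real vic_elec values).
sample = [
    (datetime(2014, 1, 1, 0, 0), 4000.0, 18.0),
    (datetime(2014, 1, 1, 0, 30), 4100.0, 19.5),
    (datetime(2014, 1, 2, 0, 0), 4500.0, 25.0),
    (datetime(2014, 1, 2, 0, 30), 4700.0, 24.0),
]
daily = aggregate_daily(sample)
```

Note that demand is summed while temperature takes the maximum, so the two columns need different aggregation functions.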

Question 2

The olympic_running data set contains the winning times (in seconds) for each Olympic Games sprint, middle-distance, and long-distance track event from 1896 to 2016.

  1. Plot the winning time against the year by sex and race length. Describe the main features of the plot.

  2. Fit a regression line to the data. Obviously the winning times have been decreasing, but at what average rate per year?

  3. Plot the residuals against the year. What does this indicate about the suitability of the fitted line?

  4. Predict the winning time for each race in the 2020 Olympics. Give a prediction interval for your forecasts. What assumptions have you made in these calculations?

Question 3

The data set souvenirs concerns the monthly sales figures of a shop which opened in January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the wharf at a beach resort town in Queensland, Australia. The sales volume varies with the seasonal population of tourists. There is a large influx of visitors to the town at Christmas and for the local surfing festival, held every March since 1988. Over time, the shop has expanded its premises, range of products, and staff.

  1. Produce a time plot of the data and describe the patterns in the graph. Identify any unusual or unexpected fluctuations in the time series.

  2. Explain why it is necessary to take logarithms of these data before fitting a model.

  3. Fit a regression model to the logarithms of these sales data with a linear trend, seasonal dummies and a “surfing festival” dummy variable. Plot the fitted values and the actual sales in one graph.

  4. Plot the residuals against time and against the fitted values. Do these plots reveal any problems with the model?

  5. Produce boxplots of the residuals for each month. Do these reveal any problems with the model?

  6. What do the values of the coefficients tell you about each variable?

  7. What does the Ljung-Box test tell you about your model?

  8. Regardless of your answers to the above questions, use your regression model to predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals for each of your forecasts.

  9. How could you improve these predictions by modifying the model?

Question 4

The us_gasoline series consists of weekly data for supplies of US finished motor gasoline product, from 2 February 1991 to 20 January 2017. The units are in “thousand barrels per day”. Consider only the data to the end of 2004.

  1. Fit a harmonic regression with trend to the data. Experiment with changing the number of Fourier terms. Plot the observed gasoline and fitted values and comment on what you see.

  2. Select the appropriate number of Fourier terms to include by minimising the AICc or CV value.

  3. Check the residuals of the final model using the gg_tsresiduals() function. Use a Ljung-Box test to check for residual autocorrelation.

  4. Generate forecasts for the next year of data. Plot these along with the actual data for 2005. Comment on the forecasts.
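
As a reminder of what the regressors in a harmonic regression with trend look like, the sketch below builds the design matrix by hand: an intercept, a linear trend, and K pairs of sine and cosine terms. The helper names `fourier_terms` and `harmonic_design_matrix` are hypothetical, and the seasonal period of 52.18 weeks (365.25 / 7) is an assumption about how the weekly seasonality would be encoded. K can be at most half the seasonal period.

```python
import math


def fourier_terms(t, period, K):
    """Return [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..K."""
    row = []
    for k in range(1, K + 1):
        angle = 2 * math.pi * k * t / period
        row.extend([math.sin(angle), math.cos(angle)])
    return row


def harmonic_design_matrix(n, period, K):
    """One row per time point t = 1..n: intercept, trend, then 2*K Fourier terms."""
    return [[1.0, float(t)] + fourier_terms(t, period, K) for t in range(1, n + 1)]


# Five weekly time points, K = 2 harmonics, assumed period of 52.18 weeks.
X = harmonic_design_matrix(n=5, period=52.18, K=2)
```

Each increase in K adds two columns, which is why an information criterion such as AICc is used to decide how many harmonics are worth including.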

Question 5

The annual population of Afghanistan is available in the global_economy data set.

  1. Plot the data and comment on its features. Can you observe the effect of the Soviet-Afghan war?

  2. Fit a linear trend model and compare this to a piecewise linear trend model with knots at 1980 and 1989.

  3. Generate forecasts from these two models for the five years after the end of the data, and comment on the results.
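
A piecewise linear trend with knots at 1980 and 1989 can be written as an ordinary linear regression with extra predictors of the form max(0, year - knot), so that the slope is allowed to change at each knot. The sketch below shows one row of such a design matrix; the function name `piecewise_trend_row` is hypothetical and this is only one possible parameterisation.

```python
def piecewise_trend_row(year, knots=(1980, 1989)):
    """Regressors for one observation: intercept, linear trend,
    and one hinge term max(0, year - knot) per knot."""
    return [1.0, float(year)] + [float(max(0, year - k)) for k in knots]


before = piecewise_trend_row(1975)   # both hinge terms are zero
between = piecewise_trend_row(1985)  # only the 1980 hinge is active
after = piecewise_trend_row(1995)    # both hinges are active
```

The coefficient on each hinge term is the change in slope at that knot, so the fitted slope after 1989 is the sum of all three trend coefficients.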

Question 6

  1. List and explain each assumption of ordinary least squares (OLS).

  2. What is the difference between an unbiased estimator and a consistent estimator?

  3. What is the difference between \(R^2\) and adjusted \(R^2\)?
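
For reference, one common definition of adjusted \(R^2\) is given below, assuming \(T\) observations and \(k\) predictors (excluding the intercept); this notation is introduced here and is not defined elsewhere in these questions.

\[
\bar{R}^2 = 1 - (1 - R^2)\,\frac{T - 1}{T - k - 1}
\]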