Final Exam — Time Series Models
PRACTICAL FINAL EXAM — 1 HOUR This exam covers all topics of the Time Series course: random walk model and return calculations, stationarity and unit-root tests, spurious regression and cointegration, and ARIMA/SARIMA model calibration and forecasting. You will work in Google Colab and must submit (1) your Jupyter Notebook link and (2) your Canvas quiz answers. Use the three synthetic CSV files provided on Canvas to answer each problem.
1 General Instructions
- Work individually in Google Colab. Log in with your @tec.mx account.
- Rename your notebook: Evidence-YourFirstName-YourLastName.
- Share your notebook (Edit privileges) with cdorante@tec.mx.
- Submit your Colab link on Canvas under “Final Evidence — Notebook Link”.
- Also complete the Canvas quiz “Quiz related to Evidence” with the numerical results and interpretations from your notebook.
- You may use Google Colab’s AI code generator and Gemini. You may not share your work with other students.
- All answers must be obtained programmatically — no manual calculations.
1.1 Dataset files
Download these three CSV files from Canvas and upload them to your Colab session (or mount your Drive):
| File | Description |
|---|---|
| exam_stock_prices.csv | Daily closing prices for two synthetic stocks (StockA, StockB) |
| exam_macro_monthly.csv | Monthly macroeconomic series: output_index and consumption_index |
| exam_retail_sales.csv | Monthly retail sales (units) for a consumer product |
2 Problem 1 — Random Walk Model and Return Analysis (30 points)
2.1 Context
The file exam_stock_prices.csv contains daily closing prices for two synthetic financial assets (StockA and StockB) over approximately 5 years of trading days. According to the Random Walk Hypothesis (Fama, 1965), the logarithm of stock prices follows a random walk with a drift:
Y_t = \phi_0 + Y_{t-1} + \varepsilon_t
where Y_t = \ln(Price_t), \phi_0 is the drift, and \varepsilon_t \sim N(0, \sigma_\varepsilon^2).
2.2 Exercise 1.1 — Data loading and visualization
Load the dataset. Set the date column as a datetime index. Plot the price series of both stocks over time.
Q1 (Canvas): How many daily observations does the dataset contain?
2.3 Exercise 1.2 — Continuously compounded (cc) returns
Calculate the continuously compounded daily returns (cc returns) for both stocks.
Compute the mean and standard deviation of the cc returns for each stock. Report to 4 decimal places.
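Hint (illustrative sketch, not the exam data): the cc return is the first difference of the log price, r_t = \ln(Price_t) - \ln(Price_{t-1}). The prices and dates below are made up for illustration; only the column names StockA and StockB come from the dataset description.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, only to illustrate the calculation (not the exam data).
prices = pd.DataFrame(
    {"StockA": [100.0, 101.0, 99.5, 102.0], "StockB": [50.0, 50.5, 51.0, 50.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="B"),
)

# cc return: r_t = ln(P_t) - ln(P_{t-1}); the first row has no predecessor.
cc_returns = np.log(prices).diff().dropna()

# Sample mean and standard deviation per stock (pandas uses ddof=1 for std).
summary = cc_returns.agg(["mean", "std"]).round(4)
print(summary)
```

Because log differences telescope, the sum of the cc returns over the sample equals the log of the last price divided by the first price.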
Q2 (Canvas): What is the mean daily cc return of StockA (as a percentage, i.e., multiply by 100, rounded to 4 decimal places)?
Q3 (Canvas): What is the standard deviation of the daily cc returns of StockA? (round to 4 decimal places)
Q4 (Canvas): What is the standard deviation of the daily cc returns of StockB? (round to 4 decimal places)
2.4 Exercise 1.3 — Estimating Random Walk parameters
Using the Random Walk model and the historical data, estimate the two key parameters of the model: the drift \phi_0 and the volatility of the shocks \sigma_\varepsilon.
Estimate both parameters for StockA and StockB.
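Hint (a minimal sketch under stated assumptions): subtracting Y_{t-1} from both sides of the model gives r_t = Y_t - Y_{t-1} = \phi_0 + \varepsilon_t, so the sample mean of the cc returns is a natural estimate of \phi_0 and their sample standard deviation a natural estimate of \sigma_\varepsilon. The series below is simulated with hypothetical parameter values; it is not the exam data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" values used only to generate illustrative data.
phi0_true, sigma_true, n = 0.0005, 0.01, 5000

# Simulate Y_t = phi0 + Y_{t-1} + eps_t, starting from ln(100).
eps = rng.normal(0.0, sigma_true, size=n)
log_price = np.log(100.0) + np.cumsum(phi0_true + eps)

# r_t = Y_t - Y_{t-1} = phi0 + eps_t
r = np.diff(log_price)

phi0_hat = r.mean()        # estimate of the drift
sigma_hat = r.std(ddof=1)  # estimate of the shock volatility
print(round(phi0_hat, 4), round(sigma_hat, 4))
```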
Q5 (Canvas): What is the estimated drift parameter \phi_0 for StockA? (round to 4 decimal places)
Q6 (Canvas): What is the estimated \sigma_\varepsilon for StockA? (round to 4 decimal places)
2.5 Exercise 1.4 — Unit-root test: are the log prices and returns stationary?
Apply the Augmented Dickey-Fuller (ADF) test to determine whether:
The log prices of StockA are stationary (use the Python adfuller function with the parameter regression='ct' to allow for a constant and a trend).
The cc returns of StockA are stationary (use the Python adfuller function with the parameter regression='c').
Q7 (Canvas): The ADF p-value for log(StockA) in LEVELS is approximately ___. (choose the closest value)
Q8 (Canvas): The ADF p-value for StockA cc RETURNS is approximately ___. (choose the closest value)
Interpretation questions
Q16 (Canvas): In the Random Walk model Y_t = \phi_0 + Y_{t-1} + \varepsilon_t, if \phi_0 > 0, what does this imply about the long-run behavior of the series? EXPLAIN WITH YOUR OWN WORDS IN YOUR NOTEBOOK
Q17 (Canvas): The ADF test for log(StockA) yields a p-value much greater than 0.05. What is the correct conclusion? EXPLAIN WITH YOUR OWN WORDS IN YOUR NOTEBOOK
Q18 (Canvas): What is the main advantage of using continuously compounded (cc) returns instead of simple returns in time-series modeling? EXPLAIN WITH YOUR OWN WORDS IN YOUR NOTEBOOK
3 Problem 2 — Stationarity, Spurious Regression and Cointegration (35 points)
3.1 Context
The file exam_macro_monthly.csv contains monthly observations of two macroeconomic indicators: output_index (a proxy for economic output) and consumption_index. Both series are expressed as index numbers (base = 100).
Your task is to:
- Test whether the log of each series is stationary (unit-root test).
- Run a regression between the two log variables and diagnose whether it is spurious or reliable.
3.2 Exercise 2.1 — Data loading and visualization
Load the dataset. Set the date column as a datetime index and convert it to monthly frequency. Plot both series on the same figure using log scale.
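Hint (a minimal sketch with stand-in data): one possible way to load a monthly file and declare its frequency. The column name "date", the month-start dating, and the four rows of values are assumptions for illustration; check the actual header of the CSV.

```python
import io
import numpy as np
import pandas as pd

# Stand-in for the CSV file; the column name "date" is an assumption.
csv_text = """date,output_index,consumption_index
2020-01-01,100.0,100.0
2020-02-01,100.8,100.5
2020-03-01,101.1,101.2
2020-04-01,102.3,101.9
"""

# In Colab, replace io.StringIO(csv_text) with the path to the uploaded file.
macro = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"]).set_index("date")

# Declare the index as monthly (month-start) frequency.
macro = macro.asfreq("MS")

log_macro = np.log(macro)
print(log_macro.head())
```

With matplotlib available, macro.plot(logy=True) draws both columns on one figure with a logarithmic vertical axis.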
3.3 Exercise 2.2 — Unit-root test on both series
Apply the ADF test to the log levels of both series (use regression='ct'). Then apply the ADF test to their first differences (use regression='c').
Q9 (Canvas): The ADF p-value for log(output_index) in LEVELS is approximately ___. (choose the closest)
Q10 (Canvas): The ADF p-value for the first difference of log(output_index) is approximately ___. (choose the closest)
3.4 Exercise 2.3 — Regression between the non-stationary series (spurious regression)
Run a linear regression of log(consumption_index) on log(output_index) using the log-level (non-stationary) series. Report the regression summary.
Q11 (Canvas): What is the R-squared of the regression log(consumption) ~ log(output)? (round to 4 decimal places)
Q12 (Canvas): What is the estimated \beta_1 coefficient in that regression? (round to 4 decimal places)
Q19 (Canvas — Interpretation): A researcher runs a regression of one non-stationary variable on another non-stationary variable and obtains a very high R^2 and a highly significant t-statistic. What phenomenon is most likely occurring? EXPLAIN WITH YOUR OWN WORDS IN YOUR NOTEBOOK
3.5 Exercise 2.4 — Cointegration test
To determine whether the regression is spurious or valid, test for cointegration.
Q13 (Canvas): The ADF p-value of the cointegration test is approximately ___. (choose the closest)
Q20 (Canvas — Interpretation): In the cointegration test, why do we apply the ADF test to the regression residuals? If p < 0.05, what is the correct conclusion? EXPLAIN WITH YOUR OWN WORDS IN YOUR NOTEBOOK
Q21 (Canvas — Interpretation): Two non-stationary series appear to have a very high R^2 in a regression. To determine whether this relationship is real or spurious, the most appropriate next step is:
4 Problem 3 — ARIMA/SARIMA Model Calibration and Forecasting (35 points)
4.1 Context
The file exam_retail_sales.csv contains monthly retail sales (in units) for a consumer product over 15 years. Your goal is to:
- Transform the series to stationarity.
- Confirm stationarity with the ADF test.
- Identify the correct ARIMA/SARIMA model order using ACF and PACF plots.
- Estimate the model and interpret the coefficients.
- Check that the residuals are white noise.
- Forecast the next 12 months.
4.2 Exercise 3.1 — Data loading and log transformation
Load the data, compute the log of sales, and plot the series. Then compute the annual % growth as the seasonal log difference, and call it annualgrowth.
Plot the annual growth series.
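Hint (illustrative sketch): for monthly data, the seasonal log difference is \ln(sales_t) - \ln(sales_{t-12}). The series below is made up (exactly 5% annual growth with a seasonal pattern) so that the result is easy to check; it is not the exam data.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales growing exactly 5% per year, with a seasonal pattern.
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
seasonal = 1.0 + 0.2 * np.sin(2 * np.pi * idx.month.to_numpy() / 12)
years_elapsed = np.arange(36) / 12
sales = pd.Series(1000.0 * seasonal * 1.05 ** years_elapsed, index=idx, name="sales")

log_sales = np.log(sales)

# Seasonal log difference: ln(sales_t) - ln(sales_{t-12}), approx. annual % growth.
annualgrowth = log_sales.diff(12).dropna()
print(annualgrowth.head())
```

In this constructed example the seasonal factor cancels in the 12-month difference, so every value equals ln(1.05), which is close to the 5% growth rate.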
4.3 Exercise 3.2 — ADF test on the annual growth series
Q14 (Canvas): The ADF p-value for the annual growth series of retail sales is approximately ___. (choose the closest)
4.4 Exercise 3.3 — Correlogram: ACF and PACF
Plot the ACF and PACF of the annual growth series. Use the pattern to determine initial values of p and q.
Q22 (Canvas — Interpretation): Based on the ACF and PACF of the annual growth series, you observe: ACF decays gradually across several lags, while PACF cuts off sharply after lag 1. This pattern is called:
Q23 (Canvas — Interpretation): The PACF cuts off sharply after lag 1 in the annual growth series. What initial model order does this suggest?
4.5 Exercise 3.4 — Fit the SARIMA model
Fit an initial SARIMA model based on your conclusions from the ACF and PACF plots.
Q15 (Canvas): What is the estimated AR(1) coefficient \phi_1 in the SARIMA model you fitted? (round to 4 decimal places)
Q16b (Canvas): What is the AIC of the SARIMA(1,0,0)(0,1,0,12) model? (round to 2 decimal places)
4.6 Exercise 3.5 — Residual diagnostics
Check whether the model residuals behave like white noise. Produce the corresponding diagnostic plots and explain what they show.
Q24 (Canvas — Interpretation): What is a correct interpretation of your previous plots?
4.7 Exercise 3.6 — Forecasting the next 12 months
Generate forecasts for the next 12 months. Convert the log-scale forecasts back to the original sales scale using \exp(\cdot). Plot the historical series and forecasts with 95% confidence intervals.
Q17b (Canvas): What is the January 2024 point forecast for retail sales (in units)? (round to 2 decimal places)
Q25 (Canvas — Interpretation): When forecasting with a SARIMA model, the confidence interval of the forecast widens as the forecast horizon increases. What is the main reason for this?
5 Submission Checklist
Before submitting, make sure that:
End of Exam. Good luck!