In this project, I developed a regression model that uses multiple time series variables and then critiqued the model.

Preparation and About the Data

The following code was executed to load the data needed:

source("code_Time_Series.R")

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

load("data_ice.Rdata")

This data set consists of five time series objects. It contains bus ridership numbers for Iowa City Transit for the 11-year period from January 1978 to December 1988, as well as several variables that pertain to bus ridership. The time series are:

rides = number of bus riders during given month (in 1,000’s)
students = number of students enrolled at the University of Iowa in Iowa City during the corresponding fall semester (in 1,000’s)
spaces = number of downtown parking spaces in Iowa City during the given month (in 1,000’s)
rp_fare = bus fare (in Jan 1978 dollars) for a single ride; prices have been deflated to Jan 1978; the prefix “rp” stands for “real price”
rp_gas = real price of a gallon of gas (in Jan 1978 dollars)

Data Exploration

Let’s start by examining the raw data and some of its basic characteristics.

I plotted both rides and log(rides) to see which has the more stable variation.

autoplot(rides)

autoplot(log(rides))

The variation of the logged data is more stable, so I plan to work with it over the original data.
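
To illustrate why taking logs can stabilize the variation, here is a minimal sketch on a synthetic monthly series (not the Iowa City data) whose seasonal swings grow with its level. The simulated series and the helper name sd_by_year are made up for illustration only.

```r
# Synthetic monthly series (NOT the Iowa City data): the level rises over
# time and is multiplied by a seasonal factor, so swings scale with the level.
set.seed(1)
n <- 132                                    # 11 years x 12 months
level <- seq(50, 150, length.out = n)
seasonal <- 1 + 0.3 * sin(2 * pi * (1:n) / 12)
y <- level * seasonal * exp(rnorm(n, sd = 0.05))
year <- rep(1978:1988, each = 12)

# Hypothetical helper: standard deviation within each calendar year
sd_by_year <- function(x) tapply(x, year, sd)

raw_sd <- sd_by_year(y)
log_sd <- sd_by_year(log(y))

# Largest yearly spread divided by the smallest; values near 1 mean the
# variation is about the same size in every year.
spread_ratio <- c(raw = max(raw_sd) / min(raw_sd),
                  log = max(log_sd) / min(log_sd))
print(round(spread_ratio, 2))
```

When the swings are proportional to the level, the log turns them into roughly constant-size swings, so the ratio for the logged series should sit much closer to 1 than the ratio for the raw series.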

Understanding All Variables

I executed the following code to get a feel for the other variables.

autoplot(students)

autoplot(spaces)

autoplot(rp_fare)

autoplot(rp_gas)

Regression Model

I used tslm to regress log(rides) on trend and season, then used aa_plot_fitted(fit) to assess the fit visually.

fit <- tslm(log(rides) ~ trend + season)
aa_plot_fitted(fit)

The fit doesn’t look very good. For example, it doesn’t capture the overall U-shape of the time series.
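
To make it clearer what a trend-plus-season regression contains, here is a rough base R sketch of the same kind of design built by hand with lm. It assumes, as the coefficient names season2 to season12 in the output further down suggest, that trend is a simple time index 1, 2, ..., n and season is a factor for the month of the year. The data are synthetic (not the Iowa City data), and the names t_index, d, and fit_lm are placeholders.

```r
# Synthetic monthly series (NOT the Iowa City data) with a linear trend
# and a seasonal pattern on the log scale.
set.seed(42)
n <- 132
t_index <- 1:n
y <- ts(exp(4 + 0.002 * t_index + 0.2 * sin(2 * pi * t_index / 12) +
              rnorm(n, sd = 0.05)),
        start = c(1978, 1), frequency = 12)

# Build the two regressors by hand: a time index and a month-of-year factor
d <- data.frame(
  log_y  = as.numeric(log(y)),
  trend  = t_index,
  season = factor(as.integer(cycle(y)))
)

fit_lm <- lm(log_y ~ trend + season, data = d)

# One intercept, one trend slope, and 11 dummies (month 1 is the baseline)
print(length(coef(fit_lm)))
print(names(coef(fit_lm))[1:3])
```

Because the design has a single slope for trend, the fitted trend is a straight line on the log scale. A straight line can tilt up or down but cannot bend, which is consistent with the fit missing a U-shaped pattern.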

Expanding the Regression Model with Multiple Series

Next, I expanded the prior regression to include the variables students, spaces, log(rp_fare), and log(rp_gas). I then assessed the fit against a number of time-series diagnostics.

fit <- tslm(log(rides) ~ trend + season + students + spaces + log(rp_fare) +
    log(rp_gas))
aa_plot_fitted(fit)

summary(fit)

## 
## Call:
## tslm(formula = log(rides) ~ trend + season + students + spaces + 
##     log(rp_fare) + log(rp_gas))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.14811 -0.04624 -0.01090  0.03660  0.17276 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.2847901  0.2159824  15.209  < 2e-16 ***
## trend        -0.0045344  0.0004931  -9.196 1.94e-15 ***
## season2       0.1171999  0.0288228   4.066 8.78e-05 ***
## season3       0.0262450  0.0288455   0.910  0.36481    
## season4      -0.0384786  0.0288736  -1.333  0.18528    
## season5      -0.3125263  0.0289249 -10.805  < 2e-16 ***
## season6      -0.4006730  0.0289462 -13.842  < 2e-16 ***
## season7      -0.4097179  0.0290188 -14.119  < 2e-16 ***
## season8      -0.4343003  0.0290403 -14.955  < 2e-16 ***
## season9      -0.1369896  0.0288935  -4.741 6.14e-06 ***
## season10     -0.0022152  0.0288779  -0.077  0.93899    
## season11     -0.0569882  0.0289031  -1.972  0.05105 .  
## season12     -0.0811247  0.0289399  -2.803  0.00594 ** 
## students      0.0813214  0.0059164  13.745  < 2e-16 ***
## spaces       -0.0121496  0.0401286  -0.303  0.76261    
## log(rp_fare) -0.1556497  0.0952207  -1.635  0.10486    
## log(rp_gas)   0.4987317  0.0395896  12.598  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06758 on 115 degrees of freedom
## Multiple R-squared:  0.9404, Adjusted R-squared:  0.9322 
## F-statistic: 113.5 on 16 and 115 DF,  p-value: < 2.2e-16
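
Since the response is log(rides), the coefficients are easier to read once converted to approximate percentage changes in rides. The sketch below does that arithmetic for three of the estimates copied from the output above. The variable names are my own, and the results describe associations with the other regressors held fixed, not causal effects.

```r
# Coefficients copied from the summary(fit) output above
b_students <- 0.0813214   # students enters in levels (1,000's)
b_gas      <- 0.4987317   # rp_gas enters in logs
b_trend    <- -0.0045344  # trend is measured in months

# Level regressor: one more unit multiplies rides by exp(b)
pct_per_1000_students <- 100 * (exp(b_students) - 1)

# Logged regressor: scaling x by a factor k multiplies rides by k^b
pct_for_10pct_gas <- 100 * (1.10^b_gas - 1)

# Trend: percent change in rides over 12 months
pct_per_year <- 100 * (exp(12 * b_trend) - 1)

print(round(c(students = pct_per_1000_students,
              gas      = pct_for_10pct_gas,
              trend    = pct_per_year), 2))
```

Under this reading, 1,000 additional students go with roughly 8.5% more rides, a 10% higher real gas price goes with roughly 4.9% more rides, and the trend term corresponds to a decline of roughly 5.3% per year.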

accuracy(fit)

##              ME       RMSE        MAE         MPE      MAPE      MASE      ACF1
## Training set  0 0.06307404 0.05013463 -0.01579507 0.9962192 0.4901869 0.5346375
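
The RMSE here (0.0631) is a little smaller than the residual standard error in the summary output (0.06758). A quick check, assuming the RMSE divides the residual sum of squares by the number of observations while the residual standard error divides it by the residual degrees of freedom, reconciles the two. Note that both are in log(rides) units, so an RMSE of about 0.063 corresponds to a typical error of roughly 6% in rides.

```r
# Values copied from the summary(fit) output above
rse      <- 0.06758   # residual standard error
df_resid <- 115       # residual degrees of freedom
n_coef   <- 17        # intercept + 16 regressors listed in the table

# Number of observations: 11 years of monthly data
n_obs <- df_resid + n_coef

# Rescale from the "divide by df" convention to "divide by n"
rmse <- rse * sqrt(df_resid / n_obs)
print(n_obs)
print(round(rmse, 4))
```

The result agrees with the reported RMSE of 0.06307404 up to the rounding of the residual standard error.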

checkresiduals(fit)

## 
##  Breusch-Godfrey test for serial correlation of order up to 24
## 
## data:  Residuals from Linear regression model
## LM test = 69.805, df = 24, p-value = 2.34e-06

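This fit looks very good (adjusted R-squared of 0.9322), except for a few insignificant terms, notably spaces (p = 0.76) and log(rp_fare) (p = 0.10), and some autocorrelation in the residuals: the Breusch-Godfrey test rejects the hypothesis of no serial correlation (p-value = 2.34e-06), and the lag-1 residual autocorrelation (ACF1) is 0.53. We could reduce the autocorrelation by bringing a lag of rides into the model.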
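
As a rough sketch of the lagged-response idea, the base R code below simulates a series that depends on its own previous value and compares the lag-1 residual autocorrelation with and without the lag as a regressor. The data are synthetic (not the Iowa City data), and the names d, y_lag1, and lag1_acf are hypothetical. On the real data, the analogous step would be to add a lagged copy of log(rides) to the regression.

```r
# Synthetic data (NOT the Iowa City data): y depends on its own previous
# value, on a regressor x, and on white noise.
set.seed(123)
n <- 132
x <- rnorm(n)
u <- rnorm(n)
y <- numeric(n)
y[1] <- 0.5 * x[1] + u[1]
for (t in 2:n) {
  y[t] <- 0.7 * y[t - 1] + 0.5 * x[t] + u[t]
}

# Static regression: ignores the dependence on the previous value
fit_static <- lm(y ~ x)

# Same regression plus the lagged response; the first observation is
# dropped because it has no previous value
d <- data.frame(y = y[-1], x = x[-1], y_lag1 = y[-n])
fit_lagged <- lm(y ~ x + y_lag1, data = d)

# Hypothetical helper: lag-1 autocorrelation of a model's residuals
lag1_acf <- function(fit) acf(residuals(fit), plot = FALSE)$acf[2]

acf_static <- lag1_acf(fit_static)
acf_lagged <- lag1_acf(fit_lagged)
print(round(c(static = acf_static, lagged = acf_lagged), 2))
```

In this simulated setting the static model leaves strongly autocorrelated residuals, while the model with the lag leaves residuals much closer to white noise. Whether the same holds for the bus ridership data would need to be checked with the same diagnostics used above.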

Regression Model with Multiple Time Series Data for Bus Ridership
