About: This document is also available at http://rpubs.com/sherloconan/605300
Data source: Yahoo Finance
In this assignment, you will work with the dataset from your 699 project and run a regression model.
1. Scale or normalize your data. Make sure to apply imputation if needed. [-5 pts]
2. Build a multiple linear regression model or a logistic regression model (based on your Y). [-10 pts]
3. Print the summary and interpret the table (see lecture slides). Describe the summary. [-15 pts]
4. Fit another model and evaluate which model performs better. [-10 pts]
The exploratory data analysis (EDA) is available on RPubs in two parts: part I and part II.
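The dataset is the Bitcoin price, which is a time series, so an ARIMA model is built instead of a regression model. An autoregressive integrated moving average model is denoted ARIMA(p,d,q), where p, d, and q are, respectively, the number of autoregressive lags, the degree of differencing, and the order of the moving average. The model assumes that the series is stationary after being differenced d times. The seasonal extension, written ARIMA(p,d,q)(P,D,Q)[m], adds seasonal autoregressive, differencing, and moving-average orders at period m.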
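To make the ARIMA(p,d,q) notation concrete, the sketch below simulates a hypothetical ARIMA(1,1,0) series with base R's arima.sim and refits it with arima. The AR coefficient of 0.5 and the series length are arbitrary assumptions for illustration, not values taken from the Bitcoin data.

```r
# Minimal sketch: simulate and refit an ARIMA(1,1,0) process with base R (stats).
# The AR coefficient (0.5) and length (500) are illustrative assumptions.
set.seed(42)
phi <- 0.5
y <- arima.sim(model = list(order = c(1, 1, 0), ar = phi), n = 500)

# p = 1 autoregressive lag, d = 1 difference, q = 0 moving-average terms
fit_sim <- arima(y, order = c(1, 1, 0))
ar1_hat <- unname(coef(fit_sim)["ar1"])
print(ar1_hat)   # should land close to 0.5
```

Because d = 1, arima differences the series once internally and estimates the AR(1) coefficient on the differenced values.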
The Augmented Dickey-Fuller (ADF) test is a unit root test that checks whether a time series is stationary. Its null hypothesis (H0) is that a unit root is present, i.e., the series is non-stationary; the alternative is that the series is stationary. For the closing prices, the p-value of 0.7481 is well above the significance level of 0.05, so we fail to reject the null hypothesis. A first-order difference is therefore applied to remove the trend. On the differenced series, the ADF test reports a p-value of 0.01 (the smallest value adf.test prints), so the null hypothesis is rejected and the differenced series is treated as stationary. Seasonality is handled separately by the seasonal differencing (D = 1, period 30) in the models below. A logarithm transformation is not strictly required in this case.
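The idea behind differencing and the unit root test can be sketched with base R alone. The sketch uses a simulated random walk as a hypothetical stand-in for the price series, and a simplified Dickey-Fuller regression (intercept, no augmentation lags) instead of tseries::adf.test, so its statistic is not identical to the ADF output above. The 5% critical value of about -2.86 is the standard asymptotic value for the intercept-only case.

```r
# Minimal sketch: why first differencing helps, using base R only.
# A simulated random walk stands in for the price series (hypothetical data).
set.seed(1)
shocks <- rnorm(500)
walk <- cumsum(shocks)              # unit root: y_t = y_{t-1} + e_t

# The first difference y_t - y_{t-1} recovers the stationary shocks
d1 <- diff(walk, lag = 1)
all.equal(d1, shocks[-1])           # TRUE (up to floating-point rounding)

# Simplified Dickey-Fuller regression (intercept, no augmentation lags):
# regress the change on the lagged level and return the slope's t-statistic.
df_stat <- function(y) {
  dy <- diff(y)
  y_lag <- y[-length(y)]
  summary(lm(dy ~ y_lag))$coefficients["y_lag", "t value"]
}
df_stat(walk)   # typically above the ~ -2.86 critical value: cannot reject H0
df_stat(d1)     # strongly negative: reject H0, the differenced series looks stationary
```

The augmented version used by adf.test adds lagged differences to this regression (and, in tseries, a linear trend term) to absorb serial correlation.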
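Intuitively, guided by the ACF and PACF plots of the differenced series, p and q are set to 1 and 0, respectively, and the same orders are used for the seasonal part, which gives an ARIMA(1,1,0)(1,1,0)[30] model. For comparison, auto.arima selects ARIMA(0,1,1)(2,1,0)[30]. In the diagnostic plots, about half of the standardized residuals fall within [-1, 1], while the other half spike to as much as ±6. Most autocorrelations of the residuals are close to 0. However, only four Ljung-Box p-values are above 0.05, while the rest lie just at or below it, which suggests that some autocorrelation remains in the residuals. Finally, Bitcoin prices for the next 30 days are forecast.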
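library(tseries)   # adf.test()
library(forecast)  # auto.arima(), forecast()
# BTC: Yahoo Finance data loaded in the EDA parts (assumed); column 5 is the closing price
adf.test(BTC[which(BTC$Date=="2017-1-12"):which(BTC$Date=="2018-10-6"),5])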
##
## Augmented Dickey-Fuller Test
##
## data: BTC[which(BTC$Date == "2017-1-12"):which(BTC$Date == "2018-10-6"), 5]
## Dickey-Fuller = -1.5988, Lag order = 8, p-value = 0.7481
## alternative hypothesis: stationary
price <- ts(BTC[which(BTC$Date=="2017-1-12"):which(BTC$Date=="2018-10-6"),5], frequency=30)  # 30-day seasonal period
adf.test(diff(price,1))
##
## Augmented Dickey-Fuller Test
##
## data: diff(price, 1)
## Dickey-Fuller = -7.6665, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
par(mfrow=c(1,2))
# ACF/PACF of the first-differenced closing prices; as.numeric() keeps the lag axis in days
acf(as.numeric(diff(price,1)),lag.max=30,main="Autocorrelation Plot, d=1")
pacf(as.numeric(diff(price,1)),lag.max=30,main="Partial Autocorrelation Plot, d=1")
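The ACF and PACF plots are what guide the choice of p and q. As a rough rule, an AR(p) process shows a gradually decaying ACF and a PACF that cuts off after lag p. The sketch below illustrates this on a simulated AR(1) series with coefficient 0.7, a hypothetical stand-in for the differenced prices; the ±1.96/√n bounds are the usual approximate 95% limits drawn by acf and pacf.

```r
# Minimal sketch: how ACF/PACF patterns point to p and q (base R, simulated data).
# An AR(1) process with coefficient 0.7 is a hypothetical stand-in.
set.seed(7)
x <- arima.sim(model = list(ar = 0.7), n = 500)

acf_vals  <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]   # drop lag 0
pacf_vals <- pacf(x, lag.max = 10, plot = FALSE)$acf      # pacf starts at lag 1
band <- 1.96 / sqrt(length(x))                            # approximate 95% bounds

# AR(1) signature: ACF decays gradually, PACF is large at lag 1 and then cuts off
round(acf_vals[1:3], 2)
which(abs(pacf_vals) > band)   # lags outside the bounds (lag 1, plus occasional chance hits)
```

With a single significant PACF spike and a decaying ACF, an AR order of p = 1 with q = 0 is a natural first guess, which mirrors the reasoning for the hand-picked model.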
auto.arima(price,D=1)
## Series: price
## ARIMA(0,1,1)(2,1,0)[30]
##
## Coefficients:
## ma1 sar1 sar2
## 0.1028 -0.5846 -0.3300
## s.e. 0.0405 0.0374 0.0364
##
## sigma^2 estimated as 226934: log likelihood=-4579.02
## AIC=9166.05 AICc=9166.11 BIC=9183.65
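auto.arima settles on ARIMA(0,1,1)(2,1,0)[30], whereas the model fitted next is the hand-picked ARIMA(1,1,0)(1,1,0)[30]. One way to judge which performs better is to fit both with the same differencing (d = 1, D = 1) and compare AIC, because AIC values are only comparable across models fitted to the same differenced data. The sketch below does this with base R's arima on a hypothetical simulated series (a random walk plus a 30-period cycle), not on the Bitcoin prices, so any printed numbers are illustrative only.

```r
# Minimal sketch: compare two seasonal ARIMA specifications by AIC (base R stats::arima).
# The series is simulated (random walk + 30-period cycle); it is NOT the BTC data.
set.seed(123)
n <- 360
t_idx <- seq_len(n)
y <- ts(cumsum(rnorm(n)) + 5 * sin(2 * pi * t_idx / 30), frequency = 30)

# Both candidates use d = 1 and D = 1, so their AIC values are comparable
candidates <- list(
  "ARIMA(1,1,0)(1,1,0)[30]" = list(order = c(1, 1, 0), seasonal = c(1, 1, 0)),
  "ARIMA(0,1,1)(2,1,0)[30]" = list(order = c(0, 1, 1), seasonal = c(2, 1, 0))
)
aics <- sapply(candidates, function(spec) {
  fit <- arima(y, order = spec$order,
               seasonal = list(order = spec$seasonal, period = 30))
  fit$aic
})
print(aics)
names(which.min(aics))   # the specification with the lower AIC is preferred
```

AIC rewards fit while penalizing extra parameters; residual diagnostics and out-of-sample forecast error are useful complements when the AIC gap is small.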
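# Hand-picked seasonal model: ARIMA(1,1,0)(1,1,0) with a 30-day period
fit <- arima(price, order=c(1,1,0), seasonal=list(order=c(1,1,0), period=30))
tsdiag(fit)  # standardized residuals, residual ACF, Ljung-Box p-values
pred <- forecast(fit, h=30)  # 30-step-ahead forecasts with prediction intervals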
par(mfrow=c(1,1))
plot(pred,xlab="Observation",ylab="Closing Price (USD)",main="Bitcoin Trading Price: 2017-2018\n Forecasts from ARIMA(1,1,0)(1,1,0)[30]",lwd=2)
lines(pred$fitted,col="red")
legend("topleft",legend=c("Fitted","Predicted","Original"),col=c("red","blue","black"),lty=c(1,1,1),lwd=c(2,2,2),bty="n")
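tsdiag summarizes the residuals graphically and forecast produces the 30-day predictions. The same two steps can be reproduced numerically with base R's Box.test and predict, as sketched below. The simulated series and the choice of 10 lags for the Ljung-Box test are assumptions for illustration; the intervals use a normal approximation (±1.96 standard errors).

```r
# Minimal sketch: numeric residual check and 30-step forecast with base R only.
# The data are simulated (hypothetical), standing in for the BTC closing prices.
set.seed(2020)
y <- ts(cumsum(rnorm(360)) + 5 * sin(2 * pi * seq_len(360) / 30), frequency = 30)
fit_sketch <- arima(y, order = c(1, 1, 0),
                    seasonal = list(order = c(1, 1, 0), period = 30))

# Ljung-Box test on residuals; fitdf = 2 accounts for the ar1 and sar1 estimates.
# A p-value above 0.05 means no significant leftover autocorrelation up to lag 10.
lb <- Box.test(residuals(fit_sketch), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb$p.value)

# 30-step-ahead point forecasts with approximate 95% intervals
fc <- predict(fit_sketch, n.ahead = 30)
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se
head(data.frame(forecast = as.numeric(fc$pred),
                lower = as.numeric(lower), upper = as.numeric(upper)))
```

The intervals widen with the horizon because the forecast standard errors accumulate, which is why the shaded bands in the forecast plot fan out.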
A detailed study using a multiple linear regression model is available in the previous project, "Community Structure and Crime Rates: Evidence from Cross-Sectional Data in the U.S."
That project covers all four tasks: (1) scaling, (2) regression modeling, (3) interpretation, and (4) model comparison.