The seems to be two distinct trends. First the EUR seems to be strengthening from the year 2000 to around 2008. The Euro then seems to weaken from 2008 to 2015. The rate also seems somewhat volatile within these major trends. The second plot that looks at yearly averages makes these trends even clearer.
par(pin=c(3, 2))
par(mfrow=c(2,1))
plot(data,type="l")
plot(aggregate(data,FUN=mean))
Stationarity assumes that the mean , variance and autocorrelation do not change over time. The ACF plot show that the FX rate is highly correlated to its prior level. This violates an assumption of stationarity.
##52 weeks
acf(data, lag=52)
Box plot across months will give us a sense on seasonal effect.The mean looks pretty consistent during the year. (No violation) The variance seems to be a little higher in the middle of the year. However, this is inconclusive.The seasonal plot shows no large swings in the mean or variance due to seasonal effects.
par(mfrow=c(1,1))
boxplot(data~cycle(data),xlab="Week",ylab="Rate")
ggseasonplot(data, polar=TRUE) +
ylab("Rate") +
ggtitle("Eur$ Seasonal Plot")
Thus, a graphical interpretation of the data leads us to conclude that the data in non stationary. Let’s run a few test to confirm our visual analysis. The Augmented Dickey–Fuller Test p-value of .7253 tells us to accept the null hypotheses that the data is non stationary. The Box-Pierce test p-value near 0 indicates a time series has significant autocorrleations. Thus, both of these statistical tests agree with our assessment that the data is not stationary.
tseries::adf.test(data)
##
## Augmented Dickey-Fuller Test
##
## data: data
## Dickey-Fuller = -1.6528, Lag order = 10, p-value = 0.7253
## alternative hypothesis: stationary
Box.test(data)
##
## Box-Pierce test
##
## data: data
## X-squared = 1009.6, df = 1, p-value < 2.2e-16
The diff.data is much tighter. The means seems centered around 0, the variance except for a high volatility near 2008-22009 (global financial crisis) seems relatively constant. The yearly data shows no clear trends.
par(pin=c(3, 2))
par(mfrow=c(2,1))
plot(data.dif,type="l")
plot(aggregate(data.dif,FUN=mean))
Stationarity assumes that the mean , variance and autocorrelation do not change over time. The ACF plot shows that besides the global financial crisis volatility, there is really no significant autocorrelation present. This data seems better to use for time series analysis.
##52 weeks
acf(data.dif, lag=52)
Box plot across months will give us a sense on seasonal effect.The diff.data box plot data is much tighter. There are a few outliers but the variance is greatly reduced. The mean looks pretty consistent during the year. (No violation) The seasonal plot shows no large swings in the mean or variance due to seasonal effects.
par(mfrow=c(1,1))
boxplot(data.dif~cycle(data.dif),xlab="Week",ylab="Rate")
ggseasonplot(data.dif, polar=TRUE) +
ylab("Rate") +
ggtitle("Eur$ Seasonal Plot")
Thus, a graphical interpretation of the data leads us to conclude that the diff.data is perhaps stationary. Let’s run a few test to confirm our visual analysis. The Augmented Dickey–Fuller Test p-value of .01 means we reject the null hypotheses and that the data is stationary. The Box-Pierce test’s large p-value indicates that there are no significant autocorrleations. Thus, both of these statistical tests agree with our assessment that the data is stationary.
tseries::adf.test(data.dif)
## Warning in tseries::adf.test(data.dif): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: data.dif
## Dickey-Fuller = -9.2813, Lag order = 10, p-value = 0.01
## alternative hypothesis: stationary
Box.test(data.dif)
##
## Box-Pierce test
##
## data: data.dif
## X-squared = 50.377, df = 1, p-value = 1.268e-12
Just to get a sense of the data I plotted both parametric and non-parametric trends. The combined trend and seasonal model to follow.
time.pts=c(1:length(data))
time.pts=c(time.pts-min(time.pts))/max(time.pts)
x1=time.pts
x2=time.pts^2
lm.fit=lm(data~x1+x2)
data.fit.lm= ts(fitted(lm.fit),start=2000,frequency=52)
##non-parametric model
## Local Polynomial Trend Estimation
loc.fit= loess(data~time.pts)
data.fit.loc= ts(fitted(loc.fit),start=2000,frequency=52)
## Splines Trend Estimation
library(mgcv)
gam.fit= gam(data~s(time.pts))
data.fit.gam= ts(fitted(gam.fit),start=2000,frequency=52)
## Is there a trend?
ts.plot(data,ylab="Rate")
lines(data.fit.loc,lwd=2,col="brown")
lines(data.fit.gam,lwd=2,col="red")
lines(data.fit.lm,lwd=2,col="blue")
The non-parametric with ANOVA seasonality model (red) model seems to capture the movement of the EUR better than the parametric poly regression with ANOVA model. (green)
##seasonal anova and trend parametric poly
lm.fit = dynlm(data~x1+x2+season(data)-1)
dd= ts(fitted(lm.fit),start=2000,frequency=52)
## seasonal anova and non-parametric model
gam.fit<- gam(data~s(time.pts)+season(data)-1)
ee<-ts(fitted(gam.fit),start=2000,frequency=52)
## plots
ts.plot(data,ylab="Rate")
lines(dd,lwd=2,col="green")
lines(ee,lwd=2,col="red")
The residuals of both models shows persistence and remain outside of the confidence bands. It is interesting how the non-parametric model correlations go negative. The residuals on both models appear non-stationary. Neither are great choices for time series analysis.
#the residuals
resid.1<-data-dd
resid.2<-data-ee
#plot time series
par(mfrow=c(2,2))
plot(resid.1,main="Seasonal ANOVA and Trend Parametric Poly Residuals",ylab="Residuals",col="green")
acf(resid.1, lag=52)
plot(resid.2,main="Seasonal ANOVA and Trend Non-Parametric Residuals",ylab="Residuals",col="red")
acf(resid.2, lag=52)
Residuals help us determine if our model adequately captures the information in the data. A good model will: 1) yield uncorrelated residuals and 2) the residuals will have zero mean. Correlation of residuals indicate there is information left in the residuals that should be used in or model. A non zero residual mean represents bias. It is also helpful for the residuals to have a constant variance and to be normally distributed.
The parametric model’s histogram is not normally distributed. Additionally, both the Augmented Dickey-Fuller Test and Box Peirce Test suggest the model is non stationary.
r.lm.fit<-resid(lm.fit)
gghistogram(r.lm.fit) + ggtitle("Histogram of residuals Parametric Model")
## Warning: Using `bins = 30` by default. Pick better value with the argument
## `bins`.
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
tseries::adf.test(r.lm.fit)
##
## Augmented Dickey-Fuller Test
##
## data: r.lm.fit
## Dickey-Fuller = -3.2255, Lag order = 10, p-value = 0.08352
## alternative hypothesis: stationary
Box.test(r.lm.fit)
##
## Box-Pierce test
##
## data: r.lm.fit
## X-squared = 987.17, df = 1, p-value < 2.2e-16
The non parametric model’s histogram is not normally distributed. Additionally, both the Augmented Dickey-Fuller Test and Box Peirce Test suggest the model is non-stationary.
#r.gam.fit<-resid(gam.fit)
#gghistogram(r.gam.fit) + ggtitle("Histogram of residuals Non-Parametric Model")
checkresiduals(resid.2)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
tseries::adf.test(resid.2)
## Warning in tseries::adf.test(resid.2): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: resid.2
## Dickey-Fuller = -4.8482, Lag order = 10, p-value = 0.01
## alternative hypothesis: stationary
Box.test(resid.2)
##
## Box-Pierce test
##
## data: resid.2
## X-squared = 961.71, df = 1, p-value < 2.2e-16
library(tsbox)
## Warning: package 'tsbox' was built under R version 4.0.3
yyyy<- ts_xts(resid.2)
yyyy <- ts_df(yyyy)
ggqqplot(yyyy$value)
shapiro.test(yyyy$value) ###less than .05 = not normal
##
## Shapiro-Wilk normality test
##
## data: yyyy$value
## W = 0.98524, p-value = 1.28e-08
time.pts=c(1:length(data.dif))
time.pts=c(time.pts-min(time.pts))/max(time.pts)
x1=time.pts
x2=time.pts^2
lm.fit.dif = dynlm(data.dif~x1+x2+season(data.dif)-1)
dd= ts(fitted(lm.fit.dif),start=2000,frequency=52)
plot(dd)
gam.fit.dif<- gam(data.dif~s(time.pts)+season(data.dif)-1)
ee= ts(fitted(gam.fit.dif),start=2000,frequency=52)
plot(ee)
ts.plot(data.dif,ylab="Diff Data")
lines(dd,lwd=2,col="green") #parametric
lines(ee,lwd=1,col="red") ##non parametric
#data.diff residuals
resid.1d<-data.dif-dd
resid.2d<-data.dif-ee
#plot time series
par(pin=c(5, 4))
par(mfrow=c(1,1))
plot(resid.1d,main="Seasonal ANOVA and Trend Parametric Poly Residuals",ylab="Residuals")
plot(resid.2d,main="Seasonal ANOVA and Trend Non-Parametric Residuals",ylab="Residuals")
The histogram and the qqplot are much improved. The ACF plot is alos better. The number of observations out of the confidence bands is significantly lower and the trend is much less persistent. The test data is mixed. The Augmented Dickey-Fuller Test shows stationary data while the Boxtest is not convinced.
checkresiduals(resid.1d)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
library(tsbox)
library(ggpubr)
r1d<- ts_xts(resid.1d)
r1d <- ts_df(r1d)
library(ggpubr)
ggqqplot(r1d$value)
tseries::adf.test(r1d$value)
## Warning in tseries::adf.test(r1d$value): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: r1d$value
## Dickey-Fuller = -9.2961, Lag order = 10, p-value = 0.01
## alternative hypothesis: stationary
Box.test(r1d$value)
##
## Box-Pierce test
##
## data: r1d$value
## X-squared = 29.068, df = 1, p-value = 6.988e-08
The residuals
checkresiduals(resid.2d)
## Warning in modeldf.default(object): Could not find appropriate degrees of
## freedom for this model.
r2d<- ts_xts(resid.2d)
r2d <- ts_df(r2d)
library(ggpubr)
ggqqplot(r2d$value)
tseries::adf.test(r2d$value)
## Warning in tseries::adf.test(r2d$value): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: r2d$value
## Dickey-Fuller = -9.3911, Lag order = 10, p-value = 0.01
## alternative hypothesis: stationary
Box.test(r2d$value)
##
## Box-Pierce test
##
## data: r2d$value
## X-squared = 28.305, df = 1, p-value = 1.036e-07
rb1<-resid(lm.fit.dif)
rb2<-resid(gam.fit.dif)
acf(rb1, lag=52)
acf(rb2, lag=52)