library(repmis)
library(moments)
githubURL <- "https://github.com/AndyBunn/TeachingData/raw/master/kbli.Rdata"
source_data(githubURL)
## Downloading data from: https://github.com/AndyBunn/TeachingData/raw/master/kbli.Rdata
## SHA-1 hash of the downloaded data file is:
## e61c66faa0cd4f23f6bf7e96d860b6b591c39cef
## [1] "kbli.zoo" "kbli.ts"
tAvgF <- kbli.ts[,"tAvgF"]
tAvgF.d <- decompose(tAvgF)
plot(tAvgF.d)
sd(tAvgF)
## [1] 8.908073
tAvgF.ag<-tAvgF.d$trend+tAvgF.d$random
plot(tAvgF.ag,ylab="Average Temp (F)")
sd(tAvgF.ag,na.rm=T)
## [1] 2.404545
The s.d for the decomposed data is 8.9 (with seasonality), and the s.d. for the deseasonalized series is 2.4…considerably less with season taken out. This means that the variability within the dataset is greatly increased when seasonality is included.
par(mar=c(1,1,1,1))
acf(tAvgF,na.action=na.pass,main="Deseasonalized")
pacf(tAvgF, na.action = na.pass, main="")
acf(tAvgF.ag, na.action = na.pass, main="Residual")
pacf(tAvgF.ag, na.action = na.pass, main="")
The decomposed data follows an AF(1) model. This is obvious in the pacf plot where the 2nd lag is negatively related to the 1st lag. The cyclinc nature in the ACF plot definitely shows the strong seasonal component.
The de-seasonalized data also follows an AR(1) model. This is obvious in the pacf plot where the 2nd lag is negatively related to the 1st lag.
With phi1 and phi2 set to 0.5 and 0.25 respectively
n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 1)
x <- numeric(length = n)
phi1 <- 0.5
phi2<-0.25
for(i in 3:n){
x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]
}
plot(x,type="l")
Because I’ve created an AR(2) model, I would anticipate that the acf would show that the 2nd lag is lower than the 1st lag, but still high enough to be considered affected by the previous lag.
acf(x)
pacf(x)
The pacf graph indicates that the 2nd lag falls within the range of similarity with the 1st lag (outside the 95% confidence interval)
With phi1 and phi2 set to 0.6 and 0.4 respectively
n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 1)
x <- numeric(length = n)
phi1 <- 0.6
phi2<-0.4
for(i in 3:n){
x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]
}
plot(x,type="l")
##acf and pacf
acf(x)
pacf(x)
With phi1 and phi2 set to 0.6 and 0.4 respectively and s.d of 2.
n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 2)
x <- numeric(length = n)
phi1 <- 0.6
phi2<-0.4
for(i in 3:n){
x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]
}
plot(x,type="l")
acf(x)
pacf(x)
By changing the s.d. from 1 to 2, I noticed an interesting difference in the acf graphs…an s.d. of 2 shows a much less steep decline than the s.d. of 1.
data("Seatbelts")
tsp(Seatbelts)
## [1] 1969.000 1984.917 12.000
plot(Seatbelts)
DK<-Seatbelts[,"DriversKilled"]
plot(DK)
kt <-kurtosis(DK)
sk <- skewness(DK)
my.xlim <- range(DK)
h<-hist(DK, breaks=10, col="lightblue", xlab="Number of Deaths per year",
main="",xlim=my.xlim)
xfit<-seq(min(DK),max(DK),length=100)
yfit<-dnorm(xfit,mean=mean(DK),sd=sd(DK))
yfit <- yfit*diff(h$mids[1:2])*length(DK)
lines(xfit, yfit, col="darkblue", lwd=2)
boxplot(DK, horizontal=TRUE, outline=TRUE, axes=FALSE,
ylim=my.xlim, col = "green", add = TRUE, boxwex=3)
text(x = 145, y=33, labels = paste("Kurtosis=",round(kt,2)),pos = 4)
text(x = 145, y=29, labels = paste("Skewness=",round(sk,2)),pos = 4)
Annualdeaths <- time(DK)
DK.lm <- lm(DK ~ Annualdeaths)
summary(DK.lm)
##
## Call:
## lm(formula = DK ~ Annualdeaths)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.934 -17.733 -2.797 15.238 67.865
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3709.4345 741.7417 5.001 1.29e-06 ***
## Annualdeaths -1.8142 0.3752 -4.835 2.74e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.01 on 190 degrees of freedom
## Multiple R-squared: 0.1096, Adjusted R-squared: 0.1049
## F-statistic: 23.38 on 1 and 190 DF, p-value: 2.736e-06
The coefficient shows that the number of deaths decreased by ~2 people annually over the 16 year time period. This makes some sense…as seatbelt laws most likely became more enforced over that time period.
acf(DK)
pacf(DK)
According to these acf/pacf results, the data fits and AR(1) model–the lag at (k-1) in the pacf figure indicates that the number of deaths at time Y is not effected by the time at Y-1. This makes perfect sense because car crashes are typically unaffected by each other and don’t follow many identifiable trends over time.