Name:Sara Cendejas_Zarelli Date:4/10/16
library(repmis)
## Warning: package 'repmis' was built under R version 3.2.4
githubURL <- "https://github.com/AndyBunn/TeachingData/raw/master/kbli.Rdata"
source_data(githubURL)
## Downloading data from: https://github.com/AndyBunn/TeachingData/raw/master/kbli.Rdata
## SHA-1 hash of the downloaded data file is:
## e61c66faa0cd4f23f6bf7e96d860b6b591c39cef
## [1] "kbli.zoo" "kbli.ts"
tAvgF <- kbli.ts[,"tAvgF"]
tminF<-kbli.ts[,"tMinF"]
tmaxF<-kbli.ts[,"tMaxF"]
summary(kbli.ts)
## tMaxF tAvgF tMinF
## Min. :35.94 Min. :30.23 Min. :24.48
## 1st Qu.:48.63 1st Qu.:42.73 1st Qu.:36.66
## Median :57.07 Median :49.90 Median :42.11
## Mean :57.25 Mean :50.21 Mean :43.18
## 3rd Qu.:66.00 3rd Qu.:58.11 3rd Qu.:50.68
## Max. :76.81 Max. :67.48 Max. :57.74
summary(tAvgF)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 30.23 42.73 49.90 50.21 58.11 67.48
tAvgF.d <- decompose(tAvgF)
plot(tAvgF.d)
length (tAvgF)
## [1] 480
tsp(tAvgF)
## [1] 1976.000 2015.917 12.000
standard deviation of the decomposed data
sd(tAvgF)
## [1] 8.908073
Take the seasonal component out and make a plot of the deseasonalized series (trend + residuals)
tAvgF.ag <- aggregate(tAvgF,FUN=mean)
plot(tAvgF.ag,ylab="Average Temp (F)")
length(tAvgF.ag)
## [1] 40
tsp(tAvgF.ag)
## [1] 1976 2015 1
standard deviation of the de-seasonalized series
sd(tAvgF.ag)
## [1] 1.133904
The s.d for the decomposed data is 8.9 (with seasonality taken into account), and the s.d. for the deseasonalized series is 1.13…considerably less with season taken out. This means that the variability within the dataset is greatly increased when seasonality is included.
Decomposed data
acf(tAvgF)
pacf(tAvgF)
The decomposed data follows an AF(1) model. This is obvious in the pacf plot where the 2nd lag is negatively related to the 1st lag.
De-seasonalized data
acf(tAvgF.ag)
pacf(tAvgF.ag)
The de-seasonalized data also follows an AR(1) model. This is obvious in the pacf plot where the 2nd lag is negatively related to the 1st lag.
With phi1 and phi2 set to 0.5 and 0.25 respectively
n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 1)
x <- numeric(length = n)
phi1 <- 0.5
phi2<-0.25
for(i in 3:n){
x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]
}
plot(x,type="l")
Because I’ve created an AR(2) model, I would anticipate that the acf would show that the 2nd lag is lower than the 1st lag, but still high enough to be considered affected by the previous lag.
acf(x)
pacf(x)
The pacf graph indicates that the 2nd lag falls within the range of similarity with the 1st lag (outside the 95% confidence interval)
With phi1 and phi2 set to 0.6 and 0.4 respectively
n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 1)
x <- numeric(length = n)
phi1 <- 0.6
phi2<-0.4
for(i in 3:n){
x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]
}
plot(x,type="l")
##acf and pacf
acf(x)
pacf(x)
With phi1 and phi2 set to 0.6 and 0.4 respectively and s.d of 2.
n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 2)
x <- numeric(length = n)
phi1 <- 0.6
phi2<-0.4
for(i in 3:n){
x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]
}
plot(x,type="l")
acf(x)
pacf(x)
By changing the s.d. from 1 to 2 I noticed an interesting difference in the acf graphs…an s.d. of 2 shows a much less steep decline than the s.d. of 1.
install.packages(“moments”)
data("Seatbelts")
summary(Seatbelts)
## DriversKilled drivers front rear
## Min. : 60.0 Min. :1057 Min. : 426.0 Min. :224.0
## 1st Qu.:104.8 1st Qu.:1462 1st Qu.: 715.5 1st Qu.:344.8
## Median :118.5 Median :1631 Median : 828.5 Median :401.5
## Mean :122.8 Mean :1670 Mean : 837.2 Mean :401.2
## 3rd Qu.:138.0 3rd Qu.:1851 3rd Qu.: 950.8 3rd Qu.:456.2
## Max. :198.0 Max. :2654 Max. :1299.0 Max. :646.0
## kms PetrolPrice VanKilled law
## Min. : 7685 Min. :0.08118 Min. : 2.000 Min. :0.0000
## 1st Qu.:12685 1st Qu.:0.09258 1st Qu.: 6.000 1st Qu.:0.0000
## Median :14987 Median :0.10448 Median : 8.000 Median :0.0000
## Mean :14994 Mean :0.10362 Mean : 9.057 Mean :0.1198
## 3rd Qu.:17203 3rd Qu.:0.11406 3rd Qu.:12.000 3rd Qu.:0.0000
## Max. :21626 Max. :0.13303 Max. :17.000 Max. :1.0000
tsp(Seatbelts)
## [1] 1969.000 1984.917 12.000
plot(Seatbelts)
DK<-Seatbelts[,"DriversKilled"]
plot(DK)
library(moments)
## Warning: package 'moments' was built under R version 3.2.3
kt <-kurtosis(DK)
sk <- skewness(DK)
my.xlim <- range(DK)
h<-hist(DK, breaks=10, col="lightblue", xlab="Number of Deaths per year",
main="",xlim=my.xlim)
xfit<-seq(min(DK),max(DK),length=100)
yfit<-dnorm(xfit,mean=mean(DK),sd=sd(DK))
yfit <- yfit*diff(h$mids[1:2])*length(DK)
lines(xfit, yfit, col="darkblue", lwd=2)
boxplot(DK, horizontal=TRUE, outline=TRUE, axes=FALSE,
ylim=my.xlim, col = "green", add = TRUE, boxwex=3)
text(x = 145, y=33, labels = paste("Kurtosis=",round(kt,2)),pos = 4)
text(x = 145, y=29, labels = paste("Skewness=",round(sk,2)),pos = 4)
Annualdeaths <- time(DK)
DK.lm <- lm(DK ~ Annualdeaths)
summary(DK.lm)
##
## Call:
## lm(formula = DK ~ Annualdeaths)
##
## Residuals:
## Min 1Q Median 3Q Max
## -50.934 -17.733 -2.797 15.238 67.865
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3709.4345 741.7417 5.001 1.29e-06 ***
## Annualdeaths -1.8142 0.3752 -4.835 2.74e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.01 on 190 degrees of freedom
## Multiple R-squared: 0.1096, Adjusted R-squared: 0.1049
## F-statistic: 23.38 on 1 and 190 DF, p-value: 2.736e-06
The coefficient shows that the number of deaths decreased by ~2 people annually over the 16 year time period. This makes some sense…as seatbelt laws most likely became more enforced over that time period.
acf(DK)
pacf(DK)
According to these acf/pacf results, the data fits and AR(1) model–the lag at (k-1) in the pacf figure indicates that the number of deaths at time Y is not effected by the time at Y-1. This makes perfect sense because car crashes are typically unaffected by each other and don’t follow many identifiable trends over time.