Name:Sara Cendejas_Zarelli Date:4/10/16

Bellingham Weather Data

library(repmis)
## Warning: package 'repmis' was built under R version 3.2.4
githubURL <- "https://github.com/AndyBunn/TeachingData/raw/master/kbli.Rdata"
source_data(githubURL) 
## Downloading data from: https://github.com/AndyBunn/TeachingData/raw/master/kbli.Rdata
## SHA-1 hash of the downloaded data file is:
## e61c66faa0cd4f23f6bf7e96d860b6b591c39cef
## [1] "kbli.zoo" "kbli.ts"
tAvgF <- kbli.ts[,"tAvgF"]
tminF<-kbli.ts[,"tMinF"]
tmaxF<-kbli.ts[,"tMaxF"]
summary(kbli.ts)
##      tMaxF           tAvgF           tMinF      
##  Min.   :35.94   Min.   :30.23   Min.   :24.48  
##  1st Qu.:48.63   1st Qu.:42.73   1st Qu.:36.66  
##  Median :57.07   Median :49.90   Median :42.11  
##  Mean   :57.25   Mean   :50.21   Mean   :43.18  
##  3rd Qu.:66.00   3rd Qu.:58.11   3rd Qu.:50.68  
##  Max.   :76.81   Max.   :67.48   Max.   :57.74
summary(tAvgF)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   30.23   42.73   49.90   50.21   58.11   67.48

Decompose the data

tAvgF.d <- decompose(tAvgF)
plot(tAvgF.d)

length (tAvgF)
## [1] 480
tsp(tAvgF)
## [1] 1976.000 2015.917   12.000

standard deviation of the decomposed data

sd(tAvgF)
## [1] 8.908073

Take the seasonal component out and make a plot of the deseasonalized series (trend + residuals)

tAvgF.ag <- aggregate(tAvgF,FUN=mean)
plot(tAvgF.ag,ylab="Average Temp (F)")

length(tAvgF.ag)
## [1] 40
tsp(tAvgF.ag)
## [1] 1976 2015    1

standard deviation of the de-seasonalized series

sd(tAvgF.ag)
## [1] 1.133904

The s.d for the decomposed data is 8.9 (with seasonality taken into account), and the s.d. for the deseasonalized series is 1.13…considerably less with season taken out. This means that the variability within the dataset is greatly increased when seasonality is included.

How strongly correlated is the average temperature at time x with the temperature at time x-1?

cor(tAvgF[2:480],tAvgF[1:(480-1)])
## [1] 0.8114186

This means the autocorrelation value at the first lag (k=1) is phi=0.811.

Plot the acf and pacf of both series

Decomposed data

acf(tAvgF)

pacf(tAvgF)

The decomposed data follows an AF(1) model. This is obvious in the pacf plot where the 2nd lag is negatively related to the 1st lag.

De-seasonalized data

acf(tAvgF.ag)

pacf(tAvgF.ag)

The de-seasonalized data also follows an AR(1) model. This is obvious in the pacf plot where the 2nd lag is negatively related to the 1st lag.

Part 2: my own AR(2) model of the form: xi=phi1(x-1) + phi2(x-2) + episolon(i)

With phi1 and phi2 set to 0.5 and 0.25 respectively

n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 1) 
x <- numeric(length = n)
phi1 <- 0.5
phi2<-0.25
for(i in 3:n){
   x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]  
  }
plot(x,type="l")

Because I’ve created an AR(2) model, I would anticipate that the acf would show that the 2nd lag is lower than the 1st lag, but still high enough to be considered affected by the previous lag.

acf and pacf

acf(x)

pacf(x)

The pacf graph indicates that the 2nd lag falls within the range of similarity with the 1st lag (outside the 95% confidence interval)

With phi1 and phi2 set to 0.6 and 0.4 respectively

n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 1) 
x <- numeric(length = n)
phi1 <- 0.6
phi2<-0.4
  for(i in 3:n){
   x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]  
  }
plot(x,type="l")

##acf and pacf

acf(x)

pacf(x)

With phi1 and phi2 set to 0.6 and 0.4 respectively and s.d of 2.

n <- 300
epsilon <- rnorm(n = n,mean = 0,sd = 2) 
x <- numeric(length = n)
phi1 <- 0.6
phi2<-0.4
for(i in 3:n){
   x[i] <- phi1 * x[i-1] + phi2 *x[i-2] + epsilon[i]  
  }
plot(x,type="l")

acf and pacf

acf(x)

pacf(x)

By changing the s.d. from 1 to 2 I noticed an interesting difference in the acf graphs…an s.d. of 2 shows a much less steep decline than the s.d. of 1.

Part 3: New Time Series data–“Orange Data”

install.packages(“moments”)

data("Seatbelts")
summary(Seatbelts)
##  DriversKilled      drivers         front             rear      
##  Min.   : 60.0   Min.   :1057   Min.   : 426.0   Min.   :224.0  
##  1st Qu.:104.8   1st Qu.:1462   1st Qu.: 715.5   1st Qu.:344.8  
##  Median :118.5   Median :1631   Median : 828.5   Median :401.5  
##  Mean   :122.8   Mean   :1670   Mean   : 837.2   Mean   :401.2  
##  3rd Qu.:138.0   3rd Qu.:1851   3rd Qu.: 950.8   3rd Qu.:456.2  
##  Max.   :198.0   Max.   :2654   Max.   :1299.0   Max.   :646.0  
##       kms         PetrolPrice        VanKilled           law        
##  Min.   : 7685   Min.   :0.08118   Min.   : 2.000   Min.   :0.0000  
##  1st Qu.:12685   1st Qu.:0.09258   1st Qu.: 6.000   1st Qu.:0.0000  
##  Median :14987   Median :0.10448   Median : 8.000   Median :0.0000  
##  Mean   :14994   Mean   :0.10362   Mean   : 9.057   Mean   :0.1198  
##  3rd Qu.:17203   3rd Qu.:0.11406   3rd Qu.:12.000   3rd Qu.:0.0000  
##  Max.   :21626   Max.   :0.13303   Max.   :17.000   Max.   :1.0000
tsp(Seatbelts)
## [1] 1969.000 1984.917   12.000
plot(Seatbelts)

DK<-Seatbelts[,"DriversKilled"]
plot(DK)

library(moments)
## Warning: package 'moments' was built under R version 3.2.3

Histogram of data on Drivers killed

kt <-kurtosis(DK)
sk <- skewness(DK)
my.xlim <- range(DK)
h<-hist(DK, breaks=10, col="lightblue", xlab="Number of Deaths per year",
        main="",xlim=my.xlim) 
xfit<-seq(min(DK),max(DK),length=100) 
yfit<-dnorm(xfit,mean=mean(DK),sd=sd(DK)) 
yfit <- yfit*diff(h$mids[1:2])*length(DK) 
lines(xfit, yfit, col="darkblue", lwd=2)
boxplot(DK, horizontal=TRUE,  outline=TRUE,  axes=FALSE,
        ylim=my.xlim, col = "green", add = TRUE, boxwex=3)
text(x = 145, y=33, labels = paste("Kurtosis=",round(kt,2)),pos = 4)
text(x = 145, y=29, labels = paste("Skewness=",round(sk,2)),pos = 4)

Is there a linear trend in time?

Annualdeaths <- time(DK)
DK.lm <- lm(DK ~ Annualdeaths)
summary(DK.lm)
## 
## Call:
## lm(formula = DK ~ Annualdeaths)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -50.934 -17.733  -2.797  15.238  67.865 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3709.4345   741.7417   5.001 1.29e-06 ***
## Annualdeaths   -1.8142     0.3752  -4.835 2.74e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.01 on 190 degrees of freedom
## Multiple R-squared:  0.1096, Adjusted R-squared:  0.1049 
## F-statistic: 23.38 on 1 and 190 DF,  p-value: 2.736e-06

The coefficient shows that the number of deaths decreased by ~2 people annually over the 16 year time period. This makes some sense…as seatbelt laws most likely became more enforced over that time period.

acf(DK)

pacf(DK)

According to these acf/pacf results, the data fits and AR(1) model–the lag at (k-1) in the pacf figure indicates that the number of deaths at time Y is not effected by the time at Y-1. This makes perfect sense because car crashes are typically unaffected by each other and don’t follow many identifiable trends over time.