Plot, Plot, Plot the global temp anomaly data! I could plot more things but I think this is enough for now.
library(markdown)
library(moments)
giss.globe<-readRDS("C:\\Users\\jesse_000\\Desktop\\Grad\\ESCI 597\\Data\\giss.globe.rds")
kt <-kurtosis(giss.globe)
sk <- skewness(giss.globe)
my.xlim <- range(giss.globe)
h<-hist(giss.globe, breaks=20, col="lightskyblue3", xlab= "Temp Anomaly distribution",main="",xlim=my.xlim)
xfit<-seq(min(giss.globe),max(giss.globe),length=100)
yfit<-dnorm(xfit,mean=mean(giss.globe),sd=sd(giss.globe))
yfit <- yfit*diff(h$mids[1:2])*length(giss.globe)
lines(xfit, yfit, col="darkblue", lwd=2)
boxplot(giss.globe, horizontal=TRUE, outline=TRUE, axes=FALSE,ylim=my.xlim, col = "violetred2", add = TRUE, boxwex=25)
text(x = 1.00, y=200, labels = paste("Kurtosis=",round(kt,2)),pos = 2)
text(x = 1.00, y=190, labels = paste("Skewness=",round(sk,2)),pos = 2)
Corner_text(text="Figure 1",location="topright")
That data looks relatively normally distributed. Skewness values between -2 and +2 are generally considered acceptable in order to prove normal univariate distribution. Looks Good. What does that data actually look like? Lets plot it.
Lets detrend it and look at the residuals (Figures 2 and 3). There is some curvature to the residuals so the linear model doesn’t fit perfectly. The data looks to follow somewhat of an exponential model.
Figures 4 and 5 are the acf and pacf of the data prior to using a linear model to detrend the data. You could also use a logarithmic model (something interesting below).
par(mfrow=c(2,2))
globe.temp<-time(giss.globe)
temp.lm<-lm(giss.globe~globe.temp)
plot(giss.globe,ylab="Temp Anomaly", main="giss.globe")
abline(temp.lm, col="blue")
Corner_text(text="Figure 2",location="topright")
abline(temp.lm, col="blue")
plot(residuals(temp.lm),main="residuals")
abline(0,0, col="blue")
Corner_text(text="Figure 3",location="topright")
acf(giss.globe)
Corner_text(text="Figure 4",location="topright")
pacf(giss.globe)
Corner_text(text="Figure 5",location="topright")
summary(temp.lm)
##
## Call:
## lm(formula = giss.globe ~ globe.temp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.57814 -0.14040 -0.00303 0.13824 0.82365
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.389e+01 2.364e-01 -58.75 <2e-16 ***
## globe.temp 7.142e-03 1.213e-04 58.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1944 on 1641 degrees of freedom
## Multiple R-squared: 0.6786, Adjusted R-squared: 0.6784
## F-statistic: 3465 on 1 and 1641 DF, p-value: < 2.2e-16
After the trend was accounted for with a logistic model . After a linear model was applied and the acf and pacf of the residuals from that linear model was generated, I expected to see a better removal of the trend. The residuals (Figure 3) still showed some heterscedasity. Also I expected to see a larger decrease in the first lag. However, thinking a litter further, monthly temperature data makes sense to behave more similarly to a AR(2) and even to some degree an AR(3) model. The significance of the first 3 lags in the pacf plots also indicate that the process is something similar to an AR(3) model as well.
Since the data looks to follow somewhat of an exponential model, perhaps we can better remove the exponential trend with a log function. The result below (Figure 6 and 7) looks slightly better but maybe a polynomial fitting function would be better. R apparently does not like to fit ts data to polynomial models without complications so I will move on. The residuals are slightly more spread but they are certainly more clean and heteroscedastic.
Figures 8 and 9 below here are acf and pcaf functions ran on the residuals of a logistic model. Using a logarithmic regression did slightly better in removing the trend. This can be seen in that the auto correlation in the data decays more quickly in the logistic regression (Figures 8 and 9). The pacf still indicates that the process here is likely following a \(y_t=\phi_1Y_{t-1}+\theta_2Y_{t-2}+\beta_3Y_{t-3}+\epsilon_t\). The third lag is not extraordinarily strong however I would judge that to be acceptable to call a AR(3) function.
par(mfrow=c(1,2))
globe.temp2<-time(giss.globe)
anom<-log(1+giss.globe)
temp.lm2<-lm(anom~globe.temp2)
plot(anom,ylab="Log(Temp Anomaly+1)", main="giss.globe")
abline(temp.lm2, col="blue")
Corner_text(text="Figure 7",location="topright")
abline(temp.lm2, col="blue")
plot(residuals(temp.lm2),main="residuals")
abline(temp.lm2, col="blue")
Corner_text(text="Figure 7",location="topright")
acf(residuals(temp.lm2))
Corner_text(text="Figure 8",location="topright")
pacf(residuals(temp.lm2))
Corner_text(text="Figure 9",location="topright")
summary(temp.lm2$residuals)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.26700 -0.11880 0.01847 0.00000 0.13000 0.57710
I found a function HoltWinters
that can apparently be used to fit exponential timeseries models. I found this code online. Lest try it.
# simple exponential - models level
fit <- HoltWinters(giss.globe, beta=FALSE, gamma=FALSE)
# double exponential - models level and trend
fit2 <- HoltWinters(giss.globe, gamma=FALSE)
# triple exponential - models level, trend, and seasonal components
fit3 <- HoltWinters(giss.globe)
fit2
## Holt-Winters exponential smoothing with trend and without seasonal component.
##
## Call:
## HoltWinters(x = giss.globe, gamma = FALSE)
##
## Smoothing parameters:
## alpha: 0.5301432
## beta : 0.02925079
## gamma: FALSE
##
## Coefficients:
## [,1]
## a 0.926714457
## b 0.003655542
Lastly, I corrected the something interesting form the last assignment and I thing I finally can make sense of it after all that we’ve learned. The ccf()
between the sunspots data and elnino data is actually really interesting after I was able to figure out how to subset the larger sunspot data so it lined up with the elnino data. It is plotted below. It appears that the elnino 3.4 index is positively correlated with approximately 2 lags in the past and negatively correlated with one-ish lags in the future. So this could be interpreted as increased sunspots leads to a higher elnino 3.4 index and as sunspots fade the elnino index takes 1-2 months to follow suit. This makes sense! Hopefully I didn’t botch that this time.
sunspots<-readRDS("C:\\Users\\jesse_000\\Desktop\\Grad\\ESCI 597\\Data\\sunspots.rds")
nino34<-readRDS("C:\\Users\\jesse_000\\Desktop\\Grad\\ESCI 597\\Data\\nino34.rds")
sunspots1<-window(sunspots,1870,c(2017,12))
ninosun<-ccf(sunspots1, nino34)
plot(ninosun)