data(sunspots)
plot(sunspots,ylab="Observed Sunspots")
Above is a plot of the number of sunspots counted monthly from 1850 to 2016.The plot seems to have some perhaps decadal variation. I will look to see if its related to the el nino patterns? We shall see. First lets plot the hell out of this data.
summary(sunspots)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 15.70 42.00 51.27 74.93 253.80
tsp(sunspots)
## [1] 1749.000 1983.917 12.000
library(moments)
kt <-kurtosis(sunspots)
sk <- skewness(sunspots)
my.xlim <- range(sunspots)
h<-hist(sunspots, breaks=10, col="lightblue", xlab="Lake Level (ft)",
main="",xlim=my.xlim)
xfit<-seq(min(sunspots),max(sunspots),length=100)
yfit<-dnorm(xfit,mean=mean(sunspots),sd=sd(sunspots))
yfit <- yfit*diff(h$mids[1:2])*length(sunspots)
lines(xfit, yfit, col="darkblue", lwd=2)
boxplot(sunspots, horizontal=TRUE, outline=TRUE, axes=FALSE,ylim=my.xlim, col = "lightgreen", add = TRUE, boxwex=130)
text(x = 250, y=800, labels = paste("Kurtosis=",round(kt,2)),pos = 2)
text(x = 250, y=750, labels = paste("Skewness=",round(sk,2)),pos = 2)
The distribution appears to be rather leptokurtic based upon the kurtosis number of 3.98 (beaing greater than 3). Looking at the quantiles we can get a better understanding of the datas distrubution. To do this I will use the qqnorm function and look at devations for the expectation of normality.
qqnorm(sunspots,main="Normal QQ Plot of Sunspots Data")
qqline(sunspots, col = 2)
Given that this is data gathered from the sun, I don’t expect that a long term trend will be detected. Changes occuring on the sun likely occur on scales much greater then 1850 - 2016. We will look just to be sure.
SunTime <- time(sunspots)
Sun.lm <- lm(sunspots ~ SunTime)
summary(Sun.lm)
##
## Call:
## lm(formula = sunspots ~ SunTime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60.400 -34.848 -8.314 23.557 196.470
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -135.88524 22.25201 -6.107 1.16e-09 ***
## SunTime 0.10027 0.01191 8.416 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.92 on 2818 degrees of freedom
## Multiple R-squared: 0.02452, Adjusted R-squared: 0.02417
## F-statistic: 70.83 on 1 and 2818 DF, p-value: < 2.2e-16
There appears to be 0.10027
more sunspots per year over across the time in which we have data for p < 2e-16
. Who woulda thunk?
Lets plot the data with some moving average filters and plot that linear modle along with those plots. Making the moving average at 100 (red) and 50 (blue) seems to work the best. Lesser time intervals looks very messy and is hard to see any trend other then the original data.
sun50 <- filter(x=sunspots, filter=rep(x=1/50,times=50), sides=2)
sun100 <- filter(x=sunspots, filter=rep(x=1/100,times=100), sides=2)
plot(sunspots,col="darkgrey")
lines(sun100,col="red",lwd=2)
lines(sun50,col="blue",lwd=2)
abline(Sun.lm, col="black",lwd=2, lty="dashed")
The result of the linear model is much easier to visually associate with the moving averages then the original data.
First of all, What is the frequency of the sunspots
data set? I forget! Lets findout.
tsp(sunspots)
## [1] 1749.000 1983.917 12.000
The data appears to be in the format of monthly observations from 1850 to 2016. Bellow is the trend with the variation across a calendar year removed.
Apparently the sun oscillates in intensity about every 11 years. Not quite decadal but closeish. Not quite sure how this will affect the rest of the analysis but I will proceed regardless.
f <- frequency(sunspots)
filt <- c(0.5, rep(1, times=f-1), 0.5)/f
m.hat <- filter(x=sunspots, filter=filt)
plot(m.hat,ylab="Observed Monthly Sunspots")
s.hat <- sunspots - m.hat
plot(s.hat,ylab="Sunspots Relative to Mean",axes=FALSE)
par(las=2)
axis(side=1, at=seq(0, 2020, by=10))
axis(side=2, at=seq(-50, 100, by=50))
box()
s.hat <- sunspots - m.hat
boxplot(s.hat~cycle(s.hat),xlab="Month",ylab="Observed Monthly Sunspots")
s.hat <- aggregate(s.hat~cycle(s.hat),FUN=mean)
s.hat <- rep(s.hat$s.hat,length(sunspots)/12)
s.hat <- ts(s.hat)
tsp(s.hat) <- tsp(sunspots)
z.hat <- sunspots - m.hat - s.hat
sunspots.hat <- cbind(sunspots,m.hat,s.hat,z.hat)
plot(sunspots.hat)
sunspots.d <- decompose(sunspots) # compare to the more sophisticated stl function
plot(sunspots.d)
(and probably incorrect)
Looking for correlations between elnino events and sunspots. The Cross Correlation Function (ccf) in R can look for events in one time series that lag (or lead) in time in response to another event. If sun spots were to actually influence el nino events we would expect that the el nino event it self would lag behind the sun spot cycle so this analysis seems appropriate.
Not sure if I can interperate it correctly but I will try.
Here are the original time series just to take another look.
nino34<-readRDS("nino34.rds")
data(nino34)
plot(nino34,ylab="Nino Region 3.4 Anomolies")
data(sunspots)
plot(sunspots,ylab="Observed Sunspots")
The code for the ccf function is quite simple. ccf(x-variable name, y-variable name)
. For my purpose it will be ccf(sunspots, nino34)
. This is assuming that the y-variable responds to the x-variable. Its plotted results are below.
ninosun<-ccf(sunspots, nino34)
ninosun
##
## Autocorrelations of series 'X', by lag
##
## -2.3333 -2.2500 -2.1667 -2.0833 -2.0000 -1.9167 -1.8333 -1.7500 -1.6667
## -0.001 0.003 0.009 0.018 0.023 0.027 0.023 0.022 0.020
## -1.5833 -1.5000 -1.4167 -1.3333 -1.2500 -1.1667 -1.0833 -1.0000 -0.9167
## 0.013 0.007 0.002 0.009 0.013 0.015 0.020 0.025 0.031
## -0.8333 -0.7500 -0.6667 -0.5833 -0.5000 -0.4167 -0.3333 -0.2500 -0.1667
## 0.039 0.041 0.039 0.030 0.016 0.011 0.011 0.012 0.020
## -0.0833 0.0000 0.0833 0.1667 0.2500 0.3333 0.4167 0.5000 0.5833
## 0.020 0.025 0.026 0.029 0.020 0.009 -0.006 -0.020 -0.032
## 0.6667 0.7500 0.8333 0.9167 1.0000 1.0833 1.1667 1.2500 1.3333
## -0.041 -0.045 -0.041 -0.040 -0.033 -0.031 -0.025 -0.027 -0.032
## 1.4167 1.5000 1.5833 1.6667 1.7500 1.8333 1.9167 2.0000 2.0833
## -0.044 -0.054 -0.061 -0.062 -0.060 -0.057 -0.049 -0.044 -0.038
## 2.1667 2.2500 2.3333
## -0.032 -0.035 -0.041
I am not sure that the data are correctly alligned because the sun spots data is much longer than the other.
In any case the correlations at approximately 1.5 - 2.0 are negative, indicating that an above average value of sun spots is likely to lead to a below average value of “Nino Region 3.4 Anomolies” about 1.5 months later.
I likely botched this whole thing but we are going to learn about this more later…. I think?