Checkout my sunspots Rmarkdown.Rmd document. It is neat!

data(sunspots)
plot(sunspots,ylab="Observed Sunspots")

Above is a plot of the number of sunspots counted monthly from 1850 to 2016.The plot seems to have some perhaps decadal variation. I will look to see if its related to the el nino patterns? We shall see. First lets plot the hell out of this data.

summary(sunspots)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   15.70   42.00   51.27   74.93  253.80

tsp(sunspots)

## [1] 1749.000 1983.917   12.000

library(moments)
kt <-kurtosis(sunspots)
sk <- skewness(sunspots)
my.xlim <- range(sunspots)
h<-hist(sunspots, breaks=10, col="lightblue", xlab="Lake Level (ft)",
        main="",xlim=my.xlim) 
xfit<-seq(min(sunspots),max(sunspots),length=100) 
yfit<-dnorm(xfit,mean=mean(sunspots),sd=sd(sunspots)) 
yfit <- yfit*diff(h$mids[1:2])*length(sunspots) 
lines(xfit, yfit, col="darkblue", lwd=2)
boxplot(sunspots, horizontal=TRUE,  outline=TRUE,  axes=FALSE,ylim=my.xlim, col = "lightgreen", add = TRUE, boxwex=130)
text(x = 250, y=800, labels = paste("Kurtosis=",round(kt,2)),pos = 2)
text(x = 250, y=750, labels = paste("Skewness=",round(sk,2)),pos = 2)

The distribution appears to be rather leptokurtic based upon the kurtosis number of 3.98 (beaing greater than 3). Looking at the quantiles we can get a better understanding of the datas distrubution. To do this I will use the qqnorm function and look at devations for the expectation of normality.

qqnorm(sunspots,main="Normal QQ Plot of Sunspots Data")
qqline(sunspots, col = 2)

No Trend In Time

Given that this is data gathered from the sun, I don’t expect that a long term trend will be detected. Changes occuring on the sun likely occur on scales much greater then 1850 - 2016. We will look just to be sure.

SunTime <- time(sunspots)
Sun.lm <- lm(sunspots ~ SunTime)
summary(Sun.lm)

## 
## Call:
## lm(formula = sunspots ~ SunTime)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -60.400 -34.848  -8.314  23.557 196.470 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -135.88524   22.25201  -6.107 1.16e-09 ***
## SunTime        0.10027    0.01191   8.416  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.92 on 2818 degrees of freedom
## Multiple R-squared:  0.02452,    Adjusted R-squared:  0.02417 
## F-statistic: 70.83 on 1 and 2818 DF,  p-value: < 2.2e-16

There appears to be 0.10027 more sunspots per year over across the time in which we have data for p < 2e-16. Who woulda thunk?

Filters

Lets plot the data with some moving average filters and plot that linear modle along with those plots. Making the moving average at 100 (red) and 50 (blue) seems to work the best. Lesser time intervals looks very messy and is hard to see any trend other then the original data.

sun50 <- filter(x=sunspots, filter=rep(x=1/50,times=50), sides=2)
sun100 <- filter(x=sunspots, filter=rep(x=1/100,times=100), sides=2)
plot(sunspots,col="darkgrey")
lines(sun100,col="red",lwd=2)
lines(sun50,col="blue",lwd=2)
abline(Sun.lm, col="black",lwd=2, lty="dashed")

The result of the linear model is much easier to visually associate with the moving averages then the original data.

Decompose the time series

First of all, What is the frequency of the sunspots data set? I forget! Lets findout.

tsp(sunspots)

## [1] 1749.000 1983.917   12.000

The data appears to be in the format of monthly observations from 1850 to 2016. Bellow is the trend with the variation across a calendar year removed.

Apparently the sun oscillates in intensity about every 11 years. Not quite decadal but closeish. Not quite sure how this will affect the rest of the analysis but I will proceed regardless.

f <- frequency(sunspots)
filt <- c(0.5, rep(1, times=f-1), 0.5)/f
m.hat <- filter(x=sunspots, filter=filt)
plot(m.hat,ylab="Observed Monthly Sunspots")

s.hat <- sunspots - m.hat
plot(s.hat,ylab="Sunspots Relative to Mean",axes=FALSE)
par(las=2)
axis(side=1, at=seq(0, 2020, by=10))
axis(side=2, at=seq(-50, 100, by=50))
box()

s.hat <- sunspots - m.hat
boxplot(s.hat~cycle(s.hat),xlab="Month",ylab="Observed Monthly Sunspots")

s.hat <- aggregate(s.hat~cycle(s.hat),FUN=mean) 
s.hat <- rep(s.hat$s.hat,length(sunspots)/12)
s.hat <- ts(s.hat) 
tsp(s.hat) <- tsp(sunspots) 

z.hat <- sunspots - m.hat - s.hat
sunspots.hat <- cbind(sunspots,m.hat,s.hat,z.hat)
plot(sunspots.hat)

sunspots.d <- decompose(sunspots) # compare to the more sophisticated stl function
plot(sunspots.d)

Something Interestring?

(and probably incorrect)

Cross correlation and Lagged Regression

Looking for correlations between elnino events and sunspots. The Cross Correlation Function (ccf) in R can look for events in one time series that lag (or lead) in time in response to another event. If sun spots were to actually influence el nino events we would expect that the el nino event it self would lag behind the sun spot cycle so this analysis seems appropriate.

Not sure if I can interperate it correctly but I will try.

Here are the original time series just to take another look.

nino34<-readRDS("nino34.rds")
data(nino34)
plot(nino34,ylab="Nino Region 3.4 Anomolies")

data(sunspots)
plot(sunspots,ylab="Observed Sunspots")

Cross Correlation Function

The code for the ccf function is quite simple. ccf(x-variable name, y-variable name). For my purpose it will be ccf(sunspots, nino34). This is assuming that the y-variable responds to the x-variable. Its plotted results are below.

ninosun<-ccf(sunspots, nino34)

ninosun

## 
## Autocorrelations of series 'X', by lag
## 
## -2.3333 -2.2500 -2.1667 -2.0833 -2.0000 -1.9167 -1.8333 -1.7500 -1.6667 
##  -0.001   0.003   0.009   0.018   0.023   0.027   0.023   0.022   0.020 
## -1.5833 -1.5000 -1.4167 -1.3333 -1.2500 -1.1667 -1.0833 -1.0000 -0.9167 
##   0.013   0.007   0.002   0.009   0.013   0.015   0.020   0.025   0.031 
## -0.8333 -0.7500 -0.6667 -0.5833 -0.5000 -0.4167 -0.3333 -0.2500 -0.1667 
##   0.039   0.041   0.039   0.030   0.016   0.011   0.011   0.012   0.020 
## -0.0833  0.0000  0.0833  0.1667  0.2500  0.3333  0.4167  0.5000  0.5833 
##   0.020   0.025   0.026   0.029   0.020   0.009  -0.006  -0.020  -0.032 
##  0.6667  0.7500  0.8333  0.9167  1.0000  1.0833  1.1667  1.2500  1.3333 
##  -0.041  -0.045  -0.041  -0.040  -0.033  -0.031  -0.025  -0.027  -0.032 
##  1.4167  1.5000  1.5833  1.6667  1.7500  1.8333  1.9167  2.0000  2.0833 
##  -0.044  -0.054  -0.061  -0.062  -0.060  -0.057  -0.049  -0.044  -0.038 
##  2.1667  2.2500  2.3333 
##  -0.032  -0.035  -0.041

Uncertainties

I am not sure that the data are correctly alligned because the sun spots data is much longer than the other.

In any case the correlations at approximately 1.5 - 2.0 are negative, indicating that an above average value of sun spots is likely to lead to a below average value of “Nino Region 3.4 Anomolies” about 1.5 months later.

I likely botched this whole thing but we are going to learn about this more later…. I think?

Homework 1