Checkout my sunspots Rmarkdown.Rmd document. It is neat!

data(sunspots)
plot(sunspots,ylab="Observed Sunspots")


Above is a plot of the number of sunspots counted monthly from 1850 to 2016.The plot seems to have some perhaps decadal variation. I will look to see if its related to the el nino patterns? We shall see. First lets plot the hell out of this data.

summary(sunspots)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   15.70   42.00   51.27   74.93  253.80
tsp(sunspots)
## [1] 1749.000 1983.917   12.000
library(moments)
## Warning: package 'moments' was built under R version 3.2.5
kt <-kurtosis(sunspots)
sk <- skewness(sunspots)
my.xlim <- range(sunspots)
h<-hist(sunspots, breaks=10, col="lightblue", xlab="Lake Level (ft)",
        main="",xlim=my.xlim) 
xfit<-seq(min(sunspots),max(sunspots),length=100) 
yfit<-dnorm(xfit,mean=mean(sunspots),sd=sd(sunspots)) 
yfit <- yfit*diff(h$mids[1:2])*length(sunspots) 
lines(xfit, yfit, col="darkblue", lwd=2)
boxplot(sunspots, horizontal=TRUE,  outline=TRUE,  axes=FALSE,ylim=my.xlim, col = "lightgreen", add = TRUE, boxwex=130)
text(x = 250, y=800, labels = paste("Kurtosis=",round(kt,2)),pos = 2)
text(x = 250, y=750, labels = paste("Skewness=",round(sk,2)),pos = 2)


The distribution appears to be rather leptokurtic based upon the kurtosis number of 3.98 (beaing greater than 3). Looking at the quantiles we can get a better understanding of the datas distrubution. To do this I will use the qqnorm function and look at devations for the expectation of normality.

qqnorm(sunspots,main="Normal QQ Plot of Sunspots Data")
qqline(sunspots, col = 2)

No Trend In Time

Given that this is data gathered from the sun, I don’t expect that a long term trend will be detected. Changes occuring on the sun likely occur on scales much greater then 1850 - 2016. We will look just to be sure.

SunTime <- time(sunspots)
Sun.lm <- lm(sunspots ~ SunTime)
summary(Sun.lm)
## 
## Call:
## lm(formula = sunspots ~ SunTime)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -60.400 -34.848  -8.314  23.557 196.470 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -135.88524   22.25201  -6.107 1.16e-09 ***
## SunTime        0.10027    0.01191   8.416  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.92 on 2818 degrees of freedom
## Multiple R-squared:  0.02452,    Adjusted R-squared:  0.02417 
## F-statistic: 70.83 on 1 and 2818 DF,  p-value: < 2.2e-16

There appears to be 0.10027 more sunspots per year over across the time in which we have data for p < 2e-16. Who woulda thunk?

Filters

Lets plot the data with some moving average filters and plot that linear modle along with those plots. Making the moving average at 100 (red) and 50 (blue) seems to work the best. Lesser time intervals looks very messy and is hard to see any trend other then the original data.

sun50 <- filter(x=sunspots, filter=rep(x=1/50,times=50), sides=2)
sun100 <- filter(x=sunspots, filter=rep(x=1/100,times=100), sides=2)
plot(sunspots,col="darkgrey")
lines(sun100,col="red",lwd=2)
lines(sun50,col="blue",lwd=2)
abline(Sun.lm, col="black",lwd=2, lty="dashed") 

The result of the linear model is much easier to visually associate with the moving averages then the original data.

Decompose the time series

First of all, What is the frequency of the sunspots data set? I forget! Lets findout.

tsp(sunspots)
## [1] 1749.000 1983.917   12.000

The data appears to be in the format of monthly observations from 1850 to 2016. Bellow is the trend with the variation across a calendar year removed.

Apparently the sun oscillates in intensity about every 11 years. Not quite decadal but closeish. Not quite sure how this will affect the rest of the analysis but I will proceed regardless.

f <- frequency(sunspots)
filt <- c(0.5, rep(1, times=f-1), 0.5)/f
m.hat <- filter(x=sunspots, filter=filt)
plot(m.hat,ylab="Observed Monthly Sunspots") 

s.hat <- sunspots - m.hat
plot(s.hat,ylab="Sunspots Relative to Mean",axes=FALSE)
par(las=2)
axis(side=1, at=seq(0, 2020, by=10))
axis(side=2, at=seq(-50, 100, by=50))
box()

s.hat <- sunspots - m.hat
boxplot(s.hat~cycle(s.hat),xlab="Month",ylab="Observed Monthly Sunspots")

s.hat <- aggregate(s.hat~cycle(s.hat),FUN=mean) 
s.hat <- rep(s.hat$s.hat,length(sunspots)/12)
s.hat <- ts(s.hat) 
tsp(s.hat) <- tsp(sunspots) 

z.hat <- sunspots - m.hat - s.hat
sunspots.hat <- cbind(sunspots,m.hat,s.hat,z.hat)
plot(sunspots.hat)

sunspots.d <- decompose(sunspots) # compare to the more sophisticated stl function
plot(sunspots.d)