data(sunspots)
plot(sunspots,ylab="Observed Sunspots")
Above is a plot of the number of sunspots counted monthly from 1850 to 2016.The plot seems to have some perhaps decadal variation. I will look to see if its related to the el nino patterns? We shall see. First lets plot the hell out of this data.
summary(sunspots)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 15.70 42.00 51.27 74.93 253.80
tsp(sunspots)
## [1] 1749.000 1983.917 12.000
library(moments)
## Warning: package 'moments' was built under R version 3.2.5
kt <-kurtosis(sunspots)
sk <- skewness(sunspots)
my.xlim <- range(sunspots)
h<-hist(sunspots, breaks=10, col="lightblue", xlab="Lake Level (ft)",
main="",xlim=my.xlim)
xfit<-seq(min(sunspots),max(sunspots),length=100)
yfit<-dnorm(xfit,mean=mean(sunspots),sd=sd(sunspots))
yfit <- yfit*diff(h$mids[1:2])*length(sunspots)
lines(xfit, yfit, col="darkblue", lwd=2)
boxplot(sunspots, horizontal=TRUE, outline=TRUE, axes=FALSE,ylim=my.xlim, col = "lightgreen", add = TRUE, boxwex=130)
text(x = 250, y=800, labels = paste("Kurtosis=",round(kt,2)),pos = 2)
text(x = 250, y=750, labels = paste("Skewness=",round(sk,2)),pos = 2)
The distribution appears to be rather leptokurtic based upon the kurtosis number of 3.98 (beaing greater than 3). Looking at the quantiles we can get a better understanding of the datas distrubution. To do this I will use the qqnorm function and look at devations for the expectation of normality.
qqnorm(sunspots,main="Normal QQ Plot of Sunspots Data")
qqline(sunspots, col = 2)
Given that this is data gathered from the sun, I don’t expect that a long term trend will be detected. Changes occuring on the sun likely occur on scales much greater then 1850 - 2016. We will look just to be sure.
SunTime <- time(sunspots)
Sun.lm <- lm(sunspots ~ SunTime)
summary(Sun.lm)
##
## Call:
## lm(formula = sunspots ~ SunTime)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60.400 -34.848 -8.314 23.557 196.470
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -135.88524 22.25201 -6.107 1.16e-09 ***
## SunTime 0.10027 0.01191 8.416 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.92 on 2818 degrees of freedom
## Multiple R-squared: 0.02452, Adjusted R-squared: 0.02417
## F-statistic: 70.83 on 1 and 2818 DF, p-value: < 2.2e-16
There appears to be 0.10027
more sunspots per year over across the time in which we have data for p < 2e-16
. Who woulda thunk?
Lets plot the data with some moving average filters and plot that linear modle along with those plots. Making the moving average at 100 (red) and 50 (blue) seems to work the best. Lesser time intervals looks very messy and is hard to see any trend other then the original data.
sun50 <- filter(x=sunspots, filter=rep(x=1/50,times=50), sides=2)
sun100 <- filter(x=sunspots, filter=rep(x=1/100,times=100), sides=2)
plot(sunspots,col="darkgrey")
lines(sun100,col="red",lwd=2)
lines(sun50,col="blue",lwd=2)
abline(Sun.lm, col="black",lwd=2, lty="dashed")
The result of the linear model is much easier to visually associate with the moving averages then the original data.
First of all, What is the frequency of the sunspots
data set? I forget! Lets findout.
tsp(sunspots)
## [1] 1749.000 1983.917 12.000
The data appears to be in the format of monthly observations from 1850 to 2016. Bellow is the trend with the variation across a calendar year removed.
Apparently the sun oscillates in intensity about every 11 years. Not quite decadal but closeish. Not quite sure how this will affect the rest of the analysis but I will proceed regardless.
f <- frequency(sunspots)
filt <- c(0.5, rep(1, times=f-1), 0.5)/f
m.hat <- filter(x=sunspots, filter=filt)
plot(m.hat,ylab="Observed Monthly Sunspots")
s.hat <- sunspots - m.hat
plot(s.hat,ylab="Sunspots Relative to Mean",axes=FALSE)
par(las=2)
axis(side=1, at=seq(0, 2020, by=10))
axis(side=2, at=seq(-50, 100, by=50))
box()
s.hat <- sunspots - m.hat
boxplot(s.hat~cycle(s.hat),xlab="Month",ylab="Observed Monthly Sunspots")
s.hat <- aggregate(s.hat~cycle(s.hat),FUN=mean)
s.hat <- rep(s.hat$s.hat,length(sunspots)/12)
s.hat <- ts(s.hat)
tsp(s.hat) <- tsp(sunspots)
z.hat <- sunspots - m.hat - s.hat
sunspots.hat <- cbind(sunspots,m.hat,s.hat,z.hat)
plot(sunspots.hat)
sunspots.d <- decompose(sunspots) # compare to the more sophisticated stl function
plot(sunspots.d)