Introduction
Our data set is the mean summer temperatures from 1781-1988. We want to see if there are any trends in the data or any type of seasonality. We also want to see if the mean temperature is going up throughout the years. It would be helpful to know how to predict mean temperature in order to see what changes we can make to the environment to maintain a good temperature. The forecast would be very helpful to environmental scientists and anyone working in the government who could possibly make policies about our environment and fuel emissions.
Analysis
In R for certain functions we need a specific package. In our case to do time series analysis we need the astsa package. After that we need to read in our txt file and then change this to a time series object and we do that using the command as.ts(), which stands for as a time series. We use the function read.cvs to read in our file. Unless the data file is in the same working directory as the R project, it will not be able to read the file. We use the function ts.plot to plot a time series object. We do not notice a general trend however there is seasonality in the data. A trend is a long term increase or decrease in the data. There is no trend because we don’t see a clear positive trend, or negative trend over time. Our data is more constant than it is linear. We do see spikes in the data which come from the seasonality. Seasonality is when a time series is affected by seasonal factors like the time of the year. Some summers the mean temperature was fairly high and other summers it was low. We try taking the log of the data to eliminate the seasonality, however it doesn’t help much. So we take the difference of the log of the data. Now we take the ACF and PACF to figure out a model selection. We use the sarima function to do our prediction. We check the residuals to see if it is white noise and if it is we can proceed and calculate forecast.
Conclusion
We found a model that fits well to our data. We see that the ten year forecast is somewhere around 15 degrees Celsius. This is good however we shouldn’t assume this is okay for the environment considering this is data for mean summer temperatures and doesn’t include sea levels, ocean temperatures, co2 emissions, or temperature of the earth. In order to make an informed decision about the environment we would want to look at data from all of those data sets as well.
library(astsa)
## Warning: package 'astsa' was built under R version 3.5.2
summer <- read.csv(file="summer.txt", header=TRUE, sep=",")
summer <- as.ts(summer, frequency = 4)
ts.plot(summer)
ts.plot(diff(log(summer)))
acf2(diff(log(summer)))
## ACF PACF
## [1,] -0.41 -0.41
## [2,] -0.13 -0.37
## [3,] 0.10 -0.18
## [4,] 0.03 -0.06
## [5,] -0.19 -0.25
## [6,] 0.15 -0.09
## [7,] 0.01 -0.06
## [8,] -0.01 0.02
## [9,] -0.10 -0.11
## [10,] 0.04 -0.13
## [11,] -0.03 -0.17
## [12,] 0.05 -0.10
## [13,] 0.08 0.05
## [14,] -0.12 -0.12
## [15,] 0.05 -0.05
## [16,] 0.06 0.02
## [17,] -0.06 0.03
## [18,] -0.01 0.02
## [19,] 0.04 -0.03
## [20,] 0.01 0.04
## [21,] -0.07 -0.01
## [22,] 0.01 -0.02
## [23,] -0.01 -0.10
## [24,] 0.10 0.08
## [25,] -0.15 -0.09
sarima(log(summer), 2,1,2)
## initial value -2.686194
## iter 2 value -2.887034
## iter 3 value -2.911776
## iter 4 value -2.912538
## iter 5 value -2.929618
## iter 6 value -2.935844
## iter 7 value -2.938807
## iter 8 value -2.940314
## iter 9 value -2.940328
## iter 10 value -2.940343
## iter 11 value -2.940349
## iter 12 value -2.940390
## iter 13 value -2.940399
## iter 14 value -2.940401
## iter 15 value -2.940406
## iter 16 value -2.940410
## iter 17 value -2.940443
## iter 18 value -2.940476
## iter 19 value -2.940498
## iter 20 value -2.940502
## iter 21 value -2.940502
## iter 21 value -2.940502
## iter 21 value -2.940502
## final value -2.940502
## converged
## initial value -2.939393
## iter 2 value -2.939446
## iter 3 value -2.939552
## iter 4 value -2.939574
## iter 5 value -2.939579
## iter 6 value -2.939586
## iter 7 value -2.939614
## iter 8 value -2.939677
## iter 9 value -2.939770
## iter 10 value -2.939810
## iter 11 value -2.939821
## iter 12 value -2.939822
## iter 12 value -2.939822
## final value -2.939822
## converged
## $fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed,
## optim.control = list(trace = trc, REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.2441 0.0108 -0.5612 -0.3698 -2e-04
## s.e. 0.4682 0.1057 0.4627 0.4423 2e-04
##
## sigma^2 estimated as 0.002767: log likelihood = 314.82, aic = -617.65
##
## $degrees_of_freedom
## [1] 202
##
## $ttable
## Estimate SE t.value p.value
## ar1 -0.2441 0.4682 -0.5213 0.6027
## ar2 0.0108 0.1057 0.1023 0.9186
## ma1 -0.5612 0.4627 -1.2129 0.2266
## ma2 -0.3698 0.4423 -0.8360 0.4042
## constant -0.0002 0.0002 -0.6716 0.5026
##
## $AIC
## [1] -2.983796
##
## $AICc
## [1] -2.982354
##
## $BIC
## [1] -2.887195
sarima.for(summer, 12,1,1,1,0,1,0,12)
## $pred
## Time Series:
## Start = 209
## End = 220
## Frequency = 1
## [1] 14.58253 13.91871 14.75303 14.25252 15.01248 16.03248 15.91248
## [8] 13.88248 15.33248 15.31248 14.64248 15.85248
##
## $se
## Time Series:
## Start = 209
## End = 220
## Frequency = 1
## [1] 1.055209 1.059849 1.059928 1.059933 1.059933 1.059933 1.059933
## [8] 1.059933 1.059933 1.059933 1.059933 1.059933
sarima.for(summer, n.ahead=10,2,1,2)
## $pred
## Time Series:
## Start = 209
## End = 218
## Frequency = 1
## [1] 15.16865 14.99411 15.03152 15.01849 15.01877 15.01557 15.01327
## [8] 15.01074 15.00827 15.00578
##
## $se
## Time Series:
## Start = 209
## End = 218
## Frequency = 1
## [1] 0.7999461 0.8148118 0.8150654 0.8166712 0.8178039 0.8190503 0.8202641
## [8] 0.8214840 0.8227001 0.8239149