Data from Bellingham Airport

require(zoo)
require(moments)
require(lubridate)

Load AirPassengers

Plot and decompose AirPassengers.

#?AirPassengers
#The classic Box & Jenkins airline data. Monthly totals of international airline passengers, 1949 to 1960.
airp<-AirPassengers
str(airp);tsp(airp);summary(airp);head(airp);length(airp)
##  Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
## [1] 1949.000 1960.917   12.000
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   104.0   180.0   265.5   280.3   360.5   622.0
## [1] 112 118 132 129 121 135
## [1] 144
plot(airp, xlab = "Year", ylab = "Thousands of Air Passengers", main = "Monthly International Air Passenger data")

This looks like it should. From 1949 to 1960 with a frequency of 12. This will be interesting to decompose since there is not as smooth of a seasonality to the data as there is with co2. There should be a clear trend but I wonder if a non-linear trend would be more applicable.

boxplot(airp~cycle(airp), xlab = "Year", ylab = "Thousands of Air Passengers", main = "Monthly International Air Passenger data")

You can see which months are the ‘peak’ months but also that the data has a lot of variability within months. I don’t really like the boxplot since there is clearly an increasing trend that makes comparing months across all years hollow.

Air Passengers decomp

LOST<-decompose(AirPassengers, type = "additive")
str(LOST);head(LOST$trend);tail(LOST$trend)
## List of 6
##  $ x       : Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
##  $ seasonal: Time-Series [1:144] from 1949 to 1961: -24.75 -36.19 -2.24 -8.04 -4.51 ...
##  $ trend   : Time-Series [1:144] from 1949 to 1961: NA NA NA NA NA ...
##  $ random  : Time-Series [1:144] from 1949 to 1961: NA NA NA NA NA ...
##  $ figure  : num [1:12] -24.75 -36.19 -2.24 -8.04 -4.51 ...
##  $ type    : chr "additive"
##  - attr(*, "class")= chr "decomposed.ts"
## [1] NA NA NA NA NA NA
## [1] NA NA NA NA NA NA
plot(LOST)

My first thought is that the random is does not look random. Not that that is what random means, but still. My very rough guess is that since seasonal has to be even over the whole series the ‘random’ noise is actually showing the change is seasonality. Applied to this data, I would say that betwen 1949 and 1960 more people are international air passengers(trend), more people are air passengers during the peak (seasonal), and that the proportion of total travel that happens during the peaks has increased (random). On the seasonal plot, you can see the offseason, shoulder season, and peak season.

Also you can see that the first and last six months are lost for trend and random.

Load KBLI data

Look at str, summary, and head before plotting kbli.

kbli <- readRDS("airportZoo.rds")
str(kbli);summary(kbli);head(kbli)
## 'zoo' series from 1941-01-01 to 2019-03-31
##   Data: num [1:28548, 1:3] 0 0 0 7.6 0.5 0 4.6 0 11.4 0 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:3] "prcp" "tmax" "tmin"
##   Index:  Date[1:28548], format: "1941-01-01" "1941-01-02" "1941-01-03" "1941-01-04" "1941-01-05" ...
##      Index                 prcp             tmax             tmin        
##  Min.   :1941-01-01   Min.   : 0.000   Min.   :-12.80   Min.   :-18.900  
##  1st Qu.:1960-08-16   1st Qu.: 0.000   1st Qu.: 10.00   1st Qu.:  1.700  
##  Median :1980-03-01   Median : 0.000   Median : 14.40   Median :  5.600  
##  Mean   :1980-02-29   Mean   : 2.413   Mean   : 14.43   Mean   :  5.494  
##  3rd Qu.:1999-09-15   3rd Qu.: 2.500   3rd Qu.: 19.40   3rd Qu.: 10.000  
##  Max.   :2019-03-31   Max.   :83.800   Max.   : 35.60   Max.   : 19.400
##            prcp tmax tmin
## 1941-01-01  0.0  6.7 -5.6
## 1941-01-02  0.0  7.2 -6.7
## 1941-01-03  0.0  5.6 -4.4
## 1941-01-04  7.6  7.2  1.1
## 1941-01-05  0.5  5.0 -6.1
## 1941-01-06  0.0  6.1  1.1

Same data as A visit to the zoo by Andy Bunn. You see the that the object is class zoo with daily data points for prcp, tmax and tmin(precipitation, maximum temperature, and minimum temperature). I would add that the summary statistics for the variables leads me to believe that they are in millimeters and \(^\circ\)Celsius.

plot(kbli)

Plotting kbli shows a clear seasonality to tmax and tmin. prcp is harder to decipher so it will be interesting to see how this procedes.

KBLI Aggregate variables

prcp

prcp<-kbli[,colnames(kbli) =="prcp"]
a_prcp<-aggregate(prcp, as.yearmon, sum)
str(a_prcp)
## 'zoo' series from Jan 1941 to Mar 2019
##   Data: num [1:938] 88.2 53.5 30.5 46.3 93.3 24.9 12.6 68.5 95.1 78.1 ...
##   Index:  'yearmon' num [1:938] Jan 1941 Feb 1941 Mar 1941 Apr 1941 ...

I created a zoo object with the Date index and prcp but without the two temperature variables. This was done so that the prcp is summed instead of averaged like tmax and tmin will be. prcp was aggregated to monthly resolution just as temp will be.

temps

temp<-kbli[,colnames(kbli) !="prcp"]
a_temp<-aggregate(temp, as.yearmon, mean)
str(a_temp)
## 'zoo' series from Jan 1941 to Mar 2019
##   Data: num [1:938, 1:2] 10 12.2 15.8 16.9 17.6 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:938] "1" "32" "60" "91" ...
##   ..$ : chr [1:2] "tmax" "tmin"
##   Index:  'yearmon' num [1:938] Jan 1941 Feb 1941 Mar 1941 Apr 1941 ...

Summed max and min temperature would be an interesting metric, but the average is much more applicable. a_prcp and a_temp are both the same length.

## ..$ : chr [1:938] "1" "32" "60" "91" ... is interesting as it shows the remaining rows or some sort of equivalent. This doesn’t show for a_prcp which leads me to believe it is a result of the mean fx opposed to the sum fx.

Recompose prcp and temp just to decompose

a_kbli<-cbind(a_prcp,a_temp)

str(a_kbli)
## 'zoo' series from Jan 1941 to Mar 2019
##   Data: num [1:938, 1:3] 88.2 53.5 30.5 46.3 93.3 24.9 12.6 68.5 95.1 78.1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:3] "a_prcp" "tmax" "tmin"
##   Index:  'yearmon' num [1:938] Jan 1941 Feb 1941 Mar 1941 Apr 1941 ...
head(a_kbli);head(a_prcp);head(a_temp)
##          a_prcp     tmax       tmin
## Jan 1941   88.2 10.00323  0.2967742
## Feb 1941   53.5 12.15357 -0.1571429
## Mar 1941   30.5 15.75484  1.6645161
## Apr 1941   46.3 16.85333  3.3466667
## May 1941   93.3 17.63226  5.8225806
## Jun 1941   24.9 19.92000  7.8833333
## Jan 1941 Feb 1941 Mar 1941 Apr 1941 May 1941 Jun 1941 
##     88.2     53.5     30.5     46.3     93.3     24.9
##              tmax       tmin
## Jan 1941 10.00323  0.2967742
## Feb 1941 12.15357 -0.1571429
## Mar 1941 15.75484  1.6645161
## Apr 1941 16.85333  3.3466667
## May 1941 17.63226  5.8225806
## Jun 1941 19.92000  7.8833333
plot(a_kbli)

It’s alive! After being pulled apart and decimated (dodecimated?) seperately the variables fit back together and are ready for decompose

KBLI decomp

Maybe it was unnecessary to bind the variables back together as they will put through decomp seperately but it made for a good little plot.

prcp

This next step took a lot of work to get through. decompose didn’t seem to enjoy my objects and would spit back a variety of things from non-conformable rays and incompatible methods to the catchall Error in decompose(ts_prcp) : time series has no or less than 2 periods.

I figured that if I wasn’t sure what was going on I should just define my own ts object with the correct start and end that had the correct frequency.

ts_prcp<-ts(a_prcp, start = c(1941,1), end = c(2019,3), frequency=12)
tspd<-decompose(ts_prcp)
plot(tspd)

This looks about how I would expect it to. There is a non-random trend likely due to ENSO strengths, a highly regular seasonal pattern, and a good amount of noise thrown in there as well. In Bellingham we would expect higher precipitation during the winters and lower during the summers.

jan.pr<-as.vector(subset(a_prcp, months(time(a_prcp)) == "January"))
jul.pr<-as.vector(subset(a_prcp, months(time(a_prcp)) == "July"))
sub<-jan.pr-jul.pr
boxplot(jan.pr, jul.pr, sub, notch = T, ylab = "Monthly Precip (mm)", main = '1941-2019',names=c("Jan","July", "Jan-July"))

As expected, January has higher mean precip than July does. What surprises me is that the difference between Jan and July by year is not consistent. The third boxplot shows that some years January has ~230 mm more precipitation that in July of the same year while there are also years that July gets more precipitation!

At first I didn’t believe it but here it is:

head(jan.pr);head(jul.pr);head(sub)
## [1]  88.20  33.00  37.65 132.00 104.30 139.20
## [1] 12.6 64.2 43.0  0.3  4.0 21.2
## [1]  75.60 -31.20  -5.35 131.70 100.30 118.00

The second year (1942) has a higher cumulative precipitation in July than in January.

temps

a_tmax<-a_temp[,colnames(a_temp) =="tmax"]
ts_tmax<-ts(a_tmax, start = c(1941,1), end = c(2019,3), frequency=12)
tstmax<-decompose(ts_tmax)
plot(tstmax)

a_tmin<-a_temp[,colnames(a_temp) =="tmin"]
ts_tmin<-ts(a_tmin, start = c(1941,1), end = c(2019,3), frequency=12)
tstmin<-decompose(ts_tmin)
plot(tstmin)