HW1 HH

2.1

Use the help function to explore what the series gold, woolyrnq and gas represent.

A. Use autoplot() to plot each of these in separate plots:

B. What is the frequency of each series? Hint: apply the frequency() function.

C. Use which.max() to spot the outlier in the gold series. Which observation was it?

library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(ggplot2)
library(readxl)
library(RCurl)
library(fpp2)

help(gold)

## starting httpd help server ... done

help(woolyrnq)
help(gas)

A. Use autoplot() to plot each of these in separate plots.

library(fpp2)
autoplot(forecast::gold) +
  ggtitle("Gold Prices in USA Daily") +
  xlab("Day") +
  ylab("US Dollars")

autoplot(woolyrnq)

autoplot(gas)

B. What is the frequency of each series? Hint: apply the frequency() function.

frequency(gold)

## [1] 1

frequency(woolyrnq)

## [1] 4

frequency(gas)

## [1] 12
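The values returned by frequency() describe how each series was stored: the number of observations per seasonal period that was set when the ts object was built. The sketch below illustrates this with base R only. The object names and numbers are made up for illustration and are not the gold, woolyrnq or gas data.

```r
# Hypothetical toy series; only the frequency argument matters here.
annual    <- ts(1:10, start = 2000, frequency = 1)        # no seasonal period
quarterly <- ts(1:8,  start = c(2000, 1), frequency = 4)  # 4 observations per year
monthly   <- ts(1:24, start = c(2000, 1), frequency = 12) # 12 observations per year

# Collect the stored frequencies in one named vector.
freqs <- c(annual    = frequency(annual),
           quarterly = frequency(quarterly),
           monthly   = frequency(monthly))
print(freqs)
```

This would be consistent with gold reporting 1 even though it is daily data: it appears to be stored as a plain sequence of observations with no seasonal period attached.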

C. Use which.max() to spot the outlier in the gold series. Which observation was it?

which.max(gold)

## [1] 770

which.max(woolyrnq)

## [1] 21

which.max(gas)

## [1] 475
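which.max() only returns the position of the largest value. To answer "which observation was it?" more fully, the same index can be used to look up the value and its time stamp. This is a minimal base-R sketch on a made-up series (the values are hypothetical, not the gold data):

```r
# Hypothetical toy series with one obvious spike and a missing value.
x <- ts(c(10, 12, NA, 11, 95, 13, 12), start = 1, frequency = 1)

idx  <- which.max(x)     # position of the first maximum; NA values are skipped
peak <- x[idx]           # value at that position
when <- time(x)[idx]     # time stamp of that observation

cat("index:", idx, "value:", peak, "time:", when, "\n")
```

Applied to the exercise, the same three lines with gold in place of x should give the value and time index that go with observation 770.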

2.2

Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget, and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.

A. You can read the data into R with the following script:

tute1 <- read.csv("https://otexts.com/fpp2/extrafiles/tute1.csv", header=TRUE)
# View(tute1)

B. Convert the data to time series.

tute.ts <- ts(tute1[,-1], start=1981, frequency=4)

C. Construct time series plots of each of the three series.

autoplot(tute.ts, facets=TRUE)
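A quick way to confirm that the conversion in part B did what was intended is to inspect the start, end and frequency of the resulting object. The sketch below uses a small made-up data frame in place of tute1, so the column values and the name toy are assumptions for illustration only.

```r
# Hypothetical stand-in for tute1: a label column plus two numeric columns.
toy <- data.frame(
  X        = paste0("Q", 1:8),
  Sales    = c(1020, 889, 795, 1003, 1058, 944, 778, 932),
  AdBudget = c(659, 589, 512, 614, 647, 602, 530, 608)
)

# Drop the first column, then declare quarterly data starting in 1981 Q1.
toy.ts <- ts(toy[, -1], start = 1981, frequency = 4)

print(start(toy.ts))      # first period as c(year, quarter)
print(end(toy.ts))        # last period as c(year, quarter)
print(frequency(toy.ts))  # observations per year
print(colnames(toy.ts))   # the label column should be gone
```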

2.3

Download some monthly Australian retail data from the book website. These represent retail sales in various categories for different Australian states, and are stored in a MS-Excel file.

A. You can read the data into R with the following script:

library(readxl)
retaildata <- readxl::read_excel("T:/00-624 HH Predictive Analytics/HW1 HH/HWK1HH/retail.xlsx", skip=1)
head(retaildata, 2)

## # A tibble: 2 x 190
##   `Series ID`         A3349335T A3349627V A3349338X A3349398A A3349468W
##   <dttm>                  <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
## 1 1982-04-01 00:00:00      303.      41.7      63.9      409.      65.8
## 2 1982-05-01 00:00:00      298.      43.1      64        405.      65.8
## # ... with 184 more variables: A3349336V <dbl>, A3349337W <dbl>,
## #   A3349397X <dbl>, A3349399C <dbl>, A3349874C <dbl>, A3349871W <dbl>,
## #   A3349790V <dbl>, A3349556W <dbl>, A3349791W <dbl>, A3349401C <dbl>,
## #   A3349873A <dbl>, A3349872X <dbl>, A3349709X <dbl>, A3349792X <dbl>,
## #   A3349789K <dbl>, A3349555V <dbl>, A3349565X <dbl>, A3349414R <dbl>,
## #   A3349799R <dbl>, A3349642T <dbl>, A3349413L <dbl>, A3349564W <dbl>,
## #   A3349416V <dbl>, A3349643V <dbl>, A3349483V <dbl>, A3349722T <dbl>,
## #   A3349727C <dbl>, A3349641R <dbl>, A3349639C <dbl>, A3349415T <dbl>,
## #   A3349349F <dbl>, A3349563V <dbl>, A3349350R <dbl>, A3349640L <dbl>,
## #   A3349566A <dbl>, A3349417W <dbl>, A3349352V <dbl>, A3349882C <dbl>,
## #   A3349561R <dbl>, A3349883F <dbl>, A3349721R <dbl>, A3349478A <dbl>,
## #   A3349637X <dbl>, A3349479C <dbl>, A3349797K <dbl>, A3349477X <dbl>,
## #   A3349719C <dbl>, A3349884J <dbl>, A3349562T <dbl>, A3349348C <dbl>,
## #   A3349480L <dbl>, A3349476W <dbl>, A3349881A <dbl>, A3349410F <dbl>,
## #   A3349481R <dbl>, A3349718A <dbl>, A3349411J <dbl>, A3349638A <dbl>,
## #   A3349654A <dbl>, A3349499L <dbl>, A3349902A <dbl>, A3349432V <dbl>,
## #   A3349656F <dbl>, A3349361W <dbl>, A3349501L <dbl>, A3349503T <dbl>,
## #   A3349360V <dbl>, A3349903C <dbl>, A3349905J <dbl>, A3349658K <dbl>,
## #   A3349575C <dbl>, A3349428C <dbl>, A3349500K <dbl>, A3349577J <dbl>,
## #   A3349433W <dbl>, A3349576F <dbl>, A3349574A <dbl>, A3349816F <dbl>,
## #   A3349815C <dbl>, A3349744F <dbl>, A3349823C <dbl>, A3349508C <dbl>,
## #   A3349742A <dbl>, A3349661X <dbl>, A3349660W <dbl>, A3349909T <dbl>,
## #   A3349824F <dbl>, A3349507A <dbl>, A3349580W <dbl>, A3349825J <dbl>,
## #   A3349434X <dbl>, A3349822A <dbl>, A3349821X <dbl>, A3349581X <dbl>,
## #   A3349908R <dbl>, A3349743C <dbl>, A3349910A <dbl>, A3349435A <dbl>,
## #   A3349365F <dbl>, A3349746K <dbl>, ...

B. Select one of the time series as follows (but replace the column name with your own chosen column):

retail.ts <- ts(retaildata[,"A3349350R"], frequency=12, start=c(1982,4))

C. Explore your chosen retail time series using the following functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf().

autoplot(retail.ts)

#### Q: Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

Answer: Yes, the seasonality is very clear in every year, from the 1980s to the 2010s. Each year there is a spike in retail sales, presumably in summer.

Yes, there is clear cyclicity as well. There is also an upward trend, with clothing sales increasing year after year over this period. The upward trend is weakest in the early phase (the 1990s), strengthens gradually as time goes by, and is most pronounced in the late phase (around 2010).
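The visual impression of a yearly spike can be cross-checked numerically by removing a rough trend and averaging what remains by calendar month. The following is a sketch on synthetic data, assuming a linear trend plus a December bump. The series, its numbers and the names x, ma and month_means are hypothetical and do not come from retail.xlsx.

```r
# Hypothetical monthly series: linear trend plus a bump every 12th month.
set.seed(1)
n     <- 120
trend <- seq(100, 220, length.out = n)
bump  <- rep(c(rep(0, 11), 40), times = n / 12)
x     <- ts(trend + bump + rnorm(n, sd = 3), start = c(1982, 1), frequency = 12)

# Rough trend: centred 12-month moving average (half weights at both ends).
ma <- stats::filter(x, c(0.5, rep(1, 11), 0.5) / 12, sides = 2)

# Average the detrended values by calendar month (1 = January, 12 = December).
detrended   <- as.numeric(x - ma)
month_means <- tapply(detrended, as.integer(cycle(x)), mean, na.rm = TRUE)

print(round(month_means, 1))
peak_month <- which.max(as.numeric(month_means))
cat("month with the largest average:", peak_month, "\n")
```

On the real series, replacing x with retail.ts would show which month carries the spike, which is a more direct check than reading it off the time plot.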

2.6

Use the following graphics functions: autoplot(), ggseasonplot(), ggsubseriesplot(), gglagplot(), ggAcf() and explore features from the following time series: hsales, usdeaths, bricksq, sunspotarea, gasoline.

hsales

hsales.ts <- hsales

autoplot(hsales.ts)

ggseasonplot(hsales.ts)

ggsubseriesplot(hsales.ts)

gglagplot(hsales.ts)

ggAcf(hsales.ts)

#### Q: Can you spot any seasonality, cyclicity and trend? What do you learn about the series?

Answer: The hsales data shows a strong seasonal pattern, which peaks every year around March to April. It does not show a trend over the years.

usdeaths

usdeaths.ts <- usdeaths

autoplot(usdeaths.ts)

ggseasonplot(usdeaths.ts)

ggsubseriesplot(usdeaths.ts)

gglagplot(usdeaths.ts)

ggAcf(usdeaths.ts)

The US deaths data does not show a trend; the level stays similar year after year. It does, however, show very strong seasonality within each year. The number of deaths is lowest in January to February and peaks in summer. The lag plot shows a negative correlation at lag 6 and lag 8, which is consistent with the roughly half-year gap between the February low and the July peak.
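The link between a half-year gap and a negative autocorrelation can be sketched on synthetic data. For a series with a 12-month cycle, observations six months apart sit at opposite phases, so the autocorrelation at lag 6 is negative, while at lag 12 it is positive again. The series below is a pure sine wave chosen for illustration, not the usdeaths data.

```r
# Hypothetical monthly series with a pure 12-month cycle over six years.
m <- 1:72
x <- ts(sin(2 * pi * m / 12), start = c(1973, 1), frequency = 12)

# For a monthly ts, acf() reports lags in years, so multiply by 12
# to label each autocorrelation by its lag in months.
a <- acf(x, lag.max = 12, plot = FALSE)
r <- setNames(as.numeric(a$acf), round(as.numeric(a$lag) * 12))

print(round(r[c("6", "12")], 2))
```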

bricksq

bricksq.ts <- bricksq

autoplot(bricksq.ts)

ggseasonplot(bricksq.ts)

ggsubseriesplot(bricksq.ts)

gglagplot(bricksq.ts)

ggAcf(bricksq.ts)

In terms of trend, the bricksq data first shows a positive trend from the early years until the 1970s, followed by a negative trend in the years after. There is a large dip in 1975, and another in 1982.

The seasonal plot for this data is hard to read. There is likely somewhat of a peak in the middle of the year.

sunspotarea

sunspotarea.ts <- sunspotarea

autoplot(sunspotarea.ts)

gglagplot(sunspotarea.ts,lags=12)

ggAcf(sunspotarea.ts)
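No seasonal plots are drawn for sunspotarea because it is an annual series, so there is no within-year period to plot. One way to make that decision explicit is to check frequency() before calling the seasonal plotting functions. The helper below is a hypothetical sketch in base R. The rule that the subseries plot needs an integer period is my assumption about how ggsubseriesplot() behaves, and the three toy series only mimic the frequencies of annual, monthly and weekly data.

```r
# Hypothetical helper: report which seasonal views make sense for a series.
seasonal_views_ok <- function(x) {
  f <- frequency(x)
  list(
    seasonplot    = f > 1,                   # needs a seasonal period
    subseriesplot = f > 1 && f == round(f)   # assumed to need an integer period
  )
}

# Toy series that only mimic typical storage frequencies.
yearly  <- ts(1:20,  start = 1875,       frequency = 1)
monthly <- ts(1:36,  start = c(1973, 1), frequency = 12)
weekly  <- ts(1:200, start = 1991,       frequency = 365.25 / 7)

str(seasonal_views_ok(yearly))
str(seasonal_views_ok(monthly))
str(seasonal_views_ok(weekly))
```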

gasoline

gasoline.ts <- gasoline

autoplot(gasoline.ts)

ggseasonplot(gasoline.ts)

gglagplot(gasoline.ts)

ggAcf(gasoline.ts)

Gasoline sales in the USA show clear seasonality. They are high in the late weeks of the year (weeks 47 to 50), which correspond to the holiday season and holiday travel. This is intuitive and agrees with common belief.