Predictive Analytics

Dan Wigodsky

Data 624 Homework 1

February 7, 2019

Question 2:1

Three time series are loaded for exploration:

Gas: Australian monthly gas production from the Australian Bureau of Statistics
Woolyrnq: quarterly production of woollen yarn, 3/1965 - 9/1994
Gold: daily morning gold prices, 1/1/1985 - 3/31/1989
##  Time-Series [1:476] from 1956 to 1996: 1709 1646 1794 1878 2173 ...

##  Time-Series [1:119] from 1965 to 1994: 6172 6709 6633 6660 6786 ...

##  Time-Series [1:1108] from 1 to 1108: 306 300 303 297 304 ...

The price of gold spikes at the 770th entry in our series. There is no known gold price spike in the period in question that approached $600. On Black Monday in October 1987, a climb to $491.50 was notable. The number of trading days per year also does not obviously match the purported timeframe.
The gas series is monthly, the wool series is quarterly, and the gold series is daily. The gold series carries no calendar index: str() reports it as running from 1 to 1108.
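The timeframe question can be examined with a little date arithmetic. The sketch below is not part of the original analysis: it counts the calendar days and the weekdays between the stated start and end dates in base R, under the assumption (not confirmed by the data set itself) that a daily price series records Monday to Friday only.

```r
# Count calendar days and weekdays in the stated gold timeframe.
# Assumption: "daily" prices are recorded Monday to Friday only.
days <- seq(as.Date("1985-01-01"), as.Date("1989-03-31"), by = "day")

# POSIXlt$wday runs from 0 (Sunday) to 6 (Saturday), independent of locale
wday <- as.POSIXlt(days)$wday
n_calendar <- length(days)
n_weekdays <- sum(wday >= 1 & wday <= 5)

n_calendar                           # 1551
n_weekdays                           # 1109
n_weekdays / (n_calendar / 365.25)   # roughly 261 weekdays per year
```

Under that assumption, the 1108 entries reported by str(gold) are within one observation of the weekday count, which would mean positions in the series advance by roughly 261 per year rather than 365.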

Question 2:2

Tute1 time series

Tute1 is a quarterly data set for a small company over the period 1981-2005. It contains three series: sales, advertising budget, and GDP.
Head of Tute1 Dataset
X       Sales   AdBudget  GDP
Mar-81  1020.2  659.2     251.8
Jun-81   889.2  589.0     290.9
Sep-81   795.0  512.5     290.8
Dec-81  1003.9  614.1     292.4
Mar-82  1057.7  647.2     279.1
Jun-82   944.4  602.0     254.0
Sep-82   778.5  530.7     295.6
Dec-82   932.5  608.4     271.7
Mar-83   996.5  637.9     259.6
Jun-83   907.7  582.4     280.5
Plotting the time series with facets=TRUE places each series in its own panel with its own y-axis scale. With facets=FALSE, all three are drawn on one shared axis and are distinguished by color.
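The same contrast can be reproduced without the forecast package. The sketch below is a base R analogue, not the report's own plotting code: it rebuilds a quarterly multi-series ts from the first six rows of the table above and uses plot.type in place of the facets argument. The name tute1_head is hypothetical.

```r
# First six rows of the tute1 table shown above
tute1_head <- data.frame(
  X        = c("Mar-81", "Jun-81", "Sep-81", "Dec-81", "Mar-82", "Jun-82"),
  Sales    = c(1020.2, 889.2, 795.0, 1003.9, 1057.7, 944.4),
  AdBudget = c(659.2, 589.0, 512.5, 614.1, 647.2, 602.0),
  GDP      = c(251.8, 290.9, 290.8, 292.4, 279.1, 254.0)
)

# Drop the date label column and declare the data quarterly from 1981 Q1
mytimeseries <- ts(tute1_head[, -1], start = 1981, frequency = 4)
frequency(mytimeseries)  # 4

# One panel per series, each with its own y-axis (like facets=TRUE)
plot(mytimeseries, plot.type = "multiple", main = "")

# One shared axis, series distinguished by color (like facets=FALSE)
plot(mytimeseries, plot.type = "single", col = 1:3, ylab = "")
```

On a shared axis the GDP series is squeezed toward the bottom because Sales is several times larger, which is the practical reason to prefer separate panels here.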

Question 2:3

Australian retail data

Series: Turnover for Clothing, footwear and personal accessory retail sector in Australian Capital Territory
Our series shows a strong seasonal effect: retailers sell a large volume of merchandise during the holiday season. The effect is visible in the seasonal plot, the lag plot, and the ACF and PACF plots. The subseries plot shows a strong upward trend, which also appears as the stacking of successive years in the seasonal plot. The seasonal period is 12 months.
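How a 12-month season shows up in the ACF can be illustrated on made-up data. The sketch below uses a synthetic monthly series with hypothetical values (a mild trend plus a November and December bump), not the retail data, and base R's acf() rather than ggAcf().

```r
# Synthetic monthly series (hypothetical values, not the retail data):
# a mild upward trend plus a November/December sales bump.
month_index <- 1:120
bump <- rep(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 10), times = 10)
sales <- ts(50 + 0.1 * month_index + bump, start = c(1982, 1), frequency = 12)

# acf() returns lag 0 first, so element k + 1 holds the lag-k autocorrelation
r <- acf(sales, lag.max = 24, plot = FALSE)$acf
r[13]  # lag 12: the same month one year earlier
r[7]   # lag 6: the opposite half of the year

# Mean by calendar month; December stands out
monthly_mean <- tapply(sales, cycle(sales), mean)
monthly_mean
```

The autocorrelation at lag 12 is larger than at lag 6 because observations a full year apart share the same seasonal position, while the trend alone would make the ACF decay steadily with lag.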

Question 2:6

New one-family home sales, US Census Bureau, Manufacturing and Construction Division

Usdeaths time series, monthly accidental deaths

Australian quarterly clay brick production: 1956–1994

Brick production shows a strong positive trend over the period. There appear to be correlations at lag 1 and at lag 4, the latter corresponding to the same quarter one year earlier. There is some seasonal effect, which may have diminished over the period.
##  Time-Series [1:155] from 1956 to 1994: 189 204 208 197 187 214 227 223 199 229 ...

Sunspot area - annual series

The average sunspot area follows a cycle of roughly 11 years, which was not easily diagnosed by the software. There is a low around 1900 and a high in 1955. Lows recur about every 11 years, each followed by a high half a cycle later. The ggseasonplot and ggsubseriesplot functions would not work because the series is annual, so they detect no seasonality.
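A multi-year cycle in annual data can still be measured, just not with seasonal plots. The sketch below is a swapped-in technique, not something the report used: it applies a raw periodogram from base R's spec.pgram() to a synthetic series with hypothetical values (a pure 11-year sine wave of the same length as the sunspot series) and reads off the dominant period.

```r
# Synthetic annual series (hypothetical values): an 11-year cycle sampled
# once a year, 141 observations starting in 1875 like the sunspot series.
year_index <- 0:140
area <- ts(1000 + 800 * sin(2 * pi * year_index / 11),
           start = 1875, frequency = 1)

frequency(area)  # 1: there is no within-year season for seasonal plots to use

# Raw periodogram; with frequency 1 the frequency axis is in cycles per year
pg <- spec.pgram(area, taper = 0, detrend = TRUE, plot = FALSE)
peak_freq <- pg$freq[which.max(pg$spec)]
dominant_period <- 1 / peak_freq
dominant_period  # in years, close to 11
```

On real sunspot data the peak would be broader, since the cycle length varies, so the result should be read as an approximate period rather than an exact one.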
##  Time-Series [1:141] from 1875 to 2015: 213.1 109.3 92.9 22.2 36.3 ...
##  - attr(*, "names")= chr [1:141] "1875" "1876" "1877" "1878" ...

US Finished Motor Gasoline Product Supplied.

Weekly data beginning 2 February 1991, ending 20 January 2017. Units are “million barrels per day”.

US gasoline consumption increased steadily, although the rate of increase appears to have slowed over the period. The data are weekly and show yearly seasonality. The lag plot has difficulty picking up any pattern; if the weekly data were aggregated to a coarser interval, such as months, a pattern would be easier to detect. The ACF plot does show it, though. The variance also appears to increase through the 1990s as the economy expanded, narrow around the 2008 recession, and become large again toward the end of the period.
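Aggregating weekly observations into coarser blocks is straightforward in base R when the frequency is an integer. The sketch below uses a synthetic weekly series with hypothetical values and a frequency of exactly 52; this is an assumption, and the actual gasoline series may carry a non-integer frequency, in which case aggregate() could not be applied to it this directly.

```r
set.seed(624)  # arbitrary seed so the sketch is reproducible

# Synthetic weekly series (hypothetical values): trend + yearly cycle + noise
n_weeks <- 520
week_index <- seq_len(n_weeks)
weekly <- ts(
  8 + 0.002 * week_index + 0.5 * sin(2 * pi * week_index / 52) +
    rnorm(n_weeks, sd = 0.2),
  start = c(1991, 1), frequency = 52
)

# Average each block of 4 weeks: 52 / 4 = 13 observations per year
four_weekly <- aggregate(weekly, nfrequency = 13, FUN = mean)

frequency(four_weekly)  # 13
length(four_weekly)     # 130

# Standard deviation of week-to-week changes within each year,
# a rough way to see whether the variance changes over time
tapply(as.numeric(diff(weekly)), floor(time(weekly))[-1], sd)
```

Averaging within blocks removes much of the week-to-week noise, so a lag plot of four_weekly would show the yearly pattern more clearly than one of weekly.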

Appendix

devtools::install_github("yixuan/showtext")
suppressWarnings(suppressMessages(library(forecast)))
suppressWarnings(suppressMessages(library(showtext)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(kableExtra)))
suppressWarnings(suppressMessages(library(fma)))
suppressWarnings(suppressMessages(library(fpp)))
suppressWarnings(suppressMessages(library(fpp2)))

font_add_google(name = "Corben", family = "corben", regular.wt = 400, bold.wt = 700)

Question 2:1

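str(gas)
autoplot(gas, ylab="gas production", xlab="")
str(woolyrnq)
autoplot(woolyrnq, ylab="wool yarn production", xlab="")
str(gold)
# The x values are numeric positions, so a continuous scale is used.
# Breaks and labels must have the same length; the original listed a fifth
# label ("1989") with no matching break. The break positions assume 365
# entries per year, as in the original.
autoplot(gold, ylab="gold price", xlab="") +
  scale_x_continuous(breaks=c(1,365,730,1095), labels=c("1985","1986","1987","1988"))

# Position of the largest value in the gold series
which.max(gold)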

Question 2:2

tute1 <- read.csv("C:/Users/dawig/Desktop/Data624/tute1.csv", header=TRUE)
# Styled HTML table of the first ten rows
kable_input <- kable(tute1[1:10,], "html") %>%
  kable_styling("striped", full_width = F) %>%
  column_spec(1, bold = T, color = "white", background = "#3dc666") %>%
  column_spec(2, bold = T, color = "#3dc666", background = "white") %>%
  column_spec(3, bold = T, color = "#3dc666", background = "white") %>%
  column_spec(4, bold = T, color = "#3dc666", background = "white")
add_header_above(kable_input, header = c('', "Head of Tute1 Dataset"=3), bold = TRUE, italic = TRUE) %>%
  kable_styling(bootstrap_options = "striped", font_size = 22)

# Drop the date label column and declare the data quarterly from 1981
mytimeseries <- ts(tute1[,-1], start=1981, frequency=4)
autoplot(mytimeseries, facets=TRUE, ylab="tute 1 small business", xlab="")
autoplot(mytimeseries, facets=FALSE, ylab="tute1 small business", xlab="")

Question 2:3

retaildata <- readxl::read_excel("C:/Users/dawig/Desktop/Data624/retail.xlsx", skip=1)
# Monthly turnover series for the chosen retail category, starting April 1982
turnover <- ts(retaildata[,"A3349608L"], frequency=12, start=c(1982,4))
autoplot(turnover, ylab="turnover", xlab="") + theme(panel.background = element_rect(fill = '#efeae8'))
ggseasonplot(turnover) + theme(panel.background = element_rect(fill = '#efeae8'))
ggsubseriesplot(turnover) + theme(panel.background = element_rect(fill = '#efeae8')) + ggtitle("")
gglagplot(turnover) + theme(panel.background = element_rect(fill = '#efeae8')) + ggtitle("")
ggAcf(turnover) + theme(panel.background = element_rect(fill = '#efeae8'))
ggPacf(turnover) + theme(panel.background = element_rect(fill = '#efeae8'))

Question 2:6

# hsales: new one-family home sales
str(hsales)
autoplot(hsales, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggseasonplot(hsales, xlab="month") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggsubseriesplot(hsales, xlab="month") + theme(panel.background = element_rect(fill = '#f9f8ef')) + ggtitle("")

gglagplot(hsales, xlab="hsales") + theme(panel.background = element_rect(fill = '#f9f8ef')) + ggtitle("")

ggAcf(hsales, xlab="lag") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggPacf(hsales, xlab="lag") + theme(panel.background = element_rect(fill = '#f9f8ef'))

str(usdeaths)
autoplot(usdeaths, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggseasonplot(usdeaths, xlab="month") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggsubseriesplot(usdeaths, xlab="month") + theme(panel.background = element_rect(fill = '#f9f8ef')) + ggtitle("")

gglagplot(usdeaths, xlab="usdeaths") +
  scale_x_continuous(breaks=c(7000,9000,11000), labels=c("7000","9000","11000")) +
  theme(axis.text.x = element_text(angle=-45), panel.background = element_rect(fill = '#f9f8ef')) +
  ggtitle("")

ggAcf(usdeaths, xlab="lag") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggPacf(usdeaths, xlab="lag") + theme(panel.background = element_rect(fill = '#f9f8ef'))

str(bricksq)
autoplot(bricksq, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggseasonplot(bricksq, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggsubseriesplot(bricksq, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef')) + ggtitle("")
gglagplot(bricksq, xlab=expression(y["t-k"]), ylab=expression(y["t"])) +
  theme(panel.background = element_rect(fill = '#f9f8ef'), axis.title = element_text(size=14)) +
  ggtitle("") +
  scale_x_continuous(breaks=c(250,350,450,550), labels=c('250','350','450','550'))
ggAcf(bricksq, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))

str(sunspotarea)
autoplot(sunspotarea, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))
# The two seasonal plots fail on this annual series (see the discussion above),
# so they are wrapped in try() to let the rest of the script run.
try(ggseasonplot(sunspotarea, xlab="year"))
try(ggsubseriesplot(sunspotarea, xlab="year"))
gglagplot(sunspotarea, xlab=expression(y["t-k"]), ylab=expression(y["t"])) +
  theme(panel.background = element_rect(fill = '#f9f8ef'), legend.position="none", axis.title = element_text(size=14)) +
  ggtitle("")
ggAcf(sunspotarea, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggPacf(sunspotarea, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))

autoplot(gasoline, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggseasonplot(gasoline, xlab="week") + theme(panel.background = element_rect(fill = '#f9f8ef'))
ggsubseriesplot(gasoline, xlab="year") + theme(panel.background = element_rect(fill = '#f9f8ef'))

gglagplot(gasoline, xlab=expression(y["t-k"]), ylab=expression(y["t"])) +
  theme(panel.background = element_rect(fill = '#f9f8ef'), legend.position="none") +
  theme(axis.title = element_text(size=14)) +
  ggtitle("")

ggAcf(gasoline, xlab="weeks lag") + theme(panel.background = element_rect(fill = '#f9f8ef'))