Use the help function to explore what the series gafa_stock, PBS, vic_elec and pelt represent. Use autoplot() to plot some of the series in these data sets. What is the time interval of each series?
## [1] "Historical stock prices from 2014-2018 for Google, Amazon, Facebook and Apple. All prices are in $USD."
## [1] "Monthly Medicare Australia prescription data"
## [1] "Hudson Bay Company trading records for Snowshoe Hare and Canadian Lynx furs from 1845 to 1935. This data contains trade records for all areas of the company."
## [1] "Half-hourly electricity demand for Victoria, Australia"
## [1] "The gafa_stock series (stock volume) runs from 2014-01-02 to 2018-12-31. Observations are daily on trading days only, so the interval is irregular."
## [1] "PBS (Cost and Scripts) didn't autoplot, but the series runs from July 1991 to June 2008. The interval is monthly."
## [1] "The autoplot looks almost identical from year to year. The vic_elec power demand series runs from 2012-01-01 AEDT to 2014-12-31 23:30:00 AEDT. The interval is half-hourly."
## [1] "The pelt Hare series runs from 1845 to 1935. The interval is annual."
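The interval can also be read directly from a tsibble instead of being inferred from a plot. A minimal sketch on a made-up monthly series, assuming the tsibble package is installed; `toy` and its values are illustrative and not taken from the data sets above:

```r
library(tsibble)

# Hypothetical monthly series, used only to illustrate the calls.
toy <- tsibble(
  Month = yearmonth("2020 Jan") + 0:5,
  value = c(3, 5, 4, 6, 7, 5),
  index = Month
)

interval(toy)    # the interval tsibble inferred from the index
is_regular(toy)  # TRUE when observations are equally spaced
```

Calling interval() on gafa_stock, PBS, vic_elec or pelt should report each series' interval in the same way.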
view(gafa_stock)
sum(is.na(gafa_stock))
## [1] 0
gafa_stock_close <- gafa_stock %>%
  dplyr::select(Symbol, Date, Close) %>%
  group_by(Symbol) %>%
  filter(Close == max(Close)) %>%
  arrange(desc(Close))
gafa_stock_close
## # A tsibble: 4 x 3 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Close
## <chr> <date> <dbl>
## 1 AMZN 2018-09-04 2040.
## 2 GOOG 2018-07-26 1268.
## 3 AAPL 2018-10-03 232.
## 4 FB 2018-07-25 218.
#%>%filter(complete.cases(.))
Download the file tute1.csv from the book website, open it in Excel (or some other spreadsheet application), and review its contents. You should find four columns of information. Columns B through D each contain a quarterly series, labelled Sales, AdBudget and GDP. Sales contains the quarterly sales for a small company over the period 1981-2005. AdBudget is the advertising budget and GDP is the gross domestic product. All series have been adjusted for inflation.
a-You can read the data into R with the following script:
tute1 <- readr::read_csv("tute1.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## Quarter = col_date(format = ""),
## Sales = col_double(),
## AdBudget = col_double(),
## GDP = col_double()
## )
View(tute1)
sum(is.na(tute1))
## [1] 0
b-Convert the data to time series
mytimeseries <- tute1 %>%
  mutate(Quarter = yearquarter(Quarter)) %>%  # the data are quarterly, so use a quarterly index
  as_tsibble(index = Quarter)
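Because the three series are quarterly, the class of the index matters. A small sketch of how yearquarter() and yearmonth() treat the same dates, assuming the tsibble package is installed; the dates are illustrative values and are not read from tute1.csv:

```r
library(tsibble)

dates <- as.Date(c("1981-03-01", "1981-06-01", "1981-09-01"))

q <- yearquarter(dates)  # quarterly index: one value per quarter
m <- yearmonth(dates)    # monthly index: consecutive values are three months apart

format(q)
format(m)
```

With a quarterly index the observations are one period apart, which is what the seasonal plotting functions expect for quarterly data.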
c-Construct time series plots of each of the three series
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y")
Check what happens when you don’t include facet_grid().
mytimeseries %>%
  pivot_longer(-Quarter) %>%
  ggplot(aes(x = Quarter, y = value, colour = name)) +
  geom_line()
### The only difference is the grouping: without facet_grid() all three series are drawn in one panel on a shared y-axis. I think it would be redundant to use facet_grid() if the three variables had similar ranges of values.
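Both plots rely on pivot_longer() to stack the three series into one `value` column labelled by `name`, which is what lets ggplot colour or facet by series. A sketch of that reshaping on made-up numbers, assuming the tidyr package is installed; `wide` is hypothetical and only its shape matters:

```r
library(tidyr)

# Made-up values; only the shape of the table matters here.
wide <- data.frame(
  Quarter  = c("1981 Q1", "1981 Q2"),
  Sales    = c(10, 12),
  AdBudget = c(5, 6),
  GDP      = c(100, 101)
)

# Every column except Quarter is stacked into name/value pairs.
long <- pivot_longer(wide, -Quarter)
long
```

Two quarters and three series give six rows, one per quarter and series.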
The USgas package contains data on the demand for natural gas in the US.
a-Install the USgas package. b-Create a tsibble from us_total with year as the index and state as the key. c-Plot the annual natural gas consumption by state for the New England area (comprising the states of Maine, Vermont, New Hampshire, Massachusetts, Connecticut and Rhode Island).
view(us_total)
glimpse(us_total)
## Rows: 1,266
## Columns: 3
## $ year <int> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007~
## $ state <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"~
## $ y <int> 324158, 329134, 337270, 353614, 332693, 379343, 350345, 382367, ~
us_total1 <- us_total %>%
  as_tsibble(index = year, key = state) %>%  # as_tsibble(), not as_tibble(), so index and key are kept
  filter(state %in% c('Connecticut', 'Maine', 'Massachusetts', 'New Hampshire', 'Rhode Island', 'Vermont'))
head(us_total1)
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Connecticut 144708
## 2 1998 Connecticut 131497
## 3 1999 Connecticut 152237
## 4 2000 Connecticut 159712
## 5 2001 Connecticut 146278
## 6 2002 Connecticut 177587
ggplot(data = us_total1, aes(x = year, y = y, col = state)) +
  geom_line() +
  labs(x = 'Year', y = 'Natural Gas Consumption', title = 'Annual Natural Gas Consumption by State: New England')
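Part b asks for a tsibble with year as the index and state as the key. A minimal sketch of that structure on a made-up two-state table, assuming the tsibble package is installed; `toy` and its figures are hypothetical:

```r
library(tsibble)

# Hypothetical consumption figures for two states.
toy <- data.frame(
  year  = rep(2000:2002, times = 2),
  state = rep(c("Maine", "Vermont"), each = 3),
  y     = c(10, 11, 12, 20, 21, 22)
)

# The index orders observations in time; the key identifies each series.
toy_ts <- as_tsibble(toy, index = year, key = state)

key_vars(toy_ts)   # "state"
index_var(toy_ts)  # "year"
```

Each state is then treated as its own annual series, which is what allows one line per state in the plot.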
a-Download tourism.xlsx from the book website and read it into R using readxl::read_excel(). b-Create a tsibble which is identical to the tourism tsibble from the tsibble package.
tourism <- readxl::read_excel("tourism.xlsx")
View(tourism)
#colSums(is.na(tourism))%>% kable()
sum(is.na(tourism))
## [1] 0
# tourism1 <- tourism %>%
# mutate(Quarter = yearquarter(as.Date(tourism$Quarter)))%>%
# as_tsibble( index = Quarter, key = c(Region, State, Purpose))
tourism$Quarter <- yearquarter(as.Date(tourism$Quarter))
glimpse(tourism)
## Rows: 24,320
## Columns: 5
## $ Quarter <qtr> 1998 Q1, 1998 Q2, 1998 Q3, 1998 Q4, 1999 Q1, 1999 Q2, 1999 Q3,~
## $ Region <chr> "Adelaide", "Adelaide", "Adelaide", "Adelaide", "Adelaide", "A~
## $ State <chr> "South Australia", "South Australia", "South Australia", "Sout~
## $ Purpose <chr> "Business", "Business", "Business", "Business", "Business", "B~
## $ Trips <dbl> 135.0777, 109.9873, 166.0347, 127.1605, 137.4485, 199.9126, 16~
tourism1 <- tourism %>%
  as_tsibble(index = Quarter, key = c(Region, State, Purpose))
c-Find what combination of Region and Purpose had the maximum number of overnight trips on average.
# Average the trips for each Region/Purpose pair, then keep the pair with the largest mean.
# The earlier attempt failed because summarise() was given `==` (comparison)
# instead of `=` (naming the new column), not because of the datatype of Quarter.
tourism2c <- tourism %>%
  group_by(Region, Purpose) %>%
  summarise(MeanTrip = mean(Trips), .groups = "drop") %>%
  filter(MeanTrip == max(MeanTrip))
tourism2c
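The group-then-average step can be checked on a small made-up table. A sketch, assuming the dplyr package is installed; `toy` and its values are hypothetical. Inside summarise(), `MeanTrip == mean(Trips)` would compare against a column that does not exist yet, so `=` is needed to name the result:

```r
library(dplyr)

# Made-up trips for three Region/Purpose pairs.
toy <- tibble(
  Region  = c("A", "A", "B", "B", "C", "C"),
  Purpose = c("Holiday", "Holiday", "Visiting", "Visiting", "Business", "Business"),
  Trips   = c(10, 20, 40, 60, 5, 15)
)

best <- toy %>%
  group_by(Region, Purpose) %>%
  summarise(MeanTrip = mean(Trips), .groups = "drop") %>%  # `=` names the new column
  slice_max(MeanTrip, n = 1)                               # keep the row with the largest mean

best
```

Here slice_max() is used as an alternative to filtering on max(); either should pick out the same single row when there are no ties.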
d-Create a new tsibble which combines the Purposes and Regions, and just has total trips by State.
# Summing Trips within each State collapses Region and Purpose.
# Because tourism1 is a tsibble, Quarter is kept as the index and State becomes the key.
tourism3d <- tourism1 %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))
tourism3d
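Summarising a grouped tsibble aggregates over the keys that are not in the grouping while keeping the time index. A sketch on a made-up tsibble with two states, assuming the tsibble and dplyr packages are installed; all names and values in `toy` are hypothetical:

```r
library(dplyr)
library(tsibble)

# Four made-up series (2 states x 2 purposes), two quarters each.
toy <- tsibble(
  Quarter = yearquarter("2020 Q1") + rep(0:1, times = 4),
  Region  = rep(c("Adelaide", "Melbourne"), each = 4),
  State   = rep(c("SA", "VIC"), each = 4),
  Purpose = rep(rep(c("Business", "Holiday"), each = 2), times = 2),
  Trips   = 1:8,
  key     = c(Region, State, Purpose),
  index   = Quarter
)

# Region and Purpose are summed out; one series per State remains.
by_state <- toy %>%
  group_by(State) %>%
  summarise(Trips = sum(Trips))

by_state
```

The result has one row per State and Quarter, so the four original series collapse to two.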
Monthly Australian retail data is provided in aus_retail. Select one of the time series as follows (but choose your own seed value):
Explore your chosen retail time series using the following functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF() %>% autoplot().
Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
set.seed(53566)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
autoplot(myseries)
## Plot variable not specified, automatically selected `.vars = Turnover`
gg_season(myseries)
## Plot variable not specified, automatically selected `y = Turnover`
gg_subseries(myseries)
## Plot variable not specified, automatically selected `y = Turnover`
gg_lag(myseries)
## Plot variable not specified, automatically selected `y = Turnover`
myseries %>%
  ACF(Turnover) %>%
  autoplot()
print("autoplot() gives a global view of the whole timeline, and the trend line shows that turnover keeps fluctuating over the whole period. gg_season() zooms in on the timeline so the progression of turnover by month can be observed. gg_subseries() and gg_lag() go a little deeper than the other functions for anyone interested in looking at a particular pattern.")
## [1] "autoplot() gives a global view of the whole timeline, and the trend line shows that turnover keeps fluctuating over the whole period. gg_season() zooms in on the timeline so the progression of turnover by month can be observed. gg_subseries() and gg_lag() go a little deeper than the other functions for anyone interested in looking at a particular pattern."
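In an ACF plot, seasonality in monthly data typically shows up as peaks at lags 12, 24 and so on. A sketch of that pattern using base R's acf() on a synthetic series; this is not the retail data, and feasts::ACF() is deliberately not used so the example has no package dependencies:

```r
# Synthetic monthly series: ten years of a pure 12-month cycle (no real data).
x <- sin(2 * pi * (1:120) / 12)

a <- acf(x, lag.max = 24, plot = FALSE)

lag12 <- a$acf[13]  # element 1 is lag 0, so element 13 is lag 12
lag6  <- a$acf[7]   # half a cycle apart

c(lag6 = lag6, lag12 = lag12)
```

The autocorrelation is strongly positive one full cycle apart and strongly negative half a cycle apart. A real series with trend as well as seasonality would usually also show slowly decaying positive values at short lags.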