## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.3
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.3.2
## ✔ lubridate 1.9.2 ✔ fable 0.3.4
## ✔ ggplot2 3.5.1 ✔ fabletools 0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
## starting httpd help server ... done
aus_production is quarterly data from 1956 to 2010.
pelt is yearly data from 1845 to 1935.
gafa_stock business day data when the Market is open from 2014 to
2018.
vic_elec is every 30 minutes data from 2012 to 2014.
## # A tsibble: 4 x 8 [!]
## # Key: Symbol [4]
## # Groups: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2018-10-03 230. 233. 230. 232. 230. 28654800
## 2 AMZN 2018-09-04 2026. 2050. 2013 2040. 2040. 5721100
## 3 FB 2018-07-25 216. 219. 214. 218. 218. 58954200
## 4 GOOG 2018-07-26 1251 1270. 1249. 1268. 1268. 2405600
tute1 <- readr::read_csv("https://raw.githubusercontent.com/Naik-Khyati/data_624/main/hw1/tute1.csv")## Rows: 100 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): Sales, AdBudget, GDP
## date (1): Quarter
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y")mytimeseries %>%
pivot_longer(-Quarter) %>%
ggplot(aes(x = Quarter, y = value, colour = name)) +
geom_line()When facet_grid() is not included in the ggplot code, the result is a single plot where all the time series data (for different variables represented by name) are plotted together in one chart.
## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <int>
## 1 1997 Alabama 324158
## 2 1998 Alabama 329134
## 3 1999 Alabama 337270
## 4 2000 Alabama 353614
## 5 2001 Alabama 332693
## 6 2002 Alabama 379343
ne_ts <- ts %>%
filter(state %in% c('Maine', 'Vermont', 'New Hampshire', 'Massachusetts', 'Connecticut', 'Rhode Island')) %>% mutate(y = y/1e3)
head(ne_ts)## # A tsibble: 6 x 3 [1Y]
## # Key: state [1]
## year state y
## <int> <chr> <dbl>
## 1 1997 Connecticut 145.
## 2 1998 Connecticut 131.
## 3 1999 Connecticut 152.
## 4 2000 Connecticut 160.
## 5 2001 Connecticut 146.
## 6 2002 Connecticut 178.
autoplot(ne_ts, y) +
labs(title = "Annual natural gas consumption by state",
subtitle = "New England",
y = "Consumption")library(readxl)
library(httr)
# URL of the raw Excel file
url <- "https://raw.githubusercontent.com/Naik-Khyati/data_624/main/hw1/tourism.xlsx"
# Temporary file to store the downloaded Excel
temp_file <- tempfile(fileext = ".xlsx")
# Download the file
GET(url, write_disk(temp_file, overwrite = TRUE))## Response [https://raw.githubusercontent.com/Naik-Khyati/data_624/main/hw1/tourism.xlsx]
## Date: 2024-09-08 16:51
## Status: 200
## Content-Type: application/octet-stream
## Size: 679 kB
## <ON DISK> C:\Users\User\AppData\Local\Temp\RtmpY1rpao\file52d435ad742a.xlsx
# Read the Excel file from the temporary location
tourism_data <- read_excel(temp_file, sheet = "Sheet1")
# View the first few rows of the dataset
head(tourism_data)## # A tibble: 6 × 5
## Quarter Region State Purpose Trips
## <chr> <chr> <chr> <chr> <dbl>
## 1 1998-01-01 Adelaide South Australia Business 135.
## 2 1998-04-01 Adelaide South Australia Business 110.
## 3 1998-07-01 Adelaide South Australia Business 166.
## 4 1998-10-01 Adelaide South Australia Business 127.
## 5 1999-01-01 Adelaide South Australia Business 137.
## 6 1999-04-01 Adelaide South Australia Business 200.
# Convert tourism_data to tsibble
tourism_ts <- tourism_data %>%
mutate(Quarter = yearquarter(Quarter)) %>%
as_tsibble(index = Quarter, key = c(Region, State, Purpose))
head(tourism_ts)## # A tsibble: 6 x 5 [1Q]
## # Key: Region, State, Purpose [1]
## Quarter Region State Purpose Trips
## <qtr> <chr> <chr> <chr> <dbl>
## 1 1998 Q1 Adelaide South Australia Business 135.
## 2 1998 Q2 Adelaide South Australia Business 110.
## 3 1998 Q3 Adelaide South Australia Business 166.
## 4 1998 Q4 Adelaide South Australia Business 127.
## 5 1999 Q1 Adelaide South Australia Business 137.
## 6 1999 Q2 Adelaide South Australia Business 200.
reg_pur_max_on_trips <- tourism_data %>%
group_by(Region, Purpose) %>%
summarise(Trip_Avg = mean(Trips)) %>%
filter(Trip_Avg == max(Trip_Avg)) %>%
arrange(desc(Trip_Avg))## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 3
## # Groups: Region [6]
## Region Purpose Trip_Avg
## <chr> <chr> <dbl>
## 1 Sydney Visiting 747.
## 2 Melbourne Visiting 619.
## 3 North Coast NSW Holiday 588.
## 4 Gold Coast Holiday 528.
## 5 South Coast Holiday 495.
## 6 Brisbane Visiting 493.
The combination of the Sydney region and the purpose of Visiting has the highest average number of overnight trips per quarter, with 747 trips.
reg_pur_tot_trips_state <- tourism_data %>%
mutate(Quarter = as.Date(Quarter)) %>%
group_by(State) %>%
summarise(total_trips = sum(Trips)) %>%
arrange(desc(total_trips))
head(reg_pur_tot_trips_state)## # A tibble: 6 × 2
## State total_trips
## <chr> <dbl>
## 1 New South Wales 557367.
## 2 Victoria 390463.
## 3 Queensland 386643.
## 4 Western Australia 147820.
## 5 South Australia 118151.
## 6 Tasmania 54137.
New South Wales, Victoria, and Queensland have a significant lead in total trips.
Can you spot any seasonality, cyclicity and trend? What do you learn about the series? What can you say about the seasonal patterns? Can you identify any unusual years?
us_employment %>%filter(Title == "Total Private") %>% gg_season(Employed) +
ggtitle("Seasonal Decomposition")us_employment %>%
filter(Title == "Total Private") %>%
gg_subseries(Employed) +
ggtitle("Subseries Plot")us_employment %>%
filter(Title == "Total Private") %>%
ACF(Employed) %>%
autoplot() +
ggtitle("Autocorrelation Function")The US Employment dataset shows a general upward trend in Total Private employment over the years, with a notable dip around 2008 that aligns with the housing bubble crash. The data exhibits a seasonal pattern, with employment increasing in the first half of the year, decreasing afterward, and then rising again. The lag plot indicates a strong positive correlation at all lags. For a clearer seasonal decomposition, adjusting the employment numbers by the factor of population growth could be beneficial.
For the AUS Production dataset, brick production lacks a clear trend but displays strong annual seasonality with a cyclical pattern. Production notably dropped in the early 1980s. The seasonal plot shows increases in Q1 and Q3, with a decline in Q4. The lag plot reveals consistent positive season-to-season correlation.
For the Pelts data set for Hare, I notice that there isn’t a distinct trend, but it is shown a potential seasonal pattern accompanied by some cyclic behavior.There seem to be sharp fluctuations in the number of traded Hare pelts through a few year periods, with a general decrease as the decade comes to an end. The lag plot illustrates a moderate positive correlation particularly in lag 1.
The Hare pelts dataset shows no clear trend but hints at seasonality and cyclic behavior. There are sharp fluctuations in traded pelts, with a general decline towards the end of the decade. The lag plot shows moderate positive correlation, especially at lag 1.
## Plot variable not specified, automatically selected `.vars = Barrels`
## Plot variable not specified, automatically selected `y = Barrels`
## Plot variable not specified, automatically selected `y = Barrels`
## Plot variable not specified, automatically selected `y = Barrels`
## Response variable not specified, automatically selected `var = Barrels`
The Gasoline Barrels series shows a positive trend with seasonal patterns but is quite noisy, with peaks and declines at specific times of the month. The lag plot reveals a positive correlation with some overplotting. No unusual years are evident, though overplotting may obscure such trends.