Manipulating Time Series Data in R

1. What is Time Series Data?
2. Manipulating Time Series with zoo
- Temporal attributes
- zoo
3. Indexing Time Series Objects
4. Rolling and Expanding Windows
- What is a rolling window?
- Expanding windows

This report summarizes a lesson by Harrison Brown, DataCamp

1. What is Time Series Data?

What is time series data?

AirPassengers

##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432

summary(AirPassengers)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   104.0   180.0   265.5   280.3   360.5   622.0

Time series object:

work better with specialized tools
keep track of date and time
aid in smoother workflow

Plotting with autoplot

autoplot()

autoplot(zoo(maunaloa))

Temporal data classes

Use class() to check the class of an object

numeric
- integer, floating point
- August 9, 2022 = 19213
- Number of days since Jan.1, 1970
character
- String of text, names
- August 9, 2022 = "2022-08-09"
- August 9, 2022 = "August 9, 2022"
Date
- Dates, days of the year
- August 9, 2022 = "2022-08-09"
- lubridate::as_date()
- can allow us to do math
POSIXct
- Dates and times, time zones
- August 9, 2022, 4:17 p.m. (UTC-4) = "2022-08-09 20:17:00 UTC"
- as.POSIXct()
- is.POSIXct()
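
A minimal sketch of the classes listed above, using base R equivalents (as.Date() rather than lubridate::as_date()). The time zone "America/New_York" is an assumption chosen so that 4:17 p.m. local time maps to 20:17 UTC.

```r
# Date: stored as the number of days since 1970-01-01
d <- as.Date("2022-08-09")
class(d)        # "Date"
as.numeric(d)   # 19213
d + 30          # date arithmetic works: "2022-09-08"

# POSIXct: date, time, and time zone
# (the time zone is an illustrative assumption)
t_local <- as.POSIXct("2022-08-09 16:17:00", tz = "America/New_York")
class(t_local)  # "POSIXct" "POSIXt"
format(t_local, tz = "UTC", usetz = TRUE)  # "2022-08-09 20:17:00 UTC"
```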

Formatting dates

Order of time element

The order of time elements can differ by country and region.

U.S.: 12/20/2022
U.K.: 20/12/2022
Ambiguous most of the year:
- 6/4/2010: June 4th or April 6th?

ISO 8601

Time elements arranged largest-to-smallest
- Year -> month -> day -> …
- 2022-06-04
- 2022-06-04 = June 4th, 2022
Time elements separated by specific characters
- Hyphens(-) between date elements
Ensure legibility and clarity
- 2022-06-04 vs. 20220604
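
To make the ambiguity concrete, the sketch below parses the same string under both conventions with base R's as.Date() and prints each result in ISO 8601 form. The variable names are illustrative.

```r
ambiguous <- "6/4/2010"

# The same text gives two different dates depending on the assumed order
us_reading <- as.Date(ambiguous, format = "%m/%d/%Y")  # month first
uk_reading <- as.Date(ambiguous, format = "%d/%m/%Y")  # day first

format(us_reading, "%Y-%m-%d")  # "2010-06-04"
format(uk_reading, "%Y-%m-%d")  # "2010-04-06"

# An ISO 8601 string needs no format argument
iso <- as.Date("2022-06-04")
```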

Formatting dates and times

lubridate::parse_date_time()

dates_vector <- c("12/20/2022", "2022-12-21", "December 22, 2022")
dates_vector

## [1] "12/20/2022"        "2022-12-21"        "December 22, 2022"

parse_date_time(dates_vector,
                orders = c("%m/%d/%Y",
                           "%Y-%m-%d",
                           "%B %d, %Y"))

## [1] "2022-12-20 UTC" "2022-12-21 UTC" "2022-12-22 UTC"

2. Manipulating Time Series with zoo

Temporal attributes

Start point
- start()
End point
- end()
Frequency
- frequency()
\(\Delta t\): 1/frequency; the interval between observations
- deltat()
Time
- time()
Cycle
- cycle()

start(AirPassengers)

## [1] 1949    1

# Decimal date
end(ftse)

## [1] 1998.646

end(ftse) %>% 
  lubridate::date_decimal()

## [1] "1998-08-24 20:18:27 UTC"

frequency(ftse)

## [1] 260

# weekly
frequency(maunaloa)

## [1] 52.17855

# delta t
deltat(ftse)

## [1] 0.003846154
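
time() and cycle() are not demonstrated above, so here is a small sketch on the built-in AirPassengers series. It also checks that deltat() is the reciprocal of frequency(). The helper names ap_time and ap_cycle are illustrative.

```r
frequency(AirPassengers)  # 12 observations per year
deltat(AirPassengers)     # 1/12 of a year between observations

# time(): the decimal-year position of each observation
ap_time <- as.numeric(time(AirPassengers))
ap_time[1:3]   # 1949, 1949 + 1/12, 1949 + 2/12

# cycle(): the position of each observation within its year
ap_cycle <- as.numeric(cycle(AirPassengers))
ap_cycle[1:3]  # 1 2 3
```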

Regular vs. irregular time series

Regular

Evenly-spaced intervals
No missing values
Uses decimal date for ‘irregular’ intervals

Irregular

Spacing can be irregular
- weekdays, random days
Missing observations
Decimal date or Date/POSIXct data
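
One way to check regularity in practice is zoo::is.regular(). The sketch below builds two toy series (made-up values and dates) and tests them with strict = TRUE, which requires every gap between index values to be equal.

```r
library(zoo)

# Evenly spaced daily observations
regular <- zoo(1:5, order.by = as.Date("2022-01-03") + 0:4)

# Gaps of 1, 3, and 5 days
irregular <- zoo(1:4, order.by = as.Date(c("2022-01-03", "2022-01-04",
                                           "2022-01-07", "2022-01-12")))

reg_check   <- is.regular(regular, strict = TRUE)    # TRUE
irreg_check <- is.regular(irregular, strict = TRUE)  # FALSE
```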

# Save the start point of maunaloa: maunaloa_start
maunaloa_start <- start(maunaloa)

# Assign the formatted date to start_iso
start_iso <- date_decimal(maunaloa_start)

# Convert to Date class
as_date(start_iso)

## [1] "1974-05-17"

zoo

zoo(x = ..., order.by = ...)
as.zoo(): converts a ts object to zoo
index(): extracts the time index
coredata(): extracts the data values
c(): joins zoo objects
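
A short sketch of these functions on toy data (the values and dates are made up), plus a conversion of the built-in AirPassengers ts object:

```r
library(zoo)

# Build a zoo object from values and an ordering index
z <- zoo(x = c(10, 20, 30), order.by = as.Date("2022-01-01") + 0:2)

index(z)     # the Date index
coredata(z)  # the values: 10 20 30

# Convert an existing ts object
ap_zoo <- as.zoo(AirPassengers)
class(ap_zoo)
```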

Finding overlapping indices

# Determine the overlapping indexes
overlapping_index <-
  index(coffee_overlap) %in% index(coffee)

# Create a subset of the elements which do not overlap
coffee_subset <- coffee_overlap[!overlapping_index]

# Combine the coffee time series and the new subset
coffee_combined <- c(coffee, coffee_subset)

autoplot(coffee_combined)
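
The coffee data is not included in these notes, so the same steps are sketched here on two toy series (a and b are made-up stand-ins). The overlapping dates are dropped from the second series before joining with c().

```r
library(zoo)

a <- zoo(c(1, 2, 3, 4), order.by = as.Date("2022-01-01") + 0:3)
b <- zoo(c(30, 40, 50, 60), order.by = as.Date("2022-01-03") + 0:3)  # Jan 3-4 overlap a

# TRUE where an index value of b already exists in a
overlap <- index(b) %in% index(a)

# Keep only the non-overlapping part of b, then join
b_new <- b[!overlap]
combined <- c(a, b_new)
coredata(combined)  # 1 2 3 4 50 60
```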

Converting between zoo and data frame

fortify.zoo(): from zoo to data frame
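
A minimal round-trip sketch on toy data: fortify.zoo() produces a data frame with an Index column, and zoo() rebuilds the series from that data frame. The value column is selected by position here, to avoid relying on its name.

```r
library(zoo)

z <- zoo(c(10, 20, 30), order.by = as.Date("2022-01-01") + 0:2)

# zoo -> data frame: the first column is the index
df <- fortify.zoo(z)
str(df)

# data frame -> zoo: values from column 2, ordered by the Index column
z_back <- zoo(df[[2]], order.by = df$Index)
```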

3. Indexing Time Series Objects

Subsetting a window of observations

Time series windows

Windows:

Subset of time series
Inherits frequency from parent time series
Defined by start and end point

Purpose of windows:

View a specified range of data
Focus in on years/events of interest
Ignore observations at the “edge” of the data
stats::window(x = ..., start = ..., end = ...)
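
As a self-contained illustration, the sketch below takes a two-year window from the built-in AirPassengers series. For a ts object, start and end are given as c(year, period). The chosen years are arbitrary.

```r
# Two full years: January 1955 through December 1956
ap_window <- window(AirPassengers, start = c(1955, 1), end = c(1956, 12))

length(ap_window)     # 24 monthly observations
frequency(ap_window)  # 12, inherited from the parent series
start(ap_window)      # 1955 1
```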

Selecting a window from a time series

# Create a window from ftse
# ftse has a decimal-year index (see end(ftse) above), so the bounds are numeric
ftse_window <- window(ftse, start = 1995, end = 1997)

# Create an autoplot from the original ftse
autoplot(ftse) + 
  labs(y = "Price (USD)")

# Create an autoplot from ftse_window
autoplot(ftse_window) + 
  labs(y = "Price (USD)")

Logical expressions and subsets

# Logical index for 1990 through 2010
# (the index holds decimal years, so compare against numbers, not strings)
subset <- index(maunaloa) >= 1990 &
          index(maunaloa) <= 2010

# Extract the subset of maunaloa
maunaloa_subset <- maunaloa[subset]

# Autoplot the subsetted maunaloa dataset
autoplot(zoo(maunaloa_subset))
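
The same pattern works with a Date index, as long as the comparison values have the same class as the index. A small sketch on toy data (the values and dates are made up):

```r
library(zoo)

z <- zoo(1:10, order.by = as.Date("2022-01-01") + 0:9)

# Compare the index against Date values, not character strings
keep <- index(z) >= as.Date("2022-01-03") &
        index(z) <= as.Date("2022-01-06")

z_subset <- z[keep]
coredata(z_subset)  # 3 4 5 6
```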

Monthly and quarterly data

Dates and aggregated data

Aggregation:

Monthly mean
Weekly maximum
Daily median
Shows general trend and patterns in the data

Example: monthly data. Which date should represent the month?

Data for January, 2003:
- 2003-01-01?
- 2003-01-31?
- 2003-01-15?
- 2003-01?
zoo::as.yearmon
zoo::as.yearqtr

laborday_2022 <- as_date("2022-09-05")
as.yearmon(laborday_2022)

## [1] "Sep 2022"

as.yearqtr(laborday_2022)

## [1] "2022 Q3"

as.yearqtr(2018.639)

## [1] "2018 Q3"

Resampling and aggregating observations

Sampling frequency

Frequency:

Number of observations per year
e.g., weekly, daily, monthly, …

Temporal resolution:

“High resolution” sampled often
“Low resolution” sampled infrequently
“High” and “Low” are subjective

Aggregation

High resolution -> Low resolution
Applies a function like mean, sum, max to the chosen interval
e.g.:
- Monthly sum of daily data
- Weekly mean of hourly values
Cannot “reverse” aggregation
- Monthly total -> daily values? NO
Provides statistics to describe patterns in the data
Aggregation reduces information
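
A minimal sketch of "monthly sum of daily data", using zoo's aggregate() method with as.yearmon as the grouping function. The daily series is made up (one unit per day), so each monthly sum equals the number of days in that month.

```r
library(zoo)

# Toy daily series: value 1 on every day of Q1 2022
days  <- seq(as.Date("2022-01-01"), as.Date("2022-03-31"), by = "day")
daily <- zoo(rep(1, length(days)), order.by = days)

# Group by year-month and sum within each group
monthly <- aggregate(daily, by = as.yearmon, FUN = sum)
coredata(monthly)  # 31 28 31

# 90 daily values became 3 monthly values; the daily detail cannot be recovered
length(daily); length(monthly)
```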

xts

xts:

eXtensible Time Series
Extend the zoo package and zoo class of objects
apply.*(x = ..., FUN = ...) functions, e.g. apply.monthly()
endpoints(x = ..., on = ..., k = ...)
- on: “weeks”, “months”, “days”, …
- k: integer; take every k-th period of the unit set in on
period.apply()

zoo_maunaloa <- zoo(maunaloa)
index(zoo_maunaloa) <- date_decimal(index(zoo_maunaloa))

autoplot(zoo_maunaloa)

# Aggregate to the monthly max and autoplot
monthly_max <- apply.monthly(zoo_maunaloa, FUN = max)
autoplot(monthly_max)

# Create the index from every third month
three_month_index <- endpoints(x = zoo_maunaloa,
                               on = "months",
                               k = 3)
# Apply the maximum to the time series using the index
three_month_max <- period.apply(x = zoo_maunaloa,
                                INDEX = three_month_index,
                                FUN = max)
# Autoplot the three-month maximum
autoplot(three_month_max)
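
Because maunaloa is not included in these notes, here is the same endpoints() and period.apply() pattern on a toy daily xts series (values 1 to 90 over Q1 2022), where the expected results are easy to verify by hand.

```r
library(xts)

days <- seq(as.Date("2022-01-01"), as.Date("2022-03-31"), by = "day")
x <- xts(seq_along(days), order.by = days)

# Row numbers of the last observation in each month, with a leading 0
ep <- endpoints(x, on = "months")
ep  # 0 31 59 90

# Apply max() to each block between consecutive endpoints
monthly_max <- period.apply(x, INDEX = ep, FUN = max)
as.numeric(monthly_max)  # 31 59 90
```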

Imputing missing values

Imputing values with zoo

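na.* functions from zoo:

na.fill(object = ..., fill = ...): replaces missing values with the value given in fill
na.locf(): replaces missing values with the most recent observation (last observation carried forward)
na.approx(): replaces missing values by linear interpolation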
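
The three approaches can be compared side by side on a toy series with gaps (the values and dates are made up):

```r
library(zoo)

z <- zoo(c(1, NA, 3, NA, NA, 6), order.by = as.Date("2022-01-01") + 0:5)

filled  <- na.fill(z, fill = 0)  # constant value:            1 0 3 0 0 6
carried <- na.locf(z)            # last observation forward:  1 1 3 3 3 6
interp  <- na.approx(z)          # linear interpolation:      1 2 3 4 5 6
```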

4. Rolling and Expanding Windows

What is a rolling window?

Measure of how statistics change as the data moves in time

Rolling with zoo

zoo::rollmean(x = ..., k = ..., align = ..., fill = ...)
- k: size of window
- align: alignment of window; “right”, “left”, “center”
- fill: values to fill-in outside of window
zoo::rollsum()
zoo::rollmax()
zoo::rollapply(data = ..., width = ..., FUN = ..., align = ..., fill = ...): accepts user-defined functions
- data: Time series object
- width: Width of window(k)
- FUN: Summary function
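
A small sketch of a right-aligned 3-observation window on toy data. rollmean() gives the rolling mean, and rollapply() applies a user-defined function (here the range, max minus min, chosen only for illustration). With fill = NA, positions without a full window are padded.

```r
library(zoo)

z <- zoo(c(1, 3, 2, 6, 4, 5), order.by = as.Date("2022-01-01") + 0:5)

# Rolling mean over the current and two previous observations
roll_mean <- rollmean(z, k = 3, align = "right", fill = NA)
coredata(roll_mean)   # NA NA 2 3.667 4 5

# Rolling range with a user-defined function
roll_range <- rollapply(z, width = 3,
                        FUN = function(v) max(v) - min(v),
                        align = "right", fill = NA)
coredata(roll_range)  # NA NA 2 4 4 2
```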

Expanding windows

Use base::seq_along() to generate the width argument of rollapply()
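
A minimal sketch of an expanding mean on toy data: the width grows by one at each position (1, 2, 3, ...) and the window is right-aligned, so each value is the mean of everything observed so far. The comparison against cumsum() is only a sanity check.

```r
library(zoo)

x <- c(1, 3, 2, 6, 4, 5)
z <- zoo(x, order.by = as.Date("2022-01-01") + 0:5)

# One width per observation, so the window expands from the start
expanding_mean <- rollapply(z, width = seq_along(z),
                            FUN = mean, align = "right")

# Should agree with the running mean computed directly
running_mean <- cumsum(x) / seq_along(x)
```

The last value of expanding_mean equals mean(x), which is the sense in which expanding statistics approach the global summary.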

Expanding window inferences

Statistics approach global summaries
Expanding mean becomes less sensitive to change
Early values of the expanding statistic are more sensitive to change