Ch. 1 - Flight Data

Review xts fundamentals

Identify the time series

Welcome to this practice course on manipulating time series data using xts & zoo! This course is designed to review the time series tools available in R. As with most components of data analysis, much of the work of time series analysis involves data cleaning and manipulation. This course builds directly on the key concepts covered in DataCamp’s introductory course on xts & zoo.

Before moving forward, let's go over some of the fundamentals of time series data.

Which of the following sets of information would be appropriate for time series data manipulation?

  • The GDP of OECD countries last year.
  • [*] Annual snowfall in Chicago between 1950 and 2000.
  • The average lifespan of different mammals.
  • The current population of each U.S. state.

Flight data

# View the structure of the flights data
str(flights)
## 'data.frame':    72 obs. of  5 variables:
##  $ total_flights : int  8912 8418 9637 9363 9360 9502 9992 10173 9417 9762 ...
##  $ delay_flights : int  1989 1918 2720 1312 1569 1955 2256 2108 1708 1897 ...
##  $ cancel_flights: int  279 785 242 58 102 157 222 138 144 131 ...
##  $ divert_flights: int  9 23 32 7 8 5 10 20 6 9 ...
##  $ date          : chr  "2010-01-01" "2010-02-01" "2010-03-01" "2010-04-01" ...
# Examine the first five rows of the flights data
head(flights, n = 5)
##   total_flights delay_flights cancel_flights divert_flights       date
## 1          8912          1989            279              9 2010-01-01
## 2          8418          1918            785             23 2010-02-01
## 3          9637          2720            242             32 2010-03-01
## 4          9363          1312             58              7 2010-04-01
## 5          9360          1569            102              8 2010-05-01
# Identify the class of the column containing date information
class(flights$date)
## [1] "character"
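
Because the date column is stored as character, R treats it as plain text rather than as dates. The following is a minimal sketch, using two of the date strings shown above, of what changes once they are converted with as.Date(). The variable names date_chr and date_val are invented for this illustration.

```r
# Two of the date strings from the flights data, stored as character
date_chr <- c("2010-01-01", "2010-02-01")
class(date_chr)

# Convert to the Date class so that R can do date arithmetic
date_val <- as.Date(date_chr)
class(date_val)

# Number of days between the two observations
diff(date_val)
```

Once the column is a Date vector, it can serve as the time index of an xts object.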

Pick out the xts object

The date column of your flights data is stored as character, so it must be converted to a time-based class before the data can be turned into an xts object. R has many different types of objects; xts objects are designed specifically for time series work.

Which of the following accurately describes the internal structure of an xts object?

  • A simple matrix.
  • A data frame in which one column is a time-based object.
  • A vector of time information.
  • [*] A matrix indexed on a time-based object.
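
One way to see the "matrix indexed on a time-based object" structure is to build a small xts object by hand and then pull the two parts apart. The sketch below reuses the first three rows of two flights columns; toy_matrix, toy_dates, and toy_xts are names invented for this illustration.

```r
library(xts)

# Core data: an ordinary numeric matrix (first three rows of two flights columns)
toy_matrix <- matrix(c(8912, 8418, 9637,
                       1989, 1918, 2720),
                     ncol = 2,
                     dimnames = list(NULL, c("total_flights", "delay_flights")))

# Index: a vector of time-based objects, one per row of the matrix
toy_dates <- as.Date(c("2010-01-01", "2010-02-01", "2010-03-01"))

# An xts object combines the two
toy_xts <- xts(toy_matrix, order.by = toy_dates)

# coredata() returns the matrix; index() returns the time index
coredata(toy_xts)
index(toy_xts)
```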

Encoding your flight data

# Load the xts package
library(xts)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# Convert date column to a time-based class
flights$date <- as.Date(flights$date)

# Convert flights to an xts object using as.xts
flights_xts <- as.xts(flights[ , -5], order.by = flights$date)

# Check the class of flights_xts
class(flights_xts)
## [1] "xts" "zoo"
# View the first five rows of flights_xts
head(flights_xts, n = 5)
##            total_flights delay_flights cancel_flights divert_flights
## 2010-01-01          8912          1989            279              9
## 2010-02-01          8418          1918            785             23
## 2010-03-01          9637          2720            242             32
## 2010-04-01          9363          1312             58              7
## 2010-05-01          9360          1569            102              8

Manipulating and visualizing your data

Exploring your flight data

# Identify the periodicity of flights_xts
periodicity(flights_xts)
## Monthly periodicity from 2010-01-01 to 2015-12-01
# Identify the number of monthly periods in flights_xts
nmonths(flights_xts)
## [1] 72
# Find data on flights arriving in BOS in June 2014
flights_xts["2014-06"]
##            total_flights delay_flights cancel_flights divert_flights
## 2014-06-01          9662          2279            141              6
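
The "2014-06" subset above relies on the ISO 8601-style date strings that xts accepts inside square brackets. As a rough sketch of the same idea on a self-contained object, the example below builds a toy monthly series (the values 1 to 12 are placeholders, not real flight counts) and selects a single month, a closed range, and an open-ended range.

```r
library(xts)

# Toy monthly series for 2014; values 1 to 12 are placeholders only
toy_dates <- seq(as.Date("2014-01-01"), by = "month", length.out = 12)
toy_xts <- xts(matrix(1:12, ncol = 1, dimnames = list(NULL, "value")),
               order.by = toy_dates)

# A single month
toy_xts["2014-06"]

# A range of months, written as "start/end"
toy_xts["2014-01/2014-03"]

# Everything from October onward
toy_xts["2014-10/"]

# periodicity() and nmonths() behave as they do on flights_xts
periodicity(toy_xts)
nmonths(toy_xts)
```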

Visualize flight data

# Use plot.xts() to view total monthly flights into BOS over time
plot.xts(flights_xts$total_flights)

# Use plot.xts() to view monthly delayed flights into BOS over time
plot.xts(flights_xts$delay_flights)

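# labels and lty are not defined anywhere in these notes; the values below are assumed
labels <- c("Total", "Delay", "Cancel", "Divert")
lty <- c(1, 2, 3, 4)

# Use plot.zoo() to view all four columns of data in their own panels
plot.zoo(flights_xts, plot.type = "multiple", ylab = labels)

# Use plot.zoo() to view all four columns of data in one panel
plot.zoo(flights_xts, plot.type = "single", lty = lty)
legend("right", lty = lty, legend = labels)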
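
The output in the saving sections that follow includes three columns (pct_delay, pct_cancel, pct_divert) that none of the code in these notes creates. Their values are consistent with dividing each count by total_flights and multiplying by 100; for example, 1989 / 8912 * 100 is about 22.318. The sketch below shows that assumed derivation on the first three rows of the flights data. It is a reconstruction from the printed numbers, not a step taken from the notes, and toy_xts is a name invented for this illustration.

```r
library(xts)

# First three rows of the flights data, as printed earlier in this chapter
toy_dates <- as.Date(c("2010-01-01", "2010-02-01", "2010-03-01"))
toy_xts <- xts(cbind(total_flights  = c(8912, 8418, 9637),
                     delay_flights  = c(1989, 1918, 2720),
                     cancel_flights = c(279, 785, 242),
                     divert_flights = c(9, 23, 32)),
               order.by = toy_dates)

# Assumed derivation: each percentage is a count divided by total flights, times 100
toy_xts$pct_delay  <- (toy_xts$delay_flights  / toy_xts$total_flights) * 100
toy_xts$pct_cancel <- (toy_xts$cancel_flights / toy_xts$total_flights) * 100
toy_xts$pct_divert <- (toy_xts$divert_flights / toy_xts$total_flights) * 100

toy_xts
```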

Saving and exporting xts objects

Saving time - I

# Save your xts object to rds file using saveRDS
saveRDS(object = flights_xts, file = "flights_xts.rds")

# Read your flights_xts data from the rds file
flights_xts2 <- readRDS("flights_xts.rds")

# Check the class of your new flights_xts2 object
class(flights_xts2)
## [1] "xts" "zoo"
# Examine the first five rows of your new flights_xts2 object
head(flights_xts2, n = 5)
##            total_flights delay_flights cancel_flights divert_flights pct_delay
## 2010-01-01          8912          1989            279              9  22.31822
## 2010-02-01          8418          1918            785             23  22.78451
## 2010-03-01          9637          2720            242             32  28.22455
## 2010-04-01          9363          1312             58              7  14.01260
## 2010-05-01          9360          1569            102              8  16.76282
##            pct_cancel pct_divert
## 2010-01-01  3.1306104 0.10098743
## 2010-02-01  9.3252554 0.27322404
## 2010-03-01  2.5111549 0.33205354
## 2010-04-01  0.6194596 0.07476236
## 2010-05-01  1.0897436 0.08547009
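
To check that an RDS round trip preserves an xts object, the restored copy can be compared with the original piece by piece. This is a minimal sketch on a toy object; it writes to a tempfile() path rather than the working directory, and toy_xts, toy_xts2, and rds_path are names invented for this illustration.

```r
library(xts)

toy_xts <- xts(cbind(total_flights = c(8912, 8418, 9637)),
               order.by = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01")))

# Write to a temporary file so nothing is left in the working directory
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy_xts, file = rds_path)
toy_xts2 <- readRDS(rds_path)

# The restored object keeps its class, its values, and its time index
class(toy_xts2)
identical(coredata(toy_xts), coredata(toy_xts2))
identical(index(toy_xts), index(toy_xts2))
```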

Saving time - II

# Export your xts object to a csv file using write.zoo
write.zoo(flights_xts, file = "flights_xts.csv", sep = ",")

# Open your saved object using read.zoo
flights2 <- read.zoo("flights_xts.csv", sep = ",", FUN = as.Date, header = TRUE, index.column = 1)

# Encode your new object back into xts
flights_xts2 <- as.xts(flights2)

# Examine the first five rows of your new flights_xts2 object
head(flights_xts2, n = 5)
##            total_flights delay_flights cancel_flights divert_flights pct_delay
## 2010-01-01          8912          1989            279              9  22.31822
## 2010-02-01          8418          1918            785             23  22.78451
## 2010-03-01          9637          2720            242             32  28.22455
## 2010-04-01          9363          1312             58              7  14.01260
## 2010-05-01          9360          1569            102              8  16.76282
##            pct_cancel pct_divert
## 2010-01-01  3.1306104 0.10098743
## 2010-02-01  9.3252554 0.27322404
## 2010-03-01  2.5111549 0.33205354
## 2010-04-01  0.6194596 0.07476236
## 2010-05-01  1.0897436 0.08547009
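
The CSV route goes through plain text, so it can help to look at what write.zoo() actually writes and to confirm that the index and values survive the trip. The sketch below does this on a toy object with a temporary file; toy_xts, toy_zoo, toy_xts2, and csv_path are names invented for this illustration.

```r
library(xts)

toy_xts <- xts(cbind(total_flights = c(8912, 8418, 9637),
                     delay_flights = c(1989, 1918, 2720)),
               order.by = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01")))

csv_path <- tempfile(fileext = ".csv")
write.zoo(toy_xts, file = csv_path, sep = ",")

# The first line is a header; the time index is written as the first column
readLines(csv_path, n = 2)

# Read the file back as zoo, then convert to xts
toy_zoo  <- read.zoo(csv_path, sep = ",", FUN = as.Date,
                     header = TRUE, index.column = 1)
toy_xts2 <- as.xts(toy_zoo)

# Compare the pieces rather than the whole objects, since a text format
# does not carry every attribute of the original
all(index(toy_xts) == index(toy_xts2))
all(coredata(toy_xts) == coredata(toy_xts2))
```

Unlike the RDS file, the CSV file can be opened by other tools, at the cost of re-encoding the index with FUN = as.Date when reading it back.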

Ch. 2 - Weather Data

Merging time series data by row

Exploring temperature data

Next steps - I

Merging using rbind()

Visualizing Boston winters

Merging time series data by column

Subsetting and adjusting periodicity

Generating a monthly average

Using merge() and plotting over time

Time series data workflow

Next steps - II

Expanding your data

Ch. 3 - Economic Data

Handling missingness

Exploring economic data

Replace missing data - I

Replace missing data - II

Estimating missing GDP

Lagging and differencing

Exploring unemployment data

Lagging unemployment

Differencing unemployment

Rolling functions

Add a discrete rolling sum to GDP data

Add a continuous rolling average to unemployment data

Manipulating MA unemployment data


Ch. 4 - Sports Data

Advanced features of xts

Encoding and plotting Red Sox data

Calculate a closing average

Calculate and plot a seasonal average

Calculate and plot a rolling average

Indexing commands in xts

Extract weekend games

Calculate a rolling average across all sports

Congratulations


About Michael Mallari

Michael is a hybrid thinker and doer—a byproduct of being a CliftonStrengths “Learner” over time. With 20+ years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands—ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.

Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate to receive his MS in Applied Analytics from Columbia University.

LinkedIn | Twitter | www.michaelmallari.com/data | www.columbia.edu/~mm5470