Welcome to this practice course on manipulating time series data using xts & zoo! This course is designed to review the time series tools available in R. As with most components of data analysis, much of the work of time series analysis involves data cleaning and manipulation. This course builds directly on the key concepts covered in DataCamp’s introductory course on xts & zoo.
Before moving forward, let's go over some of the fundamentals of time series data.
Which of the following sets of information would be appropriate for time series data manipulation?
# View the structure of the flights data
str(flights)
## 'data.frame': 72 obs. of 5 variables:
## $ total_flights : int 8912 8418 9637 9363 9360 9502 9992 10173 9417 9762 ...
## $ delay_flights : int 1989 1918 2720 1312 1569 1955 2256 2108 1708 1897 ...
## $ cancel_flights: int 279 785 242 58 102 157 222 138 144 131 ...
## $ divert_flights: int 9 23 32 7 8 5 10 20 6 9 ...
## $ date : chr "2010-01-01" "2010-02-01" "2010-03-01" "2010-04-01" ...
# Examine the first five rows of the flights data
head(flights, n = 5)
## total_flights delay_flights cancel_flights divert_flights date
## 1 8912 1989 279 9 2010-01-01
## 2 8418 1918 785 23 2010-02-01
## 3 9637 2720 242 32 2010-03-01
## 4 9363 1312 58 7 2010-04-01
## 5 9360 1569 102 8 2010-05-01
# Identify the class of the column containing date information
class(flights$date)
## [1] "character"
It looks like your flights data will need some modifications before it can be turned into an xts object. As you know, there are many different types of objects in R. xts objects are specifically designed for time series functionality.
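The reason the `date` column has to change can be seen in a small sketch. It assumes the xts package is installed and rebuilds the first five rows printed above as stand-in data; `flights_head`, `chr_result` and `head_xts` are hypothetical names. `xts()` rejects a character `order.by`, but accepts the same dates once `as.Date()` has converted them to a time-based class.

```r
library(xts)

# Stand-in data: the first five rows of flights, copied from the head() output above
flights_head <- data.frame(
  total_flights  = c(8912L, 8418L, 9637L, 9363L, 9360L),
  delay_flights  = c(1989L, 1918L, 2720L, 1312L, 1569L),
  cancel_flights = c(279L, 785L, 242L, 58L, 102L),
  divert_flights = c(9L, 23L, 32L, 7L, 8L),
  date = c("2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01", "2010-05-01"),
  stringsAsFactors = FALSE
)

# A character vector is not a time-based class, so xts() stops with an error
chr_result <- tryCatch(
  xts(flights_head[, 1:4], order.by = flights_head$date),
  error = function(e) "error"
)

# After conversion to Date, the same column works as the time index
flights_head$date <- as.Date(flights_head$date)
head_xts <- xts(flights_head[, 1:4], order.by = flights_head$date)
class(head_xts)
```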
Which of the following accurately describes the internal structure of an xts object?
# Load the xts package
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
# Convert date column to a time-based class
flights$date <- as.Date(flights$date)
# Convert flights to an xts object using as.xts
flights_xts <- as.xts(flights[ , -5], order.by = flights$date)
# Check the class of flights_xts
class(flights_xts)
## [1] "xts" "zoo"
# View the first five rows of flights_xts
head(flights_xts, n = 5)
## total_flights delay_flights cancel_flights divert_flights
## 2010-01-01 8912 1989 279 9
## 2010-02-01 8418 1918 785 23
## 2010-03-01 9637 2720 242 32
## 2010-04-01 9363 1312 58 7
## 2010-05-01 9360 1569 102 8
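Internally, an xts object (which extends zoo) is a matrix of observations paired with a separate time-based index. The sketch below makes that split visible with `coredata()` and `index()`. It uses three rows copied from the output above as stand-in data; `m`, `dates`, `x`, `core` and `idx` are hypothetical names.

```r
library(xts)

# Stand-in data: three months of total and delayed flights from the output above
m <- matrix(c(8912, 8418, 9637, 1989, 1918, 2720), ncol = 2,
            dimnames = list(NULL, c("total_flights", "delay_flights")))
dates <- as.Date(c("2010-01-01", "2010-02-01", "2010-03-01"))
x <- xts(m, order.by = dates)

# The observations are stored as an ordinary matrix
core <- coredata(x)
class(core)

# The time stamps are kept separately as the index
idx <- index(x)
class(idx)
```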
# Identify the periodicity of flights_xts
periodicity(flights_xts)
## Monthly periodicity from 2010-01-01 to 2015-12-01
# Identify the number of periods in flights_xts
nmonths(flights_xts)
## [1] 72
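The same helpers can be tried on a tiny series. This sketch assumes the xts package and uses the five monthly `total_flights` values printed earlier as stand-in data (`dates`, `x` and `p` are hypothetical names). `periodicity()` estimates the spacing from the median gap between time stamps, and `nmonths()` and `nyears()` count the distinct calendar periods covered.

```r
library(xts)

# Stand-in data: five monthly totals copied from the head() output above
dates <- as.Date(c("2010-01-01", "2010-02-01", "2010-03-01",
                   "2010-04-01", "2010-05-01"))
x <- xts(c(8912, 8418, 9637, 9363, 9360), order.by = dates)

p <- periodicity(x)  # summary of the estimated sampling frequency
p$scale              # the scale label used in the printed summary
nmonths(x)           # number of distinct calendar months in the index
nyears(x)            # number of distinct calendar years in the index
```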
# Find data on flights arriving in BOS in June 2014
flights_xts["2014-06"]
## total_flights delay_flights cancel_flights divert_flights
## 2014-06-01 9662 2279 141 6
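The `"2014-06"` string is an ISO 8601 date, and xts also accepts ranges of this form joined by `/`. The sketch below uses the five stand-in months from earlier (`dates` and `x` are hypothetical names) to show a single month, a closed range, and an open-ended range.

```r
library(xts)

# Stand-in data: five monthly totals copied from the head() output above
dates <- as.Date(c("2010-01-01", "2010-02-01", "2010-03-01",
                   "2010-04-01", "2010-05-01"))
x <- xts(c(8912, 8418, 9637, 9363, 9360), order.by = dates)

x["2010-03"]          # one month, like flights_xts["2014-06"]
x["2010-02/2010-04"]  # February through April, inclusive
x["/2010-02"]         # everything up to and including February
```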
# Use plot.xts() to view total monthly flights into BOS over time
plot.xts(flights_xts$total_flights)
# Use plot.xts() to view monthly delayed flights into BOS over time
plot.xts(flights_xts$delay_flights)
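# Panel labels and line types (assumed values: labels and lty are not defined earlier in this section)
labels <- c("Total", "Delay", "Cancel", "Divert")
lty <- 1:4
# Use plot.zoo() to view all four columns of data in their own panels
plot.zoo(flights_xts, plot.type = "multiple", ylab = labels)
# Use plot.zoo() to view all four columns of data in one panel
plot.zoo(flights_xts, plot.type = "single", lty = lty)
legend("right", lty = lty, legend = labels)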
# Calculate percentage of flights delayed each month: pct_delay
flights_xts$pct_delay <- (flights_xts$delay_flights / flights_xts$total_flights) * 100
# Use plot.xts() to view pct_delay over time
plot.xts(flights_xts$pct_delay)
# Calculate percentage of flights cancelled each month: pct_cancel
flights_xts$pct_cancel <- (flights_xts$cancel_flights / flights_xts$total_flights) * 100
# Calculate percentage of flights diverted each month: pct_divert
flights_xts$pct_divert <- (flights_xts$divert_flights / flights_xts$total_flights) * 100
# Use plot.zoo() to view all three trends over time
plot.zoo(flights_xts[, c("pct_delay", "pct_cancel", "pct_divert")])
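Each new column is simply (count / total) * 100, computed month by month. A quick check on the first five months, using counts copied from the head() output above as stand-in vectors, reproduces the percentages that appear in the saved object further down. The `range()` calls also preview why the delay and diversion panels end up on very different axis scales.

```r
# Stand-in data: January-May 2010 counts copied from the head() output above
total_flights  <- c(8912, 8418, 9637, 9363, 9360)
delay_flights  <- c(1989, 1918, 2720, 1312, 1569)
cancel_flights <- c(279, 785, 242, 58, 102)
divert_flights <- c(9, 23, 32, 7, 8)

# Element-wise share of each month's flights, expressed as a percentage
pct_delay  <- delay_flights / total_flights * 100
pct_cancel <- cancel_flights / total_flights * 100
pct_divert <- divert_flights / total_flights * 100

# Delays are tens of percent, diversions are well under one percent
range(pct_delay)
range(pct_divert)
```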
Wow! You’ve already extracted quite a bit of information from your flights_xts data. Visualizing time series data, along with the values derived from them, is a critical component of any time series analysis, whether you are interested in stock returns, user retention, or opinion polls.
The course accompanies this question with a slightly cleaned version of the plot you generated in the previous exercise. Which of the following is a reasonable conclusion to draw from this plot?
Before drawing any conclusions, be sure to familiarize yourself with the different axis scales produced by plot.zoo(). For example, diverted flights are generally on a much smaller scale (0 - 0.4%) than delayed flights (0 - 30%).
# Save your xts object to rds file using saveRDS
saveRDS(object = flights_xts, file = "flights_xts.rds")
# Read your flights_xts data from the rds file
flights_xts2 <- readRDS("flights_xts.rds")
# Check the class of your new flights_xts2 object
class(flights_xts2)
## [1] "xts" "zoo"
# Examine the first five rows of your new flights_xts2 object
head(flights_xts2, n = 5)
## total_flights delay_flights cancel_flights divert_flights pct_delay
## 2010-01-01 8912 1989 279 9 22.31822
## 2010-02-01 8418 1918 785 23 22.78451
## 2010-03-01 9637 2720 242 32 28.22455
## 2010-04-01 9363 1312 58 7 14.01260
## 2010-05-01 9360 1569 102 8 16.76282
## pct_cancel pct_divert
## 2010-01-01 3.1306104 0.10098743
## 2010-02-01 9.3252554 0.27322404
## 2010-03-01 2.5111549 0.33205354
## 2010-04-01 0.6194596 0.07476236
## 2010-05-01 1.0897436 0.08547009
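Because RDS serializes the complete R object, the class, index and values come back unchanged. A minimal sketch of that round trip, assuming the xts package, writes to a temporary file instead of `flights_xts.rds` so the working directory is not touched; `x`, `rds_path` and `x_back` are hypothetical names and the values are copied from the output above.

```r
library(xts)

# Stand-in data: three months copied from the head() output above
dates <- as.Date(c("2010-01-01", "2010-02-01", "2010-03-01"))
x <- xts(cbind(total_flights = c(8912, 8418, 9637),
               delay_flights = c(1989, 1918, 2720)),
         order.by = dates)

# Save to and restore from a temporary .rds file
rds_path <- tempfile(fileext = ".rds")
saveRDS(x, file = rds_path)
x_back <- readRDS(rds_path)

# The restored object is identical, including its xts class and Date index
identical(x, x_back)
```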
# Export your xts object to a csv file using write.zoo
write.zoo(flights_xts, file = "flights_xts.csv", sep = ",")
# Open your saved object using read.zoo
flights2 <- read.zoo("flights_xts.csv", sep = ",", FUN = as.Date, header = TRUE, index.column = 1)
# Encode your new object back into xts
flights_xts2 <- as.xts(flights2)
# Examine the first five rows of your new flights_xts2 object
head(flights_xts2, n = 5)
## total_flights delay_flights cancel_flights divert_flights pct_delay
## 2010-01-01 8912 1989 279 9 22.31822
## 2010-02-01 8418 1918 785 23 22.78451
## 2010-03-01 9637 2720 242 32 28.22455
## 2010-04-01 9363 1312 58 7 14.01260
## 2010-05-01 9360 1569 102 8 16.76282
## pct_cancel pct_divert
## 2010-01-01 3.1306104 0.10098743
## 2010-02-01 9.3252554 0.27322404
## 2010-03-01 2.5111549 0.33205354
## 2010-04-01 0.6194596 0.07476236
## 2010-05-01 1.0897436 0.08547009
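A CSV file, by contrast, stores everything as text. The index has to be parsed back into a Date (the job of `FUN = as.Date`), and long decimals may be rounded when written. The sketch below, assuming the xts package and a temporary file in place of `flights_xts.csv`, checks the round trip with a tolerance rather than exact identity; `x`, `csv_path` and `x_back` are hypothetical names and the values are copied from the output above.

```r
library(xts)

# Stand-in data: three months of totals plus the derived delay percentage
dates <- as.Date(c("2010-01-01", "2010-02-01", "2010-03-01"))
x <- xts(cbind(total_flights = c(8912, 8418, 9637),
               pct_delay = c(1989, 1918, 2720) / c(8912, 8418, 9637) * 100),
         order.by = dates)

# Write to a temporary CSV, then read it back and re-encode as xts
csv_path <- tempfile(fileext = ".csv")
write.zoo(x, file = csv_path, sep = ",")
x_back <- as.xts(read.zoo(csv_path, sep = ",", header = TRUE,
                          index.column = 1, FUN = as.Date))

# The index is a Date again; values match up to text-conversion rounding
class(index(x_back))
all.equal(coredata(x), coredata(x_back), check.attributes = FALSE)
```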
Michael is a hybrid thinker and doer—a byproduct of being a CliftonStrengths “Learner” over time. With 20+ years of engineering, design, and product experience, he helps organizations identify market needs, mobilize internal and external resources, and deliver delightful digital customer experiences that align with business goals. He has been entrusted with problem-solving for brands—ranging from Fortune 500 companies to early-stage startups to not-for-profit organizations.
Michael earned his BS in Computer Science from New York Institute of Technology and his MBA from the University of Maryland, College Park. He is also a candidate to receive his MS in Applied Analytics from Columbia University.
LinkedIn | Twitter | www.michaelmallari.com/data | www.columbia.edu/~mm5470