Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2’s group aesthetic.
Introduction
This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.
The dataset contains monthly airline passenger counts from 1949 to 1960.
We will convert the time-series object into a dataframe (because ggplot2 works best with dataframes).
We will visualize passenger trends over time using ggplot2.
We will draw separate lines for each year using the group aesthetic (and color for easy comparison).
Step 1: Load Necessary Libraries
We load:
ggplot2 to create the line plot.
dplyr for optional data handling (filtering, summarising, etc.).
tidyr for optional reshaping (not strictly required here, but commonly used in tidy workflows).
library(ggplot2)
library(dplyr)
library(tidyr)
Step 2: Load the Built-in AirPassengers Dataset and Convert It to a Dataframe
Why Convert?
AirPassengers is a time-series (ts) object, not a dataframe.
ggplot2 expects data in a tabular structure, where:
each row is one observation
each column is one variable
So we build a dataframe with:
Date: a sequence of monthly dates from 1949-01 to 1960-12
Passengers: the numeric values from the time-series
Year: extracted from the date, used as the grouping variable (factor)
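The date range and length behind these columns can be read from the ts object itself. The following is a small illustrative check using base R's start(), end(), and frequency(); it is not part of the original workflow.

```r
# Inspect the metadata stored in the built-in ts object
class(AirPassengers)       # "ts"
start(AirPassengers)       # 1949 1   (January 1949)
end(AirPassengers)         # 1960 12  (December 1960)
frequency(AirPassengers)   # 12 observations per year
length(AirPassengers)      # 144 monthly values
```

These values are why the date sequence starts at 1949-01-01, advances by month, and takes its length from length(AirPassengers).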
What This Code Is Doing (Line-by-Line)
seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)) creates a monthly date sequence that matches the length of the time series.
as.numeric(AirPassengers) converts passenger counts into a plain numeric vector.
format(..., "%Y") extracts the year (like “1949”, “1950”, …).
as.factor(...) makes Year categorical, which is useful for grouping and coloring.
# Create a monthly date sequence that matches AirPassengers length
date_seq <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))

# Convert the time-series object into a dataframe for ggplot2
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# Display first few rows
head(data, n = 20)
# Assumption: the base plot was created before this point; the original call is not shown.
# aes() is used here because aes_string() is deprecated since ggplot2 3.0.0.
temp <- ggplot(data, aes(x = Date, y = Passengers, group = Year, color = Year))

# Add one line per group; linewidth replaces size for lines since ggplot2 3.4.0
temp <- temp + geom_line(linewidth = 1.2)
temp

# Add a point at each monthly observation
temp <- temp + geom_point(size = 2)
temp

# Add the title, axis labels, and legend title (a string, not the data$Year vector)
temp <- temp + labs(title = "Time series data line and point graph", x = "Date", y = "Number of Passengers", color = "Year")
temp

# Apply a minimal theme
temp <- temp + theme_minimal()
temp

# Move the legend above the plot
temp <- temp + theme(legend.position = "top")
temp
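Step 5 calls plot_time_series(), but its definition does not appear in this document. A minimal sketch, assuming the signature implied by that call (a dataframe, three column names given as strings, and a title), might look like the following. The argument names and the use of the .data pronoun in place of the deprecated aes_string() are assumptions, not the original author's code.

```r
library(ggplot2)

# Hypothetical reconstruction: argument names are assumed from the Step 5 call
plot_time_series <- function(data, x_col, y_col, group_col, title) {
  ggplot(
    data,
    aes(
      x = .data[[x_col]],
      y = .data[[y_col]],
      group = .data[[group_col]],  # one line per group
      color = .data[[group_col]]   # one color per group
    )
  ) +
    geom_line(linewidth = 1.2) +   # linewidth requires ggplot2 >= 3.4.0
    geom_point(size = 2) +
    labs(title = title, x = x_col, y = y_col, color = group_col) +
    theme_minimal() +
    theme(legend.position = "top")
}
```

Looking up columns with .data[[...]] lets the function accept column names as strings without relying on aes_string(). Axis labels default to the column names in this sketch; they could be exposed as extra arguments if different wording is wanted.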
Step 5: Call the Function to Generate the Plot
Here we use the function we created.
We pass:
"Date" as the time variable
"Passengers" as the values to plot
"Year" as the grouping variable
plot_time_series(data, "Date", "Passengers", "Year", "Trend of Airline Passengers Over Time")
Discussion Questions
Why do we need group = Year if we already use color = Year?
What changes if we remove geom_point()?
What does the plot look like if we group by Month instead of Year (hint: you would need to create a Month column)?
Can you modify the function to allow changing line thickness and point size as extra parameters?
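As a starting point for the Month question, here is a minimal sketch of deriving a Month column from the dates. It rebuilds the Step 2 dataframe so it can run on its own; the column name Month is an assumption chosen for illustration.

```r
# Rebuild the dataframe from Step 2
date_seq <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# month.abb is base R's vector of English month abbreviations ("Jan" ... "Dec").
# Fixing the factor levels keeps calendar order instead of alphabetical order.
data$Month <- factor(month.abb[as.integer(format(date_seq, "%m"))], levels = month.abb)

head(data)
```

With this column in place, "Month" can be passed as the grouping variable. Which variable belongs on the x-axis in that case is left as part of the exercise.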