25-26 Program 3

Author

Manoj Kumar M V

Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2’s group aesthetic.


Introduction

This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.

  • The dataset contains monthly airline passenger counts from 1949 to 1960.
  • We will convert the time-series object into a dataframe (because ggplot2 works best with dataframes).
  • We will visualize passenger trends over time using ggplot2.
  • We will draw separate lines for each year using the group aesthetic (and color for easy comparison).

Step 1: Load necessary libraries

We load:

  • ggplot2 to create the line plot.
  • dplyr for optional data handling (filtering, summarising, etc.).
  • tidyr for optional reshaping (not strictly required here, but commonly used in tidy workflows).
library(ggplot2)
library(dplyr)
library(tidyr)

Step 2: Load the Built-in AirPassengers Dataset and Convert It to a Dataframe

Why Convert?

AirPassengers is a time-series (ts) object, not a dataframe.

ggplot2 expects data in a tabular structure, where:

  • each row is one observation
  • each column is one variable

So we build a dataframe with:

  • Date: a sequence of monthly dates from 1949-01 to 1960-12
  • Passengers: the numeric values from the time-series
  • Year: extracted from the date, used as the grouping variable (factor)

What This Code Is Doing (Line-by-Line)

  • seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)) creates a monthly date sequence that matches the length of the time series.
  • as.numeric(AirPassengers) converts passenger counts into a plain numeric vector.
  • format(..., "%Y") extracts the year (like “1949”, “1950”, …).
  • as.factor(...) makes Year categorical, which is useful for grouping and coloring.
# Create a monthly date sequence that matches AirPassengers length
date_seq <- seq(
  as.Date("1949-01-01"),
  by = "month",
  length.out = length(AirPassengers)
)

# Convert the time-series object into a dataframe for ggplot2
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# Display the first 20 rows
head(data, n = 20)
         Date Passengers Year
1  1949-01-01        112 1949
2  1949-02-01        118 1949
3  1949-03-01        132 1949
4  1949-04-01        129 1949
5  1949-05-01        121 1949
6  1949-06-01        135 1949
7  1949-07-01        148 1949
8  1949-08-01        148 1949
9  1949-09-01        136 1949
10 1949-10-01        119 1949
11 1949-11-01        104 1949
12 1949-12-01        118 1949
13 1950-01-01        115 1950
14 1950-02-01        126 1950
15 1950-03-01        141 1950
16 1950-04-01        135 1950
17 1950-05-01        125 1950
18 1950-06-01        149 1950
19 1950-07-01        170 1950
20 1950-08-01        170 1950
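
As an alternative sketch, the year and month can also be read from the time-series object itself instead of from a constructed date sequence. The names ap, offset, and ts_df below are illustrative and not part of the original program; start(), frequency(), and cycle() are base R functions for ts objects.

```r
# Illustrative alternative: read year and month from the ts object itself.
# `ap`, `offset`, and `ts_df` are hypothetical names used only in this sketch.
ap <- datasets::AirPassengers

# Position of each observation, counted from period 1 of the starting year
offset <- (start(ap)[2] - 1) + (seq_along(ap) - 1)

ts_df <- data.frame(
  Year       = factor(start(ap)[1] + offset %/% frequency(ap)),
  Month      = factor(month.abb[as.integer(cycle(ap))], levels = month.abb),
  Passengers = as.numeric(ap)
)

head(ts_df, 3)
```

The Year column produced this way should match the one extracted from the date sequence with format(date_seq, "%Y").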

Step 3: Understand the Data Structure We Created

Before plotting, it helps to confirm:

  • the types of columns (Date, numeric, factor)
  • the range of dates
  • how many months per year we have
str(data)
'data.frame':   144 obs. of  3 variables:
 $ Date      : Date, format: "1949-01-01" "1949-02-01" ...
 $ Passengers: num  112 118 132 129 121 135 148 148 136 119 ...
 $ Year      : Factor w/ 12 levels "1949","1950",..: 1 1 1 1 1 1 1 1 1 1 ...
# Check the earliest and latest dates
range(data$Date)
[1] "1949-01-01" "1960-12-01"
# How many months per year? (Should be 12 for each year)
table(data$Year)

1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 
  12   12   12   12   12   12   12   12   12   12   12   12 
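
The same checks can be written as assertions, so that a script stops with an error if the dataframe is not shaped as expected. This is a minimal sketch in base R; it rebuilds the dataframe from Step 2 so that it runs on its own, and the particular set of checks is a suggestion rather than part of the original program.

```r
# Rebuild the Step 2 dataframe so this block is self-contained
date_seq <- seq(as.Date("1949-01-01"), by = "month",
                length.out = length(AirPassengers))
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# stopifnot() raises an error at the first condition that is not TRUE
stopifnot(
  inherits(data$Date, "Date"),      # Date column has class Date
  is.numeric(data$Passengers),      # counts are numeric
  is.factor(data$Year),             # grouping variable is categorical
  !any(is.na(data)),                # no missing values anywhere
  all(table(data$Year) == 12),      # every year has 12 months
  all(diff(data$Date) > 0)          # dates are strictly increasing
)
```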

Step 4: Define a Function to Create the Grouped Time-Series Line Graph

Why Use a Function?

A function helps us reuse the same plotting logic for other time-series datasets later.

Instead of rewriting the plot code again and again, we write it once and call it with different inputs.

What the Function Inputs Mean

  1. data : the dataframe with time-series values
  2. x_col : name of the time column (e.g., "Date")
  3. y_col : name of the numeric column (e.g., "Passengers")
  4. group_col : name of the grouping column (e.g., "Year")
  5. title : custom plot title

The Key Idea: group aesthetic

  • group = group_col tells ggplot2:
    “connect points only within the same group, not across different groups.”

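If Year were mapped to neither group nor a discrete aesthetic such as color, ggplot2 would draw one continuous line through all 144 points. Mapping group explicitly keeps the lines separate even if the color mapping is later removed or Year is stored as a number instead of a factor.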
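
The effect of the group aesthetic can be checked without looking at the picture. The sketch below builds one plot without grouping and one with group = Year, then counts the groups ggplot2 assigned using layer_data(). The names p_single, p_grouped, n_single, and n_grouped are illustrative; the dataframe is rebuilt from Step 2 so that the block runs on its own.

```r
library(ggplot2)

# Rebuild the Step 2 dataframe so this block is self-contained
date_seq <- seq(as.Date("1949-01-01"), by = "month",
                length.out = length(AirPassengers))
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# No grouping: every point belongs to the same line
p_single <- ggplot(data, aes(x = Date, y = Passengers)) +
  geom_line()

# Grouped by Year: one line per year
p_grouped <- ggplot(data, aes(x = Date, y = Passengers, group = Year)) +
  geom_line()

# layer_data() returns the data as ggplot2 computed it, including a group column
n_single  <- length(unique(layer_data(p_single)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))

c(single = n_single, grouped = n_grouped)
```

Printing p_single and p_grouped side by side shows the visual difference: the first connects December of one year to January of the next, while the second breaks the line at each year boundary.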

What Each ggplot Layer Does

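  • aes(x = .data[[x_col]], ...) looks up each column by the name passed in as a string. It replaces aes_string(), which was deprecated in ggplot2 3.0.0.
  • geom_line(linewidth = 1.2) draws the line for each group. Line thickness is set with linewidth; using size for lines was deprecated in ggplot2 3.4.0.
  • geom_point(size = 2) adds points to highlight individual months.
  • color = .data[[group_col]] makes each group visually distinct (different colors).
  • theme_minimal() makes the plot clean and readable.
  • theme(legend.position = "top") moves the legend above the plot.
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
  ggplot(
    data,
    # .data[[name]] selects a column by its name given as a string
    aes(
      x = .data[[x_col]],
      y = .data[[y_col]],
      color = .data[[group_col]],
      group = .data[[group_col]]
    )
  ) +
    geom_line(linewidth = 1.2) +
    geom_point(size = 2) +
    labs(
      title = title,
      x = "Date",
      y = "Number of Passengers",
      color = "Year"
    ) +
    theme_minimal() +
    theme(legend.position = "top")
}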

# Build the same plot one layer at a time to see what each layer contributes
temp <- ggplot(
  data,
  aes(
    x = Date,
    y = Passengers,
    color = Year,
    group = Year
  )
)

# Add one line per year
temp <- temp + geom_line(linewidth = 1.2)
temp

# Add a point for each month
temp <- temp + geom_point(size = 2)
temp

# Add the title, axis labels, and legend title
temp <- temp + labs(
  title = "Time series data line and point graph",
  x = "Date",
  y = "Number of Passengers",
  color = "Year"
)
temp

# Switch to a minimal theme
temp <- temp + theme_minimal()
temp

# Move the legend above the plot
temp <- temp + theme(legend.position = "top")
temp

Step 5: Call the Function to Generate the Plot

Here we use the function we created.

We pass:

  • "Date" as the time variable
  • "Passengers" as the values to plot
  • "Year" as the grouping variable
plot_time_series(
  data,
  "Date",
  "Passengers",
  "Year",
  "Trend of Airline Passengers Over Time"
)
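
To illustrate the reuse mentioned in Step 4, the sketch below calls the same function on a different built-in monthly series, nottem (average monthly temperatures at Nottingham, 1920 to 1939, in degrees Fahrenheit). The choice of nottem and the names temp_dates, temps, and p are assumptions made for this example. A compact copy of the function is repeated only so that the block runs on its own.

```r
library(ggplot2)

# Compact copy of the Step 4 function so this block is self-contained
plot_time_series <- function(data, x_col, y_col, group_col,
                             title = "Air Passenger Trends") {
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]],
                   color = .data[[group_col]],
                   group = .data[[group_col]])) +
    geom_line(linewidth = 1.2) +
    geom_point(size = 2) +
    labs(title = title, x = "Date", y = "Number of Passengers",
         color = "Year") +
    theme_minimal() +
    theme(legend.position = "top")
}

# Convert nottem to a dataframe the same way as AirPassengers in Step 2
temp_dates <- seq(as.Date("1920-01-01"), by = "month",
                  length.out = length(nottem))
temps <- data.frame(
  Date = temp_dates,
  Temperature = as.numeric(nottem),
  Year = as.factor(format(temp_dates, "%Y"))
)

# The function returns a ggplot object, so labels can be overridden afterwards
p <- plot_time_series(temps, "Date", "Temperature", "Year",
                      "Monthly Temperatures at Nottingham") +
  labs(y = "Temperature (degrees F)")
p
```

Because the y-axis label is fixed inside the function, it is overridden here with an extra labs() call; passing the labels in as arguments would be another option.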


Discussion Questions

  1. Why do we need group = Year if we already use color = Year?
  2. What changes if we remove geom_point()?
  3. What does the plot look like if we group by Month instead of Year (hint: you would need to create a Month column)?
  4. Can you modify the function to allow changing line thickness and point size as extra parameters?