program 3

Author

AMITH JOSE

Introduction

This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.

  • The dataset contains monthly airline passenger counts from 1949 to 1960.
  • We will convert the time-series object into a dataframe (because ggplot2 works best with dataframes).
  • We will visualize passenger trends over time using ggplot2.
  • We will draw a separate line for each year using the group aesthetic (and color for easy comparison).

Step 1: Load Necessary Libraries

We load:

  • ggplot2 to create the line plot.
  • dplyr for optional data handling (filtering, summarising, etc.).
  • tidyr for optional reshaping (not strictly required here, but commonly used in tidy workflows).
library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)

Step 2: Load the Built-in AirPassengers Dataset and Convert It to a Dataframe

Why Convert?

AirPassengers is a time-series (ts) object, not a dataframe.

ggplot2 expects data in a tabular structure, where:

  • each row is one observation
  • each column is one variable
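As a quick check of the point above, the following sketch inspects the built-in object using base R only. It assumes nothing beyond the datasets package that ships with R; the comments describe what each call reports.

```r
# AirPassengers ships with base R (package 'datasets'), so no extra packages are needed
class(AirPassengers)          # "ts": a time-series object
is.data.frame(AirPassengers)  # FALSE: not yet in the tabular form ggplot2 expects

# The time index is stored as attributes rather than as a column
start(AirPassengers)          # first observation as c(year, period)
end(AirPassengers)            # last observation as c(year, period)
frequency(AirPassengers)      # observations per year (12 for monthly data)
length(AirPassengers)         # total number of monthly values
```

Because the dates live in attributes instead of a column, there is nothing for ggplot2 to map to the x-axis until we build one ourselves.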

So we build a dataframe with:

  • Date: a sequence of monthly dates from 1949-01 to 1960-12

  • Passengers: the numeric values from the time-series

  • Year: extracted from the date, used as the grouping variable (factor)
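To illustrate how the Year column comes about, here is a minimal sketch on three hand-picked dates. The names d, year_chr and year_fct are hypothetical and used only for this illustration.

```r
# Three example dates spanning two calendar years (illustrative values only)
d <- as.Date(c("1949-01-01", "1949-12-01", "1950-01-01"))

# format() with "%Y" returns the four-digit year as a character string
year_chr <- format(d, "%Y")

# as.factor() turns the strings into a categorical variable with one level per year
year_fct <- as.factor(year_chr)

levels(year_fct)  # one level for each distinct year
```

A factor is treated as discrete by ggplot2, so each year can get its own line and its own legend entry.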

What This Code Is Doing (Line-by-Line)

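  • seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers)) creates a monthly date sequence starting in January 1949, with one date per observation.
  • as.numeric(AirPassengers) drops the time-series attributes and keeps the passenger counts as a plain numeric vector.
  • format(date_seq, "%Y") extracts the four-digit year from each date, and as.factor() turns it into a grouping variable.
  • head(data, n = 20) prints the first 20 rows as a quick check.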
# Create a monthly date sequence that matches AirPassengers length
date_seq <- seq(
  as.Date("1949-01-01"),
  by = "month",
  length.out = length(AirPassengers)
)

# Convert the time-series object into a dataframe for ggplot2
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# Display the first 20 rows
head(data, n = 20)
         Date Passengers Year
1  1949-01-01        112 1949
2  1949-02-01        118 1949
3  1949-03-01        132 1949
4  1949-04-01        129 1949
5  1949-05-01        121 1949
6  1949-06-01        135 1949
7  1949-07-01        148 1949
8  1949-08-01        148 1949
9  1949-09-01        136 1949
10 1949-10-01        119 1949
11 1949-11-01        104 1949
12 1949-12-01        118 1949
13 1950-01-01        115 1950
14 1950-02-01        126 1950
15 1950-03-01        141 1950
16 1950-04-01        135 1950
17 1950-05-01        125 1950
18 1950-06-01        149 1950
19 1950-07-01        170 1950
20 1950-08-01        170 1950

Step 3: Understand the Data Structure We Created

Before plotting, it helps to confirm:

  • the types of columns (Date, numeric, factor)
  • the range of dates
  • how many months per year we have
str(data)
'data.frame':   144 obs. of  3 variables:
 $ Date      : Date, format: "1949-01-01" "1949-02-01" ...
 $ Passengers: num  112 118 132 129 121 135 148 148 136 119 ...
 $ Year      : Factor w/ 12 levels "1949","1950",..: 1 1 1 1 1 1 1 1 1 1 ...
# Check the earliest and latest dates
range(data$Date)
[1] "1949-01-01" "1960-12-01"
# How many months per year? (Should be 12 for each year)
table(data$Year)

1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 
  12   12   12   12   12   12   12   12   12   12   12   12 
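The same per-year check can also be written with dplyr, which is where the optional data handling mentioned in Step 1 comes in. The sketch below rebuilds the dataframe from Step 2 so that it runs on its own; yearly_summary and its column names are hypothetical names chosen for this example.

```r
library(dplyr)

# Rebuild the dataframe from Step 2 so this block is self-contained
date_seq <- seq(
  as.Date("1949-01-01"),
  by = "month",
  length.out = length(AirPassengers)
)
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# One row per year: number of months plus simple passenger summaries
yearly_summary <- data %>%
  group_by(Year) %>%
  summarise(
    n_months = n(),
    mean_passengers = mean(Passengers),
    max_passengers = max(Passengers)
  )

print(yearly_summary)
```

If any year showed fewer than 12 months in n_months, the line for that year in the plot would be incomplete, so this is worth confirming before plotting.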

Step 4: Define a Function to Create the Grouped Time-Series Line Graph

What the Function Inputs Mean

  1. data : the dataframe with time-series values
  2. x_col : name of the time column (e.g., "Date")
  3. y_col : name of the numeric column (e.g., "Passengers")
  4. group_col : name of the grouping column (e.g., "Year")
  5. title : custom plot title
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
  ggplot(
    data,
    aes_string(
      x = x_col,
      y = y_col,
      color = group_col,
      group = group_col
    )
  ) +
    geom_line(size = 1.2) +
    geom_point(size = 2) +
    labs(
      title = title,
      x = "Date",
      y = "Number of Passengers",
      color = "Year"
    ) +
    theme_minimal() +
    theme(legend.position = "top")
}

Step 5: Call the Function to Generate the Plot

Here we use the function we created.

We pass:

  • "Date" as the time variable
  • "Passengers" as the values to plot
  • "Year" as the grouping variable
plot_time_series(
  data,
  "Date",
  "Passengers",
  "Year",
  "Trend of Airline Passengers Over Time"
)
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
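Both warnings point to newer idioms. One possible way to follow them is sketched below, assuming ggplot2 3.4.0 or later: aes_string() is replaced by aes() with the .data pronoun, and size in geom_line() is replaced by linewidth. The name plot_time_series_v2 is hypothetical and used only to keep this variant apart from the function above.

```r
library(ggplot2)

# Hypothetical variant of plot_time_series(); assumes ggplot2 >= 3.4.0
plot_time_series_v2 <- function(data, x_col, y_col, group_col,
                                title = "Air Passenger Trends") {
  ggplot(
    data,
    aes(
      # .data[[name]] looks up a column by its name given as a string
      x = .data[[x_col]],
      y = .data[[y_col]],
      color = .data[[group_col]],
      group = .data[[group_col]]
    )
  ) +
    geom_line(linewidth = 1.2) +  # linewidth replaces size for lines
    geom_point(size = 2) +        # size is still the right argument for points
    labs(
      title = title,
      x = "Date",
      y = "Number of Passengers",
      color = "Year"
    ) +
    theme_minimal() +
    theme(legend.position = "top")
}

# Rebuild the dataframe from Step 2 so this block is self-contained
date_seq <- seq(
  as.Date("1949-01-01"),
  by = "month",
  length.out = length(AirPassengers)
)
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

p <- plot_time_series_v2(
  data,
  "Date",
  "Passengers",
  "Year",
  "Trend of Airline Passengers Over Time"
)
p
```

The axis and legend labels are set explicitly in labs(), so switching to the .data pronoun should not change how the plot is labelled.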