prg3

Author

saru

Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2’s group aesthetic.

Introduction

This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.

  • The dataset contains monthly airline passenger counts (in thousands) from 1949 to 1960.

  • We will convert the time-series object into a dataframe (because ggplot2 works best with dataframes).

  • We will visualize passenger trends over time using ggplot2.

  • We will draw separate lines for each year using the group aesthetic (and color for easy comparison).
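
The facts listed above can be read straight off the built-in object before any conversion. A minimal sketch using only base R (no extra packages assumed):

```r
# AirPassengers ships with base R (package datasets); no library() call is needed
class(AirPassengers)      # the object is a "ts", not a data.frame
start(AirPassengers)      # first period as c(year, month)
end(AirPassengers)        # last period as c(year, month)
frequency(AirPassengers)  # observations per year
length(AirPassengers)     # total number of monthly observations
```

For this dataset the series starts in January 1949, ends in December 1960, and holds 12 observations per year, which gives 144 values in total.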

Step 1: Load necessary libraries

We load:

  • ggplot2 to create the line plot.
  • dplyr for optional data handling (filtering, summarising, etc.).
  • tidyr for optional reshaping (not strictly required here, but commonly used in tidy workflows).

library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)

Step 2: Load the built-in AirPassengers dataset and convert it to a dataframe

Why convert?

AirPassengers is a time-series (ts) object, not a dataframe.

ggplot2 expects data in a tabular structure, with one row per observation and one column per variable.

So we build a dataframe with:

  • Date: a sequence of monthly dates from January 1949 to December 1960.
  • Passengers: the numeric values from the time series.
  • Year: extracted from the date and stored as a factor, used as the grouping variable.
# Create a monthly date sequence that matches the length of AirPassengers
date_seq <- seq(
  as.Date("1949-01-01"),
  by = "month",
  length.out = length(AirPassengers)
)

# Convert the time-series object into a dataframe for ggplot2
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# Display the first 20 rows
head(data, n = 20)
         Date Passengers Year
1  1949-01-01        112 1949
2  1949-02-01        118 1949
3  1949-03-01        132 1949
4  1949-04-01        129 1949
5  1949-05-01        121 1949
6  1949-06-01        135 1949
7  1949-07-01        148 1949
8  1949-08-01        148 1949
9  1949-09-01        136 1949
10 1949-10-01        119 1949
11 1949-11-01        104 1949
12 1949-12-01        118 1949
13 1950-01-01        115 1950
14 1950-02-01        126 1950
15 1950-03-01        141 1950
16 1950-04-01        135 1950
17 1950-05-01        125 1950
18 1950-06-01        149 1950
19 1950-07-01        170 1950
20 1950-08-01        170 1950
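
The start date passed to seq() above is typed in by hand. As a cross-check, the same dates can be derived from the time-series object itself with base R's time() and cycle(). This is only a sketch; the names year_part, month_part, and date_from_ts are introduced here for illustration and are not part of the original code:

```r
# Rebuild the hand-written sequence so this block runs on its own
date_seq <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))

# time() gives fractional years (1949.000, 1949.083, ...); cycle() gives the month index 1..12
# The small offset guards against floating-point values landing just below a whole year
year_part  <- as.integer(floor(as.numeric(time(AirPassengers)) + 1e-6))
month_part <- as.integer(cycle(AirPassengers))

# Illustrative vector: first day of each month, built from the ts metadata
date_from_ts <- as.Date(sprintf("%d-%02d-01", year_part, month_part))

# TRUE when the derived dates agree with the hand-written sequence
all(date_from_ts == date_seq)
```

Deriving the dates this way avoids a silent mismatch if the series were ever swapped for one with a different start.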

Step 3: Inspect the data with str(), range(), and table()

Before plotting, we check:

  • the types of the columns (Date, numeric, factor)
  • the range of dates
  • how many months we have per year
str(data)
'data.frame':   144 obs. of  3 variables:
 $ Date      : Date, format: "1949-01-01" "1949-02-01" ...
 $ Passengers: num  112 118 132 129 121 135 148 148 136 119 ...
 $ Year      : Factor w/ 12 levels "1949","1950",..: 1 1 1 1 1 1 1 1 1 1 ...
# Check the earliest and latest dates
range(data$Date)
[1] "1949-01-01" "1960-12-01"
# How many months per year?
table(data$Year)

1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 
  12   12   12   12   12   12   12   12   12   12   12   12 
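
dplyr was loaded in Step 1 for optional data handling. One way it could be used here is a per-year summary, which gives a numeric view of the year-to-year change before anything is plotted. A sketch, assuming dplyr is installed; the object name yearly and the column names Total and Average are chosen for illustration:

```r
library(dplyr)

# Rebuild the dataframe from Step 2 so this block runs on its own
date_seq <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# One row per year: total and average of the monthly passenger values
yearly <- data %>%
  group_by(Year) %>%
  summarise(
    Total = sum(Passengers),
    Average = mean(Passengers)
  )

yearly
```

Because Year is a factor with 12 levels, the result has 12 rows, one for each line the plot will draw.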

Step 4: Define a function to create the grouped time-series line graph

What the function inputs mean

  1. data: the dataframe with time-series values.
  2. x_col: name of the time column (e.g., "Date").
  3. y_col: name of the numeric column (e.g., "Passengers").
  4. group_col: name of the grouping column (e.g., "Year").
  5. title: custom plot title.
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passengers Trends") {
  # Column names arrive as strings, so look them up with the .data pronoun
  # inside aes() (aes_string() has been deprecated since ggplot2 3.0.0)
  ggplot(
    data,
    aes(
      x = .data[[x_col]],
      y = .data[[y_col]],
      color = .data[[group_col]],
      group = .data[[group_col]]
    )
  ) +
    # linewidth replaces size for lines (size is deprecated there since ggplot2 3.4.0)
    geom_line(linewidth = 1.2) +
    geom_point(size = 2) +
    labs(
      title = title,
      x = "Date",
      y = "Number of Passengers (thousands)",
      color = "Year"
    ) +
    theme_minimal() +
    theme(legend.position = "top")
}

Step 5: Function call

Here we use the function we created.

We pass:

  • "Date" as the time variable.
  • "Passengers" as the values to plot.
  • "Year" as the grouping variable.
  • "Trends of Air Passengers Over Time" as the plot title.
plot_time_series(
  data,
  "Date",
  "Passengers",
  "Year",
  "Trends of Air Passengers Over Time"
)
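
To confirm that the group aesthetic really produces one line per year, the computed layer data can be inspected with ggplot2's layer_data(). The sketch below rebuilds the data and re-declares the function with aes() and the .data pronoun so that it runs on its own; the object names p and line_data are illustrative:

```r
library(ggplot2)

# Rebuild the dataframe from Step 2
date_seq <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# Same plotting function, written with aes() and string column lookups
plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passengers Trends") {
  ggplot(
    data,
    aes(
      x = .data[[x_col]],
      y = .data[[y_col]],
      color = .data[[group_col]],
      group = .data[[group_col]]
    )
  ) +
    geom_line(linewidth = 1.2) +
    geom_point(size = 2) +
    labs(title = title, x = "Date", y = "Number of Passengers (thousands)", color = "Year") +
    theme_minimal() +
    theme(legend.position = "top")
}

# Build the plot object without printing it
p <- plot_time_series(data, "Date", "Passengers", "Year")

# layer_data() returns the data ggplot2 computed for a layer; layer 1 is geom_line()
line_data <- layer_data(p, 1)

# Number of distinct groups, i.e. the number of separate lines drawn
length(unique(line_data$group))
```

With the twelve-level Year factor used here, this should report 12 groups. Mapping group alone, without color, would still draw separate lines, but they would all share one color.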