Program_3

Author

Asha

Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2’s group aesthetic.

Introduction

This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.

  • The dataset contains monthly airline passenger counts from 1949 to 1960.
  • We will convert the time-series object into a dataframe (because ggplot2 works best with dataframes).
  • We will visualize passenger trends over time using ggplot2.
  • We will draw separate lines for each year using the group aesthetic (and color for easy comparison).

Step 1: Load necessary libraries

We load:

  • ggplot2 to create the line plot.
  • dplyr for optional data handling (filtering, summarising, etc.).
  • tidyr for optional reshaping (not strictly required here, but commonly used in tidy workflows).

library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)

Step 2: Load the built-in AirPassengers dataset and convert it to a dataframe

Why Convert?

AirPassengers is a time-series (ts) object, not a dataframe. ggplot2 expects data in a tabular structure, where:

  • each row is one observation
  • each column is one variable
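
To see why the conversion is needed, it can help to inspect the object first. The short sketch below uses only base R functions on the built-in dataset and assumes nothing beyond a standard R installation.

```r
# AirPassengers ships with base R (package 'datasets'); no download is needed
class(AirPassengers)          # "ts": a time-series object
is.data.frame(AirPassengers)  # FALSE, so ggplot2 cannot use it directly
start(AirPassengers)          # first period as c(year, month)
end(AirPassengers)            # last period as c(year, month)
frequency(AirPassengers)      # observations per year (12 for monthly data)
length(AirPassengers)         # total number of monthly values
```

The values are stored as one numeric vector with time attributes attached, which is why the date and year columns have to be built explicitly.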

So we build a dataframe with:

  • Date: a sequence of monthly dates from 1949-01 to 1960-12.
  • Passengers: the numeric values from the time series.
  • Year: extracted from the date and used as the grouping variable (a factor).
# create a monthly date sequence that matches AirPassengers length
date_seq <- seq(
  as.Date("1949-01-01"),
  by="month",
  length.out=length(AirPassengers)
)

# Convert the time-series object into a dataframe for ggplot2
data <- data.frame(
  Date=date_seq,
  Passengers= as.numeric(AirPassengers),
  Year= as.factor(format(date_seq,"%Y"))
)

# Display first few rows
head(data,n=20)
         Date Passengers Year
1  1949-01-01        112 1949
2  1949-02-01        118 1949
3  1949-03-01        132 1949
4  1949-04-01        129 1949
5  1949-05-01        121 1949
6  1949-06-01        135 1949
7  1949-07-01        148 1949
8  1949-08-01        148 1949
9  1949-09-01        136 1949
10 1949-10-01        119 1949
11 1949-11-01        104 1949
12 1949-12-01        118 1949
13 1950-01-01        115 1950
14 1950-02-01        126 1950
15 1950-03-01        141 1950
16 1950-04-01        135 1950
17 1950-05-01        125 1950
18 1950-06-01        149 1950
19 1950-07-01        170 1950
20 1950-08-01        170 1950
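
The start date in the code above is typed in by hand. As a hedged alternative, the dates can be derived from the attributes of the time-series object itself. The helper name ts_to_df is hypothetical (it is not part of base R or ggplot2), and the sketch assumes a monthly series, so any other frequency is rejected.

```r
# Hypothetical helper: convert a monthly ts object to a dataframe.
# Assumption: frequency(x) == 12; the column names mirror the ones used in this document.
ts_to_df <- function(x) {
  stopifnot(is.ts(x), frequency(x) == 12)
  # Zero-based month offset of each observation, counted from January of the start year
  offset <- (seq_along(x) - 1) + (start(x)[2] - 1)
  year   <- as.integer(start(x)[1] + offset %/% 12)
  month  <- as.integer(offset %% 12 + 1)
  data.frame(
    Date       = as.Date(sprintf("%d-%02d-01", year, month)),
    Passengers = as.numeric(x),
    Year       = factor(year)
  )
}

df <- ts_to_df(AirPassengers)
head(df, n = 3)
```

Working with whole-number year and month offsets avoids relying on the fractional values returned by time(), which can be awkward to round reliably.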

Step 3: Understand the data structure we created

Before plotting, it helps to confirm:

  • the types of columns (Date, numeric, factor)
  • the range of dates
  • how many months per year we have
str(data)
'data.frame':   144 obs. of  3 variables:
 $ Date      : Date, format: "1949-01-01" "1949-02-01" ...
 $ Passengers: num  112 118 132 129 121 135 148 148 136 119 ...
 $ Year      : Factor w/ 12 levels "1949","1950",..: 1 1 1 1 1 1 1 1 1 1 ...
# Check the earliest and latest dates
range(data$Date)
[1] "1949-01-01" "1960-12-01"
# How many months per year? (Should be 12 for each year)
table(data$Year)

1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 
  12   12   12   12   12   12   12   12   12   12   12   12 
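
The same checks can also be written as assertions, so that a script stops early if the data is not in the expected shape. This is a minimal sketch in base R; it rebuilds the dataframe from Step 2 only so that the block runs on its own.

```r
# Rebuild the dataframe from Step 2 so this block is self-contained
date_seq <- seq(as.Date("1949-01-01"), by = "month",
                length.out = length(AirPassengers))
data <- data.frame(
  Date       = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year       = as.factor(format(date_seq, "%Y"))
)

# stopifnot() raises an error at the first condition that is not TRUE
stopifnot(
  inherits(data$Date, "Date"),     # Date column has class Date
  is.numeric(data$Passengers),     # values are numeric
  is.factor(data$Year),            # grouping variable is a factor
  !anyNA(data$Passengers),         # no missing passenger counts
  all(table(data$Year) == 12)      # every year has 12 months
)
```

If all conditions hold, the call returns silently and the script continues.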

Step 4: Define a function to create the grouped time-series line graph

What the function inputs mean

  1. data: the dataframe with time-series values.
  2. x_col: name of the time column (e.g., "Date").
  3. y_col: name of the numeric column (e.g., "Passengers").
  4. group_col: name of the grouping column (e.g., "Year").
  5. title: custom plot title.
plot_time_series <- function(data,x_col,y_col,group_col,title ="Air Passenger Trends") {
  ggplot(
    data,
    aes_string(
      x=x_col,
      y=y_col,
      color=group_col,
      group=group_col
    )
  )+
    geom_line(size=1.2)+
    geom_point(size=2)+
    labs(
      title=title,
      x="Date",
      y="Number of Passengers",
      color= "Year"
    )+
    theme_minimal()+
    theme(legend.position="top")
}
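
Because Year is a factor mapped to color, ggplot2 would already split the lines by year on its own. The explicit group mapping matters most when the color variable is continuous. The sketch below illustrates that difference; the column YearNum is introduced here purely for illustration and is not part of the dataframe built earlier.

```r
library(ggplot2)

# Rebuild the dataframe from Step 2 so this block is self-contained
date_seq <- seq(as.Date("1949-01-01"), by = "month",
                length.out = length(AirPassengers))
df <- data.frame(
  Date       = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year       = as.factor(format(date_seq, "%Y"))
)
df$YearNum <- as.numeric(as.character(df$Year))  # illustrative numeric copy of Year

# Continuous color and no group: ggplot2 draws one connected line
p_single <- ggplot(df, aes(x = Date, y = Passengers, color = YearNum)) +
  geom_line()

# Continuous color plus group = Year: one separate line per year
p_grouped <- ggplot(df, aes(x = Date, y = Passengers,
                            color = YearNum, group = Year)) +
  geom_line()

# layer_data() returns the data ggplot2 computed for a layer, including a group id
length(unique(layer_data(p_single, 1)$group))   # 1 group
length(unique(layer_data(p_grouped, 1)$group))  # 12 groups, one per year
```

Mapping group explicitly, as the function above does, keeps the per-year lines intact even if the color variable is later changed to something continuous.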

Step 5: Call the function to generate the plot

Here we use the function we created.

We pass:

  • "Date" as the time variable.
  • "Passengers" as the values to plot.
  • "Year" as the grouping variable.
plot_time_series(
  data,
  "Date",
  "Passengers",
  "Year",
  "Trend of Airline Passengers over time"
)
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
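
The two warnings come from arguments that newer ggplot2 releases have deprecated. One possible update, assuming ggplot2 3.4.0 or later, replaces aes_string() with aes() plus the .data pronoun and uses linewidth for the line layer. The name plot_time_series_v2 is hypothetical and is used only to keep this sketch separate from the function defined in Step 4.

```r
library(ggplot2)

# Hypothetical updated variant of plot_time_series(); assumes ggplot2 >= 3.4.0
plot_time_series_v2 <- function(data, x_col, y_col, group_col,
                                title = "Air Passenger Trends") {
  ggplot(
    data,
    aes(
      x     = .data[[x_col]],      # .data[["name"]] looks up a column by its string name
      y     = .data[[y_col]],
      color = .data[[group_col]],
      group = .data[[group_col]]
    )
  ) +
    geom_line(linewidth = 1.2) +   # linewidth replaces size for lines
    geom_point(size = 2) +         # size is still the argument for points
    labs(
      title = title,
      x     = "Date",
      y     = "Number of Passengers",
      color = "Year"
    ) +
    theme_minimal() +
    theme(legend.position = "top")
}

# Rebuild the dataframe from Step 2 so this block is self-contained
date_seq <- seq(as.Date("1949-01-01"), by = "month",
                length.out = length(AirPassengers))
data <- data.frame(
  Date       = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year       = as.factor(format(date_seq, "%Y"))
)

p <- plot_time_series_v2(data, "Date", "Passengers", "Year",
                         "Trend of Airline Passengers over time")
p  # printing the object draws the plot
```

The function arguments and the resulting plot are intended to match the original; only the deprecated calls change.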