PROGRAM 3

Author

PRACHETAN MS

Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2's group aesthetic.

Introduction

This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.

  • The dataset contains monthly airline passenger counts from 1949 to 1960.
  • We will convert the time-series object into a data frame, because ggplot2 works best with data frames.
  • We will visualize passenger trends over time using ggplot2.
  • We will draw a separate line for each year using the group aesthetic, with color for easy comparison.
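Before converting anything, it can help to look at the object itself. The following is a minimal sketch that uses only base R functions to confirm that AirPassengers is a monthly ts object covering 1949 to 1960.

```r
# AirPassengers ships with base R's datasets package, so nothing needs to be installed.
class(AirPassengers)       # the object class: a "ts" time series, not a data frame
frequency(AirPassengers)   # observations per year (12 for monthly data)
start(AirPassengers)       # first period as c(year, month)
end(AirPassengers)         # last period as c(year, month)
length(AirPassengers)      # total number of monthly observations
```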

Step 1: Load necessary libraries

We load:

  • ggplot2 to create the line plot.
  • dplyr for optional data handling (filtering, summarising, etc.).
  • tidyr for optional reshaping (not strictly required here, but commonly used in tidy workflows).
library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)

Step 2: Load the built-in AirPassengers dataset and convert it to a data frame

Why convert?

AirPassengers is a time-series (ts) object, not a data frame. ggplot2 expects the data in a tabular structure, where:

  • each row is one observation.
  • each column is one variable.

So we build a data frame with:

  • Date: the sequence of monthly dates from 1949-01 to 1960-12.
  • passengers: the numeric values from the time series.
  • year: extracted from the date and used as the grouping variable (a factor).

# Create a monthly date sequence that matches the length of AirPassengers
data_seq <- seq(
  as.Date("1949-01-01"),
  by = "month",
  length.out = length(AirPassengers)
)
# Convert the time-series object into a data frame for ggplot2
data <- data.frame(
  Date = data_seq,
  passengers = as.numeric(AirPassengers),
  year = as.factor(format(data_seq, "%Y"))
)
# Display the first 20 rows
head(data, n = 20)
         Date passengers year
1  1949-01-01        112 1949
2  1949-02-01        118 1949
3  1949-03-01        132 1949
4  1949-04-01        129 1949
5  1949-05-01        121 1949
6  1949-06-01        135 1949
7  1949-07-01        148 1949
8  1949-08-01        148 1949
9  1949-09-01        136 1949
10 1949-10-01        119 1949
11 1949-11-01        104 1949
12 1949-12-01        118 1949
13 1950-01-01        115 1950
14 1950-02-01        126 1950
15 1950-03-01        141 1950
16 1950-04-01        135 1950
17 1950-05-01        125 1950
18 1950-06-01        149 1950
19 1950-07-01        170 1950
20 1950-08-01        170 1950
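As a cross-check on the conversion, the year and month can also be derived from the ts object itself and compared with the generated dates. This is a sketch only: the names date_seq, ts_year and ts_month are illustrative, and the year arithmetic assumes the series starts in January, which holds for AirPassengers.

```r
n <- length(AirPassengers)

# Dates built the same way as in Step 2 (illustrative name: date_seq)
date_seq <- seq(as.Date("1949-01-01"), by = "month", length.out = n)

# Year from the start year plus whole years elapsed.
# Assumption: the series starts in month 1, so every 12 observations is one year.
ts_year <- start(AirPassengers)[1] + (seq_len(n) - 1) %/% frequency(AirPassengers)

# Month position within each year (1 to 12), taken from the ts object
ts_month <- as.integer(cycle(AirPassengers))

# Both comparisons should be TRUE if the date sequence lines up with the series
all(ts_year == as.integer(format(date_seq, "%Y")))
all(ts_month == as.integer(format(date_seq, "%m")))
```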

Step 3: Understand the data structure we created

Before plotting, it helps to confirm:

  • The types of the columns (Date, numeric, factor)
  • The range of dates
  • How many months per year we have
str(data)
'data.frame':   144 obs. of  3 variables:
 $ Date      : Date, format: "1949-01-01" "1949-02-01" ...
 $ passengers: num  112 118 132 129 121 135 148 148 136 119 ...
 $ year      : Factor w/ 12 levels "1949","1950",..: 1 1 1 1 1 1 1 1 1 1 ...
# Check the earliest and latest dates
range(data$Date)
[1] "1949-01-01" "1960-12-01"
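# How many months per year? (Should be 12 for each year.)
# Column names are case-sensitive: the column is `year`, not `Year`.
table(data$year)

1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 
  12   12   12   12   12   12   12   12   12   12   12   12 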
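The same per-year check can be sketched with dplyr, which was loaded in Step 1 for optional data handling. The data frame is rebuilt here under the illustrative name air_df so the example runs on its own. months_per_year and yearly_summary are likewise names chosen for this sketch.

```r
library(dplyr)

# Rebuild the Step 2 data frame (illustrative name: air_df)
date_seq <- seq(as.Date("1949-01-01"), by = "month",
                length.out = length(AirPassengers))
air_df <- data.frame(
  Date = date_seq,
  passengers = as.numeric(AirPassengers),
  year = as.factor(format(date_seq, "%Y"))
)

# One row per year, with column n holding the number of monthly rows
months_per_year <- count(air_df, year)
months_per_year

# Optional summary: total and mean monthly passengers for each year
yearly_summary <- air_df %>%
  group_by(year) %>%
  summarise(
    total_passengers = sum(passengers),
    mean_passengers = mean(passengers)
  )
yearly_summary
```

A year with fewer than 12 rows in months_per_year would point to a problem in the date sequence.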

Step 4: Define a function to create the grouped time-series line graph

What the function inputs mean

  1. data: the data frame with the time-series values
  2. x_col: name of the time column (e.g., "Date")
  3. y_col: name of the numeric column (e.g., "passengers")
  4. group_col: name of the grouping column (e.g., "year")
  5. title: custom plot title
plot_time_series <- function(data,x_col,y_col,group_col,title = "Air Passengers Trends"){
  ggplot(
    data,
    aes_string(
      x = x_col,
      y = y_col,
      color = group_col,
      group = group_col
    )
  )+
    geom_line(size = 1.2)+
    geom_point(size = 2)+
    labs(
      title=title,
      x="Date",
      y="Number of Passengers",
      color = "Year"
      )+
    theme_minimal()+
    theme(legend.position="top")
}
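
Because the column names are passed as strings, a misspelled or wrongly capitalised name only fails once the plot is drawn. A small check run before plotting can make the error clearer. The helper below, check_plot_columns, is a hypothetical addition for illustration and is not part of ggplot2 or of the function above.

```r
# Hypothetical helper: stop early if a requested column is absent.
check_plot_columns <- function(data, x_col, y_col, group_col) {
  stopifnot(is.data.frame(data))
  requested <- c(x_col, y_col, group_col)
  # Names are compared exactly, so "Year" does not match a column called "year"
  missing_cols <- setdiff(requested, names(data))
  if (length(missing_cols) > 0) {
    stop("Column(s) not found in data: ",
         paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Small example data frame with the same column names as in Step 2
example_df <- data.frame(
  Date = as.Date("1949-01-01"),
  passengers = 112,
  year = factor("1949")
)

check_plot_columns(example_df, "Date", "passengers", "year")  # passes silently
```

Calling it with "Passengers" or "Year" instead would stop with a message naming the missing columns.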

Step 5: Call the function to generate the plot

Here we use the function we created.

We pass:

  • "Date" as the time variable
  • "passengers" as the values to plot
  • "year" as the grouping variable
plot_time_series(
  data,
  "Date",
  "passengers",
  "year",
  "Trend of Airline Passengers Over Time"
)
Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`.
ℹ See also `vignette("ggplot2-in-packages")` for more information.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
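
The two warnings above name their own replacements: tidy evaluation inside aes() in place of aes_string(), and linewidth in place of size for lines. The following is a sketch of how the function could be rewritten along those lines, assuming ggplot2 3.4.0 or later. The names plot_time_series_v2 and air_df are illustrative. The .data pronoun is what allows column names held in strings to be used inside aes().

```r
library(ggplot2)

# Illustrative rewrite of the plotting function without the deprecated calls
plot_time_series_v2 <- function(data, x_col, y_col, group_col,
                                title = "Air Passengers Trends") {
  ggplot(
    data,
    aes(
      x = .data[[x_col]],          # look up each column by its string name
      y = .data[[y_col]],
      color = .data[[group_col]],
      group = .data[[group_col]]   # one line per level of the grouping column
    )
  ) +
    geom_line(linewidth = 1.2) +   # linewidth replaces size for lines
    geom_point(size = 2) +         # size is still correct for points
    labs(
      title = title,
      x = "Date",
      y = "Number of Passengers",
      color = "Year"
    ) +
    theme_minimal() +
    theme(legend.position = "top")
}

# Rebuild the Step 2 data frame so the sketch runs on its own
date_seq <- seq(as.Date("1949-01-01"), by = "month",
                length.out = length(AirPassengers))
air_df <- data.frame(
  Date = date_seq,
  passengers = as.numeric(AirPassengers),
  year = as.factor(format(date_seq, "%Y"))
)

p <- plot_time_series_v2(
  air_df, "Date", "passengers", "year",
  "Trend of Airline Passengers Over Time"
)
p
```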