Program_3

Author

Anusha

Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, using ggplot2’s group aesthetic.

INTRODUCTION

This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.

- The dataset contains monthly airline passenger counts (in thousands) from 1949 to 1960.
- We will convert the time-series object into a data frame (because ggplot2 works best with data frames).
- We will visualize passenger trends over time using ggplot2.
- We will draw separate lines for each year using the group aesthetic (and color for easy comparison).

STEP 1: Load necessary libraries

We load:

- ggplot2 to create the line plot.
- dplyr for optional data handling (filtering, summarising, etc.).
- tidyr for optional reshaping (not strictly required here, but commonly used in tidy workflows).

library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyr)

STEP 2: Load the built-in AirPassengers dataset and convert it to a data frame

Why convert?

AirPassengers is a time-series (ts) object, not a data frame.
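A quick way to confirm this is to inspect the object directly. The sketch below uses only base R helpers and the built-in dataset; it is an optional check, not part of the original program.

```r
# AirPassengers ships with R (datasets package), so nothing needs to be loaded
class(AirPassengers)          # "ts"
is.data.frame(AirPassengers)  # FALSE

# A ts object keeps its time information as attributes, not as a column
start(AirPassengers)      # 1949 1  (year, month)
end(AirPassengers)        # 1960 12
frequency(AirPassengers)  # 12 observations per year
```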

ggplot2 expects data in a tabular structure, where:

- each row is one observation
- each column is one variable

So we build a data frame with:

- Date: a sequence of monthly dates from 1949-01 to 1960-12.
- Passengers: the numeric values from the time series.
- Year: extracted from the date and used as the grouping variable (a factor).

# Create a monthly date sequence that matches the length of AirPassengers
date_seq <- seq(
  as.Date("1949-01-01"),
  by = "month",
  length.out = length(AirPassengers)
)

# Convert the time-series object into a data frame for ggplot2
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# Display the first 20 rows
head(data, n = 20)
         Date Passengers Year
1  1949-01-01        112 1949
2  1949-02-01        118 1949
3  1949-03-01        132 1949
4  1949-04-01        129 1949
5  1949-05-01        121 1949
6  1949-06-01        135 1949
7  1949-07-01        148 1949
8  1949-08-01        148 1949
9  1949-09-01        136 1949
10 1949-10-01        119 1949
11 1949-11-01        104 1949
12 1949-12-01        118 1949
13 1950-01-01        115 1950
14 1950-02-01        126 1950
15 1950-03-01        141 1950
16 1950-04-01        135 1950
17 1950-05-01        125 1950
18 1950-06-01        149 1950
19 1950-07-01        170 1950
20 1950-08-01        170 1950
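
The start date in the code above is typed in by hand. As a hedged alternative, the dates can be derived from the attributes of the time-series object itself, so the same logic would carry over to other monthly ts objects. The names `ts_obj`, `offset`, and `dates_from_ts` are illustrative and not part of the original program, and the sketch assumes a monthly series (frequency 12).

```r
ts_obj <- AirPassengers

st <- start(ts_obj)      # c(year, month) of the first observation
f  <- frequency(ts_obj)  # observations per year; assumed to be 12 here

# Zero-based month offset of each observation, counted from January of the start year
offset <- (st[2] - 1) + seq_along(ts_obj) - 1

dates_from_ts <- as.Date(sprintf(
  "%d-%02d-01",
  as.integer(st[1] + offset %/% f),  # year
  as.integer(offset %% f + 1)        # month
))

# Compare against the hand-written sequence used in this document
date_seq <- seq(as.Date("1949-01-01"), by = "month", length.out = length(ts_obj))
all(dates_from_ts == date_seq)
```

Reading the start from the object avoids a mismatch if the hard-coded date and the data ever disagree.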

STEP 3: Understand the data structure we created

Before plotting, it helps to confirm:

- the types of the columns (Date, numeric, factor)
- the range of dates
- how many months per year we have

str(data)
'data.frame':   144 obs. of  3 variables:
 $ Date      : Date, format: "1949-01-01" "1949-02-01" ...
 $ Passengers: num  112 118 132 129 121 135 148 148 136 119 ...
 $ Year      : Factor w/ 12 levels "1949","1950",..: 1 1 1 1 1 1 1 1 1 1 ...
# Check the earliest and latest dates
range(data$Date)
[1] "1949-01-01" "1960-12-01"
# How many months per year? (Should be 12 for each year)
table(data$Year)

1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 
  12   12   12   12   12   12   12   12   12   12   12   12 
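
dplyr was loaded for optional data handling, and a per-year summary is one such check. The sketch below rebuilds the data frame so that it runs on its own; the name `yearly_summary` and its column names are illustrative choices, not part of the original program.

```r
library(dplyr)

# Rebuild the data frame from Step 2 so this block is self-contained
date_seq <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
data <- data.frame(
  Date = date_seq,
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(date_seq, "%Y"))
)

# One row per year: number of months, total passengers, mean passengers
yearly_summary <- data %>%
  group_by(Year) %>%
  summarise(
    Months = n(),
    Total = sum(Passengers),
    Mean = mean(Passengers)
  )

yearly_summary
```

A steadily rising Total from year to year is the trend the line graph is expected to show.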

STEP 4: Define a function to create the grouped time-series line graph

Why Use a Function?

A function lets us reuse the same plotting logic for other time-series datasets later. Instead of rewriting the plot code again and again, we write it once and call it with different inputs.

What the Function Inputs Mean

1. data: the data frame with the time-series values.
2. x_col: name of the time column (e.g. "Date").
3. y_col: name of the numeric column (e.g. "Passengers").
4. group_col: name of the grouping column (e.g. "Year").
5. title: custom plot title.
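
Because the column names are passed as strings, a typo in one of them would only surface when the plot is built. A small check like the one below could catch that earlier. The helper `check_plot_inputs()` is hypothetical and not part of the original program; it is a minimal sketch of one way to validate the inputs.

```r
# Hypothetical helper: stop early if a requested column is missing or unsuitable
check_plot_inputs <- function(data, x_col, y_col, group_col) {
  stopifnot(is.data.frame(data))
  needed <- c(x_col, y_col, group_col)
  missing_cols <- setdiff(needed, names(data))
  if (length(missing_cols) > 0) {
    stop("Column(s) not found in data: ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(data[[y_col]])) {
    stop("Column '", y_col, "' must be numeric")
  }
  invisible(TRUE)
}

# Usage with a small made-up data frame
toy <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-02-01")),
  Passengers = c(10, 12),
  Year = factor(c("2020", "2020"))
)
check_plot_inputs(toy, "Date", "Passengers", "Year")     # passes silently
try(check_plot_inputs(toy, "Date", "Passengers", "Yr"))  # reports the missing column
```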

plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
  ggplot(
    data,
    # Column names arrive as strings, so look them up with the .data pronoun
    # (aes_string() is deprecated)
    aes(
      x = .data[[x_col]],
      y = .data[[y_col]],
      color = .data[[group_col]],
      group = .data[[group_col]]
    )
  ) +
    geom_line(linewidth = 1.2) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
    geom_point(size = 2) +
    labs(
      title = title,
      x = "Date",
      y = "Number of Passengers",
      color = "Year"
    ) +
    theme_minimal() +
    theme(legend.position = "top")
}

STEP 5: Call the function to generate the plot

Here we use the function we created.

We pass:

- "Date" as the time variable
- "Passengers" as the values to plot
- "Year" as the grouping variable

plot_time_series(
  data,
  "Date",
  "Passengers",
  "Year",
  "Trend of Airline Passengers Over Time"
)