Program_08

Author

Pratik_S_K

Program-01

Step 1: Load necessary libraries.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)

Step 2: Load the dataset.

data <- mtcars 
data$cyl <- as.factor(data$cyl)

Step 3: Compute the average MPG for each cylinder count.

summary_data <- data %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), .groups = 'drop')
print(summary_data)
# A tibble: 3 × 2
  cyl   avg_mpg
  <fct>   <dbl>
1 4        26.7
2 6        19.7
3 8        15.1
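
As a quick cross-check of the grouped summary above, the same per-cylinder means can be computed in base R with tapply(). This is only a minimal sketch; the name base_means is introduced here for illustration and is not part of the original program.

```r
# Cross-check the dplyr summary with base R.
# base_means is an illustrative name, not part of the original program.
base_means <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Round to one decimal place to compare with the tibble printed above.
print(round(base_means, 1))
```

If the two approaches agree, the grouping and averaging step is behaving as intended.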

Step 4: Visualizing the findings.

ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_bar(stat = "identity") +
  labs(title = "Average MPG by Cylinder Count",
       x = "Number of Cylinders",
       y = "Average MPG") +
  theme_minimal()
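
Because the bar heights here are values that have already been computed, geom_col() can be used in place of geom_bar(stat = "identity"). The sketch below rebuilds the summary with base R so that it runs on its own; the name p_col is illustrative only.

```r
library(ggplot2)

# Rebuild the summary so this block is self-contained (base R version).
summary_data <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
names(summary_data)[2] <- "avg_mpg"
summary_data$cyl <- as.factor(summary_data$cyl)

# geom_col() draws bars whose heights are the y values as given,
# which is what geom_bar(stat = "identity") does.
p_col <- ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_col() +
  labs(title = "Average MPG by Cylinder Count",
       x = "Number of Cylinders",
       y = "Average MPG") +
  theme_minimal()
p_col
```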

Program-02

Write an R script to create a scatter plot, incorporating categorical analysis through color-coded data points representing different groups, using ggplot2.

Step 1: Load necessary libraries.

library(ggplot2) 
library(dplyr) 

Step 2: Load the Dataset.

Explanation:

  • The iris dataset contains 150 samples of iris flowers categorized into three species: setosa, versicolor, and virginica.

  • Each sample has sepal and petal measurements.

  • head(data) displays the first few rows.

data <- iris 
head(data)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Step 3: Create a Scatter Plot.

X-Axis (Sepal.Length)

  • Represents the length of the flower’s sepal.

Y-Axis (Sepal.Width)

  • Represents the width of the flower’s sepal.

Color (Species)

  • Differentiates the three species using distinct colors.

Customization

  • geom_point(size = 3, alpha = 0.7): Increases the size of points and makes them slightly transparent.

  • labs(): Adds a title and axis labels.

  • theme_minimal(): Uses a clean background for readability.

  • theme(legend.position = "top"): Moves the legend to the top.

    ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
      geom_point(size = 3, alpha = 0.7) +  
      labs(title = "Scatter Plot of Sepal Dimensions",
           x = "Sepal Length",
           y = "Sepal Width",
           color = "Species") + 
      theme_minimal() +  
      theme(legend.position = "top")  
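
As an optional variation on the plot above, the species can be mapped to point shape as well as color, which keeps the groups distinguishable when the figure is printed in grayscale. This is a sketch under that assumption; the name p_shape is illustrative.

```r
library(ggplot2)

# Map Species to both color and shape so groups remain
# distinguishable without color.
p_shape <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                            color = Species, shape = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(title = "Scatter Plot of Sepal Dimensions",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species",
       shape = "Species") +  # same legend title so the two legends merge
  theme_minimal() +
  theme(legend.position = "top")
p_shape
```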

Program-03

Implement an R function to generate a line graph depicting the trend of a time-series dataset, with separate lines for each group, utilizing ggplot2’s group aesthetic.

Introduction.

This document demonstrates how to create a time-series line graph using the built-in AirPassengers dataset in R.

The dataset contains monthly airline passenger counts from 1949 to 1960. We will use ggplot2 to visualize trends, with separate lines for each year.

Step 1: Load necessary libraries.

library(ggplot2) 
library(dplyr) 
library(tidyr)

Step 2: Load the Built-in AirPassengers Dataset.

The AirPassengers dataset is a time series object in R.

We first convert it into a dataframe to use it with ggplot2.

  • Date: Represents the month and year (from January 1949 to December 1960).

  • Passengers: Monthly airline passenger counts.

  • Year: Extracted year from the date column, which will be used to group the data.

    # Build the monthly date sequence once and reuse it for both columns
    dates <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
    data <- data.frame(
      Date = dates,
      Passengers = as.numeric(AirPassengers),
      Year = as.factor(format(dates, "%Y"))
    )
    head(data, n = 20)
             Date Passengers Year
    1  1949-01-01        112 1949
    2  1949-02-01        118 1949
    3  1949-03-01        132 1949
    4  1949-04-01        129 1949
    5  1949-05-01        121 1949
    6  1949-06-01        135 1949
    7  1949-07-01        148 1949
    8  1949-08-01        148 1949
    9  1949-09-01        136 1949
    10 1949-10-01        119 1949
    11 1949-11-01        104 1949
    12 1949-12-01        118 1949
    13 1950-01-01        115 1950
    14 1950-02-01        126 1950
    15 1950-03-01        141 1950
    16 1950-04-01        135 1950
    17 1950-05-01        125 1950
    18 1950-06-01        149 1950
    19 1950-07-01        170 1950
    20 1950-08-01        170 1950

Step 3: Define a Function for Time-Series Line Graph.

We define a function to create a time-series line graph where:

  • The x-axis represents time (Date).

  • The y-axis represents the number of passengers (Passengers).

  • Each year has a separate line to compare trends.

Function Inputs

  1. data – The dataset containing time-series data.

  2. x_col – The column representing time (Date).

  3. y_col – The column representing values (Passengers).

  4. group_col – The categorical variable for grouping (Year).

  5. title – Custom plot title.

Features of the Line Graph

  • Group-based visualization: each year has a distinct line color, and the group aesthetic ensures that a separate line is drawn for each year.

  • geom_line() with a line width of 1.2: connects each year's observations so the trend is easy to follow.

  • geom_point(size = 2): highlights individual data points.

  • theme_minimal() and theme(legend.position = "top"): give the plot a clean layout and move the legend to the top.

    plot_time_series <- function(data, x_col, y_col, group_col, title = "Air Passenger Trends") {
      # Column names arrive as strings, so look them up with the .data pronoun
      ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]],
                       color = .data[[group_col]], group = .data[[group_col]])) +
        geom_line(linewidth = 1.2) +
        geom_point(size = 2) +
        labs(title = title,
             x = "Year",
             y = "Number of Passengers",
             color = "Year") +
        theme_minimal() +
        theme(legend.position = "top")
    }
    plot_time_series(data, "Date", "Passengers", "Year", "Trend of Airline Passengers Over Time")
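
With Date on the x-axis, each year's line occupies its own stretch of the axis. To compare years directly, one option is to put the month on the x-axis so that the yearly lines overlap. The sketch below assumes that goal; the names monthly and p_month are illustrative, and linewidth assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

# Rebuild the data with a Month column (1 to 12) so the block runs on its own.
dates <- seq(as.Date("1949-01-01"), by = "month", length.out = length(AirPassengers))
monthly <- data.frame(
  Month = as.integer(format(dates, "%m")),
  Passengers = as.numeric(AirPassengers),
  Year = as.factor(format(dates, "%Y"))
)

# One line per year, all sharing the same January-to-December axis.
p_month <- ggplot(monthly, aes(x = Month, y = Passengers, color = Year, group = Year)) +
  geom_line(linewidth = 1) +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  labs(title = "Monthly Airline Passengers by Year",
       x = "Month",
       y = "Number of Passengers",
       color = "Year") +
  theme_minimal() +
  theme(legend.position = "top")
p_month
```

In this layout the group aesthetic still does the same job: it tells geom_line() which observations belong to the same line.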

Program-04

Develop a script in R to produce a bar graph displaying the frequency distribution of categorical data in a given dataset, grouped by a specific variable, using ggplot2.

library(ggplot2)

Step 1: Load the Dataset.

We use the built-in mtcars dataset, which contains information about different car models.

data <- mtcars 
head(data)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Explanation

  • The mtcars dataset includes various car specifications.

  • We will analyze the number of cylinders (cyl) and group by the number of gears (gear).

Step 2: Convert Numeric Data to Categorical

Since cyl (number of cylinders) and gear (number of gears) are numeric, we convert them into factors.

data$cyl <- as.factor(data$cyl) 
data$gear <- as.factor(data$gear)

Why Convert to factors?

ggplot2 treats factors as categories, making it easy to group and visualize the data.
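
The effect of the conversion can be inspected directly. The sketch below works on a copy of the column so that it does not modify data; cyl_factor is an illustrative name.

```r
# In the raw dataset, cyl is stored as a number.
is.numeric(mtcars$cyl)

# After conversion, each distinct value becomes a category (level).
cyl_factor <- as.factor(mtcars$cyl)
levels(cyl_factor)
nlevels(cyl_factor)
```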

Step 3: Create a Bar Graph

We now create a bar plot to show the frequency distribution of cyl, grouped by gear.

ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = "dodge") +
  labs(title = "Frequency of Cylinders Grouped by Gear Type",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Gears") +
  theme_minimal()
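
The bar heights in this chart are row counts for each combination of cyl and gear. As a sketch of how to check them numerically, base R's table() produces the same frequencies; cyl_gear_counts is an illustrative name.

```r
# Cross-tabulate cylinders against gears; each cell is the height
# of the corresponding bar in the dodged bar chart.
cyl_gear_counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)
print(cyl_gear_counts)
```

A combination with a count of zero (for example, 8 cylinders with 4 gears) has no bar in the chart.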

Program-05

Implement an R program to create a histogram illustrating the distribution of a continuous variable, with overlays of density curves for each group, using ggplot2.

Step 1: Load required Library.

library(ggplot2)

Step 2: Explore the inbuilt Dataset.

str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Step 3: Create Histogram with Group-wise Density Curves.

Step 3.1: Initialize the ggplot with aesthetic mappings.

p <- ggplot(data = iris, aes(x = Petal.Length, fill = Species)) 
p

Step 3.2: Add Histogram Layer.

p <- p + geom_histogram(aes(y = after_stat(density)),
                        alpha = 0.4,           # Set transparency
                        position = "identity", # Overlap histograms
                        bins = 30)             # Number of bins
p

Step 3.3: Add Density Curve Layer.

p <- p + geom_density(aes(color = Species),
                      linewidth = 1.2)
p
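
The density curves line up with the histogram because the histogram is drawn on a density scale: bar heights are rescaled so that the bars of each group cover a total area of 1. The sketch below illustrates that property with base R's hist() for one species. Note that hist() chooses its own bin edges, so the bins will not match those of geom_histogram(); the names setosa, h, and area are illustrative.

```r
# Petal lengths for one species.
setosa <- iris$Petal.Length[iris$Species == "setosa"]

# Compute a histogram without plotting it.
h <- hist(setosa, breaks = 30, plot = FALSE)

# On the density scale, height times bin width summed over all bins is 1.
area <- sum(h$density * diff(h$breaks))
print(area)
```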

Step 3.4: Add Labels and Theme.

p <- p + labs( title = "Distribution of Petal Length with Group-wise Density Curves", 
               x = "Petal Length", 
               y = "Density")+ 
  theme_minimal() 
p 

Step 3.5: Display the Plot.

Program-06

Write an R script to construct a box plot showcasing the distribution of a continuous variable, grouped by a categorical variable, using ggplot2’s fill aesthetic.

Step 1: Load Required Library.

library(ggplot2)

Step 2: Explore the Inbuilt Dataset.

str(iris)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Step 3: Construct Box Plot with Grouping.

Step 3.1: Initialize ggplot with Aesthetic Mappings.

p <- ggplot(data = iris, aes(x = Species, y = Petal.Width, fill = Species))

Explanation:

  • x = Species: Grouping variable (categorical)

  • y = Petal.Width: Continuous variable to show distribution

  • fill = Species: Fill box colors by species

Step 3.2: Add Box Plot Layer.

p <- p + geom_boxplot()

Explanation:

  • geom_boxplot() creates box plots for each group

  • Automatically shows median, quartiles, and outliers
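
To see the numbers behind each box, the quartiles can be computed per species. The sketch below uses quantile() and the common 1.5 × IQR rule to count points that would be drawn as outliers; box_stats is an illustrative name, and the values are meant as a cross-check rather than an exact reproduction of ggplot2's internals.

```r
# Quartiles and outlier counts of Petal.Width for each species.
box_stats <- do.call(rbind, lapply(split(iris$Petal.Width, iris$Species), function(w) {
  q <- quantile(w, c(0.25, 0.5, 0.75))
  iqr <- q[[3]] - q[[1]]
  data.frame(
    q1 = q[[1]],
    median = q[[2]],
    q3 = q[[3]],
    # Points beyond 1.5 * IQR from the box edges
    n_outliers = sum(w < q[[1]] - 1.5 * iqr | w > q[[3]] + 1.5 * iqr)
  )
}))
print(box_stats)
```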

Step 3.3: Add Labels and Theme.

p <- p + labs(title = "Box Plot of Petal Width by Species",            
              x = "Species",      
              y = "Petal Width") +      
          theme_minimal()

Explanation:

  • labs() adds a descriptive title and axis labels

  • theme_minimal() gives a clean, modern look

Step 3.4: Display the Plot.

p

Program-07

Step 1: Load Required Library.

library(ggplot2)

Step 2: Create data for the functions.

x <- seq(-2 * pi, 2 * pi, length.out = 500)
y1 <- sin(x) 
y2 <- cos(x) 
df <- data.frame( 
  x = c(x, x), 
  y = c(y1, y2), 
  group = rep(c("sin(x)", "cos(x)"), each = length(x)) 
)

Step 3: Create a Function to Construct a Graph.

Step 3.1: Initialize the ggplot Object.

p <- ggplot(df, aes(x = x, y = y, color = group, linetype = group))

Step 3.2: Add the Line Geometry.

p <- p + geom_line(linewidth = 1.2)

Step 3.3: Add Plot Labels.

p <- p + labs(
  title = "Function Curves: sin(x) and cos(x)",
  x = "x",
  y = "f(x)",
  color = "Function",
  linetype = "Function"
)
p

Step 3.4: Apply a Neat Theme and Display the Graph.

p <- p + theme_minimal()
p
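
As an alternative to building a data frame of x and y values by hand, ggplot2 can evaluate the functions itself with geom_function(). The sketch below assumes ggplot2 3.4.0 or later (for linewidth); the name p_fun is illustrative.

```r
library(ggplot2)

# geom_function() evaluates each function over the x range set by xlim().
# A constant string inside aes() gives each curve its own legend entry.
p_fun <- ggplot() +
  xlim(-2 * pi, 2 * pi) +
  geom_function(aes(color = "sin(x)"), fun = sin, linewidth = 1.2) +
  geom_function(aes(color = "cos(x)"), fun = cos, linewidth = 1.2) +
  labs(title = "Function Curves: sin(x) and cos(x)",
       x = "x",
       y = "f(x)",
       color = "Function") +
  theme_minimal()
p_fun
```

This version keeps the plotting code short, while the data-frame approach above gives direct control over the sampled points.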