prog2

Author

Saru

Step 1 : Load Libraries

We load two libraries :

  • ggplot2 builds plots layer by layer (we will use it to create the scatter plot).

  • dplyr provides functions for exploring and summarizing data (we will use it to understand the categories in the dataset).

library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Step 2 : Load the dataset (iris)

We use the built-in dataset iris.

What this dataset contains :

  • Each row is one flower sample.
  • There are 150 total observations.
  • The column Species is a categorical variable with 3 groups : setosa, versicolor, virginica.
  • The columns Sepal.Length and Sepal.Width are numeric measurements that we will plot.
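The claims above can be checked directly in R. The following is a minimal sketch using only base R functions on the built-in iris data frame; the variable names (n_rows, species_levels, sepal_is_numeric) are illustrative and not part of the original walkthrough.

```r
# Number of observations (rows) in the built-in dataset
n_rows <- nrow(iris)
n_rows                      # 150

# Species is stored as a factor; its levels are the three groups
species_levels <- levels(iris$Species)
species_levels              # "setosa" "versicolor" "virginica"

# Confirm that the two columns we will plot are numeric
sepal_is_numeric <- sapply(iris[, c("Sepal.Length", "Sepal.Width")], is.numeric)
sepal_is_numeric
```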

 data <- iris

Step 3 : Preview the dataset (see the first few rows)

 head(data, 10)

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa

Step 4 : Repetition (how many times each species appears)

 table(data$Species)

    setosa versicolor  virginica 
        50         50         50 
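Since dplyr is already loaded, the same counts could also be produced with its count() function. This is a sketch of an alternative to table(), not something the original steps rely on; the name species_counts is illustrative.

```r
library(dplyr)

data <- iris

# count() groups by Species and returns a data frame
# with one row per group and the group size in column n
species_counts <- data %>% count(Species)
species_counts
```

The result is a data frame rather than a table object, which can be convenient if the counts are passed on to further dplyr steps.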

Step 5 : Create a basic scatter plot

A scatter plot shows the relationship between two numeric variables.

Here we plot :

  • x-axis : Sepal.Length
  • y-axis : Sepal.Width

Important point : Each dot represents one flower (one row in the dataset).

 ggplot(data , aes(x=Sepal.Length , y=Sepal.Width)) +
  geom_point()


Step 6 : Add categorical grouping using color = Species

Now we include the categorical variable :

  • color = Species tells ggplot2 to assign a different color to each species. The code below spells it colour, which ggplot2 treats as the same aesthetic.

ggplot(data,aes(x=Sepal.Length , y=Sepal.Width, colour = Species))+
  geom_point()
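The text writes color while the code writes colour. A small sketch, assuming only that ggplot2 is installed, shows that aes() stores both spellings under the same aesthetic name; the variables us_spelling and uk_spelling are illustrative.

```r
library(ggplot2)

# aes() standardises aesthetic names, so both spellings
# end up as the same entry in the mapping
us_spelling <- aes(x = Sepal.Length, y = Sepal.Width, color = Species)
uk_spelling <- aes(x = Sepal.Length, y = Sepal.Width, colour = Species)

names(us_spelling)
names(uk_spelling)
```

Either spelling can therefore be used in aes() and labs() without changing the plot.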


Step 7 : Improve point visibility (size and transparency)

We adjust how points look :

  • size = 3 makes each dot bigger, so it is easier to see.
  • alpha = 0.7 makes dots slightly transparent, which helps when points overlap.

Why transparency helps :

  • If many points overlap in the same region, semi-transparent dots stack into a darker color, so dense areas stand out instead of hiding behind a single dot.

ggplot(data,aes(x=Sepal.Length , y=Sepal.Width , colour = Species))+
  geom_point(size=3 , alpha=0.7)
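To see why overlap matters for this dataset, one could count how many flowers share their exact sepal measurements with at least one other flower. This is a rough sketch in base R; the names sepal and n_overlapping are illustrative.

```r
# Keep only the two columns that are plotted
sepal <- iris[, c("Sepal.Length", "Sepal.Width")]

# A row overlaps if the same (length, width) pair occurs elsewhere;
# checking from both directions marks every member of a duplicate group
is_overlapping <- duplicated(sepal) | duplicated(sepal, fromLast = TRUE)
n_overlapping <- sum(is_overlapping)
n_overlapping
```

Any value above zero means some dots sit exactly on top of each other, which is where alpha makes a visible difference.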


Step 8 : Add informative labels (title, axes, legend)

Good plots should clearly communicate what the viewer is seeing. labs() adds :

  • title : the plot heading.
  • x and y : the axis labels.
  • color : the legend title (so the legend has a meaningful name).

ggplot(data,aes(x=Sepal.Length , y=Sepal.Width , colour = Species))+
  geom_point(size=3 , alpha=0.7)+
  labs(
    title = "Scatter plot of sepal dimensions" ,
    x = "Sepal Length",
    y = "Sepal Width",
    color = "Species"
  )

Step 9 : Apply a clean theme and move the legend

Themes control the background, grid lines and text styling.

  • theme_minimal() removes heavy backgrounds and gives a clean look.
  • theme(legend.position = "top") moves the legend above the plot.

Why move the legend ?

  • When the legend is at the top, it is often easier to notice and read, especially in presentations.

ggplot(data,aes(x=Sepal.Length , y=Sepal.Width , colour = Species))+
  geom_point(size=3 , alpha=0.7)+
  labs(
    title = "Scatter plot of sepal dimensions" ,
    x = "Sepal Length",
    y = "Sepal Width",
    color = "Species"
  )+
  theme_minimal()+
  theme(legend.position = "top")
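Because ggplot2 builds plots layer by layer, the steps in this walkthrough can also be accumulated in a variable, one layer at a time. The following is a sketch of that style, assuming ggplot2 is installed; the variable name p is illustrative.

```r
library(ggplot2)

# Start with the data and the aesthetic mappings
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species))

# Add the points; each `+` ends a line so R knows the expression continues
p <- p +
  geom_point(size = 3, alpha = 0.7)

# Add the labels
p <- p +
  labs(
    title = "Scatter plot of sepal dimensions",
    x = "Sepal Length",
    y = "Sepal Width",
    colour = "Species"
  )

# Add the theme and move the legend
p <- p +
  theme_minimal() +
  theme(legend.position = "top")

# Printing the object draws the plot
p
```

If a `+` is left off the end of a line, R treats the following lines as a separate expression and the extra layers never reach the plot.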