Write an R script to create a scatter plot, incorporating categorical analysis through color-coded data points representing different groups, using ggplot2.
Step 1: Load necessary libraries.
# Load necessary libraries
library(ggplot2)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
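The messages above mean that, once dplyr is attached, a bare call such as filter() resolves to the dplyr version rather than the one in stats. As a minimal sketch (the object name setosa_rows is only illustrative and is not part of the original script), the startup messages can be silenced and the intended function named explicitly:

```r
# Load both packages without printing the attach/masking messages
suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
})

# A namespace prefix makes it explicit which filter() is meant
setosa_rows <- dplyr::filter(iris, Species == "setosa")

# Number of rows kept by the filter
nrow(setosa_rows)
```

The masked functions are still available through their own prefix, for example stats::filter().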
Step 2: Load the Dataset.
Explanation:
The iris dataset contains 150 samples of iris flowers categorized into three species: setosa, versicolor, and virginica.
Each sample has sepal and petal measurements.
head(data) displays the first few rows.
# Load the iris dataset
data <- iris
# Display first few rows
head(data)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
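Since head() only shows setosa rows, a quick check of the whole data frame helps confirm the description of the dataset given above. The following is a small sketch using base R functions only:

```r
# Work on a copy of the built-in iris dataset, as in the step above
data <- iris

# Dimensions: number of rows (samples) and columns (variables)
dim(data)

# Column types: four numeric measurements plus the Species factor
str(data)

# Number of samples in each species
table(data$Species)
```

Because Species is stored as a factor, ggplot2 treats it as a discrete variable when it is mapped to color.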
Step 3: Create a Scatter Plot
X-Axis (Sepal.Length): Represents the length of the flower’s sepal.
Y-Axis (Sepal.Width): Represents the width of the flower’s sepal.
Color (Species): Differentiates the three species using distinct colors.
Customization
geom_point(size = 3, alpha = 0.7): Increases the size of points and makes them slightly transparent.
labs(): Adds a title and axis labels.
theme_minimal(): Uses a clean background for readability.
theme(legend.position = "top"): Moves the legend to the top.
# Create a scatter plot using ggplot2
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +   # Increase point size & transparency
  labs(title = "Scatter Plot of Sepal Dimensions",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species") +             # Legend title
  theme_minimal() +                     # Clean layout
  theme(legend.position = "top")        # Move legend to the top
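The script loads dplyr but does not use it. One possible way to add a simple per-group analysis is to compute the mean sepal dimensions for each species and overlay them on the same plot. This is only a sketch: the names species_means, mean_length, mean_width, and p are illustrative assumptions, as are the cross-shaped markers and the subtitle.

```r
library(ggplot2)
library(dplyr)

data <- iris

# One row per species: mean of each plotted variable and the group size
species_means <- data %>%
  group_by(Species) %>%
  summarise(
    mean_length = mean(Sepal.Length),
    mean_width  = mean(Sepal.Width),
    n           = n()
  )

print(species_means)

# Same scatter plot as above, with each species mean drawn as a cross.
# The second layer inherits color = Species from the main aes() call.
p <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_point(data = species_means,
             aes(x = mean_length, y = mean_width),
             shape = 4, size = 5, stroke = 2) +
  labs(title = "Scatter Plot of Sepal Dimensions",
       subtitle = "Crosses mark the mean of each species",
       x = "Sepal Length",
       y = "Sepal Width",
       color = "Species") +
  theme_minimal() +
  theme(legend.position = "top")

print(p)
```

Storing the plot in a variable also makes it possible to pass it to ggsave() if the figure needs to be written to a file.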