# Load necessary libraries
library(ggplot2)
Program 04
Develop a script in R to produce a bar graph displaying the frequency distribution of categorical data in a given dataset, grouped by a specific variable, using ggplot2.
Step 1: Load the Dataset.
We use the built-in mtcars dataset, which contains information about different car models.
# Load dataset
data <- mtcars
# Display the first few rows
head(data)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Explanation
The mtcars dataset includes various car specifications. We will analyze the number of cylinders (cyl) and group by the number of gears (gear).
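Before converting anything, it can help to confirm how R currently stores the two columns. The following is a minimal sketch using only base R; the helper variable names (col_classes, cyl_values, gear_values) are illustrative and not part of the original program.

```r
# Inspect the storage class of the two columns of interest
col_classes <- sapply(mtcars[, c("cyl", "gear")], class)
print(col_classes)

# List the distinct values each column takes
cyl_values <- sort(unique(mtcars$cyl))
gear_values <- sort(unique(mtcars$gear))
print(cyl_values)   # 4 6 8
print(gear_values)  # 3 4 5
```

Both columns are stored as numeric even though each takes only three distinct values, which is why they are good candidates for conversion to categories.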
Step 2: Convert Numeric Data to Categorical.
Since cyl (number of cylinders) and gear (number of gears) are stored as numeric values, we convert them into factors.
# Convert cyl and gear from numeric to factor
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)
Why Convert to Factors?
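ggplot2 treats factors as discrete categories, so each level gets its own bar position and fill color. If gear stayed numeric, ggplot2 would treat it as a continuous variable and would not split the bars into one group per gear value.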
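A quick way to verify the conversion is to inspect the factor levels. This is a small sketch that repeats the conversion so it can run on its own; cyl_levels and gear_levels are illustrative names.

```r
# Repeat the conversion so this snippet is self-contained
data <- mtcars
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)

# Each distinct numeric value becomes one factor level
cyl_levels <- levels(data$cyl)
gear_levels <- levels(data$gear)
print(cyl_levels)   # "4" "6" "8"
print(gear_levels)  # "3" "4" "5"
```

The levels are character labels, which ggplot2 uses for the axis ticks and the legend entries.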
Step 3: Create a Bar Graph
We now create a bar plot to show the frequency distribution of cyl, grouped by gear.
# Create a bar graph
ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = "dodge") +
  labs(title = "Frequency of Cylinders Grouped by Gear Type",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Gears") +  # Legend title
  theme_minimal()
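The bar heights can be cross-checked against a plain contingency table. The sketch below uses base R's table() on the same two columns; the variable name counts is illustrative.

```r
# Count cars for every cylinder/gear combination
data <- mtcars
counts <- table(cyl = data$cyl, gear = data$gear)
print(counts)
#    gear
# cyl  3  4  5
#   4  1  8  2
#   6  2  4  1
#   8 12  0  2
```

Each non-zero cell corresponds to one bar in the plot. The 8-cylinder, 4-gear cell is zero, so no bar is drawn for that combination.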
Explanation of the Plot
X-Axis (cyl)
- Displays cylinder categories (4, 6, 8 cylinders).
Y-Axis (Frequency Count)
- Represents the number of cars in each category.
Color Fill (gear)
- Differentiates cars based on number of gears (3, 4, 5 gears).
Grouped Bars (position = "dodge")
- Ensures bars are side by side instead of stacked.
Minimal Theme (theme_minimal())
- Provides a clean and readable layout.
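One optional variation, not part of the original program: with position = "dodge", a cylinder group that lacks one gear value (here, 8 cylinders with 4 gears) is drawn with wider bars, because the remaining bars share the full slot. The sketch below assumes you want every bar to keep the same width and uses ggplot2's position_dodge(preserve = "single") for that; the object name p is illustrative.

```r
library(ggplot2)

# Prepare the data as in the steps above
data <- mtcars
data$cyl <- as.factor(data$cyl)
data$gear <- as.factor(data$gear)

# preserve = "single" keeps each bar at the width of a single bar,
# even when a cylinder group has fewer gear values than the others
p <- ggplot(data, aes(x = cyl, fill = gear)) +
  geom_bar(position = position_dodge(preserve = "single")) +
  labs(title = "Frequency of Cylinders Grouped by Gear Type",
       x = "Number of Cylinders",
       y = "Count",
       fill = "Gears") +
  theme_minimal()

print(p)
```

Whether equal-width bars are preferable is a presentation choice; the counts shown are the same in both versions.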