Data visualization is more than just aesthetics; it is a tool for storytelling. In this notebook, we utilize the power of ggplot2 to build professional-grade visualizations layer by layer.
We begin by loading the tidyverse suite and configuring professional typography using showtext.
library(tidyverse)
library(showtext)
library(sysfonts)
library(palmerpenguins) # For penguin data
library(car) # For salary data
# Adding Google Font "Poppins"
font_add_google("Poppins", "poppins")
showtext_auto()

Objective: To visualize the distribution of a continuous variable.
iris %>%
  ggplot(aes(Sepal.Length)) +
  geom_histogram(fill = "steelblue", color = "white", bins = 20) +
  theme_minimal(base_family = "poppins") +
  labs(title = "Sepal Length Distribution", x = "Length", y = "Frequency")

Explanation: geom_histogram() groups data into ‘bins’ to show frequency, helping identify skewness or normality in your data.
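To make the binning step concrete, the sketch below reproduces it by hand in base R: it cuts `iris$Sepal.Length` into equal-width intervals and counts the values in each. The helper name `bin_counts` and the choice of 5 and 20 bins are illustrative assumptions, and the break points come from `seq()`, so they will not necessarily match the bin edges that `geom_histogram()` picks.

```r
# Count how many values fall into each of `bins` equal-width intervals.
# `bin_counts` is a hypothetical helper for illustration, not part of ggplot2.
bin_counts <- function(x, bins) {
  breaks <- seq(min(x), max(x), length.out = bins + 1)
  # include.lowest = TRUE keeps the minimum value inside the first interval
  table(cut(x, breaks = breaks, include.lowest = TRUE))
}

coarse <- bin_counts(iris$Sepal.Length, bins = 5)
fine   <- bin_counts(iris$Sepal.Length, bins = 20)

print(coarse)
print(fine)
```

Comparing the two tables shows the trade-off behind the `bins` argument: fewer bins give a smoother but coarser picture, while more bins expose detail at the cost of noisier counts.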
Objective: Comparing frequencies of different categorical groups.
gss_cat %>%
  ggplot(aes(marital)) +
  geom_bar(fill = "skyblue", color = "black") +
  theme_minimal(base_family = "poppins") +
  labs(title = "Marital Status Count", x = "Status", y = "Count")

Objective: To identify the median, spread, and outliers within categories.
chickwts %>%
  ggplot(aes(weight, feed, fill = feed)) +
  geom_boxplot(alpha = 0.6) +
  theme_minimal(base_size = 15, base_family = "poppins") +
  labs(title = "Chicken Weight by Feed Type", x = "Weight", y = "Feed Type")

Explanation: The central line in each box marks the median, and the box spans the first to third quartiles. Dots beyond the whiskers are outliers: by default, values more than 1.5 times the interquartile range from the edge of the box.
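The numbers behind a boxplot can be checked directly. The sketch below uses base R's `boxplot.stats()`, which applies the same default whisker rule as `geom_boxplot()` (at most 1.5 times the interquartile range beyond the box), to list the median and any flagged outliers for each feed type in `chickwts`. One caveat: `boxplot.stats()` computes the box edges slightly differently from ggplot2, so the flagged points can differ for small groups.

```r
# Split chick weights by feed type, then summarise each group.
by_feed <- split(chickwts$weight, chickwts$feed)

summary_by_feed <- lapply(by_feed, function(w) {
  stats <- boxplot.stats(w, coef = 1.5)  # coef = 1.5 is the default whisker rule
  list(
    median   = stats$stats[3],  # third element of the five-number summary
    outliers = stats$out        # values beyond the whiskers
  )
})

# Print the median and outliers for every feed type
for (feed in names(summary_by_feed)) {
  cat(feed, "- median:", summary_by_feed[[feed]]$median,
      "- outliers:", toString(summary_by_feed[[feed]]$outliers), "\n")
}
```

A feed type with an empty outlier list simply has no points drawn beyond its whiskers.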
Objective: Observing fluctuations in data across a chronological timeline.
economics %>%
  drop_na() %>%
  ggplot(aes(date, psavert)) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_line(color = "steelblue", linewidth = 1) +
  theme_minimal(base_size = 15, base_family = "poppins") +
  labs(title = "Personal Savings Rate Over Time", x = "Year", y = "Savings Rate (%)")

Objective: To visualize correlation and trend lines between two numerical variables.
penguins %>%
  drop_na(body_mass_g, flipper_length_mm) %>%
  ggplot(aes(flipper_length_mm, body_mass_g)) +
  geom_point(aes(color = species), alpha = 0.7, size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  facet_wrap(~species) +
  theme_light(base_size = 15, base_family = "poppins") +
  labs(title = "Flipper Length vs Body Mass", subtitle = "Linear Regression by Species")

Objective: Creating a publication-ready visualization with polished axes and themes.
Salaries %>%
  filter(salary < 220000) %>%
  ggplot(aes(rank, salary, fill = sex)) +
  geom_boxplot(alpha = 0.5) +
  scale_y_continuous(labels = c("$50k", "$100k", "$150k", "$200k"),
                     breaks = c(50000, 100000, 150000, 200000)) +
  scale_x_discrete(labels = c("AsstProf" = "Assistant\nProfessor",
                              "AssocProf" = "Associate\nProfessor",
                              "Prof" = "Professor")) +
  theme_minimal(base_size = 15, base_family = "poppins") +
  theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Faculty Salary Analysis", fill = "Gender")

| Chart Type | Best Use Case | Function |
|---|---|---|
| Histogram | Distribution of a single numerical variable. | geom_histogram() |
| Bar Chart | Frequency of categorical data. | geom_bar() |
| Scatter Plot | Relationship/Correlation between two numbers. | geom_point() |
| Boxplot | Distribution summary and outlier detection. | geom_boxplot() |
| Line Chart | Change in a numerical variable over time. | geom_line() |
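
The dollar labels in the salary plot were typed by hand, which makes them easy to get out of sync with the breaks. As a minimal sketch, they can be generated from the break values instead. The helper name `dollar_k` is hypothetical; the scales package offers ready-made label functions for the same purpose, but the version below uses only base R.

```r
# Hypothetical helper: format a dollar amount in thousands, e.g. 50000 as "$50k".
dollar_k <- function(x) paste0("$", x / 1000, "k")

salary_breaks <- seq(50000, 200000, by = 50000)
salary_labels <- dollar_k(salary_breaks)
print(salary_labels)

# In the plot, the function itself can be passed as `labels`:
# scale_y_continuous(breaks = salary_breaks, labels = dollar_k)
```

Passing a function rather than a fixed vector means the labels follow the breaks automatically if the salary range changes.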