LA-Presentation
Presentation:
📘 LA-1: Age Distribution by Region
📌 Objective
The goal of this analysis is to visualize the distribution of age across various regions using density ridgeline plots. These plots provide an intuitive way to compare distributions, highlighting differences in central tendency, spread, and skewness across regions.
🔧 Step 1: Load Necessary Libraries
# Load necessary libraries
library(ggplot2)
library(ggridges)
The ggplot2 package is used for creating static graphics, while ggridges extends ggplot2 to create ridgeline plots, which are particularly useful for visualizing the distribution of a continuous variable across different categories.
🧪 Step 2: Generate Synthetic Data
set.seed(123)
data <- data.frame(
  age = c(
    rnorm(100, 30, 5), rnorm(100, 40, 7), rnorm(100, 50, 10),
    rnorm(100, 60, 12), rnorm(100, 70, 13), rnorm(100, 80, 16),
    rnorm(100, 90, 20)
  ),
  region = rep(
    c("North", "South", "East", "West", "North-South", "South-West", "East-West"),
    each = 100
  )
)
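In this step, synthetic data is generated using the rnorm() function, which draws random numbers from a normal distribution. Each region receives 100 draws, with the mean rising from 30 to 90 and the standard deviation from 5 to 20 across the seven regions. Calling set.seed(123) first ensures reproducibility. The age variable represents the age of individuals, and the region variable denotes the geographical region.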
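A quick sanity check on the generated data can confirm that each region has 100 rows and that the sample means and standard deviations sit near the values passed to rnorm(). The sketch below uses only base R; the name region_summary is introduced here for illustration and is not part of the original analysis.

```r
# Recreate the synthetic data exactly as in Step 2
set.seed(123)
data <- data.frame(
  age = c(
    rnorm(100, 30, 5), rnorm(100, 40, 7), rnorm(100, 50, 10),
    rnorm(100, 60, 12), rnorm(100, 70, 13), rnorm(100, 80, 16),
    rnorm(100, 90, 20)
  ),
  region = rep(
    c("North", "South", "East", "West", "North-South", "South-West", "East-West"),
    each = 100
  )
)

# Per-region sample size, mean, and standard deviation (illustrative helper object)
region_summary <- aggregate(
  age ~ region,
  data = data,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)
print(region_summary)
```

Because the draws are random, the sample statistics will be close to, but not exactly equal to, the requested means and standard deviations.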
📊 Step 3: Basic Ridgeline Plot
ggplot(data, aes(x = age, y = region, fill = region)) +
  geom_density_ridges(alpha = 0.7) +
  labs(title = "Age Distribution by Region", x = "Age", y = "Region")
Picking joint bandwidth of 4.05
This code creates a basic ridgeline plot where:
x = age: The age variable is mapped to the x-axis.
y = region: The region variable is mapped to the y-axis.
fill = region: Each region is assigned a different fill color.
geom_density_ridges(alpha = 0.7): Adds density ridgelines at 70% opacity, so overlapping ridges remain visible.
The plot provides a visual representation of how age distributions vary across different regions.
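The "Picking joint bandwidth" message reports the single kernel bandwidth shared by all ridgelines. The sketch below approximates it with base R, assuming the default behavior is to average the per-group stats::bw.nrd0() bandwidths; that is my understanding of ggridges, not something the original analysis states. The names per_region_bw and joint_bw are illustrative.

```r
# Recreate the synthetic data exactly as in Step 2
set.seed(123)
data <- data.frame(
  age = c(
    rnorm(100, 30, 5), rnorm(100, 40, 7), rnorm(100, 50, 10),
    rnorm(100, 60, 12), rnorm(100, 70, 13), rnorm(100, 80, 16),
    rnorm(100, 90, 20)
  ),
  region = rep(
    c("North", "South", "East", "West", "North-South", "South-West", "East-West"),
    each = 100
  )
)

# Rule-of-thumb bandwidth for each region (assumed to mirror the ggridges default)
per_region_bw <- vapply(split(data$age, data$region), stats::bw.nrd0, numeric(1))

# One shared bandwidth: the mean of the per-region values
joint_bw <- mean(per_region_bw)
print(signif(joint_bw, 3))
```

A shared bandwidth applies the same smoothing to every ridgeline. Narrow distributions may therefore look slightly over-smoothed and wide ones slightly under-smoothed.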
🎨 Step 4: Enhanced Visualization with Minimal Theme
ggplot(data, aes(x = age, y = region, fill = region)) +
  geom_density_ridges(alpha = 0.7) +
  labs(title = "Age Distribution by Region", x = "Age", y = "Region") +
  theme_minimal()
Picking joint bandwidth of 4.05
Applying theme_minimal() removes the gray panel background and border while keeping light grid lines, focusing attention on the data. This enhances the clarity and aesthetic appeal of the plot.
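Since the fill color only repeats what the y-axis already shows, the legend is redundant and can be hidden. The sketch below is one possible refinement rather than part of the original analysis; the object name p is illustrative.

```r
library(ggplot2)
library(ggridges)

# Recreate the synthetic data exactly as in Step 2
set.seed(123)
data <- data.frame(
  age = c(
    rnorm(100, 30, 5), rnorm(100, 40, 7), rnorm(100, 50, 10),
    rnorm(100, 60, 12), rnorm(100, 70, 13), rnorm(100, 80, 16),
    rnorm(100, 90, 20)
  ),
  region = rep(
    c("North", "South", "East", "West", "North-South", "South-West", "East-West"),
    each = 100
  )
)

# Same plot as Step 4, with the redundant fill legend suppressed
p <- ggplot(data, aes(x = age, y = region, fill = region)) +
  geom_density_ridges(alpha = 0.7) +
  labs(title = "Age Distribution by Region", x = "Age", y = "Region") +
  theme_minimal() +
  theme(legend.position = "none")  # must come after theme_minimal() to take effect
print(p)
```

The theme() call has to follow theme_minimal(), because a complete theme added later would reset the legend position.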
📈 Interpretation of Results
The ridgeline plot reveals several key insights:
Central Tendency: The peak of each ridgeline indicates the mode of the age distribution for that region.
Spread: The width of the ridgeline shows the variability in age within each region.
Skewness: The asymmetry of the ridgeline can indicate skewness in the age distribution.
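For instance, “North” and “South” were generated with the lowest means (30 and 40) and therefore show the youngest and narrowest distributions, while “South-West” and “East-West” (means of 80 and 90) show the oldest and widest. Because every region is drawn from a normal distribution, any visible skewness reflects sampling noise rather than a real feature of the data.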
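The visual reading above can be cross-checked numerically. The sketch below estimates, for each region, the mode (the peak of a kernel density estimate), the standard deviation, and a simple moment-based skewness. It uses base R only; the helper summarise_region() and the object region_stats are illustrative names. Note that density() picks a separate bandwidth per region here, so the modes may differ slightly from the peaks drawn in the plot.

```r
# Recreate the synthetic data exactly as in Step 2
set.seed(123)
data <- data.frame(
  age = c(
    rnorm(100, 30, 5), rnorm(100, 40, 7), rnorm(100, 50, 10),
    rnorm(100, 60, 12), rnorm(100, 70, 13), rnorm(100, 80, 16),
    rnorm(100, 90, 20)
  ),
  region = rep(
    c("North", "South", "East", "West", "North-South", "South-West", "East-West"),
    each = 100
  )
)

# Hypothetical helper: mode from the density peak, spread, and moment skewness
summarise_region <- function(x) {
  d <- density(x)
  c(
    mode = d$x[which.max(d$y)],
    sd = sd(x),
    skewness = mean((x - mean(x))^3) / sd(x)^3
  )
}

# One row per region, sorted from youngest to oldest estimated mode
region_stats <- do.call(rbind, lapply(split(data$age, data$region), summarise_region))
print(round(region_stats[order(region_stats[, "mode"]), ], 2))
```

Skewness values near zero are expected here, since each region was drawn from a symmetric normal distribution.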
🛠️ Customization Options
The ggridges package offers various parameters to customize the appearance of ridgeline plots:
scale: Controls the height of the ridgelines relative to the spacing between them. A value of 1 means the tallest ridgeline just touches the baseline of the one above it; larger values produce overlap.
rel_min_height: Sets a minimum height relative to the highest point across all ridgelines. Parts of a ridgeline below this threshold are removed, which trims long flat tails.
alpha: Adjusts the transparency of the ridgelines.
fill: Specifies the fill color of the ridgelines.
For more advanced customization, you can refer to the Introduction to ggridges vignette.
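The parameters above can be combined in a single call. The sketch below is illustrative: the specific values (scale = 1.5, rel_min_height = 0.01, alpha = 0.6, and a fixed "steelblue" fill) are arbitrary choices, not recommendations from the original analysis. It also orders the regions by mean age with reorder(), which is an addition here; by default ggplot2 orders the character region labels alphabetically.

```r
library(ggplot2)
library(ggridges)

# Recreate the synthetic data exactly as in Step 2
set.seed(123)
data <- data.frame(
  age = c(
    rnorm(100, 30, 5), rnorm(100, 40, 7), rnorm(100, 50, 10),
    rnorm(100, 60, 12), rnorm(100, 70, 13), rnorm(100, 80, 16),
    rnorm(100, 90, 20)
  ),
  region = rep(
    c("North", "South", "East", "West", "North-South", "South-West", "East-West"),
    each = 100
  )
)

# reorder(region, age) sorts regions by their mean age (lowest at the bottom)
p_custom <- ggplot(data, aes(x = age, y = reorder(region, age))) +
  geom_density_ridges(
    scale = 1.5,            # ridgelines overlap the row above by about half a step
    rel_min_height = 0.01,  # drop tails below 1% of the overall maximum height
    alpha = 0.6,            # 60% opacity so overlapping ridges stay visible
    fill = "steelblue"      # one fixed color instead of a per-region mapping
  ) +
  labs(title = "Age Distribution by Region", x = "Age", y = "Region") +
  theme_minimal()
print(p_custom)
```

Using a single fill color removes the need for a legend, and ordering by mean age makes the upward shift in the distributions easier to follow.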