In Base R, plotting is additive. You start with a basic command and then keep adding “layers” using additional functions.
Phase 1: Creating the Foundation (plot)
Most visualizations start with plot() or another high-level function such as hist() or barplot(). This call creates the axes and the canvas.
The simplest form: plot(x, y).
Adding context: use the main, xlab, and ylab arguments to add a title and axis labels so others can understand the graph.
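As a minimal sketch of the foundation step, the following draws a scatter plot with a title and axis labels. The built-in mtcars dataset, the variable names x and y, and the label text are illustrative assumptions rather than part of the original lesson.

```r
# Two numeric variables from the built-in mtcars dataset (illustrative choice)
x <- mtcars$wt    # car weight, in 1000 lbs
y <- mtcars$mpg   # fuel efficiency, in miles per gallon

# plot() creates the canvas and axes; main, xlab and ylab add the context
plot(x, y,
     main = "Fuel Efficiency vs Weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per Gallon",
     pch = 16, col = "steelblue")
```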
Phase 2: Adding Structural Layers (abline & grid)
Once the canvas is ready, we add layers to help interpret the data.
We use abline() to add straight lines (like regression lines) and grid() to add background reference lines.
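A short sketch of the structural layer, assuming the built-in mtcars dataset and a simple linear model fitted with lm() (both chosen here only for illustration): grid() adds background reference lines, and abline() accepts either a fitted model or a fixed h/v position.

```r
# Foundation: scatter plot of weight against fuel efficiency
plot(mtcars$wt, mtcars$mpg, pch = 16,
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")

# Structural layer 1: a light reference grid aligned with the axis ticks
grid(col = "gray85")

# Structural layer 2: regression line from a fitted linear model
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red", lwd = 2)

# Structural layer 3: horizontal dashed line at the mean MPG
abline(h = mean(mtcars$mpg), lty = 2)
```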
Phase 3: Fine-Tuning (text & legend)
This is the “decoration” phase where we make the plot self-explanatory.
Use text() to label specific data points.
Use legend() to explain colors and symbols. This is crucial for plots with multiple groups.
Pro tip for learners: the par() function. To see multiple plots at once for side-by-side comparison, use par(mfrow = c(rows, cols)).
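The decoration step might look like the sketch below. Grouping by cylinder count, the color choices, and the helper names cyl_levels, cyl_cols, and point_cols are assumptions made for this example.

```r
# Map each car to a color based on its number of cylinders (4, 6 or 8)
cyl_levels <- sort(unique(mtcars$cyl))
cyl_cols   <- c("forestgreen", "orange", "firebrick")
point_cols <- cyl_cols[match(mtcars$cyl, cyl_levels)]

plot(mtcars$wt, mtcars$mpg, pch = 16, col = point_cols,
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")

# text(): label one specific point, here the car with the highest MPG
best <- which.max(mtcars$mpg)
text(mtcars$wt[best], mtcars$mpg[best],
     labels = rownames(mtcars)[best], pos = 4, cex = 0.8)

# legend(): explain what each color means
legend("topright", legend = paste(cyl_levels, "cylinders"),
       col = cyl_cols, pch = 16, bty = "n")
```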
Phase 4: Single Variable Plots (Distribution)
These are used to understand the “shape” of your data: where most values lie and whether there are outliers.
Histogram (hist): Best for seeing the frequency of continuous data.
Density plot (density): A smoothed version of a histogram that shows the estimated probability density.
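A minimal sketch of both distribution plots, using hist() and plot(density()) on mtcars$mpg. The dataset, the number of breaks, and the colors are illustrative assumptions.

```r
# Histogram: counts of observations falling into each bin
h <- hist(mtcars$mpg, breaks = 10,
          col = "lightblue", border = "white",
          main = "Histogram of MPG", xlab = "Miles per Gallon")

# Density plot: kernel density estimate of the same variable
d <- density(mtcars$mpg)
plot(d, main = "Density of MPG", xlab = "Miles per Gallon", lwd = 2)
```

Both functions return objects (h and d here), so the bin counts or the estimated curve can be reused in later layers.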
Phase 5: Two Variable Plots (Relationships)
These plots help you see how one variable changes in relation to another.
Scatter plot (plot): The gold standard for two continuous variables.
Line plot (plot with type = "l"): Essential for time-series or ordered sequences.
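The two relationship plots can be sketched as follows. The built-in mtcars and AirPassengers datasets and the names years and passengers are assumptions chosen for illustration.

```r
# Scatter plot: two continuous variables
plot(mtcars$disp, mtcars$mpg, pch = 16,
     main = "Displacement vs MPG",
     xlab = "Displacement (cu. in.)", ylab = "Miles per Gallon")

# Line plot: an ordered (time) sequence, drawn with type = "l"
years      <- as.numeric(time(AirPassengers))  # decimal years
passengers <- as.numeric(AirPassengers)        # monthly totals, in thousands
plot(years, passengers, type = "l", lwd = 2,
     main = "Monthly Airline Passengers",
     xlab = "Year", ylab = "Passengers (thousands)")
```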
Phase 6: Multivariate Plots (Multiple Variables)
When you have more than two variables and want to see complex patterns.
Scatterplot matrix (pairs): Visualizes every pairwise relationship in a dataset at once.
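One way to draw all pairwise relationships at once is pairs(). The sketch below assumes a four-column subset of mtcars, stored under the illustrative name vars, to keep the panels readable.

```r
# Keep a handful of numeric columns so each panel stays legible
vars <- mtcars[, c("mpg", "hp", "wt", "disp")]

# pairs() draws one scatter plot for every pair of columns
pairs(vars, pch = 16, col = "steelblue",
      main = "Pairwise Relationships in mtcars")
```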
Phase 7: Special & Proportional Plots
Used for specific statistical needs or part-to-whole relationships.
Pie chart (pie): Used to show proportions (use sparingly!).
Strip chart (stripchart): A 1D scatter plot, useful for small datasets.
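A brief sketch of both plots using pie() and stripchart(). Counting cars by cylinder and the name cyl_counts are assumptions for this example.

```r
# Pie chart: share of cars in each cylinder group
cyl_counts <- table(mtcars$cyl)
pie(cyl_counts,
    labels = paste0(names(cyl_counts), " cyl (", cyl_counts, ")"),
    col = c("forestgreen", "orange", "firebrick"),
    main = "Cars by Number of Cylinders")

# Strip chart: every observation drawn as a point, one strip per group;
# method = "jitter" spreads out points that share the same value
stripchart(mpg ~ cyl, data = mtcars,
           method = "jitter", vertical = TRUE, pch = 16,
           xlab = "Cylinders", ylab = "Miles per Gallon")
```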
Phase 8: Statistical Comparison & Distribution Density
These plots are designed to compare groups and to handle overplotting (when many points lie on top of each other).
Sunflower plot (sunflowerplot): When multiple data points fall on exactly the same coordinate, a sunflower plot adds “petals” to show the density.
Q-Q plot (qqnorm, qqline): Used to check whether a dataset follows a normal distribution. If the points fall close to the reference line, the data are approximately normal.
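Both ideas can be sketched with sunflowerplot() and qqnorm()/qqline(). The choice of mtcars columns is an assumption: cyl and gear take only a few distinct values, so many cars share the same coordinate.

```r
# Sunflower plot: coordinates holding several cars get one petal per car
sunflowerplot(mtcars$cyl, mtcars$gear,
              xlab = "Cylinders", ylab = "Gears",
              main = "Cylinders vs Gears")

# Q-Q plot: sample quantiles of MPG against theoretical normal quantiles
qq <- qqnorm(mtcars$mpg, pch = 16, main = "Normal Q-Q Plot of MPG")

# Reference line through the first and third quartiles
qqline(mtcars$mpg, col = "red", lwd = 2)
```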
Phase 9: Mathematical & Function Plots
R can be used as a graphing calculator to visualize mathematical formulas.
Function plot (curve): Visualizes a mathematical function over a range of x values.
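A minimal sketch with curve(), which evaluates an expression in x over a range. The functions sin and cos, the range, and the name pts are illustrative choices.

```r
# Draw sin(x) from -2*pi to 2*pi, evaluated at 200 points
pts <- curve(sin(x), from = -2 * pi, to = 2 * pi, n = 200,
             col = "darkblue", lwd = 2,
             ylab = "f(x)", main = "sin(x) and cos(x)")

# add = TRUE overlays a second function on the existing axes
curve(cos(x), add = TRUE, col = "firebrick", lty = 2, lwd = 2)

# Horizontal reference line at zero
abline(h = 0, col = "gray60")
```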
Phase 10: Composition & Layouts (The Professional View)
In professional reports, you often need to show multiple perspectives of the same data side-by-side.
Use par(mfrow = c(rows, cols)) to divide the plotting area.
Code
# Setting the stage: 2 rows, 2 columns
par(mfrow = c(2, 2))
# 1. Histogram
hist(mtcars$hp, col = "gold", main = "Engine Power")
# 2. Boxplot
boxplot(mtcars$hp, col = "tomato", main = "Power Outliers")
# 3. Density
plot(density(mtcars$hp), main = "Power Density")
# 4. Scatter
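plot(mtcars$hp, mtcars$mpg, pch = 16, main = "Power vs MPG")
# Reset to a single plot per page so later plots are not squeezed into the grid
par(mfrow = c(1, 1))
The layout() function allows for asymmetrical grids (e.g., one large plot on top, two small ones below).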
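As a sketch of an asymmetrical grid with layout(), the matrix below places one wide plot on top and two smaller plots underneath. The names layout_mat and n_panels and the choice of plots are assumptions for illustration.

```r
# Each number is a plot index; repeating 1 makes plot 1 span the top row
layout_mat <- matrix(c(1, 1,
                       2, 3), nrow = 2, byrow = TRUE)
n_panels <- layout(layout_mat)  # returns the number of panels

# Plot 1: wide scatter plot across the top
plot(mtcars$hp, mtcars$mpg, pch = 16, main = "Power vs MPG")
# Plot 2: bottom left
hist(mtcars$hp, col = "gold", main = "Engine Power")
# Plot 3: bottom right
boxplot(mtcars$hp, col = "tomato", main = "Power Outliers")

# Return to a single plot per page
layout(1)
```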
Phase 11: Multi-Layered Visuals (The Conditional View)
To master Base R, learners must move from single-variable plots to Multi-Layered Visuals. These are considered “complex” because they combine statistical calculations, data subsetting, and multiple graphical functions into a single, cohesive output.
Here are three systematic examples of complex Base R visuals, following the “Layered Building” approach.
This approach is used when you have grouped data (like different subjects or trials) and want to compare their trajectories on one canvas.
Systemic Strategy: Initialize a “blank” coordinate system using type = "n", then iterate through groups using a for loop to draw each individual series.
Code
# Setup colors for 5 different trees
tree_colors <- c("#5E2CE8", "#2078F4", "#F42069", "#20F4AB", "#F4A420")
# 1. Create the empty Frame (The Foundation)
plot(Orange$age, Orange$circumference,
     type = "n",
     main = "Longitudinal Growth of Orange Trees",
     xlab = "Age (days)",
     ylab = "Circumference (mm)",
     las = 1, bty = "l")
# 2. Add a Grid for readability
grid(nx = NULL, ny = NULL, col = "gray90", lty = "solid")
# 3. Use a loop to add lines for each Tree (The Structural Layer)
for (i in 1:5) {
  # Subset data for the specific tree
  tree_data <- subset(Orange, Tree == i)
  # Add lines and points
  lines(tree_data$age, tree_data$circumference,
        type = "b",           # "b" for both points and lines
        col = tree_colors[i],
        pch = 15 + i,         # Unique symbols for each tree
        lwd = 2)
}
# 4. Final Decorations (The Legend)
legend("topleft",
       legend = paste("Tree ID:", 1:5),
       col = tree_colors,
       pch = 16:20,
       lty = 1,
       bty = "n",
       cex = 0.8,
       title = "Tree Groups")
For publication-level analysis, showing just the curve isn’t enough. We need to show the individual distribution alongside the central tendency.
Systemic Strategy: Combine density(), polygon(), abline(), and rug() to show multiple statistical dimensions.
Code
# 1. Calculate the Density object
dens_mpg <- density(mtcars$mpg)
# 2. Plot the main curve
plot(dens_mpg,
     main = "MPG Distribution Density",
     xlab = "Miles Per Gallon",
     xlim = c(5, 40),
     lwd = 2, col = "darkblue")
# 3. Fill the area (Shading Layer)
polygon(dens_mpg, col = rgb(0.1, 0.1, 0.8, 0.2), border = NA)
# 4. Add Rug (Raw Data Layer)
# Shows exactly where the actual data points are concentrated
rug(mtcars$mpg, col = "red", lwd = 1.5)
# 5. Add Statistical Markers (Reference Layer)
abline(v = mean(mtcars$mpg), col = "darkgreen", lwd = 2, lty = 2)  # Mean
abline(v = median(mtcars$mpg), col = "purple", lwd = 2, lty = 3)   # Median
# 6. Add explanatory text
text(x = mean(mtcars$mpg), y = 0.01, labels = "Average", pos = 4, col = "darkgreen")
In scientific research, bar charts of group means are usually expected to include error bars. Here we use standard error bars to indicate the precision of each mean.
Systemic Strategy: Use tapply() for group calculations and arrows() to draw the error bars manually based on coordinate mapping.
Code
# 1. Data Processing: Mean and Standard Error for MPG grouped by Cylinder
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
st_err <- tapply(mtcars$mpg, mtcars$cyl, function(x) sd(x)/sqrt(length(x)))
# 2. Draw the Bar Foundation
par(mar = c(5, 5, 4, 2))
# ylim is based on the raw data so the points added in step 4 are not clipped
bp <- barplot(means,
              ylim = c(0, max(mtcars$mpg) + 2),
              col = "steelblue",
              border = "white",
              main = "Average MPG with Standard Error Bars",
              xlab = "Cylinders", ylab = "Mean MPG")
# 3. Add Error Bars (The Arrows Layer)
# angle = 90 makes the flat top/bottom of the error bar
arrows(x0 = bp, y0 = means - st_err,
       x1 = bp, y1 = means + st_err,
       angle = 90, code = 3, length = 0.1, lwd = 2, col = "black")
# 4. Add raw data points (The Overlay Layer)
# match() places each car over the bar of its own cylinder group
points(bp[match(mtcars$cyl, names(means))], mtcars$mpg,
       pch = 21, bg = "white", col = "gray40")
Courses with short, easy-to-digest video content are available at premieranalytics.com.bd. Each lesson uses data that is built into R or comes with installed packages, so you can replicate the work at home. premieranalytics.com.bd also includes teaching on statistics and research methods.