🎨 The Systemic Approach to Base R Plotting

Author

Abdullah Al Shamim

In Base R, plotting is additive. You start with a basic command and then keep adding “layers” using additional functions.

Phase 1: Creating the Foundation (plot)

Note

Most Base R visualizations start with a high-level function such as plot(), which creates the canvas and the axes.

The simplest form: plot(x, y).

Code
# Step 1: Just the data
plot(mtcars$wt, mtcars$mpg)

Add a main title and axis labels so others can understand the graph.

Code
# Step 2: Adding main title and axis labels
plot(mtcars$wt, mtcars$mpg,
     main = "Vehicle Weight vs. Fuel Efficiency",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles Per Gallon")

Change the plotting character (pch) and color (col) to style the points.

Code
# Step 3: Changing point style (16 is a solid circle) and color
plot(mtcars$wt, mtcars$mpg,
     main = "Styled Scatter Plot",
     xlab = "Weight", ylab = "MPG",
     pch = 16, 
     col = "steelblue")
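
To choose a value for pch, it can help to see the available symbols. The sketch below draws symbols 0 through 25 on one canvas; the single-row layout and the variable name pch_values are illustrative choices, not part of the original lesson.

```r
# Draw plotting characters 0 to 25 in a single row
pch_values <- 0:25

plot(pch_values, rep(1, length(pch_values)),
     pch = pch_values,
     col = "steelblue", bg = "gold",   # bg only affects symbols 21 to 25
     cex = 1.5,
     ylim = c(0.5, 1.5),
     xlab = "pch value", ylab = "", yaxt = "n",
     main = "Plotting Characters 0 to 25")

# Print each pch number below its symbol
text(pch_values, 0.8, labels = pch_values, cex = 0.7)
```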

Phase 2: Adding Structural Layers (abline & grid)

Note

Once the canvas is ready, we add layers to help interpret the data.

We use abline() to add straight lines (like regression lines).

Code
plot(mtcars$wt, mtcars$mpg, pch = 16, col = "darkgray")

# Adding a red regression line
# lm() calculates the Linear Model
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)
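
abline() is not limited to regression fits. As a sketch, the same function can draw horizontal (h), vertical (v), or intercept-and-slope (a, b) lines; the object name fit and the choice of mean reference lines are assumptions made for illustration.

```r
fit <- lm(mpg ~ wt, data = mtcars)

plot(mtcars$wt, mtcars$mpg, pch = 16, col = "darkgray")

# Same regression line, drawn from its intercept (a) and slope (b)
abline(a = coef(fit)[1], b = coef(fit)[2], col = "red", lwd = 2)

# Reference lines at the mean of each variable
abline(h = mean(mtcars$mpg), lty = 2, col = "steelblue")
abline(v = mean(mtcars$wt),  lty = 2, col = "steelblue")
```

A least-squares line always passes through the point of means, so the three lines should cross at one spot.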

Grids help the eye follow the coordinates.

Code
plot(mtcars$wt, mtcars$mpg, pch = 16)
grid(nx = NULL, ny = NULL, col = "lightgray", lty = "dotted")
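
Because layers are drawn in order, a grid added after plot() sits on top of the points. One way to keep it in the background, assuming the default plot() method is used, is the panel.first argument, which is evaluated after the axes are set up but before the points are drawn.

```r
# The grid is drawn first, so the points end up on top of it
plot(mtcars$wt, mtcars$mpg, pch = 16,
     panel.first = grid(col = "lightgray", lty = "dotted"))
```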

Phase 3: Fine-Tuning (text & legend)

Note

This is the “decoration” phase where we make the plot self-explanatory.

Use text() to label specific data points.

Code
plot(mtcars$wt, mtcars$mpg, pch = 16, col = "purple")

# Labeling only points with high MPG
text(mtcars$wt, mtcars$mpg, 
     labels = ifelse(mtcars$mpg > 30, rownames(mtcars), ""), 
     pos = 3, cex = 0.7)

A legend (legend()) is crucial for plots with multiple groups or layers.

Code
plot(mtcars$wt, mtcars$mpg, pch = 16, col = "purple")
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)

# Adding the legend
legend("topright", 
       legend = c("Car Data", "Linear Trend"),
       col = c("purple", "red"), 
       pch = c(16, NA), 
       lty = c(NA, 1))
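
The legend above distinguishes layers. For grouped data, a similar sketch maps a color to each group and lists the groups in the legend; the grouping by cyl and the names cyl_factor and group_cols are assumptions for illustration.

```r
cyl_factor <- factor(mtcars$cyl)                      # levels "4", "6", "8"
group_cols <- c("steelblue", "darkorange", "firebrick")

# Index the color vector by each car's group number (1, 2, or 3)
plot(mtcars$wt, mtcars$mpg, pch = 16,
     col = group_cols[as.integer(cyl_factor)],
     xlab = "Weight (1000 lbs)", ylab = "Miles Per Gallon")

legend("topright",
       legend = levels(cyl_factor),
       col = group_cols, pch = 16,
       title = "Cylinders")
```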

Pro-Tip for Learners: The par() function. To see multiple plots at once (systemic comparison), use par(mfrow = c(rows, cols)).

Code
par(mfrow = c(1, 2)) # 1 row, 2 columns

hist(mtcars$mpg, col = "orange", main = "Distribution")
boxplot(mtcars$mpg, col = "cyan", main = "Outliers")

Code
par(mfrow = c(1, 1)) # Reset to default
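
Resetting with par(mfrow = c(1, 1)) assumes the previous layout was the default. A slightly more defensive sketch saves whatever settings were active and restores them afterwards; old_par is a hypothetical variable name.

```r
# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2))

hist(mtcars$mpg, col = "orange", main = "Distribution")
boxplot(mtcars$mpg, col = "cyan", main = "Outliers")

# Restore the saved settings
par(old_par)
```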

Phase 4: Single Variable Plots (Distribution)

Note

These are used to understand the “shape” of your data—where most values lie and if there are outliers.

A histogram (hist()) is best for seeing the frequency of continuous data.

Code
hist(mtcars$mpg, 
     col = "skyblue", 
     border = "white", 
     main = "Histogram of MPG", 
     xlab = "Miles Per Gallon")

A density plot is a smoothed version of a histogram that shows the estimated probability distribution.

Code
d <- density(mtcars$mpg)
plot(d, main = "Density Plot of MPG")
polygon(d, col = "orange", border = "black") # Fills the area
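
Since a density curve is a smoothed histogram, the two can share a canvas. This sketch assumes freq = FALSE so that the histogram's y-axis is on the density scale rather than counts; h is a hypothetical name for the precomputed histogram.

```r
d <- density(mtcars$mpg)
h <- hist(mtcars$mpg, plot = FALSE)   # compute the bins without drawing

hist(mtcars$mpg, freq = FALSE,
     col = "skyblue", border = "white",
     ylim = c(0, max(h$density, d$y)),   # tall enough for bars and curve
     main = "Histogram with Density Curve",
     xlab = "Miles Per Gallon")

# Layer the density curve on top of the bars
lines(d, col = "darkorange", lwd = 2)
```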

A bar plot (barplot()) is used for categorical counts.

Code
counts <- table(mtcars$cyl)
barplot(counts, 
        main = "Cylinder Count Bar Plot", 
        col = c("red", "green", "blue"), 
        xlab = "Number of Cylinders")

Phase 5: Two Variable Plots (Relationships)

Note

These plots help you see how one variable changes in relation to another.

The scatter plot is the gold standard for two continuous variables.

Code
plot(mtcars$hp, mtcars$qsec, 
     pch = 19, 
     col = "darkgreen", 
     main = "Horsepower vs Quarter Mile Time")

A line plot is essential for time series or ordered sequences.

Code
# Using the Orange dataset
plot(Orange$age[1:7], Orange$circumference[1:7], 
     type = "o", # "o" for points AND lines
     lwd = 2, 
     col = "purple", 
     main = "Growth Over Time")
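
The indices 1:7 work because the first seven rows of Orange belong to one tree. A sketch that selects the tree by its label instead, which does not depend on row order (tree1 is a hypothetical name):

```r
# Select the measurements for the tree labelled "1"
tree1 <- subset(Orange, Tree == "1")

plot(tree1$age, tree1$circumference,
     type = "o", lwd = 2, col = "purple",
     main = "Growth Over Time (Tree 1)",
     xlab = "Age (days)", ylab = "Circumference (mm)")
```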

A grouped boxplot is one of the best ways to see a continuous variable split by a category.

Code
boxplot(mpg ~ gear, data = mtcars, 
        main = "MPG by Number of Gears", 
        col = "tomato", 
        horizontal = TRUE) # Flipped for a different view
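
boxplot() also returns the numbers behind the picture. As a sketch, capturing the return value (here called bp_stats, a hypothetical name) gives the summary statistics and group sizes without reading them off the axes.

```r
# plot = FALSE computes the statistics without drawing anything
bp_stats <- boxplot(mpg ~ gear, data = mtcars, plot = FALSE)

bp_stats$names   # group labels
bp_stats$n       # number of cars in each group
bp_stats$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```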

Phase 6: Multivariate Plots (Multiple Variables)

Note

When you have more than two variables and want to see complex patterns.

A scatterplot matrix (pairs()) visualizes every pairwise relationship among the selected variables at once.

Code
pairs(mtcars[, 1:4], 
      main = "Scatterplot Matrix (MPG, Cyl, Disp, HP)", 
      pch = 21, 
      bg = "gold")

A heatmap (heatmap()) visualizes a matrix of numbers using colors.

Code
# Creating a small data matrix (first 10 cars, first 5 variables)
data_matrix <- as.matrix(mtcars[1:10, 1:5])
heatmap(data_matrix, 
        Colv = NA, Rowv = NA, 
        scale = "column", # Standardize each variable so colors are comparable
        col = cm.colors(256), 
        main = "Data Intensity Heatmap")
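
A heatmap is also a common way to display a correlation matrix. The sketch below is one possible version: it computes correlations for the same five columns and turns off scaling, since correlations already share the range -1 to 1. The name cor_mat is hypothetical.

```r
cor_mat <- cor(mtcars[, 1:5])

heatmap(cor_mat,
        Colv = NA, Rowv = NA,   # keep the original variable order
        scale = "none",         # correlations are already comparable
        symm = TRUE,            # treat the matrix as symmetric
        col = cm.colors(256),
        main = "Correlation Heatmap")
```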

Phase 7: Special & Proportional Plots

Note

Used for specific statistical needs or part-to-whole relationships.

A pie chart (pie()) shows proportions (use sparingly!).

Code
slices <- c(10, 12, 4, 16)
lbls <- c("US", "UK", "Australia", "Germany")
pie(slices, labels = lbls, 
    main = "Simple Pie Chart", 
    col = rainbow(length(lbls)))

A strip chart (stripchart()) is a 1D scatter plot, useful for small datasets.

Code
stripchart(mpg ~ cyl, data = mtcars, 
           vertical = TRUE, 
           method = "jitter", 
           pch = 1, 
           main = "Stripchart of MPG by Cylinder")

A dot chart (dotchart()) is a cleaner alternative to a bar plot for comparing values.

Code
dotchart(mtcars$mpg, 
         labels = row.names(mtcars), 
         cex = .7, 
         main = "Gas Mileage for Car Models", 
         xlab = "Miles Per Gallon")

Phase 8: Statistical Comparison & Distribution Density

Note

These plots are specifically designed to compare groups and handle “Overplotting” (when you have too many points on top of each other).

When multiple data points fall on the same exact coordinate, a sunflower plot adds “petals” to show the density.

Code
# Using iris data to show overlapping points
sunflowerplot(iris$Sepal.Length, iris$Sepal.Width, 
              main = "Sunflower Plot for Overlapping Data",
              seg.col = "red")

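A Q-Q plot (qqnorm() with qqline()) is used to check whether a dataset follows a normal distribution. If the points fall close to the line, the data are approximately normal; systematic curvature away from it suggests skew or heavy tails.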

Code
qqnorm(mtcars$mpg, main = "Q-Q Plot: Checking for Normality")
qqline(mtcars$mpg, col = "red", lwd = 2)
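
Reading a Q-Q plot is easier with a reference for what a departure looks like. The sketch below uses simulated data (an assumption, not one of the lesson's datasets) to contrast a normal sample with a right-skewed one.

```r
set.seed(42)                      # arbitrary seed for reproducibility
normal_sample <- rnorm(100)       # drawn from a normal distribution
skewed_sample <- rexp(100)        # drawn from a right-skewed distribution

old_par <- par(mfrow = c(1, 2))

qqnorm(normal_sample, main = "Normal Sample")
qqline(normal_sample, col = "red", lwd = 2)

qqnorm(skewed_sample, main = "Right-Skewed Sample")
qqline(skewed_sample, col = "red", lwd = 2)

par(old_par)
```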

A rug plot (rug()) adds 1D tick marks to the side of an existing plot to show where the individual data points are concentrated.

Code
plot(density(mtcars$mpg), main = "Density with Rug Plot")
polygon(density(mtcars$mpg), col = "lavender")
rug(mtcars$mpg, col = "darkblue", lwd = 2)

Phase 9: Mathematical & Function Plots

Note

R can be used as a graphing calculator to visualize mathematical formulas.

Use curve() to visualize a mathematical function over a range.

Code
# Plotting a sine wave
curve(sin(x), from = 0, to = 2*pi, 
      col = "blue", lwd = 3, 
      main = "Mathematical Function: sin(x)")
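
In keeping with the layered approach, curve() accepts add = TRUE to draw on an existing plot. A sketch overlaying cos(x) on the sine wave:

```r
curve(sin(x), from = 0, to = 2 * pi,
      col = "blue", lwd = 3,
      ylab = "f(x)", main = "sin(x) and cos(x)")

# add = TRUE draws on the current plot instead of starting a new one
curve(cos(x), from = 0, to = 2 * pi,
      col = "red", lwd = 3, lty = 2, add = TRUE)

legend("bottomleft", legend = c("sin(x)", "cos(x)"),
       col = c("blue", "red"), lwd = 3, lty = c(1, 2))
```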

Use persp() to visualize three-dimensional data as a surface.

Code
x <- seq(-10, 10, length.out = 30)
y <- x
f <- function(x, y) { r <- sqrt(x^2+y^2); 10 * sin(r)/r }
z <- outer(x, y, f)

persp(x, y, z, 
      phi = 30, theta = 30, 
      col = "lightblue", shade = 0.5,
      main = "3D Perspective Surface")
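
The same x, y, z grid can also be viewed from above. As a sketch, image() with contour() layered on top shows the surface as a 2D map; the grid is rebuilt here so the block runs on its own, and the color palette is an arbitrary choice.

```r
x <- seq(-10, 10, length.out = 30)
y <- x
f <- function(x, y) { r <- sqrt(x^2 + y^2); 10 * sin(r) / r }
z <- outer(x, y, f)

# Colored map of z, with contour lines layered on top
image(x, y, z, col = terrain.colors(20), main = "Surface Viewed from Above")
contour(x, y, z, add = TRUE)
```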

Phase 10: Composition & Layouts (The Professional View)

Note

In professional reports, you often need to show multiple perspectives of the same data side-by-side.

Use par(mfrow = c(rows, cols)) to divide the plotting area.

Code
# Setting the stage: 2 rows, 2 columns
par(mfrow = c(2, 2))

# 1. Histogram
hist(mtcars$hp, col = "gold", main = "Engine Power")

# 2. Boxplot
boxplot(mtcars$hp, col = "tomato", main = "Power Outliers")

# 3. Density
plot(density(mtcars$hp), main = "Power Density")

# 4. Scatter
plot(mtcars$hp, mtcars$mpg, pch = 16, main = "Power vs MPG")

Code
# Reset to default 1x1
par(mfrow = c(1, 1))

The layout() function allows for asymmetrical grids (e.g., one large plot on top, two small ones below).

Code
# Create a matrix for the layout
# 1 on top, 2 and 3 on the bottom
m <- matrix(c(1, 1, 2, 3), nrow = 2, ncol = 2, byrow = TRUE)
layout(m)

hist(mtcars$mpg, col = "lightblue", main = "Main Focus: MPG")
plot(mtcars$wt, mtcars$mpg, main = "Sub-plot 1")
boxplot(mtcars$mpg, main = "Sub-plot 2")

Code
# Reset layout
layout(1)
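
layout() also takes widths and heights to size rows and columns unequally. In this sketch the top row is given twice the height of the bottom row; the 2:1 ratio and the name n_fig are arbitrary choices.

```r
m <- matrix(c(1, 1, 2, 3), nrow = 2, ncol = 2, byrow = TRUE)

# layout() returns the number of figures it defines
n_fig <- layout(m, heights = c(2, 1))

# Preview the numbered regions before plotting into them
layout.show(n_fig)

# Reset layout
layout(1)
```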

Phase 11: Multi-Layered Visuals (The Conditional View)

Note

To master Base R, learners must move from single-variable plots to Multi-Layered Visuals. These are considered “complex” because they combine statistical calculations, data subsetting, and multiple graphical functions into a single, cohesive output.

Here are three systemic examples of complex Base R visuals, following the “Layered Building” approach.

The first example, a grouped line plot, is used when you have grouped data (like different subjects or trials) and want to compare their trajectories on one canvas.

Systemic Strategy: Initialize a “blank” coordinate system using type = "n", then iterate through groups using a for loop to draw each individual series.

Code
# Setup colors for 5 different trees
tree_colors <- c("#5E2CE8", "#2078F4", "#F42069", "#20F4AB", "#F4A420")

# 1. Create the empty Frame (The Foundation)
plot(Orange$age, Orange$circumference, 
     type = "n", 
     main = "Longitudinal Growth of Orange Trees",
     xlab = "Age (days)", 
     ylab = "Circumference (mm)",
     las = 1, bty = "l")

# 2. Add a Grid for readability
grid(nx = NULL, ny = NULL, col = "gray90", lty = "solid")

# 3. Use a loop to add lines for each Tree (The Structural Layer)
for(i in 1:5) {
  # Subset data for the specific tree
  tree_data <- subset(Orange, Tree == i)
  
  # Add lines and points
  lines(tree_data$age, tree_data$circumference, 
        type = "b",          # "b" for both points and lines
        col = tree_colors[i], 
        pch = 15 + i,        # Unique symbols for each tree
        lwd = 2)
}

# 4. Final Decorations (The Legend)
legend("topleft", 
       legend = paste("Tree ID:", 1:5), 
       col = tree_colors, 
       pch = 16:20, 
       lty = 1, 
       bty = "n", 
       cex = 0.8, 
       title = "Tree Groups")
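
The loop above relies on the trees being labelled 1 to 5. A sketch that iterates over the groups themselves, using split(), works for any labels; trees and ids are hypothetical names. Note that Tree is an ordered factor, so the groups come back in level order rather than numeric order.

```r
# One data frame per tree, named by tree label
trees <- split(Orange, Orange$Tree)
ids <- names(trees)
tree_colors <- c("#5E2CE8", "#2078F4", "#F42069", "#20F4AB", "#F4A420")

# Empty frame sized to the full data
plot(Orange$age, Orange$circumference, type = "n",
     xlab = "Age (days)", ylab = "Circumference (mm)")

# Draw one series per group, whatever the group labels are
for (k in seq_along(trees)) {
  lines(trees[[k]]$age, trees[[k]]$circumference,
        type = "b", col = tree_colors[k], pch = 15 + k, lwd = 2)
}

legend("topleft", legend = paste("Tree ID:", ids),
       col = tree_colors, pch = 15 + seq_along(trees), lty = 1, bty = "n")
```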

In the second example, aimed at publication-level analysis, showing just the curve isn’t enough. We need to show the Individual Distribution alongside the Central Tendency.

Systemic Strategy: Combine density(), polygon(), abline(), and rug() to show multiple statistical dimensions.

Code
# 1. Calculate the Density object
dens_mpg <- density(mtcars$mpg)

# 2. Plot the main curve
plot(dens_mpg, 
     main = "MPG Distribution Density",
     xlab = "Miles Per Gallon", 
     xlim = c(5, 40), 
     lwd = 2, col = "darkblue")

# 3. Fill the area (Shading Layer)
polygon(dens_mpg, col = rgb(0.1, 0.1, 0.8, 0.2), border = NA)

# 4. Add Rug (Raw Data Layer)
# Shows exactly where the actual data points are concentrated
rug(mtcars$mpg, col = "red", lwd = 1.5)

# 5. Add Statistical Markers (Reference Layer)
abline(v = mean(mtcars$mpg), col = "darkgreen", lwd = 2, lty = 2) # Mean
abline(v = median(mtcars$mpg), col = "purple", lwd = 2, lty = 3) # Median

# 6. Add explanatory text
text(x = mean(mtcars$mpg), y = 0.01, labels = "Average", pos = 4, col = "darkgreen")
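
With two reference lines on the plot, a legend can replace the single text label. The following sketch redraws the curve and reference lines so it runs on its own, then adds a legend that reports the values.

```r
dens_mpg <- density(mtcars$mpg)
plot(dens_mpg, main = "MPG Distribution Density",
     xlab = "Miles Per Gallon", lwd = 2, col = "darkblue")
abline(v = mean(mtcars$mpg),   col = "darkgreen", lwd = 2, lty = 2)
abline(v = median(mtcars$mpg), col = "purple",    lwd = 2, lty = 3)

# Show the actual values in the legend labels
legend("topright",
       legend = c(paste("Mean:", round(mean(mtcars$mpg), 1)),
                  paste("Median:", round(median(mtcars$mpg), 1))),
       col = c("darkgreen", "purple"), lwd = 2, lty = c(2, 3), bty = "n")
```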

In the third example, a bar chart of group means, Standard Error bars are added to indicate the precision of each mean, as is common in scientific research.

Systemic Strategy: Use tapply() for group calculations and arrows() to draw the error bars manually based on coordinate mapping.

Code
# 1. Data Processing: Mean and Standard Error for MPG grouped by Cylinder
means <- tapply(mtcars$mpg, mtcars$cyl, mean)
st_err <- tapply(mtcars$mpg, mtcars$cyl, function(x) sd(x)/sqrt(length(x)))

# 2. Draw the Bar Foundation
par(mar = c(5, 5, 4, 2))
bp <- barplot(means, 
              ylim = c(0, max(mtcars$mpg) + 2), # Leave room for the raw data points
              col = "steelblue", 
              border = "white",
              main = "Average MPG with Standard Error Bars",
              xlab = "Cylinders", ylab = "Mean MPG")

# 3. Add Error Bars (The Arrows Layer)
# angle = 90 makes the flat top/bottom of the error bar
arrows(x0 = bp, y0 = means - st_err, 
       x1 = bp, y1 = means + st_err, 
       angle = 90, code = 3, length = 0.1, lwd = 2, col = "black")

# 4. Add raw data points (The Overlay Layer)
# Match each car to the bar of its own cylinder group (mtcars is not sorted by cyl)
points(bp[as.integer(factor(mtcars$cyl))], mtcars$mpg, pch = 21, bg = "white", col = "gray40")
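
Before drawing error bars, it can be worth checking the numbers behind them. A sketch that collects the group sizes, means, and standard errors into one table (n and summary_tbl are hypothetical names):

```r
n      <- tapply(mtcars$mpg, mtcars$cyl, length)
means  <- tapply(mtcars$mpg, mtcars$cyl, mean)
st_err <- tapply(mtcars$mpg, mtcars$cyl, function(x) sd(x) / sqrt(length(x)))

# One row per cylinder group
summary_tbl <- data.frame(cyl  = names(means),
                          n    = as.vector(n),
                          mean = round(as.vector(means), 2),
                          se   = round(as.vector(st_err), 2))
summary_tbl
```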

Courses that contain short, easy-to-digest video content are available at premieranalytics.com.bd. Each lesson uses data that is built into R or comes with installed packages, so you can replicate the work at home. premieranalytics.com.bd also includes teaching on statistics and research methods.