Data visualization transforms raw data into meaningful insights through charts, graphs, and plots. In R it is primarily done with base R functions or with popular libraries such as ggplot2, which is part of the tidyverse collection of packages. R offers two main approaches to visualization:
Base R functions are helpful for quick and simple visualizations. ggplot2 is a powerful library for creating complex and customizable plots.
Make sure you have the necessary R packages installed:
install.packages("ggplot2") # For data visualization
install.packages("dplyr") # For data manipulation
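Calling install.packages() on every run reinstalls packages that are already present. A minimal sketch of a guard is shown here; the helper names missing_packages and install_if_missing are hypothetical and not part of any package.

```r
# Hypothetical helper: return the names of packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the packages that are missing.
# Needs an internet connection and a configured CRAN mirror.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

With these definitions, install_if_missing(c("ggplot2", "dplyr")) would install only what is not yet available.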
Load the ggplot2 and dplyr libraries
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
A bar plot (or bar chart) is a graphical representation of categorical data using rectangular bars where the length or height of each bar corresponds to the data’s value or frequency.
Sample data
data <- data.frame(
Category = c("A", "B", "C", "D"),
Value = c(10, 23, 17, 35)
)
Bar Plot
barplot(data$Value, names.arg = data$Category,
col = "skyblue", main = "Base R Bar Plot", ylab = "Value")
ggplot(data, aes(x = Category, y = Value, fill = Category)) +
geom_bar(stat = "identity") +
theme_minimal() +
labs(title = "ggplot2 Bar Plot", y = "Value")
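When the data already contain the bar heights, geom_col() is a shorter way to write geom_bar(stat = "identity"). The sketch below repeats the sample data and, as an optional extra, uses reorder() to sort the bars from largest to smallest; the object name p_sorted is only illustrative.

```r
library(ggplot2)

data <- data.frame(
  Category = c("A", "B", "C", "D"),
  Value = c(10, 23, 17, 35)
)

# geom_col() takes the y values as bar heights.
# reorder(Category, -Value) orders the categories by descending Value.
p_sorted <- ggplot(data, aes(x = reorder(Category, -Value), y = Value, fill = Category)) +
  geom_col() +
  theme_minimal() +
  labs(title = "ggplot2 Bar Plot (sorted)", x = "Category", y = "Value")

print(p_sorted)
```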
This example plots the count of cars by number of cylinders (cyl), showing how many cars have 4, 6, or 8 cylinders.
Load mtcars dataset
data(mtcars)
Convert ‘cyl’ to a factor
mtcars$cyl <- as.factor(mtcars$cyl)
Create a frequency table for cylinders
cyl_counts <- table(mtcars$cyl)
Base R Bar Plot
barplot(cyl_counts,
main = "Number of Cars by Cylinders",
xlab = "Number of Cylinders",
ylab = "Count of Cars",
col = c("skyblue", "salmon", "lightgreen"),
border = "black")
Bar plot showing counts with ggplot2.
ggplot(mtcars, aes(x = cyl, fill = cyl)) +
geom_bar() +
labs(title = "Count of Cars by Number of Cylinders",
x = "Number of Cylinders", y = "Count of Cars") +
scale_fill_manual(values = c("skyblue", "salmon", "lightgreen")) +
theme_minimal()
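geom_bar() counts the rows for you, while barplot() needs a frequency table. As a sketch of a third route, the counts can be computed with dplyr and plotted with geom_col(). The names cars, cyl_counts_df, and n_cars are chosen here for illustration.

```r
library(dplyr)
library(ggplot2)

# Work on a copy so the built-in dataset is left untouched.
cars <- datasets::mtcars

# One row per cylinder class with its number of cars.
cyl_counts_df <- cars %>%
  count(cyl = factor(cyl), name = "n_cars")

print(cyl_counts_df)

# geom_col() plots the precomputed counts directly.
p_counts <- ggplot(cyl_counts_df, aes(x = cyl, y = n_cars, fill = cyl)) +
  geom_col() +
  labs(title = "Count of Cars by Number of Cylinders",
       x = "Number of Cylinders", y = "Count of Cars") +
  theme_minimal()

print(p_counts)
```

Having the counts in a data frame is convenient when they are also needed for labels or tables.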
See how automatic and manual cars are distributed within each cylinder category.
Visualizing the number of cylinders stacked by the type of transmission (am).
Convert ‘am’ to a factor (0 = Automatic, 1 = Manual)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
Stacked Bar Plot
ggplot(mtcars, aes(x = cyl, fill = am)) +
geom_bar(position = "stack") +
labs(title = "Cylinders by Transmission Type",
x = "Number of Cylinders", y = "Count of Cars") +
scale_fill_manual(values = c("lightblue", "orange")) +
theme_light()
A grouped bar plot draws separate bars for each transmission type, which makes it easy to compare the number of manual and automatic cars within each cylinder group.
ggplot(mtcars, aes(x = cyl, fill = am)) +
geom_bar(position = "dodge") +
labs(title = "Cylinders by Transmission Type (Grouped)",
x = "Number of Cylinders", y = "Count of Cars") +
scale_fill_manual(values = c("lightblue", "orange")) +
theme_classic()
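The numbers behind the stacked and grouped bars can be checked with a cross-tabulation. The sketch below also shows position = "fill", a third option that rescales each bar to proportions. It works on a copy named cars (an illustrative name) so it does not depend on earlier conversions.

```r
library(ggplot2)

cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Counts of cars for every combination of cylinders and transmission.
cyl_by_am <- table(Cylinders = cars$cyl, Transmission = cars$am)
print(cyl_by_am)

# position = "fill" stacks the bars and scales each one to a total of 1.
p_fill <- ggplot(cars, aes(x = cyl, fill = am)) +
  geom_bar(position = "fill") +
  labs(title = "Share of Transmission Type by Cylinders",
       x = "Number of Cylinders", y = "Proportion of Cars") +
  scale_fill_manual(values = c("lightblue", "orange")) +
  theme_light()

print(p_fill)
```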
ggplot(mtcars, aes(x = cyl, fill = cyl)) +
geom_bar() +
coord_flip() + # Flips the axes; labs() still refers to the original x and y
labs(title = "Horizontal Bar Plot: Cars by Cylinders",
x = "Number of Cylinders", y = "Count of Cars") +
theme_minimal()
A box plot (or box-and-whisker plot) is a standardized way of displaying the distribution of data based on a five-number summary: Minimum, First Quartile (Q1), Median (Q2), Third Quartile (Q3), and Maximum. It is especially useful for identifying outliers, spread, and central tendency in a dataset.
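Minimum and maximum: the lowest and highest data points within the whiskers.
Outliers: data points that lie beyond the whiskers, by default more than 1.5 times the interquartile range (IQR) away from the box.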
Easy to compare distributions between groups. Summarizes data spread, central tendency, and outliers. Useful for large datasets.
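To make the five-number summary concrete, the following sketch computes it for a small made-up vector using base R's fivenum() and boxplot.stats(). The values in x are invented for illustration only.

```r
# Made-up sample with one unusually large value.
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five)

# boxplot.stats() returns what boxplot() draws:
# $stats holds the whisker ends and the box, $out holds the outliers.
box_stats <- boxplot.stats(x)
print(box_stats$stats)
print(box_stats$out)
```

Here 30 is reported as an outlier, so the upper whisker stops at 9, the largest value within 1.5 times the box height of the upper hinge.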
Sample data
radd <- data.frame(
Group = rep(c("A", "B", "C"), each = 20),
Value = c(rnorm(20, mean = 5), rnorm(20, mean = 7), rnorm(20, mean = 6))
)
Create boxplot
boxplot(Value ~ Group, data = radd,
main = "Boxplot Example (Base R)",
xlab = "Group",
ylab = "Value",
col = c("skyblue", "salmon", "lightgreen"),
border = "black")
Sample data
datas <- data.frame(
Group = rep(c("A", "B", "C"), each = 20),
Value = c(rnorm(20, mean = 5), rnorm(20, mean = 7), rnorm(20, mean = 6))
)
Create boxplot
ggplot(datas, aes(x = Group, y = Value, fill = Group)) +
geom_boxplot() +
labs(title = "Boxplot Example (ggplot2)",
x = "Group", y = "Value") +
theme_minimal()
Compare fuel efficiency across cars with 4, 6, and 8 cylinders. Cars with 4 cylinders generally have higher MPG.
Load mtcars dataset
data(mtcars)
Convert ‘cyl’ to a factor for categorical plotting
mtcars$cyl <- as.factor(mtcars$cyl)
Base R Boxplot
boxplot(mpg ~ cyl, data = mtcars,
main = "MPG by Number of Cylinders",
xlab = "Number of Cylinders",
ylab = "Miles Per Gallon (MPG)",
col = c("lightblue", "salmon", "lightgreen"),
border = "black")
ggplot2 Boxplot
ggplot(mtcars, aes(x = cyl, y = mpg, fill = cyl)) +
geom_boxplot() +
labs(title = "MPG by Number of Cylinders (ggplot2)",
x = "Number of Cylinders",
y = "Miles Per Gallon (MPG)") +
scale_fill_manual(values = c("skyblue", "salmon", "lightgreen")) +
theme_minimal()
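The claim that 4-cylinder cars tend to have higher MPG can be checked numerically. A minimal sketch, using tapply() to compute the median MPG per cylinder group (the thick line inside each box); the names cars and mpg_median are illustrative.

```r
cars <- datasets::mtcars

# Median miles per gallon for each number of cylinders.
mpg_median <- tapply(cars$mpg, cars$cyl, median)
print(mpg_median)
```

The medians decrease as the number of cylinders grows, which matches the ordering of the boxes in the plots.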
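Outliers are drawn in red. Jittered points show the individual observations; setting height = 0 jitters them horizontally only, so their MPG values stay exact.
ggplot(mtcars, aes(x = cyl, y = mpg, fill = cyl)) +
geom_boxplot(outlier.color = "red", outlier.shape = 16, outlier.size = 2) +
geom_jitter(width = 0.2, height = 0, alpha = 0.6) + # Adds data points, jittered horizontally only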
labs(title = "MPG by Number of Cylinders with Outliers Highlighted",
x = "Number of Cylinders", y = "MPG") +
scale_fill_manual(values = c("skyblue", "salmon", "lightgreen")) +
theme_light()
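To see which values are flagged as outliers, the 1.5 × IQR rule can be applied by hand. This is a sketch under stated assumptions: the helper iqr_outliers is hypothetical and takes its quartiles from quantile(), whereas base R's boxplot() uses the hinges from fivenum(), so the two can flag slightly different points.

```r
cars <- datasets::mtcars

# Hypothetical helper: values more than 1.5 * IQR outside the quartiles.
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75))
  spread <- q[2] - q[1]
  x[x < q[1] - 1.5 * spread | x > q[2] + 1.5 * spread]
}

# Apply the rule separately to the MPG values of each cylinder group.
outliers_by_cyl <- lapply(split(cars$mpg, cars$cyl), iqr_outliers)
print(outliers_by_cyl)
```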
ggplot(mtcars, aes(y = mpg, x = cyl, fill = cyl)) +
geom_boxplot() +
coord_flip() + # Flips axes
labs(title = "Horizontal Boxplot: MPG by Cylinders",
x = "Number of Cylinders", y = "MPG") +
theme_classic()
A line plot (or line graph) is a type of chart used to display data points connected by straight lines. It is commonly used to show trends over time, continuous data, or relationships between variables.
Straight lines connect data points to highlight trends and changes.
df <- data.frame(Time = 1:10, Measurement = cumsum(rnorm(10)))
ggplot(df, aes(x = Time, y = Measurement)) +
geom_line(color = "blue", linewidth = 1.2) + # linewidth replaces size for lines in ggplot2 3.4.0 and later
labs(title = "Line Plot", x = "Time", y = "Measurement") +
theme_light()
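Because the sample data come from rnorm(), the line looks different on every run. A minimal sketch of a reproducible version, assuming an arbitrary seed of 42 chosen only for illustration:

```r
library(ggplot2)

# Fixing the seed makes the random walk identical on every run.
set.seed(42)
df <- data.frame(Time = 1:10, Measurement = cumsum(rnorm(10)))

p_line <- ggplot(df, aes(x = Time, y = Measurement)) +
  geom_line(color = "blue", linewidth = 1.2) +
  geom_point(color = "blue") +
  labs(title = "Line Plot (reproducible)", x = "Time", y = "Measurement") +
  theme_light()

print(p_line)
```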
Plotting mpg (miles per gallon) as a function of car index (just the row number) in mtcars.
Load mtcars dataset
data(mtcars)
Basic Line Graph in Base R
This graph shows MPG against the car index (row number), connecting the data points to show the trend.
plot(mtcars$mpg, type = "o", col = "blue",
main = "Miles per Gallon (MPG) by Car Index",
xlab = "Car Index (Row Number)", ylab = "Miles per Gallon (MPG)",
pch = 16, lwd = 2)
Plotting mpg versus hp (horsepower) as a line graph.
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_line(color = "blue", linewidth = 1) +
labs(title = "MPG vs Horsepower", x = "Horsepower", y = "Miles per Gallon (MPG)") +
theme_minimal()
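Several cars in mtcars share the same horsepower, so a line drawn through the raw rows contains vertical jumps at those values. One possible way to smooth this out, shown here as a sketch rather than a recommendation, is to average MPG per horsepower value before drawing the line. The names cars and mpg_by_hp are illustrative.

```r
library(ggplot2)

cars <- datasets::mtcars

# One row per distinct horsepower value, with the mean MPG of the cars sharing it.
mpg_by_hp <- aggregate(mpg ~ hp, data = cars, FUN = mean)

p_avg <- ggplot(mpg_by_hp, aes(x = hp, y = mpg)) +
  geom_line(color = "blue", linewidth = 1) +
  labs(title = "Mean MPG vs Horsepower",
       x = "Horsepower", y = "Mean Miles per Gallon (MPG)") +
  theme_minimal()

print(p_avg)
```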
Plotting mpg versus hp with different lines for number of cylinders (cyl). Each cylinder group (4, 6, or 8 cylinders) is represented by a different line with a distinct color.
ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
geom_line(linewidth = 1) +
labs(title = "MPG vs Horsepower by Number of Cylinders",
x = "Horsepower", y = "Miles per Gallon (MPG)") +
scale_color_manual(values = c("red", "green", "blue")) +
theme_minimal()
Plotting mpg as a function of hp and adding data points.
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_line(color = "blue", linewidth = 1) +
geom_point(color = "red", size = 3) +
labs(title = "MPG vs Horsepower with Data Points",
x = "Horsepower", y = "Miles per Gallon (MPG)") +
theme_light()
Adding a smoothed line to visualize trends more clearly.
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(color = "red", size = 3) +
geom_smooth(method = "loess", color = "blue", linewidth = 1) +
labs(title = "MPG vs Horsepower with Trend Line",
x = "Horsepower", y = "Miles per Gallon (MPG)") +
theme_minimal()
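The smoothed curve can also be computed outside ggplot2. The sketch below fits stats::loess() with its default settings and evaluates it on a grid of horsepower values; it is assumed to be comparable to, not necessarily identical with, the curve that geom_smooth(method = "loess") draws. The names fit, grid, and mpg_smooth are illustrative.

```r
cars <- datasets::mtcars

# Local regression of MPG on horsepower with loess() defaults.
fit <- loess(mpg ~ hp, data = cars)

# Evaluate the fitted curve on an evenly spaced grid within the observed range.
grid <- data.frame(hp = seq(min(cars$hp), max(cars$hp), length.out = 50))
grid$mpg_smooth <- predict(fit, newdata = grid)
print(head(grid))

# Base R plot of the points with the smoothed curve on top.
plot(cars$hp, cars$mpg, pch = 19, col = "red",
     xlab = "Horsepower", ylab = "Miles per Gallon (MPG)",
     main = "MPG vs Horsepower with loess Curve")
lines(grid$hp, grid$mpg_smooth, col = "blue", lwd = 2)
```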
## `geom_smooth()` using formula = 'y ~ x'
A scatter plot (or scatter chart) is a type of graph used to represent the relationship between two continuous variables. Each point on the graph represents an observation in the dataset, with the position on the X-axis and Y-axis corresponding to the values of the variables.
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point(color = "darkgreen") +
geom_smooth(method = "lm", col = "red") +
labs(title = "MPG vs Weight", x = "Weight", y = "Miles Per Gallon")
## `geom_smooth()` using formula = 'y ~ x'
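The red line in this plot is an ordinary least-squares fit. As a small sketch, the same model can be fitted with lm() to read off the slope and the correlation. In mtcars, wt is recorded in units of 1000 lbs; the name fit_wt is illustrative.

```r
cars <- datasets::mtcars

# Linear model of MPG on weight, the same relationship geom_smooth(method = "lm") draws.
fit_wt <- lm(mpg ~ wt, data = cars)
print(coef(fit_wt))

# Pearson correlation between weight and MPG.
print(cor(cars$wt, cars$mpg))
```

A negative slope means that heavier cars tend to travel fewer miles per gallon.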
Plotting mpg (miles per gallon) against hp (horsepower) in mtcars.
Load mtcars dataset
data(mtcars)
Basic Scatter Plot in Base R
The scatter plot shows the relationship between horsepower and miles per gallon.
plot(mtcars$hp, mtcars$mpg,
main = "Scatter Plot of MPG vs Horsepower",
xlab = "Horsepower", ylab = "Miles per Gallon (MPG)",
pch = 19, col = "blue")
Plotting mpg versus hp with custom styling using ggplot2.
ggplot2 Scatter Plot
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(color = "blue", size = 3) +
labs(title = "Scatter Plot of MPG vs Horsepower", x = "Horsepower", y = "Miles per Gallon (MPG)") +
theme_minimal()
Points are colored based on the number of cylinders (4, 6, 8 cylinders). Each group (cylinder type) has a different color for better distinction.
Scatter plot of mpg vs hp, where points are colored by the number of cylinders (cyl).
ggplot(mtcars, aes(x = hp, y = mpg, color = factor(cyl))) +
geom_point(size = 3) +
labs(title = "MPG vs Horsepower by Number of Cylinders", x = "Horsepower", y = "Miles per Gallon (MPG)") +
scale_color_manual(values = c("red", "green", "blue")) +
theme_light()
Adding a smooth trend line to the scatter plot to visualize the relationship. The red trend line (linear regression) helps show the overall relationship between horsepower and MPG.
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(color = "blue", size = 3) +
geom_smooth(method = "lm", color = "red", linewidth = 1) +
labs(title = "MPG vs Horsepower with Linear Trend Line", x = "Horsepower", y = "Miles per Gallon (MPG)") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
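Jittering adds a small amount of random noise to separate points that overlap, for example when several cars have the same mpg or hp values. The width and height arguments are given in data units, so 0.2 is a very small shift on the horsepower axis; larger values spread the points further at the cost of positional accuracy.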
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_jitter(color = "blue", size = 3, width = 0.2, height = 0.2) +
labs(title = "MPG vs Horsepower with Jitter", x = "Horsepower", y = "Miles per Gallon (MPG)") +
theme_minimal()
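Two things are worth checking when jittering: how many points actually overlap, and whether the plot is reproducible. The following sketch counts exact duplicates and uses position_jitter() with a fixed seed. The jitter amounts (5 horsepower, 0.5 MPG) and the seed are arbitrary values chosen for illustration.

```r
library(ggplot2)

cars <- datasets::mtcars

# Number of rows whose (hp, mpg) pair repeats an earlier row.
n_overlapping <- sum(duplicated(cars[, c("hp", "mpg")]))
print(n_overlapping)

# A fixed seed gives the same jittered positions on every run.
p_jitter <- ggplot(cars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue", size = 3, alpha = 0.6,
             position = position_jitter(width = 5, height = 0.5, seed = 123)) +
  labs(title = "MPG vs Horsepower with Reproducible Jitter",
       x = "Horsepower", y = "Miles per Gallon (MPG)") +
  theme_minimal()

print(p_jitter)
```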
This plot adds a regression line with its confidence interval to the scatter plot. The confidence interval (the shaded area around the red line, 95% by default) shows the range within which the true regression line is expected to fall.
ggplot(mtcars, aes(x = hp, y = mpg)) +
geom_point(color = "blue", size = 3) +
geom_smooth(method = "lm", se = TRUE, color = "red", linewidth = 1) +
labs(title = "MPG vs Horsepower with Regression Line and Confidence Interval",
x = "Horsepower", y = "Miles per Gallon (MPG)") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
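The shaded band can be reproduced numerically with predict(). A minimal sketch, assuming three example horsepower values (100, 200, 300) picked only for illustration:

```r
cars <- datasets::mtcars

# Same linear model that geom_smooth(method = "lm") fits.
fit_hp <- lm(mpg ~ hp, data = cars)

# Fitted MPG with the lower and upper limits of the 95% confidence interval.
new_hp <- data.frame(hp = c(100, 200, 300))
band <- predict(fit_hp, newdata = new_hp, interval = "confidence", level = 0.95)
print(cbind(new_hp, band))
```

The fit column is the height of the regression line at each horsepower value, and lwr and upr are the lower and upper limits of the confidence interval.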
In this tutorial, we explored different methods for visualizing data in R. We started with basic plotting techniques in Base R and then moved to more advanced visualizations using ggplot2.
Base R plots such as bar plots, box plots, line plots, and scatter plots provide a quick way to visualize data and are useful for exploratory data analysis.
ggplot2 allows for more refined and customizable visualizations. It provides a wide range of features, such as regression lines, faceting, and aesthetic mappings, which help present data more clearly and insightfully.