The base R scatterplot shows a negative correlation between a car’s fuel efficiency and the number of cylinders in its engine. Vehicles with fewer cylinders typically achieve better miles per gallon, whereas vehicles with more cylinders typically achieve lower mileage. This pattern makes intuitive sense: engines with more cylinders tend to be larger and more powerful, and they burn more fuel. The base R plot is a quick way to spot this trend and get a general sense of how the two variables are related.
# Load the mtcars dataset
data(mtcars)
# Base R scatterplot
plot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders", ylab = "Miles per Gallon")
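The strength of the negative relationship can be put into a single number with a correlation coefficient. The sketch below uses base R's cor(), lm(), and abline(). The name r and the red fitted line are illustrative additions rather than part of the original analysis. Because cyl only takes the values 4, 6, and 8, a straight line is a rough visual guide, not a serious model of the data.

```r
# Load the built-in mtcars dataset
data(mtcars)

# Pearson correlation between fuel efficiency and cylinder count
r <- cor(mtcars$mpg, mtcars$cyl)
print(round(r, 3))  # prints -0.852

# Redraw the scatterplot and overlay a least-squares line as a visual guide
plot(mpg ~ cyl, data = mtcars,
     xlab = "Number of Cylinders", ylab = "Miles per Gallon")
abline(lm(mpg ~ cyl, data = mtcars), col = "red", lwd = 2)
```

A coefficient this close to -1 confirms that the downward trend visible in the plot is strong, not just a visual impression.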
The boxplot shows the same pattern group by group. Four-cylinder cars have the highest median fuel efficiency (26.0 mpg), six-cylinder cars sit in the middle (19.7 mpg), and eight-cylinder cars have the lowest (15.2 mpg). The spread runs the other way from what one might expect: the four-cylinder group has the tallest box and the widest range, while the eight-cylinder group is tightly clustered apart from two low outliers at 10.4 mpg. This supports the scatterplot’s finding of an inverse relationship between the number of cylinders and fuel economy. A boxplot is useful for visually comparing distributions across categories and for spotting outliers or differences in variability.
# Base R boxplot
# notch = FALSE: with only 7 to 14 cars per group, the notches extend past the
# box hinges, which made R warn "some notches went outside hinges"
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of Cylinders", ylab = "Miles per Gallon",
        main = "Boxplot of Fuel Efficiency by Number of Cylinders",
        col = "lightblue", border = "black", notch = FALSE)
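To check exactly what the boxplot is drawing, the underlying statistics can be computed directly. This sketch relies on base R's tapply() and on boxplot(..., plot = FALSE), which returns the statistics instead of drawing them. The names medians, b, and box_heights are illustrative.

```r
data(mtcars)

# Median mpg in each cylinder group (the thick line inside each box)
medians <- tapply(mtcars$mpg, mtcars$cyl, median)
print(medians)

# Compute the statistics boxplot() would draw, without drawing anything
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)

# Box height (spread between the hinges) for each group, labelled by cylinders
box_heights <- setNames(b$stats[4, ] - b$stats[2, ], b$names)
print(box_heights)

# Values plotted individually as outliers, and the group each belongs to
print(b$out)
print(b$names[b$group])
```

Comparing box_heights across groups is a direct way to see which cylinder group varies the most, rather than judging box sizes by eye.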
The ggplot scatterplot shows the same inverse relationship between the number of cylinders and fuel economy as the base R plot, with more flexibility and a more polished look. The theme_classic() theme gives it a cleaner, more contemporary appearance, the points are slightly larger (size = 3), and they are drawn with an alpha of 0.7, so the partial transparency makes overlapping points show up as darker spots. ggplot2 also makes it easy to compare data across categories (in this case, the number of cylinders) and to produce publication-quality graphics.
# ggplot graph
library(ggplot2)
ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_classic()
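Because cyl is stored as a number, ggplot treats it as a continuous axis and the 32 cars stack in three columns, where cars with identical mileage overlap exactly. A minimal sketch of one way to make the category comparison clearer, assuming ggplot2 3.3.0 or later (for the fun, fun.min, and fun.max arguments of stat_summary()): convert cyl to a factor, jitter the points sideways only, and mark each group's median with a red bar. The object name p and the seed value are illustrative.

```r
library(ggplot2)

# Fix the random jitter so the plot looks the same on every run
set.seed(42)

# Treat cyl as a discrete category rather than a continuous number
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  # Spread points sideways only; height = 0 keeps the mpg values exact
  geom_jitter(width = 0.15, height = 0, size = 3, alpha = 0.7) +
  # A flat crossbar at each group's median mpg
  stat_summary(fun = median, fun.min = median, fun.max = median,
               geom = "crossbar", width = 0.5, colour = "red") +
  labs(x = "Number of Cylinders", y = "Miles per Gallon") +
  theme_classic()

print(p)
```

Jittering only horizontally is a deliberate choice: it separates overlapping points without changing the values being compared.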
Unlike the ggplot scatterplot, the plotly scatterplot is interactive: the user can zoom in on particular areas of the plot, hover over points to see their exact values, pan, select points with the box or lasso tools, and download the current view as a PNG image from the toolbar. The plotly graph reinforces the inverse relationship between the number of cylinders and fuel economy while giving the reader a more engaging way to explore the data.
# Plotly graph
library(plotly)
plot_ly(mtcars, x = ~cyl, y = ~mpg, type = 'scatter', mode = 'markers',
        marker = list(size = 10, opacity = 0.8)) %>%
  layout(xaxis = list(title = "Number of Cylinders"),
         yaxis = list(title = "Miles per Gallon"))
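Hover labels become more informative when they name each car. This sketch passes the plotly.js trace attributes text and hovertemplate through plot_ly(); the car_data data frame, its model column, and the fig object are introduced here for illustration.

```r
library(plotly)

# Copy mtcars and store each car's name (the row names) in a column
car_data <- mtcars
car_data$model <- rownames(mtcars)

# Same scatterplot as above, but the hover label shows the model name;
# <extra></extra> hides the secondary box that would show the trace name
fig <- plot_ly(car_data, x = ~cyl, y = ~mpg, type = "scatter", mode = "markers",
               text = ~model,
               hovertemplate = "%{text}<br>Cylinders: %{x}<br>MPG: %{y}<extra></extra>",
               marker = list(size = 10, opacity = 0.8)) %>%
  layout(xaxis = list(title = "Number of Cylinders"),
         yaxis = list(title = "Miles per Gallon"))

fig
```

With the model names available on hover, a reader can identify individual cars, such as the low-mileage eight-cylinder outliers, without leaving the plot.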
Four plots were used to analyze the relationship between the number of cylinders and fuel economy in the mtcars dataset: a base R scatterplot, a base R boxplot, a ggplot scatterplot, and a plotly scatterplot. Each had its own strengths: the base R scatterplot’s simplicity and speed, the boxplot’s insight into the distributions, the ggplot scatterplot’s customization and aesthetics, and the plotly scatterplot’s interactivity. Together, these plots gave a thorough understanding of how the two variables relate, demonstrating the value of different visualization techniques in data analysis.