Histogram

We use the data frame “mtcars” already available from R to demonstrate the creation of a few commonly used graphs. The data are displayed below:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

The data can also be displayed with the code below using the DT package:

library(DT)  # provides datatable()
datatable(mtcars)

Now, we create a histogram for the mpg column. Note that a histogram or boxplot is only for numeric data.

library(ggplot2)  # provides ggplot() and the geom_*() functions used below

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 5, fill = "blue", color = "black", alpha = 0.7) +
  labs(title = "Histogram of MPG",
       x = "Miles Per Gallon (MPG)",
       y = "Frequency")

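Explanation of the above code:

ggplot(mtcars, aes(x = mpg)) starts the plot and maps mpg to the x-axis. geom_histogram() draws the bars: bins = 5 splits the range of mpg into five intervals, fill and color set the bar fill and outline colors, and alpha = 0.7 makes the bars slightly transparent. labs() sets the title and the axis labels.

Interpretation of results:

Each bar shows how many of the 32 cars fall into one mpg interval. Most cars get between roughly 15 and 25 miles per gallon and only a few exceed 30, so the distribution is skewed to the right.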
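To check the histogram numerically, the bin counts can be computed without drawing anything. The sketch below uses base R's hist() with plot = FALSE. The break points 10, 15, ..., 35 are an assumption chosen for round numbers, so these bins will not coincide exactly with the five bins that ggplot2 picks.

```r
# Tabulate mpg into five bins of width 5 (break points are an illustrative choice)
breaks <- seq(10, 35, by = 5)
h <- hist(mtcars$mpg, breaks = breaks, plot = FALSE)

# One row per bin: the interval (closed on the right) and the number of cars in it
bin_counts <- data.frame(
  interval = paste0("(", head(breaks, -1), ", ", tail(breaks, -1), "]"),
  count    = h$counts
)
print(bin_counts)
```

The counts sum to the number of rows in mtcars, which is a quick sanity check that no car fell outside the chosen breaks.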

Box Plot

ggplot(mtcars, aes(x = as.factor(cyl), y = mpg)) +
  geom_boxplot(fill = "lightblue", color = "black") +
  labs(title = "Box Plot of MPG by Number of Cylinders",
       x = "Number of Cylinders",
       y = "Miles Per Gallon (MPG)")

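Interpretation of results:

as.factor(cyl) treats the number of cylinders as a category, so one box is drawn for each of the three groups. Fuel economy drops as the number of cylinders rises: 4-cylinder cars have the highest median mpg and 8-cylinder cars the lowest. The 4-cylinder group also shows the widest spread.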
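The statistics behind the boxes can be computed directly. This is a minimal sketch in base R; note that fivenum() returns Tukey's hinges and the raw minimum and maximum, which can differ slightly from the quartiles and the 1.5 x IQR whiskers that geom_boxplot() draws.

```r
# Median mpg for each cylinder count (the line inside each box)
median_mpg <- tapply(mtcars$mpg, mtcars$cyl, median)
print(median_mpg)

# Five-number summary per group: min, lower hinge, median, upper hinge, max
five_num <- sapply(split(mtcars$mpg, mtcars$cyl), fivenum)
rownames(five_num) <- c("min", "lower", "median", "upper", "max")
print(five_num)
```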

Scatter Plot

ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  geom_text(aes(label = 1:32), vjust = -0.5, hjust = -0.5) +
  labs(title = "Scatter Plot of MPG vs. Displacement",
       x = "Miles Per Gallon (MPG)",
       y = "Displacement")

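Explanation of the above code:

geom_point() draws one point per car. geom_text() adds a label next to each point; here the label is the row number of the car in mtcars (1 to 32), and vjust and hjust offset the label so that it does not sit on top of the point.

Explanation of results:

The points run from the upper left to the lower right: cars with larger engine displacement tend to get fewer miles per gallon. The relationship is strong and appears somewhat curved rather than perfectly linear.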
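The direction and strength of the relationship can be summarized with a correlation coefficient, and a numeric point label can be traced back to a car name. The sketch below uses only base R; the choice of label 15 is arbitrary.

```r
# Pearson correlation between fuel economy and displacement
r <- cor(mtcars$mpg, mtcars$disp)
print(round(r, 3))

# Look up the car behind a numeric label, e.g. point 15 in the plot
car_15 <- rownames(mtcars)[15]
print(car_15)
```

A strongly negative value agrees with the downward pattern in the plot. Because Pearson correlation measures linear association, it may understate a curved relationship.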

Interactive Scatter Plot using plotly

library(plotly)
# We store a scatterplot in an object called p
p <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  labs(title = "Scatter Plot of MPG vs. Displacement",
       x = "Miles Per Gallon (MPG)",
       y = "Displacement")

# Make the plot interactive
ggplotly(p)

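Explanation of the above code:

The ggplot object is first stored in p instead of being printed. ggplotly(p) then converts it into an interactive plotly graph in which you can hover over points to see their values, zoom, and pan.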

Scatterplot Matrix

pairs(mtcars[c("mpg", "disp", "hp", "wt")])
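Each panel of the scatterplot matrix shows one pair of variables. A numeric companion is the correlation matrix of the same four columns, sketched here with base R; the rounding to two decimals is only for readability.

```r
# Pairwise correlations for the variables shown in the scatterplot matrix
vars <- c("mpg", "disp", "hp", "wt")
cor_subset <- round(cor(mtcars[vars]), 2)
print(cor_subset)
```

Entry [i, j] of the matrix corresponds to the panel in row i and column j of the plot, so a negative entry should match a downward-sloping cloud of points.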

Barplot Based on Individual Data

ggplot(mtcars, aes(x = cyl)) +
  geom_bar() +
  labs(title = "Distribution of the Number of Cylinders",
       x = "Number of Cylinders") + 
  theme(plot.title = element_text(hjust = 0.5))

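Explanation of the above code:

geom_bar() counts the rows for each value of cyl, so only the x aesthetic is needed and the bar heights are the counts. theme(plot.title = element_text(hjust = 0.5)) centers the title.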
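The bar heights are simply the number of cars with each cylinder count, and base R's table() gives the same counts. This is a small sketch for checking the plot.

```r
# Count the cars for each number of cylinders
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```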

Barplot Based on Summary Data

# Calculate average mpg for each number of cylinders
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Barplot
ggplot(avg_mpg, aes(x = cyl, y = mpg)) +
  geom_col() +
  labs(title = "Average MPG by Number of Cylinders",
       x = "Number of Cylinders",
       y = "Average Miles Per Gallon (MPG)") + 
  theme(plot.title = element_text(hjust = 0.5))

Time Series Plot

Suppose the stock price of 3M Company has a mean of 106 and a standard deviation of 0.27. The standard deviation measures how spread out the prices or returns of an asset are around their mean, and it is a widely used risk indicator in investing and finance.

We generate 100 monthly observations, starting in February 1998, from a normal distribution with mean 106 and standard deviation 0.27. Because the values are random, your numbers will differ from the output shown here.

stock = ts(rnorm(100, 106, 0.27), frequency = 12, start = c(1998, 2))
stock
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 1998          105.6254 105.3020 106.0901 106.3761 105.5460 105.7923 106.1339
## 1999 105.8000 105.9569 106.1227 106.1920 106.3724 105.7462 105.5297 105.8498
## 2000 106.4730 105.7630 105.6875 105.7641 106.0937 105.7381 105.4277 106.0798
## 2001 106.0806 105.9288 105.4425 106.0464 105.7975 106.3408 105.6732 105.9791
## 2002 105.9559 105.8260 106.0372 106.0449 105.8435 105.8834 105.6386 105.8515
## 2003 105.8820 106.2117 106.0813 106.0965 105.6684 105.8549 105.9249 105.8704
## 2004 105.7583 106.3315 106.3935 106.0003 106.2597 106.0432 106.4831 106.1453
## 2005 105.8015 106.0931 105.7109 106.5101 106.4558 105.9197 105.5926 106.1594
## 2006 106.2911 105.7376 105.7855 106.3559 105.8911                           
##           Sep      Oct      Nov      Dec
## 1998 105.9876 105.7674 105.8960 105.9745
## 1999 106.2392 105.8667 106.1186 105.8361
## 2000 106.5948 105.7216 105.9366 106.2797
## 2001 106.3516 105.5821 105.8648 106.0571
## 2002 106.1627 105.8432 105.9771 105.9157
## 2003 105.8601 106.0661 106.2748 106.1002
## 2004 106.1806 106.3307 106.1268 106.1987
## 2005 105.9013 105.7073 105.4566 106.0048
## 2006
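To make a simulation like this repeatable, a seed can be fixed before the random draw. The sketch below is an illustration only: the seed value 2024 and the name sim_stock are assumptions, and the resulting numbers will not match the output printed above.

```r
# Fix the random number generator so the same series is produced on every run
set.seed(2024)  # arbitrary seed chosen for this sketch
sim_stock <- ts(rnorm(100, mean = 106, sd = 0.27),
                frequency = 12, start = c(1998, 2))

# 100 monthly values starting in February 1998 end in May 2006
print(start(sim_stock))
print(end(sim_stock))

# The sample mean and standard deviation should be close to 106 and 0.27
print(c(mean = mean(sim_stock), sd = sd(sim_stock)))
```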

Plot the time series:

library(forecast)
autoplot(stock) 

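Explanation of the above code:

Loading the forecast package makes autoplot() work on ts objects. autoplot(stock) draws the series as a ggplot2 line chart with time on the x-axis and the simulated price on the y-axis.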

To plot the monthly means, we do

monthly_stock_means = tapply(stock, cycle(stock), mean)
plot(monthly_stock_means, type = "l", xlab = "Month", ylab = "Mean Stock Price", main = "Monthly Means of 3M Stock Price")

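Explanation of the above code:

cycle(stock) gives the position of each observation within the year (1 for January through 12 for December). tapply() groups the prices by that position and takes the mean of each group, and plot() with type = "l" connects the 12 monthly means with a line.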
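The grouping done by cycle() and tapply() is easier to see on a series whose values are known in advance. The following sketch uses a hypothetical two-year series named toy holding the numbers 1 to 24.

```r
# Two years of monthly data: January 2000 = 1, ..., December 2001 = 24
toy <- ts(1:24, frequency = 12, start = c(2000, 1))

# cycle() returns the month position (1 to 12) of every observation
print(cycle(toy))

# Average the two values that share each month position
toy_means <- tapply(toy, cycle(toy), mean)
print(toy_means)
```

January averages 1 and 13, February averages 2 and 14, and so on, so each monthly mean is the average of the two observations that fall in that month.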

Another way:

Month = cycle(stock)
monthly_stock_means = aggregate(stock~Month, FUN=mean)
monthly_stock_means$Month = factor(monthly_stock_means$Month, labels = month.abb)

ggplot(monthly_stock_means, aes(x = Month, y = stock)) +
  geom_line(group = 1)

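Explanation of the above code:

The month positions are stored in Month, aggregate() computes the mean price for each month and returns a data frame, and factor() replaces the month numbers with the abbreviations in month.abb. Because Month is now a factor, group = 1 tells geom_line() to connect all 12 points with a single line.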

Heatmap

correlation_matrix <- cor(mtcars)
heatmap(correlation_matrix, col = heat.colors(20), main = "Correlation Heatmap")

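Explanation of the above code:

cor(mtcars) computes the matrix of pairwise correlations between all columns. heatmap() draws that matrix as a grid of colored cells, reorders the rows and columns by hierarchical clustering, and shows the resulting dendrograms in the margins. col = heat.colors(20) selects a palette of 20 colors.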

Explanation of the colors in the heatmap:

In a heatmap, the color of a cell encodes the magnitude of its value. In a correlation heatmap, the colors indicate the strength and direction of the correlation between pairs of variables. The exact mapping depends on the palette, so always check it against the palette definition or the legend. With a diverging palette, the usual reading is:

Color: One end of the palette, often a shade of blue or green. Interpretation: High positive correlation. As one variable increases, the other variable tends to increase as well.

Color: The opposite end of the palette, often a shade of red or orange. Interpretation: High negative correlation. As one variable increases, the other variable tends to decrease.

Color: Neutral or close to white. Interpretation: Little to no correlation. Changes in one variable do not systematically predict changes in the other.

Color: The most intense shades. Interpretation: Correlation close to +1 or -1. In general, more saturated colors mean stronger correlation and paler colors mean weaker correlation.

The plot above does not use a diverging palette. heat.colors(20) is a sequential palette that runs from red for the lowest values through orange and yellow to pale yellow for the highest values. In addition, heatmap() standardizes each row by default (scale = "row"), so the colors show values relative to the other entries in the same row rather than the raw correlation coefficients.
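To see which end of the palette is which, and to color the cells by the raw correlations, one option is to inspect the palette and turn off the scaling. This is a sketch using base R only; the arguments scale = "none" and symm = TRUE are a suggested variation, not the call used above.

```r
cm <- cor(mtcars)
pal <- heat.colors(20)

# RGB components of the first and last palette colors:
# the first is pure red (low values), the last is pale yellow (high values)
print(col2rgb(pal[c(1, 20)]))

# Color cells by the raw correlations and treat the matrix as symmetric
heatmap(cm, scale = "none", symm = TRUE, col = pal,
        main = "Correlation Heatmap (unscaled)")
```

With scale = "none", red cells mark the most negative correlations and pale yellow cells the most positive ones, including the diagonal of ones.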

A heatmap can also be used to show missing values in a data frame.

data <- data.frame(x = c(2, 8, 9, NA, 3, NA, NA, 9),
                   y = c(6, NA, 9, 5, NA, 3, 3, NA),
                   z = c(6, NA, 5, 5, 8, NA, 9, 0))
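# Create a heatmap of missing values (1 = missing, 0 = observed).
# scale = "none" keeps the 0/1 coding, so white = observed and red = missing.
heatmap(is.na(data) * 1, scale = "none", col = c("white", "red"), main = "Missing Values Heatmap")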

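Explanation of the above code:

is.na(data) returns a logical matrix that is TRUE wherever a value is missing. Multiplying it by 1 converts it to 0s and 1s so that heatmap() can plot it. col = c("white", "red") assigns white to the low values and red to the high values.

Explanation of the results:

Each row of the heatmap is an observation and each column is a variable. Red cells mark missing values. Because heatmap() reorders rows and columns by clustering, the row order in the plot generally differs from the order in the data frame; the row labels show the original row numbers.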
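The same information can be summarized as counts. The sketch below rebuilds the example data frame under the name df (a name chosen here to keep the block self-contained) and counts the missing values per column and per row.

```r
df <- data.frame(x = c(2, 8, 9, NA, 3, NA, NA, 9),
                 y = c(6, NA, 9, 5, NA, 3, 3, NA),
                 z = c(6, NA, 5, 5, 8, NA, 9, 0))

# Number of missing values in each column and in each row
na_per_column <- colSums(is.na(df))
na_per_row <- rowSums(is.na(df))
print(na_per_column)
print(na_per_row)

# Number of rows without any missing value
complete_rows <- sum(complete.cases(df))
print(complete_rows)
```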

Tree Maps

The following code creates a tree map visualization of the GNI (Gross National Income) per capita data for the year 2014.

library(treemap)
data(GNI2014)

treemap(GNI2014,
        index=c("continent", "iso3"),  # You can try index=c("iso3") to see the difference
        vSize="population",
        vColor="GNI",
        type="manual", 
        palette = "RdYlGn"
       )

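The call to treemap() breaks down as follows. index = c("continent", "iso3") defines the hierarchy: countries, identified by their three-letter ISO codes, are nested within continents. vSize = "population" makes the area of each rectangle proportional to the country's population. vColor = "GNI" colors each rectangle by GNI per capita. type = "manual" maps the vColor values directly onto the palette, and palette = "RdYlGn" selects a red-yellow-green palette, so that lower-income countries appear red and higher-income countries green.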

Interpretation of results:

This treemap provides a hierarchical and visually appealing representation of GNI per capita data, allowing for quick comparisons and insights into the distribution of GNI per capita and population across continents and countries.

Visualizing Geographic Data

There are many R packages, such as plotly and leaflet, for visualizing geographic data. We introduce the plot_ly() function from the plotly package.

library(plotly)

# Sample data: Store locations and sales
store_data <- data.frame(Store = c("Store A", "Store B", "Store C"),
                         Latitude = c(37.7749, 34.0522, 40.7128),
                         Longitude = c(-122.4194, -118.2437, -74.0060),
                         Sales = c(12000, 15000, 18000))

# Create an interactive map using Plotly
map <- plot_ly(data = store_data, type = "scattergeo", mode = "markers",
               lat = ~Latitude, lon = ~Longitude, text = ~Store,
               marker = list(size = ~Sales / 1000, color = ~Sales, colorscale = "Viridis"))

# Customize map layout and annotation
map %>% layout(
  geo = list(showland = TRUE),
  title = "Interactive Map of Store Locations and Sales",
  annotations = list(
    list(x = 0.5, y = -0.1, text = "Marker size indicates sales amount", showarrow = FALSE),
    list(x = 0.5, y = -0.15, text = "Hover over markers to view store details", showarrow = FALSE)
  )
)

Network Graph (Example: Random graph for illustration)

This is for fun.

library(igraph)
set.seed(123)
# sample_gnp() replaces erdos.renyi.game(), which is deprecated in recent igraph versions
graph <- sample_gnp(10, p = 0.3)
plot(graph, layout = layout_with_fr)
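The random graph above follows the Erdős–Rényi G(n, p) model: each possible pair of vertices is joined independently with probability p. The sketch below imitates that rule in base R to show the idea. It is not igraph's implementation, the names all_pairs and edge_list are chosen here for illustration, and it will not reproduce the same graph even with the same seed.

```r
set.seed(123)
n <- 10   # number of vertices
p <- 0.3  # probability that any given pair is connected

# All possible undirected edges: one row per pair (i, j) with i < j
all_pairs <- t(combn(n, 2))

# Keep each pair independently with probability p
keep <- runif(nrow(all_pairs)) < p
edge_list <- all_pairs[keep, , drop = FALSE]

# Number of edges expected on average under the model
expected_edges <- choose(n, 2) * p
print(nrow(edge_list))
print(expected_edges)
```

With 10 vertices there are 45 possible edges, so about 13.5 edges are expected on average; any single draw will vary around that value.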

Arrange Plots in a Matrix

We will use the R package “gridExtra”.

library(gridExtra)

# Create individual plots
plot1 <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  labs(title = "Scatter Plot of MPG vs. Displacement")

plot2 <- ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  labs(title = "Scatter Plot of MPG vs. Horsepower")

plot3 <- ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point() +
  labs(title = "Scatter Plot of MPG vs. Weight")

plot4 <- ggplot(mtcars, aes(x = mpg, y = qsec)) +
  geom_point() +
  labs(title = "Scatter Plot of MPG vs. Quarter Mile Time")

# Arrange the plots in a 2x2 matrix
grid.arrange(plot1, plot2, plot3, plot4, ncol = 2)

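Explanation of the above code:

Each scatter plot is stored in its own object. grid.arrange() from gridExtra then places the four plots on one page; ncol = 2 sets two columns, so the four plots fill a 2 x 2 grid row by row.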
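grid.arrange() works with ggplot2 objects. For base R graphics, a comparable layout can be obtained with par(mfrow = ...). The sketch below is an alternative shown for comparison, not part of the gridExtra workflow.

```r
# Split the plotting region into 2 rows and 2 columns, remembering the old setting
old_par <- par(mfrow = c(2, 2))

plot(mtcars$mpg, mtcars$disp, xlab = "MPG", ylab = "Displacement",
     main = "MPG vs. Displacement")
plot(mtcars$mpg, mtcars$hp, xlab = "MPG", ylab = "Horsepower",
     main = "MPG vs. Horsepower")
plot(mtcars$mpg, mtcars$wt, xlab = "MPG", ylab = "Weight",
     main = "MPG vs. Weight")
plot(mtcars$mpg, mtcars$qsec, xlab = "MPG", ylab = "Quarter Mile Time",
     main = "MPG vs. Quarter Mile Time")

# Restore the previous layout so later plots use the full device again
par(old_par)
```

Restoring the saved setting at the end keeps the 2 x 2 layout from affecting plots drawn afterwards.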