Set up RStudio

Setting up R Markdown in RStudio lets you create dynamic, reproducible, and visually appealing reports, presentations, and documents that help you communicate your data analysis and research findings more effectively.

An introduction to heatmaps

A heatmap is a graphical representation of data that displays the relative intensity of values in a matrix as colors. It is commonly used to visualize complex data sets and identify patterns, correlations, and outliers. Heatmaps are especially useful when dealing with large data sets or data with high dimensionality, as they can provide an overview of the data at a glance.

Heatmaps consist of a grid of cells, where each cell represents a data point in the matrix. The color of each cell represents the value of the data point, with a gradient of colors used to indicate increasing or decreasing values. Typically, a color scale is used to map values to colors, with low values mapped to cool colors like blue, and high values mapped to warm colors like red.

Heatmaps can be used to visualize a wide range of data types, including gene expression, demographic data, financial data, and more. They can also be customized in various ways, such as by adding row and column labels, changing the color scale or color scheme, and adjusting the layout and size of the plot.

In R, heatmaps can be created using the heatmap() function, which takes a numeric matrix as input (a data frame must first be converted, for example with as.matrix()) and produces a plot with the relative intensity of values displayed as colors. Heatmaps are a powerful tool for exploring and visualizing complex data sets, and are widely used in data analysis and research.
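
As a minimal sketch of the input heatmap() expects, the snippet below uses the built-in mtcars data frame (an illustrative choice, not part of the examples that follow) and converts it to a numeric matrix first, since heatmap() stops with an error when its input is not a numeric matrix.

```r
# mtcars ships with R; it is used here purely for illustration
df <- mtcars[1:10, c("mpg", "hp", "wt", "qsec")]

# heatmap() needs a numeric matrix, so convert the data frame first
m <- as.matrix(df)

# scale = "column" standardizes each variable so that columns with large
# units (such as hp) do not dominate the color scale
heatmap(m, scale = "column")
```

Scaling by column is a choice made for this data, where each column is a different variable; for a matrix whose cells share one unit, other settings may be more appropriate.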

##### Example data
set.seed(123)                                                     # Set seed for reproducibility
data<- matrix(rnorm(100, 0, 10), nrow = 10, ncol = 10)           # Create example data

The code above creates a 10x10 matrix named data with random values generated from a normal distribution with a mean of 0 and a standard deviation of 10.

Next, add column and row names and preview the first five rows:

colnames(data)<- paste0("col", 1:10)                             # Column names
rownames(data)<- paste0("row", 1:10)                             # Row names
head(data,5)
           col1      col2       col3      col4       col5       col6       col7
row1 -5.6047565 12.240818 -10.678237  4.264642  -6.947070  2.5331851   3.796395
row2 -2.3017749  3.598138  -2.179749 -2.950715  -2.079173 -0.2854676  -5.023235
row3 15.5870831  4.007715 -10.260044  8.951257 -12.653964 -0.4287046  -3.332074
row4  0.7050839  1.106827  -7.288912  8.781335  21.689560 13.6860228 -10.185754
row5  1.2928774 -5.558411  -6.250393  8.215811  12.079620 -2.2577099 -10.717912
           col8        col9     col10
row1  -4.910312  0.05764186  9.935039
row2 -23.091689  3.85280401  5.483970
row3  10.057385 -3.70660032  2.387317
row4  -7.092008  6.44376549 -6.279061
row5  -6.880086 -2.20486562 13.606524
##### Example 1
heatmap(data)  

##### Example 2
heatmap(data, Rowv = NA, Colv = NA)                               # Remove dendrogram

##### Example 3
my_colors<- colorRampPalette(c("cyan", "deeppink3"))             # Manual color range
heatmap(data, col = my_colors(100))                               # Heatmap with manual colors
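
One detail worth knowing when reading the plots above: by default heatmap() centers and scales each row before coloring (scale = "row"), so colors compare values within a row rather than across the whole matrix. The sketch below rebuilds the same example matrix under the name m, so that it runs on its own, and turns scaling off to color by the raw values.

```r
set.seed(123)
m <- matrix(rnorm(100, 0, 10), nrow = 10, ncol = 10)
dimnames(m) <- list(paste0("row", 1:10), paste0("col", 1:10))

# Default: each row is centered and scaled before colors are assigned
heatmap(m)

# scale = "none": colors reflect the raw values across the whole matrix
heatmap(m, scale = "none")
```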

##### Example 4
# install.packages("reshape")                                     # Install reshape package (if needed)
library(reshape)                                                # Load reshape package

data_melt <- melt(data)                                           # Reshape data to long format
library(ggplot2)                                                # Load ggplot2 package

ggp <- ggplot(data_melt, aes(X1, X2)) +                           # Create heatmap with ggplot2
  geom_tile(aes(fill = value))
ggp                                                               # Print heatmap

##### Example 5
ggp + scale_fill_gradient(low = "green", high = "black")          # Manual colors of heatmap

##### Example 6
library(plotly)                                                 # Load plotly package

plot_ly(z = data, type = "heatmap")                               # Apply plot_ly function

##### Example 7
plot_ly(z = data, colorscale = "Greys", type = "heatmap")         # Manual colors
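
When only z is supplied, plotly labels the axes with plain indices. As a sketch (the matrix is rebuilt under the name m so the snippet runs on its own), the row and column names can be passed through x and y:

```r
library(plotly)

set.seed(123)
m <- matrix(rnorm(100, 0, 10), nrow = 10, ncol = 10)
dimnames(m) <- list(paste0("row", 1:10), paste0("col", 1:10))

# x labels the columns of z and y labels the rows
p <- plot_ly(x = colnames(m), y = rownames(m), z = m, type = "heatmap")
p
```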

Method 1: Using geom_tile Function [ggplot2 Package]

# load required packages
library(ggplot2)
library(reshape2)

# create example data
example_data <- matrix(rnorm(100), nrow = 10)
head(example_data,10)
             [,1]        [,2]        [,3]        [,4]       [,5]       [,6]
 [1,] -0.71040656 -0.57534696  0.11764660  1.44455086  0.7017843  0.7877388
 [2,]  0.25688371  0.60796432 -0.94747461  0.45150405 -0.2621975  0.7690422
 [3,] -0.24669188 -1.61788271 -0.49055744  0.04123292 -1.5721442  0.3322026
 [4,] -0.34754260 -0.05556197 -0.25609219 -0.42249683 -1.5146677 -1.0083766
 [5,] -0.95161857  0.51940720  1.84386201 -2.05324722 -1.6015362 -0.1194526
 [6,] -0.04502772  0.30115336 -0.65194990  1.13133721 -0.5309065 -0.2803953
 [7,] -0.78490447  0.10567619  0.23538657 -1.46064007 -1.4617556  0.5629895
 [8,] -1.66794194 -0.64070601  0.07796085  0.73994751  0.6879168 -0.3724388
 [9,] -0.38022652 -0.84970435 -0.96185663  1.90910357  2.1001089  0.9769734
[10,]  0.91899661 -1.02412879 -0.07130809 -1.44389316 -1.2870305 -0.3745809
            [,7]        [,8]        [,9]       [,10]
 [1,]  1.0527115 -0.21538051 -1.06332613  0.21444531
 [2,] -1.0491770  0.06529303  1.26318518 -0.32468591
 [3,] -1.2601552 -0.03406725 -0.34965039  0.09458353
 [4,]  3.2410399  2.12845190 -0.86551286 -0.89536336
 [5,] -0.4168576 -0.74133610 -0.23627957 -1.31080153
 [6,]  0.2982276 -1.09599627 -0.19717589  1.99721338
 [7,]  0.6365697  0.03778840  1.10992029  0.60070882
 [8,] -0.4837806  0.31048075  0.08473729 -1.25127136
 [9,]  0.5168620  0.43652348  0.75405379 -0.61116592
[10,]  0.3689645 -0.45836533 -0.49929202 -1.18548008
# convert data to long format
melted_data <- melt(example_data)
head(melted_data)
  Var1 Var2       value
1    1    1 -0.71040656
2    2    1  0.25688371
3    3    1 -0.24669188
4    4    1 -0.34754260
5    5    1 -0.95161857
6    6    1 -0.04502772
# create heatmap using ggplot2 (Var1 is the row index, Var2 the column index)
ggplot(melted_data, aes(x = Var2, y = Var1, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  labs(x = "Column", y = "Row", title = "Example Heatmap")
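
Because rnorm() produces values centered on zero, a two-color gradient hides the difference between negative and positive cells. The following sketch, which builds its own long-format data with base R and an arbitrary seed, uses scale_fill_gradient2() to give a diverging scale with white at zero.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
m <- matrix(rnorm(100), nrow = 10)

# Long format built with base R: one row per cell of the matrix
long <- data.frame(
  row = as.vector(row(m)),
  col = as.vector(col(m)),
  value = as.vector(m)
)

p <- ggplot(long, aes(x = col, y = row, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  labs(x = "Column", y = "Row", title = "Diverging color scale")
p
```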

Method 2: Using plot_ly Function [plotly Package]

# load required packages
library(plotly)
library(reshape2)
# create example data
example_data1 <- matrix(rnorm(100), nrow = 10)
head(example_data1,5)
           [,1]       [,2]       [,3]        [,4]       [,5]        [,6]
[1,]  2.1988103  0.1192452 -0.5739735  1.95529397 -0.7886220 -0.37560287
[2,]  1.3124130  0.2436874  0.6179858 -0.09031959 -0.5021987 -0.56187636
[3,] -0.2651451  1.2324759  1.1098481  0.21453883  1.4960607 -0.34391723
[4,]  0.5431941 -0.5160638  0.7075884 -0.73852770 -1.1373036  0.09049665
[5,] -0.4143399 -0.9925072 -0.3636573 -0.57438869 -0.1790516  1.59850877
            [,7]       [,8]       [,9]     [,10]
[1,] -0.52111732  0.8450130 -1.6674751  1.168384
[2,] -0.48987045  0.9625280  0.7364960  1.054181
[3,]  0.04715443  0.6843094  0.3860266  1.145263
[4,]  1.30019868 -1.3952743 -0.2656516 -0.577468
[5,]  2.29307897  0.8496430  0.1181445  2.002483
# convert data to long format
melted_data1 <- melt(example_data1)
head(melted_data1,5)
  Var1 Var2      value
1    1    1  2.1988103
2    2    1  1.3124130
3    3    1 -0.2651451
4    4    1  0.5431941
5    5    1 -0.4143399
# create heatmap using plotly
plot_ly(melted_data1, x = ~Var2, y = ~Var1, z = ~value, type = "heatmap") %>%
  layout(title = "Example Heatmap", xaxis = list(title = "Column"), yaxis = list(title = "Row"))

Overall, each method has its own pros and cons. For instance, the heatmap() function is simple to use but has limited customization options, while ggplot2 and plotly offer more flexibility but require more code. Additionally, ggplot2 requires the data to be in a long format, which may require additional data manipulation steps; plotly accepts either long-format data, as in Method 2, or a matrix passed directly as z, as in Examples 6 and 7.
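
If reshape2 is not available, the long format can also be produced with base R. This is a small sketch on a hypothetical 2x3 matrix; as.data.frame() applied to a table names the index columns Var1 and Var2, matching the melt() output shown earlier.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# as.table() keeps the dimnames; as.data.frame() then returns one row per cell
long <- as.data.frame(as.table(m), responseName = "value")
long
```

Note that Var1 and Var2 come back as factors here, whereas melt() on a matrix without dimnames returns integer indices.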

Heatmap using coordinates

To create a heat map in R using coordinates, we can use the ggplot2 package. Here is a step-by-step guide on how to create a heat map using coordinates:

# load required packages
library(ggplot2)
library(reshape2)
# create example data with coordinates and values
x_coord <- c(1, 2, 3, 4)
y_coord <- c(1, 2, 3, 4)
value <- c(12, 15, 19, 30)
foo <- data.frame(x_coord, y_coord, value)
foo
  x_coord y_coord value
1       1       1    12
2       2       2    15
3       3       3    19
4       4       4    30
# convert data frame to matrix
matrix_data1 <- acast(foo, y_coord ~ x_coord, value.var = "value")
matrix_data1
   1  2  3  4
1 12 NA NA NA
2 NA 15 NA NA
3 NA NA 19 NA
4 NA NA NA 30
# create heat map using ggplot2
ggplot(melt(matrix_data1), aes(Var2, Var1, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  labs(x = "X Coordinate", y = "Y Coordinate", title = "Example Heat Map")
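
The round trip through acast() and melt() is not strictly required: geom_tile() can read the coordinate columns directly. A minimal sketch using the same coordinates and values as the example above:

```r
library(ggplot2)

# Same coordinates and values as the example above
foo <- data.frame(
  x_coord = c(1, 2, 3, 4),
  y_coord = c(1, 2, 3, 4),
  value   = c(12, 15, 19, 30)
)

# geom_tile() draws one tile per row; combinations that are absent from the
# data are simply left blank
p <- ggplot(foo, aes(x = x_coord, y = y_coord, fill = value)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  labs(x = "X Coordinate", y = "Y Coordinate", title = "Example Heat Map")
p
```

The visible difference is that the matrix route draws the NA cells as grey tiles, while this version leaves them empty.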

What are the uses of heatmaps

A heatmap depicts values as colors, which makes complex data understandable at a glance. Beyond statistical graphics, heatmaps are widely used in web analytics to track user behavior visually, to see what does or doesn’t work on a website or product page, and to guide data-driven changes. They can be created by hand, in Excel spreadsheets, or with product experience insights tools like Hotjar. The following are some of the use cases of heatmaps:

Website and Product Analysis

Heatmaps give product teams, marketers, digital and data analysts, UX designers, social media specialists, and anyone who sells anything online deep insights into people’s behavior on their site. Heatmaps facilitate data analysis by combining quantitative and qualitative data, and give a snapshot understanding of how your target audience interacts with an individual website or product page—what they click on, scroll through, or ignore—which helps you identify trends and optimize your product and site to increase user engagement and sales. Heatmaps also usually display the average fold, which is the portion of the page people see on their screen without scrolling as soon as they land on it.

Visualizing 2D Data

Heatmaps can be used to visualize 2D data, such as 2D density plots. 2D density plots show the density of points in a 2D space. They are useful for visualizing the distribution of data points in a scatterplot and can be used to identify clusters and patterns in the data.
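
As a rough sketch of this idea, the snippet below bins simulated points (placeholder data with an arbitrary seed) into a grid with ggplot2's geom_bin_2d() and colors each cell by its point count.

```r
library(ggplot2)

set.seed(42)  # arbitrary seed for a reproducible sketch
pts <- data.frame(x = rnorm(1000), y = rnorm(1000))

# geom_bin_2d() divides the plane into rectangles and fills each one by the
# number of points that fall inside it
p <- ggplot(pts, aes(x = x, y = y)) +
  geom_bin_2d(bins = 30) +
  scale_fill_gradient(low = "white", high = "red")
p
```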

Conversion Rate Optimization

Website heatmaps are part of larger conversion rate optimization (CRO) efforts, since they’re mainly used to improve conversion rates. Heatmaps help identify areas where visitors are engaging with important content, experiencing issues based on device type or browser, or where non-clickable elements are creating distractions that harm conversion.

Overall, heatmaps are an effective way to visualize complex data and understand it at a glance. On a website or product page they show what does or doesn’t work and which areas are popular or ignored, which supports conversion rate optimization and efforts to improve user engagement, retention, and sales. They are equally useful for visualizing 2D data such as density plots.

Creating Heatmap using Global Coordinates

To create a heatmap using global coordinates in RStudio, we can use the ggplot2 package. Here is a step-by-step guide on how to create a heatmap using global coordinates:

# load required packages
library(ggplot2)
library(mapdata)
# create example data with coordinates and values
lon <- c(9.481544, 2.352222, -74.005973, 139.650312)
lat <- c(51.312801, 48.856613, 40.712776, 35.676191)
value <- c(12, 15, 19, 30)
foo1 <- data.frame(lon, lat, value)
foo1
         lon      lat value
1   9.481544 51.31280    12
2   2.352222 48.85661    15
3 -74.005973 40.71278    19
4 139.650312 35.67619    30
# get world map data
world_map <- map_data("world")
head(world_map,10)
        long      lat group order region subregion
1  -69.89912 12.45200     1     1  Aruba      <NA>
2  -69.89571 12.42300     1     2  Aruba      <NA>
3  -69.94219 12.43853     1     3  Aruba      <NA>
4  -70.00415 12.50049     1     4  Aruba      <NA>
5  -70.06612 12.54697     1     5  Aruba      <NA>
6  -70.05088 12.59707     1     6  Aruba      <NA>
7  -70.03511 12.61411     1     7  Aruba      <NA>
8  -69.97314 12.56763     1     8  Aruba      <NA>
9  -69.91181 12.48047     1     9  Aruba      <NA>
10 -69.89912 12.45200     1    10  Aruba      <NA>
# create heat map using ggplot2
ggplot() +
  geom_polygon(data = world_map, aes(x = long, y = lat, group = group), fill = "white", color = "grey") +
  geom_point(data = foo1, aes(x = lon, y = lat, fill = value), size = 5, shape = 21) +
  scale_fill_gradient(low = "white", high = "red") +
  labs(x = "Longitude", y = "Latitude", title = "Example Heat Map")

In the example code, we first created a data frame foo1 with the coordinates and values. Next, we used the map_data() function to get the outline of the world map. Then we called ggplot() and added the map layer with geom_polygon(), passing it the world map data. On top of it we added the data layer with geom_point(), mapping value to the fill aesthetic. Finally, we used scale_fill_gradient() to set the color scheme and labs() to add axis labels and a title.

Overall, creating a heat map using global coordinates in RStudio is similar to creating a heat map with other types of data. The main difference is that we draw a map of the world as a background layer and map lon and lat to the x and y aesthetics in geom_point().
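
One optional refinement, sketched here under the assumption that the maps package is installed (map_data() depends on it): adding coord_quickmap() keeps the aspect ratio of the map sensible instead of stretching it to the plot window. The data frame is rebuilt under the hypothetical name pts so the snippet runs on its own.

```r
library(ggplot2)  # map_data() also needs the maps package to be installed

pts <- data.frame(
  lon = c(9.481544, 2.352222, -74.005973, 139.650312),
  lat = c(51.312801, 48.856613, 40.712776, 35.676191),
  value = c(12, 15, 19, 30)
)
world_map <- map_data("world")

p <- ggplot() +
  geom_polygon(data = world_map, aes(x = long, y = lat, group = group),
               fill = "white", color = "grey") +
  geom_point(data = pts, aes(x = lon, y = lat, fill = value),
             size = 5, shape = 21) +
  scale_fill_gradient(low = "white", high = "red") +
  coord_quickmap()  # approximate projection that preserves the aspect ratio
p
```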

Heatmap of US population density

To create a heatmap of population density, we can use the ggplot2 and reshape2 packages in R. The example below uses illustrative county figures rather than real census data. Here is a step-by-step guide on how to create a heatmap of population density:

# load required packages
library(ggplot2)
library(reshape2)

# create a dataframe with county, population and area data
county_data <- data.frame(
  County = c("County1", "County2", "County3", "County4", "County5", "County6"),
  Population = c(100000, 50000, 75000, 25000, 125000, 10000),
  Area = c(150, 100, 200, 50, 300, 25)
)

head(county_data,5)
   County Population Area
1 County1     100000  150
2 County2      50000  100
3 County3      75000  200
4 County4      25000   50
5 County5     125000  300
# calculate the population density
county_data$Density <- county_data$Population / county_data$Area
county_data
   County Population Area  Density
1 County1     100000  150 666.6667
2 County2      50000  100 500.0000
3 County3      75000  200 375.0000
4 County4      25000   50 500.0000
5 County5     125000  300 416.6667
6 County6      10000   25 400.0000
# melt the data for use in ggplot2
county_data_melted <- melt(county_data, id.vars = "County")
county_data_melted
    County   variable       value
1  County1 Population 100000.0000
2  County2 Population  50000.0000
3  County3 Population  75000.0000
4  County4 Population  25000.0000
5  County5 Population 125000.0000
6  County6 Population  10000.0000
7  County1       Area    150.0000
8  County2       Area    100.0000
9  County3       Area    200.0000
10 County4       Area     50.0000
11 County5       Area    300.0000
12 County6       Area     25.0000
13 County1    Density    666.6667
14 County2    Density    500.0000
15 County3    Density    375.0000
16 County4    Density    500.0000
17 County5    Density    416.6667
18 County6    Density    400.0000
# create the heatmap
ggplot(county_data_melted, aes(x = variable, y = County)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_gradient(low = "white", high = "red") +
  theme_minimal()
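
Because Population, Area, and Density share one color scale in the plot above, the Population column (values up to 125000) takes up nearly the whole gradient and the other two columns appear almost uniformly white. One possible remedy, sketched below with a hypothetical helper rescale01(), is to rescale each variable to the range 0 to 1 before plotting so that colors compare counties within each variable.

```r
library(ggplot2)

county_data <- data.frame(
  County = paste0("County", 1:6),
  Population = c(100000, 50000, 75000, 25000, 125000, 10000),
  Area = c(150, 100, 200, 50, 300, 25)
)
county_data$Density <- county_data$Population / county_data$Area

# Hypothetical helper: rescale one numeric vector to the range 0-1
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Build the long format by hand, rescaling each variable separately
vars <- c("Population", "Area", "Density")
long <- do.call(rbind, lapply(vars, function(v) {
  data.frame(County = county_data$County,
             variable = v,
             scaled = rescale01(county_data[[v]]))
}))

p <- ggplot(long, aes(x = variable, y = County, fill = scaled)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "white", high = "red") +
  theme_minimal()
p
```

The trade-off is that colors are no longer comparable across columns, only within them.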

Additional Example

state <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California")
population <- c(4903185, 731545, 7278717, 3017825, 39538223)
area <- c(52420, 665384, 113990, 53179, 163696)
density <- population / area
data <- data.frame(state, population, area, density)
head(data,5)
       state population   area    density
1    Alabama    4903185  52420  93.536532
2     Alaska     731545 665384   1.099433
3    Arizona    7278717 113990  63.853996
4   Arkansas    3017825  53179  56.748435
5 California   39538223 163696 241.534448
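
The data frame above is not plotted in the original example. As one possible way to visualize it, the sketch below draws a one-column heatmap with a tile per state; the data frame is rebuilt under the hypothetical name states so the snippet is self-contained.

```r
library(ggplot2)

states <- data.frame(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California"),
  population = c(4903185, 731545, 7278717, 3017825, 39538223),
  area = c(52420, 665384, 113990, 53179, 163696)
)
states$density <- states$population / states$area

# A one-column heatmap: one tile per state, filled by density
p <- ggplot(states, aes(x = "Density", y = state, fill = density)) +
  geom_tile(colour = "white") +
  scale_fill_gradient(low = "white", high = "red") +
  labs(x = NULL, y = NULL, fill = "People per unit area") +
  theme_minimal()
p
```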

Heatmap for Population Density of the United States

To create a heatmap of population density for the US in R, there are several approaches that can be used.

One approach is to use the kde2d function in the MASS package to create a 2D density estimate.

This approach is explained in stats.stackexchange.com. Here are the steps to follow:

  • Load the MASS package: library(MASS)
  • Read in your data: data <- read.csv("data.csv")
  • Create a 2D density estimate: density <- kde2d(data$x, data$y)
  • Plot the density estimate using filled contour: filled.contour(density)
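
The steps above can be sketched as a runnable snippet. Since data.csv is not available here, the points are simulated placeholders with an arbitrary seed; the grid size n = 50 is likewise an illustrative choice.

```r
library(MASS)

set.seed(1)  # arbitrary seed; the points below are simulated placeholders
pts <- data.frame(x = rnorm(500), y = rnorm(500))

# kde2d() returns a list with grid coordinates x, y and a matrix z of densities
dens <- kde2d(pts$x, pts$y, n = 50)

# Draw the estimate as filled contours
filled.contour(dens, xlab = "x", ylab = "y")
```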

Another approach is to use ggplot2 and the geom_density2d function to create a continuous density heatmap of the data.

This approach is explained in stackoverflow.com. Here are the steps to follow:

  • Load the ggplot2 package: library(ggplot2)
  • Read in your data: data <- read.csv("data.csv")
  • Create a continuous density plot (drawn as contour lines): ggplot(data, aes(x=x, y=y)) + geom_density2d()

If the data is sparse, you can increase the smoothing parameter of the kernel density estimator. This is explained in stackoverflow.com. Here is the code to increase the smoothing parameter:

  • Load the ggplot2 package: library(ggplot2)
  • Read in your data: data <- read.csv("data.csv")
  • Create a continuous density heatmap with increased smoothing: ggplot(data, aes(x=x, y=y)) + stat_density2d(geom="raster", aes(fill=after_stat(density)), contour=FALSE, h=c(hx, hy)) + scale_fill_gradient(low="white", high="red"), where hx and hy are bandwidths you choose for the x and y directions (larger values smooth more)
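
As a hedged illustration of the smoothing control: stat_density_2d() accepts a bandwidth vector h, one value per axis, and larger values give a smoother surface. The sketch below uses deliberately sparse simulated points; the seed and the bandwidth c(3, 3) are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed; simulated placeholder data
pts <- data.frame(x = rnorm(30), y = rnorm(30))  # deliberately sparse

# h sets the kernel bandwidth in the x and y directions; larger values smooth
# more. c(3, 3) is an illustrative choice, not a recommended default.
p <- ggplot(pts, aes(x = x, y = y)) +
  stat_density_2d(aes(fill = after_stat(density)), geom = "raster",
                  contour = FALSE, h = c(3, 3)) +
  scale_fill_gradient(low = "white", high = "red")
p
```

Leaving h unset lets ggplot2 estimate the bandwidth from the data, which is a reasonable starting point before tuning by eye.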

Another approach is to create a US heatmap by state using plotly, as explained in jeffgswanson.com.

Here are the steps to follow:

  • Load the plotly package: library(plotly)
  • Read in your data: data <- read.csv("data.csv")
  • Create a list of map details: