In this article, we will create the following visualizations:
Histogram
Bar / Line Chart
Box plot
Scatter plot
Heat Map
Mosaic Map
Map Visualization
3D Graphs
Correlogram
A histogram is a plot that divides the data into bins (or breaks) and shows the frequency distribution across these bins.
You can also change the number of breaks and see how that affects the readability of the plot.
Let me give you an example.
Note: We have used the par(mfrow=c(2,3)) command to fit multiple graphs on the same page for the sake of clarity (see the code below).
The following commands show this in a better way. In the code below, the main option sets the title of the graph and the col option takes a color palette from RColorBrewer to set the colors.
library(RColorBrewer)
data(VADeaths)
par(mfrow=c(2,3))
hist(VADeaths,breaks=10, col=brewer.pal(3,"Set3"),main="Set3 3 colors")
hist(VADeaths,breaks=3 ,col=brewer.pal(3,"Set2"),main="Set2 3 colors")
hist(VADeaths,breaks=7, col=brewer.pal(3,"Set1"),main="Set1 3 colors")
hist(VADeaths,breaks=2, col=brewer.pal(8,"Set3"),main="Set3 8 colors")
hist(VADeaths,col=brewer.pal(8,"Greys"),main="Greys 8 colors")
hist(VADeaths,col=brewer.pal(8,"Greens"),main="Greens 8 colors")
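To see what the breaks argument does to the bins themselves, the histogram can be computed without being drawn. The sketch below uses only base R; the names h_coarse and h_fine are illustrative and not part of the original code.

```r
# Compute two histograms of the same data without plotting them.
data(VADeaths)
h_coarse <- hist(VADeaths, breaks = 3, plot = FALSE)
h_fine   <- hist(VADeaths, breaks = 10, plot = FALSE)

# A single number for 'breaks' is only a suggestion: R chooses "pretty"
# bin boundaries close to the requested count.
h_coarse$breaks
h_fine$breaks

# Number of bins actually used in each case.
length(h_coarse$counts)
length(h_fine$counts)

# Whatever the binning, the counts add up to the number of values.
sum(h_coarse$counts) == length(VADeaths)
sum(h_fine$counts) == length(VADeaths)
```

Comparing the two sets of boundaries is a quick way to check whether a pattern in the plot comes from the data or from the choice of bins.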
Line Chart
Below is a line chart showing the increase in air passengers over the given time period. Line charts are commonly preferred when analysing a trend over time. A line plot is also suitable when we need to compare relative changes in quantities across some variable (like time). Below is the code:
plot(AirPassengers,type="l")
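As a possible extension, the long-term trend can be separated from the seasonal swings by overlaying yearly averages on the monthly series. This is a minimal sketch using base R only; the name annual_mean and the titles are illustrative choices.

```r
# Monthly series with the yearly average drawn on top.
data(AirPassengers)
plot(AirPassengers, type = "l",
     main = "Monthly air passengers",
     ylab = "Passengers (thousands)")

# Collapse the monthly series to one value per year.
annual_mean <- aggregate(AirPassengers, nfrequency = 1, FUN = mean)

# Each yearly value is drawn at the start of its year on the time axis.
lines(annual_mean, col = "red", lwd = 2)
```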
Bar Chart
Bar plots are suitable for comparing cumulative totals across several groups.
Stacked bar plots additionally split each bar by category. Here’s the code:
barplot(iris$Petal.Length) #Creating simple Bar Graph
barplot(iris$Sepal.Length,col = brewer.pal(3,"Set1"))
barplot(table(iris$Species,iris$Sepal.Length),col = brewer.pal(3,"Set1"))
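A stacked bar plot tends to be easier to read when the numeric variable is first grouped into a few ranges. The sketch below is one way to do that in base R; the bin edges and colors are arbitrary assumptions made for illustration.

```r
# Stacked bars: species counts within ranges of sepal length.
data(iris)

# Group sepal length into four ranges (edges chosen for illustration).
length_bin <- cut(iris$Sepal.Length, breaks = c(4, 5, 6, 7, 8))

# Rows are species, columns are length ranges.
counts <- table(iris$Species, length_bin)

# Each bar is one length range, split by species.
barplot(counts,
        col = c("tomato", "gold", "steelblue"),
        legend.text = rownames(counts),
        xlab = "Sepal length (cm)",
        ylab = "Number of flowers")
```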
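A box plot shows the five-number summary of a variable: the minimum, the 25th percentile, the median, the 75th percentile and the maximum. By default, R draws the whiskers only as far as the most extreme point within 1.5 times the interquartile range from the box and plots anything beyond that as an individual outlier.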
data(iris)
par(mfrow=c(2,2))
boxplot(iris$Sepal.Length,col="red")
boxplot(iris$Sepal.Length~iris$Species,col="red")
boxplot(iris$Sepal.Length~iris$Species,col=heat.colors(3))
boxplot(iris$Sepal.Length~iris$Species,col=topo.colors(3))
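The numbers behind a box plot can also be inspected directly. This is a small sketch using base R; the name b is illustrative.

```r
# Inspect the statistics that a box plot draws.
data(iris)

# Minimum, lower hinge, median, upper hinge, maximum.
fivenum(iris$Sepal.Length)

# Compute the per-species box plot statistics without drawing.
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# One column per species: lower whisker, lower hinge, median,
# upper hinge, upper whisker.
b$stats

# Values that would be drawn as individual outlier points.
b$out
```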
Scatter plots are a quick way to inspect how one variable varies with another. Here’s the code for a scatter plot of petal length against species:
plot(x=iris$Petal.Length,y=iris$Species) #Scatter plot of two variables
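For two numeric variables, a scatter plot can be paired with a fitted straight line to make the overall relationship easier to see. The sketch below uses base R only; the choice of petal width as the second variable and the name fit are assumptions made for illustration.

```r
# Scatter plot of two numeric variables with a least-squares line.
data(iris)
plot(iris$Petal.Length, iris$Petal.Width,
     pch = 19,
     xlab = "Petal length (cm)",
     ylab = "Petal width (cm)")

# Fit a simple linear model and draw it over the points.
fit <- lm(Petal.Width ~ Petal.Length, data = iris)
abline(fit, col = "red", lwd = 2)
```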
A scatter plot matrix shows every pair of variables plotted against each other.
plot(iris,col=brewer.pal(3,"Set1"))
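Note that a three-color vector passed to col is recycled row by row, so the colors do not follow the species. If one color per species is wanted, the color vector can be indexed by the Species factor. A minimal sketch, with arbitrary base R colors and illustrative names:

```r
# Scatter plot matrix with one color per species.
data(iris)
species_cols <- c("red", "green3", "blue")

# Indexing by the factor maps each row to the color of its species.
point_cols <- species_cols[iris$Species]

# Plot only the four numeric columns.
pairs(iris[1:4], col = point_cols, pch = 19)
```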
If you prefer a pie chart of the species counts, use:
pie(table(iris$Species))
Hexagon Binning: We can use the hexbin package when many points fall in the same place (overplotting).
Hexagon binning is a form of bivariate histogram useful for visualizing the structure in datasets with large n.
Here’s the code:
library(hexbin)
library(ggplot2)
a=hexbin(diamonds$price,diamonds$carat,xbins=40)
library(RColorBrewer)
plot(a)
We can also create a color palette and then use the hexbin plot function for a better visual effect. Here’s the code:
library(RColorBrewer)
rf <- colorRampPalette(rev(brewer.pal(12,'Set3'))) # Set3 provides at most 12 colors
hexbinplot(price~carat, data=diamonds, colramp=rf)
A mosaic plot is an effective way to plot categorical data, with the area of each tile showing the relative proportion of that combination of categories.
data(HairEyeColor)
mosaicplot(HairEyeColor)
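HairEyeColor is a three-way table (hair, eye, sex), which makes the full mosaic fairly dense. One option is to collapse it to two dimensions before plotting. A minimal sketch in base R; the name hair_eye is illustrative.

```r
# Mosaic plot of hair against eye color, summed over sex.
data(HairEyeColor)

# Keep dimensions 1 (Hair) and 2 (Eye), adding up the counts over Sex.
hair_eye <- margin.table(HairEyeColor, margin = c(1, 2))

mosaicplot(hair_eye, main = "Hair and eye color", color = TRUE)

# Row proportions: the share of each eye color within a hair color,
# which is what the tile sizes within each hair color reflect.
prop.table(hair_eye, margin = 1)
```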
Heat maps enable exploratory data analysis with two dimensions as the axes and a third dimension shown by intensity of color.
However, you need to convert the dataset to matrix format first. Here’s the code:
heatmap(as.matrix(mtcars))
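The columns of mtcars are measured in very different units, and heatmap() scales by row unless told otherwise. Scaling by column instead is often a more useful view for this dataset. This is a sketch of that variation; the palette and margins are arbitrary choices.

```r
# Heat map with each column (variable) centered and scaled.
data(mtcars)
hm <- heatmap(as.matrix(mtcars),
              scale = "column",        # default is "row"
              col = heat.colors(256),  # base R palette
              margins = c(5, 8))       # extra room for the car names

# The return value records the row and column order after clustering.
rownames(mtcars)[hm$rowInd]
```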
R can also produce interactive visualizations through JavaScript libraries.
Leaflet is one of the most popular open-source JavaScript libraries for interactive maps.
The R package is documented at https://rstudio.github.io/leaflet/
You can install it straight from GitHub using:
devtools::install_github("rstudio/leaflet")
library(magrittr)
library(leaflet)
m <- leaflet() %>%
  addTiles() %>% # Add default OpenStreetMap map tiles
  addMarkers(lng=77.2310, lat=28.6560, popup="The delicious food of chandni chowk")
m
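A correlogram visualizes the correlation matrix of a dataset. The correlations among the four numeric columns of iris are:
cor(iris[1:4])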
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
## Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
## Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
## Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
library(corrgram)
corrgram(iris)
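Alongside the picture, it can help to list the variable pairs ranked by the strength of their correlation. The sketch below does this with base R only; the names cm, pair_table and ranked are illustrative.

```r
# Rank variable pairs by absolute correlation.
data(iris)
cm <- cor(iris[1:4])

# Blank out the diagonal and upper triangle so each pair appears once.
cm[upper.tri(cm, diag = TRUE)] <- NA

# Reshape to one row per pair; the correlation lands in column 'Freq'.
pair_table <- na.omit(as.data.frame(as.table(cm)))

# Strongest relationships first, regardless of sign.
ranked <- pair_table[order(-abs(pair_table$Freq)), ]
ranked
```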