December 18, 2017

Comparing R and Tableau

Both tools are used for data visualization, but many experts argue that comparing them head to head is not entirely fair: R is best used for exploratory analysis, while Tableau is better suited to presenting visual information in the form of dashboards.

-R is harder to learn than Tableau. Learning R requires some programming background and familiarity with its syntax, such as how data are stored (vectors or matrices). Tableau, by contrast, is user friendly: a result is usually just a few clicks away.

-Tableau offers a more limited range of visualizations, whereas R is flexible enough to produce almost any type of visualization and runs on most software platforms.

-R is open source. New packages and libraries are released regularly at no cost, which expands what users can learn and do, whereas Tableau can be costly for upgraded features.

Examples with MiniClasses

  1. Scatterplot
  • Top 10 countries with highest life expectancy and top 10 countries with highest infant mortality.

  • Shows that life expectancy tends to be higher where infant mortality is lower. Many factors are involved, but one plausible assumption is that better medical facilities provide better services to the citizens of those countries.

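library(tidyverse)  # loads readr (read_csv) and ggplot2
# Path to the author's local copy of the dataset; change it to your own location
countrytest <- read_csv(
  "C:/Users/senet/Desktop/CSC 463 Data Visualization Tools/R/countrytest.csv"
)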
  • Head of dataset
head(countrytest)
# A tibble: 6 x 3
                 country life_exp inf_mort
                   <chr>    <dbl>    <dbl>
1            Afghanistan    49.72   121.63
2               Anguilla    80.98     3.44
3             Azerbaijan    71.32    28.76
4                Bermuda    80.82     2.47
5 Bosnia and Herzegovina    78.96     8.47
6 British Virgin Islands    77.95    14.43
# inf_mort is infant deaths per 1,000 live births; life_exp is in years
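As a rough numeric check on the relationship described in the bullets above, the correlation between the two columns can be computed directly. This is a minimal sketch that uses only the six rows printed by head(countrytest), typed in by hand (sample_countries is a hypothetical name), so the value is illustrative only and is dominated by the Afghanistan outlier.

```r
# Six rows copied from the head(countrytest) output above
sample_countries <- data.frame(
  country  = c("Afghanistan", "Anguilla", "Azerbaijan", "Bermuda",
               "Bosnia and Herzegovina", "British Virgin Islands"),
  life_exp = c(49.72, 80.98, 71.32, 80.82, 78.96, 77.95),
  inf_mort = c(121.63, 3.44, 28.76, 2.47, 8.47, 14.43),
  stringsAsFactors = FALSE
)

# Pearson correlation: values near -1 mean that higher life expectancy
# goes together with lower infant mortality (an association, not a cause)
r <- cor(sample_countries$life_exp, sample_countries$inf_mort)
print(round(r, 3))
```

On the full table the same call would be cor(countrytest$life_exp, countrytest$inf_mort), adding use = "complete.obs" if any values are missing.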

A ggplot scatterplot with x = life expectancy and y = infant mortality:

a1 = ggplot(countrytest, aes(x=life_exp, y=inf_mort))+geom_point()
a1

a2=a1 + geom_point(aes(color=country))
a2

Adding a linear-model trend line with geom_smooth(method = "lm"):

a3 = a2 + geom_smooth(method = "lm", se = FALSE) +
  xlim(c(0, 130)) +
  ylim(c(0, 130)) +
  labs(subtitle = "Life Expectancy vs. Infant Mortality",
       x = "Life Expectancy (years)",
       y = "Infant Mortality (per 1,000 live births)",
       title = "Scatterplot",
       caption = "Source: Country Dataset")

a3
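geom_smooth(method = "lm") draws an ordinary least-squares line of y on x. The sketch below fits the same model explicitly with lm() so the slope can be read off; it assumes only the six head() rows shown earlier (sample_countries is a hypothetical name), so the numbers are illustrative rather than results for the full dataset.

```r
# Six rows copied from the head(countrytest) output
sample_countries <- data.frame(
  life_exp = c(49.72, 80.98, 71.32, 80.82, 78.96, 77.95),
  inf_mort = c(121.63, 3.44, 28.76, 2.47, 8.47, 14.43)
)

# With aes(x = life_exp, y = inf_mort), geom_smooth(method = "lm")
# fits the model inf_mort ~ life_exp by ordinary least squares
fit <- lm(inf_mort ~ life_exp, data = sample_countries)
print(coef(fit))

# Slope: estimated change in infant mortality (per 1,000 live births)
# for each additional year of life expectancy
slope <- unname(coef(fit)["life_exp"])
print(slope)
```

One design caveat: xlim() and ylim() set scale limits, so any observation outside 0 to 130 would be dropped before the line is fitted. coord_cartesian(xlim = c(0, 130), ylim = c(0, 130)) zooms the view without removing data.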

library(magick)
scatterplot<-image_read("http://i67.tinypic.com/2ij6sjm.png")
image_scale(scatterplot, "x900")

2. Ordered Bar Chart

-An ordered bar chart is a bar chart whose bars are sorted by the Y-axis variable.

-The data are invasive cancer rates for different US states, expressed per 100,000 people.

library(readxl)
Cancer_by_State<- read_excel(
"C:/Users/senet/Desktop/Cancer by State.xlsx")
head(Cancer_by_State,4)
# A tibble: 4 x 2
      Location  Rate
         <chr> <dbl>
1     Kentucky 513.7
2     Delaware 488.1
3    Louisiana 478.7
4 Pennsylvania 477.3

library(ggplot2)
ggplot(Cancer_by_State, aes(x=Location, y=Rate)) +
  geom_bar(stat = "identity",width = .5, fill="Red")+
  labs(title="Bar Chart", 
       subtitle="Location vs Avg. Rate", 
       caption="Source: cdc")+
  theme(axis.text.x= element_text(angle = 65, vjust = 0.6))

Sorting the data frame by the variable of interest is not enough to order the bars: ggplot2 places a character X variable in alphabetical order regardless of the row order. For the chart to keep the sorted row order, the X-axis variable (i.e. the categories) has to be converted into a factor whose levels follow that order.
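The point above can be seen on a small example. This minimal sketch uses the four rows printed by head(Cancer_by_State, 4), deliberately shuffled (states and sorted are hypothetical names), and compares the levels produced by plain sorting, by fixing the factor levels, and by base R's reorder(), which is a common alternative to the approach used in this post.

```r
# Four rows from head(Cancer_by_State, 4), shuffled on purpose
states <- data.frame(
  Location = c("Louisiana", "Kentucky", "Pennsylvania", "Delaware"),
  Rate     = c(478.7, 513.7, 477.3, 488.1),
  stringsAsFactors = FALSE
)

# Step 1: sort rows by Rate, highest first
sorted <- states[order(-states$Rate), ]

# A character column still maps to alphabetical levels,
# which is the order ggplot2 would use on the X axis
alpha_levels <- levels(factor(sorted$Location))
print(alpha_levels)  # alphabetical: Delaware, Kentucky, Louisiana, Pennsylvania

# Step 2: fix the levels to the sorted row order
sorted$Location <- factor(sorted$Location, levels = sorted$Location)
row_levels <- levels(sorted$Location)
print(row_levels)    # by Rate: Kentucky, Delaware, Louisiana, Pennsylvania

# Alternative without re-sorting: reorder() ranks levels by -Rate,
# e.g. ggplot(states, aes(x = reorder(Location, -Rate), y = Rate))
reorder_levels <- levels(reorder(states$Location, -states$Rate))
print(reorder_levels)
```

reorder() keeps the data frame untouched and moves the ordering into the aes() call, which can be handy when the same data feed several charts with different orderings.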

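colnames(Cancer_by_State) <- c("Location", "Rate")
# Sort rows by Rate, highest first
Cancer_by_State <- Cancer_by_State[order(-Cancer_by_State$Rate), ]
# Lock the category order: factor levels follow the sorted row order
Cancer_by_State$Location <- factor(Cancer_by_State$Location,
                                   levels = Cancer_by_State$Location)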
head(Cancer_by_State)
## # A tibble: 6 x 2
##       Location  Rate
##         <fctr> <dbl>
## 1     Kentucky 513.7
## 2     Delaware 488.1
## 3    Louisiana 478.7
## 4 Pennsylvania 477.3
## 5     New York 476.5
## 6        Maine 474.6

theme_set(theme_bw())
ggplot(Cancer_by_State, aes(x=Location, y=Rate)) +
  geom_bar(stat = "identity",width = .5, fill="Red")+
  labs(title="Ordered Bar Chart", 
       subtitle="Location vs Avg. Rate", 
       caption="Source: cdc")+
  theme(axis.text.x= element_text(angle = 65, vjust = 0.6))

OrderedBarChart<-image_read("http://i68.tinypic.com/2ags0b9.png")
image_scale(OrderedBarChart, "x900")

3. Treemap in R & Tableau

Treemapping is a method for displaying hierarchical data using nested figures, usually rectangles (Wikipedia).

Data: GNI2014, a dataset that ships with the treemap package (loaded with data("GNI2014") below).

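# treemapify is on CRAN: install.packages("treemapify")
# (development version: devtools::install_github("wilkox/treemapify"))
library(ggplot2)
library(treemapify)  # geom_treemap(), geom_treemap_text()
library(treemap)     # provides the GNI2014 dataset
library(dplyr)       # filter() and %>%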

data("GNI2014")
head(GNI2014)
##   iso3          country     continent population    GNI
## 3  BMU          Bermuda North America      67837 106140
## 4  NOR           Norway        Europe    4676305 103630
## 5  QAT            Qatar          Asia     833285  92200
## 6  CHE      Switzerland        Europe    7604467  88120
## 7  MAC Macao SAR, China          Asia     559846  76270
## 8  LUX       Luxembourg        Europe     491775  75990
Asia <- GNI2014 %>% filter(continent == "Asia")
head(Asia,5)
##   iso3              country continent population   GNI
## 1  QAT                Qatar      Asia     833285 92200
## 2  MAC     Macao SAR, China      Asia     559846 76270
## 3  SGP            Singapore      Asia    4657542 55150
## 4  KWT               Kuwait      Asia    2691158 49300
## 5  ARE United Arab Emirates      Asia    4798491 44600

In a treemap, each tile represents a single observation, with the area of the tile proportional to a variable.
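To make "area proportional to a variable" concrete: with area = GNI, each tile's share of the total area is its GNI divided by the sum of GNI over all tiles. This minimal sketch assumes only the five Asian rows printed by head(Asia, 5) (asia_top is a hypothetical name); in the actual plot the shares are relative to every Asian country, so they would be smaller.

```r
# Five rows from the head(Asia, 5) output above
asia_top <- data.frame(
  country = c("Qatar", "Macao SAR, China", "Singapore", "Kuwait",
              "United Arab Emirates"),
  GNI     = c(92200, 76270, 55150, 49300, 44600),
  stringsAsFactors = FALSE
)

# Tile area is proportional to GNI, so each tile's fraction of the
# whole rectangle is its GNI over the total GNI of the tiles drawn
asia_top$area_share <- asia_top$GNI / sum(asia_top$GNI)
print(asia_top[order(-asia_top$area_share), c("country", "area_share")])
```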

ggplot(Asia, aes(area = GNI, fill = population, label = country)) +
  geom_treemap() +
  geom_treemap_text(fontface = "italic", colour = "white",
                    place = "centre", grow = TRUE)

ggplot(GNI2014, aes(area = GNI, fill = continent, label = country)) +
  geom_treemap() +
  geom_treemap_text(grow = TRUE, reflow = TRUE, colour = "black") +
  scale_fill_brewer(palette = "Set1") +
  theme(legend.position = "bottom") +
  labs(title = "Gross National Income per capita, 2014",
       caption = "The area of each country is proportional to its relative GNI",
       fill = "continent")

ggplot(GNI2014, aes(area = GNI, fill = continent, label = country)) +
  geom_treemap() +
  geom_treemap_text(grow = TRUE, reflow = TRUE, colour = "black") +
  facet_wrap(~continent, ncol = 4) +
  scale_fill_brewer(palette = "Set1") +
  theme(legend.position = "bottom") +
  labs(title = "Gross National Income per capita, 2014",
       caption = "The area of each country is proportional to its relative GNI",
       fill = "continent")

Tableautreemap<-image_read("http://i64.tinypic.com/1zx4mlc.png")
image_scale(Tableautreemap, "x900")

Other Examples:

  • Bubble chart for the movies data (ccm.rdata) from MiniClass 4
Bubblechart<-image_read("http://i64.tinypic.com/abglxd.png")
image_scale(Bubblechart, "200%x100%")

  • Time series of inflation rate vs. unemployment rate from MiniClass 5

Timeseries<-image_read("http://i64.tinypic.com/afc4gh.png")
image_scale(Timeseries, "x900")

References: