Data Visualization in R With ggplot2


Ehouman Evans
Abidjan R users

12.07.2019

Introduction

  • ggplot2 is a data visualization package for the statistical programming language R, created by Hadley Wickham in 2005.
  • The "gg" in ggplot2 stands for Grammar of Graphics.

Agenda

  • ggplot2 presentation: install and load the package
  • Basic ggplot2 graphs (Density, Bar, …)
  • Case study

ggplot2 presentation

Installation

  • There are two ways to install R packages in RStudio: the GUI or the R console.
  1. Using the GUI: go to the Packages tab and click Install
  2. Using the R console: install.packages("package_name")
  • Try this R code: install.packages("ggplot2")

Loading an R Package For Use

  • Once you've installed an R package, it is stored in your R library and is available from both R and RStudio.
  • However, to use its functions you must load the package in each new R session.
  • Try this R code: library(ggplot2)
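
The two steps above (install once, load in every session) can be combined in a short script. This is a minimal sketch, assuming a CRAN mirror is configured; the install step is skipped when the package is already present.

```r
# Install ggplot2 only if it is missing from the library.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Attach the package so its functions can be called directly.
library(ggplot2)

# Check that the package is attached and see which version is in use.
"package:ggplot2" %in% search()
packageVersion("ggplot2")
```

Installing is needed once per machine, while library() has to be run again every time R is restarted.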

ggplot2 code

General structure

ggplot2 structure example (1)

ggplot2 structure example (2)

ggplot2 structure example (3)

ggplot2 structure example (4)

ggplot2 full syntax
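
The original structure slides are not reproduced here, so the following is only a sketch of how a ggplot2 call is usually built up layer by layer. It assumes the mpg dataset that ships with ggplot2; the choice of columns and labels is illustrative.

```r
library(ggplot2)

# 1. Start with the data and the aesthetic mappings (which column goes where).
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# 2. Add a geom layer that decides how the observations are drawn.
p <- p + geom_point(aes(colour = class))

# 3. Optionally add facets, labels and a theme.
p <- p +
  facet_wrap(~ drv) +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon") +
  theme_minimal()

# Run print(p) to draw the plot.
```

Each "+" adds one component, so a plot can be grown step by step and inspected at any stage.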

The data…

List of datasets

Gapminder: For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.

airquality: Daily air quality measurements in New York, May to September 1973.

mtcars: Motor Trend Car Road Tests.

mpg: Fuel economy (miles per gallon) of various car models.

Loading the data set to process

install.packages("gapminder")
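
Installing the package does not yet load the data. A minimal sketch of the loading step, assuming the gapminder package from CRAN:

```r
# Install gapminder only if it is missing, then attach it.
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}
library(gapminder)

# The package exposes a data frame that is also called gapminder.
dim(gapminder)       # number of rows and columns
head(gapminder, 3)   # first three rows
```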

Viewing the Data as a Spreadsheet
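
In RStudio the spreadsheet view is normally opened with View(). The sketch below guards the call so that it only runs in an interactive session, and shows a text preview as an alternative.

```r
library(gapminder)

# View() opens a spreadsheet-style viewer; it needs an interactive session.
if (interactive()) {
  View(gapminder)
}

# In a script, head() and tail() give a quick text preview instead.
head(gapminder)
tail(gapminder, 3)
```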

Description of datasets (1)

## Classes 'tbl_df', 'tbl' and 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
## Observations: 1,704
## Variables: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia...
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.8...
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 1488...
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 78...
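
The output above has the shape produced by str() and glimpse(). The calls that generated it are not in the slides, so the following is an assumption about how it can be reproduced; depending on package versions, the header lines may be worded differently.

```r
library(gapminder)

# Compact structure: class, dimensions and the type of each column.
str(gapminder)

# One line per column with its first values (glimpse() is exported by tibble).
tibble::glimpse(gapminder)

# Numeric summary of a single column.
summary(gapminder$lifeExp)
```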

Description of datasets (2)

## [1] 142
##   [1] Afghanistan              Albania                 
##   [3] Algeria                  Angola                  
##   [5] Argentina                Australia               
##   [7] Austria                  Bahrain                 
##   [9] Bangladesh               Belgium                 
##  [11] Benin                    Bolivia                 
##  [13] Bosnia and Herzegovina   Botswana                
##  [15] Brazil                   Bulgaria                
##  [17] Burkina Faso             Burundi                 
##  [19] Cambodia                 Cameroon                
##  [21] Canada                   Central African Republic
##  [23] Chad                     Chile                   
##  [25] China                    Colombia                
##  [27] Comoros                  Congo, Dem. Rep.        
##  [29] Congo, Rep.              Costa Rica              
##  [31] Cote d'Ivoire            Croatia                 
##  [33] Cuba                     Czech Republic          
##  [35] Denmark                  Djibouti                
##  [37] Dominican Republic       Ecuador                 
##  [39] Egypt                    El Salvador             
##  [41] Equatorial Guinea        Eritrea                 
##  [43] Ethiopia                 Finland                 
##  [45] France                   Gabon                   
##  [47] Gambia                   Germany                 
##  [49] Ghana                    Greece                  
##  [51] Guatemala                Guinea                  
##  [53] Guinea-Bissau            Haiti                   
##  [55] Honduras                 Hong Kong, China        
##  [57] Hungary                  Iceland                 
##  [59] India                    Indonesia               
##  [61] Iran                     Iraq                    
##  [63] Ireland                  Israel                  
##  [65] Italy                    Jamaica                 
##  [67] Japan                    Jordan                  
##  [69] Kenya                    Korea, Dem. Rep.        
##  [71] Korea, Rep.              Kuwait                  
##  [73] Lebanon                  Lesotho                 
##  [75] Liberia                  Libya                   
##  [77] Madagascar               Malawi                  
##  [79] Malaysia                 Mali                    
##  [81] Mauritania               Mauritius               
##  [83] Mexico                   Mongolia                
##  [85] Montenegro               Morocco                 
##  [87] Mozambique               Myanmar                 
##  [89] Namibia                  Nepal                   
##  [91] Netherlands              New Zealand             
##  [93] Nicaragua                Niger                   
##  [95] Nigeria                  Norway                  
##  [97] Oman                     Pakistan                
##  [99] Panama                   Paraguay                
## [101] Peru                     Philippines             
## [103] Poland                   Portugal                
## [105] Puerto Rico              Reunion                 
## [107] Romania                  Rwanda                  
## [109] Sao Tome and Principe    Saudi Arabia            
## [111] Senegal                  Serbia                  
## [113] Sierra Leone             Singapore               
## [115] Slovak Republic          Slovenia                
## [117] Somalia                  South Africa            
## [119] Spain                    Sri Lanka               
## [121] Sudan                    Swaziland               
## [123] Sweden                   Switzerland             
## [125] Syria                    Taiwan                  
## [127] Tanzania                 Thailand                
## [129] Togo                     Trinidad and Tobago     
## [131] Tunisia                  Turkey                  
## [133] Uganda                   United Kingdom          
## [135] United States            Uruguay                 
## [137] Venezuela                Vietnam                 
## [139] West Bank and Gaza       Yemen, Rep.             
## [141] Zambia                   Zimbabwe                
## 142 Levels: Afghanistan Albania Algeria Angola Argentina ... Zimbabwe
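
A listing like the one above can be obtained from the country factor. The exact calls used in the slides are not shown; this sketch assumes nlevels() for the count and unique() for the printed factor.

```r
library(gapminder)

# Number of distinct countries.
nlevels(gapminder$country)

# Country names as a plain character vector.
levels(gapminder$country)

# The same values printed as a factor, ending with a "Levels:" line.
unique(gapminder$country)
```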

ggplot2 geom_a()

Part 1: Density Plots (geom_density())

Simple

Complex
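
The plots from these slides are not available, so here is a sketch of a simple and a more complex density plot. It assumes the gapminder data; the variable and colour choices are illustrative.

```r
library(ggplot2)
library(gapminder)

# Simple: one density curve for life expectancy across all rows.
p_simple <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_density(fill = "steelblue", alpha = 0.5)

# More complex: one semi-transparent curve per continent.
p_by_continent <- ggplot(gapminder, aes(x = lifeExp, fill = continent)) +
  geom_density(alpha = 0.4) +
  labs(x = "Life expectancy (years)", y = "Density", fill = "Continent")

# Run print(p_simple) or print(p_by_continent) to draw the plots.
```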

Part 3: Bar charts (geom_bar() / geom_col())
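
The two geoms named in the title differ in where the bar heights come from. A minimal sketch, assuming the mpg dataset from ggplot2:

```r
library(ggplot2)

# geom_bar() counts the rows in each category for you.
p_count <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# geom_col() draws bar heights from a column you computed yourself.
mean_hwy <- aggregate(hwy ~ class, data = mpg, FUN = mean)
p_mean <- ggplot(mean_hwy, aes(x = class, y = hwy)) +
  geom_col(fill = "darkorange") +
  labs(x = "Vehicle class", y = "Mean highway miles per gallon")

# Run print(p_count) or print(p_mean) to draw the plots.
```

Use geom_bar() for raw data and geom_col() when the values are already summarised.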

Part 3: Histograms (geom_histogram())
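
A histogram bins a continuous variable and draws one bar per bin. This sketch assumes the built-in airquality dataset and a bin width of 5 degrees, both chosen for illustration.

```r
library(ggplot2)

# Histogram of daily maximum temperature; each bin is 5 degrees Fahrenheit wide.
p <- ggplot(airquality, aes(x = Temp)) +
  geom_histogram(binwidth = 5, fill = "grey60", colour = "white") +
  labs(x = "Temperature (degrees F)", y = "Number of days")

# Run print(p) to draw the plot.
```

Setting binwidth explicitly avoids the default of 30 bins, which is rarely the best choice.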

Part 4: Scatterplots (geom_point())
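
A minimal scatterplot sketch, assuming the built-in mtcars dataset; the point size and transparency are illustrative.

```r
library(ggplot2)

# One point per car: weight on the x-axis, fuel economy on the y-axis.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 2, alpha = 0.8) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Run print(p) to draw the plot.
```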

Part 5: Line Type

Scale Limits

Coloring
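
The line plots from these slides are missing, so the following sketch only illustrates the three topics named above: line type, scale limits and coloring. The three countries, the y-axis limits and the colours are arbitrary choices.

```r
library(ggplot2)
library(gapminder)

# Keep three countries and drop the unused factor levels.
west_africa <- droplevels(
  subset(gapminder, country %in% c("Cote d'Ivoire", "Ghana", "Senegal"))
)

p <- ggplot(west_africa,
            aes(x = year, y = lifeExp, colour = country, linetype = country)) +
  geom_line() +
  # Scale limits: fix the visible range of the y-axis.
  scale_y_continuous(limits = c(30, 70)) +
  # Coloring: assign one colour per country by name.
  scale_colour_manual(values = c("Cote d'Ivoire" = "darkorange",
                                 "Ghana" = "forestgreen",
                                 "Senegal" = "firebrick")) +
  labs(x = "Year", y = "Life expectancy (years)")

# Run print(p) to draw the plot.
```

Note that values outside scale limits are dropped with a warning, so the limits should cover the data.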

Part 5: Faceting

One column

Five columns

Change the axis titles of the graph

Change the title position, ncol = 1 (library(grid))
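
The faceting variants listed above can be sketched as follows. This assumes the gapminder data faceted by continent; how the original slides used the grid package is not known, so grid::unit() for panel spacing is an assumption.

```r
library(ggplot2)
library(gapminder)

base <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.3)

# One column: the five continent panels are stacked vertically.
p_one_col <- base + facet_wrap(~ continent, ncol = 1)

# Five columns: the panels sit side by side.
p_five_col <- base + facet_wrap(~ continent, ncol = 5)

# Axis titles via labs(), a centred title via theme(),
# and grid::unit() to set the gap between panels.
p_titled <- p_one_col +
  labs(title = "Life expectancy by continent",
       x = "Year", y = "Life expectancy (years)") +
  theme(plot.title = element_text(hjust = 0.5),
        panel.spacing = grid::unit(1, "lines"))

# Run print(p_titled) to draw the plot.
```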

Part 6: Boxplot (geom_boxplot())
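
A minimal boxplot sketch, assuming the gapminder data with one box per continent:

```r
library(ggplot2)
library(gapminder)

# One box per continent; the legend is hidden because the x-axis already names them.
p <- ggplot(gapminder, aes(x = continent, y = lifeExp, fill = continent)) +
  geom_boxplot(outlier.alpha = 0.5, show.legend = FALSE) +
  labs(x = "Continent", y = "Life expectancy (years)")

# Run print(p) to draw the plot.
```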

Aesthetics / Tweaking

Color

Shape
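
Color and shape can either be set to a constant or mapped to a variable. The slides' own examples are not available; this sketch assumes mtcars and uses the number of cylinders as the grouping variable.

```r
library(ggplot2)

# Setting: a constant colour and shape, written outside aes().
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "steelblue", shape = 17, size = 3)

# Mapping: colour and shape follow a variable, written inside aes().
p_map <- ggplot(mtcars, aes(x = wt, y = mpg,
                            colour = factor(cyl), shape = factor(cyl))) +
  geom_point(size = 3) +
  labs(colour = "Cylinders", shape = "Cylinders")

# Run print(p_set) or print(p_map) to draw the plots.
```

Only mapped aesthetics produce a legend; constants set outside aes() do not.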

Layers

Regression Line
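
A regression line is added as a second layer on top of the points. This is a sketch assuming mtcars and a straight-line fit; the slides may have used other data or another smoothing method.

```r
library(ggplot2)

# Layer 1: the raw points. Layer 2: a linear fit with its confidence band.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# The same model fitted directly, to read off the coefficients.
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# Run print(p) to draw the plot.
```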

Re-create The Economist graph using ggplot2

Required Packages

The Economist dataset

Cleaning up

Basic graph
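
The Economist dataset is not bundled with R, so the sketch below swaps in the 2007 slice of gapminder as a stand-in. It shows the same kind of basic graph: open circles colored by a grouping variable.

```r
library(ggplot2)
library(gapminder)

# Stand-in data: one row per country for the year 2007.
gm2007 <- subset(gapminder, year == 2007)

# shape = 1 draws open circles; stroke controls the thickness of their outline.
p <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point(shape = 1, size = 3, stroke = 1.2)

# Run print(p) to draw the plot.
```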

Fit line

geom_line

geom_smooth

Change the color of the line
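
geom_line() connects the observations in order of x, while geom_smooth() fits a model through them. The sketch below uses gapminder 2007 as a stand-in for the Economist data and assumes a logarithmic fit; the formula and the red colour are illustrative.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

p <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp)) +
  # One fitted curve for all points (group = 1), drawn first so points sit on top.
  geom_smooth(aes(group = 1), method = "lm", formula = y ~ log(x),
              se = FALSE, colour = "red") +
  geom_point(aes(colour = continent), shape = 1, size = 3)

# Run print(p) to draw the plot.
```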

Labeling points

All countries

Identify some countries

Avoid overlapping the data points
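
Labels can be added for every point or only for a chosen subset. This sketch again uses gapminder 2007 as a stand-in; the list of countries is an arbitrary pick, and check_overlap = TRUE is one simple way to suppress colliding labels.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

# Arbitrary countries to highlight (illustration only).
to_label <- c("Cote d'Ivoire", "Ghana", "France", "Japan", "Brazil")
labelled <- subset(gm2007, country %in% to_label)

p <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(colour = "grey50", shape = 1) +
  # Label only the selected rows; overlapping labels are skipped when drawn.
  geom_text(data = labelled, aes(label = country),
            vjust = -0.8, check_overlap = TRUE) +
  scale_x_log10()

# To label all countries, use geom_text(aes(label = country)) without a subset.
# Run print(p) to draw the plot.
```

The separate ggrepel package offers geom_text_repel(), which moves labels away from the points instead of dropping them.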

Legend box

Color

Position
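
Legend colors come from the color scale, and the legend position is a theme setting. A sketch with gapminder 2007 as stand-in data; the palette and position are illustrative.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

p <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() +
  # Color: pick the palette and the legend title on the scale.
  scale_colour_brewer(name = "Continent", palette = "Set1") +
  # Position: move the legend above the plot and lay it out horizontally.
  theme(legend.position = "top",
        legend.direction = "horizontal",
        legend.key = element_rect(fill = "white"))

# Run print(p) to draw the plot.
```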

Grid line

Position
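
Grid lines are controlled through theme() elements. The sketch below keeps only horizontal major grid lines, which is one plausible reading of this slide rather than its actual code.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

p <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  theme_minimal() +
  theme(panel.grid.major.x = element_blank(),               # no vertical lines
        panel.grid.minor = element_blank(),                 # no minor lines
        panel.grid.major.y = element_line(colour = "grey80"))  # light horizontals

# Run print(p) to draw the plot.
```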

X-axis and Y-axis

Normal

Font
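
Axis names, limits and breaks belong to the scales, while fonts belong to the theme. A sketch with gapminder 2007 as stand-in data; all limits, breaks and font settings are illustrative.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

p <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # Axis name and tick positions on a log scale.
  scale_x_log10(name = "GDP per capita (US$, log scale)",
                breaks = c(300, 1000, 3000, 10000, 30000)) +
  # Axis name, visible range and tick positions.
  scale_y_continuous(name = "Life expectancy (years)",
                     limits = c(35, 85),
                     breaks = seq(40, 80, by = 10)) +
  # Font: face and size of axis titles and tick labels.
  theme(axis.title = element_text(face = "italic", size = 11),
        axis.text = element_text(size = 9))

# Run print(p) to draw the plot.
```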

Title and Footnote
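
In ggplot2 a footnote is usually written as a caption. A sketch with gapminder 2007 as stand-in data; the title and caption texts are placeholders, not those of the Economist graph.

```r
library(ggplot2)
library(gapminder)

gm2007 <- subset(gapminder, year == 2007)

p <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  labs(title = "Income and life expectancy, 2007",   # placeholder title
       caption = "Source: gapminder R package") +    # footnote below the plot
  theme(plot.title = element_text(face = "bold"),
        plot.caption = element_text(hjust = 0))      # left-align the footnote

# Run print(p) to draw the plot.
```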

Recap

At this point you should have

  • Created density plots and histograms
  • Generated bar plots and box plots
  • Customised ggplot2 graphs
  • Created scatter plots with smoothing lines and labels
  • Mapped social science data

References

Scripts

https://honingds.com/blog/data-visualization-using-ggplot2/#ftoc-line-plots

https://ggplot.library.duke.edu/#aesthetics__tweaking

http://rstudio-pubs-static.s3.amazonaws.com/284329_c7e660636fec4a42a09eed968dc47f32.html

http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html

Data source 1

gapminder

https://cran.r-project.org/web/packages/gapminder/README.html

NCHS - Death rates and life expectancy at birth

https://catalog.data.gov/dataset/age-adjusted-death-rates-and-life-expectancy-at-birth-all-races-both-sexes-united-sta-1900

landdata-states.csv

https://github.com/IQSS/dss-workshops/blob/master/R/Rgraphics/dataSets/landdata-states.csv

Data source 2

The Economist Dataset

http://tutorials.iq.harvard.edu/R/Rgraphics.zip

Economic Data freely available online

https://www.economicsnetwork.ac.uk/data_sets

Data source 3

Motor Trend Car Road Tests

https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html

Exploration of MPG Dataset

https://rpubs.com/shailesh/mpg-exploration

https://www.rdocumentation.org/packages/ggplot2/versions/3.2.1/topics/mpg