Using packages and functions in R

The book covers the following:

1. Categorical Data

2. Distributions

3. Time Series

4. Scatter Plots

5. Maps

The basic graphics concepts are as follows:

Graphics in R

When attempting a systematisation of statistical visualisation, it is natural to start either with the number of variables shown and their scale of measurement, or with the geometry.

1. Categorical Data

The presentation of simple frequencies, or of parameters such as percentages or averages, by bar and column charts is certainly one of the most widely used visualisations. Sometimes, though, one might also want to plot parameters that lie on a scale between two opposing statements. In such cases, a profile plot makes sense. If a bar chart has many attributes and/or you want to plot several variables simultaneously, a different form of illustration is needed; in such cases, a dot chart comes in handy.
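A dot chart can be sketched in base R with `dotchart()`. The example below is a minimal sketch, assuming the built-in `VADeaths` matrix (death rates per 1000 in Virginia, 1940) as a stand-in for survey data; `sorted_dotchart()` is a hypothetical helper, not part of the book's code. It plots four variables (the matrix columns) at once and orders the rows by one of them.

```r
# Hypothetical helper: dot chart of a matrix, rows ordered by one column.
sorted_dotchart <- function(m, by = 1, ...) {
  ord <- order(m[, by])                  # rows ordered by the chosen column
  dotchart(m[ord, , drop = FALSE], ...)  # one group per column of m
  invisible(rownames(m)[ord])            # return the row order used
}

# Usage: four groups (the columns of VADeaths), age groups ordered by "Urban Male"
sorted_dotchart(VADeaths, by = "Urban Male", pch = 19,
                xlab = "Deaths per 1000", main = "Virginia, 1940")
```

Extra arguments such as `pch`, `xlab` and `main` are passed straight through to `dotchart()`.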

Tree Maps for Two Levels (Panel)

Population per capita

Bar Chart for Multiple Response Questions – All Response Categories, Grouped

Reading attitude in México

#Now a simple example in R
pdf_file<-"pdf/barcharts_simple.pdf"
#cairo_pdf(bg="grey98", pdf_file,width=9,height=6.5)

par(omi=c(0.65,0.25,0.75,0.75),mai=c(0.3,2,0.35,0),mgp=c(3,3,0),
    family="Lato Light", las=1)  

# Import data and prepare chart

library(gdata)
ipsos<-read.xls("~/Maestría/MCAA/myData/ipsos.xlsx", encoding="latin1")
sort.ipsos<-ipsos[order(ipsos$Percent) ,]
attach(sort.ipsos)

# Create chart

x<-barplot(Percent,names.arg=F,horiz=T,border=NA,xlim=c(0,100),col="grey", cex.names=0.85,axes=F)

# Label chart

for (i in 1:length(Country))
{
if (Country[i] %in% c("Germany","Brasil")) 
    {myFont<-"Lato Black"} else {myFont<-"Lato Light"}
text(-8,x[i],Country[i],xpd=T,adj=1,cex=0.85,family=myFont)
text(-3.5,x[i],Percent[i],xpd=T,adj=1,cex=0.85,family=myFont)
}

# Other elements

rect(0,-0.5,20,28,col=rgb(191,239,255,80,maxColorValue=255),border=NA)
rect(20,-0.5,40,28,col=rgb(191,239,255,120,maxColorValue=255),border=NA)
rect(40,-0.5,60,28,col=rgb(191,239,255,80,maxColorValue=255),border=NA)
rect(60,-0.5,80,28,col=rgb(191,239,255,120,maxColorValue=255),border=NA)
rect(80,-0.5,100,28,col=rgb(191,239,255,80,maxColorValue=255),border=NA)

myValue2<-c(0,0,0,0,27,0,0,0,0,0,0,0,0,84,0,0)
myColour2<-rgb(255,0,210,maxColorValue=255)
x2<-barplot(myValue2,names.arg=F,horiz=T,border=NA,xlim=c(0,100),col=myColour2,cex.names=0.85,axes=F,add=T)

arrows(45,-0.5,45,20.5,lwd=1.5,length=0,xpd=T,col="skyblue3") 
arrows(45,-0.5,45,-0.75,lwd=3,length=0,xpd=T)
arrows(45,20.5,45,20.75,lwd=3,length=0,xpd=T)
text(41,20.5,"Average",adj=1,xpd=T,cex=0.65,font=3)
text(44,20.5,"45",adj=1,xpd=T,cex=0.65,family="Lato",font=4)
text(100,20.5,"All values in percent",adj=1,xpd=T,cex=0.65,font=3)
mtext(c(0,20,40,60,80,100),at=c(0,20,40,60,80,100),1,line=0,cex=0.80)

# Titling

mtext("'I Definitely Believe in God or a Supreme Being'",3,line=1.3,adj=0,cex=1.2,family="Lato Black",outer=T)
mtext("was said in 2010 in:",3,line=-0.4,adj=0,cex=0.9,outer=T)
mtext("Source: www.ipsos-na.com, Design: Stefan Fichtel, ixtract",1,line=1,adj=1.0,cex=0.65,outer=T,font=3)
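The label loop above exists mainly to switch the font family per country. A vectorised variant is possible because, per `?text`, `col`, `cex` and `font` may be vectors recycled over the labels (`family` is not listed there, which may be why the original loops). The sketch below assumes bold (`font = 2`) as a substitute for the "Lato Black" family and uses the built-in `precip` vector instead of `ipsos.xlsx`; both helper names are hypothetical.

```r
# Hypothetical helper: one fill colour per bar, accent for highlighted labels.
highlight_colours <- function(labels, highlight, base = "grey",
                              accent = rgb(255, 0, 210, maxColorValue = 255)) {
  ifelse(labels %in% highlight, accent, base)
}

# Hypothetical helper: sorted horizontal bar chart with highlighted bars.
highlighted_barplot <- function(values, labels, highlight) {
  ord <- order(values)                           # smallest value drawn first
  values <- values[ord]
  labels <- labels[ord]
  cols <- highlight_colours(labels, highlight)
  fonts <- ifelse(labels %in% highlight, 2, 1)   # 2 = bold, 1 = plain
  op <- par(mai = c(0.5, 1.6, 0.3, 0.3), las = 1)
  on.exit(par(op))
  y <- barplot(values, horiz = TRUE, border = NA, col = cols, axes = FALSE,
               xlim = c(0, max(values)))         # y = bar midpoints
  # per ?text, col, cex and font may be vectors recycled over the labels
  text(-0.02 * max(values), y, labels, xpd = TRUE, adj = 1, cex = 0.7,
       font = fonts)
  axis(1)
  invisible(data.frame(label = labels, value = values, colour = cols,
                       stringsAsFactors = FALSE))
}

# Usage: the 15 wettest cities in precip, highlighting the first two
top <- head(sort(precip, decreasing = TRUE), 15)
highlighted_barplot(unname(top), names(top), highlight = names(top)[1:2])
```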

2. Distributions

In addition to purely statistical presentation forms, such as histograms and box plots, visualisations of distributions include traditional presentation forms such as population pyramids or Lorenz curves. Of course, pyramids are not limited to populations, and inequality, which is what a Lorenz curve commonly depicts, can also be shown in other forms.
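A Lorenz curve plots the cumulative share of units (sorted from smallest to largest) against their cumulative share of the total. The sketch below computes it in base R; `lorenz_curve()` and `gini()` are hypothetical helpers, the Gini coefficient uses one common discrete definition (one minus twice the area under the curve, by the trapezoidal rule), and the built-in `state.x77` population column merely stands in for income data.

```r
# Hypothetical helper: points of the Lorenz curve, starting at (0, 0).
lorenz_curve <- function(x) {
  x <- sort(x[!is.na(x)])
  stopifnot(all(x >= 0), sum(x) > 0)
  n <- length(x)
  data.frame(p = c(0, seq_len(n) / n),        # cumulative share of units
             L = c(0, cumsum(x) / sum(x)))    # cumulative share of the total
}

# Hypothetical helper: Gini = 1 - 2 * (area under the Lorenz curve).
gini <- function(x) {
  lc <- lorenz_curve(x)
  area <- sum(diff(lc$p) * (head(lc$L, -1) + tail(lc$L, -1)) / 2)  # trapezoids
  1 - 2 * area
}

# Usage: Lorenz curve of population across US states
lc <- lorenz_curve(state.x77[, "Population"])
plot(lc$p, lc$L, type = "l", lwd = 2, asp = 1, las = 1,
     xlab = "Cumulative share of states", ylab = "Cumulative share of population")
abline(0, 1, lty = 2, col = "grey50")         # line of perfect equality
```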

Comparison of Income Proportion with Panel-Bar Chart (Quintile)

Income Distribution

Histograms (Panel)

Distribution of Income in 47 Countries. Household Income

pdf_file<-"pdf/pyramids_multicoloured.pdf"
#cairo_pdf(bg="grey98", pdf_file,width=9,height=9)

par(mai=c(0.5,1,0.5,0.5),omi=c(0.5,0.5,0.5,0.5),family="Lato Light",las=1)

# Import data and prepare chart

myWomen<-read.csv("~/Maestría/MCAA/myData/women.txt",header=F,sep=",")
colnames(myWomen)[1:111]<-paste0("x",1950:2060)  # one column per year: x1950 ... x2060
myMen<-read.csv("~/Maestría/MCAA/myData/men.txt",header=F,sep=",")
colnames(myMen)[1:111]<-paste0("x",1950:2060)

right<-myWomen$x2010
left<-myMen$x2010

myColour_right<-c(rep(rgb(210,210,210,maxColorValue=255),15),rep(rgb(144,157,172,maxColorValue=255),50),rep(rgb(225,152,105,maxColorValue=255),length(right)-65))
myColour_left<-myColour_right

# Create chart and other elements

barplot(right,axes=F,horiz=T,axis.lty=0,border=NA,col=myColour_right,xlim=c(-750,750))
barplot(-left,axes=F,horiz=T,axis.lty=0,border=NA,col=myColour_left,xlim=c(-750,750),add=T)

abline(v=0,lwd=28,col=par("bg"))
for (i in seq(10,90,by=10)) text(0,i+i*0.2,i,cex=1.1)
mtext(abs(seq(-600,600,by=200)),at=seq(-600,600,by=200),1,line=-1,cex=0.80)

rect(-1000,15+15*0.2,1000,66+66*0.2,xpd=T,col=rgb(210,210,210,90,maxColorValue=255), border=NA)

mtext("working age",2,line=1.5,las=3,adj=0.38)
mtext("Men",3,line=-5,adj=0.25,cex=1.5,col="grey")
mtext("Women",3,line=-5,adj=0.75,cex=1.5,col="grey")

# Titling

mtext("Age structure of the population in Germany in 2010",3,line=-1.5,adj=0,cex=1.75,family="Lato Black",outer=T)
mtext("Values in thousands per year of age",3,line=-3.25,adj=0,cex=1.25,font=3,outer=T)
mtext("Source: www.destatis.de/bevoelkerungspyramide/",1,line=0,adj=1.0,cex=0.95,font=3,outer=T)

3. Time Series

As before, we distinguish between typical application forms. First, we look at how to present “short” time series, for which we also use columns. Then we show how to plot areas below, between or above time series. In our experience, presenting daily, weekly or monthly values always proves a bit tricky, so we deal with those, too. The chapter concludes with special cases that cannot be assigned to any of the above groups.
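An area between two series is usually drawn as a polygon that runs along the upper series and back along the lower one; an area below a series is the special case where the lower boundary is zero. This minimal sketch assumes base R `polygon()` and the built-in monthly `mdeaths` and `fdeaths` series (UK deaths from lung diseases, 1974 to 1979); `area_between()` is a hypothetical helper.

```r
# Hypothetical helper: polygon outline between an upper and a lower series.
area_between <- function(t, upper, lower) {
  stopifnot(length(t) == length(upper), length(upper) == length(lower))
  # walk forward along the upper series, then back along the lower one
  list(x = c(t, rev(t)), y = c(upper, rev(lower)))
}

tt <- as.numeric(time(mdeaths))
male <- as.numeric(mdeaths)
female <- as.numeric(fdeaths)

plot(tt, male, type = "n", ylim = c(0, max(male, female)), las = 1,
     xlab = "", ylab = "Deaths per month")
# area below a series: close the polygon along y = 0
polygon(area_between(tt, female, rep(0, length(tt))),
        col = rgb(255, 97, 0, 50, maxColorValue = 255), border = NA)
# area between two series
polygon(area_between(tt, male, female),
        col = rgb(68, 90, 111, 50, maxColorValue = 255), border = NA)
lines(tt, male, lwd = 2, col = rgb(68, 90, 111, maxColorValue = 255))
lines(tt, female, lwd = 2, col = rgb(255, 97, 0, maxColorValue = 255))
```

Drawing the polygons before the lines keeps the lines on top of the shaded areas.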

Daily Values with Labels

Number of deaths

Daily Values with Labels and Week Symbols (Panel)

National Happiness

pdf_file<-"pdf/timeseries_with_trend_3x1_inc.pdf"
#cairo_pdf(bg="grey98", pdf_file,width=11,height=9.5)

par(mfcol=c(3,1),cex.axis=1.4,mgp=c(5,1,0),family="Lato Light",las=1)
par(omi=c(0.5,0.5,1.1,0.5),mai=c(0,2,0,0.5))

# Prepare chart and import data

myColour1_150<-rgb(68,90,111,150,maxColorValue=255) 
myColour1_50<-rgb(68,90,111,50,maxColorValue=255)   
myColour2_150<-rgb(255,97,0,150,maxColorValue=255)  
myColour2_50<-rgb(255,97,0,50,maxColorValue=255)    

library(gdata)
myData<-read.xls("~/Maestría/MCAA/myData/z8053.xlsx", encoding="latin1")
attach(myData)

# Define graphic and other elements

par(mai=c(0,1.0,0.25,0))
plot(year,marriage,axes=F,type="n",xlab="",ylab="number (per 100 thousand)",cex.lab=1.5,xlim=c(1820,1920),ylim=c(700,1000),xpd=T)
axis(2,at=py<-c(700,800,900,1000),labels=format(py,big.mark=","),col=par("bg"),col.ticks="grey81",lwd.ticks=0.5,tck=-0.025)
lines(year,marriage,type="l",col=myColour1_150,lwd=3,xpd=T)
lines(year,marriagetrend,type="l",col=myColour1_50,lwd=10)
text(1910,880,"marriages with trend",cex=1.5,col=myColour1_150)

par(mai=c(0,1.0,0,0))
plot(year,agricultural,axes=F,type="n",xlab="",ylab="index",cex.lab=1.5,xlim=c(1820,1920),ylim=c(40,130))
axis(4,at=c(40,70,100,130),col=par("bg"),col.ticks="grey81",lwd.ticks=0.5,tck=-0.025)
lines(year,agricultural,type="l",col=myColour2_150,lwd=3)
lines(year,agriculturaltrend,type="l",col=myColour2_50,lwd=10)
text(1910,125,"agricultural prices with trend",cex=1.5,col=myColour2_150,xpd=T)
text(1913,60,"1913=100",cex=1.5,col=rgb(100,100,100,maxColorValue=255))

arrows(1913,68,1913,90,length=0.10,angle=10,code=0,lwd=2,col=rgb(100,100,100,maxColorValue=255))
points(1913,100,pch=19,col="white",cex=3.5)
points(1913,100,pch=1,col=rgb(25,25,25,200,maxColorValue=255),cex=3.5)
points(1913,100,pch=19,col=rgb(25,25,25,200,maxColorValue=255),cex=2.5)

par(mai=c(0.5,1.0,0,0))
plot(year,marriagez,axes=F,type="n",xlab="",ylab="deviations",cex.lab=1.5,xlim=c(1820,1920),ylim=c(-70,70))
axis(1,at=pretty(year))
axis(2,at=c(-60,-30,0,30,60),col=par("bg"),col.ticks="grey81",lwd.ticks=0.5,tck=-0.025)
rect(1820,-70,1867,70,border=NA,col="grey90")
lines(year,marriagez,type="l",col=myColour1_150,lwd=3)
lines(year,agriculturalz,type="l",col=myColour2_150,lwd=3)
text(1910,-40,"marriages",col=myColour1_150,cex=1.5)
text(1910,40,"agricultural prices ",col=myColour2_150,cex=1.5)

# Titling

mtext("Growth Trends and Economic Cycles",3,adj=0.5,line=3,cex=2.1,outer=T,family="Lato Black")
mtext("Annual Figures",3,adj=0.06,line=0,cex=1.7,outer=T,font=3)
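The source does not say how `marriagetrend`, `agriculturaltrend` or the deviations (`marriagez`, `agriculturalz`) in `z8053.xlsx` were computed. Purely as an assumption, the sketch below uses a `lowess()` smoother as the trend and the series minus its trend as the deviation, applied to the built-in annual `LakeHuron` series; `trend_and_deviation()` is a hypothetical helper and the smoothing span `f` is arbitrary.

```r
# Hypothetical helper: lowess trend and deviation from it (an assumed method).
trend_and_deviation <- function(x, f = 2/3) {
  t <- as.numeric(time(x))
  y <- as.numeric(x)
  trend <- lowess(t, y, f = f)$y   # t is increasing, so order is preserved
  data.frame(time = t, value = y, trend = trend, deviation = y - trend)
}

d <- trend_and_deviation(LakeHuron, f = 0.3)
op <- par(mfcol = c(2, 1), mai = c(0.4, 1, 0.2, 0.3), las = 1)
# upper panel: series (thin) with its trend (thick, transparent)
plot(d$time, d$value, type = "l", lwd = 3, xlab = "", ylab = "level (feet)",
     col = rgb(68, 90, 111, 150, maxColorValue = 255))
lines(d$time, d$trend, lwd = 10, col = rgb(68, 90, 111, 50, maxColorValue = 255))
# lower panel: deviations from the trend
plot(d$time, d$deviation, type = "l", lwd = 3, xlab = "", ylab = "deviation",
     col = rgb(68, 90, 111, 150, maxColorValue = 255))
abline(h = 0, col = "grey70")
par(op)
```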

4. Scatter Plots

Up to four variables can be plotted in a scatter plot: two numerical variables on the x- and y-axis, a numerical or ordinal variable defining point size, and a nominal variable defining colour. Additional elements can include:

• Any type of smoothing, e.g., a regression line

• Labels of individual data points

• A cross identifying the average

• An area or line (ellipse) that identifies a bivariate distribution

• A line that connects the individual points
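The four variables and four of the five additional elements listed above can be combined in one base R sketch. It assumes the built-in `mtcars` data (x = weight, y = miles per gallon, point size = horsepower, colour = number of cylinders); the hypothetical `data_ellipse()` helper draws a 95% normal-theory ellipse from the sample mean and covariance, which is one common choice, not necessarily the book's.

```r
# Hypothetical helper: points on a normal-theory data ellipse.
data_ellipse <- function(x, y, level = 0.95, n = 100) {
  centre <- c(mean(x), mean(y))
  S <- cov(cbind(x, y))
  r <- sqrt(qchisq(level, df = 2))                   # radius for the coverage level
  theta <- seq(0, 2 * pi, length.out = n + 1)[-1]    # n equally spaced angles
  unit_circle <- cbind(cos(theta), sin(theta))
  pts <- r * unit_circle %*% chol(S)                 # scale/shear circle by S
  cbind(x = pts[, 1] + centre[1], y = pts[, 2] + centre[2])
}

cyl <- factor(mtcars$cyl)
pal <- c("#445A6F", "#FF6100", "#7F7F7F")            # one colour per cylinder group
size <- 0.8 + 2 * (mtcars$hp - min(mtcars$hp)) / diff(range(mtcars$hp))

plot(mtcars$wt, mtcars$mpg, pch = 19, cex = size,
     col = adjustcolor(pal[cyl], alpha.f = 0.6), las = 1,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars), lwd = 2)                          # regression line
points(mean(mtcars$wt), mean(mtcars$mpg), pch = 3, cex = 3, lwd = 2)  # cross at the means
polygon(data_ellipse(mtcars$wt, mtcars$mpg), border = "grey40")      # 95% ellipse
big <- mtcars$hp > 250                                                # label a few points
text(mtcars$wt[big], mtcars$mpg[big], rownames(mtcars)[big], pos = 4, cex = 0.7)
legend("topright", legend = levels(cyl), pch = 19, col = pal,
       title = "Cylinders", bty = "n")
```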

Scatter Plot Variant 4: Superimposed Ellipse

Men and Women in Germany

Scatter Plot Variant 2: Outliers Highlighted

Unemployed Population

5. Maps

R is not only particularly useful for generating graphical representations in general, but also for maps specifically. The CRAN Task View at http://cran.r-project.org/web/views/spatial.html lists more than 100 packages for processing geodata. After some introductory examples, we first look at maps that visualise points, icons or entire diagrams, and finally at so-called choropleth maps, in which the areas within the map carry the information.
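The core step of a choropleth map is mapping each area's value to a colour class. The sketch below is an assumption-laden illustration: `classify_colours()` is a hypothetical helper that uses quantile breaks and a light-to-dark colour ramp, and the optional drawing step relies on the CRAN `maps` package and the built-in `state.x77` per capita income (1974) for US states, not the European data used in this chapter.

```r
# Hypothetical helper: quantile classes mapped to a light-to-dark ramp.
classify_colours <- function(values, n_classes = 5,
                             low = "#DEEBF7", high = "#08519C") {
  probs <- seq(0, 1, length.out = n_classes + 1)
  # quantile breaks: each class holds roughly the same number of areas
  breaks <- unique(quantile(values, probs = probs, na.rm = TRUE, names = FALSE))
  cls <- cut(values, breaks = breaks, include.lowest = TRUE)
  pal <- colorRampPalette(c(low, high))(nlevels(cls))  # light to dark
  list(colour = pal[as.integer(cls)], classes = levels(cls), palette = pal)
}

# Optional drawing step; needs the CRAN package 'maps'
if (requireNamespace("maps", quietly = TRUE)) {
  res <- classify_colours(state.x77[, "Income"])
  regions <- maps::map("state", fill = TRUE, plot = FALSE, namesonly = TRUE)
  # region names look like "new york:manhattan"; keep the part before ":"
  idx <- match(sub(":.*", "", regions), tolower(state.name))
  maps::map("state", fill = TRUE, col = res$colour[idx])  # unmatched regions stay unfilled
  legend("bottomright", legend = res$classes, fill = res$palette,
         title = "Per capita income, 1974", bty = "n", cex = 0.7)
}
```

Quantile breaks are only one classification scheme; equal-interval breaks are a common alternative.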

Map of Tunisia with Self-defined Symbols

Unrest in Tunisia

Choropleth Map of Europe at Country-Level (Panel)

Life Satisfaction
