Data Visualization Techniques in R
Introduction
Data visualization is a powerful tool for understanding and communicating information. In this post, we will explore various techniques for visualizing data in R, using real-world applications to illustrate their effectiveness. We will cover the following topics:
- Scatterplots
- Line graphs
- Barplots
- Juxtaposed barplots
- Stacked barplots
- Histograms
- Boxplots
- Pie charts
Scatterplots, ordinary lines and smooth lines
A data frame is created from the respective vectors, and an empty plot is set up with axis limits taken from the smallest and largest values of the data. The scatterplots are then added with points(), and the smooth lines through each set of points are drawn with lines(lowess()). mtext() places the title and the x-axis label. The legend sits in the top-left corner, inset from the edges by 0.05 of the width and height of the plot region (inset=0.05).
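As a minimal sketch of why lines(lowess()) works: lowess() returns a list with components x and y that hold the smoothed curve, and lines() can draw such a list directly. The vectors are the same x and y1 used in the program; the name smooth_y1 is only illustrative.

```r
# Same x and y1 values as in the program that follows
x <- c(1, 5, 6, 7, 8, 8, 9)
y1 <- c(1, 3, 5, 6, 7, 8, 9)

# lowess() returns a list with components x and y (the smoothed curve)
smooth_y1 <- lowess(x, y1)
str(smooth_y1)

# Draw the raw points first, then overlay the smoothed curve
plot(x, y1, pch = 19, col = "blue")
lines(smooth_y1, lwd = 2, col = "darkgreen")
```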
par(bg="light green")
x=c(1,5,6,7,8,8,9)
y1=c(1,3,5,6,7,8,9)
y2=c(8,7,6,5,4,2,1)
z=cbind(x,y1,y2)
zz=data.frame(z)
#Empty plot whose limits cover both series
plot(x,y1, type = "n", xlab="", ylab="Numbers",ylim=range(y1,y2),xlim=range(x))
points(x,y1, pch=19, col="blue")
#Add a second series of points, y2
points(x,y2,pch=19,col="black")
legend("topleft", inset=0.05, title="Growth Forms",
c("trees","shrubs"),pch=c(19,19),col=c("black","blue"), bg = "yellow")
#Add smooth (lowess) curves to each set of points in the scatterplot
#Optional
lines(lowess(x,y1),lwd=2,lty=1,col="dark green")
lines(lowess(x,y2),lwd=2,lty=1,col="dark blue")
mtext("Inverse Relationship \n between Trees and shrubs", side=3, line=1)
mtext("Year", side=1, line=2)
Line Graph
A line graph is a type of chart that displays information as a series of data points called ‘markers’ connected by straight line segments. It is a basic type of chart common in many fields. Line graphs are often used to visualize data that changes over time, such as stock prices or temperature readings.
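A minimal sketch of a line graph for data that change over time is shown here. The temperature readings are hypothetical values chosen only for illustration. spline() is used to add an interpolated smooth curve next to the straight segments.

```r
# Hypothetical daily temperature readings (illustrative values only)
day <- 1:7
temperature <- c(18.5, 19.2, 21.0, 20.4, 22.1, 23.3, 21.8)

# type = "l" would draw only the connecting segments;
# type = "o" overplots the markers on the line
plot(day, temperature, type = "o", pch = 19, col = "blue",
     xlab = "Day", ylab = "Temperature")

# spline() interpolates n points through the data for a smooth curve
sp <- spline(day, temperature, n = 50)
lines(sp, lty = 2, col = "red")
```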
Program for drawing a scatterplot and lines
#******************************************
par(xpd=T, mar= par()$mar+c(0,0,3,8))
Juniperus=c(2, 4, 4, 5, 12)
Podocarpus= c(1, 3, 6, 7, 9)
plot(Juniperus, type="o", pch=19, col="blue", ylim=c(0,12), xlab="",
ylab="", cex.axis=0.8)
lines(Podocarpus, type="o", pch=19, lty=2, col="red")
title(main="Forest Trees", col.main="red", font.main=4)
par(new=TRUE)
x =1:5
y = c(1,3,4, 2.5,2)
plot(x, y, axes=FALSE ,xlab="", ylab="", pch=19,col="green",
cex.axis=0.8)
sp = spline(x, y, n = 50)
lines(sp)
axis(4,cex.axis=0.8)
mtext("Year", side=1, line=2, cex=0.8)
mtext("Height ", side=2, line=2, cex=0.8)
mtext("Leaf Biomass ", side=4, line=2, cex=0.8)
#Legend colors match the series drawn above
legend("bottom", title="Trees & Leaves",
c("Juniperus","Podocarpus","Leaf Biomass"),pch=c(19,19,
19),col=c("blue","red", "green"), cex=0.8,
bg = "white", text.col="blue")
Barplot
A barplot with a single bar needs only one vector. The standard deviation (sd) can be calculated from that vector or supplied in a second vector. The color, density and angle of the bar hatching are set by col, density and angle respectively; the default hatching angle is 45°. box() draws a box around the finished plot. Labels and legends are often important components of a barplot.
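The one-bar case can be sketched as follows. The vector holds the Sand values from the program in this section, and the names m, s and bp are illustrative. The error bar is drawn with arrows() and spans one standard deviation on either side of the mean.

```r
# Sand values from the program in this section
biomass <- c(6.3, 5.7, 6.2, 6.4, 5.8)
m <- mean(biomass)
s <- sd(biomass)

# barplot() returns the x position of the bar midpoint
bp <- barplot(m, col = "blue", density = 20, angle = 45,
              ylim = c(0, m + s + 0.5), names.arg = "Sand")

# code = 3 puts a head on both ends; angle = 90 makes the heads flat
arrows(bp, m - s, bp, m + s, angle = 90, length = 0.1, code = 3)
box()
```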
#Top margin enlarged so that the title fits
par(mar=c(4.2, 3.8, 2.2, 0.2))
Rawdata=data.frame(Sand =c(6.3, 5.7, 6.2, 6.4, 5.8), Silt=c(9.7, 10.0, 10.2,
10.4, 9.6),Clay=c(9.0, 9.3, 8.7, 9.1, 8.9), Loam=c(6.3, 6.4, 6.8, 6.7, 6.8))
#Mean and standard deviation of each medium (one value per column)
means=sapply(Rawdata,mean)
sds=sapply(Rawdata,sd)
barplotdata=data.frame(mean=means,sd=sds)
#Leave room above the tallest bar for its error bar
max_y=max(barplotdata$mean+barplotdata$sd)
colors = c("blue","red", "forestgreen", "green")
bp = barplot(height=barplotdata$mean, col=colors, ylim=c(0, max_y+0.5),
density=c(10,20,30,40),
names.arg=row.names(barplotdata))
title(main="Biomass of Urtica simensis in different media")
#Error bars of one standard deviation above and below each mean
arrows(bp, barplotdata$mean-barplotdata$sd, bp, barplotdata$mean+barplotdata$sd, lwd=1.5, angle=90,
length=0.1, code=3)
#Optional: click on the plot to place a legend
#legend(locator(1), title="Growth", row.names(barplotdata),
#density=c(10,20,30,40), fill = colors, cex=0.6, ncol = 2, bg="white")
box()
Juxtaposed Barplot
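The following program compares the total number of trees per hectare of some species in different Afromontane forests. length(rowSums(Matrix)) is the number of rows of the matrix, which is the number of bars within each group: the five sites in the first plot and, after the matrix is transposed, the three species in the second.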
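A short sketch of how the matrix orientation decides the grouping, using the same data as the program. With beside=TRUE, barplot() draws one group per column and one bar per row, and it returns the bar midpoints in a matrix of the same shape. The name mids is illustrative.

```r
Forestvegetation <- data.frame(Juniperus = c(1, 3, 6, 4, 9),
                               Podocarpus = c(2, 5, 4, 6, 16),
                               Prunus = c(4, 4, 6, 6, 16))
rownames(Forestvegetation) <- c("Jibat", "Chilimo", "Wofasha", "Suba", "Jemjem")
Matrix <- as.matrix(Forestvegetation)

dim(Matrix)     # 5 rows (sites) by 3 columns (species)
dim(t(Matrix))  # 3 rows (species) by 5 columns (sites)

# One midpoint per bar: rows are bars within a group, columns are groups
mids <- barplot(Matrix, beside = TRUE, col = rainbow(nrow(Matrix)))
dim(mids)
```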
Forestvegetation=data.frame(Juniperus=c(1,3,6,4,9),
Podocarpus=c(2,5,4,6,16), Prunus=c(4,4,6,6,16))
rownames(Forestvegetation)=c("Jibat", "Chilimo", "Wofasha", "Suba",
"Jemjem")
#Two plots, one above the other
par(mfrow=c(2,1))
Matrix=as.matrix(Forestvegetation)
y=max(Matrix)
barplot(Matrix, ylab= "Total",
beside=TRUE, col=rainbow(length(rowSums(Matrix))),ylim=c(0,y+0.5),
space=c(0.15, 1))
title(main="Forest Trees")
legend("topleft", title="Forests", c( row.names(Matrix)),cex=0.6, bty="n",
fill=rainbow(length(rowSums(Matrix))))
box()
Matrix=as.matrix(Forestvegetation)
Matrix=t(Matrix)
y=max(Matrix)
barplot(Matrix, ylab= "Total",
beside=TRUE, col=rainbow(length(rowSums(Matrix))),ylim=c(0,y+0.5),
space=c(0.2, 1))
#title(main="Forest Trees")
legend("topleft",title="Tree species", c(row.names(Matrix)),cex=0.6,
bty="n", fill=rainbow(length(rowSums(Matrix))), bg="white")
box()
Stacked Barplots
Total number of trees per hectare in different Afromontane forests of Ethiopia, shown as a stacked barplot with a legend. After transposing, the matrix has one row per species and one column per site, so length(rowSums(Matrix)) is the number of species (three, one color per stacked segment) and length(colSums(Matrix)) is the number of sites (five, one bar each).
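The arithmetic behind a stacked barplot can be sketched with the same numbers: each bar is as tall as the column sum for its site, and the number of rows gives the number of segments. The matrix is built here with rbind() purely for brevity; the name totals is illustrative.

```r
# Species in rows, sites in columns (the transposed layout)
Matrix <- rbind(Juniperus = c(1, 3, 6, 4, 9),
                Podocarpus = c(2, 5, 4, 6, 16),
                Prunus = c(4, 4, 6, 6, 16))
colnames(Matrix) <- c("Jibat", "Chilimo", "Wofasha", "Suba", "Jemjem")

totals <- colSums(Matrix)  # height of each stacked bar (one per site)
totals
nrow(Matrix)               # number of segments, hence number of colors

# The y-axis must reach the tallest stack
barplot(Matrix, col = rainbow(nrow(Matrix)),
        ylim = c(0, max(totals) + 0.5))
```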
Forestvegetation=data.frame(Juniperus=c(1,3,6,4,9),
Podocarpus=c(2,5,4,6,16), Prunus=c(4,4,6,6,16))
rownames(Forestvegetation)=c("Jibat", "Chilimo", "Wofasha", "Suba",
"Jemjem")
Matrix=as.matrix(Forestvegetation)
Matrix=t(Matrix)
#One color per species (row); bar heights are the site totals (colSums)
barplot(Matrix, main="Forest Trees", ylab= "Total", xlab="Sites",
beside=FALSE, col=rainbow(length(rowSums(Matrix))), cex.axis=0.8,
cex=0.8, ylim=c(0,max(colSums(Matrix))+0.5))
legend("topleft",title="Tree species", c(row.names(Matrix)),cex=0.8,
bty="n", fill=rainbow(length(rowSums(Matrix))))
box()
#or
Forestvegetation=data.frame(Juniperus=c(1,3,6,4,9),
Podocarpus=c(2,5,4,6,16), Prunus=c(4,4,6,6,16))
rownames(Forestvegetation)=c("Jibat", "Chilimo", "Wofasha", "Suba",
"Jemjem")
Trans=t(Forestvegetation)
barplot(Trans, main="Forest Trees", ylab="Total",xlab="Sites",
col=rainbow(length(rowSums(Trans))), space=0.1, cex.axis=0.8, las=1,
names.arg=row.names(Forestvegetation),cex=0.8,ylim=c(0,max(colSums(
Trans))+0.5))
legend("topleft",title="Tree species", c(row.names(Trans)), cex=0.8,
bty="n", fill=rainbow(length(rowSums(Trans))))
box()
Histogram
A histogram is a graphical representation showing a visual impression of the distribution of data. A histogram consists of tabular frequencies shown as adjacent rectangles erected over discrete intervals (bins or breaks), with an area equal to the frequency of the observations in the interval. The height of a rectangle is also equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval. A histogram may also be normalized displaying relative frequencies. A normalized histogram shows the proportion of cases that fall into each of several categories with the total area equaling 1.
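The relationship between frequency, bin width and density can be checked numerically. This sketch uses the data from the program in this section and explicit breaks of width 10 (an assumption made here so that the bins are unambiguous). hist() with plot=FALSE returns the counts and the normalized densities without drawing anything.

```r
values <- c(5, 20, 20, 30, 30, 30, 30, 40, 40, 40, 40, 40,
            50, 50, 50, 50, 50, 50, 60, 60, 60, 60, 60,
            70, 70, 70, 70, 80, 80, 90)

# Compute the histogram without plotting it
h <- hist(values, breaks = seq(0, 90, by = 10), plot = FALSE)
widths <- diff(h$breaks)

h$counts                      # frequency in each bin
h$counts / widths             # frequency density (frequency / bin width)
h$counts / (sum(h$counts) * widths)  # normalized density, same as h$density
sum(h$density * widths)       # total area of the normalized histogram
```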
#Named values rather than list, so that the base function list() is not masked
values=c(5,20,20,30,30,30,30,40,40,40,40,40,50,50,50,50,50,50,60,60,60,60,60,70,70,70,70,80,80,90)
hist(values, breaks=10, density=c(10,20,30,40,50,40,30,20,10), angle=40,
col=rainbow(9), main="")
title(main="Histogram", font.main=3)
#Optional: overlay a density curve or a fitted normal curve
#par(new=TRUE)
#plot(density(values), yaxt="n", xaxt="n", ann=FALSE, col="blue")
#curve(dnorm(x, mean(values), sd(values)), col="red", lwd=2, add=TRUE)
box()
Boxplot
A boxplot produces a box-and-whisker plot of the given or grouped values. Grouping within a data frame is indicated with ~, as in ANOVA or regression formulas. Boxplots give a visual comparison of the differences among groups. Strip charts (stripchart()), which produce one-dimensional scatterplots, can be good alternatives to boxplots when sample sizes are small.
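A minimal sketch of the formula interface and of a strip chart as a small-sample alternative follows. The data frame trees and its values are hypothetical and serve only as an illustration.

```r
# Hypothetical tree heights in two small groups (illustrative values only)
trees <- data.frame(height = c(21, 23, 34, 24, 43, 42, 32, 54),
                    group = factor(rep(c("A", "B"), each = 4)))

par(mfrow = c(1, 2))
# Formula interface: response ~ grouping factor
b <- boxplot(height ~ group, data = trees, main = "boxplot()")
b$n  # number of observations in each group

# One-dimensional scatterplot of the same groups
stripchart(height ~ group, data = trees, vertical = TRUE, pch = 19,
           method = "jitter", main = "stripchart()")
```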
Simple Boxplot
boxplotdata=data.frame(Altitude = c(2150, 2170, 2180, 2700, 2640, 2660,
2460, 2470, 2200, 2220, 2300, 2330, 2350, 2350, 2350, 1900, 1950, 2000,
1900, 1900, 1900, 1900, 1940, 1940, 1930, 2100, 2100, 2300, 2290, 2320,
2250, 2250, 1590, 1600, 1620, 1560, 2320, 2330, 2350, 2300, 2300, 2530,
2540, 2600, 2600, 2460, 2120, 2110, 2150, 2130, 2210, 2210, 2050, 1900,
1750, 1680, 1710, 1950, 2000, 1990, 2590, 2620,1800, 1900, 1350),
groupcode = c(1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 3, 2, 2, 4, 4, 4, 4, 4, 4, 4, 5, 5,
5, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 5, 3, 5, 5, 5, 4, 4,
5, 5, 5, 5, 5, 5, 3, 3,5, 4, 5))
#data= is used instead of attach() so that the search path is left unchanged
boxplot(Altitude~groupcode, data=boxplotdata, main="Altitude", las=1, col=rainbow(5),
varwidth=TRUE, cex.lab=0.8, cex.axis=0.8, ylab="Altitude", xlab="Group Number")
Advanced Boxplot
treelength=data.frame(Height=c(21,23,34,23,24,43,23,23,43,42,32,43,42,54,65,23,42,45,54,32,43,54,23,32,43,54,23,54,65,45,54,34,24,34,54,54),
Slope =gl(2,18,36, labels=c("Plain","Steep")),
Altitude=gl(3,6,36, labels=c(1500,1800,2400)))
boxplot(Height~Slope+Altitude, data=treelength, boxwex=0.50,
col=c("yellow", "orange") ,xlab="Altitude in m", ylab="Tree height",
ylim=c(0,70), cex.axis = 0.8, cex.lab=0.8)
title(main="Height of Juniperus procera along altitudinal gradient",
font.main=2, cex.main= 0.8)
#The fill colors alternate with Slope, so the legend uses the Slope levels
legend("topleft", title="Slope", levels(treelength$Slope), fill =
c("yellow", "orange"), horiz=TRUE, cex=0.8)
Pie Chart
A pie chart is a circular chart divided into sectors that illustrate proportions. It is perhaps the most common statistical chart in business and the mass media, and it can be effective when the intent is to compare the size of one slice with the whole pie rather than to compare the slices among themselves. Pie charts can be drawn in either 2D or 3D. In base R's pie(), the size of the pie is controlled by radius and the direction in which the slices are placed by clockwise=TRUE. In a 3D pie chart drawn with pie3D() from the plotrix package, the gap between the slices is controlled by, for example, explode=0.1, and the label color and label text size by labelcol and labelcex. A 2D example is given below:
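The label construction and the radius and clockwise arguments of base pie() can be sketched on a small example. The counts are hypothetical values chosen so that the percentages are easy to verify by hand.

```r
# Hypothetical counts (illustrative values only)
counts <- c(A = 50, B = 30, C = 20)

# Share of each slice as a whole-number percentage
pct <- round(counts / sum(counts) * 100)
lbls <- paste0(names(counts), " - ", pct, "%")
lbls

# radius sets the size of the pie; clockwise = TRUE reverses the
# default counter-clockwise placement of the slices
pie(counts, labels = lbls, radius = 0.8, clockwise = TRUE,
    col = rainbow(length(counts)))
```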
#The figures are in thousands (4246900 thousand is about 4.2 billion)
Worldpopulation=data.frame(pop_thousands=c(4246900, 542500, 757400, 396900, 1019900, 37800))
rownames(Worldpopulation) =c("Asia", "N.America", "Europe",
"S.America", "Africa", "Others")
#The column is referenced with $ instead of attach()
pop = Worldpopulation$pop_thousands
lbls = row.names(Worldpopulation)
pct = round(pop/sum(pop)*100)
lbls = paste(lbls, "-", pct) # add percents to labels
lbls = paste(lbls,"%",sep="") # add % to the labels
pie(pop, labels = lbls, clockwise=FALSE,
col=rainbow(length(lbls)))
title(main="Pie Chart of World Population")