Practice from the book “R for Everyone” by Jared P. Lander, Chapter 6: Statistical Graphics

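lattice and ggplot2 are two widely used R packages for plotting graphics, in addition to base graphics. Neither is loaded by default: lattice ships with R as a recommended package, while ggplot2 must be installed from CRAN.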

Graphs serve two purposes: exploratory data analysis (EDA) and presenting results.

ggplot2 ships with a dataset called diamonds. Load the package to make the dataset and the plotting functions available.

library(ggplot2)

Histogram

hist(diamonds$carat,main = "Carat Histogram",xlab = "Carat",ylab = "Freq of carat")

Scatter plot

plot(price~carat,data = diamonds,main = "Scatter plot")

Boxplot

boxplot(diamonds$carat)

Histograms and densities using ggplot2

ggplot(data = diamonds)+geom_histogram(aes(x=carat),fill = "red")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
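
The message above suggests picking a bin width explicitly instead of relying on the default of 30 bins. A minimal sketch, assuming a width of 0.1 carat is a sensible starting point (the value is illustrative and not taken from the book):

```r
library(ggplot2)

# binwidth = 0.1 is an illustrative assumption; adjust it to the data
p_hist <- ggplot(data = diamonds) +
  geom_histogram(aes(x = carat), binwidth = 0.1, fill = "red")
p_hist
```

Setting binwidth also silences the `stat_bin()` message, because the bin size is no longer guessed.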

ggplot2 density plots

ggplot(data = diamonds) + geom_density(aes(x=carat),fill = "green")

ggplot2 scatter plots

ggplot(diamonds,aes(x=carat,y=price)) + geom_point()

Saving the ggplot object in a variable g and mapping colors

g <- ggplot(diamonds,aes(x=carat,y=price))
g + geom_point(aes(color = color)) # the color on the left is the aesthetic; the color on the right is the variable in the dataset
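
To make the difference between mapping and setting a color concrete, here is a small sketch. The object names p_mapped and p_fixed and the fixed color "blue" are assumptions chosen for illustration:

```r
library(ggplot2)

g <- ggplot(diamonds, aes(x = carat, y = price))

# Mapped inside aes(): the point color follows the dataset column `color`
# and ggplot2 draws a legend for it
p_mapped <- g + geom_point(aes(color = color))

# Set outside aes(): every point gets the same fixed color, with no legend
# ("blue" is an illustrative choice)
p_fixed <- g + geom_point(color = "blue")
```

Printing either object draws the plot.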

Using facet_wrap and facet_grid

facet_wrap takes the levels of one categorical variable, cuts up the underlying data according to them, makes a separate pane for each subset, and arranges the panes to fit in the plot. Here the row and column placement do not have any meaning.

g + geom_point(aes(color=color)) + facet_wrap(~cut)

#g+geom_point(aes(color = color)) + facet_wrap(~color)

facet_grid acts similarly but assigns all levels of a variable to either a row or a column.

g + geom_point(aes(color = color)) + facet_grid(cut~clarity)
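
As a rough check on how many panes each faceting call produces, the factor levels can be counted directly. This sketch uses only the diamonds columns already plotted above; the variable names n_cut and n_clarity are illustrative:

```r
library(ggplot2)

n_cut <- nlevels(diamonds$cut)          # one pane per level in facet_wrap(~cut)
n_clarity <- nlevels(diamonds$clarity)  # one column per level in facet_grid(cut~clarity)

n_cut              # 5
n_cut * n_clarity  # 40 cells in the cut-by-clarity grid
```

With 40 cells, the grid can get crowded, which is one reason to facet on a single variable when the second one adds little.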

facet_wrap and facet_grid also work with histograms

ggplot(diamonds,aes(x=carat)) + geom_histogram() + facet_wrap(~color)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#ggplot(diamonds,aes(x=carat)) + geom_histogram(bins=40) + facet_wrap(~color)

Boxplots

ggplot(diamonds,aes(y=carat,x=1)) + geom_boxplot()

Multiple boxplots, one for each level of a variable.

ggplot(diamonds,aes(y=carat,x=cut)) + geom_boxplot()

Violin plots

These are similar to boxplots except that the boxes are curved, giving a sense of the density of the data. They provide more information than the straight sides of an ordinary boxplot.

ggplot(diamonds,aes(y=carat,x=cut)) + geom_violin()

Violin plot on top of a point graph

#ggplot(diamonds,aes(y=carat,x = cut)) + geom_point()
ggplot(diamonds,aes(y=carat,x = cut)) + geom_point() + geom_violin()

Violin plot below point graph

ggplot(diamonds,aes(y=carat,x = cut)) + geom_violin() +geom_point()
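
The two plots above differ only in layer order: ggplot2 draws layers in the order they are added, so the later layer sits on top. A minimal sketch that keeps the violin on top but lets the points show through, assuming a transparency of 0.5 (an illustrative value):

```r
library(ggplot2)

# Points are drawn first, then the violin is drawn over them.
# alpha = 0.5 is an illustrative assumption that makes the violin fill semi-transparent.
p_layers <- ggplot(diamonds, aes(y = carat, x = cut)) +
  geom_point() +
  geom_violin(alpha = 0.5)
p_layers
```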

Line graphs

Line graphs are usually used when one variable has some continuity, such as time, but that is not always necessary.

For this we load the economics data from the ggplot2 package.

data("economics")
ggplot(economics,aes(x=date,y=pop)) + geom_line()

head(economics)
## # A tibble: 6 × 6
##         date   pce    pop psavert uempmed unemploy
##       <date> <dbl>  <int>   <dbl>   <dbl>    <int>
## 1 1967-07-01 507.4 198712    12.5     4.5     2944
## 2 1967-08-01 510.5 198911    12.5     4.7     2945
## 3 1967-09-01 516.3 199113    11.7     4.6     2958
## 4 1967-10-01 512.9 199311    12.5     4.9     3143
## 5 1967-11-01 518.1 199498    12.5     4.7     3066
## 6 1967-12-01 525.8 199657    12.1     4.8     3018

We prepare the economics data by year. For this we use the ‘lubridate’ package, which has functions for manipulating dates.

library(lubridate)
## Warning: package 'lubridate' was built under R version 3.3.3
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
economics$year <- year(economics$date)
economics$month <- month(economics$date,label = TRUE) # label = TRUE returns the month name as an ordered factor rather than the month number
econ2000 <- economics[which(economics$year >= 2000),]
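
To see what year() and month() return before applying them to the whole dataset, here is a small sketch on three hypothetical dates (the dates and variable names are made up for illustration):

```r
library(lubridate)

# Hypothetical sample dates, chosen only for illustration
d <- as.Date(c("1999-12-01", "2000-01-01", "2000-02-01"))

yr <- year(d)                  # numeric years: 1999 2000 2000
mon <- month(d, label = TRUE)  # ordered factor of abbreviated month names

is.ordered(mon)                # TRUE, so months sort in calendar order on a plot axis
d[yr >= 2000]                  # keeps only the two dates from 2000 onward
```

The same logical filter, yr >= 2000, is what selects the rows of econ2000 above.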

We use the ‘scales’ package for better axis formatting.

library(scales)
## Warning: package 'scales' was built under R version 3.3.3
g <- ggplot(econ2000,aes(x=month,y = pop))
g <- g + geom_line(aes(color = factor(year),group = year))  # add lines color coded grouped by year
g <- g + scale_color_discrete(name="Year") # name the legend "Year"
g <- g + scale_y_continuous(labels = comma) # format the y axis to have commas in the numbers
g <- g + labs(title="Population growth by month",x = "Month",y = "Population") # add a title and axis labels; labs() takes x and y, not xlab and ylab
g
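
The comma function passed to scale_y_continuous can also be called directly, which shows what the axis labels will look like. A minimal sketch, with sample values picked for illustration:

```r
library(scales)

# Sample values chosen for illustration; the first matches a pop value shown in head(economics)
pops <- c(198712, 1234567)
formatted <- comma(pops)
formatted  # "198,712" "1,234,567"
```

Passing labels = comma hands this function to the scale, which applies it to each axis break.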

Themes

ggplot2 can change the way plots look by using themes. We use a package called ‘ggthemes’ for this task.

library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.3.3
g2 <- ggplot(diamonds,aes(x=carat,y=price)) + geom_point(aes(color=color))

#applying themes
g2 + theme_economist() + scale_color_economist()

g2 + theme_excel() + scale_color_excel()

g2 + theme_tufte() 

g2 + theme_wsj() + labs(title = "Wall Street Journal theme")

Conclusion:

These notes covered basic plots in base graphics (histogram, scatter plot, boxplot) and in ggplot2 (histograms, density plots, scatter plots, boxplots, violin plots, line plots, and faceting with facet_wrap and facet_grid). They also covered axis formatting with the ‘scales’ package, date handling with ‘lubridate’, and applying themes with the ‘ggthemes’ package.