Practice from the book “R for Everyone” by Jared P. Lander, Chapter 6: Statistical Graphics
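Besides base graphics, lattice and ggplot2 are the two most widely used plotting packages in R. lattice ships with R as a recommended package; ggplot2 has to be installed from CRAN.
Graphs are used both for exploratory data analysis (EDA) and for presenting results.
ggplot2 ships with a dataset called diamonds.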
Histogram
library(ggplot2) # provides the diamonds dataset and the ggplot functions used below
hist(diamonds$carat,main = "Carat Histogram",xlab = "Carat",ylab = "Freq of carat")
Scatter plot
plot(price~carat,data = diamonds,main = "Scatter plot")
Boxplot
boxplot(diamonds$carat)
Histograms and densities using ggplot2
ggplot(data = diamonds)+geom_histogram(aes(x=carat),fill = "red")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
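The message above appears because no bin width was given, so ggplot2 fell back to 30 bins. A minimal sketch of setting the width explicitly, assuming 0.1 carat is a sensible width for this data (the value is an illustration, not taken from the book):

```r
library(ggplot2)

# Setting binwidth explicitly avoids the stat_bin() message.
# 0.1 carat is an assumed value chosen for illustration.
p_hist <- ggplot(data = diamonds) +
  geom_histogram(aes(x = carat), binwidth = 0.1, fill = "red")
p_hist
```

Smaller widths show more detail but make the plot noisier, so it is worth trying a few values.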
ggplot2 density plots
ggplot(data = diamonds) + geom_density(aes(x=carat),fill = "green")
ggplot2 scatter plots
ggplot(diamonds,aes(x=carat,y=price)) + geom_point()
Saving ggplot in a variable g and marking colors
g <- ggplot(diamonds,aes(x=carat,y=price))
g + geom_point(aes(color = color)) # the color on the left is the aesthetic; the color on the right is the variable in the dataset
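Because the aesthetic and the dataset column share the name color, it can help to see mapping and setting side by side. The sketch below is an illustration only; the names g_mapped and g_fixed and the fixed color "blue" are assumptions, not from the book:

```r
library(ggplot2)

g <- ggplot(diamonds, aes(x = carat, y = price))

# Mapping: inside aes(), point color varies with the dataset column `color`
# and a legend is drawn.
g_mapped <- g + geom_point(aes(color = color))

# Setting: outside aes(), every point gets the same fixed color
# and no legend is drawn.
g_fixed <- g + geom_point(color = "blue")

g_mapped
g_fixed
```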
Using facet_wrap and facet_grid
facet_wrap takes the levels of one categorical variable, splits the underlying data by them, makes a separate pane for each subset and arranges the panes to fit in the plot. Here the row and column placement has no meaning.
g + geom_point(aes(color=color)) + facet_wrap(~cut)
#g+geom_point(aes(color = color)) + facet_wrap(~color)
facet_grid acts similarly but assigns all levels of one variable to the rows and all levels of another to the columns.
g + geom_point(aes(color = color)) + facet_grid(cut~clarity)
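The difference between the two functions shows up in the number of panes they produce. A small sketch, counting panes from the factor levels of diamonds; the names n_wrap, n_grid and p_wrap and the choice ncol = 2 are illustrative assumptions:

```r
library(ggplot2)

# facet_wrap(~cut): one pane per level of cut
n_wrap <- nlevels(diamonds$cut)

# facet_grid(cut ~ clarity): one pane per (cut, clarity) combination,
# with cut on the rows and clarity on the columns
n_grid <- nlevels(diamonds$cut) * nlevels(diamonds$clarity)

c(wrap = n_wrap, grid = n_grid)

# With facet_wrap the layout is only a packing choice, so it can be changed
# freely, for example to two columns (ncol = 2 is an assumed value).
p_wrap <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = color)) +
  facet_wrap(~cut, ncol = 2)
p_wrap
```

With the diamonds data this gives 5 panes for facet_wrap and 40 for facet_grid, which is why the grid version looks much denser.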
facet_wrap and facet_grid also work with histograms
ggplot(diamonds,aes(x=carat)) + geom_histogram() + facet_wrap(~color)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#ggplot(diamonds,aes(x=carat)) + geom_histogram(bins=40) + facet_wrap(~color)
Boxplots
ggplot(diamonds,aes(y=carat,x=1)) + geom_boxplot()
Multiple boxplots, one for each level of a variable.
ggplot(diamonds,aes(y=carat,x=cut)) + geom_boxplot()
Violin plots
These are similar to boxplots except that the sides are curved, giving a sense of the density of the data. They convey more information than the straight sides of an ordinary boxplot.
ggplot(diamonds,aes(y=carat,x=cut)) + geom_violin()
Violin plot on top of a point graph
#ggplot(diamonds,aes(y=carat,x = cut)) + geom_point()
ggplot(diamonds,aes(y=carat,x = cut)) + geom_point() + geom_violin()
Violin plot below point graph
ggplot(diamonds,aes(y=carat,x = cut)) + geom_violin() +geom_point()
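The two plots above differ only in the order of the layers: ggplot2 draws layers in the order they are added, so the last one ends up on top. A minimal sketch that keeps the violins on top but makes them see-through; the alpha value of 0.5 and the name p_layers are assumptions for illustration:

```r
library(ggplot2)

# Layers are drawn in the order they are added: points first, violins on top.
# alpha = 0.5 (an assumed value) lets the points show through the violins.
p_layers <- ggplot(diamonds, aes(y = carat, x = cut)) +
  geom_point() +
  geom_violin(alpha = 0.5)
p_layers
```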
Line graphs
Line graphs are usually used when one variable has a natural continuity, such as time, although that is not strictly necessary.
For this we load the economics data from the ggplot2 package.
data("economics")
ggplot(economics,aes(x=date,y=pop)) + geom_line()
head(economics)
## # A tibble: 6 × 6
## date pce pop psavert uempmed unemploy
## <date> <dbl> <int> <dbl> <dbl> <int>
## 1 1967-07-01 507.4 198712 12.5 4.5 2944
## 2 1967-08-01 510.5 198911 12.5 4.7 2945
## 3 1967-09-01 516.3 199113 11.7 4.6 2958
## 4 1967-10-01 512.9 199311 12.5 4.9 3143
## 5 1967-11-01 518.1 199498 12.5 4.7 3066
## 6 1967-12-01 525.8 199657 12.1 4.8 3018
We prepare the economics data by year. We use the ‘lubridate’ package, which has functions for manipulating dates.
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.3.3
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
economics$year <- year(economics$date)
economics$month <- month(economics$date,label = TRUE) # label = TRUE returns the month as an ordered factor of abbreviated names rather than a month number
econ2000 <- economics[which(economics$year >= 2000),]
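To see what year() and month() return before applying them to the whole dataset, here is a small sketch on three made-up dates (the vector d is hypothetical and used only for illustration):

```r
library(lubridate)

# Three made-up dates, used only to show what the functions return
d <- as.Date(c("1999-12-01", "2000-01-01", "2000-02-01"))

year(d)                 # numeric years
month(d)                # month numbers
month(d, label = TRUE)  # ordered factor of abbreviated month names

# Keep only the dates from 2000 onwards, the same filter used for econ2000
d[year(d) >= 2000]
```

Because the labelled months are an ordered factor, ggplot2 places them on the x axis in calendar order rather than alphabetically.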
We use the ‘scales’ package for better axis formatting.
library(scales)
## Warning: package 'scales' was built under R version 3.3.3
g <- ggplot(econ2000,aes(x=month,y = pop))
g <- g + geom_line(aes(color = factor(year),group = year)) # add lines color coded grouped by year
g <- g + scale_color_discrete(name="Year") # name the legend "Year"
g <- g + scale_y_continuous(labels = comma) # format y axis to have commas for numbers
g <- g + labs(title="Population growth by month",x = "Month",y = "Population") # add a title and axis labels; labs() takes x and y, not xlab and ylab
g
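The labels argument of scale_y_continuous accepts a function that turns the axis break values into text. A minimal sketch of what comma from ‘scales’ does when called directly; the two input numbers are arbitrary examples:

```r
library(scales)

# comma() formats numbers with a thousands separator and returns character strings
formatted <- comma(c(198712, 1234567))
formatted
# "198,712"   "1,234,567"
```

Passing the function itself (comma, without parentheses) lets ggplot2 call it on whatever breaks it chooses for the axis.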
Themes
ggplot2 can change the way plots look by using themes. We use a package called ‘ggthemes’ for this task.
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.3.3
g2 <- ggplot(diamonds,aes(x=carat,y=price)) + geom_point(aes(color=color))
#applying themes
g2 + theme_economist() + scale_color_economist()
g2 + theme_excel() + scale_color_excel()
g2 + theme_tufte()
g2 + theme_wsj() + labs(title = "Wall Street Journal theme")
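ggplot2 itself also ships a few complete themes, and individual elements can be adjusted with theme(). The following sketch is not from the book; the choice of theme_minimal() and the bottom legend position are assumptions for illustration:

```r
library(ggplot2)

g2 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = color))

# theme_minimal() is one of the complete themes bundled with ggplot2;
# theme() then overrides a single element on top of it.
p_theme <- g2 +
  theme_minimal() +
  theme(legend.position = "bottom")
p_theme
```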
Conclusion:
We covered basic histograms, scatter plots and boxplots in base R, and then ggplot2: histograms, density plots, scatter plots, facet_wrap, facet_grid, boxplots, violin plots and line plots. We also covered axis formatting with the ‘scales’ package, date handling with ‘lubridate’, and applying themes with the ‘ggthemes’ package.