library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
We are trying to see what influences the price of a diamond using data in the diamonds dataset from the ggplot2 package.
A few rows have bad values (zeros) in the variables x, y and z. Eliminate these rows and make a cleaned copy of diamonds as d.
# Keep only rows where all three dimensions are positive
d = diamonds[diamonds$x > 0 &
             diamonds$y > 0 &
             diamonds$z > 0,]
# Price per carat
d$ppc = d$price/d$carat
# Convert factors to character
d$color = as.character(d$color)
d$cut = as.character(d$cut)
d$clarity = as.character(d$clarity)
str(d)
## Classes 'tbl_df', 'tbl' and 'data.frame': 53920 obs. of 11 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : chr "Ideal" "Premium" "Good" "Premium" ...
## $ color : chr "E" "E" "E" "I" ...
## $ clarity: chr "SI2" "SI1" "VS1" "VS2" ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
## $ ppc : num 1417 1552 1422 1152 1081 ...
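The same cleaning steps can also be written as a dplyr pipeline. This is a sketch of an equivalent form, not the code used above; the name d2 is introduced here only for illustration so that d is left untouched.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# d2 is a hypothetical name used only in this sketch
d2 <- diamonds %>%
  filter(x > 0, y > 0, z > 0) %>%   # drop rows with a zero dimension
  mutate(ppc = price / carat,       # price per carat
         cut = as.character(cut),
         color = as.character(color),
         clarity = as.character(clarity))

# Number of rows removed by the filter
nrow(diamonds) - nrow(d2)
```

Comparing the row counts before and after is a quick check on how many observations the filter removed.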
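The following chunk changes the height and width of the graphic produced in the notebook. The plotting code itself contains no size settings, so the size is presumably set through the chunk options, which are not visible in the rendered code.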
ggplot(data=d,aes(x=cut,y=ppc)) +
  geom_jitter(aes(color=cut),alpha=.1) +
  facet_grid(color~clarity) +
  theme(axis.text.x = element_text(face="bold", color="#993333",
                                   angle=45))
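In an R Markdown notebook, figure size is normally controlled by the knitr chunk options fig.width and fig.height. A minimal sketch, assuming the knitr package is installed; the values 14 and 10 (inches) are assumptions chosen for illustration, not the values used for the figure above.

```r
library(knitr)

# Set default figure dimensions (in inches) for the chunks that follow.
# The values 14 and 10 are illustrative assumptions.
opts_chunk$set(fig.width = 14, fig.height = 10)

# Inspect the current defaults
opts_chunk$get(c("fig.width", "fig.height"))
```

The same two options can instead be given in the header of a single chunk to resize only that chunk's figure.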
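The following chunk saves the plot to an external PNG file, grid2.png, with a scale factor of 2. The scale factor multiplies the plot dimensions, which is why the saved image is reported as 14 x 10 in.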
ggplot(data=d,aes(x=cut,y=ppc)) +
  geom_jitter(aes(color=cut),alpha=.1) +
  facet_grid(color~clarity) +
  theme(axis.text.x = element_text(face="bold", color="#993333",
                                   angle=45))
ggsave("grid2.png",scale=2)
## Saving 14 x 10 in image
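When plot, width and height are omitted, ggsave() saves the last plot displayed at the size of the current graphics device. The sketch below passes the plot and the size explicitly instead, so the result does not depend on the device. The object name p and the output path are assumptions made for this example.

```r
library(ggplot2)

# Rebuild the cleaned data and price per carat used above
d <- diamonds[diamonds$x > 0 & diamonds$y > 0 & diamonds$z > 0, ]
d$ppc <- d$price / d$carat

# Store the plot in an object (p is a hypothetical name)
p <- ggplot(data = d, aes(x = cut, y = ppc)) +
  geom_jitter(aes(color = cut), alpha = .1) +
  facet_grid(color ~ clarity) +
  theme(axis.text.x = element_text(face = "bold", color = "#993333",
                                   angle = 45))

# Hypothetical output path; width and height are given in inches
out_file <- file.path(tempdir(), "grid2_explicit.png")
ggsave(out_file, plot = p, width = 14, height = 10, units = "in")
```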
We can use the capabilities of dplyr to help us look at diamonds.
Stop and look at the dplyr section in the text.
d %>%
  select(ppc,cut,color,clarity) %>%
  filter(color=="F",clarity=="VS1") %>%
  mutate(cutF = factor(cut,levels=c("Fair","Good","Very Good","Premium","Ideal"))) -> F_VS1
# Note the right-pointing assignment arrow (->), which stores the result in F_VS1
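The arrow at the end of the pipeline assigns the result to F_VS1. The sketch below writes the same steps with the more common left-pointing assignment and then summarises the subset by cut. The helper name cut_levels and the summary columns n and median_ppc are introduced here for illustration only.

```r
library(ggplot2)
library(dplyr)

# Rebuild the cleaned data with character versions of the factors
d <- diamonds %>%
  filter(x > 0, y > 0, z > 0) %>%
  mutate(ppc = price / carat,
         cut = as.character(cut),
         color = as.character(color),
         clarity = as.character(clarity))

# Hypothetical helper holding the desired order of the cut levels
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")

# Same pipeline as above, written with <- at the start
F_VS1 <- d %>%
  select(ppc, cut, color, clarity) %>%
  filter(color == "F", clarity == "VS1") %>%
  mutate(cutF = factor(cut, levels = cut_levels))

# Count and median price per carat for each cut
F_VS1 %>%
  group_by(cutF) %>%
  summarise(n = n(), median_ppc = median(ppc))
```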
Now, let’s look at the details in this cell of the grid (color F, clarity VS1).
F_VS1 %>% ggplot() + geom_boxplot(aes(x=cut,y=ppc))
# Try that with the factor version of cut
F_VS1 %>% ggplot() + geom_point(aes(x=cutF,y=ppc),alpha=.1)
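The two plots order the x axis differently because cut is a character vector while cutF is a factor with explicit levels. A small self-contained sketch of the difference, using a toy vector (cut_chr and cut_fct are names made up for this example) rather than the diamonds data:

```r
# The five cut values, stored as plain character strings
cut_chr <- c("Ideal", "Premium", "Good", "Very Good", "Fair")

# Character values sort alphabetically, which is also how
# ggplot2 orders them on a discrete axis
sort(cut_chr)

# A factor keeps the order given in levels
cut_fct <- factor(cut_chr,
                  levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"))
levels(cut_fct)
sort(cut_fct)
```

With the character version, Ideal lands between Good and Premium; the factor version keeps the quality order from Fair to Ideal.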
Let’s look at the density of ppc for each value of cut.
F_VS1 %>% ggplot() + geom_density(aes(x=ppc,color=cutF))
Let’s try that with separate graphs for each cut.
F_VS1 %>% ggplot() + geom_density(aes(x=ppc)) + facet_wrap(~cutF)
Try that with geom_freqpoly().
F_VS1 %>% ggplot() + geom_freqpoly(aes(x=ppc)) + facet_wrap(~cutF)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
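The message above suggests choosing the bin width explicitly. A minimal sketch follows; binwidth = 500 (in price-per-carat units) is an assumed value for illustration and should be tuned by eye.

```r
library(ggplot2)
library(dplyr)

# Rebuild the F / VS1 subset directly from diamonds.
# Here cut is left as the ordered factor that ships with diamonds,
# so the facets already run from Fair to Ideal.
F_VS1 <- diamonds %>%
  filter(x > 0, y > 0, z > 0, color == "F", clarity == "VS1") %>%
  mutate(ppc = price / carat)

# binwidth = 500 is an assumption, not a recommended value
p <- ggplot(F_VS1) +
  geom_freqpoly(aes(x = ppc), binwidth = 500) +
  facet_wrap(~cut)
p
```

Setting binwidth also silences the stat_bin() message, since the default of 30 bins is no longer used.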