library(tidyverse)  # attaches dplyr and ggplot2 (ggplot2 provides the diamonds data set)

Question 1

Make a new data set that has the average depth and price of the diamonds in the data set

avgprice <- mean(diamonds$price)
avgdepth <- mean(diamonds$depth)

DepthPrice <- cbind(avgprice, avgdepth)
DepthPrice
##      avgprice avgdepth
## [1,]   3932.8  61.7494
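
Note that cbind() returns a one-row matrix rather than a data frame. As a minimal sketch of an alternative, assuming a tibble is what "data set" means here, summarize() can compute both averages in one step. The name depth_price is illustrative and not part of the original answer.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# One-row tibble holding both averages; depth_price is an illustrative name.
depth_price <- ggplot2::diamonds %>%
  summarize(avgprice = mean(price), avgdepth = mean(depth))

depth_price
```

The result keeps the column names avgprice and avgdepth, so it can be used wherever a data frame is expected.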

Question 2

Add a new column to the data set that records each diamond’s price per carat.

diamonds <- diamonds %>% mutate(pricePerCarat = price / carat)

Question 3

Create a new data set that groups diamonds by their cut and displays the average price of each group.

diamonds %>% group_by(cut) %>% summarize(meanPrice=mean(price))
## # A tibble: 5 x 2
##   cut       meanPrice
##   <ord>         <dbl>
## 1 Fair          4359.
## 2 Good          3929.
## 3 Very Good     3982.
## 4 Premium       4584.
## 5 Ideal         3458.

Question 4

Duplicate!

Question 5

Create a new data set that groups diamonds by color and displays the average depth and average table for each group.

diamonds %>% group_by(color) %>% summarize(meanDepth=mean(depth), meanTable=mean(table))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 3
##   color meanDepth meanTable
##   <ord>     <dbl>     <dbl>
## 1 D          61.7      57.4
## 2 E          61.7      57.5
## 3 F          61.7      57.4
## 4 G          61.8      57.3
## 5 H          61.8      57.5
## 6 I          61.8      57.6
## 7 J          61.9      57.8

Extra Credit

Add two columns to the diamonds data set. The first column should display the average depth of diamonds in the diamond’s color group. The second column should display the average table of diamonds in the diamond’s color group.

MeanDepth<-diamonds%>% group_by(color) %>% summarize(AvgDepth=mean(depth))
## `summarise()` ungrouping output (override with `.groups` argument)
MeanDepth
## # A tibble: 7 x 2
##   color AvgDepth
##   <ord>    <dbl>
## 1 D         61.7
## 2 E         61.7
## 3 F         61.7
## 4 G         61.8
## 5 H         61.8
## 6 I         61.8
## 7 J         61.9
diamonds<-left_join(diamonds,MeanDepth)
## Joining, by = "color"
MeanTable<-diamonds%>% group_by(color) %>% summarize(AvgTable=mean(table))
## `summarise()` ungrouping output (override with `.groups` argument)
MeanTable
## # A tibble: 7 x 2
##   color AvgTable
##   <ord>    <dbl>
## 1 D         57.4
## 2 E         57.5
## 3 F         57.4
## 4 G         57.3
## 5 H         57.5
## 6 I         57.6
## 7 J         57.8
diamonds<-left_join(diamonds,MeanTable)
## Joining, by = "color"
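
The same two columns can also be added without intermediate summary tables or joins. This is a minimal sketch, assuming a fresh copy of ggplot2::diamonds: group_by() followed by mutate() computes each group mean and repeats it on every row of that group. The name diamonds_with_means is illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# mutate() on grouped data keeps every row and fills in the group-level mean.
diamonds_with_means <- ggplot2::diamonds %>%
  group_by(color) %>%
  mutate(AvgDepth = mean(depth), AvgTable = mean(table)) %>%
  ungroup()  # drop the grouping so later steps work on the whole table

head(diamonds_with_means)
```

Unlike summarize(), which collapses each group to one row, mutate() preserves the original row count, so no join is needed.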

Question 6

Which color diamonds seem to be largest on average (in terms of carats)?

diamonds %>% group_by(color) %>% summarize(AvgCarat=mean(carat))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 2
##   color AvgCarat
##   <ord>    <dbl>
## 1 D        0.658
## 2 E        0.658
## 3 F        0.737
## 4 G        0.771
## 5 H        0.912
## 6 I        1.03 
## 7 J        1.16

J diamonds, the worst color grade, are the largest on average. Average carat increases from D (0.658) to J (1.16), so in this data the better a diamond’s color, the smaller the diamond tends to be.
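
To read the ranking directly, the summary can be sorted by average carat. This is a sketch using arrange(desc()), assuming a fresh copy of ggplot2::diamonds; carat_by_color is an illustrative name.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Average carat per color, largest first.
carat_by_color <- ggplot2::diamonds %>%
  group_by(color) %>%
  summarize(AvgCarat = mean(carat)) %>%
  arrange(desc(AvgCarat))

carat_by_color
```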

Question 7

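Which color occurs most frequently among diamonds with ideal cuts?

diamonds %>% filter(cut == "Ideal") %>% count(color)
## # A tibble: 7 x 2
##   color     n
##   <ord> <int>
## 1 D      2834
## 2 E      3903
## 3 F      3826
## 4 G      4884
## 5 H      3115
## 6 I      2093
## 7 J       896

Diamonds with color G occur most frequently among diamonds with ideal cuts. Note that count('ideal') does not filter by cut: it counts every diamond in each color group and adds a constant column, so the rows must first be restricted with filter(cut == "Ideal").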

Question 8

Which clarity of diamonds has the largest average table per carat?

diamonds %>% group_by(clarity) %>% summarize(TablePerCarat= mean((table/carat)))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 8 x 2
##   clarity TablePerCarat
##   <ord>           <dbl>
## 1 I1               56.3
## 2 SI2              69.1
## 3 SI1              89.6
## 4 VS2             103. 
## 5 VS1             107. 
## 6 VVS2            127. 
## 7 VVS1            141. 
## 8 IF              140.

VVS1 has the largest average table per carat ratio (about 141), narrowly ahead of IF (about 140).

Question 9

What is the average price per carat of diamonds that cost more than $10000?

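diamonds %>% filter(price>10000) %>% summarize(AvgPpCExpensiveDiamond=mean(price/carat))

The ratio is computed inside summarize() so that it uses only the filtered rows. A standalone pricePerCarat vector defined outside the data set is not subset by filter(), so mean(pricePerCarat) returns the average over all diamonds (about 4008) rather than over those that cost more than $10000.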

Question 10

Of the diamonds that cost more than $10000, what is the most common clarity?

diamonds %>% filter(price>10000) %>% count(clarity) 
## # A tibble: 8 x 2
##   clarity     n
##   <ord>   <int>
## 1 I1         30
## 2 SI2      1239
## 3 SI1      1184
## 4 VS2      1155
## 5 VS1       747
## 6 VVS2      452
## 7 VVS1      247
## 8 IF        168

Among diamonds that cost more than $10,000, the most common clarity is SI2.
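
The most common clarity can be placed in the first row by asking count() to sort its result. This is a sketch using the sort argument of dplyr's count(), assuming a fresh copy of ggplot2::diamonds; expensive_clarity is an illustrative name.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

# Clarity counts among diamonds priced above $10000, most frequent first.
expensive_clarity <- ggplot2::diamonds %>%
  filter(price > 10000) %>%
  count(clarity, sort = TRUE)

expensive_clarity
```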