library(tidyverse)
library(ggplot2)
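Make a new data set that holds the average depth and average price of the diamonds in the data set.
# mean() already takes a vector, so wrapping the column in c() is unnecessary
avgprice <- mean(diamonds$price)
avgdepth <- mean(diamonds$depth)
# cbind() returns a one-row matrix with one named column per average
DepthPrice <- cbind(avgprice, avgdepth)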
DepthPrice
## avgprice avgdepth
## [1,] 3932.8 61.7494
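`cbind()` returns a matrix, not a data frame. If a data set is what the exercise wants, the same two averages can be collected with `summarize()`. This is a minimal sketch, not the only way to do it; `depth_price` is a name chosen here for illustration.

```r
library(dplyr)    # summarize() and the pipe
library(ggplot2)  # provides the diamonds data set

# One-row tibble holding both averages, instead of a cbind() matrix
depth_price <- ggplot2::diamonds %>%
  summarize(avgprice = mean(price), avgdepth = mean(depth))
depth_price
```

The result is a tibble with one row and two columns, so it can be piped into further dplyr verbs.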
Add a new column to the data set that records each diamond’s price per carat.
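# mutate() stores the ratio as a column of diamonds rather than as a free-standing vector
diamonds <- diamonds %>% mutate(pricePerCarat = price / carat)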
Create a new data set that groups diamonds by their cut and displays the average price of each group.
diamonds %>% group_by(cut) %>% summarize(meanPrice=mean(price))
## # A tibble: 5 x 2
## cut meanPrice
## <ord> <dbl>
## 1 Fair 4359.
## 2 Good 3929.
## 3 Very Good 3982.
## 4 Premium 4584.
## 5 Ideal 3458.
Create a new data set that groups diamonds by color and displays the average depth and average table for each group.
diamonds %>% group_by(color) %>% summarize(meanDepth=mean(depth), meanTable=mean(table))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 3
## color meanDepth meanTable
## <ord> <dbl> <dbl>
## 1 D 61.7 57.4
## 2 E 61.7 57.5
## 3 F 61.7 57.4
## 4 G 61.8 57.3
## 5 H 61.8 57.5
## 6 I 61.8 57.6
## 7 J 61.9 57.8
Add two columns to the diamonds data set. The first column should display the average depth of diamonds in the diamond’s color group. The second column should display the average table of diamonds in the diamond’s color group.
MeanDepth<-diamonds%>% group_by(color) %>% summarize(AvgDepth=mean(depth))
## `summarise()` ungrouping output (override with `.groups` argument)
MeanDepth
## # A tibble: 7 x 2
## color AvgDepth
## <ord> <dbl>
## 1 D 61.7
## 2 E 61.7
## 3 F 61.7
## 4 G 61.8
## 5 H 61.8
## 6 I 61.8
## 7 J 61.9
diamonds<-left_join(diamonds,MeanDepth)
## Joining, by = "color"
MeanTable<-diamonds%>% group_by(color) %>% summarize(AvgTable=mean(table))
## `summarise()` ungrouping output (override with `.groups` argument)
MeanTable
## # A tibble: 7 x 2
## color AvgTable
## <ord> <dbl>
## 1 D 57.4
## 2 E 57.5
## 3 F 57.4
## 4 G 57.3
## 5 H 57.5
## 6 I 57.6
## 7 J 57.8
diamonds<-left_join(diamonds,MeanTable)
## Joining, by = "color"
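The same two columns can also be added in one step with a grouped `mutate()`, which skips building the summary tables and joining them back. This is a minimal sketch. `with_group_means` is a name chosen here for illustration, and the code starts from the packaged `ggplot2::diamonds` so that it runs on its own.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

with_group_means <- ggplot2::diamonds %>%
  group_by(color) %>%
  mutate(AvgDepth = mean(depth),      # mean depth within the row's color group
         AvgTable = mean(table)) %>%  # mean table within the row's color group
  ungroup()                           # drop the grouping so later steps are not grouped

with_group_means %>%
  select(color, depth, AvgDepth, table, AvgTable) %>%
  head()
```

Every row keeps its own `depth` and `table` and gains the averages for its color, which is what the two `left_join()` calls produce as well.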
Which color diamonds seem to be largest on average (in terms of carats)?
diamonds %>% group_by(color) %>% summarize(AvgCarat=mean(carat))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 7 x 2
## color AvgCarat
## <ord> <dbl>
## 1 D 0.658
## 2 E 0.658
## 3 F 0.737
## 4 G 0.771
## 5 H 0.912
## 6 I 1.03
## 7 J 1.16
J, the lowest color grade, has the largest average carat weight (1.16). Average carat rises from 0.658 for D and E to 1.16 for J, so in this data set diamonds with better color tend to be smaller.
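Which color occurs most frequently among diamonds with an Ideal cut?
# count('ideal') only adds a constant column and tallies every diamond per color,
# so keep the Ideal rows with filter() first and then count by color
diamonds %>% filter(cut == "Ideal") %>% count(color)
## # A tibble: 7 x 2
## color n
## <ord> <int>
## 1 D 2834
## 2 E 3903
## 3 F 3826
## 4 G 4884
## 5 H 3115
## 6 I 2093
## 7 J 896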
Diamonds with color G occur most frequently among diamonds with ideal cuts.
Which clarity of diamonds has the largest average table per carat?
diamonds %>% group_by(clarity) %>% summarize(TablePerCarat= mean((table/carat)))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 8 x 2
## clarity TablePerCarat
## <ord> <dbl>
## 1 I1 56.3
## 2 SI2 69.1
## 3 SI1 89.6
## 4 VS2 103.
## 5 VS1 107.
## 6 VVS2 127.
## 7 VVS1 141.
## 8 IF 140.
VVS1 has the largest average table-per-carat ratio (about 141), narrowly ahead of IF (about 140).
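When two groups are this close, it is easy to misread the printed table. The top row can instead be picked programmatically. This is a minimal sketch, assuming dplyr 1.0.0 or later (where `slice_max()` is available); `top_clarity` is a name chosen here for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

top_clarity <- ggplot2::diamonds %>%
  group_by(clarity) %>%
  summarize(TablePerCarat = mean(table / carat)) %>%
  slice_max(TablePerCarat, n = 1)  # keep only the row with the largest ratio
top_clarity
```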
What is the average price per carat of diamonds that cost more than $10000?
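# Compute the ratio inside summarize() so that it is evaluated on the filtered rows only.
# A full-length pricePerCarat vector defined outside the pipeline is not subset by filter();
# averaging it gives the mean over all 53,940 diamonds (about 4008), not the expensive ones.
diamonds %>% filter(price>10000) %>% summarize(AvgPpCExpensiveDiamond=mean(price/carat))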
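When an average is taken after `filter()`, it matters whether the values live in the data frame or in a separate vector. This is a minimal sketch on a made-up three-row tibble; `toy` and `ppc_vector` are hypothetical names and are not part of the analysis.

```r
library(dplyr)

# Hypothetical three-row data set, made up for illustration
toy <- tibble(price = c(100, 200, 20000), carat = c(1, 1, 2))

# Free-standing vector: one value per row of the unfiltered data
ppc_vector <- toy$price / toy$carat  # 100, 200, 10000

# filter() drops rows of toy, but ppc_vector still holds all three values
unfiltered_mean <- toy %>%
  filter(price > 10000) %>%
  summarize(m = mean(ppc_vector)) %>%
  pull(m)

# Computing the ratio inside summarize() uses only the rows that passed filter()
filtered_mean <- toy %>%
  filter(price > 10000) %>%
  summarize(m = mean(price / carat)) %>%
  pull(m)

c(unfiltered = unfiltered_mean, filtered = filtered_mean)
```

Only the second pipeline averages over the one row that passed the filter. The first silently averages all three values.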
Of the diamonds that cost more than $10000, what is the most common clarity?
diamonds %>% filter(price>10000) %>% count(clarity)
## # A tibble: 8 x 2
## clarity n
## <ord> <int>
## 1 I1 30
## 2 SI2 1239
## 3 SI1 1184
## 4 VS2 1155
## 5 VS1 747
## 6 VVS2 452
## 7 VVS1 247
## 8 IF 168
Among diamonds that cost more than $10,000, the most common clarity is SI2 (1,239 diamonds).
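To avoid scanning the counts by eye, `count()` can sort them so that the most common clarity appears first. This is a minimal sketch; `clarity_counts` is a name chosen here for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data set

clarity_counts <- ggplot2::diamonds %>%
  filter(price > 10000) %>%
  count(clarity, sort = TRUE)  # largest group first
clarity_counts
```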