library(tidyverse)     # attaches ggplot2, dplyr, tidyr, and purrr
library(trelliscopejs) # cog(), map_plot(), trelliscope()

# look at the distribution of carat
ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.1) # explicit binwidth avoids the default 30-bin message

summary(diamonds$carat) # most carats are less than 2

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2000  0.4000  0.7000  0.7979  1.0400  5.0100

sum(diamonds$carat > 2) # diamonds above 2 carats; dropping them leaves out few observations

## [1] 1889

# functions for cognostic measures
price_range_fn <- function(x){
  max(x$price) - min(x$price)
}
price_per_carat_fn <- function(x){
  mean(x$price / x$carat)
}
count_fn <- function(x){
  nrow(x)
}

# create new subset 
diamonds_sub <- diamonds %>%
  # filter out carat outliers
  filter(carat <= 2) %>%
  # group by cut
  group_by(cut) %>%
  nest() %>%
  # find price range, number of diamonds, and avg price per carat for each cut
  mutate(price_range = map_dbl(data, price_range_fn),
         price_range = cog(price_range, desc = "Price Range", default_label = TRUE),
         
         diamond_count = map_dbl(data, count_fn),
         diamond_count = cog(diamond_count, desc = "Diamond Count", default_label = TRUE),
         
         price_per_carat = map_dbl(data, price_per_carat_fn),
         price_per_carat = cog(price_per_carat, desc = "Avg price per carat", default_label = TRUE),
         
         # make plots
         plots = map_plot(data, function(d){
           ggplot(d, aes(x = carat, y = price, color = color)) +
             geom_point(alpha = 0.5)
         })) %>% 
  ungroup() 
  
# trelliscope plots faceted on diamond cut
trelliscope(diamonds_sub,
            name = "Price vs. Carat of Diamonds by Color",
            desc = "How does price of diamond vary based on carat?",
            nrow = 1, ncol = 2,
            path = "output/assignment4")

The trelliscope plots show that price tends to increase as a diamond’s carat weight increases; the overall relationship between the two is positive in this data. Faceting by cut suggests that, once carat is accounted for, better-cut diamonds tend to be priced higher. The panels are similar in shape, with slight differences in steepness. Prices appear to become less consistent as cut worsens, so a worse cut can lower a diamond’s price even as carat increases. Still, some diamonds with less ideal cuts are priced high, which points to carat and possibly other variables. In these plots, smaller diamonds tend to have better color grades, and prices tend to increase as the color grade improves.
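One rough way to check the "once carat is accounted for" reading without relying on the plots alone is to compare median prices across cuts within carat bands. The sketch below is not part of the original analysis: the 0.5-carat band width and the names `price_by_band` and `carat_band` are illustrative assumptions.

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset and cut_width()

# Median price for each cut within 0.5-carat bands
# (the band width is an arbitrary choice for illustration)
price_by_band <- diamonds %>%
  filter(carat <= 2) %>%
  mutate(carat_band = cut_width(carat, width = 0.5, boundary = 0)) %>%
  group_by(carat_band, cut) %>%
  summarise(median_price = median(price),
            n = n(),
            .groups = "drop")

print(price_by_band, n = Inf)
```

Reading down each band shows whether median price rises with cut quality among diamonds of similar size. Bands with small `n` should be interpreted with caution.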

I created a few cognostic measures that the reader might want to investigate. The first is the price range, the difference between the maximum and minimum price within each cut category. The price ranges are similar across the levels, so the spread of prices appears comparable regardless of cut. I also included the number of diamonds at each cut level to show how unbalanced the categories are. The majority of diamonds in this dataset fall into the “Ideal” cut, and very few are in the “Fair” and “Good” categories, which may make the levels difficult to compare and is important to keep in mind during analysis. The third measure is the average price per carat for each cut, which better represents the price difference between cuts: it takes the ratio of each diamond’s price to its carat and then averages that ratio within each cut level. Diamonds with “Very Good” and “Premium” cuts have higher average price per carat than “Fair” and “Good” cut diamonds, with the “Ideal” level falling in the middle. Since there are so many “Ideal” cut diamonds, some outliers may be skewing this measurement. Other aspects of the diamonds could also be affecting price more than cut does.
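The three cognostics can also be inspected as a plain table outside the trelliscope display. This is a minimal sketch that assumes the same `carat <= 2` filter as the analysis; `cut_summary` is an illustrative name, and `summarise()` stands in for the nested `map_dbl()` calls used in the original code.

```r
library(dplyr)
library(ggplot2) # provides the diamonds dataset

# One row per cut with the same three measures used as cognostics
cut_summary <- diamonds %>%
  filter(carat <= 2) %>%
  group_by(cut) %>%
  summarise(
    price_range     = max(price) - min(price), # spread of prices within the cut
    diamond_count   = n(),                     # number of diamonds in the cut
    price_per_carat = mean(price / carat),     # mean of per-diamond price/carat ratios
    .groups = "drop"
  )

print(cut_summary)
```

Note that `price_per_carat` averages the per-diamond ratios, matching `price_per_carat_fn`; dividing total price by total carat would give a different number.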

Exploring Diamonds Dataset

Allison Buck