The Diamonds dataset (available from the data repository) includes the prices and other attributes of almost 54,000 diamonds. The variables are as follows:
- price. price in US dollars ($326 - $18,823)
- carat. weight of the diamond (0.2 - 5.01)
- cut. quality of the cut (Fair, Good, Very Good, Premium, Ideal)
- color. diamond colour, from J (worst) to D (best)
- clarity. a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
- x. length in mm (0 - 10.74)
- y. width in mm (0 - 58.9)
- z. depth in mm (0 - 31.8)
- depth. total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43 - 79)
- table. width of top of diamond relative to widest point (43 - 95)

Ensure you define the variables properly using the following code:
library(readr)
Diamonds <- read_csv("Diamonds.csv")
## Parsed with column specification:
## cols(
## carat = col_double(),
## cut = col_character(),
## color = col_character(),
## clarity = col_character(),
## depth = col_double(),
## table = col_double(),
## price = col_integer(),
## x = col_double(),
## y = col_double(),
## z = col_double()
## )
##View(Diamonds)
##Diamonds <- read.csv("../data/Diamonds.csv")
Diamonds$cut <- factor(Diamonds$cut,
                       levels = c('Fair', 'Good', 'Very Good', 'Premium', 'Ideal'),
                       ordered = TRUE)
Diamonds$color <- factor(Diamonds$color,
                         levels = c('J', 'I', 'H', 'G', 'F', 'E', 'D'),
                         ordered = TRUE)
Diamonds$clarity <- factor(Diamonds$clarity,
                           levels = c('I1', 'SI2', 'SI1', 'VS2', 'VS1', 'VVS2', 'VVS1', 'IF'),
                           ordered = TRUE)
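To see what the `ordered = TRUE` argument buys, here is a minimal base R sketch. The `toy_cut` vector and its values are hypothetical and are not taken from Diamonds.csv; only the level names come from the code above.

```r
# Toy vector of cut grades (hypothetical values, not taken from Diamonds.csv).
cut_levels <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
toy_cut <- factor(c("Ideal", "Fair", "Premium", "Good"),
                  levels = cut_levels, ordered = TRUE)

# Ordered factors support comparisons that follow the declared level order.
is_better_than_good <- toy_cut > "Good"
print(is_better_than_good)

# Sorting uses the level order rather than alphabetical order.
sorted_cut <- as.character(sort(toy_cut))
print(sorted_cut)

# A value missing from `levels` silently becomes NA, so spelling matters.
bad_cut <- factor("ideal", levels = cut_levels, ordered = TRUE)
print(is.na(bad_cut))
```

The same level order is what places the groups from worst to best along the axis when an ordered factor is plotted.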
Using this dataset, you will practice using ggplot2 to produce a visualisation that helps to explain the factors that determine a diamond’s price.
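Before plotting, the depth definition from the variable list can serve as a quick sanity check on the x, y and z columns. The sketch below is only an illustration: `depth_pct` is a hypothetical helper, the input values are invented, and the factor of 100 assumes the recorded depth is on a percentage scale, as its stated range of 43 to 79 suggests.

```r
# Total depth percentage from the three dimensions (all in mm).
# Assumes the recorded `depth` column is on a 0-100 percentage scale.
depth_pct <- function(x, y, z) {
  100 * 2 * z / (x + y)
}

# Hypothetical stone, values invented for illustration only.
example_depth <- depth_pct(x = 4.0, y = 4.0, z = 2.4)
print(example_depth)

# The stated ranges for x, y and z start at 0. All-zero dimensions
# give NaN here rather than raising an error.
zero_depth <- depth_pct(x = 0, y = 0, z = 0)
print(zero_depth)
```

With the data loaded, `summary(depth_pct(Diamonds$x, Diamonds$y, Diamonds$z) - Diamonds$depth)` would show how closely the recorded column follows the formula. Rows with zero dimensions are worth inspecting before they reach a plot.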
Submission

Your submission must be uploaded as a Word document or PDF. Your RMarkdown document must contain the following:
- your code
- your visualisation
- a brief caption explaining the visualisation

Participation Mark

Completion of this exercise will contribute towards a participation mark. If you submit and follow the instructions above, you will receive a full participation mark (1%). If you do not follow the instructions above, you will be given feedback in order to clarify any issues with the submission.
#plot(Diamonds$cut)
#plot(Diamonds$color)
#plot(Diamonds$clarity)
library(ggplot2)
# Boxplots of price by each ordered factor; the red point marks the group mean.
# `fun` replaces the deprecated `fun.y` argument (needs ggplot2 >= 3.3.0).
ggplot(Diamonds, aes(x = cut, y = price)) +
  geom_boxplot() +
  stat_summary(fun = mean, colour = "red", geom = "point")
ggplot(Diamonds, aes(x = color, y = price)) +
  geom_boxplot() +
  stat_summary(fun = mean, colour = "red", geom = "point")
ggplot(Diamonds, aes(x = clarity, y = price)) +
  geom_boxplot() +
  stat_summary(fun = mean, colour = "red", geom = "point")
That was a little surprising: price and colour show the strongest trend, whereas I would have expected price and clarity to be the winner. So my answer is that colour is what determines a diamond’s price.
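A visual impression like this can be backed up with numbers by tabulating the median price for each level of a factor. The sketch below is one possible way to do that in base R. `median_price_by` is a hypothetical helper, and the `toy` data frame is invented only to make the example runnable (it uses three of the seven colour levels and says nothing about the real data).

```r
# Median price for each level of a grouping column.
median_price_by <- function(data, group) {
  tapply(data$price, data[[group]], median)
}

# Hypothetical toy rows, invented purely to make the sketch runnable.
toy <- data.frame(
  price = c(400, 600, 900, 1100, 500, 700),
  color = factor(c("J", "J", "D", "D", "G", "G"),
                 levels = c("J", "G", "D"), ordered = TRUE)
)

# Result is named by level and follows the declared level order.
toy_medians <- median_price_by(toy, "color")
print(toy_medians)
```

On the real data the equivalent calls would be `median_price_by(Diamonds, "color")`, `median_price_by(Diamonds, "clarity")` and `median_price_by(Diamonds, "cut")`, which give one table per boxplot for comparing the size and direction of each trend.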