## Set Working Directory
```r
# Adjust this path to the project folder on your machine
setwd("~/CST-435")
```
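## Reading in the Data
Reading in the data gives a first look at the data set. It describes diamonds by carat (weight), cut, color, clarity, depth (total depth percentage), table (width of the top relative to the widest point), price, and the physical dimensions x, y, and z (length, width, and depth in millimeters). The leading X column appears to be a row index that was saved along with the CSV file.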
```r
diamonds <- read.csv("~/CST-435/diamonds.csv")
print(head(diamonds))
```

```
##   X carat       cut color clarity depth table price    x    y    z
## 1 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
```
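In R 4.0 and later, `read.csv()` leaves text columns such as cut, color, and clarity as character vectors, so ggplot2 places boxes and legend entries in alphabetical order (Fair, Good, Ideal, Premium, Very Good) rather than by grade. A minimal sketch for converting them to ordered factors follows. It assumes the CSV uses the same grade labels as ggplot2's built-in `diamonds` data, which the rows above appear to match; `grade_levels` and `order_grades()` are hypothetical names introduced here, not part of the original analysis.

```r
# Hypothetical level lists, copied from ggplot2's built-in diamonds data:
# cut and clarity run worst to best; color runs D (best) to J (worst).
grade_levels <- list(
  cut     = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  color   = c("D", "E", "F", "G", "H", "I", "J"),
  clarity = c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF")
)

# Hypothetical helper: convert each grade column to an ordered factor.
order_grades <- function(df, levels_list = grade_levels) {
  for (col in names(levels_list)) {
    converted <- factor(df[[col]], levels = levels_list[[col]], ordered = TRUE)
    # A value missing from the level list would silently become NA, so stop instead.
    unknown <- !is.na(df[[col]]) & is.na(converted)
    if (any(unknown)) {
      stop("Unexpected ", col, " value(s): ",
           paste(unique(df[[col]][unknown]), collapse = ", "))
    }
    df[[col]] <- converted
  }
  df
}

# Usage with the data read above:
# diamonds <- order_grades(diamonds)
```

After the conversion, the boxplot and histogram list the grades in grade order, and comparisons such as `diamonds$cut > "Good"` become meaningful.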
## Cleaning the Data
I decided to use a boxplot to identify the outliers in the data set. The plot compares the price of the diamonds across each cut. In the plot, the Fair and Ideal cuts appear to have a similar number of outliers, even though Ideal is a better cut than Fair.
```r
library(ggplot2)
ggplot(diamonds, aes(x = cut, y = price, fill = cut)) + geom_boxplot(outlier.colour = "black", outlier.shape = 16, outlier.size = 2)
```
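The impression that Fair and Ideal have a similar number of outliers can be checked by counting them. By default, `geom_boxplot()` marks a point as an outlier when it lies more than 1.5 times the interquartile range below the first quartile or above the third quartile, with the quartiles taken from `quantile()`. A minimal sketch of that rule, using a hypothetical helper `count_outliers()` that is not part of the original analysis:

```r
# Hypothetical helper: count, per group, the points beyond coef * IQR from the
# quartiles (the same rule geom_boxplot() applies with its default coef = 1.5).
count_outliers <- function(values, groups, coef = 1.5) {
  per_group <- split(values, groups)
  vapply(per_group, function(v) {
    q <- quantile(v, c(0.25, 0.75), names = FALSE)  # first and third quartiles
    reach <- coef * (q[2] - q[1])                   # whisker reach
    sum(v < q[1] - reach | v > q[2] + reach)
  }, integer(1))
}

# Usage with the diamonds data loaded earlier:
# counts <- count_outliers(diamonds$price, diamonds$cut)
# counts / table(diamonds$cut)  # share of each cut flagged as an outlier
```

Because the cuts contain different numbers of diamonds, the share of each cut flagged as an outlier may be a fairer comparison than the raw count.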
## Clarity
The histogram below shows the distribution of diamond prices, with each bar stacked by clarity grade. I chose a histogram because it shows how the clarity grades are spread across the price range, which helps judge whether better clarity comes with a higher price. The histogram shows a large spike of low-priced diamonds, many of which have good clarity.
```r
ggplot(data = diamonds, aes(x = price, fill = clarity)) + geom_histogram(binwidth = 250)
```
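In a stacked histogram each clarity grade sits on top of the others, which makes the grades hard to compare directly. A numeric summary, such as the number of diamonds and the median price per grade, can back up the visual reading; the median is less affected than the mean by the long tail of expensive diamonds. A minimal sketch, assuming a numeric `price` column; `price_by_grade()` is a hypothetical helper, not part of the original analysis:

```r
# Hypothetical helper: number of diamonds and median price for each grade.
price_by_grade <- function(df, grade_col) {
  groups <- split(df$price, df[[grade_col]])
  data.frame(
    grade = names(groups),
    n = vapply(groups, length, integer(1)),
    median_price = vapply(groups, function(p) as.numeric(median(p)), numeric(1)),
    row.names = NULL
  )
}

# Usage with the diamonds data loaded earlier:
# price_by_grade(diamonds, "clarity")
# price_by_grade(diamonds, "cut")
```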
## Final Conclusions
Comparing cut with price, better cuts tend to be priced higher, and the cut does affect pricing because it helps determine how much a diamond is worth. The trend does not follow the grades strictly, though: the Premium cut has the highest prices, above the top Ideal grade. The clarity of a diamond also affects its price, with better clarity generally associated with a higher price, although the histogram also shows many low-priced diamonds with good clarity. Cut and clarity are only two of the columns in the data set that can show an effect on a diamond's cost.