A 3.20 Carat, Radiant Cut, D Color, SI1 Clarity, Natural Diamond
# Load the white diamond inventory list (2023-10-25) from my home directory
library(tidyverse)
library(dplyr)
library(readr)
library(mice)
library(ggplot2)
White_Diamond_Inventory <- read_csv("~/White_Diamond_Inventory2023-10-25.csv")
# Create a data frame from the diamonds dataset
diamonds_df <- as.data.frame(White_Diamond_Inventory)
# Count the missing values in each column of the diamonds_df data frame
missing_values <- sapply(diamonds_df, function(x) sum(is.na(x)))
missing_df <- data.frame(
  Column = names(diamonds_df),
  Missing_Values = missing_values
)
# Keep all existing variables and rename only two columns, Shape to Cut and Weight to Carat, to match the generalized 4Cs terms.
diamonds_df <- diamonds_df %>%
  rename(Cut = Shape, Carat = Weight)
# Plot diamonds that have a Brilliant Pear (PS) or Oval (OV) shape and a total price of at least $1000,
# then save the plot as an image in the working directory (the project proposal folder)
diamonds_filtered <- diamonds_df %>%
  filter(Cut %in% c("PS", "OV"), Total >= 1000)
# geom_hex() requires the hexbin package to be installed
diamond_plot <- ggplot(diamonds_filtered, aes(x = Carat, y = Total)) +
  geom_hex()
image_path <- file.path(getwd(), "graph_diamond.png")
ggsave(filename = image_path, plot = diamond_plot)
# Count the number of diamonds for each Clarity grade and add the result as a new Count column in the original data frame.
# Note: diamonds_df stays grouped by Clarity after this step.
diamonds_df <- diamonds_df %>%
  group_by(Clarity) %>%
  mutate(Count = n())
What is the relationship between the physical dimensions (length, width, depth) of a diamond and its price or quality (cut, carat, color, clarity)?
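One way to take a first look at this question is a correlation matrix between the three dimensions and the total price. The sketch below is only an illustration: toy_diamonds and all of its values are invented, and the column names Len, Width, Depth and Total are assumed to match the inventory file.

```r
# Toy data for illustration only; the values are invented, not taken from the inventory.
toy_diamonds <- data.frame(
  Len   = c(4.30, 5.10, 6.40, 7.20, 8.10, NA),
  Width = c(4.28, 5.05, 6.35, 5.60, 6.00, 6.90),
  Depth = c(2.65, 3.12, 3.95, 3.60, 3.85, 4.30),
  Total = c(900, 2100, 5200, 6100, 9800, 12500)
)

# Pairwise Pearson correlations; rows with any missing measurement are dropped first.
dim_cor <- cor(toy_diamonds[, c("Len", "Width", "Depth", "Total")],
               use = "complete.obs")
print(round(dim_cor, 2))
```

The use = "complete.obs" argument matters here because the plots in this proposal already warn about rows with missing values.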
There are 2,255 cases and 20 descriptive variables:
1. No.: serial number
2. Cut: shape
3. Carat: weight
4. Color: diamond color codes
5. Clarity: measurement of how clear the diamond is
6. PricePc: price per carat
7. Total: price in US dollars
8. List: a discount probability
9. ListPrice: discounted price
10. Lab: GIA
11. CertNum: GIA graded number
12. Depth%: total depth percentage
13. Table%: width of top of diamond relative to widest point
14. Len: length in mm
15. Width: width in mm
16. Depth: depth in mm
17. Ratio: the proportions of a diamond
18. Polish: overall smoothness and condition of the diamond’s surface
19. Symmetry: overall outline, placement and alignment of individual facets
20. Fluorescence: the light a diamond emits when exposed to UV light
The CSV file, a white diamond inventory summary for the week of October 25, 2023, was obtained from a jewelry company located in New York City. It is a real, active data set: all the diamonds in the list are available and can be purchased from large online wholesale jewelry marketplaces such as https://www.rapnet.com. Every natural mined diamond in the list has a unique GIA (Gemological Institute of America) report number with an unbiased rating and grading. GIA sets the industry standard for diamond quality, with a scale for cut, color, clarity, and carat that acts as a diamond quality guide for jewelers and appraisers.
The data set contains a range of diamond prices along with many attributes of the diamonds, some of which are known to influence price: the 4 Cs (carat, cut, color, and clarity) and the physical measurements described in Cases. Because the data record existing appraisals rather than the results of an experiment, this is an observational study.
The data were collected from the company where I work. We use DT (Diamond Track), an online cloud inventory management system. We attach this white diamond summary list as a CSV file to our Mailchimp mass-marketing emails, where it can be seen by any jewelry company subscribed to our list.
The response variable is price, recorded in the “Total” column of the “diamonds_df” data set. It represents the monetary value of each diamond in US dollars. Because it is numeric and can be used in mathematical operations, price is a quantitative response variable.
The other variables, such as the Four Cs (Cut, Color, Clarity, and Carat) and the physical dimensions (length, width, depth) of a diamond, are used to predict or explain variation in price. These predictors (independent variables) can be quantitative or qualitative (categorical). For example, “Cut”, “Clarity”, and “Color” are qualitative, while “Carat” is quantitative.
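To make the distinction between quantitative and qualitative predictors concrete, here is a minimal sketch of how R treats each type in a linear model. It is an illustration under stated assumptions, not the analysis planned for this proposal: toy_model_data and its values are invented, and lm() is used only as one possible modeling choice.

```r
# Toy data for illustration only; the values are invented, not taken from the inventory.
toy_model_data <- data.frame(
  Total = c(900, 2100, 5200, 6100, 9800, 12500, 3000, 4100),
  Carat = c(0.30, 0.70, 1.20, 1.50, 2.00, 2.50, 0.90, 1.00),
  Cut   = c("BR", "BR", "OV", "PS", "BR", "OV", "PS", "BR")
)

# A qualitative predictor is stored as a factor so lm() builds indicator (dummy) variables for it.
toy_model_data$Cut <- factor(toy_model_data$Cut)

# Quantitative response (Total) explained by a quantitative (Carat) and a qualitative (Cut) predictor.
fit <- lm(Total ~ Carat + Cut, data = toy_model_data)
print(coef(fit))
```

The fitted model has one slope for Carat and one offset for each Cut level other than the baseline, which is how a categorical predictor enters a regression.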
# Summary statistics of Carat for Brilliant Round (BR) diamonds, by Clarity
brilliant_round <- diamonds_df %>%
  filter(Cut == "BR")
# Group explicitly by Clarity so the summary does not depend on the grouping
# left over from the earlier group_by(Clarity) step
summary_stats <- brilliant_round %>%
  group_by(Clarity) %>%
  summarise(
    Mean_Carat = mean(Carat),
    Median_Carat = median(Carat),
    Min_Carat = min(Carat),
    Max_Carat = max(Carat),
    SD_Carat = sd(Carat),
    N = n()
  )
print(summary_stats)
## # A tibble: 9 × 7
##   Clarity Mean_Carat Median_Carat Min_Carat Max_Carat SD_Carat     N
##   <chr>        <dbl>        <dbl>     <dbl>     <dbl>    <dbl> <int>
## 1 I1            1.24         1.02      0.96         3    0.473    23
## 2 I2            1.00         1.00         1      1.01  0.00707     2
## 3 IF            1.16         1.01       0.3      2.26    0.675    13
## 4 SI1          0.970        0.725       0.3         6    0.772   312
## 5 SI2           1.29          1.2       0.3      5.01    0.748   181
## 6 VS1          0.941        0.745       0.3         4    0.634    84
## 7 VS2           1.03          0.8       0.3      5.02    0.661   250
## 8 VVS1          1.15         0.94      0.32      2.81    0.842    12
## 9 VVS2         0.797        0.735       0.3      1.36    0.426    10
# Plot the relationship between diamond weight and total price, colored by shape.
ggplot(data = diamonds_df) +
  geom_point(mapping = aes(x = Carat, y = Total, color = Cut)) +
  labs(x = "Weight of Diamonds", y = "Total Price", title = "Correlation between weight and shape affecting Diamonds' Price")
## Warning: Removed 17 rows containing missing values (`geom_point()`).
# Create a line graph of total price against the Ratio attribute, colored by shape (a rough exploratory look)
ggplot(diamonds_df, aes(x = Ratio, y = Total, color = Cut)) +
  geom_line() +
  labs(x = "Ratio", y = "Price") +
  theme_minimal()
## Warning: Removed 1 row containing missing values (`geom_line()`).
# Create a count plot for Clarity
ggplot(diamonds_df, aes(x = Clarity)) +
  geom_bar(fill = "blue") +
  labs(x = "Clarity", y = "Count") +
  theme_minimal()
The summary table shows a total of 887 round diamonds in the white diamond summary list, ranging in weight from 0.30 to 6 carats. The scatter plot and line graph suggest that radiant-shaped diamonds with higher weight and a ratio near the mean carry the highest prices. SI1 is the most frequent clarity grade in our inventory, as shown in the bar chart.
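The count of round diamonds, their weight range, and the most frequent clarity grade can each be checked with a line or two of code. The base R sketch below shows the idea on a toy data frame; toy_inventory and its values are invented for illustration, and only the shape codes BR, PS and OV come from the code above.

```r
# Toy data for illustration only; the values are invented, not taken from the inventory.
toy_inventory <- data.frame(
  Cut     = c("BR", "BR", "BR", "PS", "OV", "BR"),
  Carat   = c(0.30, 1.02, 6.00, 3.20, 1.50, 0.75),
  Clarity = c("SI1", "VS2", "SI1", "SI1", "SI2", "VS2")
)

# Number of round diamonds and their weight range.
round_only  <- toy_inventory[toy_inventory$Cut == "BR", ]
n_round     <- nrow(round_only)
carat_range <- range(round_only$Carat)

# Clarity grades sorted from most to least frequent across the whole inventory.
clarity_counts <- sort(table(toy_inventory$Clarity), decreasing = TRUE)
most_common    <- names(clarity_counts)[1]

print(n_round)
print(carat_range)
print(most_common)
```

Run against diamonds_df instead of the toy data, the same steps would give the figures quoted in the summary, provided the Carat column has no missing values for round diamonds.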