Exploring the Diamond Dataset

This project analyzes the relationship between diamond size, quality, and price using data from the ‘diamonds’ dataset.

Dataset description

The ‘diamonds’ dataset from the ‘ggplot2’ package contains details about 53,940 diamonds. Key variables are:

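- ‘price’: Price in US dollars
- ‘carat’: Weight of the diamond (in carats)
- ‘cut’: Quality of the cut (Fair, Good, Very Good, Premium, Ideal)
- ‘color’: Diamond color, from D (best) to J (worst)
- ‘clarity’: How clear the diamond is, from I1 (worst) to IF (best)
- ‘depth’: Total depth percentage, 2 * z / (x + y)
- ‘x’, ‘y’, ‘z’: Length, width, and depth of the diamond in mm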
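To confirm these variables and see how the categorical ones are encoded, the structure of the data can be inspected directly. This is a minimal sketch that only reads the built-in `diamonds` data shipped with ggplot2:

```r
library(ggplot2)

# Dimensions of the full dataset: rows (diamonds) and columns (variables)
dim(diamonds)

# cut, color and clarity are stored as ordered factors;
# levels() lists their categories in the stored factor order
levels(diamonds$cut)
levels(diamonds$color)
levels(diamonds$clarity)
```

Note that for color the factor order runs D, E, ..., J, i.e. from best to worst, so a "higher" level in the ordering is a lower color grade.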

Preparing and Loading the Dataset

library(ggplot2)
library(plotly)
library(dplyr)
# Load the built-in diamonds dataset (shipped with ggplot2)
data('diamonds')
# Draw a random sample of 1,000 diamonds; set.seed() makes the sample,
# and therefore every result below, reproducible across runs
set.seed(1)
diamonds_2 <- diamonds %>% sample_n(1000)
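Because `set.seed()` fixes the random number generator, re-running the sampling step returns the same 1,000 rows. The sketch below checks this; the names `sample_a` and `sample_b` are illustrative only.

```r
library(ggplot2)
library(dplyr)

# Draw the sample twice, resetting the seed before each draw
set.seed(1)
sample_a <- diamonds %>% sample_n(1000)

set.seed(1)
sample_b <- diamonds %>% sample_n(1000)

# Same seed, same rows: the subsample is reproducible
identical(sample_a, sample_b)
nrow(sample_a)
```

In current dplyr, `sample_n()` is superseded in favor of `slice_sample(n = 1000)`; if you switch, re-check the statistics below, since the rows drawn are not guaranteed to be identical.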

ggplot: A boxplot showing how price varies with cut quality.

ggplot(diamonds_2, aes(x = cut, y = price, fill = cut)) + 
  geom_boxplot() + 
  labs(title = "Price vs Cut Quality",
       x = "Cut Quality", y = "Price (USD)")
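The boxplot summarizes each cut level by its median and quartiles. A sketch of the same numbers in table form (the column names `median_price`, `q1` and `q3` are our own choice):

```r
library(ggplot2)
library(dplyr)

# Recreate the same 1,000-row sample used in the plots
set.seed(1)
diamonds_2 <- diamonds %>% sample_n(1000)

# Count, median and quartiles of price within each cut level;
# these should correspond to the middle line and box edges of the boxplot
price_by_cut <- diamonds_2 %>%
  group_by(cut) %>%
  summarise(
    n = n(),
    median_price = median(price),
    q1 = quantile(price, 0.25),
    q3 = quantile(price, 0.75)
  )

price_by_cut
```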

ggplot: A bar chart comparing diamond counts across combinations of color and clarity.

ggplot(diamonds_2, aes(x = color, fill = clarity)) +
  geom_bar(position = "dodge") +
  labs(title = "Color vs Clarity",
       x = "Color", y = "Diamond Count", fill = "Diamond Clarity")
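The bar heights in this chart are simple counts, so they can also be read off a contingency table. A minimal sketch using base R's `table()`:

```r
library(ggplot2)
library(dplyr)

set.seed(1)
diamonds_2 <- diamonds %>% sample_n(1000)

# Cross-tabulate color (rows) against clarity (columns);
# each nonzero cell is the height of one bar in the dodged bar chart
color_clarity <- table(diamonds_2$color, diamonds_2$clarity)
color_clarity

# Row totals: number of sampled diamonds of each color
rowSums(color_clarity)
```

Because `color` and `clarity` are factors, `table()` keeps every level, including combinations with a count of zero that do not appear as bars.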

Plotly graph: Scatter plot (Carat vs. Price by Cut)

plot_ly(data = diamonds_2,
        x = ~carat,
        y = ~price,
        color = ~cut,
        type = 'scatter',
        mode = 'markers') %>%
  layout(title = "Carat vs Price by cut",
         xaxis = list(title = "Carat"),
         yaxis = list(title = "Price"))
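The scatter plot suggests that price rises steeply with carat. One way to put a number on this, which is our own addition rather than part of the original analysis, is the Pearson correlation plus a linear fit on log-transformed price and carat (the log-log form is an assumption, chosen because both variables are right-skewed):

```r
library(ggplot2)
library(dplyr)

set.seed(1)
diamonds_2 <- diamonds %>% sample_n(1000)

# Pearson correlation between carat and price
r <- cor(diamonds_2$carat, diamonds_2$price)
r

# Log-log linear model: the slope approximates the % change in price
# associated with a 1% change in carat
fit <- lm(log(price) ~ log(carat), data = diamonds_2)
coef(fit)
```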

Plotly graph: 3D scatter plot (Carat, Depth, and Price)

plot_ly(data = diamonds_2,
        x = ~carat,
        y = ~depth,
        z = ~price,
        color = ~color,
        type = "scatter3d",
        mode = "markers") %>%
  layout(title = "3D Plot: Carat vs Depth vs Price")
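In a rotating 3D view it is hard to judge how strongly depth relates to price. A minimal sketch that complements the plot with pairwise correlations among the three plotted variables:

```r
library(ggplot2)
library(dplyr)

set.seed(1)
diamonds_2 <- diamonds %>% sample_n(1000)

# Pairwise Pearson correlations among carat, depth and price
cor_mat <- diamonds_2 %>%
  select(carat, depth, price) %>%
  cor()

round(cor_mat, 2)
```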

Descriptive Statistics: Mean, median, and standard deviation of price, carat, and depth

# Means
Avg_price <- mean(diamonds_2$price)
Avg_carat <- mean(diamonds_2$carat)
Avg_depth <- mean(diamonds_2$depth)

# Medians
med_price <- median(diamonds_2$price)
med_carat <- median(diamonds_2$carat)
med_depth <- median(diamonds_2$depth)

# Standard deviations
sd_price <- sd(diamonds_2$price)
sd_carat <- sd(diamonds_2$carat)
sd_depth <- sd(diamonds_2$depth)

# Combine the results into a summary table, rounded to 2 decimals
data.frame(
  Statistics = c("Average", "Median", "Standard Deviation"),
  Price = round(c(Avg_price, med_price, sd_price), 2),
  Carat = round(c(Avg_carat, med_carat, sd_carat), 2),
  Depth = round(c(Avg_depth, med_depth, sd_depth), 2)
)
##           Statistics   Price Carat Depth
## 1            Average 4186.03  0.83 61.82
## 2             Median 2677.00  0.72 61.90
## 3 Standard Deviation 4238.29  0.51  1.44
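The nine separate variables above can also be computed in a single dplyr pipeline. This is a sketch of an alternative layout (one row, one column per statistic), not the original code; it assumes dplyr 1.0.0 or later for `across()`.

```r
library(ggplot2)
library(dplyr)

set.seed(1)
diamonds_2 <- diamonds %>% sample_n(1000)

# Mean, median and SD for each of price, carat and depth in one step;
# columns are named like price_mean, price_median, price_sd, ...
summary_stats <- diamonds_2 %>%
  summarise(across(c(price, carat, depth),
                   list(mean = mean, median = median, sd = sd)))

summary_stats
```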

Explanation of the statistics

The average price of diamonds is $4,186, while the median price is $2,677. This gap suggests that a small number of high-priced diamonds pull the average upward, i.e. the price distribution is right-skewed. The standard deviation of price is also large ($4,238), indicating substantial variability in diamond pricing. The average carat weight is 0.83 and the median is 0.72, again showing that a few heavier diamonds raise the mean. Depth is much more consistent, with a small standard deviation of 1.44, meaning most diamonds have similar depth proportions. These statistics show how extreme values can influence the average and highlight where variation is more or less prominent in the dataset.
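The effect of extreme values on the mean versus the median can be seen with a small made-up vector of prices (hypothetical numbers, not taken from the dataset):

```r
# Five hypothetical diamond prices in USD; the last one is an outlier
prices <- c(1000, 1200, 1500, 2000, 20000)

mean(prices)    # 5140: pulled upward by the 20,000 outlier
median(prices)  # 1500: the middle value, unaffected by the outlier's size

# Dropping the outlier moves the mean a lot but the median only a little
mean(prices[-5])    # 1425
median(prices[-5])  # 1350
```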