Description of the project
We have more than 50,000 transactions recorded in the given file. Are high-clarity diamonds priced higher? Does the relationship between price and clarity always hold true?
Let us investigate.
library(ggplot2)
mydata<- read.csv(file.choose())
View(mydata)
str(mydata)
## 'data.frame': 53940 obs. of 3 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ clarity: chr "SI2" "SI1" "VS1" "VS2" ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
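The str() output shows that clarity is stored as plain text, so plot legends will list the grades alphabetically rather than from worst to best. A minimal sketch of turning it into an ordered factor is given below. It assumes the file has the same content as the diamonds data bundled with ggplot2 (used here as a stand-in, because the original file is chosen interactively) and that the grades run I1 < SI2 < SI1 < VS2 < VS1 < VVS2 < VVS1 < IF.

```r
library(ggplot2)

# Stand-in for the CSV (assumption): the diamonds data shipped with ggplot2.
mydata <- as.data.frame(diamonds[, c("carat", "clarity", "price")])
mydata$clarity <- as.character(mydata$clarity)  # mimic the chr column from read.csv()

# Clarity grades from worst to best (assumed ordering).
clarity_levels <- c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF")
mydata$clarity <- factor(mydata$clarity, levels = clarity_levels, ordered = TRUE)

# Number of diamonds per clarity grade, listed in grade order.
table(mydata$clarity)
```

With the factor in place, the color legend in the later plots follows the grade order, which makes it easier to compare neighbouring grades.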
ggplot(data= mydata, aes(x= carat,y= price)) + geom_point()
This gives the scatter plot of price against carat for the diamonds. All the transactions are captured, but the clarity variable is not shown yet.
ggplot(data = mydata, aes(x = carat, y = price, color = clarity)) + geom_point()
ggplot(data = mydata, aes(x = carat, y = price, color = clarity)) + geom_point(alpha = 0.1)
The alpha argument makes the points semi-transparent, which reduces overplotting and makes the dense regions of the plot easier to read.
Now let us filter the data with a condition on carat.
ggplot(data= mydata[mydata$carat< 2.5,], aes(x=carat,y=price,color=clarity)) + geom_point(alpha=0.1)
Only the records with a carat value of less than 2.5 are considered in this visualization.
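To see how much data the carat filter removes, the same condition can be applied outside the plot. The sketch below again uses the diamonds data from ggplot2 as a stand-in for the file (an assumption), and the name small is only illustrative.

```r
library(ggplot2)

# Stand-in for the CSV (assumption): the diamonds data shipped with ggplot2.
mydata <- as.data.frame(diamonds[, c("carat", "clarity", "price")])

# Same condition as in the plot: keep diamonds lighter than 2.5 carats.
small <- subset(mydata, carat < 2.5)

# Number of rows dropped by the filter and share of rows kept.
nrow(mydata) - nrow(small)
mean(mydata$carat < 2.5)
```

subset() drops rows where the condition is NA, whereas bracket indexing would return them as all-NA rows. The two forms give the same result when carat has no missing values.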
ggplot(mydata[mydata$carat<2.5,], aes(x=carat,y=price,color=clarity))+ geom_point(alpha=0.1)+geom_smooth()
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
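Since the number of observations is large (1,000 or more), geom_smooth() in ggplot2 chooses the 'gam' method by default instead of 'loess'. Because color is mapped to clarity, one smooth curve is drawn for each clarity grade.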
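The smoother can also be named explicitly, so that the plot does not depend on the default choice. The following is a sketch only: it uses the diamonds data from ggplot2 as a stand-in for the file (an assumption), needs the mgcv package to be installed, and turns off the confidence bands with se = FALSE purely to keep the curves readable.

```r
library(ggplot2)

# Stand-in for the CSV (assumption): the diamonds data shipped with ggplot2.
mydata <- as.data.frame(diamonds[, c("carat", "clarity", "price")])

# Same plot as above, with the smoothing method and formula written out.
p <- ggplot(mydata[mydata$carat < 2.5, ],
            aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"), se = FALSE)
p
```

Writing the method and formula out also suppresses the message that geom_smooth() prints when it picks them itself.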
Conclusion
1. In the carat range 1.0 to 1.5, there is not much clash or
intersection between the price curves of the different clarity grades.
2. In the carat range 1.5 to 2.0, there are many clashes between the
clarity grades: the VVS2 curve intersects VVS1 and also the IF price
range.
3. In the carat range 2.0 to 2.5, pricing no longer follows clarity: the
VS2 curve intersects many other clarity grades, such as SI1, SI2 and
VVS1, and there is a downward trend further on. The IF curve is also
intersected by I1.
Our assumption was that price is determined by the clarity of the diamond. However, the various intersections between the curves of the different clarity grades indicate that prices are influenced by factors other than clarity, which defies our assumption that price is based on clarity.
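The visual impression can be cross-checked with a small table of median prices for each clarity grade within the three carat ranges discussed above. This is only a sketch: it uses the diamonds data from ggplot2 as a stand-in for the file (an assumption), and the names sub, band and med are illustrative.

```r
library(ggplot2)

# Stand-in for the CSV (assumption): the diamonds data shipped with ggplot2.
mydata <- as.data.frame(diamonds[, c("carat", "clarity", "price")])

# Keep diamonds from 1.0 up to (but not including) 2.5 carats.
sub <- mydata[mydata$carat >= 1.0 & mydata$carat < 2.5, ]

# Label the three carat ranges: [1,1.5), [1.5,2) and [2,2.5).
sub$band <- cut(sub$carat, breaks = c(1.0, 1.5, 2.0, 2.5), right = FALSE)

# Median price for each clarity grade (rows) within each carat range (columns).
med <- tapply(sub$price, list(sub$clarity, sub$band), median)
round(med)
```

If price followed clarity strictly, each column would increase steadily from the lowest grade to the highest. Any column where this order breaks corresponds to a crossing of the curves in the plot.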