##Data Visualization Homework 1

#Deimantė Jokšaitė

#Task

  1. Choose a data set (with more than 5 data attributes) and explain why it is important or interesting to you.
  2. Formulate research questions (for which you expect to find answers).
  3. Make some visualizations for the formulated questions. Prepare a presentation (explaining the data, questions, problems and results) and upload it.

#Questions

  1. How does the size of a diamond affect its price?
  2. Does the size of a diamond depend on its transparency (clarity)?
  3. Does the color of a diamond affect its price and its transparency?
  4. Does the size of a diamond also depend on the quality of its processing (cut)?
  5. What is the distribution of diamond processing quality?

getwd()
## [1] "C:/Users/skirmantas/OneDrive/Desktop"
setwd("C:/Users/skirmantas/OneDrive/Desktop")
duom <- read.csv2("C:/Users/skirmantas/OneDrive/Desktop/duomenu/DPP.csv", header = TRUE, sep = ";", dec = ".")

The data set has the following 10 attributes:

  1. Carat - Weight of the diamond in carats.
  2. Cut (quality) - Quality of the cut (Fair, Good, Very Good, Premium, Ideal).
  3. Color - Diamond color (from J, the worst, to D, the best).
  4. Clarity - Measure of transparency (how clear the diamond is). The grades, from worst to best, are I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF.
  5. Table - Width of the top of the diamond relative to its widest point, in percent.
  6. Price - Price of the diamond in US dollars.
  7. X (length) - Length of the diamond in mm.
  8. Y (width) - Width of the diamond in mm.
  9. Z (depth) - Depth of the diamond in mm.
  10. Depth - Total depth percentage. It can be calculated with a simple formula: total depth % = 100 * z / mean(x, y) = 200 * z / (x + y).
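
As a quick check of the depth formula, the sketch below recomputes the total depth percentage for the first three rows that appear in the `str()` output of this report. The helper name `depth_pct` is hypothetical and not part of the original analysis; the factor 100 is there because the Depth column is stored as a percentage.

```r
# Hypothetical helper: total depth percentage from the three dimensions (mm)
depth_pct <- function(x, y, z) {
  100 * z / ((x + y) / 2)
}

# First three rows of the data, copied from the str() output
x <- c(3.95, 3.89, 4.05)
y <- c(3.98, 3.84, 4.07)
z <- c(2.43, 2.31, 2.31)
depth <- c(61.5, 59.8, 56.9)

# Compare the recomputed values with the recorded Depth column
computed <- depth_pct(x, y, z)
print(data.frame(depth, computed = round(computed, 1)))
```

Small differences between the two columns are to be expected, since x, y and z are themselves rounded to two decimals.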

#Data str()

str(duom)
## 'data.frame':    53940 obs. of  10 variables:
##  $ Carat.Weight.of.Daimond.: num  0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ Cut.Quality.            : chr  "Ideal" "Premium" "Good" "Premium" ...
##  $ Color                   : chr  "E" "E" "E" "I" ...
##  $ Clarity                 : chr  "SI2" "SI1" "VS1" "VS2" ...
##  $ Depth                   : num  61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ Table                   : num  55 61 65 58 58 57 57 55 61 61 ...
##  $ Price.in.US.dollars.    : int  326 326 327 334 335 336 336 337 337 338 ...
##  $ X.length.               : num  3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ Y.width.                : num  3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ Z.Depth.                : num  2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
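
The `str()` output shows that cut, color and clarity are read as plain character columns, so ggplot2 would order their categories alphabetically rather than by quality. A minimal sketch of converting them to ordered factors, assuming the level orders given in the attribute descriptions; the small data frame `sample_rows` is a hypothetical stand-in built from the first rows above, and the same calls could be applied to `duom`.

```r
# Hypothetical stand-in for duom, using the first four rows shown by str()
sample_rows <- data.frame(
  Cut.Quality. = c("Ideal", "Premium", "Good", "Premium"),
  Color        = c("E", "E", "E", "I"),
  Clarity      = c("SI2", "SI1", "VS1", "VS2")
)

# Level orders from worst to best, as listed in the attribute descriptions
cut_levels     <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
color_levels   <- c("J", "I", "H", "G", "F", "E", "D")
clarity_levels <- c("I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF")

sample_rows$Cut.Quality. <- factor(sample_rows$Cut.Quality.,
                                   levels = cut_levels, ordered = TRUE)
sample_rows$Color <- factor(sample_rows$Color,
                            levels = color_levels, ordered = TRUE)
sample_rows$Clarity <- factor(sample_rows$Clarity,
                              levels = clarity_levels, ordered = TRUE)

str(sample_rows)
```

With ordered factors, the axes of the boxplots and bar charts follow the quality scale instead of the alphabet.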
#install.packages("ggpubr")
library(ggpubr)
## Loading required package: ggplot2

How does the size of a diamond affect its price?

ggplot(data = duom, mapping = aes(x = Z.Depth., y = Price.in.US.dollars.)) + 
  geom_point() + 
  geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
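
The plot above uses the depth z as the measure of size. A possible alternative, sketched here, is to use the weight in carats, with semi-transparent points to reduce overplotting and log scales on both axes. It assumes that the built-in `diamonds` data set from ggplot2 can stand in for `duom` (it has lower-case column names such as `carat` and `price`); this is an assumption, not something the original analysis states.

```r
library(ggplot2)

# Assumption: ggplot2's built-in diamonds data stands in for duom
d <- as.data.frame(diamonds)

# Price against carat on log-log axes with a linear trend line
p <- ggplot(data = d, mapping = aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10() +
  scale_y_log10()
print(p)

# Rank correlation as a single-number summary of the relationship
carat_price_cor <- cor(d$carat, d$price, method = "spearman")
print(carat_price_cor)
```

The linear smoother is fitted on the transformed (log) scales, so a straight line here corresponds to a power-law relationship between weight and price.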

Does the size of a diamond depend on its transparency (clarity)?

ggplot(data = duom, mapping = aes(x = Clarity, y = Z.Depth.)) + 
  geom_boxplot()
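
The boxplot can be complemented with the numbers behind it. The sketch below computes the median depth z and the median carat weight for each clarity grade; it assumes the built-in ggplot2 `diamonds` data as a stand-in for `duom`, where clarity is already an ordered factor.

```r
library(ggplot2)

# Assumption: ggplot2's built-in diamonds data stands in for duom
d <- as.data.frame(diamonds)

# Median depth z (mm) and median carat weight per clarity grade
med <- aggregate(cbind(z, carat) ~ clarity, data = d, FUN = median)
print(med)
```

Reading the medians from worst to best clarity grade shows whether the typical size shifts systematically along the clarity scale.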

Does the color of a diamond affect its price and its transparency?

ggplot(data = duom) + 
  geom_point(mapping = aes(x = Price.in.US.dollars.,
                           y = Carat.Weight.of.Daimond. , color = Color))
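
The scatter plot shows color against price and weight, but not against transparency. One way to bring clarity into the picture, sketched under the assumption that the built-in ggplot2 `diamonds` data can stand in for `duom`, is to draw one panel per clarity grade and to tabulate how the colors are distributed within each grade.

```r
library(ggplot2)

# Assumption: ggplot2's built-in diamonds data stands in for duom
d <- as.data.frame(diamonds)

# One panel per clarity grade, points colored by diamond color
p <- ggplot(data = d, mapping = aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~ clarity)
print(p)

# Share of each color within each clarity grade (each row sums to 1)
color_by_clarity <- prop.table(table(d$clarity, d$color), margin = 1)
print(round(color_by_clarity, 2))
```

If the rows of the table look alike, color and clarity are roughly independent; large differences between rows would point to an association.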

Does the size of a diamond also depend on the quality of its processing (cut)?

ggplot(data = duom) + 
  stat_summary(
    mapping = aes(x = Cut.Quality., y = Z.Depth.),
    fun.min = min,
    fun.max = max,
    fun = median)
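
`stat_summary()` as used above draws, for each cut grade, a point at the median depth and a line from the minimum to the maximum. The same three numbers can be computed directly, as in this sketch; it assumes the built-in ggplot2 `diamonds` data as a stand-in for `duom`.

```r
library(ggplot2)

# Assumption: ggplot2's built-in diamonds data stands in for duom
d <- as.data.frame(diamonds)

# Minimum, median and maximum depth z (mm) for each cut grade
summary_by_cut <- do.call(
  rbind,
  lapply(split(d$z, d$cut), function(v) {
    c(min = min(v), median = median(v), max = max(v))
  })
)
print(summary_by_cut)
```

Because the range is driven by the single smallest and largest value in each group, a few unusual records can dominate the plot; quartiles may give a more robust picture.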

What is the distribution of diamond processing quality?

ggplot(data = duom) + 
  geom_bar(mapping = aes(x = Cut.Quality., fill = Clarity), position = "dodge")
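
The bar chart shows counts; the distribution can also be expressed as percentages. A minimal sketch, again assuming the built-in ggplot2 `diamonds` data as a stand-in for `duom`:

```r
library(ggplot2)

# Assumption: ggplot2's built-in diamonds data stands in for duom
d <- as.data.frame(diamonds)

# Number of diamonds per cut grade and the corresponding shares
counts <- table(d$cut)
shares <- prop.table(counts)
print(round(100 * shares, 1))
```

The percentages make it easier to compare cut grades than the raw counts in the dodged bar chart.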