Content of Dataset

Description

Today I will start my analysis with the diamonds dataset from the ggplot2 package. It contains the prices and other attributes of almost 54,000 diamonds.

The variables in the dataset are:

  • price - in US dollars ($326 to $18,823)
  • carat - weight of the diamond (0.2 to 5.01)
  • cut - quality of the cut (Fair, Good, Very Good, Premium, Ideal)
  • color - diamond colour, from J (worst) to D (best)
  • clarity - a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
  • depth - total depth percentage = z / mean(x, y) = 2 * z / (x + y)
  • table - width of top of diamond relative to widest point (43 to 95)
  • x - length in mm (0 to 10.74)
  • y - width in mm (0 to 58.9)
  • z - depth in mm (0 to 31.8)
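To make the depth formula concrete, here is a worked calculation for a hypothetical diamond. The dimensions x = 4.00 mm, y = 4.00 mm and z = 2.40 mm are made up for illustration and are not a row from the dataset. Because depth is reported as a percentage, the ratio from the formula is scaled by 100:

```latex
\text{depth} = \frac{z}{\operatorname{mean}(x, y)} \times 100
             = \frac{2z}{x + y} \times 100
             = \frac{2 \times 2.40}{4.00 + 4.00} \times 100
             = \frac{4.80}{8.00} \times 100
             = 60\%
```

In other words, this stone's height is 60% of its average width, which shows how depth ties the three physical measurements x, y and z together.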

Sometimes, at least for me, it is hard to imagine (contrary to John Lennon's thoughts) what exactly we are talking about just by looking at numbers, so visualizations are always useful: