Assignment 14

Diamond Sizes

We have data about 53,940 diamonds. Only 126 of them (0.23%) are larger than 2.5 carats. Figure 1 shows the distribution of the remaining diamonds, those of 2.5 carats or less.

Figure 1: Distribution of diamond carat sizes under 2.5 carats.

The distribution is strongly right-skewed, with most diamonds falling below 1 carat. The most striking feature is the sharp spike just under 0.5 carats, followed by smaller peaks near 1.0 and 1.5 carats, likely reflecting round-number cutting preferences. Frequency drops off rapidly after each peak, and diamonds above 2 carats are extremely rare.
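The counts and the figure described above could be reproduced along the following lines. This is a minimal sketch, assuming the report uses the `diamonds` data that ships with ggplot2. The `binwidth` of 0.01 and the names `n_large`, `smaller`, `size_plot`, and `common_sizes` are illustrative choices, not taken from the original report.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Assumption: "larger than 2.5 carats" means carat > 2.5.
n_total <- nrow(diamonds)
n_large <- diamonds |> filter(carat > 2.5) |> nrow()
pct_large <- 100 * n_large / n_total

# Keep only the diamonds shown in the figure.
smaller <- diamonds |> filter(carat <= 2.5)

# Assumption: a histogram with narrow bins; the original figure may
# use a different geom or bin width.
size_plot <- ggplot(smaller, aes(x = carat)) +
  geom_histogram(binwidth = 0.01) +
  labs(x = "Carat", y = "Number of diamonds")

# The ten most common carat values, to check where the peaks sit.
common_sizes <- smaller |>
  count(carat, sort = TRUE) |>
  head(10)

print(size_plot)
print(common_sizes)
```

Printing `common_sizes` next to the plot is a quick way to check where each peak falls, since the apparent position of a spike depends on the bin width.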

This section explores how diamond size varies by cut, color, and clarity.

Cut

# A tibble: 5 × 5
  cut       average_carat median_carat largest_carat number_of_diamonds
  <ord>             <dbl>        <dbl>         <dbl>              <int>
1 Fair              1.05          1             5.01               1610
2 Good              0.849         0.82          3.01               4906
3 Very Good         0.806         0.71          4                 12082
4 Premium           0.892         0.86          4.01              13791
5 Ideal             0.703         0.54          3.5               21551

Diamond sizes vary by cut. Ideal cut diamonds are the smallest on average (0.703 carats, median 0.54), while Fair cut diamonds are the largest on average (1.05 carats) and include the largest diamond in the data (5.01 carats).
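A summary like the cut table above can be built with a grouped `summarise()`. The sketch below assumes the ggplot2 `diamonds` data. The column names are copied from the table, while the object name `size_by_cut` is a hypothetical choice.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# One row per cut, with the same columns as the table above.
size_by_cut <- diamonds |>
  group_by(cut) |>
  summarise(
    average_carat = mean(carat),
    median_carat = median(carat),
    largest_carat = max(carat),
    number_of_diamonds = n()
  )

print(size_by_cut)
```

Replacing `cut` with `color` or `clarity` in `group_by()` would give the corresponding summaries for the other two characteristics.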

Color

# A tibble: 7 × 5
  color average_carat median_carat largest_carat number_of_diamonds
  <ord>         <dbl>        <dbl>         <dbl>              <int>
1 D             0.658         0.53          3.4                6775
2 E             0.658         0.53          3.05               9797
3 F             0.737         0.7           3.01               9542
4 G             0.771         0.7           3.01              11292
5 H             0.912         0.9           4.13               8304
6 I             1.03          1             4.01               5422
7 J             1.16          1.11          5.01               2808

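Diamond size also changes across color grades. In the table above, average carat rises steadily from 0.658 for D to 1.16 for J. That range is wider than the one seen by cut (0.703 to 1.05), so the pattern by color is at least as strong as the pattern by cut.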
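One rough way to compare how strongly size depends on each characteristic is to look at the gap between the largest and smallest group average. The sketch below assumes the ggplot2 `diamonds` data. The helper `spread_of_means()` is a hypothetical name introduced here, and the range of group means is only one of several possible measures of strength.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Hypothetical helper: gap between the highest and lowest group mean of carat.
spread_of_means <- function(data, group_var) {
  data |>
    group_by({{ group_var }}) |>
    summarise(average_carat = mean(carat), .groups = "drop") |>
    summarise(spread = max(average_carat) - min(average_carat)) |>
    pull(spread)
}

spreads <- c(
  cut = spread_of_means(diamonds, cut),
  color = spread_of_means(diamonds, color),
  clarity = spread_of_means(diamonds, clarity)
)

print(round(spreads, 3))
```

A larger value means the group averages are further apart. It does not account for group sizes or for the variation within each group.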

Clarity

# A tibble: 8 × 5
  clarity average_carat median_carat largest_carat number_of_diamonds
  <ord>           <dbl>        <dbl>         <dbl>              <int>
1 I1              1.28          1.12          5.01                741
2 SI2             1.08          1.01          3.04               9194
3 SI1             0.850         0.76          2.57              13065
4 VS2             0.764         0.63          3.51              12258
5 VS1             0.727         0.57          2.59               8171
6 VVS2            0.596         0.44          2.07               5066
7 VVS1            0.503         0.39          2.31               3655
8 IF              0.505         0.35          2.29               1790

Size falls as clarity improves. The lowest clarity grade, I1, averages 1.28 carats and includes the largest diamond (5.01 carats), while the highest grades, VVS1 and IF, average about 0.5 carats.

Largest 20 Diamonds

# A tibble: 20 × 5
   carat cut       color clarity price
   <dbl> <ord>     <ord> <ord>   <int>
 1  5.01 Fair      J     I1      18018
 2  4.5  Fair      J     I1      18531
 3  4.13 Fair      H     I1      17329
 4  4.01 Premium   I     I1      15223
 5  4.01 Premium   J     I1      15223
 6  4    Very Good I     I1      15984
 7  3.67 Premium   I     I1      16193
 8  3.65 Fair      H     I1      11668
 9  3.51 Premium   J     VS2     18701
10  3.5  Ideal     H     I1      12587
11  3.4  Fair      D     I1      15964
12  3.24 Premium   H     I1      12300
13  3.22 Ideal     I     I1      12545
14  3.11 Fair      J     I1       9823
15  3.05 Premium   E     I1      10453
16  3.04 Very Good I     SI2     15354
17  3.04 Premium   I     SI2     18559
18  3.02 Fair      I     I1      10577
19  3.01 Premium   I     I1       8040
20  3.01 Premium   F     I1       9925

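The 20 largest diamonds in this dataset are all above 3 carats. They vary in cut, color, and price (from 8,040 to 18,701), but 17 of the 20 have the lowest clarity grade, I1. For example, a 3.01 carat Premium diamond with I1 clarity is priced at 8,040, while a 3.04 carat Premium diamond of the same color with SI2 clarity is priced at 18,559. Carat size is important, but it is not the only factor that affects diamond value.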
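The table of the 20 largest diamonds can be obtained by sorting on carat and keeping the first 20 rows. This is a sketch assuming the ggplot2 `diamonds` data. The name `largest` is illustrative, and rows tied at the cutoff carat value are kept in their original order, so other tied diamonds may be left out.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Sort from largest to smallest carat and keep the top 20 rows.
largest <- diamonds |>
  arrange(desc(carat)) |>
  slice_head(n = 20) |>
  select(carat, cut, color, clarity, price)

print(largest)

# How the 20 largest diamonds are spread across clarity grades.
largest |> count(clarity, sort = TRUE) |> print()
```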

Diamond Gem

5 Rows of Diamonds

First 5 Rows of the Diamonds Dataset
carat  cut      color  clarity  depth  table  price  x     y     z
0.23   Ideal    E      SI2      61.5   55     326    3.95  3.98  2.43
0.21   Premium  E      SI1      59.8   61     326    3.89  3.84  2.31
0.23   Good     E      VS1      56.9   65     327    4.05  4.07  2.31
0.29   Premium  I      VS2      62.4   58     334    4.20  4.23  2.63
0.31   Good     J      SI2      63.3   58     335    4.34  4.35  2.75
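A captioned table like this one could be produced with `head()` and `knitr::kable()`. This is a sketch only. The report may use a different table function, and `first_rows` is a hypothetical name.

```r
library(ggplot2)  # provides the diamonds dataset

# Take the first five rows of the dataset.
first_rows <- head(diamonds, 5)

# Assumption: knitr::kable() renders the table; the caption text is
# copied from the table title above.
knitr::kable(
  first_rows,
  caption = "First 5 Rows of the Diamonds Dataset"
)
```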