Lab 5

Author

Misha Cox

PART 1: SUMMARY DATA: COLOR AND CLARITY


`summarise()` has grouped output by 'color'. You can override using the
`.groups` argument.

# A tibble: 56 × 5
# Groups:   color [7]
   color clarity     N    freq   pct
   <ord> <ord>   <int>   <dbl> <dbl>
 1 D     I1         42 0.00620     1
 2 D     SI2      1370 0.202      20
 3 D     SI1      2083 0.307      31
 4 D     VS2      1697 0.250      25
 5 D     VS1       705 0.104      10
 6 D     VVS2      553 0.0816      8
 7 D     VVS1      252 0.0372      4
 8 D     IF         73 0.0108      1
 9 E     I1        102 0.0104      1
10 E     SI2      1713 0.175      17
# ℹ 46 more rows

Yes, the data make sense. The table is organized by each combination of color and clarity. N is the number of diamonds that fall into that combination, and freq and pct give that count as a share of all diamonds of the same color (for example, the 42 color D diamonds with I1 clarity are about 1% of all color D diamonds). Within each color, relatively few diamonds have the lowest or the highest clarity, and most fall in the middle grades. This makes sense, as there will probably be more average diamonds than ones that are very nice or very bad.
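The code that produced this table is not shown in the rendered output, so the following is only a minimal sketch of one way to get the same columns. It assumes the data are the `diamonds` data set that ships with ggplot2 and that `pct` is `freq` times 100, rounded to a whole number. The object name `color_clarity` is made up for illustration.

```r
library(dplyr)

# Assumption: the lab uses the diamonds data set from ggplot2.
color_clarity <- ggplot2::diamonds %>%
  group_by(color, clarity) %>%
  # Count diamonds in each color/clarity combination.
  # .groups = "drop_last" keeps the result grouped by color only,
  # which is the default behavior that triggers the summarise() message.
  summarise(N = n(), .groups = "drop_last") %>%
  # Because the data are still grouped by color, sum(N) is the
  # total for that color, so freq is a within-color share.
  mutate(freq = N / sum(N),
         pct = round(freq * 100, 0))

color_clarity
```

Because `freq` is computed while the table is still grouped by color, the frequencies add up to 1 within each color rather than across the whole data set.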

PART 2: STACKED AND DODGED BAR CHARTS

PART 3: PRACTICE USING PIPES (dplyr) TO SUMMARIZE DATA: LENGTH, WIDTH AND CLARITY

# A tibble: 8 × 6
  clarity     N length_mean width_mean   freq   pct
  <ord>   <int>       <dbl>      <dbl>  <dbl> <dbl>
1 I1        741        6.76       6.71 0.0137     1
2 SI2      9194        6.40       6.40 0.170     17
3 SI1     13065        5.89       5.89 0.242     24
4 VS2     12258        5.66       5.66 0.227     23
5 VS1      8171        5.57       5.58 0.151     15
6 VVS2     5066        5.22       5.23 0.0939     9
7 VVS1     3655        4.96       4.98 0.0678     7
8 IF       1790        4.97       4.99 0.0332     3

Yes, the data make sense. The table shows the mean length and width of the diamonds in each clarity grade, along with the count and share of diamonds in that grade. Most of the diamonds fall into the lower to medium clarity categories: SI2, SI1 and VS2 together make up about 64% of the data. Within each clarity category the mean length and mean width are nearly identical. It looks like the better the clarity, the smaller the diamonds, with mean length falling from 6.76 for I1 to about 4.97 for IF. This could be because those who are buying the diamonds prefer good clarity over size. Another reason could be that the intensive polishing process reduces the size of the diamond.
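As with Part 1, the pipeline itself is not shown, so this is a sketch under stated assumptions rather than the exact code used. It assumes the ggplot2 `diamonds` data set, with `x` taken as length and `y` as width. The object name `clarity_size` is hypothetical.

```r
library(dplyr)

# Assumption: diamonds comes from ggplot2; x = length, y = width.
clarity_size <- ggplot2::diamonds %>%
  group_by(clarity) %>%
  # One row per clarity grade: count plus mean length and width.
  summarise(N = n(),
            length_mean = mean(x),
            width_mean = mean(y)) %>%
  # summarise() drops the single grouping level, so sum(N) here
  # is the total number of diamonds in the data set.
  mutate(freq = N / sum(N),
         pct = round(freq * 100, 0))

clarity_size
```

Here `freq` is a share of the whole data set, unlike Part 1, where the shares were computed within each color.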

PART 4: SCATTERPLOT WITH THE AVERAGE LENGTH AND WIDTH AND CLARITY

PART 5: LEGEND AND GUIDES

PART 6: DATA LABELS VS LEGEND

PART 7: INTERPRETATION

It looks like the better the clarity, the smaller the diamonds. It also appears that the diamonds' sizes are more clustered together. This could be because those who are buying the diamonds prefer good clarity over size. Another reason could be that the intensive polishing process reduces the size of the diamond.