Problem 1

For a data set of your choosing, make a faceted plot using the trelliscopejs package. You may make any type of plot (scatter plot, histogram, etc.), but, as mentioned in the discussion below, you must explain why you chose this plot and what you are investigating about the variable you are graphing.

# load dataset
load("C:/Users/tkb50/OneDrive/Desktop/STAT 5110/diamonds.RData")

# save as csv
write.csv(diamonds, "diamonds.csv")

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(trelliscopejs)
## Warning: package 'trelliscopejs' was built under R version 4.3.3
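# plot: carat vs. price, one panel per diamond color
diamonds %>%
  # drop extreme values so a few very large or expensive stones do not dominate the axes
  filter(price < 10000, carat < 3) %>%
  group_by(color) %>%
  # n_obs is constant within each color group, giving one count per panel
  mutate(n_obs = n(), desc = "Number of diamonds in this color group") %>%
  ggplot(aes(x = carat, y = price, color = cut)) +
  geom_point() +
  facet_trelliscope(
    ~ color,
    name = "Carat vs. Price by Color",
    desc = "Does the diamond's weight (carat) affect price across colors?",
    nrow = 2, ncol = 3,
    path = "trelliscope"
  )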
## using data from the first layer

The trelliscope plot must include one cognostic measure of your own. Include a description of what it is and what information this measure gives.


Description 2-3 paragraphs.

Describe the data set. Explain the variable you are graphing in your plots and the reason you are investigating with it. Discuss the reason/motivation you chose the variable to facet on, and what insight or trend you are attempting to investigate. Discuss any challenges you had in making the graphs and how you dealt with these challenges. Name at least one cognostic measure (this can include the cognostic you created or be different) the reader could investigate, and explain any insight they might gain from it.

Description:

The diamonds dataset, which was pulled from DataCamp, contains information on 1,000 diamonds, including their carat weight, cut quality, color, price, and other variables. In this assignment, I focused specifically on the relationship between carat weight and price. Carat is a measure of a diamond’s weight (and, loosely, its size), and price is one of the most important factors when deciding to purchase. The goal was to examine whether heavier diamonds are consistently more expensive and whether that relationship changes depending on the diamond’s color. To reduce the influence of outliers, I kept only diamonds priced under $10,000 with a carat weight under 3.

In the graph, carat is plotted on the x-axis and price on the y-axis, with points colored by cut quality. The graph is faceted by color group, which lets me compare how the relationship between carat and price varies across the different colors. I wanted to see whether some colors are associated with a steeper price increase. Some colors might be more popular, which could explain why a diamond of a given cut and carat is more expensive than an otherwise identical diamond of a different color.
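As a rough numeric companion to the faceted plot, the sketch below computes a per-color correlation and least-squares slope of price on carat. It is only an illustration: it assumes ggplot2's built-in `diamonds` data as a stand-in for the DataCamp file (the two may differ in size and content), and the names `slopes`, `slope`, and `corr` are made up for this example.

```r
library(dplyr)

# Assumption: ggplot2's built-in diamonds data stands in for the DataCamp file.
d <- ggplot2::diamonds

slopes <- d %>%
  filter(price < 10000, carat < 3) %>%   # same cutoffs as the plot
  group_by(color) %>%
  summarise(
    n_obs = n(),                            # diamonds behind each panel
    corr  = cor(carat, price),              # strength of the linear association
    slope = cov(carat, price) / var(carat), # simple-regression slope of price on carat
    .groups = "drop"
  )

print(slopes)
```

A larger `slope` for one color would suggest that price rises faster with carat in that panel, which is the pattern the facets are meant to reveal visually.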

One challenge that I ran into was deciding how to filter the dataset appropriately. The original data had a wide range of carat weights and prices, and outliers could skew the visualizations. I dealt with this by experimenting with different cutoffs for both price and carat, which led me to limit the dataset to diamonds under 3 carats and under $10,000 in price. Another challenge was deciding which variables to use for grouping and for coloring the points. There are several other variables that I didn’t end up using; I chose color and cut because they seemed to correlate the most with price. I also included a cognostic measure, the number of observations in each color group. This shows how many diamonds each facet is based on, which gives insight into the sample size of each group and how the diamonds are distributed across groups, so the reader can see which groups have more samples than others.
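One way to make the cutoff experiments and the count cognostic concrete is sketched below. It assumes ggplot2's built-in `diamonds` data as a stand-in for the DataCamp file, and the candidate cutoffs and the names `cutoffs`, `kept`, and `counts` are hypothetical choices for illustration, not the values actually tried.

```r
library(dplyr)

# Assumption: ggplot2's built-in diamonds data stands in for the DataCamp file.
d <- ggplot2::diamonds

# Hypothetical candidate cutoffs; each row is one price/carat pair to try.
cutoffs <- data.frame(
  max_price = c(5000, 10000, 15000),
  max_carat = c(2, 3, 4)
)

# Share of all diamonds that survive each pair of cutoffs.
kept <- cutoffs %>%
  rowwise() %>%
  mutate(share_kept = mean(d$price < max_price & d$carat < max_carat)) %>%
  ungroup()

# Number of observations per color after the chosen filter;
# this is the same quantity the n_obs cognostic reports for each panel.
counts <- d %>%
  filter(price < 10000, carat < 3) %>%
  count(color, name = "n_obs")

print(kept)
print(counts)
```

Comparing `share_kept` across rows shows how much data each cutoff discards, and `counts` shows which color panels rest on relatively few diamonds.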