For a data set of your choosing, make a faceted plot using the trelliscopejs package. You may make any type of plot (scatter plot, histogram, etc.), but, as mentioned in the discussion below, you must explain why you chose this plot and what you are investigating about the variable you are graphing.
The trelliscope plot must include one cognostic measure of your own. Include a description of what it is and what information this measure gives.
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ lubridate 1.9.4 ✔ tibble 3.3.0
## ✔ purrr 1.2.0 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# install.packages("trelliscope")
library(trelliscopejs)
## This package is no longer maintained. Please use the 'trelliscope' package instead (see https://github.com/trelliscope/).
#View(diamonds)
### create plot
diamonds %>%
  group_by(cut) %>%
  mutate(number_diamonds = cog(n(),
                               desc = "number of observations per cut",
                               default_label = TRUE)) %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point() +
  facet_trelliscope(~ cut,
                    name = "Price Vs Carat",
                    desc = "faceted by cut (quality of diamond's cut)",
                    nrow = 2, ncol = 3,
                    scales = c("same", "same"),
                    self_contained = TRUE,
                    path = "."
  )
## using data from the first layer
# Information for description of dataset
colnames(diamonds)
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
str(diamonds)
## tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
## $ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
table(diamonds$cut)
##
##      Fair      Good Very Good   Premium     Ideal
##      1610      4906     12082     13791     21551
Description 2-3 paragraphs.
Describe the data set. Explain the variable you are graphing in your plots and the reason you are investigating it. Discuss the reason/motivation you chose the variable to facet on, and what insight or trend you are attempting to investigate. Discuss any challenges you had in making the graphs and how you dealt with these challenges. Name at least one cognostic measure (this can include the cognostic you created or be different) the reader could investigate, and explain any insight they might gain from it.
For this assignment, I used the diamonds data set, which ships with the ggplot2 package in R. It contains the variables carat, cut, color, clarity, depth, table, price, x, y, and z. There are 53,940 observations: 1,610 with cut = Fair, 4,906 with cut = Good, 12,082 with cut = Very Good, 13,791 with cut = Premium, and 21,551 with cut = Ideal. I selected this data set because I wanted to investigate the relationship between price and carat at each level of cut quality. In my plots, carat is on the x-axis and price is on the y-axis. I chose a scatter plot because I wanted to see whether the price of a diamond increases as its carat increases. I then faceted the plots on cut, because I wanted to see how the relationship between price and carat differs with the quality of the diamond’s cut.
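As a rough numeric companion to the faceted scatter plots, the sketch below summarises each cut level with its observation count and the correlation between price and carat. It is not part of the original analysis: the names `cut_summary`, `n_diamonds`, and `cor_price_carat` are illustrative, and Pearson correlation is only one assumed way of summarising the relationship.

```r
library(dplyr)

# diamonds ships with ggplot2; referenced with :: so ggplot2 need not be attached
cut_summary <- ggplot2::diamonds %>%
  group_by(cut) %>%
  summarise(
    n_diamonds = n(),                      # observations in this cut level
    cor_price_carat = cor(price, carat),   # Pearson correlation (illustrative choice)
    .groups = "drop"
  )

print(cut_summary)
```

A correlation is a single number per panel, so it cannot replace the scatter plots, but it gives a quick way to compare the strength of the linear relationship across cut levels.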
I ran into a few challenges when making the graphs. First, I was initially unable to install the trelliscopejs package. I repeatedly received an error that the package was not available for my version of R, even though I was on the latest version. Eventually, I was able to troubleshoot this error. Once I had created my graphs, I realized that I needed to use the scales argument so that the panels share the same axes and can be compared with one another. Later, I changed the title of my trelliscope display and found that I could not see the latest version. I deleted the HTML file, lib, and appfiles in my folder and reran the code, which solved the issue.
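The manual cleanup described above could be scripted along these lines. This is only a sketch: `clear_trelliscope_output()` is a hypothetical helper, not a trelliscopejs function, and the default names `index.html`, `lib`, and `appfiles` are assumptions about what the display wrote to the output folder. The demonstration runs in a temporary folder so that nothing real is deleted.

```r
# Hypothetical helper (not part of trelliscopejs): remove output left by an
# earlier render so that the next run writes a fresh display.
# The default target names are assumptions; adjust them to match your folder.
clear_trelliscope_output <- function(path = ".",
                                     targets = c("index.html", "lib", "appfiles")) {
  full <- file.path(path, targets)
  existing <- full[file.exists(full)]   # file.exists() is TRUE for folders too
  unlink(existing, recursive = TRUE)    # recursive = TRUE is needed for folders
  invisible(existing)                   # return what was removed
}

# Demonstration in a throwaway folder
demo_dir <- tempfile("trelliscope_demo")
dir.create(demo_dir)
dir.create(file.path(demo_dir, "lib"))
dir.create(file.path(demo_dir, "appfiles"))
file.create(file.path(demo_dir, "index.html"))

removed <- clear_trelliscope_output(demo_dir)
list.files(demo_dir)
```

Calling the helper with `path = "."` before rerunning the plotting code would mirror the manual fix, provided the assumed file names match the ones actually present.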
Readers viewing my graphs can easily see the number of diamonds, or observations, in each panel. I created number_diamonds as a cognostic measure and included “default_label = TRUE” in my code so that it is displayed on each panel by default. Readers can also use the filter tab to filter on the number of diamonds in a panel. For instance, a reader who only wants to view panels with more than 10,000 observations can do so easily. They may want to analyze the relationship between price and carat only for the cut qualities with a large number of observations, and this cognostic measure lets them focus on those panels.
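The filter described above can be mimicked outside the viewer with plain dplyr, which may help when checking which panels a given threshold would keep. This is a sketch rather than the viewer's own mechanism; `cut_counts` and `large_panels` are illustrative names, and the 10,000 threshold is the example value from the paragraph above.

```r
library(dplyr)

# One row per cut level, with the same count the cognostic stores
cut_counts <- ggplot2::diamonds %>%
  count(cut, name = "number_diamonds")

# Keep only the cut levels with more than 10,000 observations
large_panels <- cut_counts %>%
  filter(number_diamonds > 10000)

print(large_panels)
```

Given the counts shown in the table output earlier, this keeps Very Good, Premium, and Ideal, and drops Fair and Good.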
knit the file to an html document
publish this to an RPubs page.
grading: trelliscope plot[25 points], discussion[25 points]
Note: you can add a url directly to the text and it will be active in the html (and word document if you knit to that)