Problem 1

For a data set of your choosing, make a faceted plot using the trelliscopejs package. You may make any type of plot (scatter plot, histogram, etc.), but, as mentioned in the discussion below, you must explain why you chose this plot and what you are investigating about the variable you are graphing.

The trelliscope plot must include one cognostic measure of your own. Include a description of what it is and what information this measure gives.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'forcats' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(trelliscopejs)
## Warning: package 'trelliscopejs' was built under R version 4.3.3
library(lubridate)

chocolate <- read_csv("Chocolate Sales.csv") %>%
  mutate(
    Amount = as.numeric(gsub("[$,]", "", Amount)),
    Date = dmy(Date),
    avg_amount_per_box = Amount / `Boxes Shipped`
  )
## Rows: 1094 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Sales Person, Country, Product, Date, Amount
## dbl (1): Boxes Shipped
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(chocolate, aes(x = `Boxes Shipped`, y = Amount)) +
  geom_point() +
  labs(x = "Boxes Shipped", y = "Amount ($)") +
  facet_trelliscope(
    ~ Country,
    ncol = 3,
    nrow = 2,
    name = "Chocolate Sales Analysis",
    path = "choc_output"
  )
## using data from the first layer


This project uses a transactional dataset of chocolate sales across several countries, with variables including the number of boxes shipped and the revenue generated per transaction. Using the trelliscopejs package, I constructed a faceted scatter plot of Boxes Shipped versus Amount, with one panel per Country. The visualization explores whether the number of boxes sold in a transaction correlates with its total revenue, and whether that relationship varies by country.
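
The visual comparison could be backed by a per-country correlation between boxes shipped and revenue. The sketch below is a minimal illustration on made-up rows, not the actual sales data; `toy_sales` and `by_country` are hypothetical names, and the same `group_by()` and `summarise()` steps would be applied to `chocolate`.

```r
library(dplyr)

# Made-up rows for illustration only (not taken from "Chocolate Sales.csv")
toy_sales <- data.frame(
  Country = c("A", "A", "A", "B", "B", "B"),
  `Boxes Shipped` = c(10, 20, 30, 10, 20, 30),
  Amount = c(100, 200, 300, 300, 200, 100),
  check.names = FALSE  # keep the space in "Boxes Shipped"
)

# One row per country: transaction count and Pearson correlation
by_country <- toy_sales %>%
  group_by(Country) %>%
  summarise(
    n = n(),
    r = cor(`Boxes Shipped`, Amount),
    .groups = "drop"
  )
```

A value of `r` near zero for a country would indicate that order size says little about revenue there, which is the pattern the scatter plots are meant to reveal.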

Faceting by country allows for geographic comparisons in both the volume and the value of chocolate sales. In some countries, such as the USA and India, there appears to be a greater spread in both boxes shipped and revenue, suggesting a wide range of customer types and order sizes. Other countries, such as Canada and New Zealand, appear more clustered, possibly indicating more uniform order sizes or pricing structures. One challenge encountered during this analysis was cleaning the Amount column, which was read in as text containing dollar signs and thousands separators. This was resolved by stripping the dollar signs and commas and coercing the column to numeric.
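
The cleaning step can be checked in isolation. The snippet below is a small sketch using made-up strings in the same style as the raw Amount column; the variable names are hypothetical.

```r
# Made-up strings in the same "$1,234" style as the raw Amount column
raw_amount <- c("$5,320", "$7,896", "$12,726")

# Strip dollar signs and commas, then coerce to numeric
clean_amount <- as.numeric(gsub("[$,]", "", raw_amount))

# as.numeric() returns NA (with a warning) for anything it cannot parse,
# so counting NAs is a cheap check that the pattern covered every value
n_failed <- sum(is.na(clean_amount))
```

If `n_failed` were greater than zero on the real data, the pattern would need to be extended to cover whatever other characters appear in the column.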

While working on this project, I had originally planned to include a custom cognostic summarizing average revenue per box for each country. However, I later found that the facet_trelliscope() function, which was the one used in this week's video tutorial, does not appear to accept user-defined cognostics directly. This wasn't clear at the start, so I continued with the plot I had already begun, relying instead on the built-in panel statistics provided by the Trelliscope viewer. The default cognostic, "number of observations", still provides meaningful insight: it lets the viewer immediately gauge which countries contribute the most transactions, and therefore which panels are likely to show more reliable trends. A useful custom cognostic would be average revenue per box, offering a standardized measure of pricing across different markets.
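
As a sketch of how the average revenue per box could be computed per country, the code below adds a column that takes a single value within each country. It uses made-up rows, and `toy_sales` and `country_rev_per_box` are hypothetical names. My understanding, which I have not verified here, is that facet_trelliscope() may pick up columns that are constant within each panel as cognostics; if so, adding such a column to `chocolate` before plotting would be one way to supply this measure.

```r
library(dplyr)

# Made-up rows for illustration only (not taken from "Chocolate Sales.csv")
toy_sales <- data.frame(
  Country = c("India", "India", "Canada", "Canada"),
  Amount = c(1000, 3000, 500, 1500),
  `Boxes Shipped` = c(100, 100, 50, 150),
  check.names = FALSE  # keep the space in "Boxes Shipped"
)

# Total revenue divided by total boxes, repeated on every row of the country,
# so the new column is constant within each facet
toy_sales <- toy_sales %>%
  group_by(Country) %>%
  mutate(country_rev_per_box = sum(Amount) / sum(`Boxes Shipped`)) %>%
  ungroup()

# One row per country, for inspecting the measure on its own
country_summary <- distinct(toy_sales, Country, country_rev_per_box)
```

Dividing total revenue by total boxes weights every box equally. Averaging the per-transaction `avg_amount_per_box` values instead would weight every transaction equally, so the two definitions can differ when order sizes vary widely.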