Bike Models and Sales I

An Analysis of Revenue Trends by Category Using R

Author

Jesus Ambriz-Sanchez

Published

October 14, 2025


Back to Projects

1 Introduction

Summary: This report analyzes bike sales performance across models, categories, and years using R. The data includes transaction records, product details, and pricing information. This allows us to identify revenue trends, top-performing models, and seasonal patterns.

Navigation: Use the Table of Contents to navigate sections and click the 🞂 Code buttons throughout the document to view or hide R code for each example.

2 Datasets

Below I show the files used for this analysis.

Code
# Resolve paths
bikes_path      <- file.path(params$data_dir, "bikes.xlsx")
bikeshops_path  <- file.path(params$data_dir, "bikeshops.xlsx")
orderlines_path <- file.path(params$data_dir, "orderlines.xlsx")

stopifnot(file.exists(bikes_path), file.exists(bikeshops_path), file.exists(orderlines_path))

bike_tbl       <- read_excel(path = bikes_path)
bikeshops_tbl  <- read_excel(path = bikeshops_path)
orderlines_tbl <- read_excel(path = orderlines_path)

2.1 Data Structure

Here is a quick peek of the raw data structure.

  • Dataset 1: Composed of 4 columns and 97 rows
  • Dataset 2: Composed of 3 columns and 30 rows
  • Dataset 3: Composed of 7 columns and 15,644 rows
Code
glimpse(bike_tbl)
Rows: 97
Columns: 4
$ bike.id     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ model       <chr> "Supersix Evo Black Inc.", "Supersix Evo Hi-Mod Team", "Su…
$ description <chr> "Road - Elite Road - Carbon", "Road - Elite Road - Carbon"…
$ price       <dbl> 12790, 10660, 7990, 5330, 4260, 3940, 3200, 2660, 2240, 18…
Code
glimpse(bikeshops_tbl)
Rows: 30
Columns: 3
$ bikeshop.id   <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ bikeshop.name <chr> "Pittsburgh Mountain Machines", "Ithaca Mountain Climber…
$ location      <chr> "Pittsburgh, PA", "Ithaca, NY", "Columbus, OH", "Detroit…
Code
glimpse(orderlines_tbl)
Rows: 15,644
Columns: 7
$ X__1        <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "…
$ order.id    <dbl> 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7…
$ order.line  <dbl> 1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2…
$ order.date  <dttm> 2011-01-07, 2011-01-07, 2011-01-10, 2011-01-10, 2011-01-1…
$ customer.id <dbl> 2, 2, 10, 10, 6, 6, 6, 6, 6, 22, 8, 8, 8, 8, 16, 16, 16, 1…
$ product.id  <dbl> 48, 52, 76, 52, 2, 50, 1, 4, 34, 26, 96, 66, 35, 72, 45, 3…
$ quantity    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1…


2.2 Data Preview

First 5 rows of each dataset.

Code
bike_tbl %>% head(5)
Code
bikeshops_tbl %>% head(5)
Code
orderlines_tbl %>% head(5)


2.3 Interactive Tables

Creating an interactive table with renamed variables.

Code
tagList(
  tags$div(
    id = "table-title",
  "Table 1: Bike Models and Prices"),
bike_tbl %>%   
  rename(
    "Bike ID"     = bike.id,
    "Model"       = model,
    "Description" = description,
    "Price"       = price
  ) %>%
  datatable(options = list(pageLength = 8, scrollX = TRUE))) 
Table 1: Bike Models and Prices


Code
tagList(
  tags$div(
    id = "table-title",
  "Table 2: Bike Shops and Locations"),
bikeshops_tbl %>%
  rename(
    "Bike Shop ID"   = bikeshop.id,
    "Bike Shop Name" = bikeshop.name,
    "Location"       = location
  ) %>%
  datatable(options = list(pageLength = 8, scrollX = TRUE)))
Table 2: Bike Shops and Locations


Code
tagList(
  tags$div(
    id = "table-title",
  "Table 3: Orderlines, Dates, and Sales"),
orderlines_tbl %>%
  rename(
    "N"            = X__1,
    "Order ID"     = order.id,
    "Order"        = order.line,
    "Order Date"   = order.date,
    "Bike Shop ID" = customer.id,
    "Bike ID"      = product.id,
    "Quantity"     = quantity
  ) %>%
  datatable(options = list(pageLength = 6, scrollX = TRUE)))
Table 3: Orderlines, Dates, and Sales




3 Left Joining

Joining the orderline dataset to bikes and shops using ID keys.

Code
bike_orderlines_joined_tbl <- orderlines_tbl %>%
  left_join(bike_tbl,      by = c("product.id"  = "bike.id")) %>%
  left_join(bikeshops_tbl, by = c("customer.id" = "bikeshop.id")) 

bike_orderlines_joined_tbl %>%
  rename(
    "N"              = X__1,
    "Order ID"       = order.id,
    "Order"          = order.line,
    "Order Date"     = order.date,
    "Bike Shop ID"   = customer.id,
    "Bike ID"        = product.id,
    "Quantity"       = quantity,
    "Model"          = model,
    "Description"    = description,
    "Price"          = price,
    "Bike Shop Name" = bikeshop.name,
    "Location"       = location
  ) %>% head(10)
Tip

Click top right arrow of the table to view all columns.


3.1 Expanding Descriptors

Selecting, creating, splitting, renaming, and standardizing column names efficiently. Below is the before and after.

Code
# Before
before_cols <- colnames(bike_orderlines_joined_tbl)

bike_orderlines_wrangled_tbl <- bike_orderlines_joined_tbl %>%
  separate(description,
           into = c("category.1","category.2","frame.material"),
           sep = " - ",
           remove = TRUE) %>%
  separate(location, into = c("city", "state"), sep = ", ", remove = FALSE) %>%
  mutate(total.sale = price * quantity) %>%
  select(-X__1, -location) %>%
  select(-ends_with(".id")) %>%
  bind_cols(bike_orderlines_joined_tbl %>% select(order.id)) %>%
  select(contains("date"), everything()) %>%
  select(contains("date"), contains("id"), contains("order"),
         quantity, price, total.sale, everything()) %>%
  rename(order_date = order.date) %>%
  rename_with(tolower) %>%
  rename_with(~ stringr::str_replace_all(.x, "[.]", "_"))

after_cols <- colnames(bike_orderlines_wrangled_tbl)

tibble(
  before = before_cols,
  after  = after_cols[seq_len(min(length(after_cols), length(before_cols)))]
) %>% 
  rename("Old Names" = before) %>%
  rename("New Names" = after) %>% head(12)


3.2 Saving Progress

Saving data in Excel, CSV, and RDS format.

Code
glimpse(bike_orderlines_wrangled_tbl)
Rows: 15,644
Columns: 13
$ order_date     <dttm> 2011-01-07, 2011-01-07, 2011-01-10, 2011-01-10, 2011-0…
$ order_id       <dbl> 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7…
$ order_line     <dbl> 1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 1, 2, 3, 4, 1, 2, 3, 4, 1…
$ quantity       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1…
$ price          <dbl> 6070, 5970, 2770, 5970, 10660, 3200, 12790, 5330, 1570,…
$ total_sale     <dbl> 6070, 5970, 2770, 5970, 10660, 3200, 12790, 5330, 1570,…
$ model          <chr> "Jekyll Carbon 2", "Trigger Carbon 2", "Beast of the Ea…
$ category_1     <chr> "Mountain", "Mountain", "Mountain", "Mountain", "Road",…
$ category_2     <chr> "Over Mountain", "Over Mountain", "Trail", "Over Mounta…
$ frame_material <chr> "Carbon", "Carbon", "Aluminum", "Carbon", "Carbon", "Ca…
$ bikeshop_name  <chr> "Ithaca Mountain Climbers", "Ithaca Mountain Climbers",…
$ city           <chr> "Ithaca", "Ithaca", "Kansas City", "Kansas City", "Loui…
$ state          <chr> "NY", "NY", "KS", "KS", "KY", "KY", "KY", "KY", "KY", "…
Code
fs::dir_create("./00_support/data/bike_sales/data_wrangled")
 
bike_orderlines_wrangled_tbl %>%
    write_xlsx("./00_support/data/bike_sales/data_wrangled/bike_orderlines_wrangled.xlsx")
 
bike_orderlines_wrangled_tbl %>%
    write_csv("./00_support/data/bike_sales/data_wrangled/bike_orderlines_wrangled.csv")
 
bike_orderlines_wrangled_tbl %>%
    write_rds("./00_support/data/bike_sales/data_wrangled/bike_orderlines_wrangled.rds")


3.3 Finalized Data

Code
bike_orderlines_wrangled_tbl %>% head(100) %>%
    rename(
      "Order Date"  = order_date,
      "Order ID"    = order_id,
      "Orderline"   = order_line,
      "Quantity"    = quantity,
      "Price"       = price,
      "Total Sale"  = total_sale,
      "Model"       = model,
      "Category 1"  = category_1,
      "Category 2"  = category_2,
      "Material"    = frame_material,
      "Bikeshop"    = bikeshop_name,
      "City"        = city,
      "State"       = state
    )
Tip

Click top right arrow of the table to view all columns.




4 Wrangling

Pulling and extracting specific data.

4.1 Pulling Quick Stats

Code
   # Mean Price
price_avg <- bike_tbl %>%
    pull(price) %>%
    mean() %>% round(0) %>% 
    dollar()
cat("Avg Bike Price:", price_avg)
Avg Bike Price: $3,954
Code
   # Median Price
price_median <- bike_tbl %>%
    pull(price) %>%
    median() %>% dollar()
cat("Median Bike Price:", price_median)
Median Bike Price: $3,200
Code
   # Max Price
price_max <- bike_tbl %>%
    pull(price) %>%
    max() %>% dollar()
cat("Max Bike Price:", price_max)
Max Bike Price: $12,790
Code
   # Min Price
price_min <- bike_tbl %>%
    pull(price) %>%
    min() %>% dollar()
cat("Min Bike Price:", price_min)
Min Bike Price: $415
Code
   # Distinct States
n_state_count <- bike_orderlines_wrangled_tbl %>% 
    pull(state) %>%
    n_distinct()
cat("Distinct Operating States:", n_state_count)
Distinct Operating States: 21
Code
   # Total Sales
total_sales <- bike_orderlines_wrangled_tbl %>%
    pull(total_sale) %>%
    sum() %>%
    dollar()
cat("Total Sales Across All Stores:", total_sales)
Total Sales Across All Stores: $71,032,330


4.2 Renaming Columns

Creating shop revenue dataset by category_1.

Code
    # Summarizing Revenue by Shop 
bikeshop_revenue_tbl <- bike_orderlines_wrangled_tbl %>%
    select(bikeshop_name, category_1, total_sale) %>%
    group_by(bikeshop_name, category_1) %>%
    summarise(
        revenue = sum(total_sale, na.rm = TRUE),
        .groups = "drop"
    ) %>%
    arrange(desc(revenue))

4.2.1 Set Names Method 1

Code
    # Rename() 
bikeshop_revenue_tbl %>% 
    rename("Bikeshop Name" = bikeshop_name,
        "Primary Category" = category_1,
        "Total Revenue" = revenue) %>% 
    head(4)

4.2.2 Set Name Method 2

Code
    # Set_names() 
bikeshop_revenue_tbl %>%
    set_names(c("Bikeshop Name", "Primary Category", "Sales")) %>%
    head(4)

4.2.3 Set Name Method 3

Code
bikeshop_revenue_tbl %>%
    set_names(names(.) %>% str_replace("_"," ") %>% str_to_title()) %>%
    head(4)


4.3 Creating Columns

Creating a new dataframe to manipulate new columns.

4.3.1 Squared and Logged Sales

Code
    # Creating Test Table
bike_orderlines_wrangled_tbl %>%
    select(order_id, quantity, price) %>%
    mutate(total_sale = (quantity * price)) %>%
    mutate(total_price_log = log(total_sale) %>% round(3)) %>%
    mutate(total_price_sqrt = total_sale^(0.5) %>% round(3)) %>%  
    rename("Order ID" = order_id,
           "Quantity" = quantity,
           "Price" = price,
           "Total Sales" = total_sale,
           "Log Sales" = total_price_log,
           "Sqrt Sales" = total_price_sqrt)


4.4 Arranging Prices

4.4.1 Top 10 Lowest-Priced Models

Code
    # Bike Prices from Low:High
bike_tbl %>%
    select(model, price) %>%
    arrange(price) %>%
    rename("Model"  = model) %>% 
    rename("Price" = price) %>% head(10)


4.4.2 Top 10 Highest-Priced Models

Code
    # Bike Prices from High:Low
bike_tbl %>%
    select(model, price) %>%
    arrange(desc(price)) %>% 
    rename("Model"  = model) %>% 
    rename("Price" = price) %>% head(10)


4.4.3 Models Priced Above Average

Code
    # Filter Models above average price
bike_tbl %>%
    select(model, price) %>%
    filter(price > mean(price)) %>%
    rename("Model"  = model) %>% 
    rename("Price" = price)


4.4.4 Models Above $9,000 or Below $1,000

Code
    # Filter by price > 9000 | price < 1000
bike_tbl %>%
    select(model, price) %>%
    filter((price > 9000) | (price < 1000 )) %>%
    arrange(desc(price)) %>%
    rename("Model"  = model) %>% 
    rename("Price" = price)


4.4.5 Supersix Models Priced Above $6,000

Code
    # Filter by Price > 6000 & Str Name
bike_tbl %>%
    select(model, price) %>%
    filter(price > 6000, 
           model %>% str_detect("Supersix")) %>%
    rename("Model"  = model) %>% 
    rename("Price" = price)


4.5 Filtering Data

4.5.1 Listing All Category 2 Types

Code
    # Filtering in Category_2 By 1 Category
bike_orderlines_wrangled_tbl %>% 
    select(category_2) %>%
    unique() %>%
    rename("Category 2" = category_2)


4.5.2 Orders Based on One Category 2

Condition: Sport

Code
    # Filtering in Category_2 By 1 Category
bike_orderlines_wrangled_tbl %>% 
    select(order_id,model:frame_material,price,quantity,total_sale) %>%
    filter(category_2 == "Sport") %>%
    rename(
      "Order ID"   = order_id,
      "Model"      = model,
      "Category 1" = category_1,
      "Category 2" = category_2,
      "Material"   = frame_material,
      "Price"      = price,
      "Quantity"   = quantity,
      "Total Sale" = total_sale
    ) 


4.5.3 Orders Based on Two Category 2

Condition: Sport, Trail

Code
    # Filtering in Category_2 By 2 Categories
bike_orderlines_wrangled_tbl %>%
    select(order_id,model:frame_material,price,quantity,total_sale) %>%
    filter(category_2 %in% c("Trail","Sport")) %>%
    rename(
      "Order ID"   = order_id,
      "Model"      = model,
      "Category 1" = category_1,
      "Category 2" = category_2,
      "Material"   = frame_material,
      "Price"      = price,
      "Quantity"   = quantity,
      "Total Sale" = total_sale
    )
Tip

Click top right arrow of the table to view all columns.


4.5.4 Orders Based on Not One Category 2

Not Condition: Over Mountain

Code
    # Filtering in Category_2 By Not 1 Category
bike_orderlines_wrangled_tbl %>%
    select(order_id,model:frame_material,price,quantity,total_sale) %>%
    filter(category_2 != "Over Mountain") %>%
    rename(
      "Order ID"   = order_id,
      "Model"      = model,
      "Category 1" = category_1,
      "Category 2" = category_2,
      "Material"   = frame_material,
      "Price"      = price,
      "Quantity"   = quantity,
      "Total Sale" = total_sale
    )
Tip

Click top right arrow of the table to view all columns.


4.5.5 Orders Based on No Set of Category 2

Not Condition: Over Mountain, Endurance Road, Trail

Code
    # Filtering in Category_2 By No Set of Categories
bike_orderlines_wrangled_tbl %>%
    select(order_id,model:frame_material,price,quantity,total_sale) %>%
    filter(!(category_2 %in% c("Over Mountain","Trail","Endurance Road"))) %>%
    rename(
      "Order ID"   = order_id,
      "Model"      = model,
      "Category 1" = category_1,
      "Category 2" = category_2,
      "Material"   = frame_material,
      "Price"      = price,
      "Quantity"   = quantity,
      "Total Sale" = total_sale
    )


4.6 Slicing Data

4.6.1 Ascending Range

Code
      # Arrange & Select From 10 to 15
bike_tbl %>%
    arrange(bike.id) %>%
    slice(10:15) %>% 
    rename("Bike ID" = bike.id, 
           "Model"   = model, 
           "Description" = description, 
           "Price"  = price)


4.6.2 Descending Range

Code
      # Arrange & Select From Descending 10 to 15
bike_tbl %>%
    arrange(desc(bike.id)) %>%
    slice(10:15) %>%
    rename("Bike ID" = bike.id, 
           "Model"   = model, 
           "Description" = description, 
           "Price"  = price)


4.6.3 After Row Range

Code
      # Arrange & Select From After 4th row
bike_tbl %>%
    arrange(desc(bike.id)) %>%
    slice(nrow(.)-4:nrow(.)) %>% 
    rename("Bike ID" = bike.id, 
           "Model"   = model, 
           "Description" = description, 
           "Price"  = price)


4.7 Distinct Combinations

4.7.1 All Combination of One Descriptors

Code
    # Show Available Combination of Category_1
bike_orderlines_wrangled_tbl %>%
    distinct(category_1) %>%
    rename("Category 1" = category_1)


4.7.2 All Combination of Two Descriptors

Code
    # Show Available Combination of Category_1 & Category_2
bike_orderlines_wrangled_tbl %>%
    distinct(category_1, category_2) %>%
    rename("Category 1" = category_1,
           "Category 2" = category_2)


4.7.3 All Combination of Three Descriptors

Code
    # Show Available Combination of Three Descriptors
bike_orderlines_wrangled_tbl %>%
    distinct(bikeshop_name, city, state) %>%
    rename("Bike Shop" = bikeshop_name,
           "City"  = city,
           "State" = state)


4.8 Summarized Data

4.8.1 Summary by Category 1

Code
   # Summary by Category 1
bike_orderlines_wrangled_tbl %>%
  group_by(category_1) %>%
  summarise(
    avg_price = mean(price, na.rm = TRUE),
    total_quantity = sum(quantity, na.rm = TRUE) %>% round(3),
    total_sale = sum(total_sale, na.rm = TRUE) %>% round(-5)
  ) %>% 
  ungroup() %>%
  mutate(avg_price = dollar(avg_price),
         total_sale = dollar(total_sale)) %>%
  rename("Category 1" = category_1,
         "Avg Price"  = avg_price, 
         "Total Qty"  = total_quantity,
         "Total Sale" = total_sale) 


4.8.2 Summary by Category 1 and 2

Code
   # Summary by Category 1 and 2
bike_orderlines_wrangled_tbl %>%
  group_by(category_1, category_2) %>%
  summarise(
    avg_price = mean(price, na.rm = TRUE)%>% round(0),
    total_quantity = sum(quantity, na.rm = TRUE) %>% round(2),
    total_sale = sum(total_sale, na.rm = TRUE) 
  ) %>%
  ungroup() %>%
  arrange(desc(total_sale)) %>%  
  mutate(avg_price = dollar(avg_price),
         total_sale = dollar(total_sale)) %>%
  rename("Category 1" = category_1,
         "Category 2" = category_2,
         "Avg Price"  = avg_price, 
         "Total Qty"    = total_quantity,
         "Total Sale" = total_sale) 


4.8.3 Summary by Category 1 and 2 Stats

Code
   # Summary by Category 1 and 2
bike_orderlines_wrangled_tbl %>%
  group_by(category_1, category_2) %>%
  summarise(
    Count = n(),
    Average   = mean(total_sale, na.rm = TRUE) %>% round() %>% dollar(),
    Median    = median(total_sale, na.rm = TRUE)  %>% dollar(),
    "Standard Deviation"    = sd(total_sale, na.rm = TRUE) %>% round()  %>% dollar(),
    Minimum   = min(total_sale, na.rm = TRUE)  %>% dollar(),
    Maximum   = max(total_sale, na.rm = TRUE)  %>% dollar(),
    .groups = "drop"
  ) %>%
  rename(
    "Category 1" = category_1,
    "Category 2" = category_2
  ) %>%
  arrange(desc(Count))


4.8.4 Summary by Categories and Frame Material

Code
   # Summary by Categories and Frame Material
bike_orderlines_wrangled_tbl %>%
  group_by(category_1, category_2, frame_material) %>%
  summarise(
    total_sale = sum(total_sale, na.rm = TRUE),
    total_quantity = sum(quantity, na.rm = TRUE),
    avg_price = mean(price, na.rm = TRUE) %>% round(0)
  ) %>%
  ungroup() %>%
  arrange(desc(total_sale)) %>%
  mutate(
    avg_price = dollar(avg_price),
    total_sale = dollar(total_sale)
  ) %>%
  rename(
    "Category 1" = category_1,
    "Category 2" = category_2,
    "Frame Material" = frame_material,
    "Avg Price" = avg_price,
    "Total Qty" = total_quantity,
    "Total Sale" = total_sale
  ) %>% select(1:3,6:4)


4.8.5 Summary by States

Code
   # Total Number of Stores by State
bike_orderlines_wrangled_tbl %>%
  group_by(state) %>%
  summarise(
    num_stores = n_distinct(bikeshop_name),
    total_sale = sum(total_sale, na.rm = TRUE)
  ) %>%
  arrange(desc(total_sale)) %>%
  mutate(
    state = state.name[match(state, state.abb)],
    total_sale = dollar(total_sale)
  ) %>%
  rename(
    "State"       = state,
    "Store Count" = num_stores,
    "Total Sale"  = total_sale
  )


4.8.6 Summary by Stores

Code
   # Totals Sales By Bikeshop
bike_orderlines_wrangled_tbl %>%
  group_by(bikeshop_name) %>%
  summarise(
    avg_price = mean(price, na.rm = TRUE),
    total_qty = sum(quantity, na.rm = TRUE),
    total_sale = sum(total_sale, na.rm = TRUE)
  ) %>%
  arrange(desc(total_sale)) %>%
  mutate(
    avg_price = dollar(avg_price),
    total_sale = dollar(total_sale)) %>% 
    rename("Bike Shop" = bikeshop_name,
         "Avg Price"  = avg_price, 
         "Avg Qty"    = total_qty,
         "Total Sale" = total_sale)




5 Revenue

5.1 Revenue by Year

Using ggplot functions to graph an overview of total revenue by year.

Code
sales_by_year_tbl <- bike_orderlines_wrangled_tbl %>% 
  select(order_date, total_sale) %>%
  mutate(year = year(order_date)) %>%
  group_by(year) %>%
  summarize(annual_sale = sum(total_sale), .groups = "drop") %>%
  mutate(annual_sales_dtxt = scales::dollar(annual_sale)) 

sales_by_year_tbl %>% 
  select(year, annual_sales_dtxt) %>%
  rename("Year" = year) %>%
  rename("Total Annual Revenue" = annual_sales_dtxt) %>%
  head(5)
Code
sales_by_year_tbl %>%
  ggplot(aes(x = year, y = annual_sale)) +
  geom_col(fill = "#008cba") +
  geom_label(aes(label = annual_sales_dtxt), size = 3) +
  theme_tq() +
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Revenue by Year", subtitle = "Upward Trend", x = "", y = "Revenue")


5.2 Revenue by Year and Category

Breaking down total sales by year and bike category.

Code
sales_by_year_cat_tbl <- bike_orderlines_wrangled_tbl %>% 
  select(order_date, total_sale, category_2) %>%
  mutate(year = year(order_date)) %>%
  group_by(year, category_2) %>%
  summarize(cat_annual_sales = sum(total_sale), .groups = "drop") %>%
  mutate(cat_annual_dtxt = scales::dollar(cat_annual_sales))

sales_by_year_cat_tbl %>%
  select(-cat_annual_sales) %>% 
  rename("Year" = year) %>%
  rename("Category" = category_2) %>%
  rename("Total Revenue" = cat_annual_dtxt)


5.3 Average Revenue by Category

Total revenue and average sales based on bike category.

Code
category_summary_tbl <- sales_by_year_cat_tbl %>%
  group_by(category_2) %>%
  summarize(
    total_sales = sum(cat_annual_sales, na.rm = TRUE),
    avg_sales   = mean(cat_annual_sales, na.rm = TRUE),
    years       = n_distinct(year)
  ) %>%
  arrange(desc(total_sales)) %>%
  mutate(
    total_sales_dtxt = scales::dollar(total_sales),
    avg_sales_dtxt   = scales::dollar(avg_sales)
  )

category_summary_tbl %>%
  select(years, category_2, -total_sales, -avg_sales, avg_sales_dtxt, total_sales_dtxt) %>% 
  rename("N Years" = years) %>%
  rename("Category" = category_2) %>%
  rename("Average Revenue" = avg_sales_dtxt) %>%
  rename("Total Revenue" = total_sales_dtxt)


5.4 Graphing Average Sales by Year

Graphing average sales using ggplot functions.

Code
category_summary_tbl %>%
  ggplot(aes(
    x = avg_sales,
    y = reorder(category_2, avg_sales)
  )) +
  geom_col(fill = "#008cba") +
  geom_label(aes(label = avg_sales_dtxt),
             hjust = -0.1,
             size = 3.5) +
  scale_x_continuous(
    labels = scales::dollar,
    limits = c(0, 5000000),   
    breaks = seq(0, 5000000, by = 1000000)
  ) +
  labs(
    title = "Average Annual Revenue by Category",
    subtitle = "Ranked highest to lowest",
    x = "Average Annual Revenue",
    y = "Category"
  ) +
  theme_tq() +
  theme(
    plot.title = element_text(face = "bold"),
    plot.subtitle = element_text(color = "gray30")
  ) +
  coord_cartesian(clip = "off")


5.5 Side-by-Side Comparison

Top two highest-performing bike categories based on total annual sales.

Code
# Identify top 2 categories by total sales
top2 <- sales_by_year_cat_tbl %>%
  group_by(category_2) %>%
  summarize(total_sales = sum(cat_annual_sales, na.rm = TRUE), .groups = "drop") %>%
  slice_max(total_sales, n = 2) %>%
  pull(category_2)

# Plot only top 4 categories
sales_by_year_cat_tbl %>%
  filter(category_2 %in% top2) %>% 
  ggplot(aes(x = year, y = cat_annual_sales, fill = category_2)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ category_2, ncol = 2, scales = "fixed") +
  theme_tq() +
  scale_fill_tq(drop = TRUE) +
  scale_y_continuous(
    labels = label_dollar(scale_cut = cut_short_scale()) 
  ) +
  labs(
    title = "Side-by-Side Comparison: Revenue by Year",
    subtitle = " Cross Country Race vs Elite",
    x = "Year",
    y = "Revenue"
  ) +
  theme(legend.position = "none")


5.6 Graphing Multiple Categories

Top 4 high-performing bike categories based on total annual sales within multiple charts in an all-in-one graph.

Code
# Identify top 4 categories by total sales
top4 <- sales_by_year_cat_tbl %>%
  group_by(category_2) %>%
  summarise(total_sales = sum(cat_annual_sales, na.rm = TRUE), .groups = "drop") %>%
  slice_max(total_sales, n = 4) %>%
  pull(category_2)

# Plot top 4 categories
sales_by_year_cat_tbl %>%
  filter(category_2 %in% top4) %>%
  ggplot(aes(x = year, y = cat_annual_sales, fill = category_2)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ category_2, ncol = 2, scales = "fixed") +
  theme_tq() +
  scale_fill_tq(drop = TRUE) +
  scale_y_continuous(
    labels = scales::label_dollar(scale_cut = scales::cut_short_scale()) 
  ) +
  labs(
    title = "Top 4 Categories: Revenue by Year",
    subtitle = "Highest-performing categories based on total sales",
    x = "Year",
    y = "Revenue"
  ) +
  theme(legend.position = "none")


5.7 Grouped Categories by Year

Bottom 4 Lowest-performing categories based on total annual sales in an all-in-one graph.

Code
# Identify bottom 4 categories by total sales
bottom4 <- sales_by_year_cat_tbl %>%
  group_by(category_2) %>%
  summarise(total_sales = sum(cat_annual_sales, na.rm = TRUE), .groups = "drop") %>%
  slice_min(total_sales, n = 4) %>%
  pull(category_2)

# Plot all bottom 4 categories grouped by year
sales_by_year_cat_tbl %>%
  filter(category_2 %in% bottom4) %>%
  ggplot(aes(x = factor(year), y = cat_annual_sales, fill = category_2)) +
  geom_col(position = "dodge") +   
  geom_text(
    aes(label = scales::dollar(cat_annual_sales, scale_cut = scales::cut_short_scale())),
    position = position_dodge(width = 0.9),
    vjust = -0.3,
    size = 2.5
  ) +
  theme_tq() +
  scale_fill_tq() +
  scale_y_continuous(labels = scales::label_dollar(scale_cut = scales::cut_short_scale())) +
  labs(
  title = "Bottom 4 Categories by Annual Revenue",
  subtitle = "Grouped by year for direct category comparison",
    x = "Year",
    y = "Revenue",
    fill = "Category"
  )


5.8 Facet Wrap Reporting

Reporting performances of all categories in one.

Code
# Calculate total revenue by category (descending)
ordered_cats <- sales_by_year_cat_tbl %>%
  group_by(category_2) %>%
  summarise(total_sales = sum(cat_annual_sales, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_sales)) %>%
  pull(category_2)

# Plot with facets ordered high 
sales_by_year_cat_tbl %>%
  mutate(category_2 = factor(category_2, levels = ordered_cats)) %>%
  ggplot(aes(x = year, y = cat_annual_sales, fill = category_2)) +
  geom_col() +
  facet_wrap(~ category_2, ncol = 3, scales = "free_y") +
  theme_tq() +
  scale_fill_tq() +
  scale_y_continuous(
    labels = scales::label_dollar(scale_cut = scales::cut_short_scale())
  ) +
  labs(
    title = "Revenue by Year and Category",
    subtitle = "Bike categories have their own scaled revenue axis",
    x = "",
    y = "Revenue",
    fill = "Category"
  ) +
  theme(legend.position = "none")




6 Wide Data

Grouping category_1 (mountain and road) bike sales by bike shop and converting to wide format to calculate total revenue.

Code
bike_orderlines_wrangled_tbl %>% glimpse()
Rows: 15,644
Columns: 13
$ order_date     <dttm> 2011-01-07, 2011-01-07, 2011-01-10, 2011-01-10, 2011-0…
$ order_id       <dbl> 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7…
$ order_line     <dbl> 1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 1, 2, 3, 4, 1, 2, 3, 4, 1…
$ quantity       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1…
$ price          <dbl> 6070, 5970, 2770, 5970, 10660, 3200, 12790, 5330, 1570,…
$ total_sale     <dbl> 6070, 5970, 2770, 5970, 10660, 3200, 12790, 5330, 1570,…
$ model          <chr> "Jekyll Carbon 2", "Trigger Carbon 2", "Beast of the Ea…
$ category_1     <chr> "Mountain", "Mountain", "Mountain", "Mountain", "Road",…
$ category_2     <chr> "Over Mountain", "Over Mountain", "Trail", "Over Mounta…
$ frame_material <chr> "Carbon", "Carbon", "Aluminum", "Carbon", "Carbon", "Ca…
$ bikeshop_name  <chr> "Ithaca Mountain Climbers", "Ithaca Mountain Climbers",…
$ city           <chr> "Ithaca", "Ithaca", "Kansas City", "Kansas City", "Loui…
$ state          <chr> "NY", "NY", "KS", "KS", "KY", "KY", "KY", "KY", "KY", "…


6.1 Grouping Sales by Mountain and Road

Creating wide bike shop revenue data.

Code
bikeshop_revenue_wide_tbl <- bike_orderlines_wrangled_tbl %>%   
  filter(category_1 %in% c("Mountain", "Road")) %>%    
  mutate(`Bikeshop Name` = bikeshop_name) %>%
  group_by(`Bikeshop Name`, category_1) %>%
  summarise(Revenue = sum(total_sale, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(
    names_from  = category_1,
    values_from = Revenue,
    values_fill = 0
  ) %>%
  mutate(
    Mountain = coalesce(Mountain, 0),
    Road     = coalesce(Road, 0),
    Total    = Mountain + Road
  ) %>%
  select(`Bikeshop Name`, Mountain, Road, Total) %>%
  arrange(desc(Total))

bikeshop_revenue_wide_tbl %>% glimpse()
Rows: 30
Columns: 4
$ `Bikeshop Name` <chr> "Kansas City 29ers", "Denver Bike Shop", "Ithaca Mount…
$ Mountain        <dbl> 9215950, 6038105, 4660325, 2257290, 868405, 1171280, 6…
$ Road            <dbl> 2319505, 1659565, 1639010, 1911245, 2581635, 1902335, …
$ Total           <dbl> 11535455, 7697670, 6299335, 4168535, 3450040, 3073615,…
Code
bikeshop_revenue_wide_tbl %>% head(30)


6.2 Graphing Sales by Mountain and Road

Code
max_tot <- max(bikeshop_revenue_wide_tbl$Total)

bikeshop_revenue_wide_tbl %>%
  slice_max(Total, n = 15, with_ties = FALSE) %>%
  pivot_longer(c(Mountain, Road), names_to = "Category", values_to = "Revenue") %>%
  mutate(Category = factor(Category, levels = c("Road", "Mountain"))) %>%
  ggplot(aes(y = reorder(`Bikeshop Name`, Total), x = Revenue, fill = Category)) +
  geom_col() +
  geom_label(
    data = \(d) d %>% group_by(`Bikeshop Name`) %>% summarise(Total = sum(Revenue), .groups = "drop"),
    aes(x = Total, y = `Bikeshop Name`,
        label = scales::dollar(Total, scale = 1e-6, suffix = "M")),
    inherit.aes = FALSE,
    hjust = -0.2, 
    size = 2.5,
    fill = "white",
    color = "black",
    label.size = 0.2
  ) +
  scale_fill_manual(values = c("Mountain" = "#008cba", "Road" = "#8bd3e8")) +
  scale_x_continuous(
    labels  = scales::label_dollar(scale = 1e-6, suffix = "M"),
    breaks  = scales::breaks_width(1000000),
    limits  = c(0, max_tot * 1.1)
  ) +
  labs(
    title = "Top 15 Bikeshops",
    subtitle = "Combinding Total Mountain and Road Sales",
    x = "Revenue",
    y = "Bikeshop Name"
  ) +
  theme_tq() +
  coord_cartesian(clip = "off")


6.3 Save Progress

Saving data in Excel, CSV, and RDS format.

Code
bikeshop_revenue_wide_tbl %>%
    write_xlsx("./00_support/data/bike_sales/data_wrangled/bikeshop_revenue_wide_tbl.xlsx")
 
bike_orderlines_wrangled_tbl %>%
    write_csv("./00_support/data/bike_sales/data_wrangled/bikeshop_revenue_wide_tbl.csv")
 
bike_orderlines_wrangled_tbl %>%
    write_rds("./00_support/data/bike_sales/data_wrangled/bikeshop_revenue_wide_tbl.rds")

7 Appendix

7.1 Export Data

Original Data

New Data

7.2 Libraries

  • tidyverse
  • tidyquant
  • lubridate
  • htmltools
  • stringr
  • tibble
  • writexl
  • readr
  • readxl
  • scales
  • DT
  • fs

7.3 Functions

Below is a list of all functions used in this project. Learn more by prefixing the function with ? in R.

  • aes()
  • arrange()
  • bind_cols()
  • breaks_width()
  • cat()
  • coalesce()
  • colnames()
  • contains()
  • coord_cartesian()
  • cut_short_scale()
  • datatable()
  • desc()
  • dollar()
  • element_text()
  • ends_with()
  • facet_wrap()
  • factor()
  • file.exists()
  • file.path()
  • filter()
  • fs::dir_create()
  • geom_col()
  • geom_label()
  • geom_text()
  • ggplot()
  • glimpse()
  • group_by()
  • head()
  • label_dollar()
  • labs()
  • left_join()
  • max()
  • mean()
  • median()
  • min()
  • mutate()
  • n_distinct()
  • params
  • pivot_longer()
  • pivot_wider()
  • position_dodge()
  • pull()
  • read_excel()
  • rename()
  • rename_with()
  • reorder()
  • round()
  • scale_fill_manual()
  • scale_fill_tq()
  • scale_x_continuous()
  • scale_y_continuous()
  • select()
  • separate()
  • slice_max()
  • slice_min()
  • stopifnot()
  • str_detect()
  • str_replace_all()
  • sum()
  • tagList()
  • tags$div()
  • theme()
  • theme_tq()
  • tibble()
  • tolower()
  • unique()
  • write_csv()
  • write_rds()
  • write_xlsx()
  • year()