Muhammad Gilang Putra Ariyanto
Student ID: 114035130
This report analyzes bicycle sales performance using an instructor-provided financial transaction database. The focus is store- and location-based performance, showing how transactional data can be aggregated and transformed into managerial insights for retail operations.
Key Objectives:
This analysis uses multiple related datasets provided by the instructor, representing a financial transaction database for bicycle sales.
Datasets included :
orderlines: base transactional records (order-level
line items)bikes: product reference (model and product
attributes)bikeshops: store reference (bikeshop and geographic
information)bike_orderlines: enriched transactional dataset
resulting from integration of transactional and reference tablesThe bike_orderlines dataset is used as the primary
analysis table because it is a consolidated dataset suitable for
time-based and store/location-based reporting.
# Import Library
library(tidyverse)
library(lubridate)
library(scales)
library(readxl)
# Load base/dimension tables (for relational database demonstration)
orderlines <- read_excel("orderlines.xlsx")
bikes <- read_excel("bikes.xlsx")
bikeshops <- read_excel("bikeshops.xlsx")
# Load enriched transactional dataset (main analysis table)
sales <- read_csv("bike_orderlines.csv", show_col_types = FALSE)
# Robust parsing and cleaning to prevent non-finite values in plots
sales <- sales %>%
mutate(
order_date = parse_date_time(order_date, orders = c("Ymd HMS", "Ymd HM", "Ymd")),
total_price = readr::parse_number(as.character(total_price))
) %>%
filter(!is.na(order_date), !is.na(total_price)) %>%
mutate(
week = floor_date(order_date, "week", week_start = 1),
month = floor_date(order_date, "month"),
quarter_label = paste0(year(order_date), "-Q", quarter(order_date))
)
glimpse(sales)
## Rows: 15,644
## Columns: 16
## $ order_date <dttm> 2011-01-07, 2011-01-07, 2011-01-10, 2011-01-10, 2011-0…
## $ order_id <dbl> 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7…
## $ order_line <dbl> 1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 1, 2, 3, 4, 1, 2, 3, 4, 1…
## $ quantity <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1…
## $ price <dbl> 6070, 5970, 2770, 5970, 10660, 3200, 12790, 5330, 1570,…
## $ total_price <dbl> 6070, 5970, 2770, 5970, 10660, 3200, 12790, 5330, 1570,…
## $ model <chr> "Jekyll Carbon 2", "Trigger Carbon 2", "Beast of the Ea…
## $ category_1 <chr> "Mountain", "Mountain", "Mountain", "Mountain", "Road",…
## $ category_2 <chr> "Over Mountain", "Over Mountain", "Trail", "Over Mounta…
## $ frame_material <chr> "Carbon", "Carbon", "Aluminum", "Carbon", "Carbon", "Ca…
## $ bikeshop_name <chr> "Ithaca Mountain Climbers", "Ithaca Mountain Climbers",…
## $ city <chr> "Ithaca", "Ithaca", "Kansas City", "Kansas City", "Loui…
## $ state <chr> "NY", "NY", "KS", "KS", "KY", "KY", "KY", "KY", "KY", "…
## $ week <dttm> 2011-01-03, 2011-01-03, 2011-01-10, 2011-01-10, 2011-0…
## $ month <dttm> 2011-01-01, 2011-01-01, 2011-01-01, 2011-01-01, 2011-0…
## $ quarter_label <chr> "2011-Q1", "2011-Q1", "2011-Q1", "2011-Q1", "2011-Q1", …
In the financial reporting, transaction-level data is typically aggregated into different time horizons:
Additionally, this report aggregates sales by:
make_total_ts <- function(df, period_col, period_name) {
df %>%
group_by({{ period_col }}) %>%
summarise(total_sales = sum(total_price, na.rm = TRUE), .groups = "drop") %>%
filter(is.finite(total_sales)) %>%
mutate(period_type = period_name)
}
weekly_total <- make_total_ts(sales, week, "Weekly")
ggplot(weekly_total, aes(x = week, y = total_sales)) +
geom_line(linewidth = 0.95, alpha = 0.9, color = "#2c3e50") +
geom_smooth(se = FALSE, linewidth = 0.9, color = "#e74c3c") +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Total Sales (Weekly)",
subtitle = "Weekly view highlights short-term fluctuations and operational variability",
x = "Week",
y = "Total Sales (USD)"
) +
theme_minimal(base_size = 12)
monthly_total <- make_total_ts(sales, month, "Monthly")
ggplot(monthly_total, aes(x = month, y = total_sales)) +
geom_line(linewidth = 1.05, alpha = 0.9, color = "#2c3e50") +
geom_point(size = 1.3, alpha = 0.85, color = "#3498db") +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Total Sales (Monthly)",
subtitle = "Monthly aggregation supports tactical planning and budgeting checks",
x = "Month",
y = "Total Sales (USD)"
) +
theme_minimal(base_size = 12)
quarterly_total <- make_total_ts(sales, quarter_label, "Quarterly")
ggplot(quarterly_total, aes(x = quarter_label, y = total_sales, group = 1)) +
geom_line(linewidth = 1.05, alpha = 0.9, color = "#2c3e50") +
geom_point(size = 2.0, alpha = 0.9, color = "#9b59b6") +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Total Sales (Quarterly)",
subtitle = "Quarterly performance is commonly used for strategic reporting and evaluation",
x = "Quarter",
y = "Total Sales (USD)"
) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
This section evaluates revenue contribution per bikeshop, identifying top-performing and underperforming stores.
shop_sales <- sales %>%
group_by(bikeshop_name) %>%
summarise(total_sales = sum(total_price, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_sales))
top10_shops <- shop_sales %>% slice_head(n = 10)
ggplot(top10_shops, aes(x = reorder(bikeshop_name, total_sales), y = total_sales, fill = bikeshop_name)) +
geom_col(alpha = 0.9, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Top 10 Bikeshops by Total Sales",
subtitle = "Store-level performance comparison (highest to lowest)",
x = "Bikeshop",
y = "Total Sales (USD)"
) +
theme_minimal(base_size = 12)
total_all <- sum(shop_sales$total_sales, na.rm = TRUE)
top5_total <- sum(shop_sales$total_sales[1:5], na.rm = TRUE)
top10_total <- sum(shop_sales$total_sales[1:10], na.rm = TRUE)
concentration <- tibble(
group = c("Top 5 Stores", "Top 10 Stores", "All Stores"),
total_sales = c(top5_total, top10_total, total_all),
share = c(top5_total/total_all, top10_total/total_all, 1)
)
concentration
Interpretation If the top stores represent a large share of total sales, management should treat them as critical revenue drivers and ensure operational excellence and sufficient inventory allocation.
This section analyzes sales distribution across locations to support decisions such as market expansion, regional marketing, and store network optimization.
state_sales <- sales %>%
group_by(state) %>%
summarise(total_sales = sum(total_price, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_sales))
top10_states <- state_sales %>% slice_head(n = 10)
ggplot(top10_states, aes(x = reorder(state, total_sales), y = total_sales, fill = state)) +
geom_col(alpha = 0.9, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Top 10 States by Total Sales",
subtitle = "Geographic concentration of revenue across states",
x = "State",
y = "Total Sales (USD)"
) +
theme_minimal(base_size = 12)
city_sales <- sales %>%
group_by(city) %>%
summarise(total_sales = sum(total_price, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_sales))
top10_cities <- city_sales %>% slice_head(n = 10)
ggplot(top10_cities, aes(x = reorder(city, total_sales), y = total_sales, fill = city)) +
geom_col(alpha = 0.9, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Top 10 Cities by Total Sales",
subtitle = "City-level revenue comparison (highest to lowest)",
x = "City",
y = "Total Sales (USD)"
) +
theme_minimal(base_size = 12)
Weekly sales show higher short-term variability, while monthly and quarterly trends provide more stable signals for tactical and strategic decision-making.
Total sales differ substantially across bikeshops, indicating unequal revenue contribution and varying store productivity.
Total sales differ substantially across bikeshops, indicating unequal revenue contribution and varying store productivity.
Certain states and cities contribute disproportionately to total sales, suggesting strong regional markets and potential areas for expansion or increased marketing support.
Variability in sales across time and location suggests the need for flexible inventory, staffing, and promotional planning.
Allocate inventory, staffing, and marketing resources to high-performing stores to protect and grow core revenue streams.
Investigate operational issues (assortment, staffing, local market conditions) and implement targeted interventions.
Expand promotional efforts and market development in top-performing regions; consider strategic adjustments in low-performing regions.
Use monthly/quarterly trends to guide budgeting and forecasting, while using weekly monitoring to respond to short-term changes.
If revenue is highly concentrated in a few stores/regions, develop risk-mitigation plans such as diversifying markets and strengthening mid-tier stores.