Final Homework - Financial Database & Management Applications

1 Overview

This report demonstrates how a financial transaction database can be transformed into actionable managerial insights using R. Using provided relational datasets, the analysis focuses on:

Time-based financial reporting (monthly/quarterly trends)
Operational KPIs (orders, units, average order value)
Store and geographic performance (bikeshops, states/cities)
Sales concentration (Pareto effect) and planning implications

2 Data Source

This analysis uses multiple related datasets provided by the instructor, representing a financial transaction database for bicycle sales:

orderlines:base transactional data (line-level transactions)
bikes: product reference data (product attributes and categories)
bikes: product reference data (product attributes and categories)
bike_orderlines: enriched transactional dataset integrating transactions with product and store information

The bike_orderlines dataset is used as the primary analytical table because it supports consolidated reporting and managerial dashboards.

# Import Library

library(tidyverse)
library(lubridate)
library(scales)
library(readxl)

# Load tables (demonstrating relational structure)

orderlines <- read_excel("orderlines.xlsx")
bikes <- read_excel("bikes.xlsx")
bikeshops <- read_excel("bikeshops.xlsx")

# Main analysis table

sales <- read_csv("bike_orderlines.csv", show_col_types = FALSE)

# Robust parsing + cleaning

sales <- sales %>%
mutate(
order_date = parse_date_time(order_date, orders = c("Ymd HMS", "Ymd HM", "Ymd")),
total_price = readr::parse_number(as.character(total_price))
) %>%
filter(!is.na(order_date), !is.na(total_price)) %>%
mutate(
year = year(order_date),
month = floor_date(order_date, "month"),
quarter_label = paste0(year(order_date), "-Q", quarter(order_date))
)

glimpse(sales)

## Rows: 15,644
## Columns: 16
## $ order_date     <dttm> 2011-01-07, 2011-01-07, 2011-01-10, 2011-01-10, 2011-0…
## $ order_id       <dbl> 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7…
## $ order_line     <dbl> 1, 2, 1, 2, 1, 2, 3, 4, 5, 1, 1, 2, 3, 4, 1, 2, 3, 4, 1…
## $ quantity       <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1…
## $ price          <dbl> 6070, 5970, 2770, 5970, 10660, 3200, 12790, 5330, 1570,…
## $ total_price    <dbl> 6070, 5970, 2770, 5970, 10660, 3200, 12790, 5330, 1570,…
## $ model          <chr> "Jekyll Carbon 2", "Trigger Carbon 2", "Beast of the Ea…
## $ category_1     <chr> "Mountain", "Mountain", "Mountain", "Mountain", "Road",…
## $ category_2     <chr> "Over Mountain", "Over Mountain", "Trail", "Over Mounta…
## $ frame_material <chr> "Carbon", "Carbon", "Aluminum", "Carbon", "Carbon", "Ca…
## $ bikeshop_name  <chr> "Ithaca Mountain Climbers", "Ithaca Mountain Climbers",…
## $ city           <chr> "Ithaca", "Ithaca", "Kansas City", "Kansas City", "Loui…
## $ state          <chr> "NY", "NY", "KS", "KS", "KY", "KY", "KY", "KY", "KY", "…
## $ year           <dbl> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2…
## $ month          <dttm> 2011-01-01, 2011-01-01, 2011-01-01, 2011-01-01, 2011-0…
## $ quarter_label  <chr> "2011-Q1", "2011-Q1", "2011-Q1", "2011-Q1", "2011-Q1", …

3 Data Management & Aggregation

Transaction-level data is aggregated into different reporting horizons:

Monthly: tactical budgeting, forecasting, seasonality tracking
Quarterly: strategic executive reporting
Store/Location: operational and market-level performance evaluation

Then i also compute operational KPIs commonly used in retail management:

Total revenue (Sales)
Number of unique orders (Orders)
Total units sold (Units)
Average order value (AOV = Sales / Orders)
Average items per order (Units / Orders)

make_monthly_kpis <- function(df) {
df %>%
group_by(month) %>%
summarise(
sales = sum(total_price, na.rm = TRUE),
orders = n_distinct(order_id),
units = sum(quantity, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
aov = sales / orders,
items_per_order = units / orders
)
}

make_quarterly_sales <- function(df) {
df %>%
group_by(quarter_label) %>%
summarise(
sales = sum(total_price, na.rm = TRUE),
orders = n_distinct(order_id),
.groups = "drop"
)
}

monthly_kpis <- make_monthly_kpis(sales)
quarterly_sales <- make_quarterly_sales(sales)

4 KPI Snapshots (Overall)

kpi_overall <- sales %>%
summarise(
total_sales = sum(total_price, na.rm = TRUE),
total_orders = n_distinct(order_id),
total_units = sum(quantity, na.rm = TRUE),
aov = total_sales / total_orders,
items_per_order = total_units / total_orders
)

kpi_overall

5 Total Sales Trends

5.1 Monthly Sales Trend with 3-Month Moving Average

monthly_plot <- monthly_kpis %>%
arrange(month) %>%
mutate(
sales_ma3 = zoo::rollmean(sales, k = 3, fill = NA, align = "right")
)

ggplot(monthly_plot, aes(x = month, y = sales)) +
geom_line(linewidth = 1.0, alpha = 0.9, color = "#2c3e50") +
geom_line(aes(y = sales_ma3), linewidth = 1.0, alpha = 0.9, color = "#e74c3c") +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Monthly Total Sales with 3-Month Moving Average",
subtitle = "Moving average highlights the underlying trend and reduces short-term noise",
x = "Month",
y = "Sales (USD)"
) +
theme_minimal(base_size = 12)

5.2 Quarterly Sales (Executive View)

ggplot(quarterly_sales, aes(x = quarter_label, y = sales, group = 1)) +
geom_line(linewidth = 1.0, alpha = 0.9, color = "#2c3e50") +
geom_point(size = 2.0, alpha = 0.9, color = "#9b59b6") +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Quarterly Sales (Executive Reporting View)",
subtitle = "Quarterly aggregation supports strategic performance evaluation",
x = "Quarter",
y = "Sales (USD)"
) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

5.3 Year-over-Year (YoY) Growth (Monthly)

yoy <- monthly_kpis %>%
mutate(
month_num = month(month),
year_num  = year(month)
) %>%
group_by(month_num) %>%
arrange(year_num, .by_group = TRUE) %>%
mutate(
sales_prev_year = lag(sales, 1),
yoy_growth = (sales - sales_prev_year) / sales_prev_year
) %>%
ungroup()

ggplot(yoy, aes(x = month, y = yoy_growth)) +
geom_hline(yintercept = 0, linewidth = 0.6, alpha = 0.6) +
geom_line(linewidth = 0.9, alpha = 0.9, color = "#3498db") +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Monthly Year-over-Year (YoY) Sales Growth",
subtitle = "YoY growth helps separate seasonality from structural growth",
x = "Month",
y = "YoY Growth"
) +
theme_minimal(base_size = 12)

6 Seasonality Analysis

A seasonality heatmap provides a compact view of recurring monthly patterns and helps managers align forecasting and promotional planning.

season <- monthly_kpis %>%
mutate(
year = year(month),
mon  = month(month, label = TRUE, abbr = TRUE)
)

ggplot(season, aes(x = mon, y = factor(year), fill = sales)) +
geom_tile(alpha = 0.95) +
scale_fill_gradient(labels = scales::dollar) +
labs(
title = "Seasonality Heatmap (Monthly Sales)",
subtitle = "Highlights recurring peak months across years",
x = "Month",
y = "Year",
fill = "Sales"
) +
theme_minimal(base_size = 12)

7 Store Performance Analysis

This section evaluates store-level contributions and identifies both revenue drivers and operational improvement opportunities.

7.1 Top 10 Bikeshops by Total Sales

store_perf <- sales %>%
group_by(bikeshop_name) %>%
summarise(
sales = sum(total_price, na.rm = TRUE),
orders = n_distinct(order_id),
units = sum(quantity, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
aov = sales / orders
) %>%
arrange(desc(sales))

top10_stores <- store_perf %>% slice_head(n = 10)

ggplot(top10_stores, aes(x = reorder(bikeshop_name, sales), y = sales, fill = bikeshop_name)) +
geom_col(alpha = 0.9, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Top 10 Bikeshops by Total Sales",
subtitle = "Identifies key revenue-driving stores",
x = "Bikeshop",
y = "Sales (USD)"
) +
theme_minimal(base_size = 12)

7.2 Sales Concentration (Pareto-Style)

pareto <- store_perf %>%
arrange(desc(sales)) %>%
mutate(
cum_sales = cumsum(sales),
total_sales = sum(sales),
cum_share = cum_sales / total_sales,
rank = row_number()
)

ggplot(pareto, aes(x = rank, y = cum_share)) +
geom_line(linewidth = 1.0, alpha = 0.9, color = "#e74c3c") +
scale_y_continuous(labels = percent_format(accuracy = 1), limits = c(0,1)) +
labs(
title = "Sales Concentration Curve (Cumulative Share by Store Rank)",
subtitle = "Shows whether revenue is concentrated in a small number of stores",
x = "Store Rank (highest sales to lowest)",
y = "Cumulative Sales Share"
) +
theme_minimal(base_size = 12)

7.3 Top 10 Stores by Average Order Value (AOV)

top10_aov <- store_perf %>%
filter(orders >= 50) %>%        # avoid unstable AOV for very small-order stores
arrange(desc(aov)) %>%
slice_head(n = 10)

ggplot(top10_aov, aes(x = reorder(bikeshop_name, aov), y = aov, fill = bikeshop_name)) +
geom_col(alpha = 0.9, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Top 10 Bikeshops by Average Order Value (AOV)",
subtitle = "Highlights stores generating higher-value transactions (filtered: orders ≥ 50)",
x = "Bikeshop",
y = "AOV (USD)"
) +
theme_minimal(base_size = 12)

7.4 Geographic Performance Analysis

This section evaluates sales distribution by location to support market development and regional planning.

state_perf <- sales %>%
group_by(state) %>%
summarise(
sales = sum(total_price, na.rm = TRUE),
orders = n_distinct(order_id),
.groups = "drop"
) %>%
mutate(aov = sales / orders) %>%
arrange(desc(sales))

top10_states <- state_perf %>% slice_head(n = 10)

ggplot(top10_states, aes(x = reorder(state, sales), y = sales, fill = state)) +
geom_col(alpha = 0.9, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Top 10 States by Total Sales",
subtitle = "Identifies key geographic revenue markets",
x = "State",
y = "Sales (USD)"
) +
theme_minimal(base_size = 12)

8 Key Managerial Insights and Recommendations

8.1 Key Managerial Insights and Recommendations

Trend and stability:

Monthly and quarterly aggregates provide more stable signals than weekly figures; moving averages clarify the underlying direction of performance.

Seasonality:

Heatmap patterns suggest recurring peak months, supporting improved forecasting, staffing, and promotion planning.

Store concentration:

The concentration curve indicates whether a small set of stores drives a large share of revenue, which has implications for operational risk and resource prioritization.

Store value quality:

Stores with high AOV may represent stronger upsell performance or higher-end customer profiles; these stores may require tailored assortment and premium service strategies.

Geographic markets:

Certain states contribute disproportionately to total sales, indicating high-potential markets for targeted investment.

8.2 Managerial Recommendations

Use KPI-based monitoring:

Track Sales, Orders, Units, AOV, and Items/Order monthly to support budgeting and performance management.

Plan around seasonality:

Align inventory, staffing, and promotions with recurring peak months identified in the heatmap.

Protect key revenue drivers:

Prioritize service levels and inventory availability at top stores; develop contingency plans if revenue is highly concentrated.

Scale best practices:

Replicate practices from high-AOV/high-performance stores across mid-tier stores to improve overall efficiency.

Regional strategy:

Invest in high-performing states with targeted marketing and store support; review underperforming regions for operational or market constraints.

Final Homework - Financial Database & Management Applications

Davin Imanullah
Student ID: 112031136

2025-12-23

1 Overview

2 Data Source

3 Data Management & Aggregation

4 KPI Snapshots (Overall)

5 Total Sales Trends

5.1 Monthly Sales Trend with 3-Month Moving Average

5.2 Quarterly Sales (Executive View)

5.3 Year-over-Year (YoY) Growth (Monthly)

6 Seasonality Analysis

7 Store Performance Analysis

7.1 Top 10 Bikeshops by Total Sales

7.2 Sales Concentration (Pareto-Style)

7.3 Top 10 Stores by Average Order Value (AOV)

7.4 Geographic Performance Analysis

8 Key Managerial Insights and Recommendations

8.1 Key Managerial Insights and Recommendations

8.2 Managerial Recommendations

Final Homework - Financial Database & Management Applications

Davin Imanullah Student ID: 112031136

2025-12-23

1 Overview

2 Data Source

3 Data Management & Aggregation

4 KPI Snapshots (Overall)

5 Total Sales Trends

5.1 Monthly Sales Trend with 3-Month Moving Average

5.2 Quarterly Sales (Executive View)

5.3 Year-over-Year (YoY) Growth (Monthly)

6 Seasonality Analysis

7 Store Performance Analysis

7.1 Top 10 Bikeshops by Total Sales

7.2 Sales Concentration (Pareto-Style)

7.3 Top 10 Stores by Average Order Value (AOV)

7.4 Geographic Performance Analysis

8 Key Managerial Insights and Recommendations

8.1 Key Managerial Insights and Recommendations

8.2 Managerial Recommendations

Davin Imanullah
Student ID: 112031136