This Quiz 3 report extends our team’s Mini Project 2 by focusing on
store-level variability and ranking-based
insights that were not explored in detail in the group
analysis.
The goal is to show evidence (one small table/plot per question) and
interpret what it means for managerial decisions.
# ------------------------------------------------------------
# 1.1 Load the Excel file
# ------------------------------------------------------------
# IMPORTANT: Keep WalmartData.xlsx in the same folder as this RMD file.
walmart_raw <- read_excel("WalmartData.xlsx")
# Quick check: column names + types
glimpse(walmart_raw)
## Rows: 6,435
## Columns: 8
## $ Store <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ Date <chr> "05-02-2010", "12-02-2010", "19-02-2010", "26-02-2010", "…
## $ Weekly_Sales <chr> "1643690.9", "1641957.44", "1611968.17", "1409727.59", "1…
## $ Holiday_Flag <chr> "0", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0…
## $ Temperature <chr> "42.31", "38.51", "39.93", "46.63", "46.5", "57.79", "54.…
## $ Fuel_Price <chr> "2.572", "2.548", "2.514", "2.561", "2.625", "2.667", "2.…
## $ CPI <chr> "211.0963582", "211.2421698", "211.2891429", "211.3196429…
## $ Unemployment <chr> "8.106", "8.106", "8.106", "8.106", "8.106", "8.106", "8.…
# ------------------------------------------------------------
# 1.2 Clean and prepare
# ------------------------------------------------------------
walmart <- walmart_raw %>%
mutate(
# Convert Date to proper Date type (needed for time analysis)
Date = as.Date(Date),
# Convert sales to numeric (prevents sum()/lm() errors)
Weekly_Sales = as.numeric(Weekly_Sales),
# Store as factor (categorical identifier)
Store = factor(Store),
# Holiday flag as 0/1 integer
Holiday_Flag = as.integer(Holiday_Flag),
# Year for time grouping
Year = year(Date)
)
# Missing values check (simple summary)
missing_summary <- walmart %>%
summarise(across(everything(), ~ sum(is.na(.))))
missing_summary
Interpretation (Data Prep):
This step ensures variables are in correct formats. In R, ranking,
variability calculations, and regressions require numeric variables
(e.g., Weekly_Sales). Converting dates and creating
Year also supports time-based comparisons.
# ------------------------------------------------------------
# Coefficient of Variation (CV) = sd / mean
# It measures volatility relative to the average sales level.
# ------------------------------------------------------------
store_var <- walmart %>%
group_by(Store) %>%
summarise(
avg_sales = mean(Weekly_Sales, na.rm = TRUE),
sd_sales = sd(Weekly_Sales, na.rm = TRUE),
cv_sales = sd_sales / avg_sales,
.groups = "drop"
)
# Show the top 10 most volatile stores
store_var %>% arrange(desc(cv_sales)) %>% head(10)
Interpretation:
Some stores have much higher relative volatility (CV)
than others. This suggests that sales risk is not evenly distributed
across stores. Managers should monitor high-volatility stores more
closely because planning errors (inventory, staffing) are more costly
there.
# ------------------------------------------------------------
# Rank stores by average sales and by volatility (CV)
# Rank 1 = highest (best/highest)
# ------------------------------------------------------------
rank_df <- store_var %>%
mutate(
sales_rank = rank(-avg_sales),
volatility_rank = rank(-cv_sales)
)
ggplot(rank_df, aes(x = sales_rank, y = volatility_rank)) +
geom_point(alpha = 0.7) +
labs(
title = "Sales Rank vs Volatility Rank (Store-Level)",
x = "Sales Rank (1 = highest average sales)",
y = "Volatility Rank (1 = highest volatility)"
)
Interpretation:
If points were close to a diagonal line, sales and volatility would move
together. Instead, the pattern is scattered, meaning high sales
does not automatically mean high volatility. This implies that
revenue drivers and risk drivers may be different stores.
# ------------------------------------------------------------
# Compute volatility (CV) for each store in each year
# ------------------------------------------------------------
vol_by_year <- walmart %>%
group_by(Store, Year) %>%
summarise(
cv_sales = sd(Weekly_Sales, na.rm = TRUE) / mean(Weekly_Sales, na.rm = TRUE),
.groups = "drop"
)
ggplot(vol_by_year, aes(x = Year, y = cv_sales, group = Store)) +
geom_line(alpha = 0.2) +
labs(
title = "Store Sales Volatility Over Time (CV by Year)",
x = "Year",
y = "Coefficient of Variation (CV)"
)
Interpretation:
Some stores remain volatile across years, while others change volatility
patterns. This suggests volatility is partly structural (store
characteristics) but can also depend on time and external
conditions.
Managerial interpretation:
Managers who look only at average sales may underestimate operational
risk. A practical approach is to separate stores into:
(1) high-sales stores, (2) high-volatility stores, and (3) stable
baseline stores — and plan inventory and staffing differently for each
group.
This Quiz 3 report adds value beyond Mini Project 2 by highlighting that volatility is a distinct management problem from revenue generation. Ranking-based visuals help identify which stores require risk-focused planning.