0) Positioning: How this extends Mini Project 2

This Quiz 3 report extends our team’s Mini Project 2 by focusing on store-level variability and ranking-based insights that were not explored in detail in the group analysis.
The goal is to present one small piece of evidence (a table or plot) per question and interpret what it means for managerial decisions.


1) Data Understanding & Preparation

# ------------------------------------------------------------
# 1.1 Load packages and the Excel file
# ------------------------------------------------------------
library(readxl)     # read_excel()
library(dplyr)      # glimpse(), mutate(), group_by(), summarise()
library(lubridate)  # dmy(), year()
library(ggplot2)    # plots in Questions 2 and 3

# IMPORTANT: Keep WalmartData.xlsx in the same folder as this RMD file.
walmart_raw <- read_excel("WalmartData.xlsx")

# Quick check: column names + types
glimpse(walmart_raw)
## Rows: 6,435
## Columns: 8
## $ Store        <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ Date         <chr> "05-02-2010", "12-02-2010", "19-02-2010", "26-02-2010", "…
## $ Weekly_Sales <chr> "1643690.9", "1641957.44", "1611968.17", "1409727.59", "1…
## $ Holiday_Flag <chr> "0", "1", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0…
## $ Temperature  <chr> "42.31", "38.51", "39.93", "46.63", "46.5", "57.79", "54.…
## $ Fuel_Price   <chr> "2.572", "2.548", "2.514", "2.561", "2.625", "2.667", "2.…
## $ CPI          <chr> "211.0963582", "211.2421698", "211.2891429", "211.3196429…
## $ Unemployment <chr> "8.106", "8.106", "8.106", "8.106", "8.106", "8.106", "8.…
# ------------------------------------------------------------
# 1.2 Clean and prepare
# ------------------------------------------------------------
walmart <- walmart_raw %>%
  mutate(
    # Dates arrive as dd-mm-yyyy strings (see glimpse above), so parse
    # the format explicitly; plain as.Date() would return NA here
    Date = dmy(Date),

    # Convert sales to numeric (prevents sum()/lm() errors)
    Weekly_Sales = as.numeric(Weekly_Sales),

    # Store as factor (categorical identifier)
    Store = factor(Store),

    # Holiday flag as 0/1 integer
    Holiday_Flag = as.integer(Holiday_Flag),

    # Year for time grouping
    Year = year(Date)
  )

# Missing values check (simple summary)
missing_summary <- walmart %>%
  summarise(across(everything(), ~ sum(is.na(.))))
missing_summary

Interpretation (Data Prep):
All eight columns load as character, so this step converts each variable to its correct type. Ranking, variability calculations, and regressions in R require numeric inputs (e.g., Weekly_Sales), and parsing Date plus deriving Year supports the time-based comparisons in Question 3.
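A quick sanity check (a minimal sketch using the columns created above) confirms that every date parsed and shows how the rows split across years:

# ------------------------------------------------------------
# Sanity check: did every date parse, and what years are covered?
# ------------------------------------------------------------
sum(is.na(walmart$Date))   # should be 0 if dmy() handled every row
range(walmart$Date)        # earliest and latest week in the data
table(walmart$Year)        # rows per year, used in Question 3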


2) Question 1 (Descriptive): Are some stores consistently more variable than others?

Evidence: store-level volatility (Coefficient of Variation)

# ------------------------------------------------------------
# Coefficient of Variation (CV) = sd / mean
# It measures volatility relative to average sales, so stores of
# different sizes can be compared fairly.
# ------------------------------------------------------------
store_var <- walmart %>%
  group_by(Store) %>%
  summarise(
    avg_sales = mean(Weekly_Sales, na.rm = TRUE),
    sd_sales  = sd(Weekly_Sales, na.rm = TRUE),
    cv_sales  = sd_sales / avg_sales,
    .groups = "drop"
  )

# Show the top 10 most volatile stores
store_var %>% arrange(desc(cv_sales)) %>% head(10)

Interpretation:
Some stores have much higher relative volatility (CV) than others. This suggests that sales risk is not evenly distributed across stores. Managers should monitor high-volatility stores more closely because planning errors (inventory, staffing) are more costly there.
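One way to operationalize that monitoring (a sketch; the top-quartile cutoff is an illustrative choice, not a threshold from the analysis) is to build a watch list of the most volatile stores:

# ------------------------------------------------------------
# Flag high-volatility stores (illustrative cutoff: top quartile)
# ------------------------------------------------------------
cv_cutoff <- quantile(store_var$cv_sales, 0.75)

watch_list <- store_var %>%
  filter(cv_sales >= cv_cutoff) %>%
  arrange(desc(cv_sales))

watch_list   # stores to monitor more closely for inventory/staffing risk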


3) Question 2 (Ranking): Do top-selling stores also rank high in volatility?

Evidence: ranking comparison plot

# ------------------------------------------------------------
# Rank stores by average sales and by volatility (CV)
# Rank 1 = highest value (negating puts the largest values first)
# ------------------------------------------------------------
rank_df <- store_var %>%
  mutate(
    sales_rank      = rank(-avg_sales),
    volatility_rank = rank(-cv_sales)
  )

ggplot(rank_df, aes(x = sales_rank, y = volatility_rank)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Sales Rank vs Volatility Rank (Store-Level)",
    x = "Sales Rank (1 = highest average sales)",
    y = "Volatility Rank (1 = highest volatility)"
  )

Interpretation:
If the points fell close to a diagonal line, sales rank and volatility rank would move together. Instead the pattern is scattered: a high sales rank does not automatically mean a high volatility rank. The stores that drive revenue and the stores that carry the most planning risk are not necessarily the same, and the rank correlation below puts a number on this.
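To quantify how scattered the ranks are (a minimal sketch using base R's cor.test()), a Spearman correlation near 0 supports the reading that the two rankings are unrelated, while a value near 1 would mean the top sellers are also the most volatile:

# ------------------------------------------------------------
# Spearman rank correlation between the two rankings
# ------------------------------------------------------------
cor.test(rank_df$sales_rank, rank_df$volatility_rank,
         method = "spearman", exact = FALSE)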


4) Question 3 (Time Stability): Is store volatility stable over time?

Evidence: volatility by store-year

# ------------------------------------------------------------
# Compute volatility (CV) for each store in each year
# ------------------------------------------------------------
vol_by_year <- walmart %>%
  group_by(Store, Year) %>%
  summarise(
    cv_sales = sd(Weekly_Sales, na.rm = TRUE) / mean(Weekly_Sales, na.rm = TRUE),
    .groups = "drop"
  )

ggplot(vol_by_year, aes(x = Year, y = cv_sales, group = Store)) +
  geom_line(alpha = 0.2) +
  labs(
    title = "Store Sales Volatility Over Time (CV by Year)",
    x = "Year",
    y = "Coefficient of Variation (CV)"
  )

Interpretation:
Some stores stay volatile year after year, while others shift from one year to the next. This suggests volatility is partly structural (tied to store characteristics) but also depends on time and external conditions; the year-over-year rank check below is one way to test this.
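To check how persistent volatility is (a sketch; it assumes the data spans 2010–2012, which table(walmart$Year) above would confirm, and uses tidyr::pivot_wider() to reshape), correlate each store's CV across consecutive years. A high correlation means the same stores stay volatile:

# ------------------------------------------------------------
# Persistence check: do the same stores stay volatile each year?
# ------------------------------------------------------------
library(tidyr)  # pivot_wider()

cv_wide <- vol_by_year %>%
  pivot_wider(names_from = Year, values_from = cv_sales,
              names_prefix = "cv_")

# Spearman correlation of CVs across consecutive years
# (adjust the year columns if the file covers a different range)
cor(cv_wide$cv_2010, cv_wide$cv_2011,
    method = "spearman", use = "complete.obs")
cor(cv_wide$cv_2011, cv_wide$cv_2012,
    method = "spearman", use = "complete.obs")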


5) Managerial Implications (Short)

Managers who look only at average sales may underestimate operational risk. A practical approach is to separate stores into three groups: (1) high-sales stores, (2) high-volatility stores, and (3) stable baseline stores, and then plan inventory and staffing differently for each group (a rough segmentation is sketched below).
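A minimal sketch of that three-way split (the median cutoffs are illustrative assumptions, not tuned thresholds):

# ------------------------------------------------------------
# Illustrative segmentation (median splits are an assumption;
# cutoffs should be tuned to actual planning needs)
# ------------------------------------------------------------
store_segments <- store_var %>%
  mutate(
    segment = case_when(
      cv_sales  >= median(cv_sales)  ~ "high-volatility",
      avg_sales >= median(avg_sales) ~ "high-sales",
      TRUE                           ~ "stable baseline"
    )
  )

count(store_segments, segment)   # store counts per planning group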


6) Conclusion

This Quiz 3 report adds value beyond Mini Project 2 by showing that volatility is a management problem distinct from revenue generation. Ranking-based visuals help identify which stores require risk-focused planning rather than only sales-focused planning.