South Dakota State University

Assignment 13

Ecuadorian Sales Analysis - Deep Dive

Judge Eli and Valerie Janis

STAT-442-ST1

FA2024 Semester

December 6, 2024

Abstract

This report explores sales and transaction trends across Ecuadorian supermarkets, emphasizing patterns influenced by events, holidays, and the April 2016 earthquake that struck the Manabí region. Using daily sales data from January 2013 to August 2017, the analyses reveal increased sales during shopping holidays, steady growth in workday sales, and higher transaction volumes after the earthquake. A Shapiro-Wilk test showed that post-earthquake transactions were not normally distributed, so a nonparametric Wilcoxon rank-sum test was used. It confirmed a significant difference between pre- and post-earthquake transactions, underscoring the critical role of supermarkets during crises. These findings can guide strategies for inventory management, event-based promotions, and disaster readiness.





Dataset Overview

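This dataset contains roughly three million daily sales records from Favorita supermarkets in Ecuador, covering January 1, 2013 to August 15, 2017. It includes dates, store and product-category information, the number of items on promotion, and sales and transaction counts. Its 3,000,888 rows match one row per store, product category, and day (1,684 days × 54 stores × 33 categories).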

Variables

Additional Notes

  • Public-sector wages are paid twice a month, on the 15th and on the last day of the month. Supermarket sales could be affected by this.

  • A magnitude 7.8 earthquake struck Ecuador on April 16, 2016. People rallied in relief efforts, donating water and other basic necessities, which greatly affected supermarket sales for several weeks after the earthquake.
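The payday rule in the first note can be turned into a feature for later modeling. A minimal base-R sketch, assuming dates are stored as `Date` values like the `date` column; the helper name `is_payday` is hypothetical:

```r
# Hypothetical helper: flag public-sector paydays (the 15th and the last day of each month)
is_payday <- function(d) {
  day_of_month <- as.integer(format(d, "%d"))
  # A date is the last day of its month exactly when the next day is the 1st
  is_last_day <- as.integer(format(d + 1, "%d")) == 1
  day_of_month == 15 | is_last_day
}

dates <- seq(as.Date("2016-01-01"), as.Date("2016-03-31"), by = "day")
paydays <- dates[is_payday(dates)]
print(paydays)
```

On the full data, `dataset$payday <- is_payday(dataset$date)` would add the flag as a column; checking the last day via "tomorrow is the 1st" handles 28-, 29-, 30-, and 31-day months without a lookup table.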


Dataset Structure

Data Frame Summary

dataset

Dimensions: 3000888 x 31
Duplicates: 0
Columns: Variable, Stats / Values, Freqs (% of Valid), Missing
id [character]
  1. 1218441993: 1000296 (33.3%)
  2. 1218802178: 444576 (14.8%)
  3. 1218041020: 166716 (5.6%)
  4. 1218148017: 166716 (5.6%)
  5. 1218046148: 111144 (3.7%)
  6. 1218160109: 111144 (3.7%)
  7. 1218378191: 111144 (3.7%)
  8. 1218947988: 111144 (3.7%)
  9. 1218156392: 55572 (1.9%)
  10. 1218253183: 55572 (1.9%)
  [ 166725 others ]: 666864 (22.2%)
  Missing: 0 (0.0%)
date [Date]
  min: 2013-01-01
  med: 2015-04-24
  max: 2017-08-15
  range: 4y 7m 14d
  1684 distinct values; Missing: 0 (0.0%)
store_nbr [numeric]
  Mean (sd): 27.5 (15.6)
  min ≤ med ≤ max: 1 ≤ 27.5 ≤ 54
  IQR (CV): 27 (0.6)
  54 distinct values; Missing: 0 (0.0%)
product_category [character]
  1. AUTOMOTIVE: 90936 (3.0%)
  2. BABY CARE: 90936 (3.0%)
  3. BEAUTY: 90936 (3.0%)
  4. BEVERAGES: 90936 (3.0%)
  5. BOOKS: 90936 (3.0%)
  6. BREAD/BAKERY: 90936 (3.0%)
  7. CELEBRATION: 90936 (3.0%)
  8. CLEANING: 90936 (3.0%)
  9. DAIRY: 90936 (3.0%)
  10. DELI: 90936 (3.0%)
  [ 23 others ]: 2091528 (69.7%)
  Missing: 0 (0.0%)
sales [numeric]
  Mean (sd): 520.7 (1297.2)
  min ≤ med ≤ max: 0.1 ≤ 78.5 ≤ 124717
  IQR (CV): 379 (2.5)
  379609 distinct values; Missing: 939130 (31.3%)
Log[sales] [numeric]
  Mean (sd): 4.2 (2.3)
  min ≤ med ≤ max: -2.1 ≤ 4.4 ≤ 11.7
  IQR (CV): 3.8 (0.6)
  379609 distinct values; Missing: 939130 (31.3%)
Range Scale[Log[sales]] [numeric]
  Mean (sd): 0.5 (0.2)
  min ≤ med ≤ max: 0 ≤ 0.5 ≤ 1
  IQR (CV): 0.3 (0.4)
  379609 distinct values; Missing: 939130 (31.3%)
items_promoted [numeric]
  Mean (sd): 12.8 (24.6)
  min ≤ med ≤ max: 1 ≤ 4 ≤ 741
  IQR (CV): 11 (1.9)
  361 distinct values; Missing: 2389559 (79.6%)
Log[items_promoted] [numeric]
  Mean (sd): 1.5 (1.4)
  min ≤ med ≤ max: 0 ≤ 1.4 ≤ 6.6
  IQR (CV): 2.5 (0.9)
  361 distinct values; Missing: 2389559 (79.6%)
Range Scale[Log[items_promoted]] [numeric]
  Mean (sd): 0.2 (0.2)
  min ≤ med ≤ max: 0 ≤ 0.2 ≤ 1
  IQR (CV): 0.4 (0.9)
  361 distinct values; Missing: 2389559 (79.6%)
transactions [numeric]
  Mean (sd): 1694.6 (963.3)
  min ≤ med ≤ max: 5 ≤ 1393 ≤ 8359
  IQR (CV): 1033 (0.6)
  4993 distinct values; Missing: 245784 (8.2%)
Log[transactions] [numeric]
  Mean (sd): 7.3 (0.5)
  min ≤ med ≤ max: 1.6 ≤ 7.2 ≤ 9
  IQR (CV): 0.7 (0.1)
  4993 distinct values; Missing: 245784 (8.2%)
Range Scale[Log[transactions]] [numeric]
  Mean (sd): 0.8 (0.1)
  min ≤ med ≤ max: 0 ≤ 0.8 ≤ 1
  IQR (CV): 0.1 (0.1)
  4993 distinct values; Missing: 245784 (8.2%)
city [character]
  1. Quito: 1000296 (33.3%)
  2. Guayaquil: 444576 (14.8%)
  3. Cuenca: 166716 (5.6%)
  4. Santo Domingo: 166716 (5.6%)
  5. Ambato: 111144 (3.7%)
  6. Latacunga: 111144 (3.7%)
  7. Machala: 111144 (3.7%)
  8. Manta: 111144 (3.7%)
  9. Babahoyo: 55572 (1.9%)
  10. Cayambe: 55572 (1.9%)
  [ 12 others ]: 666864 (22.2%)
  Missing: 0 (0.0%)
state [character]
  1. Pichincha: 1055868 (35.2%)
  2. Guayas: 611292 (20.4%)
  3. Azuay: 166716 (5.6%)
  4. Manabi: 166716 (5.6%)
  5. Santo Domingo de los Tsac: 166716 (5.6%)
  6. Cotopaxi: 111144 (3.7%)
  7. El Oro: 111144 (3.7%)
  8. Los Rios: 111144 (3.7%)
  9. Tungurahua: 111144 (3.7%)
  10. Bolivar: 55572 (1.9%)
  [ 6 others ]: 333432 (11.1%)
  Missing: 0 (0.0%)
store_type [character]
  1. A: 500148 (16.7%)
  2. B: 444576 (14.8%)
  3. C: 833580 (27.8%)
  4. D: 1000296 (33.3%)
  5. E: 222288 (7.4%)
  Missing: 0 (0.0%)
store_cluster [character]
  1. 3: 389004 (13.0%)
  2. 10: 333432 (11.1%)
  3. 6: 333432 (11.1%)
  4. 15: 277860 (9.3%)
  5. 13: 222288 (7.4%)
  6. 14: 222288 (7.4%)
  7. 1: 166716 (5.6%)
  8. 11: 166716 (5.6%)
  9. 4: 166716 (5.6%)
  10. 8: 166716 (5.6%)
  [ 7 others ]: 555720 (18.5%)
  Missing: 0 (0.0%)
oil_price [numeric]
  Mean (sd): 67.9 (25.7)
  min ≤ med ≤ max: 26.2 ≤ 53.3 ≤ 110.6
  IQR (CV): 49.4 (0.4)
  1352 distinct values; Missing: 0 (0.0%)
event_type [character]
  1. Disaster: 3069 (0.1%)
  2. Holiday: 142725 (4.8%)
  3. Shopping Event: 28281 (0.9%)
  4. Sport Event: 24783 (0.8%)
  5. Workday: 2802030 (93.4%)
  Missing: 0 (0.0%)
event_description [character]
  1. Navidad: 35508 (17.9%)
  2. Independencia: 32736 (16.5%)
  3. Mundial de futbol: 24783 (12.5%)
  4. Carnaval: 17820 (9.0%)
  5. Primer dia del ano: 17820 (9.0%)
  6. Dia de la Madre: 17589 (8.8%)
  7. Fundacion: 9636 (4.8%)
  8. Dia de Difuntos: 8910 (4.5%)
  9. Viernes Santo: 8877 (4.5%)
  10. Dia del Trabajo: 8811 (4.4%)
  [ 5 others ]: 16368 (8.2%)
  Missing: 2802030 (93.4%)
city_pop [numeric]
  Mean (sd): 1155280 (1063311)
  min ≤ med ≤ max: 23874 ≤ 329928 ≤ 2723665
  IQR (CV): 1860561 (0.9)
  22 distinct values; Missing: 0 (0.0%)
Log[city_pop] [numeric]
  Mean (sd): 13.1 (1.6)
  min ≤ med ≤ max: 10.1 ≤ 12.7 ≤ 14.8
  IQR (CV): 2.6 (0.1)
  22 distinct values; Missing: 0 (0.0%)
Range Scale[Log[city_pop]] [numeric]
  Mean (sd): 0.6 (0.3)
  min ≤ med ≤ max: 0 ≤ 0.6 ≤ 1
  IQR (CV): 0.5 (0.5)
  22 distinct values; Missing: 0 (0.0%)
sales_per_100k [numeric]
  Mean (sd): 225.6 (981.7)
  min ≤ med ≤ max: 0 ≤ 13 ≤ 94196
  IQR (CV): 88 (4.4)
  11923 distinct values; Missing: 939130 (31.3%)
Log[sales_per_100k] [numeric]
  Mean (sd): 2.5 (2.7)
  min ≤ med ≤ max: -4.5 ≤ 2.5 ≤ 11.5
  IQR (CV): 4 (1.1)
  524940 distinct values; Missing: 939130 (31.3%)
Range Scale[Log[sales_per_100k]] [numeric]
  Mean (sd): 0.4 (0.2)
  min ≤ med ≤ max: 0 ≤ 0.4 ≤ 1
  IQR (CV): 0.3 (0.4)
  524919 distinct values; Missing: 939130 (31.3%)
trans_per_100k [numeric]
  Mean (sd): 783.9 (1247.2)
  min ≤ med ≤ max: 0 ≤ 261 ≤ 12857
  IQR (CV): 903.2 (1.6)
  4695 distinct values; Missing: 245784 (8.2%)
Log[trans_per_100k] [numeric]
  Mean (sd): 5.7 (1.5)
  min ≤ med ≤ max: -1.2 ≤ 5.6 ≤ 9.5
  IQR (CV): 2.4 (0.3)
  22679 distinct values; Missing: 245784 (8.2%)
Range Scale[Log[trans_per_100k]] [numeric]
  Mean (sd): 0.6 (0.1)
  min ≤ med ≤ max: 0 ≤ 0.6 ≤ 1
  IQR (CV): 0.2 (0.2)
  22679 distinct values; Missing: 245784 (8.2%)
lat [numeric]
  Mean (sd): -1.2 (1.1)
  min ≤ med ≤ max: -4 ≤ -0.9 ≤ 0.9
  IQR (CV): 2 (-1)
  22 distinct values; Missing: 0 (0.0%)
lng [numeric]
  Mean (sd): -79.1 (0.8)
  min ≤ med ≤ max: -81 ≤ -79 ≤ -78
  IQR (CV): 1.4 (0)
  22 distinct values; Missing: 0 (0.0%)

Generated by summarytools 1.0.1 (R version 4.4.0)
2024-11-27
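The derived columns in the summary are consistent with a natural log followed by min-max ("range") scaling to [0, 1]. For example, ln(124717) ≈ 11.7 matches the maximum of Log[sales], and for Log[transactions] the rescaled mean (7.3 - 1.6) / (9 - 1.6) ≈ 0.77 rounds to the reported 0.8. The sketch below assumes that is how the columns were built; the helper `range_scale` is hypothetical:

```r
# Hypothetical helper: min-max ("range") scaling to [0, 1], ignoring missing values
range_scale <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

sales <- c(1, 10, 100, NA)         # toy values; NA mimics the missing sales entries
log_sales <- log(sales)            # natural log, as assumed for Log[sales]
scaled <- range_scale(log_sales)   # analogue of Range Scale[Log[sales]]
print(round(scaled, 3))
```

Missing values pass through as `NA`, which is why each derived column shows the same missing count as its source column.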





Exploratory Data Analysis



1) Mean Sales by Day and Month

Identifying patterns in daily and monthly sales to optimize store operations.

Interpretation

This heatmap shows mean sales for each combination of day of the week and month of the year. Key insights include:

  • Thursdays consistently have the lowest average sales, possibly reflecting a mid-week lull in shopping.

  • December sees a marked increase in sales across all days, reflecting holiday-driven shopping spikes.

  • Sundays maintain relatively high sales throughout the year, likely due to relaxed schedules or family shopping habits.

These insights can guide staffing, inventory management, and promotional campaigns during peak periods.
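A heatmap like this is drawn from a day-of-week by month table of mean sales. A base-R sketch on synthetic data (the data frame `sales_df` and its values are made up for illustration; on the real data the same `tapply` call would use the `sales` and `date` columns):

```r
set.seed(1)
# Synthetic stand-in for the real data: one year of daily, log-normal sales
dates <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
sales_df <- data.frame(date = dates,
                       sales = rlnorm(length(dates), meanlog = 4.4, sdlog = 1))

# Locale-independent labels: POSIXlt wday runs 0 (Sunday) to 6, mon runs 0 to 11
lt <- as.POSIXlt(sales_df$date)
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
sales_df$weekday <- factor(day_names[lt$wday + 1], levels = day_names)
sales_df$month <- factor(month.abb[lt$mon + 1], levels = month.abb)

# Rows = day of week, columns = month, cells = mean sales (NAs ignored)
heat <- tapply(sales_df$sales, list(sales_df$weekday, sales_df$month),
               mean, na.rm = TRUE)
print(round(heat[, c("Jan", "Dec")], 1))
# stats::heatmap(heat, Rowv = NA, Colv = NA, scale = "none") would draw it in base R
```

Building the labels from `POSIXlt` fields rather than `weekdays()` or `months()` keeps the row and column names the same regardless of the system locale.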



2) Monthly Sales by Year

Interpretation

The bar chart breaks down average monthly sales by year, highlighting:

  • A steady growth trend from 2013 to 2016. Note that the data end on August 15, 2017, so September–December 2017 are not included and August 2017 covers only its first 15 days.

  • December emerges as the highest-grossing month annually, driven by holiday spending.

  • February sales remain relatively flat after the holiday season, suggesting room for promotional campaigns to boost revenue.

These trends underscore the importance of seasonal marketing strategies and addressing factors contributing to periods of decline.
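Because the series ends mid-August 2017, a year-by-month table makes the coverage gap explicit before years are compared. A sketch on synthetic data (`sales_df` is hypothetical) that computes monthly means and counts the distinct days behind each cell, so partial months can be flagged:

```r
set.seed(2)
# Synthetic daily sales with the same end date as the real data (2017-08-15)
dates <- seq(as.Date("2016-01-01"), as.Date("2017-08-15"), by = "day")
sales_df <- data.frame(date = dates,
                       sales = rlnorm(length(dates), meanlog = 4.4, sdlog = 1))
year <- format(sales_df$date, "%Y")
# Keep all 12 month levels so months with no data still get a column
month <- factor(format(sales_df$date, "%m"), levels = sprintf("%02d", 1:12))

# Mean sales by year (rows) and month (columns); months with no data become NA
monthly_mean <- tapply(sales_df$sales, list(year, month), mean)
# Number of distinct days behind each cell, to spot partially covered months
n_days <- tapply(sales_df$date, list(year, month), function(d) length(unique(d)))

print(round(monthly_mean, 1))
print(n_days["2017", ])
```

Counting distinct dates rather than rows matters on the real data, where each day appears once per store and product category.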



3) Holiday Sales Analysis

Evaluating the impact of events and types of days on monthly sales.

Interpretation

  1. Workdays: Sales are steady but increase gradually over the years, possibly reflecting economic growth or improved store performance.

  2. Sporting Events: Spikes in June and July 2014 align with the FIFA World Cup, showing opportunities for sales during national sports events.

  3. Shopping Events: Recurring peaks in May (Mother’s Day) and November (Black Friday/Cyber Monday) highlight critical periods for targeted promotions.

  4. Holiday Sales: A sharp rise in January sales starts in 2015, suggesting a growing post-holiday shopping trend.

  5. 2016 Earthquake: Sales surged after the April 2016 earthquake, possibly driven by relief efforts or changes in consumer behavior.

This analysis points to the value of tailored, event-based promotions, particularly in months with major shopping holidays. Stores can plan promotions around major sports events and shopping holidays, run post-holiday campaigns to capture January's growing sales, and develop strategies to meet increased demand during crises such as natural disasters.
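One way to put numbers on the event comparisons above is a "lift" ratio: mean sales on days of each event type divided by mean workday sales, so values above 1 indicate an uplift. A sketch on synthetic data (the 20% uplift and the mix of event types are invented for illustration; the real analysis could apply the same `tapply` call within each month):

```r
set.seed(3)
n <- 2000
# Synthetic days labelled with event types used in the dataset
event_type <- sample(c("Workday", "Holiday", "Shopping Event", "Sport Event"),
                     n, replace = TRUE, prob = c(0.90, 0.05, 0.03, 0.02))
# Assumed 20% uplift on non-workdays, purely for illustration
uplift <- ifelse(event_type == "Workday", 1, 1.2)
sales <- rlnorm(n, meanlog = 4.4, sdlog = 1) * uplift

# Mean sales per event type, expressed relative to the workday mean
means <- tapply(sales, event_type, mean)
lift <- means / as.numeric(means["Workday"])
print(round(lift, 2))
```

With heavily skewed sales and few event days, the lift estimate for rare categories is noisy, so comparing medians as well (or pooling several years) gives a more stable picture.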



4) Holiday Impact Analysis

Interpretation

The interactive box plot compares sales distributions between holidays and workdays, revealing:

  • Sales are generally higher on holidays than on workdays, underscoring their importance for retail.

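  • On a logarithmic scale, which compresses the long right tail of the sales distribution, the holiday increase remains visible for lower-volume stores too, suggesting that smaller stores also experience substantial holiday spikes.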

This distinction can inform decisions on extended hours, special deals, or targeted advertising during holidays.
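The log-scale point can be seen with a two-store toy example: the same proportional holiday uplift produces very different absolute gaps but identical gaps on the log scale, which is why low-volume stores' spikes become visible after a log transformation. The store sizes and the 20% uplift are hypothetical:

```r
# Hypothetical baseline daily sales for a low-volume and a high-volume store
base <- c(small_store = 20, large_store = 2000)
holiday <- base * 1.2                 # assumed 20% holiday uplift for both stores

raw_gap <- holiday - base             # absolute gaps differ 100-fold
log_gap <- log(holiday) - log(base)   # log gaps are identical: log(1.2)
print(raw_gap)
print(round(log_gap, 3))
```

On a raw-scale box plot the small store's gain would be invisible next to the large store's; on the log scale both shift by the same amount.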



5) Store Type Performance

Analyzing sales variations across different store formats.

Interpretation

This plot highlights the diversity in sales performance by store type:

  • Type A stores show the highest sales averages, suggesting these locations are either larger or cater to high-demand regions.

  • Smaller store types (e.g., Type C) may point to niche opportunities or less consistent customer bases.

These insights can guide future investment, for example by focusing on scaling high-performing store formats.
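Because sales are strongly right-skewed (overall mean 520.7 vs. median 78.5 in the data summary), comparing store types on both the mean and the median guards against a few very large values driving the ranking. A sketch on synthetic data (the per-type log-scale levels are invented):

```r
set.seed(5)
# Synthetic sales for the five store types; the log-scale levels are invented
store_type <- rep(c("A", "B", "C", "D", "E"), each = 200)
level <- unname(c(A = 5.2, B = 4.4, C = 4.0, D = 4.5, E = 4.2)[store_type])
sales_df <- data.frame(store_type = store_type,
                       sales = rlnorm(length(store_type), meanlog = level, sdlog = 1))

# Mean and median per type; with skewed sales the two can rank types differently
by_type <- aggregate(sales ~ store_type, data = sales_df,
                     FUN = function(x) c(mean = mean(x), median = median(x)))
by_type <- do.call(data.frame, by_type)   # flatten the matrix column into sales.mean / sales.median
print(by_type[order(-by_type$sales.mean), ])
```

If the mean and median rankings disagree for a store type, its average is likely driven by a handful of unusually large sales days.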





Quantitative Analysis



6) Manabi’s Earthquake Impact Analysis

Assessing earthquake-driven changes in daily transactions.

On April 16, 2016, a magnitude 7.8 earthquake struck Ecuador, heavily affecting the Manabí region. To understand its impact on consumer behavior in this region, we compared daily transactions in the month before and the month after the earthquake (March 18–April 16 vs. April 17–May 17, 2016).

  • Before the earthquake, daily transactions in Manabí were stable at around 1,000 per day on average, reflecting the steady shopping patterns typical of a pre-disaster period.

  • Following the earthquake, transactions dipped sharply and then rose well above pre-earthquake levels, with daily averages climbing to approximately 1,500 and some days exceeding that figure.
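The before/after windows can be built from the store-level data roughly as follows. This is a sketch under assumptions: it uses the column names from the data summary (with the state spelled "Manabi" as in the data) and the `daily_avg`, `daily_transactions`, and `period` names that appear in the test output, but the exact aggregation the authors used is not stated. Here each store-day's transaction count, which repeats on every product-category row, is kept once and then averaged across stores. A small synthetic `dataset` with hypothetical store numbers stands in for the real one:

```r
set.seed(42)
# Synthetic stand-in: two hypothetical Manabi stores, one transaction count per store-day
dates <- seq(as.Date("2016-03-18"), as.Date("2016-05-17"), by = "day")
store_days <- expand.grid(date = dates, store_nbr = c(101, 102))
store_days$transactions <- round(rnorm(nrow(store_days), mean = 1000, sd = 50))
# Repeat each store-day on two product-category rows, as in the real data
dataset <- merge(store_days, data.frame(product_category = c("DAIRY", "DELI")))
dataset$state <- "Manabi"

quake <- as.Date("2016-04-16")
manabi <- subset(dataset, state == "Manabi" &
                   date >= as.Date("2016-03-18") & date <= as.Date("2016-05-17"))
# Transactions repeat on every category row, so keep one row per store-day
manabi <- unique(manabi[, c("date", "store_nbr", "transactions")])
# April 16 itself falls in the "Before" window, matching the text
manabi$period <- ifelse(manabi$date <= quake, "Before", "After")

# Mean transactions per store for each day; the formula method drops missing values by default
daily_avg <- aggregate(transactions ~ date + period, data = manabi, FUN = mean)
names(daily_avg)[names(daily_avg) == "transactions"] <- "daily_transactions"
daily_avg$period <- factor(daily_avg$period, levels = c("Before", "After"))
print(table(daily_avg$period))
```

Note that these windows hold 30 and 31 days respectively, since April 17–May 17 is inclusive of both ends.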

## 
##  Shapiro-Wilk normality test
## 
## data:  daily_avg$daily_transactions[daily_avg$period == "Before"]
## W = 0.95153, p-value = 0.1858
## 
##  Shapiro-Wilk normality test
## 
## data:  daily_avg$daily_transactions[daily_avg$period == "After"]
## W = 0.91344, p-value = 0.01587
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  daily_transactions by period
## W = 74, p-value = 1.766e-08
## alternative hypothesis: true location shift is not equal to 0

Interpretation

The Shapiro-Wilk test checks whether data are consistent with a normal distribution. For pre-earthquake transactions, the test did not reject normality (p ≈ 0.19), so a normal distribution is a reasonable working assumption for that period. For post-earthquake transactions, the test rejected normality at the 5% level (p ≈ 0.016), possibly because of a few extreme days (outliers).

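A two-sample t-test was not used because it assumes normally distributed data in each group, which the post-earthquake data did not satisfy. (A paired t-test would not fit this design in any case, since the days before and after the earthquake are two independent samples rather than matched pairs.) Instead, we used the Wilcoxon rank-sum test, a nonparametric test that compares the locations of two independent groups without assuming normality. It found a highly significant difference between the periods (W = 74, p = 1.8e-08, well below 0.001), and together with the higher post-earthquake daily averages, this indicates that transactions increased after the earthquake. A difference this large is very unlikely to be due to chance alone, so it reflects a meaningful change.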
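The testing workflow can be reproduced in base R. The sketch below runs on synthetic daily averages (invented numbers, not the Manabí data) and also checks how the W statistic is formed. In R's two-sample `wilcox.test`, W refers to the first factor level and equals the number of (first-group, second-group) pairs in which the first-group value is larger, with ties counted as one half. If `period` has "Before" as its first level, a small W like the 74 reported above means pre-earthquake days tended to have fewer transactions. With R's default alphabetical level order, "After" would come first instead, so the level order should be checked before reading direction from W.

```r
set.seed(7)
# Synthetic daily averages: 30 "Before" days and 31 "After" days (values are invented)
before <- round(rnorm(30, mean = 1000, sd = 60))
after  <- round(rnorm(31, mean = 1400, sd = 200))
daily_avg <- data.frame(
  daily_transactions = c(before, after),
  period = factor(rep(c("Before", "After"), c(30, 31)), levels = c("Before", "After"))
)

# Normality check within each period
sw_before <- shapiro.test(before)
sw_after  <- shapiro.test(after)

# Two-sided Wilcoxon rank-sum test; exact = FALSE uses the normal approximation
# with continuity correction, the method named in the output above
wt <- wilcox.test(daily_transactions ~ period, data = daily_avg, exact = FALSE)

# W for the first level ("Before"): pairs where Before > After, ties counted 1/2
w_pairs <- sum(outer(before, after, ">")) + 0.5 * sum(outer(before, after, "=="))
# Equivalent rank-sum form: rank sum of "Before" minus n(n + 1) / 2 with n = 30
r <- rank(c(before, after))
w_ranks <- sum(r[seq_along(before)]) - 30 * 31 / 2

print(c(sw_before = sw_before$p.value, sw_after = sw_after$p.value))
print(wt)
print(c(W = unname(wt$statistic), w_pairs = w_pairs, w_ranks = w_ranks))
```

The test itself is two-sided, so the direction of the change comes from comparing the group medians (or from W with a known level order), not from the p-value.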

After the earthquake, transactions increased, likely as people rushed to supermarkets for essential supplies like water, food, and first-aid items. Supermarkets became critical hubs for disaster response, with spending shifting toward basic needs and emergency preparedness. This emphasizes the importance of supermarkets in recovery efforts.

For the future, stores should anticipate demand spikes during disasters, ensure sufficient inventory, and develop crisis response plans. Offering discounts or relief packages can also strengthen community support and aid recovery efforts.





Conclusion

This analysis highlights the dynamic nature of sales and transaction patterns in Ecuadorian supermarkets, driven by holidays, sporting events, and external shocks like natural disasters. Key insights include:

  1. Consistent workday sales growth and recurring spikes during major shopping holidays emphasize the importance of targeted marketing and inventory planning.

  2. The 2016 Manabí earthquake triggered a significant post-event surge in transactions, underscoring the critical role of supermarkets as community support hubs during crises.

  3. Statistical testing supported the earthquake finding: the Wilcoxon rank-sum test confirmed a significant increase in post-earthquake transactions.

Moving forward, supermarkets should leverage these insights to optimize operations during high-demand periods, develop disaster response plans, and strengthen community ties through targeted relief efforts. This approach not only enhances sales performance but also reinforces the role of supermarkets in economic and social recovery.