This analysis examines weekly sales data from a retail company to understand how promotions, holidays, and competitor activity influence sales. Exploratory analysis and a linear regression model are used to investigate these relationships.

# Data Preparation

``` r
# Load the weekly sales data; text columns are read as factors
sales <- read.csv("sales_data.csv", stringsAsFactors = TRUE)

# Preview the first rows
head(sales)
##   Week Promotion Holiday CompetitorActivity Sales
## 1    1       Yes      No               High 47.76
## 2    2        No      No                Low 45.80
## 3    3        No      No               High 60.00
## 4    4       Yes     Yes               High 54.31
## 5    5        No      No               High 43.49
## 6    6        No     Yes               High 35.01

# Check column types
str(sales)
## 'data.frame':    52 obs. of  5 variables:
##  $ Week              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Promotion         : Factor w/ 2 levels "No","Yes": 2 1 1 2 1 1 1 1 1 1 ...
##  $ Holiday           : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 2 2 1 1 2 ...
##  $ CompetitorActivity: Factor w/ 3 levels "High","Low","Medium": 1 2 1 1 1 1 1 1 2 1 ...
##  $ Sales             : num  47.8 45.8 60 54.3 43.5 ...

# Count missing values per column
colSums(is.na(sales))
##               Week          Promotion            Holiday CompetitorActivity
##                  0                  0                  0                  0
##              Sales
##                  0

# Summary statistics for every column
summary(sales)
##       Week       Promotion Holiday  CompetitorActivity     Sales
##  Min.   : 1.00   No :31    No :27   High  :16          Min.   :26.62
##  1st Qu.:13.75   Yes:21    Yes:25   Low   :20          1st Qu.:42.34
##  Median :26.50                      Medium:16          Median :48.33
##  Mean   :26.50                                         Mean   :49.58
##  3rd Qu.:39.25                                         3rd Qu.:55.41
##  Max.   :52.00                                         Max.   :73.28
```

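
The `str()` output shows that R ordered the `CompetitorActivity` levels alphabetically (High, Low, Medium), which makes "High" the reference level in a regression. If an ordering from low to high is preferred, the factor can be releveled. A minimal sketch on a small made-up vector follows; the `activity` values are illustrative and are not rows from `sales_data.csv`.

``` r
# Illustrative values only; not taken from sales_data.csv
activity <- factor(c("High", "Low", "High", "Medium"))
levels(activity)   # alphabetical by default

# Reorder so the levels run Low, Medium, High; "Low" would then be the
# reference level if this factor were used in lm()
activity <- factor(activity, levels = c("Low", "Medium", "High"))
levels(activity)
```

The rest of this report keeps the default ordering, so the competitor-activity coefficients in the regression are measured relative to "High".
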

# Exploratory Data Analysis

## Sales by Promotion

``` r
# Compare the sales distribution with and without a promotion
boxplot(Sales ~ Promotion, data = sales,
        main = "Sales by Promotion",
        xlab = "Promotion",
        ylab = "Sales (thousands of dollars)",
        col = c("lightgreen", "lightblue"))

# Quartiles, mean, and range of sales by promotion status
aggregate(Sales ~ Promotion, data = sales, summary)
##   Promotion Sales.Min. Sales.1st Qu. Sales.Median Sales.Mean Sales.3rd Qu.
## 1        No   26.62000      40.67000     45.08000   46.75968      52.45000
## 2       Yes   35.07000      47.25000     54.31000   53.73714      57.90000
##   Sales.Max.
## 1   73.28000
## 2   70.73000
```

Median sales in promotion weeks (about 54 thousand dollars) are higher than in non-promotion weeks (about 45 thousand dollars), and mean sales are higher as well (53.7 versus 46.8). This suggests that promotions are associated with higher sales.
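
The size of the promotion gap can be read directly off the summary table. The sketch below only re-enters the reported medians and means by hand (the `promo` data frame is a helper introduced here for illustration) and subtracts them.

``` r
# Values copied from the aggregate() output above (thousands of dollars)
promo <- data.frame(
  Promotion = c("No", "Yes"),
  Median    = c(45.08, 54.31),
  Mean      = c(46.75968, 53.73714)
)

# Difference (Yes minus No) for each statistic
gap <- c(
  median = diff(promo$Median),
  mean   = diff(promo$Mean)
)
round(gap, 2)
```

The gap in means, roughly 7 thousand dollars, is close to the promotion coefficient estimated by the regression later in this report.
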

## Sales by Holiday

``` r
# Compare the sales distribution in holiday and non-holiday weeks
boxplot(Sales ~ Holiday, data = sales,
        main = "Sales by Holiday",
        xlab = "Holiday",
        ylab = "Sales (thousands of dollars)",
        col = c("lightyellow", "lightblue"))

# Quartiles, mean, and range of sales by holiday status
aggregate(Sales ~ Holiday, data = sales, summary)
##   Holiday Sales.Min. Sales.1st Qu. Sales.Median Sales.Mean Sales.3rd Qu.
## 1      No    26.6200       43.2800      47.2500    50.3137       57.1200
## 2     Yes    29.6000       41.3200      50.8700    48.7824       54.3100
##   Sales.Max.
## 1    73.2800
## 2    66.6200
```

Median sales in holiday weeks (about 51 thousand dollars) are higher than in non-holiday weeks (about 47 thousand dollars), but mean sales are slightly lower (48.8 versus 50.3), so the summary gives no clear evidence of a holiday effect. Both the highest and the lowest sales values occurred in non-holiday weeks, indicating greater variability when there is no holiday.
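
The variability claim can be checked against the reported quartiles. This sketch re-enters the summary values by hand (the `holiday` data frame is a helper introduced here) and computes the range and interquartile range for each group.

``` r
# Values copied from the aggregate() output above (thousands of dollars)
holiday <- data.frame(
  Holiday = c("No", "Yes"),
  Min     = c(26.62, 29.60),
  Q1      = c(43.28, 41.32),
  Q3      = c(57.12, 54.31),
  Max     = c(73.28, 66.62)
)

holiday$Range <- holiday$Max - holiday$Min   # full spread
holiday$IQR   <- holiday$Q3 - holiday$Q1     # spread of the middle 50%
holiday[, c("Holiday", "Range", "IQR")]
```

The range is clearly wider for non-holiday weeks while the interquartile ranges are much closer, which suggests the extra variability may come from a few extreme weeks rather than from the bulk of the data.
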

## Sales by Competitor Activity

``` r
# Compare the sales distribution across competitor activity levels
boxplot(Sales ~ CompetitorActivity, data = sales,
        main = "Sales by Competitor Activity",
        xlab = "Competitor Activity",
        ylab = "Sales (thousands of dollars)",
        col = c("pink", "lightgreen", "lightyellow"))

# Quartiles, mean, and range of sales by competitor activity
aggregate(Sales ~ CompetitorActivity, data = sales, summary)
##   CompetitorActivity Sales.Min. Sales.1st Qu. Sales.Median Sales.Mean
## 1               High   35.01000      43.38500     47.04000   49.46813
## 2                Low   29.60000      44.76000     52.16000   52.05250
## 3             Medium   26.62000      38.30500     46.76000   46.59313
##   Sales.3rd Qu. Sales.Max.
## 1      53.49250   70.73000
## 2      58.49250   73.28000
## 3      54.71750   64.27000
```


## Sales Trend Over Time

``` r
# Scatterplot of weekly sales with a fitted linear trend line
plot(sales$Week, sales$Sales,
     main = "Sales Trend Over Time",
     xlab = "Week",
     ylab = "Sales (thousands of dollars)",
     pch = 19,
     col = "blue")
abline(lm(Sales ~ Week, data = sales), col = "red", lwd = 2)
```

The scatterplot shows that weekly sales fluctuate throughout the year with no strong pattern. The regression line slopes slightly downward, suggesting that sales decrease only minimally over time. The wide spread of points around the line indicates that time alone explains little of the variation in sales, so the model below uses promotions, holidays, and competitor activity as predictors instead.
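
How slight a trend is can be quantified by looking at the slope and R-squared of the trend line rather than judging it by eye. The sketch below uses simulated data as a stand-in (the `toy` data frame and its numbers are hypothetical; only the column names match `sales`), so its output says nothing about the real dataset.

``` r
# Hypothetical data standing in for `sales`; only the column names match
set.seed(1)
toy <- data.frame(Week = 1:52)
toy$Sales <- 50 - 0.05 * toy$Week + rnorm(52, sd = 10)

trend <- lm(Sales ~ Week, data = toy)

# Slope estimate, standard error, t value, and p-value for Week
coef(summary(trend))["Week", ]

# Share of the variance in Sales explained by Week alone
summary(trend)$r.squared
```

Running the same two extraction lines on `lm(Sales ~ Week, data = sales)` would give the corresponding figures for the actual data.
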
---

# Predictive Modeling

``` r
model <- lm(Sales ~ Promotion + Holiday + CompetitorActivity, data = sales)
summary(model)
##
## Call:
## lm(formula = Sales ~ Promotion + Holiday + CompetitorActivity,
##     data = sales)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -19.2811  -7.6153  -0.8176   5.8278  23.7119
##
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                47.075      3.039  15.488   <2e-16 ***
## PromotionYes                7.070      2.926   2.416   0.0196 *
## HolidayYes                 -0.687      2.903  -0.237   0.8140
## CompetitorActivityLow       2.494      3.473   0.718   0.4764
## CompetitorActivityMedium   -3.188      3.687  -0.865   0.3916
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.29 on 47 degrees of freedom
## Multiple R-squared:  0.154,  Adjusted R-squared:  0.08203
## F-statistic: 2.139 on 4 and 47 DF,  p-value: 0.09067
```


# Model Evaluation
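
``` r
# In-sample predictions from the fitted model
predicted_sales <- predict(model)

# Mean squared error on the data used to fit the model
mse <- mean((sales$Sales - predicted_sales)^2)
mse
## [1] 95.7844
```
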
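The model has an in-sample Mean Squared Error (MSE) of approximately 95.78. Because sales are recorded in thousands of dollars, the MSE is in squared units; its square root, `sqrt(95.78)`, is about 9.79, which corresponds to a typical prediction error of about $9,800. The R-squared value of 0.154 means that only about 15% of the variation in sales is explained by the predictors (adjusted R-squared: 0.082), and the overall F-test (p = 0.091) is not significant at the 5% level.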
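
The error figures reported for this model are linked by standard formulas, which gives a quick consistency check. The sketch below re-enters the reported MSE, sample size, and R-squared by hand; `n`, `p`, `rmse`, `rse`, and `adj_r2` are names introduced here for illustration.

``` r
mse <- 95.7844   # in-sample MSE reported above
n   <- 52        # number of weekly observations
p   <- 5         # estimated coefficients, including the intercept

rmse <- sqrt(mse)                 # typical error, in thousands of dollars
rse  <- sqrt(mse * n / (n - p))   # residual standard error, as in summary()

r2     <- 0.154
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p)

round(c(rmse = rmse, rse = rse, adj_r2 = adj_r2), 3)
```

The recomputed residual standard error and adjusted R-squared agree with the `summary(model)` output up to rounding. The residual standard error is slightly larger than the RMSE because it divides by the 47 residual degrees of freedom rather than by all 52 observations.
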
The exploratory analysis suggests that promotions are associated with higher sales, while holidays and competitor activity show weaker relationships with sales. The boxplots indicate that sales tend to be higher during promotion weeks and when competitor activity is low. A linear regression model was fitted to predict weekly sales from promotion status, holiday weeks, and competitor activity. Promotions had a statistically significant positive coefficient (p = 0.020), corresponding to about $7,000 more in weekly sales on average with the other predictors held fixed. In contrast, holiday weeks and competitor activity were not statistically significant predictors.
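
To make the coefficients concrete, they can be added up for a given type of week, and the reported standard error gives an approximate confidence interval for the promotion effect. This sketch re-enters the estimates from the regression table by hand; the vector `b` and the scenario names are introduced here for illustration.

``` r
# Coefficient estimates copied from the summary(model) output
# (thousands of dollars)
b <- c(
  Intercept                = 47.075,
  PromotionYes             = 7.070,
  HolidayYes               = -0.687,
  CompetitorActivityLow    = 2.494,
  CompetitorActivityMedium = -3.188
)

# Baseline week: no promotion, no holiday, high competitor activity
baseline <- b[["Intercept"]]

# Promotion week, no holiday, low competitor activity
promo_low <- b[["Intercept"]] + b[["PromotionYes"]] + b[["CompetitorActivityLow"]]

# Approximate 95% confidence interval for the promotion effect, using the
# reported standard error and 47 residual degrees of freedom
se_promo <- 2.926
ci_promo <- b[["PromotionYes"]] + c(-1, 1) * qt(0.975, df = 47) * se_promo

c(baseline = baseline, promo_low = promo_low)
round(ci_promo, 2)
```

With the fitted `model` object available, `predict(model, newdata = ...)` and `confint(model)` would give the same quantities without retyping the estimates. The interval excludes zero but is wide, so the size of the promotion effect is estimated imprecisely.
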
Overall, the results suggest that promotional campaigns are the strongest factor associated with weekly sales in this dataset. The low R-squared indicates that most of the variation in sales comes from factors that were not captured in this analysis.