#Introduction

This analysis examines weekly sales data from a retail company to understand how promotions, holidays, and competitor activity influence sales. Exploratory analysis and a linear regression model are used to investigate these relationships.

#Data Preparation

sales<-read.csv("sales_data.csv", stringsAsFactors = TRUE)

head(sales)
##   Week Promotion Holiday CompetitorActivity Sales
## 1    1       Yes      No               High 47.76
## 2    2        No      No                Low 45.80
## 3    3        No      No               High 60.00
## 4    4       Yes     Yes               High 54.31
## 5    5        No      No               High 43.49
## 6    6        No     Yes               High 35.01
str(sales)
## 'data.frame':    52 obs. of  5 variables:
##  $ Week              : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Promotion         : Factor w/ 2 levels "No","Yes": 2 1 1 2 1 1 1 1 1 1 ...
##  $ Holiday           : Factor w/ 2 levels "No","Yes": 1 1 1 2 1 2 2 1 1 2 ...
##  $ CompetitorActivity: Factor w/ 3 levels "High","Low","Medium": 1 2 1 1 1 1 1 1 2 1 ...
##  $ Sales             : num  47.8 45.8 60 54.3 43.5 ...
colSums(is.na(sales))
##               Week          Promotion            Holiday CompetitorActivity 
##                  0                  0                  0                  0 
##              Sales 
##                  0
summary(sales)
##       Week       Promotion Holiday  CompetitorActivity     Sales      
##  Min.   : 1.00   No :31    No :27   High  :16          Min.   :26.62  
##  1st Qu.:13.75   Yes:21    Yes:25   Low   :20          1st Qu.:42.34  
##  Median :26.50                      Medium:16          Median :48.33  
##  Mean   :26.50                                         Mean   :49.58  
##  3rd Qu.:39.25                                         3rd Qu.:55.41  
##  Max.   :52.00                                         Max.   :73.28

#Exploratory Data Analysis

##Sales by Promotion

boxplot(Sales~Promotion, data=sales,
        main="Sales by Promotion",
        xlab="Promotion",
        ylab="Sales (thousands of dollars)",
        col=c("lightgreen","lightblue")
)

aggregate(Sales~Promotion, data = sales, summary)
##   Promotion Sales.Min. Sales.1st Qu. Sales.Median Sales.Mean Sales.3rd Qu.
## 1        No   26.62000      40.67000     45.08000   46.75968      52.45000
## 2       Yes   35.07000      47.25000     54.31000   53.73714      57.90000
##   Sales.Max.
## 1   73.28000
## 2   70.73000

Median sales during promotion weeks (~54 thousand dollars) are higher than during non-promotion weeks (~45 thousand dollars), and mean sales show the same pattern (~53.7 versus ~46.8 thousand dollars). This suggests that promotions are associated with higher sales.
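One way to check whether a gap like this is larger than chance would suggest is a Welch two-sample t-test. The sketch below is only illustrative: it uses the six rows printed by `head(sales)` so that it runs on its own, and the helper name `compare_promotion` is hypothetical. With so few rows the result itself is not meaningful; in practice the function would be called on the full `sales` data frame.

```r
# Illustrative subset only: the six rows shown by head(sales) above.
sales_head <- data.frame(
  Promotion = factor(c("Yes", "No", "No", "Yes", "No", "No"),
                     levels = c("No", "Yes")),
  Sales     = c(47.76, 45.80, 60.00, 54.31, 43.49, 35.01)
)

# Hypothetical helper: Welch t-test of Sales by Promotion
# (t.test does not assume equal variances by default).
compare_promotion <- function(df) {
  t.test(Sales ~ Promotion, data = df)
}

result <- compare_promotion(sales_head)
result$estimate  # group means for "No" and "Yes"
result$p.value   # p-value for the difference in means
```

Calling `compare_promotion(sales)` on the full 52-week dataset would give the test that is relevant to the boxplot above.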


##Sales by Holiday

boxplot(Sales~Holiday, data=sales,
        main="Sales by Holiday",
        xlab = "Holiday",
        ylab= "Sales (thousands of dollars)",
        col=c("lightyellow","lightblue")
)

aggregate(Sales~Holiday, data=sales, summary)
##   Holiday Sales.Min. Sales.1st Qu. Sales.Median Sales.Mean Sales.3rd Qu.
## 1      No    26.6200       43.2800      47.2500    50.3137       57.1200
## 2     Yes    29.6000       41.3200      50.8700    48.7824       54.3100
##   Sales.Max.
## 1    73.2800
## 2    66.6200

Median sales during holiday weeks (~51 thousand dollars) are higher than during non-holiday weeks (~47 thousand dollars), but mean sales are slightly lower in holiday weeks (~48.8 versus ~50.3 thousand dollars), so the evidence for a holiday effect is mixed. Both the highest and the lowest sales values occurred during non-holiday weeks, indicating greater variability when there is no holiday.
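The variability claim can be made concrete with the range and interquartile range of each group. The sketch below simply re-enters the figures from the `aggregate()` output above; the data frame name `holiday_summary` is an assumption made for illustration.

```r
# Values copied from the aggregate(Sales~Holiday, ...) output above.
holiday_summary <- data.frame(
  Holiday = c("No", "Yes"),
  Min     = c(26.62, 29.60),
  Q1      = c(43.28, 41.32),
  Q3      = c(57.12, 54.31),
  Max     = c(73.28, 66.62)
)

# Two simple spread measures per group.
holiday_summary$Range <- holiday_summary$Max - holiday_summary$Min
holiday_summary$IQR   <- holiday_summary$Q3 - holiday_summary$Q1

holiday_summary[, c("Holiday", "Range", "IQR")]
```

Non-holiday weeks span a range of 46.66 against 37.02 for holiday weeks, while the interquartile ranges are closer (13.84 versus 12.99), so the extra spread comes mostly from the extremes.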

##Sales by Competitor Activity

boxplot(Sales~CompetitorActivity, data=sales,
        main = "Sales by Competitor Activity",
        xlab = "Competitor Activity",
        ylab="Sales (thousands of dollars)",
        col=c("pink", "lightgreen", "lightyellow")
)

aggregate(Sales~CompetitorActivity, data=sales, summary)
##   CompetitorActivity Sales.Min. Sales.1st Qu. Sales.Median Sales.Mean
## 1               High   35.01000      43.38500     47.04000   49.46813
## 2                Low   29.60000      44.76000     52.16000   52.05250
## 3             Medium   26.62000      38.30500     46.76000   46.59313
##   Sales.3rd Qu. Sales.Max.
## 1      53.49250   70.73000
## 2      58.49250   73.28000
## 3      54.71750   64.27000

Median sales were highest (~52 thousand dollars) when competitor activity was low. Medians under medium (~46.8) and high (~47.0) competitor activity were nearly identical, although mean sales were lowest under medium activity (~46.6 versus ~49.5 under high). Under both medium and high competitor activity, sales remained below the levels observed when competitor activity was low.
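Because `read.csv()` with `stringsAsFactors = TRUE` orders factor levels alphabetically, the boxes and table rows appear as High, Low, Medium. A minimal sketch of reordering the levels is shown below; the vector `competitor` holds only the six values from `head(sales)` and is an illustrative stand-in for `sales$CompetitorActivity`.

```r
# Illustrative stand-in for sales$CompetitorActivity (first six rows),
# with the alphabetical level order that read.csv() produced.
competitor <- factor(c("High", "Low", "High", "High", "High", "High"),
                     levels = c("High", "Low", "Medium"))

# Re-declare the factor with levels in their natural order.
competitor_ordered <- factor(competitor, levels = c("Low", "Medium", "High"))

levels(competitor_ordered)
```

Applied to the full data frame, this would make the boxplot read from low to high activity. It would also change the reference level in any later `lm()` fit from High to Low, so coefficients would then be interpreted relative to low competitor activity.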

##Sales Trend Over Time

plot(sales$Week, sales$Sales,
     main="Sales Trend Over Time",
     xlab="Week",
     ylab="Sales (thousands of dollars)",
     pch=19,
     col="blue"
)
abline(lm(Sales ~ Week, data = sales), col = "red", lwd = 2)


The scatterplot shows that weekly sales fluctuate throughout the year with no strong pattern. The regression line shows a slight downward trend, suggesting sales decrease minimally over time. However, the wide spread of data points indicates that time alone does not strongly influence sales compared to other factors such as promotions, holidays, and competitor activity.

#Predictive Modeling

``` r
model<- lm(Sales~Promotion+Holiday+CompetitorActivity, data=sales)
summary(model)
## 
## Call:
## lm(formula = Sales ~ Promotion + Holiday + CompetitorActivity, 
##     data = sales)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.2811  -7.6153  -0.8176   5.8278  23.7119 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                47.075      3.039  15.488   <2e-16 ***
## PromotionYes                7.070      2.926   2.416   0.0196 *  
## HolidayYes                 -0.687      2.903  -0.237   0.8140    
## CompetitorActivityLow       2.494      3.473   0.718   0.4764    
## CompetitorActivityMedium   -3.188      3.687  -0.865   0.3916    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.29 on 47 degrees of freedom
## Multiple R-squared:  0.154,  Adjusted R-squared:  0.08203 
## F-statistic: 2.139 on 4 and 47 DF,  p-value: 0.09067
```

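The regression results show that promotions have a statistically significant positive effect on sales (p ≈ 0.02): holding holiday status and competitor activity constant, promotion weeks have weekly sales about $7,000 higher on average. Holiday weeks and competitor activity levels do not show statistically significant effects on sales in this model. The competitor activity coefficients are measured relative to the High level, which R uses as the reference because it comes first alphabetically. The model explains about 15% of the variation in sales (adjusted R-squared about 0.08), and the overall F-test (p ≈ 0.09) is not significant at the 5% level, suggesting that additional factors not included in the model likely influence weekly sales performance.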
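To show how the coefficients combine into a prediction, the sketch below re-enters the rounded estimates from the summary table and adds them up for two scenarios. The names `coefs` and `predict_sales` are hypothetical; with the fitted object available, `predict(model, newdata = ...)` would do the same job without rounding.

```r
# Rounded estimates copied from the summary(model) output above.
coefs <- c(
  "(Intercept)"              = 47.075,
  "PromotionYes"             =  7.070,
  "HolidayYes"               = -0.687,
  "CompetitorActivityLow"    =  2.494,
  "CompetitorActivityMedium" = -3.188
)

# Hypothetical helper: each dummy term is switched on by a logical
# comparison; High competitor activity is the reference level.
predict_sales <- function(promotion, holiday, competitor) {
  unname(
    coefs["(Intercept)"] +
      coefs["PromotionYes"] * (promotion == "Yes") +
      coefs["HolidayYes"] * (holiday == "Yes") +
      coefs["CompetitorActivityLow"] * (competitor == "Low") +
      coefs["CompetitorActivityMedium"] * (competitor == "Medium")
  )
}

baseline  <- predict_sales("No", "No", "High")  # intercept only: 47.075
promo_low <- predict_sales("Yes", "No", "Low")  # 47.075 + 7.070 + 2.494 = 56.639
```

In thousands of dollars, a non-holiday promotion week with low competitor activity is predicted at about 56.6, compared with about 47.1 for the baseline week.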

#Model Evaluation

predicted_sales <-predict(model)
mse<-mean((sales$Sales-predicted_sales)^2)
mse
## [1] 95.7844

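The model produced a Mean Squared Error (MSE) of approximately 95.78, measured in squared thousands of dollars. Its square root, the root mean squared error (RMSE), is about 9.79, which corresponds to a typical prediction error of about $9,800. Because this error is computed on the same data used to fit the model, it likely understates the error on new weeks. The model’s R-squared value of 0.154 suggests that only about 15% of the variation in sales is explained by the predictors included in the model.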
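The step from the MSE to a dollar figure is a square root. The sketch below reproduces that arithmetic from the reported numbers and also checks it against the residual standard error in the regression output, which divides by the 47 residual degrees of freedom instead of the 52 observations. The variable names are assumptions made for illustration.

```r
# Figures taken from the output above.
mse         <- 95.7844  # mean squared error, in (thousand dollars)^2
n           <- 52       # number of weekly observations
residual_df <- 47       # residual degrees of freedom from summary(model)

# RMSE brings the error back to the units of Sales (thousand dollars).
rmse <- sqrt(mse)

# The residual standard error uses the same sum of squares, divided by df.
rse <- sqrt(mse * n / residual_df)

round(c(RMSE = rmse, RSE = rse), 2)
```

The RMSE comes out at about 9.79 and the reconstructed residual standard error at about 10.29, matching the value reported by `summary(model)`.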


#Conclusion

The exploratory data analysis suggests that promotions are associated with higher sales, while holidays and competitor activity show weaker relationships with sales. Boxplots indicated that sales tend to be higher during promotion weeks and when competitor activity is low. A linear regression model was built to predict weekly sales using promotion status, holiday weeks, and competitor activity. The results showed that promotions had a statistically significant positive effect on sales, increasing weekly sales by approximately $7,000 on average. In contrast, holiday weeks and competitor activity were not statistically significant predictors in the model.

Overall, the results suggest that promotional campaigns are the only statistically significant predictor of weekly sales in this dataset. With an R-squared of about 0.15, most of the variation in sales comes from factors that were not captured in this analysis.