1a. Load Data and Basic Statistics

sales_data <- read.csv("salesdata24.csv")
numeric_vars <- sales_data[sapply(sales_data, is.numeric)]

1b. Number of Rows and Columns

dim(sales_data)
## [1] 52  5

There are 52 rows and 5 columns in the data.

1c. Seasonal Averages and Barplot. Convert season to factor with WI as reference

sales_data$season <- factor(sales_data$season, levels = c("WI", "SP", "SU", "AU"))

sales_data$season <- relevel(sales_data$season, ref = "WI")

season_avg <- aggregate(salesvolume ~ season, data = sales_data, mean)
print(season_avg)
##   season salesvolume
## 1     WI    516.9231
## 2     SP    514.6923
## 3     SU    755.9231
## 4     AU    851.0000
barplot(season_avg$salesvolume, names.arg = season_avg$season, 
        main = "Average Sales Volume by Season", 
        xlab = "Season", ylab = "Average Sales Volume")

The table of seasonal averages and the bar plot above show that average sales volume is clearly higher in autumn (851.0) and summer (755.9) than in winter (516.9) and spring (514.7).

1d. Observations per Season

table(sales_data$season)
## 
## WI SP SU AU 
## 13 13 13 13

There are 13 weeks in each season.

1e. Sales Volume Histogram

hist(sales_data$salesvolume, main = "Histogram of Sales Volume", 
     xlab = "Sales Volume", breaks = 20)

From the histogram, the most frequent sales volumes appear to fall between about 750 and 800. This is an approximate reading from the plot.
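The modal bin can also be read off numerically instead of by eye. The following is a minimal sketch, assuming a hypothetical helper `modal_bin()` and a small toy vector standing in for `sales_data$salesvolume`:

```r
# Hypothetical helper (not part of the original analysis):
# return the bounds and count of the fullest histogram bin.
modal_bin <- function(x, breaks = 20) {
  h <- hist(x, breaks = breaks, plot = FALSE)  # compute the bins without drawing
  i <- which.max(h$counts)                     # first bin with the highest count
  c(lower = h$breaks[i], upper = h$breaks[i + 1], count = h$counts[i])
}

# Toy values for illustration only (these are not the sales data)
toy <- c(1, 2, 2, 3, 3, 3, 4)
res <- modal_bin(toy, breaks = 0:5)
print(res)
```

On the real data the call would presumably be `modal_bin(sales_data$salesvolume, breaks = 20)`. Note that `hist()` treats a single number for `breaks` as a suggestion only, so the bin edges may differ slightly from what was requested.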

1f. Weeks with Max and Min Sales

max_week <- sales_data[which.max(sales_data$salesvolume), ]
min_week <- sales_data[which.min(sales_data$salesvolume), ]
rbind(max_week, min_week)
##    week salesvolume price_1 price_2 season
## 43   43        1050     169     249     AU
## 15   15         184     299     169     SP

The maximum sales volume is 1050, in week 43 (autumn). The minimum sales volume is 184, in week 15 (spring).

1g. Top 8 Sales Weeks

top_8_weeks <- sales_data[order(-sales_data$salesvolume), ][1:8, ]
top_8_weeks  
##    week salesvolume price_1 price_2 season
## 43   43        1050     169     249     AU
## 34   34        1036     149     239     SU
## 40   40        1008     149     249     AU
## 47   47        1003     179     299     AU
## 26   26         995     169     239     SU
## 23   23         932     179     249     SU
## 46   46         926     199     249     AU
## 42   42         912     199     299     AU

These are the 8 weeks with the highest sales volume. Comparing the two prices (price_1 and price_2) for these weeks, price_2 is greater than price_1 in all 8 selected weeks.
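The price comparison can be checked in code rather than by inspecting the rows. A small sketch, using only the values copied from the top-8 listing above (the object names `top8`, `all_higher` and `price_gap` are illustrative):

```r
# Rows copied from the top-8 listing above
top8 <- data.frame(
  week    = c(43, 34, 40, 47, 26, 23, 46, 42),
  price_1 = c(169, 149, 149, 179, 169, 179, 199, 199),
  price_2 = c(249, 239, 249, 299, 239, 249, 249, 299)
)

# TRUE only if price_2 exceeds price_1 in every one of the eight weeks
all_higher <- all(top8$price_2 > top8$price_1)

# Size of the gap between the two prices in each week
price_gap <- top8$price_2 - top8$price_1

print(all_higher)
print(range(price_gap))
```

With the full data, the same check would be `all(top_8_weeks$price_2 > top_8_weeks$price_1)`.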

1h. High Sales Subset (Sales > 700)

high_sales <- subset(sales_data, salesvolume > 700)
nrow(high_sales)
## [1] 25
table(high_sales$season)
## 
## WI SP SU AU 
##  3  2  8 12

In total, 25 weeks have sales above 700: 3 in winter, 2 in spring, 8 in summer and 12 in autumn. In addition to the plots above, this confirms that high sales volumes are concentrated in two seasons (autumn and summer).
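The counts are easier to compare as shares. A short sketch, using only the counts from the two season tables above (variable names are illustrative):

```r
# Counts copied from the two season tables above
weeks_per_season <- c(WI = 13, SP = 13, SU = 13, AU = 13)
high_weeks       <- c(WI = 3,  SP = 2,  SU = 8,  AU = 12)

# Share of each season's weeks that exceed 700
share_high_within_season <- high_weeks / weeks_per_season

# Share of all high-sales weeks that fall in each season
share_of_all_high <- prop.table(high_weeks)

print(round(share_high_within_season, 2))
print(round(share_of_all_high, 2))
```

Summer and autumn together account for 20 of the 25 high-sales weeks.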

1i. Correlation Analysis and Visualization

price_data <- sales_data[c("price_1", "price_2")]
cor_matrix <- cor(price_data)
cor_matrix
##             price_1     price_2
## price_1  1.00000000 -0.08811098
## price_2 -0.08811098  1.00000000
pairs(price_data, main = "Pairwise Correlation Plots")

Both the correlation matrix and the pairwise plot indicate a negative but very weak relationship between the two prices (price_1 and price_2): the correlation is about -0.088, so the two prices are nearly uncorrelated.
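Whether a correlation this small differs from zero can be judged with the usual t-test for a Pearson correlation. A sketch that uses only the reported correlation and the number of weeks (52):

```r
# Values taken from the output above
r <- -0.08811098   # correlation between price_1 and price_2
n <- 52            # number of weekly observations

# t statistic and two-sided p-value for H0: true correlation = 0
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(t_stat)
print(p_value)
```

On the actual data, `cor.test(sales_data$price_1, sales_data$price_2)` performs the same test directly.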

Part 2

2a. Model A (Only Price_1)

model_A <- lm(salesvolume ~ price_1, data = sales_data)
summary(model_A)$r.squared
## [1] 0.445421
# Scatter plot of sales volume against price_1, with the fitted Model A line
plot(sales_data$price_1, sales_data$salesvolume, 
     xlab = "Salmon Price (price_1)", 
     ylab = "Sales Volume (kg)",
     main = "Sales Volume vs Salmon Price (Model A)",
     pch = 16, col = "blue")
abline(model_A, col = "red", lwd = 2)

From the R output above, the R-squared for Model A is about 0.445, which means that price_1 alone explains about 44.5% of the variation in sales volume. The figure is a scatter plot of Y (sales volume) against X1 (price_1) with the fitted regression line.

2b. Model B (Both Prices)

model_B <- lm(salesvolume ~ price_1 + price_2, data = sales_data)
# Compare R-squared
cat("Model A R-squared:", summary(model_A)$r.squared, "\n")
## Model A R-squared: 0.445421
cat("Model B R-squared:", summary(model_B)$r.squared, "\n")
## Model B R-squared: 0.6736939
# Compare prediction errors
cat("Model A RMSE:", sqrt(mean(resid(model_A)^2)), "\n")
## Model A RMSE: 171.6334
cat("Model B RMSE:", sqrt(mean(resid(model_B)^2)), "\n")
## Model B RMSE: 131.6536

R-squared is about 0.445 in Model A and 0.674 in Model B, which indicates that adding price_2 improves the explanatory power of the model. The RMSE is also higher for Model A (171.63) than for Model B (131.65), which again indicates that Model B fits better than Model A.
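Because R-squared never decreases when a predictor is added, adjusted R-squared is a fairer basis for this comparison. A sketch computing it from the reported R-squared values, assuming n = 52 weeks (`adj_r2()` is a hypothetical helper):

```r
# Hypothetical helper: adjusted R-squared from R-squared,
# sample size n and number of predictors p (excluding the intercept)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_A <- adj_r2(0.445421,  n = 52, p = 1)  # Model A: price_1 only
adj_B <- adj_r2(0.6736939, n = 52, p = 2)  # Model B: price_1 + price_2

print(round(c(model_A = adj_A, model_B = adj_B), 4))
```

The value for Model B agrees with the adjusted R-squared printed by `summary(model_B)` in 2d, and Model B still comes out ahead after the adjustment.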

2c. Marginal Effects

coef_B <- coef(model_B)
cat("Effect of X1 increase by 10:", 10 * coef_B["price_1"], "\n")
## Effect of X1 increase by 10: -29.51258
cat("Effect of X2 increase by 10:", 10 * coef_B["price_2"], "\n")
## Effect of X2 increase by 10: 23.49731

An increase of 10 in X1 (the price of salmon) is estimated to decrease the sales volume of salmon (Y) by about 29.51, holding X2 constant. An increase of 10 in X2 (the price of the competing product) is estimated to increase the sales volume (Y) by about 23.50, holding X1 constant.

The signs of the marginal effects are as expected: when the price of the product (X1) increases, sales volume (demand) decreases, whereas an increase in the price of the competing product (X2) increases sales volume (demand) for the product.
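Since Model B is linear, the effect of any combination of price changes is the sum of the individual effects. A sketch based on the two effects printed above (`volume_change()` is a hypothetical helper):

```r
# Per-unit coefficients recovered from the "increase by 10" effects above
b1 <- -29.51258 / 10   # effect of a one-unit rise in price_1
b2 <-  23.49731 / 10   # effect of a one-unit rise in price_2

# Hypothetical helper: predicted change in sales volume for given price changes
volume_change <- function(d_price_1, d_price_2) b1 * d_price_1 + b2 * d_price_2

both_up_10 <- volume_change(10, 10)   # both prices rise by 10
print(both_up_10)
```

If both prices rise by 10, the two effects largely cancel, leaving a small net decrease in sales volume.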

2d. Significance of Prices

summary(model_B)
## 
## Call:
## lm(formula = salesvolume ~ price_1 + price_2, data = sales_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -240.43 -104.80   -4.91  102.66  322.72 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 805.6192   127.6614   6.311 7.76e-08 ***
## price_1      -2.9513     0.3868  -7.631 7.04e-10 ***
## price_2       2.3497     0.4013   5.855 3.91e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 135.6 on 49 degrees of freedom
## Multiple R-squared:  0.6737, Adjusted R-squared:  0.6604 
## F-statistic: 50.58 on 2 and 49 DF,  p-value: 1.213e-12

From the R output, the p-values Pr(>|t|) for the two prices (price_1 and price_2) are very close to zero, which indicates that both prices have a statistically significant effect on sales volume.
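Confidence intervals give the same conclusion in a more informative form. A sketch that rebuilds approximate 95% intervals from the estimates and standard errors in the summary above, using 49 residual degrees of freedom:

```r
# Estimates and standard errors copied from summary(model_B)
est <- c(price_1 = -2.9513, price_2 = 2.3497)
se  <- c(price_1 =  0.3868, price_2 = 0.4013)

# Critical t value for a 95% interval with 49 residual degrees of freedom
t_crit <- qt(0.975, df = 49)

ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 3))
```

Neither interval contains zero, in line with the small p-values. On the fitted model, `confint(model_B)` returns these intervals directly.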

2e. Verify Factor Conversion

levels(sales_data$season)
## [1] "WI" "SP" "SU" "AU"
is.factor(sales_data$season)
## [1] TRUE

The output confirms that season is a factor and that "WI" is its first level, so winter is the reference season.

2f. Model C (With Season)

model_C <- lm(salesvolume ~ price_1 + price_2 + season, data = sales_data)
summary(model_C)
## 
## Call:
## lm(formula = salesvolume ~ price_1 + price_2 + season, data = sales_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -143.419  -61.216    0.296   54.997  207.888 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 688.8533    86.1116   8.000 2.96e-10 ***
## price_1      -2.5668     0.2531 -10.141 2.61e-13 ***
## price_2       1.9898     0.2667   7.459 1.87e-09 ***
## seasonSP      7.3966    34.2678   0.216     0.83    
## seasonSU    199.0675    34.4973   5.771 6.39e-07 ***
## seasonAU    230.7019    35.2643   6.542 4.44e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 87.33 on 46 degrees of freedom
## Multiple R-squared:  0.873,  Adjusted R-squared:  0.8592 
## F-statistic: 63.24 on 5 and 46 DF,  p-value: < 2.2e-16

From the R output, seasonSP is not significant (p = 0.83), but the other two season effects are significant. Compared with winter and holding both prices constant, sales volume is about 199 higher in summer and about 230.7 higher in autumn.

2g. Predictions of Sales Volume for the Three Test Observations

test_data <- data.frame(
  price_1 = c(150, 200, 200),
  price_2 = c(250, 200, 200),
  season = factor(c("SU", "SU", "WI"), levels = c("WI", "SP", "SU", "AU"))
)

point_predictions <- predict(model_C, newdata = test_data)
results_2g <- cbind(test_data, predicted_sales = round(point_predictions))
print("Point predictions for test data:")
## [1] "Point predictions for test data:"
print(results_2g)
##   price_1 price_2 season predicted_sales
## 1     150     250     SU            1000
## 2     200     200     SU             773
## 3     200     200     WI             573

From the output, the predicted sales volume is 1000 for the first test observation, 773 for the second and 573 for the third.
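These predictions can be reproduced by hand from the Model C coefficients, which makes it clear how `predict()` combines the intercept, the two price terms and the season dummy. A sketch using the rounded coefficients from `summary(model_C)` (`predict_by_hand()` is a hypothetical helper):

```r
# Coefficients copied from summary(model_C); winter (WI) is the reference level
b <- c(intercept = 688.8533, price_1 = -2.5668, price_2 = 1.9898,
       seasonSP = 7.3966, seasonSU = 199.0675, seasonAU = 230.7019)

# Hypothetical helper: linear prediction for one observation
predict_by_hand <- function(price_1, price_2, season) {
  # The reference season adds nothing; other seasons add their dummy coefficient
  season_effect <- if (season == "WI") 0 else b[[paste0("season", season)]]
  b[["intercept"]] + b[["price_1"]] * price_1 + b[["price_2"]] * price_2 + season_effect
}

preds <- c(predict_by_hand(150, 250, "SU"),
           predict_by_hand(200, 200, "SU"),
           predict_by_hand(200, 200, "WI"))
print(round(preds))
```

The second and third observations differ only in season, so their predictions differ by exactly the seasonSU coefficient (about 199).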

2i. Predictions with 95% Prediction Intervals

interval_predictions <- predict(model_C, newdata = test_data, interval = "prediction")
results_2i <- cbind(test_data, round(interval_predictions))
print("Predictions with 95% prediction intervals:")
## [1] "Predictions with 95% prediction intervals:"
print(results_2i)
##   price_1 price_2 season  fit lwr  upr
## 1     150     250     SU 1000 814 1187
## 2     200     200     SU  773 590  955
## 3     200     200     WI  573 390  757

The output above shows the lower (lwr) and upper (upr) limits of the 95% prediction interval for each test observation.

2j. Prediction Interval Analysis (using the combined data frame of test data and predictions, rounded to integers)

avg_interval_width <- mean(results_2i$upr - results_2i$lwr)
summary(avg_interval_width)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   368.3   368.3   368.3   368.3   368.3   368.3
cat("Average prediction interval width:", avg_interval_width, "\n")
## Average prediction interval width: 368.3333
cat("Model C RMSE:", sqrt(mean(resid(model_C)^2)), "\n")
## Model C RMSE: 82.13386

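The 95% prediction intervals are on average about 368 wide, that is, roughly 184 on either side of the point prediction. The RMSE of Model C is about 82.13, which is lower than for Model A (171.63) and Model B (131.65).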
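The RMSE, the residual standard error and the interval width are all derived from the same residual sum of squares. A sketch of how they relate, assuming the Model C residual sum of squares of 350791 reported in the ANOVA table in 2l, n = 52 observations and 6 estimated coefficients:

```r
rss <- 350791   # residual sum of squares of Model C (from the ANOVA table in 2l)
n   <- 52       # number of observations
k   <- 6        # estimated coefficients: intercept, 2 prices, 3 season dummies

rmse <- sqrt(rss / n)         # divides by n, as in sqrt(mean(resid^2))
rse  <- sqrt(rss / (n - k))   # residual standard error, divides by n - k

# A 95% prediction interval cannot be narrower than this on either side,
# because the full formula adds the uncertainty of the fitted mean
half_width_floor <- qt(0.975, df = n - k) * rse

print(c(rmse = rmse, rse = rse, half_width_floor = half_width_floor))
```

The lower bound of about 176 on each side is consistent with the observed average half-width of about 184.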

2k. Seasonal Price Analysis: Check Whether Prices Vary by Season

seasonal_prices <- aggregate(cbind(price_1, price_2) ~ season, 
                            data = sales_data, 
                            FUN = mean)

print(seasonal_prices)
##   season  price_1  price_2
## 1     WI 229.0000 209.0000
## 2     SP 229.7692 205.1538
## 3     SU 212.8462 208.2308
## 4     AU 209.0000 235.1538

The average prices differ only moderately across seasons: price_1 ranges from 209 to about 230 and price_2 from about 205 to 235. Price changes alone therefore do not explain the seasonal differences in sales volume.
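How much of a seasonal gap is due to prices can be estimated by combining these seasonal mean prices with the Model C coefficients from 2f. A sketch for the autumn versus winter gap, using only numbers reported above (variable names are illustrative):

```r
# Model C coefficients (from 2f)
b_price_1 <- -2.5668
b_price_2 <-  1.9898
b_AU      <- 230.7019   # autumn effect relative to winter

# Differences in mean prices, autumn minus winter (from the table above)
d_price_1 <- 209.0000 - 229.0000
d_price_2 <- 235.1538 - 209.0000

# Part of the sales gap attributable to the price differences
price_part <- b_price_1 * d_price_1 + b_price_2 * d_price_2

# Raw gap in mean sales volume, autumn minus winter (from 1c)
raw_gap <- 851.0000 - 516.9231

print(round(c(price_part = price_part, season_part = b_AU, raw_gap = raw_gap), 1))
```

Under these figures, prices account for roughly 103 of the 334 difference in mean sales between autumn and winter, while the season effect itself accounts for about 231.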

2l. Seasonal Effect Investigation: Compare Models With and Without Season

model_season_prices <- lm(salesvolume ~ price_1 + price_2 + season, data = sales_data)
model_no_season <- lm(salesvolume ~ price_1 + price_2, data = sales_data)
print("ANOVA Comparison:")
## [1] "ANOVA Comparison:"
anova(model_no_season, model_season_prices)
## Analysis of Variance Table
## 
## Model 1: salesvolume ~ price_1 + price_2
## Model 2: salesvolume ~ price_1 + price_2 + season
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     49 901299                                  
## 2     46 350791  3    550508 24.063 1.633e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TSS <- sum((sales_data$salesvolume - mean(sales_data$salesvolume))^2)
R2_no_season <- 1 - (901299 / TSS)
print(R2_no_season)
## [1] 0.6736937
R2_with_season <- 1 - (350791 / TSS)
print(R2_with_season)
## [1] 0.8729996
improvement <- R2_with_season - R2_no_season
print(improvement)
## [1] 0.1993059

The F-statistic in anova result is 24.063, which is large, indicating that the improvement in model fit by adding season is substantial.
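The F-statistic can be reproduced from the two residual sums of squares in the ANOVA table, which shows what the test measures. A sketch using only values from that table:

```r
# Values copied from the ANOVA table above
rss_no_season <- 901299   # model with prices only
rss_season    <- 350791   # model with prices and season
df_extra      <- 3        # number of added season dummies
df_resid      <- 46       # residual degrees of freedom of the larger model

# Reduction in RSS per added parameter, relative to the remaining error variance
f_stat  <- ((rss_no_season - rss_season) / df_extra) / (rss_season / df_resid)
p_value <- pf(f_stat, df_extra, df_resid, lower.tail = FALSE)

print(f_stat)
print(p_value)
```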

Calculating R-squared for the models with and without season gives about 0.674 (67.4%) without season and 0.873 (87.3%) with season. Including the seasonal effect therefore raises R-squared by about 0.199, or 19.9 percentage points.

So sales volume depends strongly on the season, and not only on price changes.

Summary

The sales volume of salmon (in kg) differs significantly from season to season. From the tests above, we conclude that demand for salmon depends strongly on seasonality and not only on the price level. Autumn is the season with the highest demand (sales volume) for salmon, followed by summer, while demand in the other two seasons is considerably lower. As a recommendation, producing salmon is more attractive and profitable in autumn and summer, other things being equal.