setwd("C:/Users/Kiki/Documents/Adv_Data_Project")
data <- read.csv("shopping_trends.csv")

Question 1: Average Customer Spending by Season

# Average spending by season
season_means <- tapply(data$Purchase.Amount..USD., data$Season, mean, na.rm = TRUE)

bp <- barplot(season_means,
              main = "Average Purchase Amount by Season",
              xlab = "Season",
              ylab = "Average Purchase Amount (USD)",
              col = "pink",
              border = "black")

This plot gives a brief overview of the average amount customers spent during each season.
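The `bp` object returned by `barplot()` holds the x-positions of the bar midpoints, so it can be used to print each mean above its bar. Below is a minimal sketch of that idea. The season means are typed in from the output reported in this analysis so the snippet runs without the CSV, and the `ylim` padding and the name `bar_labels` are illustrative choices, not part of the original code.

```r
# Season means copied from the tapply() output in this report
# (hard-coded only so the example runs without the CSV file).
season_means <- c(Fall = 61.55692, Spring = 58.73774,
                  Summer = 58.40524, Winter = 60.35736)

# barplot() returns the x-coordinates of the bar midpoints.
bp <- barplot(season_means,
              main = "Average Purchase Amount by Season",
              xlab = "Season",
              ylab = "Average Purchase Amount (USD)",
              col = "pink",
              border = "black",
              ylim = c(0, max(season_means) * 1.15))  # headroom for the labels

# Print each mean, rounded to two decimals, just above its bar.
bar_labels <- format(round(season_means, 2), nsmall = 2)
text(x = bp, y = season_means, labels = bar_labels, pos = 3)
```

With the labels in place, the small differences between seasons are easier to read than from bar heights alone.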

Question 2: Spending by Season – Mean and Standard Deviation

# Mean
season_means <- tapply(data$Purchase.Amount..USD., data$Season, mean, na.rm = TRUE)

# Standard deviation
season_sd <- tapply(data$Purchase.Amount..USD., data$Season, sd, na.rm = TRUE)

season_means
##     Fall   Spring   Summer   Winter 
## 61.55692 58.73774 58.40524 60.35736
season_sd
##     Fall   Spring   Summer   Winter 
## 23.74502 23.93585 23.47058 23.47548
Based on these numbers, average spending is slightly higher in the colder seasons: about 61.56 USD in Fall and 60.36 USD in Winter, compared with 58.74 USD in Spring and 58.41 USD in Summer, which matches our bar plot. The standard deviations are nearly identical across seasons (roughly 23.5 to 23.9 USD), so the spread of purchase amounts does not differ much from season to season.
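To put "slightly higher" and "roughly the same" into numbers, the reported means and standard deviations can be compared directly. This is a rough sketch using the values printed above. The colder/warmer grouping and the variable names are assumptions made for illustration, and the averages are simple averages of the season means, not weighted by the number of purchases in each season.

```r
# Values copied from the season_means and season_sd output in this report.
season_means <- c(Fall = 61.55692, Spring = 58.73774,
                  Summer = 58.40524, Winter = 60.35736)
season_sd <- c(Fall = 23.74502, Spring = 23.93585,
               Summer = 23.47058, Winter = 23.47548)

# Gap between the colder seasons and the warmer seasons
# (unweighted average of the season means).
cold_warm_gap <- mean(season_means[c("Fall", "Winter")]) -
  mean(season_means[c("Spring", "Summer")])

# Largest standard deviation relative to the smallest one.
sd_ratio <- max(season_sd) / min(season_sd)

# The same gap expressed in units of a typical standard deviation.
gap_in_sd_units <- cold_warm_gap / mean(season_sd)

round(c(gap_usd = cold_warm_gap,
        sd_ratio = sd_ratio,
        gap_in_sd = gap_in_sd_units), 3)
```

The gap comes out at about 2.39 USD, roughly a tenth of a standard deviation, and the largest standard deviation is only about 2% bigger than the smallest.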

Question 3: Statistical Test – Spending by Season

# One-way ANOVA
anova_model <- aov(Purchase.Amount..USD. ~ Season, data = data)
summary(anova_model)
##               Df  Sum Sq Mean Sq F value Pr(>F)  
## Season         3    6291  2097.1   3.746 0.0106 *
## Residuals   3896 2181039   559.8                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With these ANOVA results (F = 3.746, p = 0.0106), we can say that the effect of season on average spending is statistically significant at the 0.05 level. This tells us that at least one season has a significantly different average purchase amount from the others, although the ANOVA alone does not identify which seasons differ. Our bar plot and calculations show that the highest average spending occurs in Fall and Winter.
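Statistical significance does not say how large the seasonal effect is. As a rough check, the F value and p-value can be rebuilt from the sums of squares in the ANOVA table, and the share of variance explained by season (eta squared) can be computed from the same numbers. The sketch below uses only the rounded values printed in the table, so small rounding differences are expected. The variable names are illustrative.

```r
# Values copied from the summary(anova_model) table in this report.
ss_season <- 6291      # Sum Sq for Season
ss_resid  <- 2181039   # Sum Sq for Residuals
df_season <- 3
df_resid  <- 3896

# Mean squares and the F statistic, as aov() computes them.
ms_season <- ss_season / df_season
ms_resid  <- ss_resid / df_resid
f_value   <- ms_season / ms_resid

# Upper-tail probability of the F distribution gives the p-value.
p_value <- pf(f_value, df_season, df_resid, lower.tail = FALSE)

# Eta squared: proportion of total variance explained by Season.
eta_sq <- ss_season / (ss_season + ss_resid)

c(F = f_value, p = p_value, eta_sq = eta_sq)
```

Eta squared is about 0.003, meaning season accounts for well under 1% of the variation in purchase amounts. To see which pairs of seasons differ, one option is `TukeyHSD(anova_model)` from base R, which runs pairwise comparisons on the fitted model.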

Question 4: Histogram of Purchase Amounts

hist(data$Purchase.Amount..USD.,
     main = "Distribution of Purchase Amounts",
     xlab = "Purchase Amount (USD)",
     ylab = "Frequency",
     col = "lavender",
     breaks = 30)  # adjust number of bins if needed

Because this is an AI-generated dataset, the purchase amounts are distributed roughly evenly between 20 and 100 dollars.
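One way to check the "evenly distributed" impression more formally is a chi-squared goodness-of-fit test against equal counts for every amount. The sketch below runs on simulated stand-in data, not on the real file: it assumes purchase amounts are whole dollars from 20 to 100 and uses 3900 rows (the total implied by the ANOVA degrees of freedom). The names `amounts`, `counts`, and `uniform_test` are hypothetical.

```r
# Simulated stand-in for data$Purchase.Amount..USD. (assumption:
# whole-dollar amounts between 20 and 100, 3900 purchases).
set.seed(42)
amounts <- sample(20:100, size = 3900, replace = TRUE)

# Count how often each amount occurs, keeping amounts with zero counts.
counts <- table(factor(amounts, levels = 20:100))

# With a one-dimensional table, chisq.test() tests against equal
# expected counts in every category by default.
uniform_test <- chisq.test(counts)
uniform_test$p.value
```

On the real data, `amounts` would be replaced by `data$Purchase.Amount..USD.`. A large p-value would be consistent with a uniform distribution, while a small one would suggest the amounts are not spread evenly.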

Question 5: Statistical t-test

# Spending by gender
t_test_result <- t.test(Purchase.Amount..USD. ~ Gender, data = data)

t_test_result
## 
##  Welch Two Sample t-test
## 
## data:  Purchase.Amount..USD. by Gender
## t = 0.88214, df = 2479.1, p-value = 0.3778
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -0.8719417  2.2979409
## sample estimates:
## mean in group Female   mean in group Male 
##              60.2492              59.5362
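The t-test gives a p-value of 0.378, which is not statistically significant, so we find no evidence that average purchase amount differs by gender. The 95% confidence interval for the difference in means (about -0.87 to 2.30 USD) includes zero, which points to the same conclusion. This result is in line with the roughly uniform distribution that our histogram showed.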
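The pieces of the t-test output fit together in a simple way, which can be seen by rebuilding the p-value and confidence interval from the reported statistics. The following is a sketch based only on the rounded numbers printed above. The standard error is backed out from the difference in means and the t statistic, and the variable names are illustrative.

```r
# Values copied from the Welch t-test output in this report.
mean_female <- 60.2492
mean_male   <- 59.5362
t_stat      <- 0.88214
df_welch    <- 2479.1

# Difference in means and the standard error implied by t = diff / se.
diff_means <- mean_female - mean_male
se_diff    <- diff_means / t_stat

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# 95% confidence interval: difference +/- critical value * standard error.
ci <- diff_means + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

list(difference = diff_means, p_value = p_value, conf_int = ci)
```

The difference of about 0.71 USD is smaller than its own standard error of about 0.81 USD, which is why the interval comfortably spans zero.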