setwd("C:/Users/Kiki/Documents/Adv_Data_Project")
data <- read.csv("shopping_trends.csv")
Question 1: Average Customer Spending by Season
# Average spending by season
season_means <- tapply(data$Purchase.Amount..USD., data$Season, mean, na.rm = TRUE)
bp <- barplot(season_means,
main = "Average Purchase Amount by Season",
xlab = "Season",
ylab = "Average Purchase Amount (USD)",
col = "pink",
border = "black")

This plot gives a brief overview of the average amount customers
spent during each season.
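
Because the four bars are close in height, printing each value above its bar can make the plot easier to read. The sketch below is one way to do that, using the bar midpoints that barplot() returns. To stay self-contained it re-enters the rounded season means from this report by hand; the name `means_for_plot` is only illustrative.

```r
# Season means copied (rounded) from the output in this report
means_for_plot <- c(Fall = 61.56, Spring = 58.74, Summer = 58.41, Winter = 60.36)

# barplot() returns the x position of each bar's midpoint
mids <- barplot(means_for_plot,
                main = "Average Purchase Amount by Season",
                xlab = "Season",
                ylab = "Average Purchase Amount (USD)",
                col = "pink",
                border = "black",
                ylim = c(0, max(means_for_plot) * 1.15))  # headroom for labels

# Write each mean just above its bar
text(x = mids, y = means_for_plot, labels = means_for_plot, pos = 3)
```

With the real data, the same text() call should work on the existing `bp` and `season_means` objects, provided `ylim` leaves room above the tallest bar.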
Question 2: Spending by Season – Mean and Standard Deviation
# Mean
season_means <- tapply(data$Purchase.Amount..USD., data$Season, mean, na.rm = TRUE)
# Standard deviation
season_sd <- tapply(data$Purchase.Amount..USD., data$Season, sd, na.rm = TRUE)
season_means
## Fall Spring Summer Winter
## 61.55692 58.73774 58.40524 60.35736
season_sd
## Fall Spring Summer Winter
## 23.74502 23.93585 23.47058 23.47548
These numbers show that average spending is slightly higher in the
colder months: Fall (61.56 USD) and Winter (60.36 USD) sit about 2 to 3
USD above Spring (58.74 USD) and Summer (58.41 USD), as our bar plot
also shows. The standard deviations are nearly identical across seasons
(about 23.5 to 23.9 USD), so the spread of purchase amounts does not
differ much from season to season.
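
To put "slightly higher" in context, the gap between the highest and lowest season means can be compared with the typical standard deviation. This is a rough sketch that uses only the values printed above; the names `means_printed` and `sd_printed` are illustrative, and averaging the four standard deviations is a simplification rather than a formal pooled estimate.

```r
# Values copied from the printed output above
means_printed <- c(Fall = 61.55692, Spring = 58.73774,
                   Summer = 58.40524, Winter = 60.35736)
sd_printed <- c(Fall = 23.74502, Spring = 23.93585,
                Summer = 23.47058, Winter = 23.47548)

# Gap between the highest (Fall) and lowest (Summer) season means, in USD
gap <- max(means_printed) - min(means_printed)

# The same gap expressed as a fraction of the average standard deviation
gap_in_sd <- gap / mean(sd_printed)

round(gap, 2)        # about 3.15 USD
round(gap_in_sd, 2)  # about 0.13 standard deviations
```

The largest seasonal gap is only a small fraction of the spread within any single season.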
Question 3: Statistical Test – Spending by Season
# One-way ANOVA
anova_model <- aov(Purchase.Amount..USD. ~ Season, data = data)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Season 3 6291 2097.1 3.746 0.0106 *
## Residuals 3896 2181039 559.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These ANOVA results show that the effect of season on average
spending is statistically significant (F = 3.746, p = 0.0106). This
tells us that at least one season has a different average purchase
amount from the others, although the ANOVA alone does not say which
one. Our bar plot and calculations show that the higher averages occur
in Fall and Winter.
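
As a check on the table above, the F value and p-value can be rebuilt from the printed sums of squares, along with a rough effect size. This sketch uses only the numbers in the ANOVA output; eta squared (the share of total variation explained by season) is an addition not computed in the original analysis, and all variable names are illustrative.

```r
# Values copied from the ANOVA table above
ss_season <- 6291
ss_resid  <- 2181039
df_season <- 3
df_resid  <- 3896

# F is the ratio of the two mean squares
f_value <- (ss_season / df_season) / (ss_resid / df_resid)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_season, df_resid, lower.tail = FALSE)

# Eta squared: share of total variation attributable to season
eta_sq <- ss_season / (ss_season + ss_resid)

round(f_value, 3)
round(p_value, 4)
round(eta_sq, 4)   # about 0.0029, i.e. roughly 0.3% of the variation
```

The result is statistically significant, but season accounts for only a very small share of the variation in purchase amounts. To see which pairs of seasons differ, a post-hoc comparison such as TukeyHSD(anova_model) from base R could be run on the fitted model.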
Question 4: Histogram of Purchase Amounts
hist(data$Purchase.Amount..USD.,
main = "Distribution of Purchase Amounts",
xlab = "Purchase Amount (USD)",
ylab = "Frequency",
col = "lavender",
breaks = 30) # adjust number of bins if needed

Purchase amounts are spread roughly evenly between 20 and 100
dollars, which we attribute to this being an AI-generated dataset.
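
The "evenly distributed" impression from the histogram could be checked more formally with a chi-square goodness-of-fit test on binned counts. The sketch below is one possible approach, not part of the original analysis. The helper `uniformity_check` is hypothetical, it assumes whole-dollar amounts between 20 and 100, and it is demonstrated on simulated stand-in data rather than on shopping_trends.csv.

```r
# Hypothetical helper: bin the amounts into equal-width bins and test
# whether the bin counts are compatible with equal probabilities.
# Assumes whole-dollar amounts in the range 20 to 100.
uniformity_check <- function(amounts) {
  breaks <- seq(19.5, 100.5, by = 9)  # 9 bins, each covering 9 dollar values
  counts <- table(cut(amounts, breaks = breaks))
  list(counts = counts, test = chisq.test(counts))
}

# Simulated stand-in data (NOT the real dataset): 3,900 uniform draws
set.seed(1)
simulated <- sample(20:100, 3900, replace = TRUE)

result <- uniformity_check(simulated)
result$counts
result$test$p.value
```

On the real data the call would be uniformity_check(data$Purchase.Amount..USD.). A large p-value would be consistent with a uniform spread, while a small one would suggest the bins are not equally filled.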
Question 5: Statistical t-test
# Spending by gender
t_test_result <- t.test(Purchase.Amount..USD. ~ Gender, data = data)
t_test_result
##
## Welch Two Sample t-test
##
## data: Purchase.Amount..USD. by Gender
## t = 0.88214, df = 2479.1, p-value = 0.3778
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -0.8719417 2.2979409
## sample estimates:
## mean in group Female mean in group Male
## 60.2492 59.5362
With a p-value of 0.378, the t-test result is not statistically
significant, which means we found no significant difference in purchase
amount between genders. The 95 percent confidence interval for the
difference in means (-0.87 to 2.30 USD) includes zero. This is in line
with the roughly uniform distribution that our histogram showed us.
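
To show where the p-value and confidence interval come from, they can be rebuilt from the t statistic, degrees of freedom and group means printed above. This is a sketch using only those printed values. The standard error is backed out as difference divided by t, and all variable names are illustrative.

```r
# Values copied from the Welch t-test output above
t_stat      <- 0.88214
df_welch    <- 2479.1
mean_female <- 60.2492
mean_male   <- 59.5362

# Difference in means and the standard error implied by the t statistic
diff_means <- mean_female - mean_male
std_error  <- diff_means / t_stat

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

# 95 percent confidence interval for the difference in means
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * std_error

round(p_value, 4)
round(ci, 2)
```

The interval spans zero, which is another way of seeing that the observed difference of about 0.71 USD is within what chance alone could produce.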