library(tidyverse)
# Import the dataset
shopping <- read_csv("shopping_behavior_updated (1).csv")
# View first few rows
head(shopping)
## # A tibble: 6 × 18
##   `Customer ID`   Age Gender `Item Purchased` Category `Purchase Amount (USD)`
##           <dbl> <dbl> <chr>  <chr>            <chr>                      <dbl>
## 1             1    55 Male   Blouse           Clothing                      53
## 2             2    19 Male   Sweater          Clothing                      64
## 3             3    50 Male   Jeans            Clothing                      73
## 4             4    21 Male   Sandals          Footwear                      90
## 5             5    45 Male   Blouse           Clothing                      49
## 6             6    46 Male   Sneakers         Footwear                      20
## # ℹ 12 more variables: Location <chr>, Size <chr>, Color <chr>, Season <chr>,
## # `Review Rating` <dbl>, `Subscription Status` <chr>, `Shipping Type` <chr>,
## # `Discount Applied` <chr>, `Promo Code Used` <chr>,
## # `Previous Purchases` <dbl>, `Payment Method` <chr>,
## # `Frequency of Purchases` <chr>
avg_spending <- shopping %>%
  group_by(Gender) %>%
  summarise(
    average_spent = mean(`Purchase Amount (USD)`, na.rm = TRUE),
    median_spent = median(`Purchase Amount (USD)`, na.rm = TRUE),
    count = n()
  )
avg_spending
## # A tibble: 2 × 4
##   Gender average_spent median_spent count
##   <chr>          <dbl>        <dbl> <int>
## 1 Female          60.2           60  1248
## 2 Male            59.5           60  2652
shopping %>%
  group_by(Gender) %>%
  summarise(average_spent = mean(`Purchase Amount (USD)`, na.rm = TRUE)) %>%
  ggplot(aes(x = Gender, y = average_spent, fill = Gender)) +
  geom_col() +
  labs(
    title = "Average Purchase Amount by Gender",
    x = "Gender",
    y = "Average Purchase Amount (USD)"
  ) +
  theme_minimal()
This bar graph compares the average purchase amount of female and male customers. The two groups are almost identical, with averages of about $60.2 for females and $59.5 for males.
t_test_result <- t.test(`Purchase Amount (USD)` ~ Gender, data = shopping)
t_test_result
##
## Welch Two Sample t-test
##
## data: Purchase Amount (USD) by Gender
## t = 0.88214, df = 2479.1, p-value = 0.3778
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
## -0.8719417 2.2979409
## sample estimates:
## mean in group Female mean in group Male
## 60.2492 59.5362
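The t-test agrees with the average spending analysis: the difference in mean purchase amount between females and males is not statistically significant (t = 0.88, p = 0.378, above the 0.05 threshold), and the 95% confidence interval for the difference (about -$0.87 to $2.30) includes zero.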
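To make the reported `t` and `df` values less of a black box, here is a minimal sketch of how Welch's statistic and its Welch-Satterthwaite degrees of freedom can be computed by hand. The helper `welch_t()` and the simulated vectors `female` and `male` are assumptions made for illustration; they are not part of the analysis above and do not use the shopping data.

```r
# Welch's t statistic, degrees of freedom, and two-sided p-value by hand.
# welch_t() is a hypothetical helper written for this illustration.
welch_t <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  vx <- var(x) / nx  # squared standard error of mean(x)
  vy <- var(y) / ny  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))
  p_value <- 2 * pt(-abs(t_stat), df)
  list(t = t_stat, df = df, p_value = p_value)
}

# Simulated data for illustration only (NOT the shopping dataset)
set.seed(1)
female <- rnorm(40, mean = 60, sd = 24)
male   <- rnorm(80, mean = 60, sd = 24)

by_hand  <- welch_t(female, male)
built_in <- t.test(female, male)  # Welch is the default (var.equal = FALSE)

by_hand$t
unname(built_in$statistic)
```

The two printed values should agree, which shows that the non-integer `df` in the output above comes from the unequal-variance correction rather than from the raw sample sizes.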
ggplot(shopping, aes(x = Gender, y = `Purchase Amount (USD)`, fill = Gender)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Spending Distribution by Gender",
    x = "Gender",
    y = "Purchase Amount (USD)"
  ) +
  theme_minimal()
The boxplot shows both groups with the same median of about $60. Looking closely, purchase amounts are slightly higher for females and slightly lower for males; however, as the t-test above shows, the difference is not statistically significant.
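The numbers behind a boxplot can also be read off directly instead of estimated by eye. The sketch below computes the quartiles per group with base R; the `amount` and `gender` vectors are made-up values for illustration, not the shopping data.

```r
# Made-up illustrative data (NOT the shopping dataset)
amount <- c(20, 35, 50, 60, 75, 90, 100, 25, 40, 60, 62, 80, 95)
gender <- c(rep("Female", 7), rep("Male", 6))

# Lower quartile, median, and upper quartile for each group.
# These correspond to the bottom, middle line, and top of each box.
quartiles <- tapply(amount, gender, quantile, probs = c(0.25, 0.5, 0.75))
quartiles
```

Applied to the real data, the same call would take `shopping$"Purchase Amount (USD)"` and `shopping$Gender` as its first two arguments.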
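# Note: `Frequency of Purchases` is a character column, so mean(), median(), and
# sd() cannot summarise it; they return NA with the warnings shown below.
# The chi-square test further down treats it as a categorical variable instead.
shopping %>%
  group_by(Gender) %>%
  summarise(
    avg_frequency = mean(`Frequency of Purchases`, na.rm = TRUE),
    median_frequency = median(`Frequency of Purchases`, na.rm = TRUE),
    sd_frequency = sd(`Frequency of Purchases`, na.rm = TRUE),
    count = n()
  )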
## Warning: There were 6 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `avg_frequency = mean(`Frequency of Purchases`, na.rm = TRUE)`.
## ℹ In group 1: `Gender = "Female"`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 5 remaining warnings.
## # A tibble: 2 × 5
##   Gender avg_frequency median_frequency sd_frequency count
##   <chr>          <dbl>            <dbl>        <dbl> <int>
## 1 Female            NA               NA           NA  1248
## 2 Male              NA               NA           NA  2652
# Keep rows with a known gender and a non-missing purchase frequency
shopping_freq <- shopping %>%
  filter(Gender %in% c("Male", "Female")) %>%
  filter(!is.na(`Frequency of Purchases`))
# Create contingency table
freq_table <- table(shopping_freq$Gender, shopping_freq$`Frequency of Purchases`)
# Chi-square test
chisq_freq_test <- chisq.test(freq_table)
chisq_freq_test
##
## Pearson's Chi-squared test
##
## data: freq_table
## X-squared = 3.928, df = 6, p-value = 0.6864
ggplot(shopping_freq, aes(x = `Frequency of Purchases`, fill = Gender)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Purchase Frequency by Gender",
    x = "Frequency Category",
    y = "Count"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
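This bar graph shows higher counts for males than for females in every frequency category. That reflects the makeup of the sample (2,652 male vs. 1,248 female customers) rather than a difference in behavior: the chi-square test above (X-squared = 3.93, df = 6, p = 0.686) finds no statistically significant association between gender and purchase frequency.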
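When the groups differ in size, raw counts in a dodged bar chart are hard to compare. One option is to look at within-gender proportions instead. The sketch below uses a tiny made-up dataset; the category labels `"Weekly"`, `"Monthly"`, and `"Annually"` are placeholders and are not claimed to match the levels in the shopping data.

```r
# Made-up data for illustration (NOT the shopping dataset):
# twice as many males as females, identical behavior in both groups
gender <- c(rep("Male", 6), rep("Female", 3))
frequency <- c("Weekly", "Weekly", "Monthly", "Monthly", "Annually", "Annually",
               "Weekly", "Monthly", "Annually")

counts <- table(gender, frequency)

# margin = 1 divides each row by its row total,
# giving the share of each frequency category within a gender
row_props <- prop.table(counts, margin = 1)
round(row_props, 2)
```

In this toy example the male counts are double the female counts in every category, yet the row proportions are identical, which is the situation the chi-square test is designed to detect.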
shopping %>%
  count(Gender, Category) %>%
  ggplot(aes(x = Category, y = n, fill = Gender)) +
  geom_col(position = "dodge") +
  labs(
    title = "Product Category Purchases by Gender",
    x = "Product Category",
    y = "Count"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
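This bar graph shows that males account for more purchases than females in every product category, which is expected because the dataset contains roughly twice as many male customers (2,652 vs. 1,248). The chi-square test below (p = 0.897) indicates no statistically significant association between gender and product category.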
chisq.test(table(shopping$Gender, shopping$Category))
##
## Pearson's Chi-squared test
##
## data: table(shopping$Gender, shopping$Category)
## X-squared = 0.59842, df = 3, p-value = 0.8968
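For readers who want to see where `X-squared` and `df` come from, here is a minimal sketch of Pearson's chi-square statistic computed by hand and checked against `chisq.test()`. The `observed` table holds made-up counts with placeholder category names `A`, `B`, and `C`; it is not derived from the shopping data.

```r
# Made-up 2 x 3 contingency table (NOT the shopping dataset)
observed <- matrix(c(30, 20, 50,
                     35, 25, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(gender = c("Female", "Male"),
                                   category = c("A", "B", "C")))

# Expected counts if gender and category were independent:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic, degrees of freedom, and upper-tail p-value
x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_squared, df, lower.tail = FALSE)

built_in <- chisq.test(observed)
c(by_hand = x_squared, built_in = unname(built_in$statistic))
```

The expected counts scale with the row totals, so a larger group contributes larger counts without that alone producing a large statistic.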
shopping %>%
  group_by(Season) %>%
  summarise(
    avg_spending = mean(`Purchase Amount (USD)`, na.rm = TRUE),
    median_spending = median(`Purchase Amount (USD)`, na.rm = TRUE),
    count = n()
  )
## # A tibble: 4 × 4
##   Season avg_spending median_spending count
##   <chr>         <dbl>           <dbl> <int>
## 1 Fall           61.6              62   975
## 2 Spring         58.7              58   999
## 3 Summer         58.4              58   955
## 4 Winter         60.4              62   971
shopping %>%
  group_by(Season) %>%
  summarise(avg_spending = mean(`Purchase Amount (USD)`, na.rm = TRUE)) %>%
  ggplot(aes(x = Season, y = avg_spending, fill = Season)) +
  geom_col() +
  labs(
    title = "Average Purchase Amount by Season",
    x = "Season",
    y = "Average Purchase Amount (USD)"
  ) +
  theme_minimal()
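The bar graph shows slightly higher average spending in fall ($61.6) and winter ($60.4) than in spring ($58.7) and summer ($58.4). The differences are only about $2 to $3 and were not tested for statistical significance, so the pattern should be read as tentative. If it holds, it could be due to end-of-summer sales during the fall and holiday shopping during the winter, and businesses could use it to forecast revenue and to promote higher-margin or bundled products in those seasons.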
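Whether seasonal averages differ by more than chance can be checked with a one-way ANOVA, in the same spirit as the t-test used for gender. The sketch below runs on simulated data in which every season has the same true mean; the `season` and `amount` vectors are assumptions for illustration and are not the shopping data.

```r
# Simulated data for illustration (NOT the shopping dataset):
# 50 purchases per season, all drawn from the same distribution
set.seed(42)
season <- factor(rep(c("Fall", "Spring", "Summer", "Winter"), each = 50))
amount <- rnorm(200, mean = 60, sd = 24)

# One-way ANOVA: does mean purchase amount differ across seasons?
fit <- aov(amount ~ season)
anova_table <- summary(fit)[[1]]

# p-value for the season effect (first row of the ANOVA table)
p_value <- anova_table[["Pr(>F)"]][1]
anova_table
```

On the real data, a comparable call would presumably be ``aov(`Purchase Amount (USD)` ~ Season, data = shopping)``; a p-value above 0.05 would mean the seasonal gaps in the bar graph are within what random variation could produce.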