
Import Dataset

library(tidyverse)

# Import the dataset
shopping <- read_csv("shopping_behavior_updated (1).csv")

# View first few rows
head(shopping)
## # A tibble: 6 × 18
##   `Customer ID`   Age Gender `Item Purchased` Category `Purchase Amount (USD)`
##           <dbl> <dbl> <chr>  <chr>            <chr>                      <dbl>
## 1             1    55 Male   Blouse           Clothing                      53
## 2             2    19 Male   Sweater          Clothing                      64
## 3             3    50 Male   Jeans            Clothing                      73
## 4             4    21 Male   Sandals          Footwear                      90
## 5             5    45 Male   Blouse           Clothing                      49
## 6             6    46 Male   Sneakers         Footwear                      20
## # ℹ 12 more variables: Location <chr>, Size <chr>, Color <chr>, Season <chr>,
## #   `Review Rating` <dbl>, `Subscription Status` <chr>, `Shipping Type` <chr>,
## #   `Discount Applied` <chr>, `Promo Code Used` <chr>,
## #   `Previous Purchases` <dbl>, `Payment Method` <chr>,
## #   `Frequency of Purchases` <chr>

Average Spending by Gender

avg_spending <- shopping %>%
  group_by(Gender) %>%
  summarise(
    average_spent = mean(`Purchase Amount (USD)`, na.rm = TRUE),
    median_spent = median(`Purchase Amount (USD)`, na.rm = TRUE),
    count = n()
  )

avg_spending
## # A tibble: 2 × 4
##   Gender average_spent median_spent count
##   <chr>          <dbl>        <dbl> <int>
## 1 Female          60.2           60  1248
## 2 Male            59.5           60  2652
shopping %>%
  group_by(Gender) %>%
  summarise(average_spent = mean(`Purchase Amount (USD)`, na.rm = TRUE)) %>%
  ggplot(aes(x = Gender, y = average_spent, fill = Gender)) +
  geom_col() +
  labs(
    title = "Average Purchase Amount by Gender",
    x = "Gender",
    y = "Average Purchase Amount (USD)"
  ) +
  theme_minimal()

This bar graph compares the average purchase amount of female and male customers. The two groups are almost identical, each averaging about $60 ($60.2 for females and $59.5 for males).

Do Men and Women Spend Differently?

t_test_result <- t.test(`Purchase Amount (USD)` ~ Gender, data = shopping)
t_test_result
## 
##  Welch Two Sample t-test
## 
## data:  Purchase Amount (USD) by Gender
## t = 0.88214, df = 2479.1, p-value = 0.3778
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -0.8719417  2.2979409
## sample estimates:
## mean in group Female   mean in group Male 
##              60.2492              59.5362

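The Welch two-sample t-test confirms the average spending analysis. The difference in mean purchase amount between female and male customers (about $0.71) is not statistically significant (t = 0.88, p = 0.378 > 0.05), and the 95% confidence interval for the difference (-0.87 to 2.30) includes zero.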
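To make the t-test less of a black box, the sketch below recomputes Welch's t statistic and its Welch-Satterthwaite degrees of freedom by hand and checks them against `t.test()`. The two vectors are hypothetical toy values, not the shopping data, and `welch_t()` is an illustrative helper rather than a function from any package. With the formula interface used above, R orders the groups alphabetically, so the reported t is based on mean(Female) minus mean(Male), which is why it is positive.

```r
# Hypothetical toy purchase amounts (USD); not the shopping dataset
female <- c(53, 64, 73, 90, 49, 20, 85, 61)
male   <- c(58, 47, 66, 72, 39, 55, 60, 81, 44, 70)

# Illustrative helper (not from any package): Welch's t statistic and the
# Welch-Satterthwaite degrees of freedom for two independent samples
welch_t <- function(x, y) {
  se2_x <- var(x) / length(x)  # squared standard error of mean(x)
  se2_y <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
  dof <- (se2_x + se2_y)^2 /
    (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))
  c(t = t_stat, df = dof)
}

manual <- welch_t(female, male)

# t.test() runs Welch's unequal-variance test by default (var.equal = FALSE)
builtin <- t.test(female, male)

manual
c(t = unname(builtin$statistic), df = unname(builtin$parameter))
```

Because the variances are not pooled, the degrees of freedom are usually fractional, which matches the df = 2479.1 reported above.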

ggplot(shopping, aes(x = Gender, y = `Purchase Amount (USD)`, fill = Gender)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Spending Distribution by Gender",
    x = "Gender",
    y = "Purchase Amount (USD)"
  ) +
  theme_minimal()

The boxplot shows that both groups have the same median purchase amount of $60. Looking closely, the female distribution sits slightly higher than the male distribution, but, as the t-test shows, this difference is not statistically significant.
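The numbers behind a boxplot can also be read off directly. A minimal sketch, using hypothetical toy amounts rather than the shopping data, computes the quartiles (the edges of the box) and the median (the line inside it) for each group with `tapply()`.

```r
# Hypothetical toy purchase amounts (USD) for two groups; not the real dataset
amounts <- c(53, 64, 73, 90, 49, 20, 85,
             61, 58, 47, 66, 72, 39, 55)
gender  <- rep(c("Female", "Male"), each = 7)

# First quartile, median, and third quartile per group
quartiles <- tapply(amounts, gender, quantile, probs = c(0.25, 0.5, 0.75))
quartiles

# Medians alone, for a quick side-by-side comparison
medians <- tapply(amounts, gender, median)
medians
```

On the real data, replacing `amounts` and `gender` with `` shopping$`Purchase Amount (USD)` `` and `shopping$Gender` gives the exact values the boxplot is drawn from.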

Compare Purchase Frequency by Gender

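# Note: `Frequency of Purchases` is a text (<chr>) column, so mean(), median(),
# and sd() cannot summarise it and return NA with warnings (see output below)
shopping %>%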
  group_by(Gender) %>%
  summarise(
    avg_frequency = mean(`Frequency of Purchases`, na.rm = TRUE),
    median_frequency = median(`Frequency of Purchases`, na.rm = TRUE),
    sd_frequency = sd(`Frequency of Purchases`, na.rm = TRUE),
    count = n()
  )
## Warning: There were 6 warnings in `summarise()`.
## The first warning was:
## ℹ In argument: `avg_frequency = mean(`Frequency of Purchases`, na.rm = TRUE)`.
## ℹ In group 1: `Gender = "Female"`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 5 remaining warnings.
## # A tibble: 2 × 5
##   Gender avg_frequency median_frequency sd_frequency count
##   <chr>          <dbl>            <dbl>        <dbl> <int>
## 1 Female            NA               NA           NA  1248
## 2 Male              NA               NA           NA  2652
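The NA values and warnings above arise because purchase frequency is stored as category labels, not numbers. A minimal sketch of a categorical alternative converts the labels to an ordered factor and counts them. The vector `freq` and its four labels are hypothetical; the real column's categories should be checked first with `` unique(shopping$`Frequency of Purchases`) ``.

```r
# Hypothetical sample of categorical purchase-frequency labels
freq <- c("Weekly", "Monthly", "Annually", "Weekly", "Quarterly", "Monthly", "Weekly")

# mean() on a character vector warns and returns NA, as in the output above
suppressWarnings(mean(freq))

# Treat the labels as an ordered factor (this level order is an assumption)
freq_levels <- c("Weekly", "Monthly", "Quarterly", "Annually")
freq_f <- factor(freq, levels = freq_levels, ordered = TRUE)

# Counts per category, in level order; unused levels would show as 0
table(freq_f)
```

Counts like these per gender are exactly what the contingency table and chi-square test below work with.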
# Keep only Male/Female rows and drop rows with a missing purchase frequency
shopping_freq <- shopping %>%
  filter(Gender %in% c("Male", "Female")) %>%
  filter(!is.na(`Frequency of Purchases`))

# Create contingency table
freq_table <- table(shopping_freq$Gender, shopping_freq$`Frequency of Purchases`)

# Chi-square test
chisq_freq_test <- chisq.test(freq_table)
chisq_freq_test
## 
##  Pearson's Chi-squared test
## 
## data:  freq_table
## X-squared = 3.928, df = 6, p-value = 0.6864
ggplot(shopping_freq, aes(x = `Frequency of Purchases`, fill = Gender)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Purchase Frequency by Gender",
    x = "Frequency Category",
    y = "Count"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

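In every frequency category the bars for males are taller, but this reflects the larger number of male customers in the data (2,652 vs. 1,248) rather than more frequent purchasing. The chi-square test above (X-squared = 3.93, df = 6, p = 0.686) finds no statistically significant association between gender and purchase frequency.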
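The chi-square test is not fooled by unequal group sizes because it compares observed counts with the counts expected if gender and frequency were independent, and those expected counts already scale with each group's total. A minimal sketch on a hypothetical 2 x 3 table of toy counts reproduces `chisq.test()` by hand. The degrees of freedom are (rows - 1) x (columns - 1), so the df = 6 reported above corresponds to 2 genders and 7 frequency categories.

```r
# Hypothetical 2 x 3 table of gender by frequency category (toy counts)
observed <- matrix(c(30, 20, 10,
                     60, 45, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Female", "Male"),
                                   Frequency = c("Weekly", "Monthly", "Annually")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic, degrees of freedom, and upper-tail p-value
x_sq    <- sum((observed - expected)^2 / expected)
dof     <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_sq, df = dof, lower.tail = FALSE)

builtin <- chisq.test(observed)
c(manual = x_sq, builtin = unname(builtin$statistic))
```

For tables larger than 2 x 2, `chisq.test()` applies no continuity correction, so the hand computation matches it exactly.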

Differences in Product Purchases by Gender

shopping %>%
  count(Gender, Category) %>%
  ggplot(aes(x = Category, y = n, fill = Gender)) +
  geom_col(position = "dodge") +
  labs(
    title = "Product Category Purchases by Gender",
    x = "Product Category",
    y = "Count"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

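Males account for more purchases than females in every product category, but again this mirrors the roughly two-to-one ratio of male to female customers. The chi-square test below (X-squared = 0.60, df = 3, p = 0.897) finds no statistically significant association between gender and product category.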
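Raw counts make the larger group look bigger everywhere. Comparing within-gender shares removes that effect. A minimal sketch uses `prop.table()` with `margin = 1` so that each gender's row sums to 1; the counts are hypothetical toy numbers, and the category names "Accessories" and "Outerwear" are assumed for illustration.

```r
# Hypothetical purchase counts by gender and category (toy numbers)
purchases <- matrix(c(560, 400, 190, 100,
                      1190, 840, 410, 210),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(Gender = c("Female", "Male"),
                                    Category = c("Clothing", "Accessories",
                                                 "Footwear", "Outerwear")))

# Within-gender shares: each row sums to 1, so group size no longer dominates
shares <- prop.table(purchases, margin = 1)
round(shares, 3)
```

On the real data the same idea is `prop.table(table(shopping$Gender, shopping$Category), margin = 1)`; in the plot, `geom_col(position = "fill")` draws each category's bar as proportions instead of counts.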

chisq.test(table(shopping$Gender, shopping$Category))
## 
##  Pearson's Chi-squared test
## 
## data:  table(shopping$Gender, shopping$Category)
## X-squared = 0.59842, df = 3, p-value = 0.8968

Business Implication: Does Season Affect Spending?

shopping %>%
  group_by(Season) %>%
  summarise(
    avg_spending = mean(`Purchase Amount (USD)`, na.rm = TRUE),
    median_spending = median(`Purchase Amount (USD)`, na.rm = TRUE),
    count = n()
  )
## # A tibble: 4 × 4
##   Season avg_spending median_spending count
##   <chr>         <dbl>           <dbl> <int>
## 1 Fall           61.6              62   975
## 2 Spring         58.7              58   999
## 3 Summer         58.4              58   955
## 4 Winter         60.4              62   971
shopping %>%
  group_by(Season) %>%
  summarise(avg_spending = mean(`Purchase Amount (USD)`, na.rm = TRUE)) %>%
  ggplot(aes(x = Season, y = avg_spending, fill = Season)) +
  geom_col() +
  labs(
    title = "Average Purchase Amount by Season",
    x = "Season",
    y = "Average Purchase Amount (USD)"
  ) +
  theme_minimal()

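The bar graph shows that average spending is highest in fall ($61.6) and winter ($60.4) and lowest in spring ($58.7) and summer ($58.4). This could be due to end-of-summer sales in the fall and holiday shopping in the winter. The gap between the highest and lowest seasons is only about $3, however, and has not been tested for statistical significance, so the pattern should be treated as suggestive. If it holds up, businesses could use it to forecast seasonal revenue and to promote higher-margin or bundled products in fall and winter.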
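No significance test was run on the seasonal differences. By analogy with the gender t-test, a minimal sketch of one way to check is a one-way ANOVA across the four seasons, with the Kruskal-Wallis test as a rank-based alternative that does not assume normally distributed amounts. The data frame `toy` is hypothetical; on the real data the formula would be `` `Purchase Amount (USD)` ~ Season `` with `data = shopping`.

```r
# Hypothetical toy purchase amounts (USD) tagged by season; not the real data
toy <- data.frame(
  season = rep(c("Fall", "Spring", "Summer", "Winter"), each = 5),
  amount = c(70, 55, 62, 48, 66,
             50, 61, 57, 64, 52,
             59, 45, 63, 58, 51,
             68, 60, 54, 65, 57)
)

# One-way ANOVA: does the mean amount differ across seasons?
fit <- aov(amount ~ season, data = toy)
summary(fit)

# Rank-based alternative (Kruskal-Wallis rank sum test)
kruskal.test(amount ~ season, data = toy)
```

A p-value above 0.05 from either test would suggest the seasonal gaps could be due to chance.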