ConsumerBehavior-DataAnalysis

Setting up my environment

Notes: Setting up my environment by loading the ‘tidyverse’, ‘skimr’ and ‘janitor’ packages.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(skimr)
library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Importing data

customerbehavior_df <- read_csv('CleanedCustomerBehavior.csv')

## Rows: 3900 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): Gender, Item Purchased, Category, Location, Size, Color, Season, S...
## dbl  (6): Customer ID, Age, Purchase Amount, Review Rating, Previous Purchas...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data exploration

head(customerbehavior_df)

## # A tibble: 6 × 21
##   `Customer ID`   Age Gender `Item Purchased` Category `Purchase Amount`
##           <dbl> <dbl> <chr>  <chr>            <chr>                <dbl>
## 1             1    55 Male   Blouse           Clothing                53
## 2             2    19 Male   Sweater          Clothing                64
## 3             3    50 Male   Jeans            Clothing                73
## 4             4    21 Male   Sandals          Footwear                90
## 5             5    45 Male   Blouse           Clothing                49
## 6             6    46 Male   Sneakers         Footwear                20
## # ℹ 15 more variables: Location <chr>, Size <chr>, Color <chr>, Season <chr>,
## #   `Review Rating` <dbl>, `Subscription Status` <chr>, `Shipping Type` <chr>,
## #   `Discount Applied` <chr>, `Promo Code Used` <chr>,
## #   `Previous Purchases` <dbl>, `Payment Method` <chr>,
## #   `Frequency of Purchases` <chr>, Sentiment <chr>, `Age Group` <chr>,
## #   CLV <dbl>
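Several column names contain spaces, which is why the later code wraps them in backticks. As a minimal sketch, assuming the janitor package loaded above, `clean_names()` can convert them to snake_case. The two-row `toy_df` below is a hypothetical stand-in for `customerbehavior_df`, built only to make the example runnable.

```r
library(janitor)

# Hypothetical stand-in that reuses three of the real column names;
# check.names = FALSE keeps the spaces in the names.
toy_df <- data.frame(
  `Customer ID` = c(1, 2),
  `Purchase Amount` = c(53, 64),
  `Item Purchased` = c("Blouse", "Sweater"),
  check.names = FALSE
)

# clean_names() rewrites the names to snake_case
cleaned_df <- clean_names(toy_df)
names(cleaned_df)
```

The rest of this analysis keeps the original names, so this step is optional.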

ANOVA test - Purchase Amount vs. Category

Performing a statistical test (ANOVA) to check whether customers spend significantly more on any particular product category.

anova_result1 <- aov(`Purchase Amount` ~ Category, data=customerbehavior_df)
summary(anova_result1)

##               Df  Sum Sq Mean Sq F value Pr(>F)
## Category       3    2446   815.2   1.454  0.225
## Residuals   3896 2184885   560.8

Insights:

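With p = 0.225 (F = 1.454 on 3 and 3896 degrees of freedom), the test fails to reject the null hypothesis at the 0.05 level: there is no statistically significant difference in average purchase amount across the four categories.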
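The F value and p-value can be reproduced by hand from the sums of squares in the ANOVA table above. The sketch below uses only the numbers printed in that table. The eta-squared line is an addition of mine (share of variance explained by Category) and is not part of the original output.

```r
# Values copied from the ANOVA table above
ss_category <- 2446
ss_residual <- 2184885
df_category <- 3
df_residual <- 3896

# Mean squares are sums of squares divided by their degrees of freedom
ms_category <- ss_category / df_category
ms_residual <- ss_residual / df_residual

# F statistic and its upper-tail p-value
f_value <- ms_category / ms_residual
p_value <- pf(f_value, df_category, df_residual, lower.tail = FALSE)

# Eta squared: share of total variance explained by Category
eta_squared <- ss_category / (ss_category + ss_residual)

round(c(F = f_value, p = p_value, eta_sq = eta_squared), 4)
```

The results differ slightly from the printed table in the last digits because the sums of squares shown there are rounded.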

Boxplot for ANOVA

ggplot(customerbehavior_df, aes(x=Category, y=`Purchase Amount`)) +
  geom_boxplot(fill='lightpink') +
  theme_classic() +
  labs(title='Purchase Amount by Category', x='Category', y='Purchase Amount')

Insights:

The median purchase amount is very similar across all categories.
Accessories, Clothing, and Footwear have nearly the same median (around 60), while Outerwear is slightly lower.
This matches the ANOVA result (p-value = 0.225), which shows no statistically significant difference in average purchase amount across categories.
Customers spend at a similar level regardless of product category.

ANOVA test - Purchase Amount vs. Season

Performing a statistical test (ANOVA) to check whether customer spending differs by season.

anova_result2 <- aov(`Purchase Amount` ~ Season, data=customerbehavior_df)
summary(anova_result2)

##               Df  Sum Sq Mean Sq F value Pr(>F)  
## Season         3    6291  2097.1   3.746 0.0106 *
## Residuals   3896 2181039   559.8                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Boxplot for ANOVA

ggplot(customerbehavior_df, aes(x=Season, y=`Purchase Amount`)) +
  geom_boxplot(fill='purple') +
  theme_classic() +
  labs(title='Purchase Amount by Season', x='Season', y='Purchase Amount')

Insights:

The ANOVA is significant at the 0.05 level (p-value = 0.0106), so average purchase amount differs for at least one season.
In the boxplot, Fall and Winter have slightly higher median spending.
The ANOVA alone does not identify which seasons differ, so a post-hoc comparison is needed to confirm that Fall and Winter are the higher-spend seasons. If they are, marketing strategies like seasonal discounts, holiday promotions, and inventory planning can be optimized during these periods.
Product upselling can also be tested during high-spend seasons to boost revenue further.
Stakeholders can try targeted seasonal marketing in colder months.
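A significant ANOVA only says that at least one season differs, so a natural follow-up is a pairwise comparison such as Tukey's HSD. The sketch below first computes eta squared from the Season ANOVA table above, then runs `TukeyHSD()` on simulated data. The data frame `sim_df`, its group means, and its standard deviation are assumptions made up for illustration. They are not the real customer data, and the output says nothing about the real seasons.

```r
# Effect size from the Season ANOVA table above (values copied from the output)
ss_season <- 6291
ss_residual <- 2181039
eta_squared <- ss_season / (ss_season + ss_residual)
round(eta_squared, 4)

set.seed(42)  # make the simulated data reproducible

# Simulated stand-in for customerbehavior_df: NOT the real data.
# The means and sd are hypothetical choices for this sketch.
seasons <- c("Fall", "Spring", "Summer", "Winter")
sim_df <- data.frame(
  season = factor(rep(seasons, each = 200)),
  purchase_amount = c(
    rnorm(200, mean = 62, sd = 24),  # Fall
    rnorm(200, mean = 59, sd = 24),  # Spring
    rnorm(200, mean = 58, sd = 24),  # Summer
    rnorm(200, mean = 61, sd = 24)   # Winter
  )
)

# Same model form as anova_result2, followed by all pairwise comparisons
fit <- aov(purchase_amount ~ season, data = sim_df)
tukey <- TukeyHSD(fit, "season")

# One row per pair of seasons: difference, confidence bounds, adjusted p-value
tukey$season
```

On the real data the equivalent call would be `TukeyHSD(anova_result2)`. Pairs whose adjusted p-value is below 0.05 are the ones that differ.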

Chi-Square Test: Subscription Status vs Promo Usage

chisq.test(table(customerbehavior_df$`Subscription Status`, customerbehavior_df$`Promo Code Used`))

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(customerbehavior_df$`Subscription Status`, customerbehavior_df$`Promo Code Used`)
## X-squared = 1908.9, df = 1, p-value < 2.2e-16

Insights:

The p-value (< 2.2e-16) is far below 0.05, so we reject the null hypothesis of independence: there is a statistically significant relationship between subscription status and whether the customer used a promo code.
Subscribers and non-subscribers behave differently when it comes to using promo codes.
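With 3900 rows, even a weak association can produce a very small p-value, so it helps to report the strength of the relationship as well. A minimal sketch using Cramér's V, computed from the statistic printed above. It assumes no missing values in either column, so that the table covers all 3900 rows. Because the printed statistic includes Yates' correction, the result is a slightly conservative estimate.

```r
# Values copied from the chi-squared output and the read_csv() message above
chi_sq <- 1908.9
n_obs <- 3900   # assumes no missing values were dropped by table()
min_dim <- 2    # df = 1 implies a 2 x 2 table

# Cramér's V: 0 means no association, 1 means perfect association
cramers_v <- sqrt(chi_sq / (n_obs * (min_dim - 1)))
round(cramers_v, 2)
```

A value near 0.7 would usually be read as a strong association, which is consistent with the conclusion above.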

Plotting the relationship

ggplot(customerbehavior_df, aes(x = `Subscription Status`, fill = `Promo Code Used`)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Promo Code Usage by Subscription Status",
    x = "Subscription Status",
    y = "Proportion",
    fill = "Promo Code Used"
  ) +
  theme_minimal()

Insights:

Promo codes are likely exclusive to subscribers or actively promoted among them.
This suggests the subscription model adds value, possibly encouraging promo-driven purchases.
We recommend promoting the benefits of a subscription more clearly to non-subscribers.
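The stacked bar chart shows proportions only, so a cross-tabulation with row percentages is a quick way to check whether promo codes really are exclusive to subscribers. The sketch below uses a hypothetical eight-row `toy_df` with made-up values. It is not the real customer data and only illustrates the calculation.

```r
# Hypothetical toy data: NOT the real customer records
toy_df <- data.frame(
  subscription_status = c("Yes", "Yes", "Yes", "No", "No", "No", "No", "No"),
  promo_code_used = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No")
)

# Counts for each combination of the two variables
counts <- table(
  Subscription = toy_df$subscription_status,
  Promo = toy_df$promo_code_used
)

# Share of promo users within each subscription group (each row sums to 1)
row_share <- prop.table(counts, margin = 1)
round(row_share, 2)
```

On the real data, the same two calls can be applied to `customerbehavior_df$`Subscription Status`` and `customerbehavior_df$`Promo Code Used``. A zero in the subscriber row under "No" would support the reading that every subscriber used a promo code.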


Shijin Ramesh

2025-07-04
