The Online Shopping Dataset compiles a wealth of information regarding customer transactions in an e-commerce platform. It encompasses various aspects such as customer demographics (CustomerID, Gender, Location), transaction details (Transaction_ID, Transaction_Date), product specifics (Product_SKU, Product_Category), and transactional elements (Quantity, Avg_Price, Discount_pct). Additionally, it includes spending breakdowns (Offline_Spend, Online_Spend), coupon-related data (Coupon_Status, Coupon_Code). This dataset offers a comprehensive view of customer behavior, purchasing patterns, and the effectiveness of promotional strategies in the realm of online shopping.
Customer Behavior and Product Insights:
Which product categories are most frequently purchased? Can we visualize this trend over time?
Is there any correlation between tenure (length of association) and the average price or quantity of products purchased?
Coupon Analysis:
What is the distribution of coupon usage among different product categories or locations?
Can we analyze the impact of different coupon codes on the average transaction amount or quantity purchased?
Geographical Patterns and Spending Habits:
How does customer spending behavior vary across different locations?
Can we visualize the distribution of offline versus online spending across different regions or customer segments?
Temporal Trends and Seasonality:
Are there seasonal trends in purchases? How do they vary across product categories?
Can we identify any patterns in offline or online spending behavior over the months?
Impact of Discounts and GST:
Is there a noticeable relationship between the discount percentage applied and the quantity or total spend in transactions?
How does the GST affect the total transaction amount, and is there a variation across product categories or locations?
Performance Metrics:
# installing necessary packages
install.packages(c("tidyverse", "plotly", "ggthemes", "tinytex"),
repos = "https://cran.rstudio.com/")## package 'tidyverse' successfully unpacked and MD5 sums checked
## package 'plotly' successfully unpacked and MD5 sums checked
## package 'ggthemes' successfully unpacked and MD5 sums checked
## package 'tinytex' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\HP\AppData\Local\Temp\RtmpyKkqgt\downloaded_packages
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(ggthemes)
library(lubridate)
library(knitr)
# importing the dataset
on_shop <- read.csv("F:\\R practice\\R Projects\\Project 6\\Online Shopping Dataset\\Online_shopping.csv")
# Checking head and str()
str(on_shop)## 'data.frame': 52955 obs. of 20 variables:
## $ CustomerID : int 17850 17850 17850 17850 17850 17850 17850 17850 17850 17850 ...
## $ Gender : chr "M" "M" "M" "M" ...
## $ Location : chr "Chicago" "Chicago" "Chicago" "Chicago" ...
## $ Tenure_Months : int 12 12 12 12 12 12 12 12 12 12 ...
## $ Transaction_ID : int 16679 16680 16696 16699 16700 16701 16702 16703 16704 16710 ...
## $ Transaction_Date : chr "1/1/2019" "1/1/2019" "1/1/2019" "1/1/2019" ...
## $ Product_SKU : chr "GGOENEBJ079499" "GGOENEBJ079499" "GGOENEBQ078999" "GGOENEBQ079099" ...
## $ Product_Description: chr "Nest Learning Thermostat 3rd Gen-USA - Stainless Steel" "Nest Learning Thermostat 3rd Gen-USA - Stainless Steel" "Nest Cam Outdoor Security Camera - USA" "Nest Protect Smoke + CO White Battery Alarm-USA" ...
## $ Product_Category : chr "Nest-USA" "Nest-USA" "Nest-USA" "Nest-USA" ...
## $ Quantity : int 1 1 2 1 1 1 2 2 1 1 ...
## $ Avg_Price : num 153.7 153.7 122.8 81.5 153.7 ...
## $ Delivery_Charges : num 6.5 6.5 6.5 6.5 6.5 ...
## $ Coupon_Status : chr "Used" "Used" "Not Used" "Clicked" ...
## $ GST : num 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
## $ Date : chr "1/1/2019" "1/1/2019" "1/1/2019" "1/1/2019" ...
## $ Offline_Spend : int 4500 4500 4500 4500 4500 4500 4500 4500 4500 4500 ...
## $ Online_Spend : num 2424 2424 2424 2424 2424 ...
## $ Month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Coupon_Code : chr "ELEC10" "ELEC10" "ELEC10" "ELEC10" ...
## $ Discount_pct : int 10 10 10 10 10 10 10 10 10 10 ...
## CustomerID Gender Location Tenure_Months Transaction_ID Transaction_Date
## 1 17850 M Chicago 12 16679 1/1/2019
## 2 17850 M Chicago 12 16680 1/1/2019
## 3 17850 M Chicago 12 16696 1/1/2019
## 4 17850 M Chicago 12 16699 1/1/2019
## 5 17850 M Chicago 12 16700 1/1/2019
## 6 17850 M Chicago 12 16701 1/1/2019
## Product_SKU Product_Description
## 1 GGOENEBJ079499 Nest Learning Thermostat 3rd Gen-USA - Stainless Steel
## 2 GGOENEBJ079499 Nest Learning Thermostat 3rd Gen-USA - Stainless Steel
## 3 GGOENEBQ078999 Nest Cam Outdoor Security Camera - USA
## 4 GGOENEBQ079099 Nest Protect Smoke + CO White Battery Alarm-USA
## 5 GGOENEBJ079499 Nest Learning Thermostat 3rd Gen-USA - Stainless Steel
## 6 GGOENEBJ079499 Nest Learning Thermostat 3rd Gen-USA - Stainless Steel
## Product_Category Quantity Avg_Price Delivery_Charges Coupon_Status GST
## 1 Nest-USA 1 153.71 6.5 Used 0.1
## 2 Nest-USA 1 153.71 6.5 Used 0.1
## 3 Nest-USA 2 122.77 6.5 Not Used 0.1
## 4 Nest-USA 1 81.50 6.5 Clicked 0.1
## 5 Nest-USA 1 153.71 6.5 Clicked 0.1
## 6 Nest-USA 1 153.71 6.5 Clicked 0.1
## Date Offline_Spend Online_Spend Month Coupon_Code Discount_pct
## 1 1/1/2019 4500 2424.5 1 ELEC10 10
## 2 1/1/2019 4500 2424.5 1 ELEC10 10
## 3 1/1/2019 4500 2424.5 1 ELEC10 10
## 4 1/1/2019 4500 2424.5 1 ELEC10 10
## 5 1/1/2019 4500 2424.5 1 ELEC10 10
## 6 1/1/2019 4500 2424.5 1 ELEC10 10
## [1] TRUE
## CustomerID Tenure_Months Transaction_ID Quantity
## 31 31 31 31
## Avg_Price Delivery_Charges GST Offline_Spend
## 31 31 31 31
## Online_Spend Discount_pct
## 31 400
Dealing missing values :
Substituting 400 missing ‘Discount_pct’ values with zeros prevented significant data loss, avoiding the need to delete those rows.
## [1] FALSE
# Dropping rows with null values in key fields preserved essential data integrity.
on_shop <- na.omit(on_shop)
any(is.na(on_shop))## [1] FALSE
## CustomerID Gender Location Tenure_Months
## Min. :12346 Length:52924 Length:52924 Min. : 2.00
## 1st Qu.:13869 Class :character Class :character 1st Qu.:15.00
## Median :15311 Mode :character Mode :character Median :27.00
## Mean :15347 Mean :26.13
## 3rd Qu.:16996 3rd Qu.:37.00
## Max. :18283 Max. :50.00
## Transaction_ID Transaction_Date Product_SKU Product_Description
## Min. :16679 Length:52924 Length:52924 Length:52924
## 1st Qu.:25384 Class :character Class :character Class :character
## Median :32626 Mode :character Mode :character Mode :character
## Mean :32410
## 3rd Qu.:39126
## Max. :48497
## Product_Category Quantity Avg_Price Delivery_Charges
## Length:52924 Min. : 1.000 Min. : 0.39 Min. : 0.00
## Class :character 1st Qu.: 1.000 1st Qu.: 5.70 1st Qu.: 6.00
## Mode :character Median : 1.000 Median : 16.99 Median : 6.00
## Mean : 4.498 Mean : 52.24 Mean : 10.52
## 3rd Qu.: 2.000 3rd Qu.:102.13 3rd Qu.: 6.50
## Max. :900.000 Max. :355.74 Max. :521.36
## Coupon_Status GST Date Offline_Spend
## Length:52924 Min. :0.0500 Length:52924 Min. : 500
## Class :character 1st Qu.:0.1000 Class :character 1st Qu.:2500
## Mode :character Median :0.1800 Mode :character Median :3000
## Mean :0.1375 Mean :2831
## 3rd Qu.:0.1800 3rd Qu.:3500
## Max. :0.1800 Max. :5000
## Online_Spend Month Coupon_Code Discount_pct
## Min. : 320.2 Min. : 1.000 Length:52924 Min. : 0.0
## 1st Qu.:1252.6 1st Qu.: 4.000 Class :character 1st Qu.:10.0
## Median :1837.9 Median : 7.000 Mode :character Median :20.0
## Mean :1893.1 Mean : 6.652 Mean :19.8
## 3rd Qu.:2425.3 3rd Qu.: 9.000 3rd Qu.:30.0
## Max. :4556.9 Max. :12.000 Max. :30.0
# Converting date from 'character' to 'date' format
on_shop$Transaction_Date <- mdy(on_shop$Transaction_Date)
# extracting transaction day, month, year, week day.
on_shop <- on_shop %>%
mutate(trans_day = day(Transaction_Date),
trans_month = month(Transaction_Date),
trans_year = year(Transaction_Date),
trans_week = wday(Transaction_Date))
#'Transaction_date' and 'Date' columns share identical values; thus,
# 'Date' column deletion is necessary.
on_shop <- on_shop %>%
select(-Date)Customer Behavior and Product Insights:
Which product categories are most frequently purchased? Can we visualize this trend over time?
Is there any correlation between tenure (length of association) and the average price or quantity of products purchased?
theme_set(theme_solarized())
prod_cat <- on_shop %>%
group_by(Product_Category) %>%
summarise(Product_Count = n()) %>%
arrange(-Product_Count)
ggplot(prod_cat, aes(x = Product_Category, y = Product_Count)) +
geom_bar(stat = "identity", fill = "#76D7C4") +
theme(axis.text.x = element_text(angle = 45)) +
labs(title = "Purchase trend across product categories",
x = "Category", y = "Count",
subtitle = "Apparel stands as the top-selling category, while Android ranks as the least sold") +
theme(axis.text.x = element_text(size = 8, color = "#39858C", face = "bold")) +
theme(plot.title = element_text(color = "#3498DB", face = "bold")) +
theme(plot.subtitle = element_text(size = 8, color = "#5F75CE", face = "italic"))# Visualizing trend over time
prod_cat2 <- on_shop %>%
group_by(Transaction_Date, Product_Category) %>%
summarise(Count = n()) %>%
arrange(- Count)## `summarise()` has grouped output by 'Transaction_Date'. You can override using
## the `.groups` argument.
k <- ggplot(prod_cat2, aes(x = Transaction_Date, y = Count)) +
geom_line(aes(color = Product_Category), size = 0.2) +
labs(title = "Trend analysis of product categories over time",
x = "Transaction Date", color = "Category") +
theme(plot.title = element_text(color = "#3498DB", face = "bold",
size = 12)) +
theme(legend.text = element_text(size = 8))
# Interactive Viz
ggplotly(k)Interactive viz: Navigate data using the cursor.
# correlation between tenure and quantity purchased
cor(on_shop$Quantity, on_shop$Tenure_Months) # 0.006867134## [1] 0.006867134
# correlation between tenure and avg_price purchased
cor(on_shop$Tenure_Months, on_shop$Avg_Price) # -0.0007849942## [1] -0.0007849942
There is a negligible correlation between tenure and quantity purchased (0.006867134), indicating weak to no relationship. Conversely, there appears to be a slight negative correlation (−0.0007849942) between tenure and average price, suggesting a minimal inverse association.
Coupon Analysis:
What is the distribution of coupon usage among different product categories or locations?
Can we analyze the impact of different coupon codes on the average transaction amount or quantity purchased?
## [1] "CustomerID" "Gender" "Location"
## [4] "Tenure_Months" "Transaction_ID" "Transaction_Date"
## [7] "Product_SKU" "Product_Description" "Product_Category"
## [10] "Quantity" "Avg_Price" "Delivery_Charges"
## [13] "Coupon_Status" "GST" "Offline_Spend"
## [16] "Online_Spend" "Month" "Coupon_Code"
## [19] "Discount_pct" "trans_day" "trans_month"
## [22] "trans_year" "trans_week"
coup_loc <- on_shop %>%
group_by(Coupon_Status, Location) %>%
summarise(Count = n()) %>%
arrange(- Count)## `summarise()` has grouped output by 'Coupon_Status'. You can override using the
## `.groups` argument.
ggplot(coup_loc, aes(x = Coupon_Status, y = Count)) +
geom_col(aes(fill = Location), color = "black") +
labs(title = "Distribution of Coupon Status across Locations",
x = "Coupon Status",
subtitle = "Chicago tops coupon usage, while Washington DC ranks lowest") +
theme(plot.subtitle = element_text(size = 8, color = "#5F75CE", face = "italic")) +
scale_fill_manual(values = c("#2B3E67", "#2D7246",
"#F4FC4A", "#E91601", "#71522A")) +
theme(plot.title = element_text(color = "#3498DB", face = "bold", size = 12))The predominant Coupon Status is ‘Clicked’, signifying its prevalence among transactions. Chicago residents utilize the highest number of coupons, while Washington DC residents use the fewest.
# Product category and coupon code
pcc <- on_shop %>%
group_by(Coupon_Code) %>%
summarise(Count = n()) %>%
arrange(-Count)
# Top 10 most used coupon code
top_10_cc <- head(pcc, 10)
kable(top_10_cc, full_width = FALSE)| Coupon_Code | Count |
|---|---|
| SALE20 | 6373 |
| SALE30 | 5915 |
| SALE10 | 5838 |
| ELEC10 | 4826 |
| ELEC30 | 4647 |
| ELEC20 | 4540 |
| EXTRA10 | 2317 |
| OFF10 | 2250 |
| EXTRA20 | 2211 |
| OFF20 | 2202 |
| Coupon_Code | Count |
|---|---|
| HOU30 | 30 |
| GC30 | 29 |
| AND30 | 16 |
| AND10 | 14 |
| AND20 | 13 |
# Product Category and Coupon
top_codes <- on_shop %>%
group_by(Product_Category, Coupon_Code) %>%
summarise(Count = n()) %>%
arrange(- Count) %>%
head(30)## `summarise()` has grouped output by 'Product_Category'. You can override using
## the `.groups` argument.
interactive <- ggplot(top_codes, aes(x = Product_Category, y = Count)) +
geom_col(color = "black", aes(fill = Coupon_Code)) +
labs(title = "Top Coupon Codes used for Product Purchases",
x = "Product Category", fill = "Coupon Code") +
theme(axis.text.x = element_text(angle = 45)) +
theme(axis.text.x = element_text(size = 8)) +
scale_fill_manual(values = c(
"#336600", "#FF9900", "#000066", "#FF3399", "#660099",
"#66CCFF", "#993200", "#00CC66", "#663300", "#FFCC00",
"#333333", "#FF0033", "#F5F5F5", "#9966CC", "#FAFAFA",
"#3333FF", "#F0F8FF", "#009999", "#E5B7B7", "#66FF33",
"#F08080", "#CC6699", "#DE3B3B", "#FFCC99", "#CD5C5C",
"#FFFF00", "#993232", "#FF99CC", "#8B0000", "#9900FF"
)) +
guides(fill = FALSE) +
theme(plot.title = element_text(color = "#3498DB", face = "bold",
size = 12))
ggplotly(interactive)Interactive viz: Navigate data using the cursor.
# impact of different coupon codes on the average
# transaction amount or quantity purchased
ggplot(on_shop, aes(x = Coupon_Code, y = Avg_Price)) +
geom_boxplot(fill = "#FB3F03",
color = "#714702",
alpha = 0.6) +
theme(axis.text.x = element_text(size = 7,
angle = 90,
color = "#363744")) +
labs(title = "Distribution of Average Price across Coupon Codes",
x = "Coupon Code", y = "Average Price") +
theme(plot.title = element_text(color = "#3498DB", face = "bold",
size = 12))There is a larger variance in average prices for coupon codes GC10, GC20, and GC30, while the distribution of average prices among other coupon codes appears relatively consistent. Coupon code ACC30 stands unique without any coupon pairs, unlike the other codes, which exhibit three distinct categories. Notably, the highest average prices are associated with coupon codes NE10, NE20, and NE30.
Geographical Patterns and Spending Habits:
How does customer spending behavior vary across different locations?
Can we visualize the distribution of offline versus online spending across different regions or customer segments?
# Adding up values of Offline Spend and Online Spend and creating a new column called total_spending
on_shop <- on_shop %>%
mutate(total_spending = Offline_Spend + Online_Spend)
# Combination of violin and boxplot is ideal for understanding the data distribution
ggplot(on_shop, aes(x = Location, y = total_spending)) +
geom_violin(color ="#59905B", fill = "#E0FEAF") +
geom_boxplot(color = "#D08115", width = 0.4,
fill = "#B06860", alpha = 0.5) +
labs(title = "Spendings distribution across locations",
y = "Total Spending (Online + Offline)") +
theme(plot.title = element_text(color = "#3498DB", face = "bold", size = 12))The minimum spending in Washington DC stands notably higher than in other locations, while spending patterns across all other locations appear similar. California exhibits the highest maximum spending density, while New Jersey registers the lowest.
# offline vs online
on_off <- on_shop %>%
select(Location, Online_Spend, Offline_Spend) %>%
group_by(Location) %>%
summarise(Total_Online_Spend = sum(Online_Spend),
Total_Offline_Spend = sum(Offline_Spend))
# Reshaping data to long format for plotting
on_off_long <- on_off %>%
pivot_longer(cols = c(Total_Online_Spend, Total_Offline_Spend),
names_to = "Spend_Type",
values_to = "Total_Spend")
ggplot(on_off_long, aes(x = Location, y = Total_Spend, fill = Spend_Type)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Offline vs Online spending distribution by location",
x = "Location",
y = "Total Spending",
fill = "Spend Type") +
theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
scale_fill_manual(values = c("#0D126C", "#F701FE")) +
theme(plot.title = element_text(size = 12, color = "#3498DB",
face = "bold")) +
theme(legend.text = element_text(size = 7, face = "bold"),
legend.title = element_text(size = 8))Temporal Trends and Seasonality:
Are there seasonal trends in purchases? How do they vary across product categories?
Can we identify any patterns in offline or online spending behavior over the months?
purchase_trends <- ggplot(on_shop, aes(x = Product_Category)) +
geom_bar(aes(fill = Product_Category)) +
facet_wrap(~ factor(trans_month), nrow = 4) +
theme(axis.text.x = element_blank()) +
labs(title = "Monthly purchase trends by product category",
x = "Product Category",
y = "Total Purchases") +
theme(plot.title = element_text(size = 12, color = "#3498DB",
face = "bold")) +
theme(legend.text = element_text(size = 9, face = "bold"),
legend.title = element_text(size = 9))
ggplotly(purchase_trends)‘Apparel’ demonstrates elevated sales in July and August, with the lowest recorded during November. ‘Nest_USA’ experiences heightened sales in January and December. ‘Bags’ exhibit increased sales during July and August, while ‘Drinkware’ sees higher sales in March and August. ‘Lifestyle’ products depict a steady sales increase from March to October, followed by a decline after October. The ‘Nest’ category demonstrates an increasing trend from August to December.
# Finding patterns in offline and online spending behavior
spend_by_month <- on_shop %>%
group_by(trans_month) %>%
summarise(Total_Online_Spend = sum(Online_Spend),
Total_Offline_Spend = sum(Offline_Spend))
# Reshaping data to long format for plotting
spend_by_month_long <- spend_by_month %>%
pivot_longer(cols = c(Total_Online_Spend, Total_Offline_Spend),
names_to = "Spend_Type",
values_to = "Total_Spend")
ggplot(spend_by_month_long, aes(x = trans_month,
y = Total_Spend,
color = Spend_Type)) +
geom_line(linewidth = 1) +
geom_point(size = 4) +
scale_x_continuous(breaks = 1:12, labels = month.abb[1:12]) +
labs(title = "Montly Offline vs Online Purchase Trend",
x = "Transaction Month", y = "Total Spend",
subtitle = "Sales peak notably in August and December",
color = "Spend Type") +
theme(axis.text.x = element_text(size = 7, face = "bold")) +
theme(plot.subtitle = element_text(size = 9, face = "italic",
color = "#5F75CE")) +
scale_color_brewer(palette = "Set1") +
theme(plot.title = element_text(color = "#3498DB", face = "bold", size = 12)) +
theme(legend.text = element_text(size = 7),
legend.title = element_text(size = 8))Offline transactions surpass online transactions in volume. Sales peak in August and December, whereas they reach a minimum in February, June, and September.
Impact of Discounts and GST:
Is there a noticeable relationship between the discount percentage applied and the quantity or total spend in transactions?
How does the GST affect the total transaction amount, and is there a variation across product categories or locations?
analysis_5 <-on_shop %>%
select(Discount_pct, Quantity, Offline_Spend, Online_Spend, Location, GST) %>%
mutate(Total_Spend = Offline_Spend + Online_Spend) %>%
filter(Discount_pct > 0)
attach(analysis_5)
cor(Discount_pct, Total_Spend) ## [1] 0.04769603
## [1] -0.01138081
The correlation between Discount Percentage and Total_Spend is weak (0.04769603), suggesting little to no relationship. Conversely, a negative correlation (-0.01138081) exists between Discount Percentage and Quantity.
# Finding the total Amount including GST
# Add GST: GST amount = (Original cost x GST%)/100.
# Net price = original cost + GST amount.
analysis_5 <- analysis_5 %>%
mutate(GST_Amount = (Total_Spend * GST))
analysis_5$GST_Amount <- analysis_5$Total_Spend + analysis_5$GST_Amount
gst_locate <- analysis_5 %>%
group_by(Location) %>%
summarise(Without_GST = sum(Total_Spend),
With_GST = sum(GST_Amount))
# Reshaping for visualization purpose
gst_locate <- gst_locate %>%
pivot_longer(cols = c(Without_GST, With_GST),
names_to = "Amount_Type",
values_to = "Total_Amount")
ggplot(gst_locate, aes(x = Location, y = Total_Amount, fill = Amount_Type)) +
geom_bar(stat = "identity", position = "dodge") +
theme(axis.text.x = element_text(size = 8),
plot.title = element_text(size = 13, face = "bold")) +
labs(title = "Comparing total amounts with and without GST",
y = "Total Amount", fill = "Amount Type") +
scale_fill_manual(values = c("#D42626", "#169483")) +
theme(plot.title = element_text(color = "#3498DB", face = "bold")) +
theme(legend.text = element_text(size = 8))The highest impact of GST was observed in New Jersey, resulting in a 13.73% increase from the original amount. California and New York experienced the lowest impact, with a 13.58% increase due to GST.
Performance Metrics:
# Average Transaction Value
avg_transaction_value <- mean(on_shop$total_spending)
print(avg_transaction_value)## [1] 4724.023
# Customer Acquisition Trends Over Time
c_aq_trend <- on_shop %>%
group_by(Transaction_Date) %>%
summarise(New_Customers = n_distinct(CustomerID)) %>%
arrange(- New_Customers)
# Plotting Customer Acquisition Trends Over Time
ggplot(c_aq_trend, aes(x = Transaction_Date, y = New_Customers)) +
geom_line(color = "#2C3E50", size = 1) +
labs(title = "Customer Acquisition Trend Over Time",
x = "Transaction Date", y = "New Customers",
subtitle = "August saw the highest influx of new customers") +
theme(plot.title = element_text(color = "#3498DB", face = "bold")) +
theme(plot.subtitle = element_text(size = 10, face = "italic",
color = "#5F75CE"))Product Insights:
Apparel emerged as the most frequently purchased category, while Android products were the least sold.
Purchase trends across categories varied throughout the year, with distinct peaks for specific categories in different months.
Coupon Analysis:
Chicago exhibited the most significant coupon usage, while Washington DC had the least.
Different coupon codes affected average transaction prices differently, with varying price fluctuations among codes.
Geographical Spending Patterns:
Washington DC showed notably higher minimum spending compared to other locations.
Offline transactions dominated over online transactions, and peak sales occurred in August and December.
Discount and Quantity Relationship:
GST Impact:
The project provided insights into customer behavior, preferences, and seasonal trends in the online shopping domain, shedding light on various nuances and trends within the dataset.