 Customer.ID             Age           Gender          Annual.Income..k.
 Length:5000        Min.   :18.00   Length:5000        Min.   : -3.00
 Class :character   1st Qu.:30.00   Class :character   1st Qu.: 45.00
 Mode  :character   Median :43.00   Mode  :character   Median : 59.00
                    Mean   :43.17                      Mean   : 59.39
                    3rd Qu.:56.00                      3rd Qu.: 73.00
                    Max.   :69.00                      Max.   :123.00
 Purchase.Amount.... Product.Category   Purchase.Date      Payment.Method
 Min.   : 25.51      Length:5000        Length:5000        Length:5000
 1st Qu.:166.54      Class :character   Class :character   Class :character
 Median :199.07      Mode  :character   Mode  :character   Mode  :character
 Mean   :199.55
 3rd Qu.:232.80
 Max.   :365.88
 Loyalty.Program.Member   Location
 Length:5000              Length:5000
 Class :character         Class :character
 Mode  :character         Mode  :character
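For both Annual Income and Purchase Amount, the median and the mean were nearly identical, which suggests both distributions are roughly symmetric. However, the standard deviation for Annual Income (about 20, on a mean of about 59) was fairly large relative to its mean, showing the varying economic statuses of the shoppers.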
mean(Groceries$Annual.Income..k.)
[1] 59.39
median(Groceries$Annual.Income..k.)
[1] 59
sd(Groceries$Annual.Income..k.)
[1] 20.0141
mean(Groceries$Purchase.Amount....)
[1] 199.5522
median(Groceries$Purchase.Amount....)
[1] 199.075
sd(Groceries$Purchase.Amount....)
[1] 48.65077
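Because income and purchase amount are on different scales, one way to compare their spread is to divide each standard deviation by its mean (the coefficient of variation). This is only a rough sketch using the numbers printed above; the helper name `cv` is my own and is not part of the original analysis.

```r
# Coefficient of variation: standard deviation relative to the mean.
# `cv` is a hypothetical helper name introduced for this illustration.
cv <- function(mean_value, sd_value) sd_value / mean_value

# Means and standard deviations copied from the output above
income_cv   <- cv(59.39, 20.0141)
purchase_cv <- cv(199.5522, 48.65077)

# Show both ratios side by side, rounded to three decimals
round(c(income = income_cv, purchase = purchase_cv), 3)
```

The ratio comes out at roughly 0.34 for income and 0.24 for purchase amount, so income is the more spread-out of the two relative to its mean.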
One thing I noticed is that all the categories have roughly the same spending amount. Within that narrow range, people spent the most on beauty and the least on books. This is probably because beauty products tend to be expensive while books are generally cheap.
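A category comparison like this can mean either total spending or average spending per purchase, and the two can rank categories differently. The sketch below shows how both could be computed. It uses a small made-up data frame (`toy_categories`, with invented amounts and an invented "Clothing" category) rather than the real `Groceries` data, so the numbers are for illustration only.

```r
# Toy stand-in for Groceries; every row here is invented for illustration.
toy_categories <- data.frame(
  Product.Category    = c("Beauty", "Beauty", "Books", "Books", "Clothing"),
  Purchase.Amount.... = c(210, 190, 180, 200, 205)
)

# Total spent per category (depends on how many purchases each category has)
total_by_cat <- tapply(toy_categories$Purchase.Amount....,
                       toy_categories$Product.Category, sum)

# Average spent per purchase in each category (closer to "how expensive")
mean_by_cat <- tapply(toy_categories$Purchase.Amount....,
                      toy_categories$Product.Category, mean)

total_by_cat
mean_by_cat
```

Swapping `toy_categories` for `Groceries` should give the real totals and averages, assuming the column names match the summary at the top.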
GRAPH 1
ggplot(Groceries, mapping = aes(x = Age, y = Purchase.Amount...., fill = Loyalty.Program.Member)) +
  geom_col() +
  scale_fill_manual(values = c("Yes" = "green", "No" = "orange")) +
  labs(title = "Purchase Amount by Age & Loyalty", y = "Purchase Amount", x = "Age") +
  theme_minimal()
GRAPH 2
ggplot(Groceries, aes(x = Age, y = Purchase.Amount...., fill = Gender)) +
  geom_area(color = "black", linewidth = 0.7) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "Purchase power by Age and Gender", x = "Age", y = "Purchase Amount") +
  theme_dark()
GRAPH 3
# The aesthetic in use is `fill`, so the manual scale must be a fill scale;
# `bins = 30` is stated explicitly (it is the default the original run fell back to).
ggplot(data = Groceries, mapping = aes(x = Purchase.Amount...., fill = Payment.Method)) +
  geom_histogram(bins = 30) +
  ggtitle("Purchase Amount to Payment method") +
  scale_fill_manual(values = c("Cash" = "green", "Debit Card" = "yellow", "Credit Card" = "lightblue", "Online Payment" = "red"))
My key finding from the analysis was that age did not affect the purchase amount much. In graph 1 we can see that spending patterns are generally consistent across ages 18 to 69. The major differences are between genders and between loyalty members and non-members. According to graph 2, females spend noticeably more than their male and non-binary counterparts. Looking back at graph 1, the other distinguishable difference came from comparing loyalty program membership: it showed that non-members had almost twice the overall spending of loyalty members. However, I suspect this is because there are simply far more non-members than loyalty members. Finally, in graph 3 I compared the purchase amount to the payment method and found that cash was still the most-used payment method, while online payment was used the least.
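The suspicion about loyalty membership could be checked by looking at group sizes and average spending per purchase next to the totals, since a stacked `geom_col()` adds up every purchase in a group. The sketch below uses a small made-up data frame (`toy_members`, with invented values) in place of `Groceries`, so it only illustrates the idea and says nothing about the real result.

```r
# Toy stand-in for Groceries; these rows are invented for illustration.
toy_members <- data.frame(
  Loyalty.Program.Member = c("No", "No", "No", "Yes", "Yes"),
  Purchase.Amount....    = c(200, 190, 210, 205, 195)
)

# How many purchases fall in each group
counts <- table(toy_members$Loyalty.Program.Member)

# Total spending per group (what a stacked column chart adds up)
totals <- tapply(toy_members$Purchase.Amount....,
                 toy_members$Loyalty.Program.Member, sum)

# Average spending per purchase in each group
means <- tapply(toy_members$Purchase.Amount....,
                toy_members$Loyalty.Program.Member, mean)

counts
totals
means
```

In this toy example the "No" group has the larger total only because it has more rows, while the averages are identical. If the real data showed the same pattern, that would support the explanation that the gap in graph 1 comes from group size rather than from how much each shopper spends.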