This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
# Load required libraries
library(data.table)
## Warning: package 'data.table' was built under R version 4.1.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.3
library(ggmosaic)
## Warning: package 'ggmosaic' was built under R version 4.1.3
library(readr)
#Load the data
transdata <- read.csv("C:/Users/kashi/Downloads/QVI_transaction_data.csv")
purchasebehavior <- read.csv("C:/Users/kashi/Downloads/QVI_purchase_behaviour.csv")
Exploratory Data Analysis
The first step in any analysis is to understand the data.
#Examine Transaction Data
str(transdata)
## 'data.frame': 264836 obs. of 8 variables:
## $ DATE : chr "2018-10-17" "2019-05-14" "2019-05-20" "2018-09-05" ...
## $ STORE_NBR : int 1 1 1 1 1 1 1 1 1 1 ...
## $ LYLTY_CARD_NBR: int 1000 1307 1343 1052 1081 1081 1081 1081 1184 1307 ...
## $ TXN_ID : int 1 348 383 57 92 93 94 95 216 346 ...
## $ PROD_NBR : int 5 66 61 44 17 96 8 57 2 96 ...
## $ PROD_NAME : chr "Natural Chip Compny SeaSalt175g" "CCs Nacho Cheese 175g" "Smiths Crinkle Cut Chips Chicken 170g" "Thins Chips Light& Tangy 175g" ...
## $ PROD_QTY : int 2 3 2 1 1 2 1 1 1 2 ...
## $ TOT_SALES : num 6 6.3 2.9 3.3 4.6 3.8 2.9 5.1 3.8 3.8 ...
# Convert DATE to Date format
transdata$DATE <- as.Date(transdata$DATE)
View(transdata)
#Examine PROD_NAME
setDT(transdata)
transdata[, .N, by = PROD_NAME]
## PROD_NAME N
## 1: Natural Chip Compny SeaSalt175g 1468
## 2: CCs Nacho Cheese 175g 1498
## 3: Smiths Crinkle Cut Chips Chicken 170g 1484
## 4: Thins Chips Light& Tangy 175g 3188
## 5: Kettle Sensations BBQ&Maple 150g 3083
## ---
## 110: Smiths Crinkle Cut Snag&Sauce 150g 1503
## 111: Thins Chips Originl saltd 175g 1441
## 112: Cobs Popd Swt/Chlli &Sr/Cream Chips 110g 3269
## 113: Pringles Slt Vingar 134g 3095
## 114: Pringles Original Crisps 134g 3157
Some Text Analysis
#Check for any incorrect entries
prodWords <- data.table(unlist(strsplit(unique(transdata[, PROD_NAME]), " ")))
setnames(prodWords, 'words')
Next, remove all words containing digits or special characters, keeping only words made up of letters.
# Remove words containing digits
prodWords <- prodWords[!grepl("\\d", words), ]
# Keep only words that contain letters (note: POSIX classes need double brackets, i.e. [[:alpha:]])
prodWords <- prodWords[grepl("[[:alpha:]]", words), ]
#count frequency of words & sort by highest to lowest
prodWords[, .N, words][order(N, decreasing = TRUE)]
## words N
## 1: Chips 21
## 2: Smiths 16
## 3: Crinkle 14
## 4: Kettle 13
## 5: Cheese 12
## ---
## 127: saltd 1
## 128: Swt/Chlli 1
## 129: &Sr/Cream 1
## 130: Slt 1
## 131: Vingar 1
As we are only interested in chips, we will remove the salsa products.
#remove salsa products
transdata[, SALSA := grepl("salsa", tolower(PROD_NAME))]
transdata <- transdata[SALSA == FALSE, ][, SALSA := NULL]
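As a quick sanity check (a minimal sketch using the table built above), confirm that no salsa products remain:
# Should return 0 if all salsa products were removed
transdata[grepl("salsa", tolower(PROD_NAME)), .N]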
Summary of the Data to Check for Nulls and Outliers
summary(transdata)
## DATE STORE_NBR LYLTY_CARD_NBR TXN_ID
## Min. :2018-07-01 Min. : 1.0 Min. : 1000 Min. : 1
## 1st Qu.:2018-09-30 1st Qu.: 70.0 1st Qu.: 70015 1st Qu.: 67569
## Median :2018-12-30 Median :130.0 Median : 130367 Median : 135183
## Mean :2018-12-30 Mean :135.1 Mean : 135531 Mean : 135131
## 3rd Qu.:2019-03-31 3rd Qu.:203.0 3rd Qu.: 203084 3rd Qu.: 202654
## Max. :2019-06-30 Max. :272.0 Max. :2373711 Max. :2415841
## PROD_NBR PROD_NAME PROD_QTY TOT_SALES
## Min. : 1.00 Length:246742 Min. : 1.000 Min. : 1.700
## 1st Qu.: 26.00 Class :character 1st Qu.: 2.000 1st Qu.: 5.800
## Median : 53.00 Mode :character Median : 2.000 Median : 7.400
## Mean : 56.35 Mean : 1.908 Mean : 7.321
## 3rd Qu.: 87.00 3rd Qu.: 2.000 3rd Qu.: 8.800
## Max. :114.00 Max. :200.000 Max. :650.000
#Filter data set to find outliers
transdata[PROD_QTY == 200, ]
## DATE STORE_NBR LYLTY_CARD_NBR TXN_ID PROD_NBR
## 1: 2018-08-19 226 226000 226201 4
## 2: 2019-05-20 226 226000 226210 4
## PROD_NAME PROD_QTY TOT_SALES
## 1: Dorito Corn Chp Supreme 380g 200 650
## 2: Dorito Corn Chp Supreme 380g 200 650
# Let's see if this customer has any other transactions
transdata[LYLTY_CARD_NBR == 226000, ]
## DATE STORE_NBR LYLTY_CARD_NBR TXN_ID PROD_NBR
## 1: 2018-08-19 226 226000 226201 4
## 2: 2019-05-20 226 226000 226210 4
## PROD_NAME PROD_QTY TOT_SALES
## 1: Dorito Corn Chp Supreme 380g 200 650
## 2: Dorito Corn Chp Supreme 380g 200 650
#Filter out the customer based on loyalty card number
transdata <- transdata[LYLTY_CARD_NBR != 226000, ]
#Summary of data
summary(transdata)
## DATE STORE_NBR LYLTY_CARD_NBR TXN_ID
## Min. :2018-07-01 Min. : 1.0 Min. : 1000 Min. : 1
## 1st Qu.:2018-09-30 1st Qu.: 70.0 1st Qu.: 70015 1st Qu.: 67569
## Median :2018-12-30 Median :130.0 Median : 130367 Median : 135182
## Mean :2018-12-30 Mean :135.1 Mean : 135530 Mean : 135130
## 3rd Qu.:2019-03-31 3rd Qu.:203.0 3rd Qu.: 203083 3rd Qu.: 202652
## Max. :2019-06-30 Max. :272.0 Max. :2373711 Max. :2415841
## PROD_NBR PROD_NAME PROD_QTY TOT_SALES
## Min. : 1.00 Length:246740 Min. :1.000 Min. : 1.700
## 1st Qu.: 26.00 Class :character 1st Qu.:2.000 1st Qu.: 5.800
## Median : 53.00 Mode :character Median :2.000 Median : 7.400
## Mean : 56.35 Mean :1.906 Mean : 7.316
## 3rd Qu.: 87.00 3rd Qu.:2.000 3rd Qu.: 8.800
## Max. :114.00 Max. :5.000 Max. :29.500
# Count the number of transactions by date
transdata[, .N, by = DATE]
## DATE N
## 1: 2018-10-17 682
## 2: 2019-05-14 705
## 3: 2019-05-20 707
## 4: 2018-09-05 685
## 5: 2018-09-27 632
## ---
## 360: 2018-08-19 670
## 361: 2018-11-15 689
## 362: 2019-04-15 651
## 363: 2019-01-22 689
## 364: 2019-05-03 657
# Only 364 distinct dates appear above for a 365-day period, so one date is missing; create a full sequence of dates to find it
eachdate <- data.table(seq(as.Date("2018/07/01"), as.Date("2019/06/30"), by = "day"))
setnames(eachdate, "DATE")
byday_trans <- merge(eachdate, transdata[, .N, by = DATE], all.x = TRUE)
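A quick way to surface the missing date, using byday_trans as built above (a minimal check; its result is the date with an NA transaction count):
# The date(s) in the full calendar with no matching transactions
byday_trans[is.na(N)]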
# Set plot themes to format graphs
theme_set(theme_bw())
theme_update(plot.title = element_text(hjust = 0.5))
# Plot transactions over time
ggplot(byday_trans, aes(x= DATE, y = N)) + geom_line() +
labs(x = "DAY", y = "Number of Transactions", title = "Transactions over Time") +
scale_x_date(breaks = "1 month") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
# Zoom in on December, where the spike occurs
ggplot(byday_trans[month(DATE) == 12, ], aes(x = DATE, y = N)) +
geom_line() +
labs(x = "Day", y = "Number of transactions", title = "Transactions over December") +
scale_x_date(breaks = "1 day") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
This graph shows that sales increase in December in the lead-up to Christmas. The break in the line is the missing date identified above, on which no transactions were recorded (most likely Christmas Day, when stores are closed). We can now confirm that no more outliers remain in the transaction data.
# Extract the pack size from the product name
transdata[, PACK_SIZE := parse_number(PROD_NAME)]
transdata[, .N, PACK_SIZE][order(PACK_SIZE)]
## PACK_SIZE N
## 1: 70 1507
## 2: 90 3008
## 3: 110 22387
## 4: 125 1454
## 5: 134 25102
## 6: 135 3257
## 7: 150 40203
## 8: 160 2970
## 9: 165 15297
## 10: 170 19983
## 11: 175 66390
## 12: 180 1468
## 13: 190 2995
## 14: 200 4473
## 15: 210 6272
## 16: 220 1564
## 17: 250 3169
## 18: 270 6285
## 19: 330 12540
## 20: 380 6416
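It is also worth confirming that parse_number() extracted a size for every product (a minimal check):
# Should return 0 if every PROD_NAME yielded a pack size
transdata[is.na(PACK_SIZE), .N]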
# Print the data to check the new PACK_SIZE column
transdata
## DATE STORE_NBR LYLTY_CARD_NBR TXN_ID PROD_NBR
## 1: 2018-10-17 1 1000 1 5
## 2: 2019-05-14 1 1307 348 66
## 3: 2019-05-20 1 1343 383 61
## 4: 2018-09-05 1 1052 57 44
## 5: 2018-09-27 1 1081 92 17
## ---
## 246736: 2019-03-09 272 272319 270088 89
## 246737: 2018-08-13 272 272358 270154 74
## 246738: 2018-11-06 272 272379 270187 51
## 246739: 2018-12-27 272 272379 270188 42
## 246740: 2018-09-22 272 272380 270189 74
## PROD_NAME PROD_QTY TOT_SALES PACK_SIZE
## 1: Natural Chip Compny SeaSalt175g 2 6.0 175
## 2: CCs Nacho Cheese 175g 3 6.3 175
## 3: Smiths Crinkle Cut Chips Chicken 170g 2 2.9 170
## 4: Thins Chips Light& Tangy 175g 1 3.3 175
## 5: Kettle Sensations BBQ&Maple 150g 1 4.6 150
## ---
## 246736: Kettle Sweet Chilli And Sour Cream 175g 2 10.8 175
## 246737: Tostitos Splash Of Lime 175g 1 4.4 175
## 246738: Doritos Mexicana 170g 2 8.8 170
## 246739: Doritos Corn Chip Mexican Jalapeno 150g 2 7.8 150
## 246740: Tostitos Splash Of Lime 175g 2 8.8 175
# Plot a histogram of pack sizes
hist(transdata$PACK_SIZE)
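For consistency with the other charts in this report, the same distribution can also be drawn with ggplot2 (a sketch using the PACK_SIZE column created above):
# Histogram of pack sizes with 10g bins
ggplot(transdata, aes(x = PACK_SIZE)) +
geom_histogram(binwidth = 10) +
labs(x = "Pack size (g)", y = "Count", title = "Distribution of Pack Sizes")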
# Now create a BRAND column, using the first word of PROD_NAME
transdata[, BRAND := toupper(substr(PROD_NAME, 1, regexpr(pattern = ' ', PROD_NAME) -1))]
transdata[, .N, by = BRAND][order(-N)]
## BRAND N
## 1: KETTLE 41288
## 2: SMITHS 27390
## 3: PRINGLES 25102
## 4: DORITOS 22041
## 5: THINS 14075
## 6: RRD 11894
## 7: INFUZIONS 11057
## 8: WW 10320
## 9: COBS 9693
## 10: TOSTITOS 9471
## 11: TWISTIES 9454
## 12: TYRRELLS 6442
## 13: GRAIN 6272
## 14: NATURAL 6050
## 15: CHEEZELS 4603
## 16: CCS 4551
## 17: RED 4427
## 18: DORITO 3183
## 19: INFZNS 3144
## 20: SMITH 2963
## 21: CHEETOS 2927
## 22: SNBTS 1576
## 23: BURGER 1564
## 24: WOOLWORTHS 1516
## 25: GRNWVES 1468
## 26: SUNBITES 1432
## 27: NCC 1419
## 28: FRENCH 1418
## BRAND N
#combine same brand names
transdata[BRAND == "RED", BRAND := "RRD"]
transdata[BRAND == "SNBTS", BRAND := "SUNBITES"]
transdata[BRAND == "INFZNS", BRAND := "INFUZIONS"]
transdata[BRAND == "WW", BRAND := "WOOLWORTHS"]
transdata[BRAND == "SMITH", BRAND := "SMITHS"]
transdata[BRAND == "NCC", BRAND := "NATURAL"]
transdata[BRAND == "DORITO", BRAND := "DORITOS"]
transdata[BRAND == "GRAIN", BRAND := "GRNWVES"]
# Check the brand counts again
transdata[, .N, by = BRAND][order(N)]
## BRAND N
## 1: FRENCH 1418
## 2: BURGER 1564
## 3: CHEETOS 2927
## 4: SUNBITES 3008
## 5: CCS 4551
## 6: CHEEZELS 4603
## 7: TYRRELLS 6442
## 8: NATURAL 7469
## 9: GRNWVES 7740
## 10: TWISTIES 9454
## 11: TOSTITOS 9471
## 12: COBS 9693
## 13: WOOLWORTHS 11836
## 14: THINS 14075
## 15: INFUZIONS 14201
## 16: RRD 16321
## 17: PRINGLES 25102
## 18: DORITOS 25224
## 19: SMITHS 30353
## 20: KETTLE 41288
Examining Customer Data
str(purchasebehavior)
## 'data.frame': 72637 obs. of 3 variables:
## $ LYLTY_CARD_NBR : int 1000 1002 1003 1004 1005 1007 1009 1010 1011 1012 ...
## $ LIFESTAGE : chr "YOUNG SINGLES/COUPLES" "YOUNG SINGLES/COUPLES" "YOUNG FAMILIES" "OLDER SINGLES/COUPLES" ...
## $ PREMIUM_CUSTOMER: chr "Premium" "Mainstream" "Budget" "Mainstream" ...
summary(purchasebehavior)
## LYLTY_CARD_NBR LIFESTAGE PREMIUM_CUSTOMER
## Min. : 1000 Length:72637 Length:72637
## 1st Qu.: 66202 Class :character Class :character
## Median : 134040 Mode :character Mode :character
## Mean : 136186
## 3rd Qu.: 203375
## Max. :2373711
#Examine Values of Lifestage & Premium Customer
setDT(purchasebehavior)
purchasebehavior[, .N, by = LIFESTAGE][order(-N)]
## LIFESTAGE N
## 1: RETIREES 14805
## 2: OLDER SINGLES/COUPLES 14609
## 3: YOUNG SINGLES/COUPLES 14441
## 4: OLDER FAMILIES 9780
## 5: YOUNG FAMILIES 9178
## 6: MIDAGE SINGLES/COUPLES 7275
## 7: NEW FAMILIES 2549
purchasebehavior[, .N, by = PREMIUM_CUSTOMER][order(-N)]
## PREMIUM_CUSTOMER N
## 1: Mainstream 29245
## 2: Budget 24470
## 3: Premium 18922
Now merge the transaction data with the purchase behaviour data.
mergdata <- merge(transdata, purchasebehavior, all.x = TRUE)
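A quick sanity check that the left join preserved the transaction count, i.e. no loyalty card matched more than one customer row (a minimal sketch):
# TRUE if the join did not duplicate any transactions
nrow(mergdata) == nrow(transdata)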
# Let's also check whether any transactions failed to match a customer by checking for NAs
mergdata[is.na(LIFESTAGE), .N]
## [1] 0
mergdata[is.na(PREMIUM_CUSTOMER), .N]
## [1] 0
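To check every column at once, a compact alternative (a sketch) is:
# Count NAs in each column of the merged data
mergdata[, lapply(.SD, function(x) sum(is.na(x)))]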
Data exploration is now complete.
Data Analysis on Customer Segmentation
## Total sales by LIFESTAGE and PREMIUM_CUSTOMER
setDT(mergdata)
sales <- mergdata[, .(SALES = sum(TOT_SALES)), .(LIFESTAGE, PREMIUM_CUSTOMER)]
#Create a plot
plt <- ggplot(data = sales) +
geom_mosaic(aes(weight = SALES, x= product(PREMIUM_CUSTOMER, LIFESTAGE), fill = PREMIUM_CUSTOMER)) +
labs(x= "Lifestage", y= "Premium Customer Flag", title = "Proportion of Sales") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
# Plot and label with the proportion of sales
plt + geom_text(data = ggplot_build(plt)$data[[1]], aes(x = (xmin + xmax)/2 , y =
(ymin + ymax)/2, label = as.character(paste(round(.wt/sum(.wt),3)*100,
'%'))))
## Warning: `unite_()` was deprecated in tidyr 1.2.0.
## Please use `unite()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
Let’s see if the higher sales are due to there being more customers who buy chips.
### Number of customers by LIFESTAGE and PREMIUM_CUSTOMER
customers <- mergdata[, .(CUSTOMERS = uniqueN(LYLTY_CARD_NBR)), .(LIFESTAGE, PREMIUM_CUSTOMER)][order(-CUSTOMERS)]
#create a plot
p <- ggplot(data = customers) +
geom_mosaic(aes(weight = CUSTOMERS, x = product(PREMIUM_CUSTOMER, LIFESTAGE), fill = PREMIUM_CUSTOMER)) +
labs(x = "Lifestage", y = "Premium customer flag", title = "Proportion of customers") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
#plot and label with proportion of customers
p + geom_text(data = ggplot_build(p)$data[[1]], aes(x = (xmin + xmax)/2 , y =
(ymin + ymax)/2, label = as.character(paste(round(.wt/sum(.wt),3)*100,
'%'))))
# Average number of units per customer by LIFESTAGE and PREMIUM_CUSTOMER
avg_units <- mergdata[, .(AVG = sum(PROD_QTY)/uniqueN(LYLTY_CARD_NBR)), .(LIFESTAGE, PREMIUM_CUSTOMER)][order(-AVG)]
#### Create plot
ggplot(data = avg_units, aes(weight = AVG, x = LIFESTAGE, fill = PREMIUM_CUSTOMER)) +
geom_bar(position = position_dodge()) +
labs(x = "Lifestage", y = "Avg units per transaction", title = "Units per customer") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
Let's also investigate the average price per unit of chips bought in each customer segment, as this is another driver of total sales.
## Average price per unit by LIFESTAGE and PREMIUM_CUSTOMER
avg_price <- mergdata[, .(AVG = sum(TOT_SALES)/sum(PROD_QTY)), .(LIFESTAGE, PREMIUM_CUSTOMER)][order(-AVG)]
# Create plot
ggplot(data = avg_price, aes(weight = AVG, x = LIFESTAGE, fill = PREMIUM_CUSTOMER)) +
geom_bar(position = position_dodge()) +
labs(x = "Lifestage", y = "Avg price per unit", title = "Price per unit") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
As the difference in average price per unit isn't large, we can check whether it is statistically significant.
#### Perform an independent t-test on unit price: mainstream vs premium and budget,
#### for midage and young singles and couples
# := adds the price column to mergdata by reference
mergdata[, price := TOT_SALES/PROD_QTY]
t.test(mergdata[LIFESTAGE %in% c("YOUNG SINGLES/COUPLES", "MIDAGE SINGLES/COUPLES") & PREMIUM_CUSTOMER == "Mainstream", price]
, mergdata[LIFESTAGE %in% c("YOUNG SINGLES/COUPLES", "MIDAGE SINGLES/COUPLES") & PREMIUM_CUSTOMER != "Mainstream", price]
, alternative = "greater")
##
## Welch Two Sample t-test
##
## data: mergdata[LIFESTAGE %in% c("YOUNG SINGLES/COUPLES", "MIDAGE SINGLES/COUPLES") & PREMIUM_CUSTOMER == "Mainstream", price] and mergdata[LIFESTAGE %in% c("YOUNG SINGLES/COUPLES", "MIDAGE SINGLES/COUPLES") & PREMIUM_CUSTOMER != "Mainstream", price]
## t = 37.624, df = 54791, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 0.3187234 Inf
## sample estimates:
## mean of x mean of y
## 4.039786 3.706491
The t-test gives a p-value < 2.2e-16; that is, the unit price paid by mainstream young and midage singles and couples is significantly higher than that paid by their budget and premium counterparts.
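As a cross-check on the t-test, the average unit price can also be computed directly for each premium tier within these two lifestages (a sketch using the price column created above):
# Average price per unit by premium tier for young and midage singles/couples
mergdata[LIFESTAGE %in% c("YOUNG SINGLES/COUPLES", "MIDAGE SINGLES/COUPLES"),
.(avgPrice = mean(price)), by = PREMIUM_CUSTOMER]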
Deep dive into specific customer segments for insights
#### Deep dive into Mainstream, young singles/couples
segment1 <- mergdata[LIFESTAGE == "YOUNG SINGLES/COUPLES" & PREMIUM_CUSTOMER == "Mainstream",]
other <- mergdata[!(LIFESTAGE == "YOUNG SINGLES/COUPLES" & PREMIUM_CUSTOMER == "Mainstream"),]
#### Brand affinity compared to the rest of the population
quantity_segment1 <- segment1[, sum(PROD_QTY)]
quantity_other <- other[, sum(PROD_QTY)]
quantity_segment1_by_brand <- segment1[, .(targetSegment = sum(PROD_QTY)/quantity_segment1), by = BRAND]
quantity_other_by_brand <- other[, .(other = sum(PROD_QTY)/quantity_other), by = BRAND]
brand_proportions <- merge(quantity_segment1_by_brand, quantity_other_by_brand)[, affinityToBrand := targetSegment/other]
brand_proportions[order(-affinityToBrand)]
## BRAND targetSegment other affinityToBrand
## 1: TYRRELLS 0.031552795 0.025692464 1.2280953
## 2: TWISTIES 0.046183575 0.037876520 1.2193194
## 3: DORITOS 0.122760524 0.101074684 1.2145526
## 4: KETTLE 0.197984817 0.165553442 1.1958967
## 5: TOSTITOS 0.045410628 0.037977861 1.1957131
## 6: PRINGLES 0.119420290 0.100634769 1.1866703
## 7: COBS 0.044637681 0.039048861 1.1431238
## 8: INFUZIONS 0.064679089 0.057064679 1.1334347
## 9: THINS 0.060372671 0.056986370 1.0594230
## 10: GRNWVES 0.032712215 0.031187957 1.0488733
## 11: CHEEZELS 0.017971014 0.018646902 0.9637534
## 12: SMITHS 0.096369910 0.124583692 0.7735355
## 13: FRENCH 0.003947550 0.005758060 0.6855694
## 14: CHEETOS 0.008033126 0.012066591 0.6657329
## 15: RRD 0.043809524 0.067493678 0.6490908
## 16: NATURAL 0.019599724 0.030853989 0.6352412
## 17: CCS 0.011180124 0.018895650 0.5916771
## 18: SUNBITES 0.006349206 0.012580210 0.5046980
## 19: WOOLWORTHS 0.024099379 0.049427188 0.4875733
## 20: BURGER 0.002926156 0.006596434 0.4435967
We can see that:
• Mainstream young singles/couples are 23% more likely to purchase Tyrrells chips than the rest of the population (affinity 0.0316/0.0257 ≈ 1.23).
• Mainstream young singles/couples are about 56% less likely to purchase Burger Rings than the rest of the population (affinity ≈ 0.44).
Let’s also find out if our target segment tends to buy larger packs of chips.
### Preferred pack size compared to the rest of the population
quantity_segment1_by_pack <- segment1[, .(targetSegment = sum(PROD_QTY)/quantity_segment1), by = PACK_SIZE]
quantity_other_by_pack <- other[, .(other = sum(PROD_QTY)/quantity_other), by = PACK_SIZE]
pack_proportions <- merge(quantity_segment1_by_pack, quantity_other_by_pack)[, affinityToPack := targetSegment/other]
pack_proportions[order(-affinityToPack)]
## PACK_SIZE targetSegment other affinityToPack
## 1: 270 0.031828847 0.025095929 1.2682873
## 2: 380 0.032160110 0.025584213 1.2570295
## 3: 330 0.061283644 0.050161917 1.2217166
## 4: 134 0.119420290 0.100634769 1.1866703
## 5: 110 0.106280193 0.089791190 1.1836372
## 6: 210 0.029123533 0.025121265 1.1593180
## 7: 135 0.014768806 0.013075403 1.1295106
## 8: 250 0.014354727 0.012780590 1.1231662
## 9: 170 0.080772947 0.080985964 0.9973697
## 10: 150 0.157598344 0.163420656 0.9643722
## 11: 175 0.254989648 0.270006956 0.9443818
## 12: 165 0.055652174 0.062267662 0.8937572
## 13: 190 0.007481021 0.012442016 0.6012708
## 14: 180 0.003588682 0.006066692 0.5915385
## 15: 160 0.006404417 0.012372920 0.5176157
## 16: 90 0.006349206 0.012580210 0.5046980
## 17: 125 0.003008972 0.006036750 0.4984423
## 18: 200 0.008971705 0.018656115 0.4808989
## 19: 70 0.003036577 0.006322350 0.4802924
## 20: 220 0.002926156 0.006596434 0.4435967
It looks like mainstream young singles/couples are 27% more likely to purchase a 270g pack of chips than the rest of the population, but let's check which brands sell this pack size.
mergdata[PACK_SIZE == 270, unique(PROD_NAME)]
## [1] "Twisties Cheese 270g" "Twisties Chicken270g"
Twisties is the only brand offering 270g packs, so this result may instead reflect a higher likelihood of purchasing Twisties.
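This reading is consistent with the brand affinity table above, where Twisties shows an affinity of about 1.22; it can be pulled out directly (a sketch):
# Twisties' brand affinity for the target segment
brand_proportions[BRAND == "TWISTIES"]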
Conclusion
Let's recap what we've found! Sales have mainly come from Budget older families, Mainstream young singles/couples, and Mainstream retirees. The high spend on chips by mainstream young singles/couples and retirees is due to there being more of them than other buyers. Mainstream midage and young singles and couples are also more likely to pay more per packet of chips, which is indicative of impulse buying behaviour. We've also found that mainstream young singles and couples are 23% more likely to purchase Tyrrells chips than the rest of the population. The Category Manager may want to increase the category's performance by off-locating some Tyrrells and smaller packs of chips in discretionary space near areas that young singles and couples frequent, to increase visibility and impulse purchases.
I can help the Category Manager with recommendations on where these segments shop, and with measuring the impact of the changed placement. We'll work on measuring the impact of trials in the next task, and bring everything together in the third task.