The business problem that we solved was finding out what income groups use the most coupons on private label products and how we can create new customers through our coupons.
We dissected the demographics and transactions of our listed households to filter them into the Private Label items and then separated them into their income groups. We used this to take a look into what income group purchases the most Private Label Products. This gave us more specific information about the customer group, so we viewed the overall coupon usage by income group and from there we were able to calculate the average to see where the coupon usage is the highest.
In this report, you can see the statistics of the income groups and their coupon usage of Private Label Products. This allows you to view the data in a graph, so the team can get an idea of where their coupon usage is the most heavy. This could give insights into where they can direct more coupons, or where they can reduce the coupon availability.
Complete Journey: This package gives us all the data about Regork for our analysis
DT: This package allows us to display tables, once we knit the R file into an HTML document
GGPlot2: This package allows us to create graphs and charts for our data. This is the package we used to create our visualizations.
#load packages
suppressWarnings(suppressMessages(library(completejourney)))
suppressWarnings(suppressMessages(library(ggplot2)))
suppressWarnings(suppressMessages(library(tidyverse)))
suppressWarnings(suppressMessages(library(dplyr)))
suppressWarnings(suppressMessages(library(DT)))
#Load Coupons, Coupon_redemption, Transaction, Promotions, Demographics, and Products
coupons <- coupons
coupon_redemptions <- coupon_redemptions
transaction <- get_transactions()
Promotions <- get_promotions()
demographics <- demographics
Products <- products
demo_trans <- inner_join(demographics, transaction)
demo_trans_prod <- left_join(demo_trans, products)
Coup_reds <- inner_join(coupon_redemptions, coupons)
Final_table <- left_join(demo_trans_prod, Coup_reds)
#Filtered to just get the private brand transactions
Private_Brand <- Final_table %>%
filter(brand == 'Private')
#Removed N/A's
Red_dem <- Final_table %>%
drop_na(redemption_date)
#Filtered for only Private brand
Last_Table <- Red_dem %>%
filter(brand == 'Private')
#Made a table that showed the average coupon usage per income range
Avg_Coup <- Last_Table %>%
select(income, household_id, quantity)%>%
mutate(N = 1) %>%
group_by(income) %>%
summarise(sum(N)/length(unique(household_id)))%>%
rename("Avg_quantity" ='sum(N)/length(unique(household_id))' )
datatable(head(Final_table, 100), options = list(pageLength = 10))
# Number of customers by by bar graph
ggplot(demographics, aes(x = factor(income))) +
geom_bar() +
ggtitle("Number of customers by income", ) +
xlab('income') +
theme(axis.text.x=element_text(angle=45, hjust=1))+
ylab('# of customers')
This graph shows us the number of customers for each income range which is an important visualization to understand because we will be displaying more data based on income ranges and would like to show the distribution in customers. The graph is showing what a normal distribution across the number of customers per income range would be. The average income range is around 50-74k, so that range having the most customers and the falling off more as you go further from that number makes sense.
# Private label products bought by income by bar graph
ggplot(Private_Brand, aes(x = factor(income))) +
geom_bar(y=(sum(Private_Brand$quantity))) +
ggtitle("Number of Private label Products Bought by Income", ) +
xlab('income') +
theme(axis.text.x=element_text(angle=45, hjust=1))+
ylab('# items Bought')
The Graph shows the quantity of private label products bought across the income ranges. Based on the last graph above (“Number of Customers by Income Range”) this graph follows the same distribution. The income range 50-74k with the highest amount of Private label products bought at just under 60,000 units followed by 35-49k at just under 50,000 units and 75-99k at around 28,000 units. Also, an important aspect of this graph to look at is the least amount of private label products being bought which would be income ranges: 175-199k, 200k-249k, 250k+.
#Number of Coupons used on Private label by bar graph
ggplot(Red_dem, aes(x = factor(income))) +
geom_bar(y=(sum(Last_Table$quantity))) +
ggtitle("Number of Coupons used on Private Label products", ) +
xlab('income') +
theme(axis.text.x=element_text(angle=45, hjust=1))+
ylab('# items Bought')
In this graph, it shows the number of private label products bought using a coupon. From the previous graph where we showed the total number of private label products bought to this graph, we see some difference worth looking in to. The income range 35-49k bought just around 10,000 units using a coupon out of the 50,000 units bought in total. Another interesting change is the income range 150-174k bought close to 6,000 units using coupons out of the nearly 8,000 units in total.
#Table showing average Coupons used
datatable(head(Avg_Coup, 100), options = list(pageLength = 10))
#Average coupons used by Income Range by pie chart
ggplot(Avg_Coup, aes(x = "",Avg_quantity , fill = income))+
geom_bar(stat="identity", width=1, show.legend = FALSE) +
geom_text(aes(label = paste0(income)), position = position_stack(vjust=0.5), size = 3) +
coord_polar("y", start=0)+
theme_void() +
ggtitle("Average coupons used per household by income range")
In the pie chart above we showed the average coupons used per household by income. When looking at this Pie chart you get a really accurate representation of what income range groups are using coupons the most and who aren’t using the coupons to their advantage. Surprising the group that uses the most coupons per household is actually the 250+ group. This income range is probably very good with money and want to save it at every chance they get. The group that uses the least is the 200-249k interestingly enough. This income group must not be getting enough marketing towards them. The other group that is low is the 125-149k group. The two group around it are so high it is surprising that this group is not up with he 150k-174k and the 100-124k.
Overall, our analysis demonstrates that the highest usage rate of coupon redemptions were from the income group of 250K+, while the majority of the coupons were used by the income group of 50-74K. This can be seen in our Average Coupons Used per Household by Income Range and Number of Coupons used on Private Label Products graphs. We would point our that the best way to use our findings would be to push our coupons towards members of the 250K income group, as well as the 100-124K and 15-24K income groups, since their usage is low, but their rate is high, so we can see that more coupons would increase their usage. This is also the income range that purchases the least amount of Private Label Products, so this would allow us to gain many new customers.