Introduction

My sister, Lucy, is part of the Affiliate/Influencer program at Amazon. Essentially, she uses her social media platforms (Tik Tok and Instagram) to share affiliate links with her followers. Then, if they purchase items via an affiliate link, she gets a percent-based commission from Amazon.

Lucy is very good at connecting with her followers - she has grown her follower base to almost 50k since April 2020. However, the data provided by Amazon is difficult to draw conclusions from without running an in-depth analysis. Lucy has never actually looked at this data as part of her strategy. Instead, she goes by her gut - and it works!

While the most obvious way to make more commission is by continuing to grow the follower base on social media (assuming this leads to more sales via affiliate links), I am hoping to use the data provided by Amazon to come up with additional recommendations on growth opportunities within the existing follower base. Specifically, I see two main opportunities:

1. Maximizing commissions by exploiting Amazon’s commission payout structure:

This is the “increases commissions without increasing Amazon’s revenue” plan. Amazon does not use one set commission percentage. Instead, it is set dependent on the category of the item sold. My hypothesis is that Lucy could increase commissions earned by focusing on categories with higher % payout (and hypothetically sell fewer items from key categories). Additionally, Amazon pays out bonuses for any items purchased “directly” from an affiliate link, as opposed to “indirectly” (more to come there), so, hypothetically, by increasing the % of items sold directly, we could also increase commissions.

2. Maximizing commissions by understanding buying behaviors of followers:

Assuming Lucy does not gain any additional followers, another way to potentially increases commissions is by actually increasing revenue by capitalizing on buying behaviors. For example:

  • How are the followers shopping (ex. what device are they most likely to use)?
  • Are there trends with the order profiles of Lucy’s followers? Can we use these trends to better target her followers?
  • Are there days of the week where Lucy’s followers seem to do more shopping?

Ultimately, the goal is to provide some key insights and strategies on ways for Lucy to increases her commissions without relying on constant growth of her follower-base. We can do this by either exploiting Amazon’s commission structure OR by having a deep understanding of the buying behaviors of her followers so we can capialize on any patterns.

The Data

# reading in the data files

amazon_fee_orders <- read_excel("~/Documents/Eliza Geeslin/GitHub/Luluprime_Amazon.xlsx", sheet = "Fee-Orders", skip = 1)

# didn't actually use this dataset
#amazon_fee_earnings <- read_excel("~/Documents/Eliza Geeslin/GitHub/Luluprime_Amazon.xlsx", sheet = "Fee-Earnings", skip = 1)

amazon_daily_trends <- read_excel("~/Documents/Eliza Geeslin/GitHub/Luluprime_Amazon.xlsx", sheet = "Fee-DailyTrends", skip = 1)

This dataset is a download of all of Lucy’s Affiliate program data from Amazon from July - November 2021 (excluding black Friday and Cyber Monday, which may actually be a good thing because those two days are definitely outliers). I am focusing on two sheets within this dataset:

1. Orders
# quick sample of the table 
head(amazon_fee_orders)
## # A tibble: 6 x 10
##   Category Name  ASIN  Date    Qty `Price($)` `Link Type` Tag   `Indirect Sales`
##   <chr>    <chr> <chr> <chr> <dbl>      <dbl> <chr>       <chr> <chr>           
## 1 Kitchen… ТikT… B092… 2021…     1      20.0  Text Only,… lulu… ndi             
## 2 Health … ZzzQ… B007… 2021…     1       7.46 Text Only,… lulu… ndi             
## 3 Health … ZzzQ… B01M… 2021…     1      18.0  Text Only,… lulu… ndi             
## 4 Health … ZzzQ… B07N… 2021…     1      13.5  Social ads… lulu… ndi             
## 5 Office … ZZTX… B07D… 2021…     1       7.59 Text Only,… lulu… ndi             
## 6 Toys & … ZZCC… B096… 2021…     1      14.0  Text Only,… lulu… ndi             
## # … with 1 more variable: Device Type Group <chr>

This is a fairly straightforward table of all the orders that are connected to Lucy’s account. There are a few columns that I want to call out specifically:

  • Link Type & Tag - I had high hopes for these columns because I thought there was some rhyme or reason to what links were used for (for example, Instagram Stories vs. Instagram Post vs. Tik Tok), and I think this is how these fields are supposed to be used. However, after speaking to Lucy I learned that she uses all her links interchangeably all over the place, so these don’t mean anything for this dataset. One thing I will recommend is that Lucy begins to use link types strategically so we can get more insights.
  • Indirect Sales - This ties back to commission payouts; there is an additional bonus for direct sales (where the link leads directly to the item page) vs. indirect sales (where the link leads to a more general page such as search results).
  • Device Type Group - This column lists the device used to make the purchase. Since Lucy’s followers follow her on phone apps (Instagram, Tik Tok), most of the purchases are phone purchases, which makes sense.

Recommendation 1: Use Link Type and Tag Strategically - If Lucy used these links and tags based on social media post type we can tie sales back to specific platforms and post types (even without having access to social media data).

2. Conversions: Click to Sales
# quick sample of the table
head(amazon_daily_trends)
## # A tibble: 6 x 6
##   Date   Clicks `Items Ordered (A… `Items Ordered (… `Total Items Or… Conversion
##   <chr>   <dbl>              <dbl>             <dbl>            <dbl>      <dbl>
## 1 2021-…   8858                139              1172             1311       14.8
## 2 2021-…   5855                 58               548              606       10.4
## 3 2021-…   6536                 60               827              887       13.6
## 4 2021-…   5735                 60               591              651       11.4
## 5 2021-…   7166                 48               714              762       10.6
## 6 2021-…  12256                120              1367             1487       12.1

This table shows aggregate clicks on Lucy’s links and then the number of actual items purchased. Essentially, the conversion from clicks to purchases daily. The table includes conversion rate, but I will be aggregating this data slightly differently and re-calculating conversion rate as we go.

3. Amazon Commissions Table
# creating data frame with commission structure information on it. If a category isn't listed here, the commission is 4.00%.

Category <- c("Amazon Games", 
              "Luxury Beauty", "Luxury Stores Beauty", "Amazon Explore", 
              "Digital Music", "Physical Music", "Handmade", "Digital Videos", "CDs & Vinyl", "Video On Demand: Rent or Buy",
              "Physical Books", "Kitchen", "Automotive", "Books & Textbooks", "Tires & Wheels", "Major Appliances", "Kitchen & Dining", "Magazines", "AmazonBasics Microwave", "Electronic Components & Home Audio", 
              "Fire Tablets", "Fire Tablets Accessories", "Kindle E-readers", "Kindle E-readers Accessories", "Luxury Stores Fashion", "Amazon Cloud Cam", "Fire TV Devices", "Fire TV Accessories", "Echo & Alexa Accessories", "Echo Devices", "Ring Alarms and Smart Lighting", "Ring Video Doorbells and Cameras", "Watches", "Jewelry", "Luggage", "Clothing & Accessories", "Health & Household", "Shoes, Handbags, Wallets, Sunglasses", "Softlines Private Label", "Cell Phones & Accessories", "Arts, Crafts & Sewing", "Kindle Books", "Other", "Camera, Photo & Video", 
              "Furniture", "Home", "Home Entertainment", "Home Improvement", "Patio, Lawn & Garden", "Headphones", "Beauty & Grooming", "Musical Instruments", "Business & Industrial Supplies", "Outdoor Recreation", "Baby & Nursery", "Amazon Coins", "Toys & Games", "Sports & Fitness", "Pet Food & Supplies", "Power & Hand Tools", "Sports Collectibles", "Office & School Supplies",
              "Computers, Tablets & Components", "Blu-Ray & DVD", 
              "Televisions", "Video Game Downloads", "Element Smart TV",
              "Amazon Fresh Products", "Video Games", "Video Game Consoles", "Grocery", "Health & Personal Care", "Grocery & Gourmet Food", "Prime Pantry", "Health & Personal Care Appliances", "Health &amp; Personal Care Appliances",
             "Amazon Gift Cards", "Wine, Spirits & Beer", "Other Gift Card Brands", "Fresh Prepared","Digital Video Subscription")
Commission <- c(0.20, 
                0.10, 0.10, 0.10, 
                0.05, 0.05, 0.05, 0.05, 0.05, 0.05,
                0.045, 0.045, 0.045, 0.045, 0.045, 0.045, 0.045, 0.045, 0.045,
                0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04,
                0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03,  0.03,  0.03, 0.03,  
                0.025, 0.025,
                0.02, 0.02, 0.02, 
                0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01,
                0.00, 0.00, 0.00, 0.00, 0.00)

commission_by_category <- data.frame(Category, Commission)

head(commission_by_category)
##               Category Commission
## 1         Amazon Games       0.20
## 2        Luxury Beauty       0.10
## 3 Luxury Stores Beauty       0.10
## 4       Amazon Explore       0.10
## 5        Digital Music       0.05
## 6       Physical Music       0.05

Additionally, since I am interested in using the commission payout percent (so we can maximize commissions) I needed to get commission information. Full transparency, when I first started this project I found a table on Amazon.com that had the commission payouts listed.It does sound like Amazon frequently changes (lowers) the commission payout percents for different categories, so it is possible they are getting ready to update commissions for 2022.

Luckily, I had already created this quick dataframe that lists categories and the commission percent payout. We can see that commission payouts range from 20% (admittedly, I don’t know what Amazon Games are) to 0%.

Visualizations

1. Maximizing Commission by Exploiting Amazon’s Pay Structure

Again, the hypothesis here is that there is an opportunity to increase commissions simply by focusing on categories that provide a higher payout percent. Admittedly, I am making a rather large assumption that Lucy’s followers are willing and likely to buy items from any category if Lucy posts them. However, I do think there are probably some untapped categories with potential.

This table brings commission percent for each category into the orders table so we can also understand historical frequency of sales.

# Average cost in each category
average_price_commission_table <- merge(x = amazon_fee_orders, y = commission_by_category, by = "Category",
                                        all.x = TRUE) %>%
  select(Category, `Price($)`, Commission, Qty) %>%
  group_by(Category) %>%
  summarize(`Average Price` = mean(`Price($)`), `Commission` = max(Commission), `Items Sold` = sum(Qty)) %>%
  arrange(desc(`Average Price`), `Commission`)
  
datatable(average_price_commission_table, class = 'table-bordered table-condensed',
          width = '100%', options = list(scrollX = TRUE, pageLength = 10))

Choosing the Right New Categories

To maximize commissions, I’ll be looking at commission payout percent alongside average prices of items (historically in Lucy’s data) purchased from that category. From just a maximizing standpoint, we want to focus on categories that have high average prices of items AND a higher % commission payout.

average_price_commission_table %>%
  arrange(`Average Price`) %>%
  mutate(Category=factor(Category, levels=Category)) %>%
  ggplot(aes(`Average Price`)) +
  geom_histogram(binwidth = 5, fill = "#51d3d9" ) +
  scale_x_continuous(breaks = c(0, 20, 40, 60, 80, 100, 120), labels = scales::dollar) + 
  labs(x = "Average Price ($)", y = "Cateogry") +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=6),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13))

We can see the spread of average prices of Lucy’s existing categories above. We see that most of categories are on the lower end of the spread. This means that there could be an opportunity to move “upmarket” with the items that Lucy is promotion (i.e. she can promote more expensive items). It also means that Lucy’s followers could be unwilling to purchase more expensive items.

average_price_commission_table %>%
  ggplot(aes(Commission, `Average Price`)) +
  geom_point(color = "#9784c2" ) +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::dollar) + 
  labs(y = "Average Price ($)", x = "Commission (%)") +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13))

Above, we see that Lucy’s followers order from categories where the commission percent range from 0% to 10% and the average prices within the categories range from $0 (or very cheap) to almost $125. From a data perspective, the recommendation would be to focus on the categories with average prices over $50 and a commission percent 4% or over. I did notice that the highest average priced items did not have the highest commission percent (probably intentionally).

Below we can see the categories that fall into our recommendation based only on average price and commission (the upper right quadrant outlined).

average_price_commission_table %>%
  ggplot(aes(Commission, `Average Price`)) +
  geom_point(color = "#9784c2" ) +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::dollar) + 
  labs(y = "Average Price ($)", x = "Commission (%)") +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13)) +
  geom_hline(yintercept=49, color = "#33475b", linetype="dashed", size=0.5) +
  geom_vline(xintercept=0.039, color = "#33475b", linetype="dashed", size=0.5)

average_price_commission_table_recommendation <- average_price_commission_table%>%
  filter(`Average Price` >= 50 & Commission >= 0.04) %>%
  select(Category, `Average Price`, `Commission`) %>%
  arrange(desc(`Average Price`))

paged_table(average_price_commission_table_recommendation, options = list(scrollX = FALSE, pageLength = 10))

So, we have one potential recommendation. However, I already mentioned that we are making a big assumption that Lucy’s followers are interested in these categories. Even from the table we can see that these categories are a little bit random and are not totally aligned with Lucy’s brand (Woman’s Fashion, Accessories, and Lifestyle). We also see that the historical items sold for these categories is relatively low, which might be for two reasons:

  1. Lucy has not done any advertising of these categories, so any purchases have been random/indirect (not representative of future potential).
  2. Lucy has tried to sell these categories and there is not a lot of demand for these categories (representative of future potential also being low).

In order to solve for this, I am going to layer in Count of Items Sold as that historical demand to get a sense of low, medium, and high demand categories. The top category (Clothing & Accessories) outperforms the other categories by so much that I bucketed it separately so we can see where it falls on the scatterplot.

# categorize historical demand into a few buckets to show potential future opportunity
average_price_commission_table_demand <- average_price_commission_table %>%
   add_column(`Historical Demand` = if_else(.$`Items Sold` <= 40, "Low Demand", if_else(.$`Items Sold` > 40 & .$`Items Sold` <= 1000, "Medium Demand", if_else(.$`Items Sold` > 1000 & .$`Items Sold` <= 100000, "High Demand", "Top Category"))))
  

average_price_commission_table_demand$`Historical Demand` <- ordered(average_price_commission_table_demand$`Historical Demand`, levels=c("Low Demand", "Medium Demand", "High Demand", "Top Category"))

#average_price_commission_table_demand
average_price_commission_table_demand %>%
  ggplot(aes(Commission, `Average Price`)) +
  geom_point(aes(color = `Historical Demand`)) +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::dollar) + 
  labs(y = "Average Price ($)", x = "Commission (%)") +
  theme(legend.position="bottom",
        legend.text = element_text(family = "Avenir")) +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13)) +
  scale_color_manual(values=HubPalette)

Notably, we see that the majority of Lucy’s historical sales fall into a similar intersection of commission and average price (the green/purple dots). We also see that the categories with the highest average price tend to have the lowest historical demand. With the historical demand included, I see two potential strategy recommendations:

  1. We focus on categories that fall into a similar intersection as Lucy’s historically high-demand categories. This assumes that one of the factors that Lucy’s followers respond to is a lower average price point. We will exclude categories that are already in high demand and sort our recommendation from medium demand to low demand.
average_price_commission_table_demand %>%
  ggplot(aes(Commission, `Average Price`)) +
  geom_point(aes(color = `Historical Demand`)) +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::dollar) + 
  labs(y = "Average Price ($)", x = "Commission (%)",
       caption = "We are looking at the bottom right quadrant. These categories are listed in the table.") +
  theme(legend.position="bottom",
        legend.text = element_text(family = "Avenir")) +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13)) +
  scale_color_manual(values=HubPalette) +
  geom_hline(yintercept=51, color = "#33475b", linetype="dashed", size=0.5) +
  geom_vline(xintercept=0.028, color = "#33475b", linetype="dashed", size=0.5)

average_price_commission_table_recommendation2 <- average_price_commission_table_demand %>%
  filter(`Average Price` <= 50 & Commission >= 0.03 & `Historical Demand` != "High Demand" & `Historical Demand` != "Top Category", Category != "Other") %>%
  select(Category, `Average Price`, `Commission`, `Historical Demand`) %>%
  arrange(desc(`Historical Demand`), desc(`Average Price`))

paged_table(average_price_commission_table_recommendation2, options = list(scrollX = TRUE, pageLength = 5))
  1. We focus on categories with higher price points and commission percentages and also have medium historical demand. The medium demand may indicate that there is future potential for these categories if Lucy were to focus on promoting them more often.
average_price_commission_table_demand %>%
  ggplot(aes(Commission, `Average Price`)) +
  geom_point(aes(color = `Historical Demand`)) +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::dollar) + 
  labs(y = "Average Price ($)", x = "Commission (%)", caption = "We are looking at the top right quadrant, at the points that are Medium Demand (Green) These categories are listed in the table.") +
  theme(legend.position="bottom",
        legend.text = element_text(family = "Avenir")) +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13)) +
  scale_color_manual(values=HubPalette) +
  geom_hline(yintercept=24, color = "#33475b", linetype="dashed", size=0.5) +
  geom_vline(xintercept=0.028, color = "#33475b", linetype="dashed", size=0.5)

average_price_commission_table_recommendation3 <- average_price_commission_table_demand %>%
  filter(`Average Price` >= 25 & Commission >= 0.03 & `Historical Demand` == "Medium Demand", Category != "Other") %>%
  select(Category, `Average Price`, `Commission`, `Historical Demand`)

paged_table(average_price_commission_table_recommendation3, options = list(scrollX = TRUE))

Recommendation 2: New Categories - Based on all three potential models and what I know of Lucy’s business (which, again, is focused a lot on Woman’s Fashion, Accessories, and Lifestyle), if Lucy were interested in focusing on some additional categories, I would recommend (in order):

(1) Luxury Beauty, (2) Homemade, (3) Camera, Photo & Video, (4) Watches, (5) Outdoor Recreation

Focusing on Direct Sales

Another way to maximize commission is to increases the percentage of sales that come from direct links vs. indirect links. Essentially, Amazon rewards affiliates (Lucy) with a higher commission rate when one of her followers buys the product she recommends directly or a product in the same category. Conversely, Amazon pays a significantly lower commission rate if referred shopper buys something unrelated to what you were promoting.

I was unable to find the exact commission rate differences between indirect and directs sales (and I actually don’t know which rate I am looking at in the first analysis), but we are going to assume that the rate difference is high enough that it is worth it for Lucy to drive her followers to specific products where possible.

# grouping orders by month and then getting a count of direct vs. indirect sales per month
amazon_fee_orders_by_month <- amazon_fee_orders %>%
  mutate(Datetime = as_datetime(Date)) %>%
  group_by(Month = floor_date(Datetime, "month")) %>%
  count(Month, `Indirect Sales`) %>%
  mutate(Percent_col = n / sum(n)) %>%
  group_by(Month) %>%
  mutate(Percent = n/sum(n))

# changing the percent column into a percent (for visualization)
amazon_fee_orders_by_month$Percent <- with(amazon_fee_orders_by_month, paste(formatC(Percent*100, digits=2
              , format="fg"),"%",sep=""))

#amazon_fee_orders_by_month
# plotting a bar graph to show direct vs. indirect sales per month
amazon_fee_orders_by_month %>%  
  ggplot(aes(x = Month, y = n, fill = `Indirect Sales`)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = Percent), family = "Avenir", colour = "white", position = position_fill(vjust = 0.5)) +
  labs(y = "Indirect vs. Direct Sales (%)", x = "Month", fill = "Direct or Indirect Sale") +
  theme(legend.position="bottom",
        legend.text = element_text(family = "Avenir")) +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13)) +
scale_fill_manual(values=HubPalette)

This graph shows that there is definitely opportunity for Lucy to use more direct links. Lucy does usually point followers to a individual items, but from some of the research I did, it seems like Amazon tracks direct vs. indirect specifically based on what webpage the user is sent to, so it is worth it to take another look there.

Recommendation 3: Double Check Tracking Links to Ensure “Direct” Credit - This might be easier recommended than done. From what I can tell, Amazon’s way of crediting “Direct” vs. “Indirect” is highly controversial (affiliates do not trust the data). However, if Lucy were able to increase the % of direct sales, she would see the positive impact in her commissions.

2. Maximizing Commissions by Understanding Consumer Behavior

Another, potentially less straightforward, way of maximizing commissions is by understanding and capitalizing on the behaviors of Lucy’s followers. There are tons of things we can look at when it comes to consumer behavior, but we also need to make sure that these findings are as actionable as possible.

Analyzing Orders

The first thing I did to understand Lucy’s followers was to group the items purchased into orders by the timestamp of the order. The assumption here is that the timestamps were granular enough to group into orders, and for the most part this seems to have been an okay assumption (the orders seem to be reasonable sizes).

# Group orders by timestamp and give unique ID assuming that datetime stamps are specific enough to order
group_unique_order <- amazon_fee_orders %>%
  mutate(Datetime = as_datetime(Date)) %>%
  group_by(Datetime) %>%
  mutate(`Order ID` = cur_group_id())

#group_unique_order
#Items per order - we see that most orders are just one item
group_unique_order %>%
  group_by(`Order ID`) %>%
  summarise(`Items per order` = sum(Qty)) %>%
  ggplot(aes(`Items per order`)) +
  geom_histogram(binwidth = 1, fill = "#f5c78e") +
  scale_y_sqrt() + 
  labs(y = "Occurances", x = "Items per order") +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13))

# Analyze orders (are they mostly in the same category)? - tie back to commission structure
# Items per order (more than 1 item)
multiple_items_per_order <- group_unique_order %>%
  group_by(`Order ID`) %>%
  summarise(`Items per order` = sum(Qty)) %>%
  filter(`Items per order` > 1)

multiple_items_per_order <- merge(x = multiple_items_per_order, y = group_unique_order, by = "Order ID")

multiple_items_per_order %>%
  group_by(`Order ID`) %>%
  summarise(`Categories Per Order` = n()) %>%
  ggplot(aes(`Categories Per Order`)) +
  geom_histogram(binwidth = 1, fill = "#a2d28f") +
  scale_y_sqrt() +
  labs(y = "Occurances", x = "Categories per order") +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13))

These two graphs show slightly different things, but I think both are valuable as we think about Lucy’s strategy.

The first graph (in orange) shows that the majority of Lucy’s followers buy one item at a time. They see it, they like it, they buy it. This is encouraging when combined with Recommendation 3, because it means that if we can turn all those one-item purchases into direct sales Lucy will see a major increase in commissions. It is basically an “plus one” on Recommendation 3.

The next graph filters out those one-item purchasers and looks at the category breakdown of orders where more than one item is purchased. We can see that the most frequent outcome for followers who purchase more than one item is that the items are most likely from different categories. Unfortunately, this behavior does not maximize Lucy’s sales (we know from our analysis of Amazon’s pay structure that they reward sales from the same category). If I had to guess, Amazon may subtly encourage customers to jump around in categories.

Recommendation 4: Stick to a category! - There may be ways for Lucy to encourage her followers to buy more than one item from a single category:

1. Promoting items from the same category together - Lucy often tries out the items (clothing) that she is selling, so she could try on a few different items all from the same category and then promote them together. For example, she tries on jeans and a t-shirt and then they are promoted together.

2. Promoting items from one category at a time - Lucy could focus on one category a day (or an hour), so her followers don’t have the option (at least from social media) to purchase items from multiple categories. For example, one day she is only promoting Luxury Beauty, and another day she promotes Watches.

How are Followers Shopping?

Next, I wanted to look at how Lucy’s follower’s were shopping. Specifically, what devices they were using. I knew, from glancing at the dataset, that most purchases were made from phones.

# Phone vs. laptop by month
# grouping orders by month and then getting a count of direct vs. indirect sales per month
orders_by_modal_by_month <- amazon_fee_orders %>%
  mutate(Datetime = as_datetime(Date)) %>%
  group_by(Month = floor_date(Datetime, "month")) %>%
  count(Month, `Device Type Group`) %>%
  mutate(Percent_col = n / sum(n)) %>%
  group_by(Month) %>%
  mutate(Percent = n/sum(n))

# changing the percent column into a percent (for visualization)
orders_by_modal_by_month$Percent <- with(orders_by_modal_by_month, paste(formatC(Percent*100, digits=3
              , format="fg"),"%",sep=""))

#orders_by_modal_by_month
# plotting a bar graph to show device group type
orders_by_modal_by_month %>%  
  ggplot(aes(x = Month, y = n, fill = `Device Type Group`)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = Percent), family = "Avenir", size=3, colour = "white", position = position_fill(vjust = 0.5)) +
  labs(y = "% Device Used", x = "Month", fill = "Device Type") +
  theme(legend.position="bottom", legend.text = element_text(family = "Avenir")) +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir", size=13)) +
  scale_fill_manual(values= c("#81c1fd", "#ea90b1", "#a2d28f"))

The vast majority of purchases are made on phones. This might also explain why most orders are only for one item: people aren’t doing extensive shopping on their phones, they are seeing an item they want, and buying it immediately. It’s the immediate impulse purchases.

Lucy should probably continue to lean into this buying style (which obviously does work), and she could attempt to provide more opportunities for her followers who might already shop via the computer to access her promotions via computers.

device_type_average_price <- amazon_fee_orders %>%
  group_by(`Device Type Group`) %>%
  summarize(`Average Item Price` = mean(`Price($)`))

device_type_average_price %>%
  ggplot(aes(x = `Device Type Group`, y =`Average Item Price`)) +
  geom_col(fill = "#51d3d9" ) +
  scale_y_continuous(labels = scales::dollar) + 
  labs(y = "Average Item Price ($)", x = "Device of Purchase") +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=6),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13))

#Items per order - not using in the final analysis in favor of items per order per device type.
#group_unique_order %>%
#  filter(`Device Type Group` == "DESKTOP") %>%
#  group_by(`Order ID`) %>%
#  summarise(`Items per order` = sum(Qty)) %>%
#  ggplot(aes(`Items per order`)) +
#  geom_histogram(binwidth = 1, fill = "#fea58e") +
#  scale_y_sqrt() + 
#  labs(y = "Occurances", x = "Items per order") +
#  theme(axis.text.x = element_text(family = "Avenir", size=8),
#        axis.text.y = element_text(family = "Avenir", size=8),
#        axis.title.y = element_text(family = "Avenir", size=13),
#        axis.title.x = element_text(family = "Avenir",size=13))

group_unique_order %>%
  group_by(`Order ID`) %>%
  mutate(`Items per order` = sum(Qty)) %>%
  group_by(`Device Type Group`) %>%
  count(`Order ID`, `Items per order`) %>%
  summarize(`Average items per order` = mean(`Items per order`)) %>%
  ggplot(aes(x = `Device Type Group`, y =`Average items per order`)) +
  geom_col(fill = "#fea58e" ) +
  scale_y_continuous(labels = scales::comma) + 
  labs(y = "Average Item Per Order", x = "Device of Purchase") +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=6),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13))

Looking just at followers who shop via desktop, we see that desktop purchases, we do see that items purchased are higher in price, on average. We also see that these followers who are shopping on the computer purchase slightly more items on average.That being said, we know that the vast majority of the sales that are credited to Lucy are from followers shopping on their phones. However, if Lucy can get her followers shopping from their computers, she could get her followers to buy expensive items and she might see a bump in commissions. Tge risk here is that today she does make it very easy for her followers to make impulsive (cheap) purchases, and I don’t want her to loses that focus.

Recommendation 5: Continue to target followers making low-price, one-item phone purchases - Lucy is already successful at providing her followers with low-priced deals that are easy for them to buy quickly and impulsively. She should continue to use social media (that people access via their phones) for items that are cheap and easy for her followers to buy.

Recommendation 6: Test a newsletter to target followers who already shop on the computer - I have already recommended that Lucy may want to move slightly more upmarket into some more expensive categories. However, the risk there is that her followers shopping on the phone may not want to make those expensive purchases on their phones. Based on the analysis of different device types, one way she could market the higher-priced items is by creating an email newsletter. Her followers who are already shopping on the computer are more willing to buy slightly more expensive items on average and her typical-phone shoppers may also be interested in the higher priced items.

When are Followers Buying?

Finally, we need to understand when Lucy’s followers shop. Specifically, are there some days of the week where Lucy’s followers are doing more shopping?

# Variation over time - is there a bests day to shop?
amazon_daily_trends_by_weekday <- amazon_daily_trends %>%
   mutate(Date = as_date(Date))

amazon_daily_trends_by_weekday$weekday <- weekdays(amazon_daily_trends_by_weekday$Date)

amazon_daily_trends_by_weekday$weekday <- ordered(amazon_daily_trends_by_weekday$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday","Friday", "Saturday", "Sunday"))

amazon_daily_trends_by_weekday_count <- amazon_daily_trends_by_weekday %>%
  group_by(weekday) %>%
  summarize(`Total Item Clicks` = sum(Clicks), `Total Items Ordered` = sum(`Total Items Ordered`) ) %>%
  mutate(`Weekday Conversion` = `Total Items Ordered`/`Total Item Clicks`) %>%
  mutate(Percent = `Total Items Ordered`/`Total Item Clicks`)

# changing the percent column into a percent (for visualization)
amazon_daily_trends_by_weekday_count$Percent <- with(amazon_daily_trends_by_weekday_count, paste(formatC(Percent*100,
                                                                                                         digits=4, format="fg"),"%",sep=""))

# redundant to show table - hidden in favor of graphs
#datatable(amazon_daily_trends_by_weekday_count, class = 'table-bordered table-condensed',
#          width = '75%', options = list(scrollX = TRUE, pageLength = 10))
amazon_daily_trends_by_weekday_count_longer <- amazon_daily_trends_by_weekday_count %>%
  select(weekday, `Total Item Clicks`, `Total Items Ordered`) %>%
  pivot_longer(c(`Total Item Clicks`, `Total Items Ordered`), names_to = "Count of", values_to = "Count")

amazon_daily_trends_by_weekday_count_longer %>%
  ggplot(aes(x = weekday, y = `Count`, fill = `Count of`)) +
  geom_col(position = "dodge") +
  scale_y_continuous(label = scales::comma) +
  scale_fill_manual(values= c("#f5c78e", "#9784c2")) +
  labs(y = "Count", x = "Day of the Week") +
  theme(legend.position="bottom",
        legend.title = element_blank(),
        legend.text = element_text(family = "Avenir")) +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13))

amazon_daily_trends_by_weekday_count %>%
  ggplot(aes(x = weekday, y = `Weekday Conversion`)) +
  geom_line(aes(group = 1), color = "#ea90b1", size=1.1) +
  geom_text(aes(label = Percent), nudge_x = -0.29, nudge_y = 0.00062, family = "Avenir", size=3, color = "#ea90b1", fontface = "bold") +
  scale_y_continuous(label = scales::percent, limits = c(0.11, 0.13)) +
  labs(y = "Conversion or Order Rate (%)", x = "Day of the Week") +
  theme(axis.text.x = element_text(family = "Avenir", size=8),
        axis.text.y = element_text(family = "Avenir", size=8),
        axis.title.y = element_text(family = "Avenir", size=13),
        axis.title.x = element_text(family = "Avenir",size=13))

We can see that things are generally consistent from day to day. There is definitely some variation in click and purchase volume based on day of the week. This could be due to buyer behavior, but it could also be due to Lucy’s patterns of work (does she post more on some days of the week than others). Generally, Lucy is fairly consistent with her daily promotions, so I think this variation might have more to due with buyer behavior.

Further supporting the theory that the variability comes from buyer behavior is the graph of conversion rate from clicks to purchases graph (on the right). We see that regardless of the clicks, there is a very similar conversion rate to purchase. So, it doesn’t seem like Lucy’s behaviors affect the outcome (a purchase) as much here. We see that Thursdays seem to be the day that Lucy’s followers are most likely to actually make a purchase.

Recommendation 7: Get Followers to Commit - I was surprised to see that only ~12% of clicks on Lucy’s posts lead to actual purchases. If Lucy could encourage her followers to commit to more purchases, she could see a substantial increase in her commission. Maybe she could talk about how easy it is to make returns (if people don’t like what they buy), or somehow create a sense of urgency with her links? This might be the least actionable recommendation, but there is a lot of potential here.

Recommendation 8: Take Advantage of Thursday - The conversion rate is relatively consistent based on day of the week. However, Thursday is the day with the highest conversion to purchase rate. There are a few ways Lucy might be able to capitalize on this higher conversion day:

1. Lucy could focus on Thursdays to post her best deals and promotions since her followers do seem more willing to make a purchase.

2. Since it seems like Lucy’s followers are more willing to buy on Thursdays, she could also just push a very high volume of promotions out on Thursdays (not considering quality - really focusing on quantity). If the conversion rate holds, she will make more sales just by posting more promotions.

3. Thursdays may also be the best day to to promote those new categories or experiment with any of the other recommendations.

Recommendation 9: Use the conversion rate metric to track success of these recommendations - All these recommendations are just hypotheses with some data to back them. These recommendation should be tested as experiments, especially since it is possible that some of them will actually negatively impact Lucy’s sales. My last recommendation is that the key metric we can look at to measure the success of any of these recommendations is this conversion rate metric. We do want Lucy’s follower’s to be shopping, but in the end, in order for her to get paid, they need to be buying. Anything we can do to increase this conversion rate will be a win!

Reflection

Overall, I think I was able to develop some actual recommendations for Lucy based on this dataset, which was the goal. The biggest piece that was missing from this analysis was being able to tie it back to specific Social Media activities. Part of this will be solved if Lucy starts to use her tracking links and tag types more strategically, and I look forward to being able to use that data for future analysis. I do wish I could have gotten access to data directly from the social media sites, but that actually proved to be very complicated (one thing that can be said about Amazon, is that the raw datasets are readily available to their affiliates). I know there must be a way to get a hold of Instagram or Tik Tok data (maybe using a third-party Apps to extract the data), so that is a big opportunity for furture analysis.

Since I didn’t have Lucy’s social media data, I did have to make some assumptions around her activities. Luckily I am a follower, so I am familiar with the frequency and content of her posting and promotions, but I was unable to incorporate that into my data. This is probably the biggest risk with this analysis: the data story is incomplete and it could mean that the analysis is not as thourough as it could be.

Going into this analysis, the best/easiest part of the Amazon dataset is that it was already tidy when I started working with it. There was still a lot of manipulation that I wanted to do (especially as part of the graphing that I wanted to do), but I did not have to do a bunch of cleanup at the beginning. I was surprised by how time-consuming it was for me to make some graphs (that are generally seem very easy to do). For example, I spent WAY too long trying to layer a line graph on top of a bar graph using two different y-axes - and, it turns out that ggplot really does easily allow for multiple y-axes. So, instead I placed two graphs next to each other, which also a great solution. However, I was surprised by this limitation in ggplot.

Speaking of graphing, I also wanted my graphs to be really easy to look at and understand. As noted in Storytelling with Data: A Data Visualization Guide for Business, people are more likely to look at graphs for longer and really understand them if they are nice to look at. This didn’t take a HUGE amount of time, but it was a bit of a lift that I potentially didn’t anticipate. The part that took the longest was figuring out what was possible, not necessarily how to do it. For example, I didn’t know if it was possible to set the font of a graph, but once I knew it was possible, it wasn’t hard to do.

Conclusion

The goal with this analysis was to provide Lucy with actionable recommendations that she could implement to potentially increase her commissions earned on sales made through her affiliate links. In summary, here are the recommendations:

  1. Use Link Type and Tag Strategically for future analysis opportunities
  2. 5 New categories to focus on might be Luxury Beauty, Homemade, Camera, Photo & Video, Watches, and Outdoor Recreation
  3. Double Check Tracking Links to Ensure “Direct” Credit
  4. Promote one category at a time to maximize commissions
  5. Continue to use the quick promotion model for followers who shop on their phones
  6. Consider using a newsletter for followers who already shop on the computer to push higher price items and bundles
  7. Get followers to commit to making a purchase
  8. Followers seem more willing to actually make a purchase on Thursdays
  9. Experiment with these recommendations and use conversion-to-purchase rate to measure effectiveness

Some of these are more actionable than others, so I am glad that I was able to come up with a few recommendations. Some of them are also very dependent on how Amazon wants to continue to incentivize Affiliates via the commission rates and other bonuses - there is risk that they will change their commission structures and these recommendations will no longer be applicable.

Another concerns with some of these recommendations is that I think one thing that makes Lucy really successful today is that she makes it really easy for her followers to make spontaneous and impulsive purchases because she focuses on items that are low-cost. If she takes some of these recommendations she will be moving into higher priced products or fundamentally changing the way her followers are shopping (on a computer vs. on a phone), and there is some risk that her number of sales will decrease. If that happens, her commissions could be negatively impacted, even if she is optimizing them, because she will be driving fewer sales.

That being said, experimenting with some of these recommendations would be a good way to mitigate risk while still reaping the benefits of a more strategic promotion process. Being able to use conversion-to-purchase rate to measure the effectiveness of any changes she makes based on these recommendations will help us measure the impact of these recommendations. I look forward to running this analysis again in the future to see how trends have changed.

Bibliography

Knaflic, C. N. (2015). Storytelling with data: A Data Visualization Guide for Business Professionals. John Wiley & Sons, Inc. 

Lakes, J. (2020, August 14). Amazon.com affiliate commission rates: A historical guide. Geniuslink Blog. Retrieved December 17, 2021, from https://geniuslink.com/blog/amazon-com-affiliate-commission-rates-a-historical-guide/#4.21.2020

Prowse, P. (2016). Associates Program Standard Commission Income Statement. Amazon. Retrieved December 17, 2021, from https://affiliate-program.amazon.com/help/node/topic/GRXPHT8U84RAYDXZ