Introduction: Homework 6

For this assignment, you must select at least two variables of interest from your dataset, provide a basic description of the variables, clean and recode as needed, and present descriptive statistics and any visualizations. Your R Markdown document should include:

  1. Descriptions of the variables - how they were collected, any missing values, etc
  2. How you cleaned and coded the data, including a before/after comparison as needed
  3. summary descriptives of the recoded variables
  4. appropriate visualization(s)
  5. Description of the relationship between the variables, including a hypothesis (or hypotheses) about the relationship.
  6. Initial demonstration of the relationship, which could include correlation, visualization, or statistical model.
  7. (Optional Advanced) Try creating a function that will allow you to easily and accurately implement a repetitive recoding task. (If you are cutting and pasting, use a function!)
  8. (Optional Advanced) If you are working on a model, play around with some of the visual diagnostic tools described in RDS.

This is the first project where I am beginning to work with the data I am going to uses for my final project.

The Data & Variables

I am using a few different data sets for my final project. They are all related to my sister’s job, My sister, Lucy, is an Influencer working with Amazon. She primarily uses social media and tracking links to share deals with her followers. She gets a commission for items sold via her links (as well as some other monetary incentives).

Lucy gets a ton of data from both Amazon and Instagram, but she doesn’t do anything with it, for this homework, I have one main question to answer:

  1. How much of her revenue is ad-driven? What should she be spending on ads?

So, I am looking at the relationsip between revenue and ad-spend. (For my final project I have more questions that I’d like to answer for her).

The data for this analysis comes from her “Amazon Influcer” account. It is multiple tabs, so I may end up using a few different pieces (tabs) of the data:

  • ‘Fee-BonusEarnings’ - This tab shows revenue (for Amazon) and commission (what my sister earns) by Amazon category. I think this will be helpful in giving my sister an idea of what is most lucritive for her to focus on. It also has information on Tracking ID but there seems to only be three options. I will have to ask my sister if she has context on the difference here.
  • ‘Fee-DailyTrends’ - This looks at activites daily: Count of clicks (on links), Count of items orders from amazon, Count of items ordered from 3rd party sellers on Amazon, Conversion rate from clicks to items ordered. (Might be the best to combine with social media data?)
  • ‘Fee-Tracking’ - This is primarily a summary tab of clicks, items ordered, items shipped, revenue, ad fees. I think I can get summarize the data I have to get this same information, but this will be good to reference (disregard for now).
  • ‘Bounty’ - I think this looks specifically at reoccurring sales (subscriptions or wardrobe box orders) as well as some ad fee information. I am not sure if I will work with this table right now (disregard for now).
  • ‘Fee-OrdersWithClicks’ - Maybe another summary table - not a lot of information here (disregard for now).
  • ‘Fee-Orders’ - This seems to be one of the more complete table, with orders by date. Columns include Amazon Category, Item Name, Item ID, Date ordered, Quantity, Price, Link Type, Tags, Indirect/direct sales1, Device used (Desktop, Phone, Tablet).
  • ‘Fee-Earnings’ - This might be the most complete table, it has orders by date and also what that translates to in earnings. It has all the same columns as the Fee-Orders table, but also Tracking ID, Date Shipped, Returns, Revenue, Ad Fees. (Can I combine Orders and Earnings for a more complete table?)

For the final project I will also be looking at her social media data (Instagram and maybe Tik Tok). One thing I am already thinking about is that there won’t be any unique IDs that I can use to clearly tie the Amazon dataset to the Instagram dataset. However, we may be able to use dates to associate social media actions with Amazon sales.

Reading in the Dataset

As I noted in the summary of tables, I want to try to combine the Orders and Earnings tables together. At first I thought I could use the ASIN unique identifier, but then I realized that is actually a unique number for the product, not the sale. So, I will work with them separately.

amazon_fee_orders <- read_excel("Documents/GitHub/Luluprime_Amazon.xlsx", sheet = "Fee-Orders", skip = 1)

paged_table(amazon_fee_orders)
amazon_fee_earnings <- read_excel("Documents/GitHub/Luluprime_Amazon.xlsx", sheet = "Fee-Earnings", skip = 1)

paged_table(amazon_fee_earnings)

The good thing about this dataset is that it doesn’t really need to be tidied up (for now). But that means we can dive right in.

Since my question for this homework is specficially around revenue vs. ad cost, I will likely only use the Fee-Earnings table.

Visualizations

I know there is a lot of summary analysis that could be done here, but I am saving a lot of it for my final project (so I will keep it light). Since we are looking specifically at revenue and ad-spend, let’s do some summary tables first:

# table of all the different categories and how many times they occur in the earnings table. 

table(select(amazon_fee_earnings, Category))
## 
##                      Amazon Cloud Cam                 Amazon Fresh Products 
##                                     8                                  2315 
##                     Amazon Gift Cards                AmazonBasics Microwave 
##                                   120                                     2 
##                 Arts, Crafts & Sewing                    Audible Audiobooks 
##                                    30                                     3 
##                            Automotive                        Baby & Nursery 
##                                   591                                  1211 
##                     Beauty & Grooming                         Blu-Ray & DVD 
##                                 14356                                    56 
##                     Books & Textbooks        Business & Industrial Supplies 
##                                  1708                                  1057 
##                 Camera, Photo & Video                           CDs & Vinyl 
##                                   589                                    24 
##             Cell Phones & Accessories                Clothing & Accessories 
##                                  2705                                143071 
##       Computers, Tablets & Components                         Digital Music 
##                                   603                                     2 
##            Digital Products Accessory            Digital Video Subscription 
##                                     1                                     5 
##              Echo & Alexa Accessories                          Echo Devices 
##                                    21                                    46 
##    Electronic Components & Home Audio                      Element Smart TV 
##                                   911                                     7 
##                          Fire Tablets              Fire Tablets Accessories 
##                                    45                                    19 
##                   Fire TV Accessories                       Fire TV Devices 
##                                     4                                    21 
##                        Fresh Prepared                             Furniture 
##                                    43                                  1972 
##                Grocery & Gourmet Food                              Handmade 
##                                  3866                                   102 
##                    Health & Household Health &amp; Personal Care Appliances 
##                                  5561                                  1327 
##                                  Home                    Home Entertainment 
##                                 14073                                    29 
##                      Home Improvement                               Jewelry 
##                                  2652                                  9433 
##                          Kindle Books                      Kindle E-readers 
##                                    27                                    13 
##          Kindle E-readers Accessories                      Kitchen & Dining 
##                                     5                                  5452 
##                               Luggage                         Luxury Beauty 
##                                  1740                                   697 
##                             Magazines                      Major Appliances 
##                                     8                                    44 
##                   Musical Instruments              Office & School Supplies 
##                                    62                                  2466 
##                                 Other                Other Gift Card Brands 
##                                   180                                    32 
##                    Outdoor Recreation                  Patio, Lawn & Garden 
##                                   832                                  1480 
##                   Pet Food & Supplies                    Power & Hand Tools 
##                                  3248                                   659 
##                          Prime Pantry        Ring Alarms and Smart Lighting 
##                                     4                                     5 
##      Ring Video Doorbells and Cameras  Shoes, Handbags, Wallets, Sunglasses 
##                                     9                                  9468 
##               Softlines Private Label                              Software 
##                                  5253                                     1 
##                     Software Download                      Sports & Fitness 
##                                     5                                  3691 
##                   Sports Collectibles                        Tires & Wheels 
##                                     4                                     5 
##                          Toys & Games                  Video Game Downloads 
##                                  3578                                    28 
##                           Video Games          Video On Demand: Rent or Buy 
##                                    71                                    10 
##                               Watches                  Wine, Spirits & Beer 
##                                    22                                    13
# proportional table of all the different categories and how many times they occur in the earnings table. 

prop.table(table(select(amazon_fee_earnings, Category)))
## 
##                      Amazon Cloud Cam                 Amazon Fresh Products 
##                          3.229700e-05                          9.345945e-03 
##                     Amazon Gift Cards                AmazonBasics Microwave 
##                          4.844550e-04                          8.074251e-06 
##                 Arts, Crafts & Sewing                    Audible Audiobooks 
##                          1.211138e-04                          1.211138e-05 
##                            Automotive                        Baby & Nursery 
##                          2.385941e-03                          4.888959e-03 
##                     Beauty & Grooming                         Blu-Ray & DVD 
##                          5.795697e-02                          2.260790e-04 
##                     Books & Textbooks        Business & Industrial Supplies 
##                          6.895410e-03                          4.267242e-03 
##                 Camera, Photo & Video                           CDs & Vinyl 
##                          2.377867e-03                          9.689101e-05 
##             Cell Phones & Accessories                Clothing & Accessories 
##                          1.092042e-02                          5.775956e-01 
##       Computers, Tablets & Components                         Digital Music 
##                          2.434387e-03                          8.074251e-06 
##            Digital Products Accessory            Digital Video Subscription 
##                          4.037125e-06                          2.018563e-05 
##              Echo & Alexa Accessories                          Echo Devices 
##                          8.477963e-05                          1.857078e-04 
##    Electronic Components & Home Audio                      Element Smart TV 
##                          3.677821e-03                          2.825988e-05 
##                          Fire Tablets              Fire Tablets Accessories 
##                          1.816706e-04                          7.670538e-05 
##                   Fire TV Accessories                       Fire TV Devices 
##                          1.614850e-05                          8.477963e-05 
##                        Fresh Prepared                             Furniture 
##                          1.735964e-04                          7.961211e-03 
##                Grocery & Gourmet Food                              Handmade 
##                          1.560753e-02                          4.117868e-04 
##                    Health & Household Health &amp; Personal Care Appliances 
##                          2.245045e-02                          5.357265e-03 
##                                  Home                    Home Entertainment 
##                          5.681447e-02                          1.170766e-04 
##                      Home Improvement                               Jewelry 
##                          1.070646e-02                          3.808220e-02 
##                          Kindle Books                      Kindle E-readers 
##                          1.090024e-04                          5.248263e-05 
##          Kindle E-readers Accessories                      Kitchen & Dining 
##                          2.018563e-05                          2.201041e-02 
##                               Luggage                         Luxury Beauty 
##                          7.024598e-03                          2.813876e-03 
##                             Magazines                      Major Appliances 
##                          3.229700e-05                          1.776335e-04 
##                   Musical Instruments              Office & School Supplies 
##                          2.503018e-04                          9.955551e-03 
##                                 Other                Other Gift Card Brands 
##                          7.266826e-04                          1.291880e-04 
##                    Outdoor Recreation                  Patio, Lawn & Garden 
##                          3.358888e-03                          5.974946e-03 
##                   Pet Food & Supplies                    Power & Hand Tools 
##                          1.311258e-02                          2.660466e-03 
##                          Prime Pantry        Ring Alarms and Smart Lighting 
##                          1.614850e-05                          2.018563e-05 
##      Ring Video Doorbells and Cameras  Shoes, Handbags, Wallets, Sunglasses 
##                          3.633413e-05                          3.822350e-02 
##               Softlines Private Label                              Software 
##                          2.120702e-02                          4.037125e-06 
##                     Software Download                      Sports & Fitness 
##                          2.018563e-05                          1.490103e-02 
##                   Sports Collectibles                        Tires & Wheels 
##                          1.614850e-05                          2.018563e-05 
##                          Toys & Games                  Video Game Downloads 
##                          1.444483e-02                          1.130395e-04 
##                           Video Games          Video On Demand: Rent or Buy 
##                          2.866359e-04                          4.037125e-05 
##                               Watches                  Wine, Spirits & Beer 
##                          8.881676e-05                          5.248263e-05

Based on these I can see that there is a top category (Clothing and Accessories) that far outperforms other categories in terms of quantity of sales.

# For a clearer graph, I am taking the only the top 20 categories by total items shipped
total_items_shipped <- amazon_fee_earnings %>%
  select(Category, `Items Shipped`) %>%
  group_by(Category) %>%
  summarize(`Total Items Shipped` = sum(`Items Shipped`)) %>%
  arrange(desc(`Total Items Shipped`)) %>%
  slice(1:20)

# Graphing - arrange and mutate to order the y-axis by most to least  
total_items_shipped %>%
  arrange(`Total Items Shipped`) %>%
  mutate(Category=factor(Category, levels=Category)) %>%
  ggplot(aes(y = Category, x = `Total Items Shipped`, label = `Total Items Shipped`)) + 
    geom_col() +
    labs(y = "Category", x = "Total Items Shipped") +
    theme(axis.text.y = element_text(angle = 15, hjust = 1))

# Taking out the bar labels as they are a little bit messy
# geom_text(position = position_stack(vjust = 0.7),size = 2.5) +

Looking at just the top 20 categories (by Total Items Shipped) we can see that there aren’t really any other categories that come close to Clothing and Accessories. I then look at the same data by revenue.

# For a clearer graph, I am taking the only the top 20 categories by total revenue
total_revenue <- amazon_fee_earnings %>%
  select(Category, `Revenue($)`) %>%
  group_by(Category) %>%
  summarize(`Total Revenue` = sum(`Revenue($)`)) %>%
  arrange(desc(`Total Revenue`)) %>%
  slice(1:20)

# Graphing - arrange and mutate to order the y-axis by most to least (by revenue) 
total_revenue %>%
  arrange(`Total Revenue`) %>%
  mutate(Category=factor(Category, levels=Category)) %>%
  ggplot(aes(y = Category, x = `Total Revenue`)) +
    geom_col() +
    scale_x_continuous(labels = scales::dollar) +
    labs(y = "Category", x = "Total Revenue") +
    theme(axis.text.y = element_text(angle = 15,hjust = 1))

Interestingly, the top 20 categories for Revenue do not perfectly match the top 20 categories for Total Sales. This likely has to do with the average cost of items in each category (keep in mind to look at for final paper).

# For a clearer graph, I am taking the only the top 20 categories by ad spend
total_ads <- amazon_fee_earnings %>%
  select(Category, `Ad Fees($)`) %>%
  group_by(Category) %>%
  summarize(`Total Ad Fees` = sum(`Ad Fees($)`)) %>%
  arrange(desc(`Total Ad Fees`)) %>%
  slice(1:20)

# Graphing - arrange and mutate to order the y-axis by most to least (by revenue) 
total_ads %>%
  arrange(`Total Ad Fees`) %>%
  mutate(Category=factor(Category, levels=Category)) %>%
  ggplot(aes(y = Category, x = `Total Ad Fees`)) +
    geom_col() +
    scale_x_continuous(labels = scales::dollar) +
    labs(y = "Category", x = "Total Ad Fees") +
    theme(axis.text.y = element_text(angle = 15,hjust = 1))

Ad spend, however, does seem to closely match revenue by Category. This indicates that Ad Spend might directly lead to more revenue (and thus, spending money on ads is worth it).

Relationship between Revenue and Ad-Spend

I want to take a closer look at the relationship between Revenue and Ad Spend by Category. My hypothesis is that there is a relationship between dollars spent in Ads and Revenue.

# I am using a scatterplot to look at the relationship btwn revenue and ad spend, so no need to limit to the top 20 categories.
total_ads_revenue <- amazon_fee_earnings %>%
  select(Category, `Ad Fees($)`, `Revenue($)`) %>%
  group_by(Category) %>%
  summarize(`Total Ad Fees` = sum(`Ad Fees($)`), `Total Revenue` = sum(`Revenue($)`))

# Scatterplot
total_ads_revenue %>%
  ggplot(aes(y = `Total Revenue`, x = `Total Ad Fees`)) +
    geom_point() +
    scale_x_continuous(labels = scales::dollar) +
    scale_y_continuous(labels = scales::dollar)

It looks like there is a clear correlation between $ spent in Ad Fees and Revenue. It might be that this is the way Amazon collects ad fees (based on revenue), and not an indication of Ad Fees leading to Revenue.

It is also difficult to see what is happening at lower left corner of the graph because it is so skewed by Clothing & Accessories, so I am going to try to exclude that category to take a closer look.

# Excluding the Clothing & Accessories category to get a better look at the other categories. 
total_ads_revenue %>%
  filter(Category != "Clothing & Accessories") %>%
  ggplot(aes(y = `Total Revenue`, x = `Total Ad Fees`)) +
    geom_point() +
    scale_x_continuous(labels = scales::dollar) +
    scale_y_continuous(labels = scales::dollar)

Okay, there is definitely a relationship. I am still worried that this is based on the payment structure of Amazon Influencers. I will test this same scatter plot on the total number of items shipped to see if the relationship holds up.

# I am using a scatterplot to look at the relationship btwn revenue and ad spend, so no need to limit to the top 20 categories.
total_ads_items_shipped <- amazon_fee_earnings %>%
  select(Category, `Ad Fees($)`, `Items Shipped`) %>%
  group_by(Category) %>%
  summarize(`Total Ad Fees` = sum(`Ad Fees($)`), `Total Items Shipped` = sum(`Items Shipped`))

# Scatterplot
total_ads_items_shipped %>%
  ggplot(aes(y = `Total Items Shipped`, x = `Total Ad Fees`)) +
    geom_point() +
    scale_x_continuous(labels = scales::dollar) +
    scale_y_continuous(labels = scales::dollar)

# Again, I am going to exclude Clothing & Accessories
total_ads_items_shipped %>%
  filter(Category != "Clothing & Accessories") %>%
  ggplot(aes(y = `Total Items Shipped`, x = `Total Ad Fees`)) +
    geom_point() +
    scale_x_continuous(labels = scales::dollar) +
    scale_y_continuous(labels = scales::dollar)

From my preliminary analysis here, I am concluding that there is a relationship between Total Ad Spend and Revenue/Items Shipped. However, it looks like there are categories where ads might not be as relavent to sales. Next steps (for my final paper) that I might want to consider in the relationship between ads and revenue: 1. Looking at Direct vs. Indirect sales (what do people follow links for vs. what do they buy when they’re already shopping). 2. As I noted above, I want to do some more summary analyses of categories (such as average price per item). It might also be cool to do a deep dive into a specific category (such as Clothing & Accessories). 3. Buyer analyses: when are they buying, how?

```

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.


  1. Indirect/direct - If someone follows Lucy’s link to Amazon and buys that item it will count as direct sales, but if they also buy additional items, she also gets commissions for those sales but they are counted as indirect sales.↩︎