Introduction

I run a small pottery studio out of my home as a side business to my day job at UMaine. While I mainly sell to shops and at craft fairs throughout New England, I have occasionally sold a few pieces on Etsy too. With COVID-19 canceling all my fairs this year, I have considered ramping up my Etsy presence. While doing some research and looking around the site, I was struck by the “Etsy Design Award Winners” page and was curious about the items featured. This report looks at the award-winning items by price, shop reviews, and whether or not they ship for free, to see if anything interesting stands out.

Data

Procuring Data

The data for this project was procured by web scraping the Etsy page featuring the design award winners. No intellectual property was stolen in this process, and I kept my scraping session as short as possible so as not to degrade the site's performance. Given that web scraping programs are brittle and it seemed fairly likely that Etsy might change or remove the page I am working with during the course of the project, I archived the page here as a backup.
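One way to make the backup part of the workflow is to save a local copy the first time the page is read and to use that copy on every later run. The sketch below is only an illustration: `read_with_backup` and the file name in the usage comment are hypothetical names, not part of the original analysis.

```r
library(rvest)
library(xml2)

# Hypothetical helper: read the saved copy if one exists; otherwise
# fetch the live page once and write it to disk for later runs.
read_with_backup <- function(url, backup_path) {
  if (file.exists(backup_path)) {
    return(read_html(backup_path))
  }
  page <- read_html(url)
  write_html(page, backup_path)
  page
}

# Usage (file name is an assumption):
# etsy_award_winners <- read_with_backup(
#   "https://www.etsy.com/featured/etsydesignawards?ref=finds_c",
#   "etsy_design_awards.html"
# )
```

This also keeps the number of requests to the site down to one, no matter how many times the report is re-knit.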

Importing Data

library(rvest)
library(stringr)
library(reshape2)
library(tidyverse)
library(rsconnect)
library(knitr)

etsy_award_winners <- read_html("https://www.etsy.com/featured/etsydesignawards?ref=finds_c")

Cleaning Data

Below is a table of the 25 most expensive items from the Etsy Design Award winners list. While I was unsurprised that some shops made this list multiple times with different items, I was slightly shocked that there were extremely expensive, award-winning items from shops with so few (or in some cases no) reviews. This suggests that items are chosen only on the visual impression of the work, without taking into account the shop’s broader contribution to the platform, though there are certainly exceptions to this theory.

# Shop name for each featured listing
shop_names <- etsy_award_winners %>% html_nodes("p.display-inline-block") %>% html_text()

# Review counts arrive wrapped in parentheses; strip them
total_shop_reviews <- etsy_award_winners %>% html_nodes(".icon-b-1") %>% html_text()
total_shop_reviews1 <- total_shop_reviews %>% str_replace_all("\\(", "") %>% str_replace_all("\\)", "")
# Add an empty 100th entry, then reorder by hand so each count lines up with its shop
# (positions found by trial and error for this snapshot of the page)
total_shop_reviews1[100] <-""
total_shop_reviews2 <- total_shop_reviews1[c(1:17,25,18:22,78,23:24,26:64,100,65:77,79:99)]

# Price and free-shipping text; remove whitespace and mark "FREE shipping" with "&" as a split point
free_shipping_price <- etsy_award_winners %>% html_nodes(".text-body-larger , .wt-badge--sale-01") %>% html_text()
free_shipping_price1 <- free_shipping_price %>% str_replace_all(" ", "") %>% str_replace_all("\n", "") %>% str_replace_all("FREEshipping", "&FREE shipping")
# Drop entries that contain only the shipping text, then split into price and shipping columns
free_shipping_price2 <- free_shipping_price1[free_shipping_price1 != "&FREE shipping"]
free_shipping_price3 <- colsplit(free_shipping_price2, "\\&", names = c("price", "shipping"))

# Combine the columns and drop five rows by position
award_winners <- cbind(shop_names, total_shop_reviews2, free_shipping_price3)
award_winners1 <- award_winners[-c(23,35,39,45,67),]

# Convert review counts and prices to numbers; label listings without free shipping as "CFR"
award_winners1$total_shop_reviews2 <- as.numeric(gsub(",", "",award_winners1$total_shop_reviews2))
award_winners1$price <- as.numeric(gsub("[\\, $]","", award_winners1$price))
award_winners1$shipping[award_winners1$shipping ==""] <- "CFR"

# Sort by price and show the 25 most expensive items
award_winners1<- award_winners1 %>% arrange(desc(price))
kable(head(award_winners1,25), padding = 10)
| shop_names         | total_shop_reviews2 |   price | shipping      |
|:-------------------|--------------------:|--------:|:--------------|
| MineralogyDesign   |                 538 | 8000.00 | CFR           |
| artemer            |                 832 | 5430.00 | FREE shipping |
| wrenandcooper      |                  NA | 5350.00 | CFR           |
| AdrianMartinus     |                 653 | 4490.00 | FREE shipping |
| AdrianMartinus     |                 653 | 4490.00 | FREE shipping |
| DreamersandLovers  |                 177 | 1955.00 | FREE shipping |
| AdrianMartinus     |                 653 | 1840.00 | FREE shipping |
| SumarokovaAtelier  |                 201 | 1489.00 | FREE shipping |
| DemiMacrameDesigns |                   9 | 1300.00 | FREE shipping |
| DemiMacrameDesigns |                  15 | 1300.00 | FREE shipping |
| PatienceAndGough   |                   5 | 1010.81 | CFR           |
| WardrobeByDulcinea |                 676 |  998.00 | FREE shipping |
| WardrobeByDulcinea |                 676 |  998.00 | FREE shipping |
| sibodesigns        |                 548 |  564.00 | FREE shipping |
| DodoLeather        |                 923 |  550.00 | FREE shipping |
| LABBVENN           |                   6 |  505.00 | CFR           |
| ATUKO              |                  62 |  494.00 | FREE shipping |
| TeslerMendelovitch |                 476 |  480.00 | CFR           |
| TeslerMendelovitch |                  NA |  480.00 | CFR           |
| maisolorzano       |                 103 |  476.35 | FREE shipping |
| maisolorzano       |                  29 |  476.35 | FREE shipping |
| FashionforFables   |                 236 |  316.30 | CFR           |
| FashionforFables   |                 236 |  316.30 | CFR           |
| PaniJurek          |                  81 |  305.00 | CFR           |
| PaniJurek          |                  81 |  305.00 | CFR           |

Plots

Boxplot: Reviews for Items that Shipped for Free vs CFR

Below is a plot comparing the number of reviews a shop has for items that ship for free versus items that ship CFR. In my experience, everyone to whom I have ever refunded a shipping amount (because it cost less to ship than they paid) has left me a positive review. However, none of them mentioned that fact in their actual review. I suspect that the amount a person pays for shipping has both a conscious and an unconscious effect on whether they are vocally happy with the item they purchased.

ggplot( data= award_winners1, aes(x= shipping, y= total_shop_reviews2)) +
  geom_boxplot() +
  ylim(0,2000)

Scatterplot: Comparing Price and Shipping to its Effect on Reviews

It appears that items that shipped for free came from shops with more reviews than items that shipped CFR. However, there was a lot more variance within the free shipping data. To dig into this further, I also wanted to account for the price of the items. Below is a plot of the price of an item against the number of reviews the shop selling it has. Overlaid are two lines. The blue line is the fitted relationship between price and reviews for items that shipped for free, and the red line is the fitted relationship for items that shipped CFR.

summary(lm(total_shop_reviews2 ~ shipping*price, data = award_winners1))
## 
## Call:
## lm(formula = total_shop_reviews2 ~ shipping * price, data = award_winners1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -761.5 -678.1 -480.9  -12.4 9801.6 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                 755.77479  275.30378   2.745  0.00752 **
## shippingFREE shipping        28.82577  377.53888   0.076  0.93934   
## price                        -0.02751    0.19731  -0.139  0.88946   
## shippingFREE shipping:price  -0.03729    0.27404  -0.136  0.89211   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1542 on 77 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.001759,   Adjusted R-squared:  -0.03713 
## F-statistic: 0.04523 on 3 and 77 DF,  p-value: 0.9871
ggplot(data = award_winners1, aes(x = price, y = total_shop_reviews2, color = shipping)) +
  geom_point(alpha = .5) +
  # CFR line: intercept and price slope taken from the model summary above
  geom_abline(slope = -0.02751, intercept = 755.77479, color = "tomato") +
  # FREE shipping line: add the shipping and interaction coefficients
  geom_abline(slope = -0.02751 - 0.03729, intercept = 755.77479 + 28.82577, color = "cyan3") +
  xlim(0, 2000) +
  ylim(0, 2600)
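Typing coefficients into `geom_abline()` by hand means the lines can drift out of sync with the model if the data changes. A minimal sketch of one alternative is to pull the two lines straight from `coef()`. The data below is made up purely for illustration (`toy`, `fit`, and `lines_df` are hypothetical names); only the model formula mirrors the one used above.

```r
library(ggplot2)

# Synthetic stand-in data, NOT the scraped Etsy data
set.seed(1)
toy <- data.frame(
  shipping = rep(c("CFR", "FREE shipping"), each = 20),
  price = runif(40, 10, 500)
)
toy$reviews <- 300 + 0.5 * toy$price +
  ifelse(toy$shipping == "FREE shipping", 100, 0) +
  rnorm(40, sd = 50)

fit <- lm(reviews ~ shipping * price, data = toy)
b <- coef(fit)

# One row per shipping group: the baseline (CFR) uses the intercept and
# price slope; FREE shipping adds the shipping and interaction terms.
lines_df <- data.frame(
  shipping = c("CFR", "FREE shipping"),
  intercept = c(b[["(Intercept)"]],
                b[["(Intercept)"]] + b[["shippingFREE shipping"]]),
  slope = c(b[["price"]],
            b[["price"]] + b[["shippingFREE shipping:price"]])
)

p <- ggplot(toy, aes(x = price, y = reviews, color = shipping)) +
  geom_point(alpha = .5) +
  geom_abline(data = lines_df,
              aes(slope = slope, intercept = intercept, color = shipping))
```

Because the lines are mapped to `shipping`, they pick up the same colors as the points and appear in the legend automatically.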

Conclusion

In the boxplot, items that shipped for free came from shops with slightly more reviews. I expected price and shipping together to have a much larger impact, but none of the coefficients in the regression above is statistically significant (overall p-value 0.9871), so this data does not support that. With only 95 observations, 14 of which were dropped from the model for missing review counts and a handful of which were outliers, there is a lot of variation in this data. A larger sample would, perhaps, help draw a more substantive conclusion. I could also get more stable answers by log-transforming my data, but that seems slightly outside the scope of this project.
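For reference, a log-transformed version of the model might look something like the sketch below. It uses made-up, right-skewed data rather than the scraped data, and `toy` and `log_fit` are hypothetical names. `log1p()` is used for reviews so that shops with zero reviews do not produce `-Inf`.

```r
# Synthetic, right-skewed stand-in data, NOT the scraped Etsy data
set.seed(42)
toy <- data.frame(
  shipping = rep(c("CFR", "FREE shipping"), each = 25),
  price = exp(rnorm(50, mean = 5, sd = 1)),
  reviews = rpois(50, lambda = exp(rnorm(50, mean = 5, sd = 1.5)))
)

# Same model structure as before, but on the log scale:
# log1p(reviews) = log(reviews + 1), and price enters as log(price)
log_fit <- lm(log1p(reviews) ~ shipping * log(price), data = toy)
summary(log_fit)
```

On the log scale, a few shops with thousands of reviews pull much less on the fit, which is the kind of stability I had in mind.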

I definitely struggled with the data cleaning aspect of this assignment. Some shop review counts came through out of order (except for 1) and others were missing outright. After much trial and error, I manually adjusted them, which wouldn’t be sustainable in a larger data set. I struggled to fix stray HTML tags that came through in odd places, and it was overall a bit of a slog. Web scraping is certainly valuable and saves quite a bit of time, but it’s definitely not without its own pitfalls and frustrations.
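One approach that might avoid the manual reordering is to select each listing's container first and then pull the fields out of each container, so that a listing with no review count yields `NA` in place instead of shifting everything after it. The sketch below runs on a tiny invented page: the class names `.card`, `.shop`, and `.reviews` are placeholders, not Etsy's real selectors, which I have not worked out.

```r
library(rvest)

# Invented markup standing in for the listing cards; the second shop
# deliberately has no review count.
page <- minimal_html('
  <div class="card"><p class="shop">ShopA</p><span class="reviews">(1,234)</span></div>
  <div class="card"><p class="shop">ShopB</p></div>
  <div class="card"><p class="shop">ShopC</p><span class="reviews">(9)</span></div>
')

cards <- html_elements(page, ".card")

# html_element() (singular) returns exactly one result per card,
# which becomes NA when the card has no matching node.
listings <- data.frame(
  shop_names = html_text(html_element(cards, ".shop"), trim = TRUE),
  reviews_raw = html_text(html_element(cards, ".reviews"), trim = TRUE)
)

# Strip parentheses and thousands separators, then convert to numeric
listings$total_shop_reviews <- as.numeric(gsub("[(),]", "", listings$reviews_raw))
```

Because every column has one entry per card, the shop names and review counts stay aligned without any hand-built index vector.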