Project 2 Wirecutter v Amazon

Nick Oliver

Project 2 - The Wirecutter Vs. Amazon

The goal of this analysis is to evaluate the quality of The Wirecutter’s product recommendations. I suspect that The Wirecutter is a poor judge of quality, based on my own experience of going to purchase a recommended product on Amazon and finding generally negative sentiment about it. Some of the bad reviews even mentioned The Wirecutter’s recommendation as their reason for buying the product.

To perform my analysis I used the article that The Wirecutter posted with recommendations for the best wired earbuds under $200 [1]: https://www.nytimes.com/wirecutter/reviews/the-best-200-in-ear-headphones/

Setup

Load Libraries

library(RCurl)
library(stringr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
## 
##     complete
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library(readr)
library(fuzzyjoin)

Load Data

reviewsUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/wirecuttervsamazon/Customer%20reviews'
customerReviewsRaw <- getURL(reviewsUrl)
fileUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/wirecuttervsamazon/wirecutter_earbuds.csv'
wirecutterDataRaw <- getURL(fileUrl)

Tidy

Amazon Scraped Data

The Amazon review data was scraped directly from the product pages for the products reviewed by The Wirecutter. To make things a bit easier for myself, I put the markers Customer Reviews and End Customer Reviews around each scraped block.

First I remove the tab characters because they add no value.

Next I use the str_extract_all function to grab all characters (including newlines) between Customer Reviews and End Customer Reviews, because each block of text holds the review summary for one product.

customerReviews <- customerReviewsRaw %>% 
  str_replace_all('\t','') %>%
  str_extract_all("Customer Reviews[\\s\\S]*?End Customer Reviews")
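
The lazy quantifier `*?` in the pattern above is what keeps each match to a single product block. The following is a minimal sketch of that behaviour on made-up input, using base R's `regmatches` and `gregexpr` (with `perl = TRUE`) as a stand-in for `str_extract_all`; the toy strings are assumptions and not the scraped data.

```r
# Toy input (hypothetical): two scraped blocks wrapped in the same markers.
raw <- paste(
  "Customer Reviews\n4.1 out of 5 stars\nProduct A\nEnd Customer Reviews",
  "Customer Reviews\n3.8 out of 5 stars\nProduct B\nEnd Customer Reviews",
  sep = "\n"
)

# Lazy match: stop at the first End marker, giving one match per block.
lazy_pattern <- "Customer Reviews[\\s\\S]*?End Customer Reviews"
blocks <- regmatches(raw, gregexpr(lazy_pattern, raw, perl = TRUE))[[1]]

# Greedy match for comparison: runs to the last End marker, giving one big match.
greedy_pattern <- "Customer Reviews[\\s\\S]*End Customer Reviews"
greedy <- regmatches(raw, gregexpr(greedy_pattern, raw, perl = TRUE))[[1]]

length(blocks)  # number of lazy matches
length(greedy)  # number of greedy matches
```

With the greedy form the two blocks collapse into a single match, which is why the lazy form is needed when the same markers repeat through the file.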

Now I have a list of matches in which each item is the review block for a single product. There are 26 items, but each one is still one long string. I use the as.data.frame function to turn the matches into a 26-row data frame with a single column, then the tidyr::separate function to break that column out into a wide data frame.

The items are split into columns on the newline character. This gets us closer to where we want the data but there are still some issues.

  1. The rating number and the percentage of users who gave that rating are in two separate columns, e.g. (4 stars|45%).
  2. The listings weren’t uniform in structure, so a number of columns near the end of each row hold NA values.
  3. The column names don’t make sense.
  4. There are columns with meaningless data, like Customer Reviews.

The regex used to extract the price from the combined columns was complex because some product listings had multiple prices.

A listing with multiple prices looked something like this:

List Price: $19.99 Details
Price:  $14.90
You Save:   $5.09 (25%)

What made it even more complicated is that the value I actually wanted was Price:, but there is also a value called List Price:, so I couldn’t just grab everything after the word Price:. I learned about something called a negative lookbehind.

(?<!List ) - This is a negative lookbehind. At each candidate match position it looks backwards for the string List followed by a space; if that string is there, the match is rejected. That is where the negative part comes in. A lookbehind is an assertion rather than a capture group, so it consumes no characters.

Price:(\\$(\\d+)\\.(\\d+)) - Match the literal text Price: followed immediately by a dollar sign, one or more digits, a decimal point, and one or more digits.
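
To see the lookbehind at work, here is a small sketch in base R (`regexpr` with `perl = TRUE` standing in for `str_extract`). The `listing` string is an assumed reconstruction of what one united Price cell looks like after the tabs are removed and the columns are joined with `|`; it is not copied from the data.

```r
# Hypothetical united Price cell: three price fields joined with "|".
listing <- "List Price:$19.99 Details|Price:$14.90|You Save:$5.09 (25%)"

# Without the lookbehind, the first hit is the tail of "List Price:".
naive_pattern <- "Price:\\$\\d+\\.\\d+"
naive <- regmatches(listing, regexpr(naive_pattern, listing, perl = TRUE))

# With the negative lookbehind, any "Price:" preceded by "List " is rejected.
sale_pattern <- "(?<!List )Price:\\$\\d+\\.\\d+"
sale <- regmatches(listing, regexpr(sale_pattern, listing, perl = TRUE))

# Strip the label to get a number that can be compared or sorted.
sale_value <- as.numeric(sub("Price:\\$", "", sale))

naive       # "Price:$19.99"
sale        # "Price:$14.90"
sale_value  # 14.9
```

The capture groups in the original pattern are not needed for the extraction itself, so the sketch leaves them out.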

#convert to data frame with one column
customerReviewsDf <- as.data.frame(customerReviews, col.names = c("data")) %>%
  #separate column into 23 new columns on newline character
  # use seq and paste to generate the column names "1" through "23" as strings
  separate("data",into=paste(seq(1,23), "", sep=""),sep = '\\n') %>%
  #remove unneeded columns
  select(!c('1','3','5','7','9','11','13','16','17')) %>%
  #rename rating % columns so they make sense
  rename(Rating = "2", TotalRatings = "4", FiveStar = '6', FourStar = '8',ThreeStar = '10', TwoStar = '12', OneStar = '14', Name = '15') %>%
  # combine the remaining columns so we can extract the price
  unite(Price, "18","19","20","21","22","23", sep="|", na.rm = TRUE) %>%
  # extract the price from price column.
  # This was more complicated due to the fact that some of the combined columns contain other prices in addition to the sales price
  mutate(Price = str_extract(Price,'(?<!List )Price:(\\$(\\d+)\\.(\\d+))'))
## Warning: Expected 23 pieces. Missing pieces filled with `NA` in 25 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, ...].
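
The warning above is expected: blocks with fewer than 23 lines are padded with NA on the right. A rough base R sketch of what splitting on newlines into numbered columns does, using two made-up blocks of different lengths (the input and the `wide` name are illustrative assumptions, and this is not how tidyr implements `separate` internally):

```r
# Two hypothetical blocks: one with three lines, one with only two.
blocks <- c("4.1 out of 5 stars\n59%\nProduct A", "3.8 out of 5 stars\n54%")

# Split each block on the newline character.
pieces <- strsplit(blocks, "\n", fixed = TRUE)

# Pad shorter rows with NA so every row has the same number of columns.
width <- max(lengths(pieces))
padded <- t(sapply(pieces, function(p) {
  c(p, rep(NA_character_, width - length(p)))
}))

# Name the columns "1", "2", ... as in the pipeline above.
wide <- as.data.frame(padded, stringsAsFactors = FALSE)
names(wide) <- as.character(seq_len(width))

wide
```

Because the padding lands at the end of each row, the trailing columns are the ones that get united before the price is extracted.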

Show the tidy data

kable(customerReviewsDf, format = "html") %>% kable_styling("striped") %>% scroll_box(width = "100%")
Rating TotalRatings FiveStar FourStar ThreeStar TwoStar OneStar Name Price
4.1 out of 5 stars 10,010 global ratings 59% 16% 9% 7% 9% 1MORE Triple Driver In-Ear Earphones Hi-Res Headphones with High Resolution, Bass Driven Sound, MEMS Mic, In-Line Remote, High Fidelity for Smartphones/PC/Tablet - Silver Price:$70.77
4.2 out of 5 stars 183 global ratings 59% 18% 7% 11% 5% Final E4000 High Resolution Sound Isolating In-Ear Headphones Earphones Price:$149.00
3.8 out of 5 stars 976 global ratings 54% 13% 10% 8% 15% Marshall Mode in-Ear Headphones, Black/White (4090939) NA
4.0 out of 5 stars 6 global ratings 75% 0% 0% 0% 25% Campfire Audio Satsuma in Ear Monitors - in Ear Headphones for Musicians - Professional IEM Earphones Price:$199.00
4.1 out of 5 stars 13 global ratings 66% 11% 0% 11% 12% Campfire Audio Honeydew in Ear Monitors - in Ear Headphones for Musicians - Professional IEM Earphones Price:$249.00
4.4 out of 5 stars 3,483 global ratings 68% 16% 8% 3% 5% MEE audio M6 PRO Musicians’ In-Ear Monitors with Detachable Cables; Universal-Fit and Noise-Isolating (2nd Generation) (Black) Price:$49.99
4.3 out of 5 stars 122,071 global ratings 65% 17% 9% 4% 5% Panasonic ErgoFit Earbud Headphones with Microphone and Call Controller Compatible with IPhone, Android and Blackberry - RP-TCM125-A - In-Ear (Blue) Price:$14.90
4.4 out of 5 stars 2,419 global ratings 71% 13% 7% 4% 6% 1MORE Quad Driver in-Ear Earphones Hi-Res High Fidelity Headphones Warm Bass, Spacious Reproduction, High Resolution, Mic in-Line Remote Smartphones/PC/Tablet - Silver/Gray Price:$142.07
4.2 out of 5 stars 1,123 global ratings 64% 14% 11% 4% 8% beyerdynamic Soul BYRD wired premium in-ear headphones in black Price:$69.00
3.1 out of 5 stars 96 global ratings 32% 11% 13% 26% 18% Brainwavz S3 Noise Isolating in-Ear Headphones | Earbuds | Earphones with Remote and Microphone Price:$29.50
4.4 out of 5 stars 341 global ratings 70% 15% 7% 4% 3% Etymotic Research ER2XR Extended Response High Performance In-Ear Earphones (Detachable Dynamic Drivers, Noise Isolating, High Accuracy, Robust Low Frequencies) Price:$99.95
4.2 out of 5 stars 165 global ratings 61% 20% 7% 4% 8% Etymotic Research ER3SE Studio Edition High Performance In-Ear Earphones (Detachable Balanced Armature Drivers, Noise Isolating, High Accuracy, Studio Grade Accuracy),Black Price:$101.75
4.7 out of 5 stars 385 global ratings 80% 14% 2% 2% 1% FiiO FH3 Triple Drive(1 Dynamic + 2 Knowles BA) in-Ear HiFi Earphones with High Resolution,Bass Sound, High Fidelity for Smartphones/PC/Tablet Price:$129.99
4.0 out of 5 stars 295 global ratings 55% 17% 12% 8% 8% Final Audio Design High Resolution Headphone - Stainless Steel (E3000) Black Price:$44.42
5.0 out of 5 stars 3 global ratings 100% 0% 0% 0% 0% Final Audio Design High Resolution Headphone - Black (F3100) Price:$189.00
4.1 out of 5 stars 35 global ratings 55% 21% 8% 8% 8% Yamaha EPH-M200RE High-Performance Earphones with Remote and Mic, Red Price:$49.00
3.6 out of 5 stars 9 global ratings 21% 36% 30% 14% 0% YAMAHA EPH-M100 in-Ear Headphones Blue NA
4.0 out of 5 stars 248 global ratings 56% 17% 7% 10% 10% V-MODA Zn In-Ear Modern Audiophile Headphones with microphone - 3 Button Price:$158.39
3.9 out of 5 stars 88 global ratings 52% 15% 13% 10% 10% V-MODA Forza Metallo In-Ear Headphones with 3-Button Remote & Microphone - Samsung and Android Devices, Gunmetal Black Price:$121.28
3.8 out of 5 stars 17 global ratings 48% 14% 14% 14% 9% Optoma NuForce HEM2 Reference Class Hi-Res in-Ear Headphones with Balanced Armature Drivers Price:$129.00
3.6 out of 5 stars 17 global ratings 27% 34% 26% 0% 14% NuForce Hem Dynamic in-Ear Monitors Hi-Res Audio Noise Isolating Single Micro Dynamic Driver Microphone and Remote Charcoal Black (Hem-Dynamic-Black) Price:$92.13
2.8 out of 5 stars 31 global ratings 20% 15% 16% 23% 26% NAD VISO HP20 in-Ear Headphones in Black Price:$99.00
4.0 out of 5 stars 15 global ratings 51% 28% 0% 10% 10% Monoprice MP80 Aluminum In-Ear Earphone, Balanced Armature Driver And Dynamic Driver With Three Tuning Nozzles Price:$64.16
3.3 out of 5 stars 70 global ratings 39% 10% 13% 17% 22% MEE audio Pinnacle P2 High Fidelity Audiophile In-Ear Headphones with Detachable Cables Price:$66.00
4.2 out of 5 stars 271 global ratings 59% 16% 14% 4% 7% MEE audio Pinnacle P1 High Fidelity Audiophile In-Ear Headphones with Detachable Cables - EP-P1-ZN-MEE, Pinnacle P1 (Zinc) Price:$159.35
4.1 out of 5 stars 25 global ratings 58% 16% 8% 8% 9% GRADO iGe3 Wired In Ear Headphone (earbuds) Smart Device Controller w/Microphone Price:$99.00

Wirecutter Data

The Wirecutter review data is pretty basic CSV data with the product name and its “recommendation” level.

There are 4 levels of recommendation, which map roughly to the following:

recommendation | meaning
Top Pick | Buy this; it’s the best
Budget Pick | Buy this if you’re on a budget. Not as good as the Top Pick but still good
Other good wired earbuds | Recommended with caveats
the competition | Not recommended

wirecutterDf <- read_csv(wirecutterDataRaw, skip=1) %>% as.data.frame()
## Rows: 36 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Product Name, Wirecutter Rating, Amazon Name
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Combining Data Sets

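Because there are two data sets, I need to combine them to do the analysis. This presents a challenge because, for the most part, the name on Amazon does not match the name from The Wirecutter. To get around this, the Wirecutter CSV includes an Amazon Name column, and I join on that column against the Name column from the Amazon data.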

joinedDf <- left_join(wirecutterDf, customerReviewsDf, by = c("Amazon Name"  = "Name")) %>%
  mutate(RatingNum = as.numeric(str_extract(Rating,"^\\D*(\\d+(?:\\.\\d+)?)"))) %>%
  mutate(TotalRatingsNum = as.numeric(gsub("[^0-9]","",TotalRatings)))
## Warning: One or more parsing issues, see `problems()` for details
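
The join and the two numeric conversions can be sketched on toy data as follows. This uses base R's `merge` with `all.x = TRUE` as a stand-in for `left_join`; the product names, listing titles, and simplified column names are made up for illustration.

```r
# Hypothetical Amazon-side data: listing title plus the raw rating strings.
reviews <- data.frame(
  Name = c("Listing title A", "Listing title B"),
  Rating = c("4.1 out of 5 stars", "3.8 out of 5 stars"),
  TotalRatings = c("10,010 global ratings", "976 global ratings"),
  stringsAsFactors = FALSE
)

# Hypothetical Wirecutter-side data: Product C has no Amazon listing.
picks <- data.frame(
  ProductName = c("Product A", "Product B", "Product C"),
  AmazonName = c("Listing title A", "Listing title B", NA),
  stringsAsFactors = FALSE
)

# Left join: keep every Wirecutter row, even those with no Amazon match.
joined <- merge(picks, reviews, by.x = "AmazonName", by.y = "Name", all.x = TRUE)

# Keep the leading number of "4.1 out of 5 stars".
joined$RatingNum <- as.numeric(
  sub("^(\\d+(?:\\.\\d+)?).*$", "\\1", joined$Rating, perl = TRUE)
)

# Drop everything except digits from "10,010 global ratings".
joined$TotalRatingsNum <- as.numeric(gsub("[^0-9]", "", joined$TotalRatings))

joined[, c("ProductName", "RatingNum", "TotalRatingsNum")]
```

Rows without a matching listing keep NA in the rating columns, which is the same pattern as the NA rows at the bottom of the results table.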

Analysis

To analyze my results I simply created a table ordered by the rating out of 5 stars, with the highest rated at the top.

The evidence seems to support my suspicion that The Wirecutter’s recommendations are not representative of customers’ real-world experiences with products.

  1. The number one pick (Final E4000) ranks 6th out of 36, below a similarly priced, higher-rated, and more-reviewed product (1More Quad Driver).
  2. Their budget pick (Marshall Mode) ranks 19th out of 36, with a rating of just 3.8. Unfortunately, there was no price for this item on Amazon because it was no longer being sold, but a low-priced pair of earbuds (Final E3000), at $44.42, has a higher rating of 4.0 out of 5.0, though from fewer ratings (295 versus 976).

joinedDf %>%
  arrange(desc(RatingNum)) %>%
  select(`Product Name`, `Wirecutter Rating`, RatingNum, TotalRatingsNum, Price) %>%
  kable(format = "html") %>% kable_styling("striped") %>% scroll_box(width = "100%")
Product Name | Wirecutter Rating | RatingNum | TotalRatingsNum | Price
Final F3100 | the competition | 5.0 | 3 | Price:$189.00
FiiO FH3 | the competition | 4.7 | 385 | Price:$129.99
1More Quad Driver | the competition | 4.4 | 2419 | Price:$142.07
Etymotic ER2XR | the competition | 4.4 | 341 | Price:$99.95
Panasonic RP-TCM125 ErgoFit | Other good wired earbuds | 4.3 | 122071 | Price:$14.90
Final E4000 | Top Pick | 4.2 | 183 | Price:$149.00
Beyerdynamic Soul Byrd | the competition | 4.2 | 1123 | Price:$69.00
Etymotic ER3SE | the competition | 4.2 | 165 | Price:$101.75
MEE Audio Pinnacle P1 | the competition | 4.2 | 271 | Price:$159.35
1More Triple Driver | the competition | 4.1 | 10010 | Price:$70.77
Campfire Audio Honeydew | the competition | 4.1 | 13 | Price:$249.00
Grado iGe3 | the competition | 4.1 | 25 | Price:$99.00
Yamaha EPH-M200 | the competition | 4.1 | 35 | Price:$49.00
Campfire Audio Satsuma | Other good wired earbuds | 4.0 | 6 | Price:$199.00
Final E3000 | the competition | 4.0 | 295 | Price:$44.42
Monoprice MP80 | the competition | 4.0 | 15 | Price:$64.16
V-Moda Zn | the competition | 4.0 | 248 | Price:$158.39
V-Moda Forza Metallo | the competition | 3.9 | 88 | Price:$121.28
Marshall Mode | Budget Pick | 3.8 | 976 | NA
Optoma NuForce HEM2 | the competition | 3.8 | 17 | Price:$129.00
Optoma NuForce HEM Dynamic | the competition | 3.6 | 17 | Price:$92.13
Yamaha EPH-M100 | the competition | 3.6 | 9 | NA
MEE Audio Pinnacle P2 | the competition | 3.3 | 70 | Price:$66.00
Brainwavz S3 | the competition | 3.1 | 96 | Price:$29.50
NAD Viso HP20 | the competition | 2.8 | 31 | Price:$99.00
Honeydew | Other good wired earbuds | NA | NA | NA
MEE Audio M6 Pro 2nd Generation | Other good wired earbuds | NA | NA | NA
Flare Audio Flares Jet | the competition | NA | NA | NA
Marshall Mode EQ | the competition | NA | NA | NA
Massdrop x NuForce EDC | the competition | NA | NA | NA
Massdrop x NuForce EDC3 | the competition | NA | NA | NA
Meze Audio Rai Solo | the competition | NA | NA | NA
Monoprice Quintet Wired In Ear Monitor | the competition | NA | NA | NA
Sennheiser IE 100 Pro | the competition | NA | NA | NA
Sennheiser IE 300 | the competition | NA | NA | NA
Shure Aonic 4 | the competition | NA | NA | NA
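
The rank positions quoted in the analysis can be checked with a short base R sketch. It re-enters the 25 non-NA ratings from the table above by hand and uses `rank` with `ties.method = "min"`, so tied products share the best position; that tie rule is my assumption about how the positions were counted. Note that the 11 products with no Amazon data are excluded here, so these are positions among the rated products only.

```r
# Ratings copied from the results table above (rated products only).
products <- c(
  "Final F3100", "FiiO FH3", "1More Quad Driver", "Etymotic ER2XR",
  "Panasonic RP-TCM125 ErgoFit", "Final E4000", "Beyerdynamic Soul Byrd",
  "Etymotic ER3SE", "MEE Audio Pinnacle P1", "1More Triple Driver",
  "Campfire Audio Honeydew", "Grado iGe3", "Yamaha EPH-M200",
  "Campfire Audio Satsuma", "Final E3000", "Monoprice MP80", "V-Moda Zn",
  "V-Moda Forza Metallo", "Marshall Mode", "Optoma NuForce HEM2",
  "Optoma NuForce HEM Dynamic", "Yamaha EPH-M100", "MEE Audio Pinnacle P2",
  "Brainwavz S3", "NAD Viso HP20"
)
rating_num <- c(
  5.0, 4.7, 4.4, 4.4, 4.3, 4.2, 4.2, 4.2, 4.2, 4.1, 4.1, 4.1, 4.1,
  4.0, 4.0, 4.0, 4.0, 3.9, 3.8, 3.8, 3.6, 3.6, 3.3, 3.1, 2.8
)
names(rating_num) <- products

# Rank from highest to lowest; tied products all get the best shared position.
ranks <- rank(-rating_num, ties.method = "min")

ranks[c("Final E4000", "Marshall Mode")]  # 6 and 19
```

Under this tie rule the Top Pick shares 6th place with three other products rated 4.2, and the Budget Pick shares 19th place with one other product rated 3.8.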

Conclusions

While I did find some evidence to support my initial hypothesis, it is pretty weak evidence. If I truly wanted to support my claim there are a number of further steps I would need to take.

  1. Collect more data from more products.
  2. Account for the number of ratings alongside the rating value, since a 5.0 from 3 ratings is not comparable to a 4.7 from 385.
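
One possible way to take the second step is a Bayesian-style weighted average that pulls products with few ratings toward a baseline. This is only a sketch of one option, not something done in this analysis: `prior_mean` and `prior_weight` are arbitrary assumed values, and the four rows are copied from the results table.

```r
# A few rows copied from the results table.
ratings <- data.frame(
  Product = c("Final F3100", "FiiO FH3", "Final E4000", "1More Quad Driver"),
  RatingNum = c(5.0, 4.7, 4.2, 4.4),
  TotalRatingsNum = c(3, 385, 183, 2419),
  stringsAsFactors = FALSE
)

# Assumed baseline: behave as if each product started with 50 ratings of 4.0.
prior_mean <- 4.0
prior_weight <- 50

# Weighted average of the baseline and the observed rating.
ratings$Adjusted <- (prior_weight * prior_mean +
  ratings$TotalRatingsNum * ratings$RatingNum) /
  (prior_weight + ratings$TotalRatingsNum)

# Sort by the adjusted score, highest first.
ratings[order(-ratings$Adjusted), ]
```

With these assumed values the 3-rating Final F3100 drops below products with hundreds of ratings; different baseline values would move the scores, so the choice would need to be justified before drawing conclusions from it.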

References


  1. The New York Times. (2013, October 6). The Best Wired earbuds. The New York Times. Retrieved October 4, 2021, from https://www.nytimes.com/wirecutter/reviews/the-best-200-in-ear-headphones/