Project 2 - The Wirecutter Vs. Amazon
The goal of this analysis is to evaluate the quality of The Wirecutter’s product recommendations. I suspect that The Wirecutter is a poor judge of quality, based on my own experience of going to purchase a recommended product on Amazon and seeing generally negative sentiment about it. Some of the bad reviews even mentioned The Wirecutter’s recommendation as their reason for buying the product.
To perform my analysis, I used the article that The Wirecutter posted with recommendations for the best wired earbuds under $200 [1]: https://www.nytimes.com/wirecutter/reviews/the-best-200-in-ear-headphones/
Setup
Load Libraries
library(RCurl)
library(stringr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
##
## complete
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(readr)
library(fuzzyjoin)
Load Data
reviewsUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/wirecuttervsamazon/Customer%20reviews'
customerReviewsRaw <- getURL(reviewsUrl)
fileUrl <- 'https://raw.githubusercontent.com/nolivercuny/data607/master/project2/datasets/wirecuttervsamazon/wirecutter_earbuds.csv'
wirecutterDataRaw <- getURL(fileUrl)
Tidy
Amazon Scraped Data
The Amazon review data was scraped directly from the product pages for the products reviewed by The Wirecutter. To make things a bit easier for myself, I put Customer Reviews and End Customer Reviews markers around each scraped block.
First, I remove the tab characters because they add no value.
Next, I use the str_extract_all function to grab all characters (including newlines) between Customer Reviews and End Customer Reviews, because each block of text holds the review summary for one product.
customerReviews <- customerReviewsRaw %>%
  str_replace_all('\t','') %>%
  str_extract_all("Customer Reviews[\\s\\S]*?End Customer Reviews")
Now I have a list where each item is the review summary for a single product. There are 26 items, but each one is still one long string. I use the as.data.frame function to turn the data into a 26-row data frame, then the tidyr::separate function to break each item out into a wide data frame structure.
The items are split into columns on the newline character. This gets us closer to where we want the data, but there are still some issues:
- The rating number and the percentage of users who gave that rating are two separate columns, e.g. (4 stars|45%).
- The listings weren’t uniform in structure, so a number of columns near the end of each row contain NA values.
- The column names don’t make sense.
- There are columns with meaningless data, like Customer Reviews.
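The extraction and splitting steps above can be sketched on a toy string. This is only an illustration: the input text, the four-column width, and the fill = "right" argument (which pads short rows quietly instead of warning) are assumptions made for the example, not part of the real scraped file.

```r
library(stringr)
library(tidyr)

# Toy input (made up for illustration): two scraped blocks with different line counts
raw <- paste0(
  "Customer Reviews\n4.1 out of 5 stars\nProduct A\nEnd Customer Reviews\n",
  "Customer Reviews\n3.8 out of 5 stars\nEnd Customer Reviews\n"
)

# Lazy match: [\s\S]*? stops at the first "End Customer Reviews",
# so each block comes back as its own match instead of one long match
blocks <- str_extract_all(raw, "Customer Reviews[\\s\\S]*?End Customer Reviews")[[1]]

# Split each block into one column per line; shorter blocks are padded with NA
toy <- data.frame(data = blocks, stringsAsFactors = FALSE) %>%
  separate(data, into = as.character(1:4), sep = "\\n", fill = "right")
toy
```

The second block has one line fewer than the first, so its last column is NA, which mirrors the uneven rows described in the list above.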
The regex used to extract the price from the combined columns was complex because some product listings had multiple prices.
A listing looked something like this:
List Price: $19.99 Details
Price: $14.90
You Save: $5.09 (25%)
What made it even more complicated is that the value we actually wanted was Price:, but there is also a value called List Price:, so I couldn’t just grab everything after the word Price:. I learned about a regex feature called a negative lookbehind.
- (?<!List ) - This is a negative lookbehind. It tells the regex engine to look backwards from the current position and reject the match if the preceding text is List followed by a space. That is where the negative part comes in.
- Price:(\\$(\\d+)\\.(\\d+)) - Match text that starts with exactly Price:, followed by a dollar symbol, 1 or more digits, a decimal point, and 1 or more digits.
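To make the effect of the lookbehind concrete, here is a small sketch on a single made-up listing string. It assumes the price lines have already had their tabs stripped and been joined with "|", so there is no space after the colon; the variable names are hypothetical.

```r
library(stringr)

# Toy listing text (assumed shape: tabs removed, lines joined with "|")
listing <- "List Price:$19.99 Details|Price:$14.90|You Save:$5.09 (25%)"

# Without the lookbehind, the first match is the tail of "List Price:"
without_lookbehind <- str_extract(listing, "Price:(\\$(\\d+)\\.(\\d+))")

# With the negative lookbehind, a "Price:" preceded by "List " is skipped
with_lookbehind <- str_extract(listing, "(?<!List )Price:(\\$(\\d+)\\.(\\d+))")

# Optional follow-up: strip the label to get a numeric price
price_num <- as.numeric(str_remove(with_lookbehind, "Price:\\$"))

without_lookbehind
with_lookbehind
price_num
```

Note that str_extract returns the whole match, including the Price: label, which is why the Price column in the tidy table still carries that prefix.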
# convert to a data frame with one column
customerReviewsDf <- as.data.frame(customerReviews, col.names = c("data")) %>%
  # separate the column into 23 new columns on the newline character,
  # naming the new columns "1" through "23"
  separate("data", into = as.character(1:23), sep = '\\n') %>%
  # remove unneeded columns
  select(!c('1','3','5','7','9','11','13','16','17')) %>%
  # rename the rating % columns so they make sense
  rename(Rating = "2", TotalRatings = "4", FiveStar = '6', FourStar = '8', ThreeStar = '10', TwoStar = '12', OneStar = '14', Name = '15') %>%
  # combine the remaining columns so we can extract the price
  unite(Price, "18","19","20","21","22","23", sep="|", na.rm = TRUE) %>%
  # extract the sale price from the combined column;
  # some listings contain other prices in addition to the sale price
  mutate(Price = str_extract(Price,'(?<!List )Price:(\\$(\\d+)\\.(\\d+))'))
## Warning: Expected 23 pieces. Missing pieces filled with `NA` in 25 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, ...].
Show the tidy data
kable(customerReviewsDf, format = "html") %>% kable_styling("striped") %>% scroll_box(width = "100%")
| Rating | TotalRatings | FiveStar | FourStar | ThreeStar | TwoStar | OneStar | Name | Price |
|---|---|---|---|---|---|---|---|---|
| 4.1 out of 5 stars | 10,010 global ratings | 59% | 16% | 9% | 7% | 9% | 1MORE Triple Driver In-Ear Earphones Hi-Res Headphones with High Resolution, Bass Driven Sound, MEMS Mic, In-Line Remote, High Fidelity for Smartphones/PC/Tablet - Silver | Price:$70.77 |
| 4.2 out of 5 stars | 183 global ratings | 59% | 18% | 7% | 11% | 5% | Final E4000 High Resolution Sound Isolating In-Ear Headphones Earphones | Price:$149.00 |
| 3.8 out of 5 stars | 976 global ratings | 54% | 13% | 10% | 8% | 15% | Marshall Mode in-Ear Headphones, Black/White (4090939) | NA |
| 4.0 out of 5 stars | 6 global ratings | 75% | 0% | 0% | 0% | 25% | Campfire Audio Satsuma in Ear Monitors - in Ear Headphones for Musicians - Professional IEM Earphones | Price:$199.00 |
| 4.1 out of 5 stars | 13 global ratings | 66% | 11% | 0% | 11% | 12% | Campfire Audio Honeydew in Ear Monitors - in Ear Headphones for Musicians - Professional IEM Earphones | Price:$249.00 |
| 4.4 out of 5 stars | 3,483 global ratings | 68% | 16% | 8% | 3% | 5% | MEE audio M6 PRO Musicians’ In-Ear Monitors with Detachable Cables; Universal-Fit and Noise-Isolating (2nd Generation) (Black) | Price:$49.99 |
| 4.3 out of 5 stars | 122,071 global ratings | 65% | 17% | 9% | 4% | 5% | Panasonic ErgoFit Earbud Headphones with Microphone and Call Controller Compatible with IPhone, Android and Blackberry - RP-TCM125-A - In-Ear (Blue) | Price:$14.90 |
| 4.4 out of 5 stars | 2,419 global ratings | 71% | 13% | 7% | 4% | 6% | 1MORE Quad Driver in-Ear Earphones Hi-Res High Fidelity Headphones Warm Bass, Spacious Reproduction, High Resolution, Mic in-Line Remote Smartphones/PC/Tablet - Silver/Gray | Price:$142.07 |
| 4.2 out of 5 stars | 1,123 global ratings | 64% | 14% | 11% | 4% | 8% | beyerdynamic Soul BYRD wired premium in-ear headphones in black | Price:$69.00 |
| 3.1 out of 5 stars | 96 global ratings | 32% | 11% | 13% | 26% | 18% | Brainwavz S3 Noise Isolating in-Ear Headphones \| Earbuds \| Earphones with Remote and Microphone | Price:$29.50 |
| 4.4 out of 5 stars | 341 global ratings | 70% | 15% | 7% | 4% | 3% | Etymotic Research ER2XR Extended Response High Performance In-Ear Earphones (Detachable Dynamic Drivers, Noise Isolating, High Accuracy, Robust Low Frequencies) | Price:$99.95 |
| 4.2 out of 5 stars | 165 global ratings | 61% | 20% | 7% | 4% | 8% | Etymotic Research ER3SE Studio Edition High Performance In-Ear Earphones (Detachable Balanced Armature Drivers, Noise Isolating, High Accuracy, Studio Grade Accuracy),Black | Price:$101.75 |
| 4.7 out of 5 stars | 385 global ratings | 80% | 14% | 2% | 2% | 1% | FiiO FH3 Triple Drive(1 Dynamic + 2 Knowles BA) in-Ear HiFi Earphones with High Resolution,Bass Sound, High Fidelity for Smartphones/PC/Tablet | Price:$129.99 |
| 4.0 out of 5 stars | 295 global ratings | 55% | 17% | 12% | 8% | 8% | Final Audio Design High Resolution Headphone - Stainless Steel (E3000) Black | Price:$44.42 |
| 5.0 out of 5 stars | 3 global ratings | 100% | 0% | 0% | 0% | 0% | Final Audio Design High Resolution Headphone - Black (F3100) | Price:$189.00 |
| 4.1 out of 5 stars | 35 global ratings | 55% | 21% | 8% | 8% | 8% | Yamaha EPH-M200RE High-Performance Earphones with Remote and Mic, Red | Price:$49.00 |
| 3.6 out of 5 stars | 9 global ratings | 21% | 36% | 30% | 14% | 0% | YAMAHA EPH-M100 in-Ear Headphones Blue | NA |
| 4.0 out of 5 stars | 248 global ratings | 56% | 17% | 7% | 10% | 10% | V-MODA Zn In-Ear Modern Audiophile Headphones with microphone - 3 Button | Price:$158.39 |
| 3.9 out of 5 stars | 88 global ratings | 52% | 15% | 13% | 10% | 10% | V-MODA Forza Metallo In-Ear Headphones with 3-Button Remote & Microphone - Samsung and Android Devices, Gunmetal Black | Price:$121.28 |
| 3.8 out of 5 stars | 17 global ratings | 48% | 14% | 14% | 14% | 9% | Optoma NuForce HEM2 Reference Class Hi-Res in-Ear Headphones with Balanced Armature Drivers | Price:$129.00 |
| 3.6 out of 5 stars | 17 global ratings | 27% | 34% | 26% | 0% | 14% | NuForce Hem Dynamic in-Ear Monitors Hi-Res Audio Noise Isolating Single Micro Dynamic Driver Microphone and Remote Charcoal Black (Hem-Dynamic-Black) | Price:$92.13 |
| 2.8 out of 5 stars | 31 global ratings | 20% | 15% | 16% | 23% | 26% | NAD VISO HP20 in-Ear Headphones in Black | Price:$99.00 |
| 4.0 out of 5 stars | 15 global ratings | 51% | 28% | 0% | 10% | 10% | Monoprice MP80 Aluminum In-Ear Earphone, Balanced Armature Driver And Dynamic Driver With Three Tuning Nozzles | Price:$64.16 |
| 3.3 out of 5 stars | 70 global ratings | 39% | 10% | 13% | 17% | 22% | MEE audio Pinnacle P2 High Fidelity Audiophile In-Ear Headphones with Detachable Cables | Price:$66.00 |
| 4.2 out of 5 stars | 271 global ratings | 59% | 16% | 14% | 4% | 7% | MEE audio Pinnacle P1 High Fidelity Audiophile In-Ear Headphones with Detachable Cables - EP-P1-ZN-MEE, Pinnacle P1 (Zinc) | Price:$159.35 |
| 4.1 out of 5 stars | 25 global ratings | 58% | 16% | 8% | 8% | 9% | GRADO iGe3 Wired In Ear Headphone (earbuds) Smart Device Controller w/Microphone | Price:$99.00 |
Wirecutter Data
The Wirecutter review data is pretty basic CSV data with the product name and its “recommendation” level.
There are four recommendation levels, which roughly map to the following:
| recommendation | meaning |
|---|---|
| Top Pick | Buy this it’s the best |
| Budget Pick | Buy this if you’re on a budget. Not as good as the Top Pick but still good |
| Other good wired earbuds | Recommended with caveats |
| the competition | Not recommended |
wirecutterDf <- read_csv(wirecutterDataRaw, skip=1) %>% as.data.frame()
## Rows: 36 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Product Name, Wirecutter Rating, Amazon Name
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Combining Data Sets
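Because there are two data sets, I need to combine them before doing any analysis. This presents a challenge because, for the most part, the name on Amazon does not match the name from The Wirecutter. To bridge the two, the Wirecutter CSV includes an Amazon Name column holding the Amazon listing title, which I use as the join key. I also convert the rating and the total number of ratings from text to numbers.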
joinedDf <- left_join(wirecutterDf, customerReviewsDf, by = c("Amazon Name" = "Name")) %>%
  mutate(RatingNum = as.numeric(str_extract(Rating,"^\\D*(\\d+(?:\\.\\d+)?)"))) %>%
  mutate(TotalRatingsNum = as.numeric(gsub("[^0-9]","",TotalRatings)))
## Warning: One or more parsing issues, see `problems()` for details
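Because the join relies on an exact match between the two name columns, any small difference in a title leaves a row with NA ratings. One way to spot such rows is an anti-join, sketched here on made-up data. The toy data frames and product names are hypothetical; only the column names mirror the real ones.

```r
library(dplyr)

# Toy stand-ins for the two data frames (product names are made up)
wirecutter_toy <- data.frame(
  `Product Name` = c("Earbud A", "Earbud B", "Earbud C"),
  `Amazon Name` = c("Earbud A Wired In-Ear Headphones", "Earbud B (Black)", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
amazon_toy <- data.frame(
  Name = c("Earbud A Wired In-Ear Headphones", "Earbud B (Blue)"),
  Rating = c("4.1 out of 5 stars", "3.8 out of 5 stars"),
  stringsAsFactors = FALSE
)

# Keep only the Wirecutter rows with no exact match on the Amazon side
unmatched <- anti_join(wirecutter_toy, amazon_toy, by = c("Amazon Name" = "Name"))
unmatched$`Product Name`
```

Here Earbud B fails to match because of a one-word difference in the listing title, and Earbud C fails because it has no Amazon name at all.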
Analysis
To analyze my results, I simply created a table ordered by the rating out of 5 stars, with the highest rated at the top. Products without matching Amazon data (11 of the 36) sort to the bottom.
The evidence seems to support my suspicion that The Wirecutter’s recommendations are not representative of customers’ real-world experiences with products.
- The number one pick (Final E4000) ranks 6th out of 36, below a similarly priced, higher rated, and more reviewed product (1More Quad Driver).
- Their budget pick (Marshall Mode) ranks 19th out of 36, with a rating of just 3.8. Unfortunately, Amazon showed no price for this item because it was no longer being sold, but a low-priced pair of earbuds (Final E3000) at $44.42 has a higher rating of 4.0 out of 5.0, though from fewer ratings (295 versus 976).
joinedDf %>%
  arrange(desc(RatingNum)) %>%
  select(`Product Name`, `Wirecutter Rating`, RatingNum, TotalRatingsNum, Price) %>%
  kable(format = "html") %>% kable_styling("striped") %>% scroll_box(width = "100%")
| Product Name | Wirecutter Rating | RatingNum | TotalRatingsNum | Price |
|---|---|---|---|---|
| Final F3100 | the competition | 5.0 | 3 | Price:$189.00 |
| FiiO FH3 | the competition | 4.7 | 385 | Price:$129.99 |
| 1More Quad Driver | the competition | 4.4 | 2419 | Price:$142.07 |
| Etymotic ER2XR | the competition | 4.4 | 341 | Price:$99.95 |
| Panasonic RP-TCM125 ErgoFit | Other good wired earbuds | 4.3 | 122071 | Price:$14.90 |
| Final E4000 | Top Pick | 4.2 | 183 | Price:$149.00 |
| Beyerdynamic Soul Byrd | the competition | 4.2 | 1123 | Price:$69.00 |
| Etymotic ER3SE | the competition | 4.2 | 165 | Price:$101.75 |
| MEE Audio Pinnacle P1 | the competition | 4.2 | 271 | Price:$159.35 |
| 1More Triple Driver | the competition | 4.1 | 10010 | Price:$70.77 |
| Campfire Audio Honeydew | the competition | 4.1 | 13 | Price:$249.00 |
| Grado iGe3 | the competition | 4.1 | 25 | Price:$99.00 |
| Yamaha EPH-M200 | the competition | 4.1 | 35 | Price:$49.00 |
| Campfire Audio Satsuma | Other good wired earbuds | 4.0 | 6 | Price:$199.00 |
| Final E3000 | the competition | 4.0 | 295 | Price:$44.42 |
| Monoprice MP80 | the competition | 4.0 | 15 | Price:$64.16 |
| V-Moda Zn | the competition | 4.0 | 248 | Price:$158.39 |
| V-Moda Forza Metallo | the competition | 3.9 | 88 | Price:$121.28 |
| Marshall Mode | Budget Pick | 3.8 | 976 | NA |
| Optoma NuForce HEM2 | the competition | 3.8 | 17 | Price:$129.00 |
| Optoma NuForce HEM Dynamic | the competition | 3.6 | 17 | Price:$92.13 |
| Yamaha EPH-M100 | the competition | 3.6 | 9 | NA |
| MEE Audio Pinnacle P2 | the competition | 3.3 | 70 | Price:$66.00 |
| Brainwavz S3 | the competition | 3.1 | 96 | Price:$29.50 |
| NAD Viso HP20 | the competition | 2.8 | 31 | Price:$99.00 |
| Honeydew | Other good wired earbuds | NA | NA | NA |
| MEE Audio M6 Pro 2nd Generation | Other good wired earbuds | NA | NA | NA |
| Flare Audio Flares Jet | the competition | NA | NA | NA |
| Marshall Mode EQ | the competition | NA | NA | NA |
| Massdrop x NuForce EDC | the competition | NA | NA | NA |
| Massdrop x NuForce EDC3 | the competition | NA | NA | NA |
| Meze Audio Rai Solo | the competition | NA | NA | NA |
| Monoprice Quintet Wired In Ear Monitor | the competition | NA | NA | NA |
| Sennheiser IE 100 Pro | the competition | NA | NA | NA |
| Sennheiser IE 300 | the competition | NA | NA | NA |
| Shure Aonic 4 | the competition | NA | NA | NA |
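The rank positions quoted in the analysis come from reading row positions in the sorted table. As a sketch, the same positions can be computed explicitly with dplyr's min_rank, which also makes ties visible. The data frame below copies the first seven rows of the table above; the Rank column name is my own choice.

```r
library(dplyr)

# First seven rows copied from the sorted table above
ratings <- data.frame(
  Product = c("Final F3100", "FiiO FH3", "1More Quad Driver", "Etymotic ER2XR",
              "Panasonic RP-TCM125 ErgoFit", "Final E4000", "Beyerdynamic Soul Byrd"),
  RatingNum = c(5.0, 4.7, 4.4, 4.4, 4.3, 4.2, 4.2),
  stringsAsFactors = FALSE
)

# min_rank gives tied ratings the same rank and leaves a gap after the tie
ranked <- ratings %>%
  mutate(Rank = min_rank(desc(RatingNum))) %>%
  arrange(Rank)
ranked
```

With ties handled this way, the Final E4000 shares its position with the other products rated 4.2, so its exact placement within that group depends only on row order.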
Conclusions
While I did find some evidence to support my initial hypothesis, it is pretty weak evidence. If I truly wanted to support my claim there are a number of further steps I would need to take.
- Collect more data from more products.
- Account for the number of ratings alongside the rating value, so that a 5.0 from 3 ratings does not outrank a 4.7 from 385.
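One common way to account for the number of ratings is a shrinkage (Bayesian-style) average that pulls products with few ratings toward the overall mean. The sketch below is not part of the original analysis. The prior weight m = 50 is an arbitrary assumption, and the four rows are copied from the results table.

```r
library(dplyr)

# Four rows copied from the results table
products <- data.frame(
  Product = c("Final F3100", "FiiO FH3", "1More Quad Driver", "Final E4000"),
  RatingNum = c(5.0, 4.7, 4.4, 4.2),
  TotalRatingsNum = c(3, 385, 2419, 183),
  stringsAsFactors = FALSE
)

m <- 50                               # hypothetical prior weight, in "pseudo-ratings"
overall <- mean(products$RatingNum)   # mean rating across the sample rows

# Weighted blend of each product's own rating and the overall mean;
# the fewer ratings a product has, the closer it moves to the mean
adjusted <- products %>%
  mutate(Adjusted = (TotalRatingsNum * RatingNum + m * overall) / (TotalRatingsNum + m)) %>%
  arrange(desc(Adjusted))
adjusted
```

Under this assumed weighting, the Final F3100 no longer tops the list on the strength of three ratings. A different choice of m would shift the results, so it should be treated as a tuning decision.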
References
[1] The New York Times. (2013, October 6). The Best Wired earbuds. The New York Times. Retrieved October 4, 2021, from https://www.nytimes.com/wirecutter/reviews/the-best-200-in-ear-headphones/