Amazon Product Sentiment Analysis with R

Set working directory

Loading necessary libraries

library(dplyr)
library(ggplot2)
library(forcats)
library(tidyverse)
library(tidytext)
library(lubridate)

Loading CSV file

# Load the dataset
reviews <- read_csv("Reviews.csv", show_col_types = FALSE)

Structure of dataset

## spc_tbl_ [568,454 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Id                    : num [1:568454] 1 2 3 4 5 6 7 8 9 10 ...
##  $ ProductId             : chr [1:568454] "B001E4KFG0" "B00813GRG4" "B000LQOCH0" "B000UA0QIQ" ...
##  $ UserId                : chr [1:568454] "A3SGXH7AUHU8GW" "A1D87F6ZCVE5NK" "ABXLMWJIXXAIN" "A395BORC6FGVXV" ...
##  $ ProfileName           : chr [1:568454] "delmartian" "dll pa" "Natalia Corres \"Natalia Corres\"" "Karl" ...
##  $ HelpfulnessNumerator  : num [1:568454] 1 0 1 3 0 0 0 0 1 0 ...
##  $ HelpfulnessDenominator: num [1:568454] 1 0 1 3 0 0 0 0 1 0 ...
##  $ Score                 : num [1:568454] 5 1 4 2 5 4 5 5 5 5 ...
##  $ Time                  : num [1:568454] 1.30e+09 1.35e+09 1.22e+09 1.31e+09 1.35e+09 ...
##  $ Summary               : chr [1:568454] "Good Quality Dog Food" "Not as Advertised" "\"Delight\" says it all" "Cough Medicine" ...
##  $ Text                  : chr [1:568454] "I have bought several of the Vitality canned dog food products and have found them all to be of good quality. T"| __truncated__ "Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if t"| __truncated__ "This is a confection that has been around a few centuries.  It is a light, pillowy citrus gelatin with nuts - i"| __truncated__ "If you are looking for the secret ingredient in Robitussin I believe I have found it.  I got this in addition t"| __truncated__ ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ProductId = col_character(),
##   ..   UserId = col_character(),
##   ..   ProfileName = col_character(),
##   ..   HelpfulnessNumerator = col_double(),
##   ..   HelpfulnessDenominator = col_double(),
##   ..   Score = col_double(),
##   ..   Time = col_double(),
##   ..   Summary = col_character(),
##   ..   Text = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

The dataset consists of Amazon fine food reviews spanning more than a decade, up to October 2012. It contains 568,454 reviews from 256,059 users covering 74,258 products. Each review records product and user identifiers, a rating, helpfulness votes, a timestamp, a summary, and the plain-text review. The dataset also includes reviews from other Amazon categories, not just fine foods. The data is distributed in two forms: Reviews.csv and database.sqlite. Reviews.csv is derived from the Reviews table in database.sqlite.
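The review, user, and product counts quoted above can be checked with `n_distinct()`. The sketch below is only an illustration: `toy_reviews` is a hypothetical stand-in that reuses the first few IDs from the structure output, with one product and one user repeated on purpose. On the real data the same `summarise()` call would be applied to `reviews`.

```r
library(dplyr)

# Hypothetical stand-in for `reviews` (not the real data).
# Row 5 repeats a product and a user so the distinct counts differ from n().
toy_reviews <- data.frame(
  Id        = 1:5,
  ProductId = c("B001E4KFG0", "B00813GRG4", "B000LQOCH0", "B000UA0QIQ", "B001E4KFG0"),
  UserId    = c("A3SGXH7AUHU8GW", "A1D87F6ZCVE5NK", "ABXLMWJIXXAIN", "A395BORC6FGVXV",
                "A1D87F6ZCVE5NK")
)

# Count rows, distinct users, and distinct products in one pass.
dataset_counts <- toy_reviews %>%
  summarise(
    n_reviews  = n(),
    n_users    = n_distinct(UserId),
    n_products = n_distinct(ProductId)
  )

print(dataset_counts)
```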

Performing sentiment analysis on the ‘Text’ column

The sentiment analysis shows that the large majority of scored Amazon reviews, around 88.5%, carry positive sentiment, while 8.7% are negative and 2.8% are neutral. This breakdown of customer feedback gives Amazon insight into consumer views and points to where product quality, customer service, and the overall user experience could be improved. Addressing the concerns behind the negative reviews can increase customer satisfaction, strengthen brand loyalty, and support long-term growth in a competitive e-commerce market.

# Convert the Time variable from Unix seconds to a POSIXct date-time
reviews$Time <- as.POSIXct(reviews$Time, origin = "1970-01-01", tz = "UTC")

# Perform sentiment analysis on the 'Text' column
sentiment_scores <- reviews %>%
  unnest_tokens(word, Text) %>%
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(Id) %>%
  summarise(sentiment_score = sum(value)) %>%
  mutate(Sentiment = case_when(
    sentiment_score > 0 ~ "Positive",
    sentiment_score < 0 ~ "Negative",
    TRUE ~ "Neutral"
  ))

The code first converts the ‘Time’ column of the ‘reviews’ dataset from Unix seconds to POSIXct format, allowing for time-based analysis. Sentiment analysis is then applied to the ‘Text’ column using the AFINN lexicon: each review is tokenized into words, and each word found in the lexicon receives its AFINN score. Words that are not in the lexicon are dropped by the inner join, so a review with no matching words does not appear in the result. Finally, the word scores are summed per review, and each review is classified as ‘Positive’, ‘Negative’, or ‘Neutral’ according to whether the total is above, below, or equal to zero.
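To make the scoring steps concrete, here is a minimal sketch on four made-up reviews. It assumes a tiny stand-in lexicon (`toy_lexicon`) instead of `get_sentiments("afinn")`, so the word values are illustrative only. The names `toy_reviews`, `toy_lexicon`, `toy_scores`, and `sentiment_share` are hypothetical and do not come from the original analysis.

```r
library(dplyr)
library(tidytext)

# Hypothetical reviews; review 3 contains no lexicon words at all.
toy_reviews <- data.frame(
  Id   = 1:4,
  Text = c("Good quality, great taste",
           "Bad product, terrible smell",
           "Arrived on Tuesday",
           "good but bad")
)

# Stand-in for the AFINN lexicon (illustrative values, assumed for this sketch).
toy_lexicon <- data.frame(
  word  = c("good", "great", "bad", "terrible"),
  value = c(3, 3, -3, -3)
)

toy_scores <- toy_reviews %>%
  unnest_tokens(word, Text) %>%               # one lowercased word per row
  inner_join(toy_lexicon, by = "word") %>%    # keep only words present in the lexicon
  group_by(Id) %>%
  summarise(sentiment_score = sum(value)) %>% # total score per review
  mutate(Sentiment = case_when(
    sentiment_score > 0 ~ "Positive",
    sentiment_score < 0 ~ "Negative",
    TRUE ~ "Neutral"
  ))

# Share of each sentiment class among the reviews that were scored.
sentiment_share <- toy_scores %>%
  count(Sentiment) %>%
  mutate(percent = 100 * n / sum(n))

print(toy_scores)
print(sentiment_share)
```

Review 3 is absent from `toy_scores` because none of its words match the lexicon, so percentages computed this way describe scored reviews only. If unmatched reviews should count as neutral, one option would be a `left_join()` with missing values replaced by 0 before summing.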

Variations in sentiment scores over time indicate changes in customer satisfaction. Between 2000 and 2005, sentiment rose sharply, which most likely reflects positive experiences with products or services. Sentiment dipped later, possibly because of factors such as product changes or greater competition. Businesses like Amazon would typically analyse and address such shifts quickly in order to maintain customer satisfaction and their reputation on the platform. It is easier to turn a one-time buyer into a repeat customer than to win a new buyer.
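The code behind the yearly trend is not shown in this report, so the following is only a sketch of one way it could be computed. It assumes each scored review still carries its `Time` value; in the pipeline above that would mean joining `Time` back to `sentiment_scores` by `Id`, since `summarise()` drops it. The timestamps are the rounded values from the structure output, and the scores in `toy_scored` are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical per-review scores with Unix timestamps (seconds since 1970-01-01 UTC).
toy_scored <- data.frame(
  Id              = 1:4,
  Time            = c(1.22e9, 1.30e9, 1.31e9, 1.35e9),
  sentiment_score = c(4, -2, 6, 1)
)

yearly_sentiment <- toy_scored %>%
  mutate(
    Time = as.POSIXct(Time, origin = "1970-01-01", tz = "UTC"),  # seconds -> date-time
    year = year(Time)                                            # calendar year of the review
  ) %>%
  group_by(year) %>%
  summarise(
    mean_sentiment = mean(sentiment_score),
    n_reviews      = n()
  )

print(yearly_sentiment)
```

Reporting `n_reviews` alongside the mean helps when reading the trend, because years with very few reviews can swing sharply on a handful of scores.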

The sentiment analysis shows that different products on Amazon elicit varying levels of sentiment from customers.
I used a random sample of the product sentiment data so that the analysis would not be biased towards specific products. This gives a more representative picture of sentiment trends across a wide range of products and supports the goal of a robust analysis.
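The sampling code itself is not included in this report. As a hedged sketch, random product sampling could look like the following, where `toy_product_scores`, the sample size of 3, and the seed are all assumptions made for illustration.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the random sample is reproducible

# Hypothetical per-review scores for five made-up products.
toy_product_scores <- data.frame(
  ProductId       = rep(c("P1", "P2", "P3", "P4", "P5"), each = 2),
  sentiment_score = c(3, 5, -2, 1, 0, 4, 6, -1, 2, 2)
)

# Draw products (not individual reviews) at random, so every product
# has the same chance of selection regardless of how many reviews it has.
sampled_products <- toy_product_scores %>%
  distinct(ProductId) %>%
  slice_sample(n = 3)

# Summarise sentiment for the sampled products only.
product_sentiment <- toy_product_scores %>%
  semi_join(sampled_products, by = "ProductId") %>%
  group_by(ProductId) %>%
  summarise(
    mean_sentiment = mean(sentiment_score),
    n_reviews      = n()
  )

print(product_sentiment)
```

Sampling at the product level rather than the review level is a design choice: sampling reviews directly would favour heavily reviewed products.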

Emeka Anthony Nwabueze

2024-04-07
