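# 1. Load the required packages
library(dplyr)      # data manipulation
library(ggplot2)    # plotting
library(forcats)    # factor handling
library(tidyverse)  # attaches the core tidyverse (includes the packages above)
library(tidytext)   # tokenisation and sentiment lexicons
library(lubridate)  # date-time helpers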
# 2. Load the dataset
reviews <- read_csv("Reviews.csv", show_col_types = FALSE)
# Inspect the structure of the loaded data
str(reviews)
## spc_tbl_ [568,454 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:568454] 1 2 3 4 5 6 7 8 9 10 ...
## $ ProductId : chr [1:568454] "B001E4KFG0" "B00813GRG4" "B000LQOCH0" "B000UA0QIQ" ...
## $ UserId : chr [1:568454] "A3SGXH7AUHU8GW" "A1D87F6ZCVE5NK" "ABXLMWJIXXAIN" "A395BORC6FGVXV" ...
## $ ProfileName : chr [1:568454] "delmartian" "dll pa" "Natalia Corres \"Natalia Corres\"" "Karl" ...
## $ HelpfulnessNumerator : num [1:568454] 1 0 1 3 0 0 0 0 1 0 ...
## $ HelpfulnessDenominator: num [1:568454] 1 0 1 3 0 0 0 0 1 0 ...
## $ Score : num [1:568454] 5 1 4 2 5 4 5 5 5 5 ...
## $ Time : num [1:568454] 1.30e+09 1.35e+09 1.22e+09 1.31e+09 1.35e+09 ...
## $ Summary : chr [1:568454] "Good Quality Dog Food" "Not as Advertised" "\"Delight\" says it all" "Cough Medicine" ...
## $ Text : chr [1:568454] "I have bought several of the Vitality canned dog food products and have found them all to be of good quality. T"| __truncated__ "Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if t"| __truncated__ "This is a confection that has been around a few centuries. It is a light, pillowy citrus gelatin with nuts - i"| __truncated__ "If you are looking for the secret ingredient in Robitussin I believe I have found it. I got this in addition t"| __truncated__ ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ProductId = col_character(),
## .. UserId = col_character(),
## .. ProfileName = col_character(),
## .. HelpfulnessNumerator = col_double(),
## .. HelpfulnessDenominator = col_double(),
## .. Score = col_double(),
## .. Time = col_double(),
## .. Summary = col_character(),
## .. Text = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
# Convert the Time variable from Unix epoch seconds to a date-time (POSIXct, UTC)
reviews$Time <- as.POSIXct(reviews$Time, origin = "1970-01-01", tz = "UTC")
# Perform sentiment analysis on the 'Text' column
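sentiment_scores <- reviews %>%
  # Split each review into one lowercase word per row
  unnest_tokens(word, Text) %>%
  # Keep only words found in the AFINN lexicon (integer scores from -5 to 5)
  inner_join(get_sentiments("afinn"), by = "word") %>%
  group_by(Id) %>%
  # Sum the word scores within each review
  summarise(sentiment_score = sum(value)) %>%
  # Label each review by the sign of its total score
  mutate(Sentiment = ifelse(sentiment_score > 0, "Positive",
                     ifelse(sentiment_score < 0, "Negative", "Neutral")))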
The sentiment analysis shows that customer sentiment varies from product to product on Amazon.
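One way to compare sentiment across products is to join the per-review scores back to the product identifiers and average them within each product. The sketch below is only an illustration: `toy_reviews` and `toy_scores` are made-up stand-ins for `reviews` and `sentiment_scores`, and the choice of the mean as the per-product summary is an assumption, not something taken from the analysis above.

```r
library(dplyr)

# Toy stand-ins for `reviews` and `sentiment_scores` (values invented for illustration)
toy_reviews <- data.frame(
  Id = 1:6,
  ProductId = c("P1", "P1", "P2", "P2", "P3", "P3")
)
toy_scores <- data.frame(
  Id = 1:6,
  sentiment_score = c(4, 2, -3, -1, 0, 5)
)

# Attach each review's score to its product, then summarise per product
product_sentiment <- toy_reviews %>%
  inner_join(toy_scores, by = "Id") %>%
  group_by(ProductId) %>%
  summarise(
    n_reviews = n(),
    mean_sentiment = mean(sentiment_score),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_sentiment))

print(product_sentiment)
```

With the real data, the same pipeline would presumably use `reviews` and `sentiment_scores` in place of the toy tables. Reviews containing no AFINN words are dropped by the inner join, so they do not contribute to a product's average.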
I drew a random sample from the product sentiment data so that the analysis would not be biased towards specific products. Sampling in this way gives a more representative picture of sentiment trends across a wide range of products. It is consistent with the concept of “Robust Analysis” because it lowers the risk of bias, although it does not remove it entirely.
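The sampling step is not shown above, so the following is a minimal sketch of how it could be done, assuming a simple random sample of products drawn without replacement. The table `toy_product_sentiment`, the sample size of 4, and the seed are all hypothetical choices made for illustration.

```r
library(dplyr)

set.seed(123)  # fix the seed so the sample is reproducible (arbitrary value)

# Toy per-product sentiment table (values invented for illustration)
toy_product_sentiment <- data.frame(
  ProductId = paste0("P", 1:10),
  mean_sentiment = c(3, -2, 2.5, 0, 1, 4, -1, 2, 0.5, 1.5)
)

# Draw a simple random sample of 4 products without replacement
sampled_products <- toy_product_sentiment %>%
  slice_sample(n = 4)

print(sampled_products)
```

Every product has the same chance of selection here, so no product is favoured by the sampling itself. Whether that matches the sampling actually used in this analysis is an assumption.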