---
title: "Data Review: Amazon Product Review Sentiment Analysis"
author: "GOLLANAPALLI DURGA KALYANI"
date: "2026-05-31"
output: html_document
---
library(tidyverse)  # provides read_csv(), %>%, mutate(), case_when(), and select()
dat <- read_csv("Book2.csv")
## Rows: 599 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (16): id, name, asins, brand, categories, primaryCategories, imageURLs,...
## dbl (3): reviews.id, reviews.numHelpful, reviews.rating
## lgl (2): reviews.dateAdded, reviews.doRecommend
## dttm (3): dateAdded, dateUpdated, reviews.date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Derive a sentiment label from the star rating
dat <- dat %>%
  mutate(
    sentiment = case_when(
      reviews.rating >= 4 ~ "Positive",
      reviews.rating == 3 ~ "Neutral",
      reviews.rating <= 2 ~ "Negative"
    )
  )
dat
## # A tibble: 599 × 25
## id dateAdded dateUpdated name asins brand categories
## <chr> <dttm> <dttm> <chr> <chr> <chr> <chr>
## 1 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 2 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 3 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 4 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 5 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 6 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 7 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 8 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 9 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 10 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## # ℹ 589 more rows
## # ℹ 18 more variables: primaryCategories <chr>, imageURLs <chr>, keys <chr>,
## # manufacturer <chr>, manufacturerNumber <chr>, reviews.date <dttm>,
## # reviews.dateAdded <lgl>, reviews.dateSeen <chr>, reviews.doRecommend <lgl>,
## # reviews.id <dbl>, reviews.numHelpful <dbl>, reviews.rating <dbl>,
## # reviews.sourceURLs <chr>, reviews.text <chr>, reviews.title <chr>,
## # reviews.username <chr>, sourceURLs <chr>, sentiment <chr>
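Before plotting, it can help to confirm that every review received a sentiment label. The block below is a minimal sketch on a small toy tibble; the object names `toy`, `toy_labeled`, and `sentiment_counts` and the ratings themselves are illustrative assumptions, not values from Book2.csv. It applies the same `case_when()` rules, converts the result to a factor so later plots keep a Negative, Neutral, Positive order, and tallies each category.

```r
library(dplyr)
library(tibble)

# Toy ratings for illustration only (NOT values from Book2.csv)
toy <- tibble(reviews.rating = c(5, 4, 3, 2, 1, NA))

toy_labeled <- toy %>%
  mutate(
    # Same rules as the report: >= 4 positive, == 3 neutral, <= 2 negative
    sentiment = case_when(
      reviews.rating >= 4 ~ "Positive",
      reviews.rating == 3 ~ "Neutral",
      reviews.rating <= 2 ~ "Negative"
    ),
    # Explicit levels keep the categories in a meaningful order in plots
    sentiment = factor(sentiment, levels = c("Negative", "Neutral", "Positive"))
  )

# Tally reviews per category; an NA row flags ratings the rules did not cover
sentiment_counts <- toy_labeled %>% count(sentiment)
sentiment_counts
```

A missing rating, or a non-integer rating such as 3.5, matches none of the three conditions and is labelled `NA`, so an `NA` row in the tally is a prompt to inspect those reviews before plotting.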
For my first figure, I will create a bar chart showing the distribution of customer sentiment. Each review is classified from its rating as positive (4 or higher), neutral (exactly 3), or negative (2 or lower), and the chart will display the number of reviews in each sentiment category.
fig_dat1 <- dat %>%
  select(reviews.rating, sentiment)
fig_dat1
## # A tibble: 599 × 2
## reviews.rating sentiment
## <dbl> <chr>
## 1 3 Neutral
## 2 5 Positive
## 3 4 Positive
## 4 5 Positive
## 5 5 Positive
## 6 5 Positive
## 7 5 Positive
## 8 4 Positive
## 9 5 Positive
## 10 5 Positive
## # ℹ 589 more rows
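One way the bar chart described above could be drawn is sketched below. It is only an illustration: `fig_dat1_toy` is a hypothetical stand-in with made-up rows in the same two-column shape as `fig_dat1`, and the axis labels and title are assumptions rather than the report's final wording.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Toy stand-in for fig_dat1 (illustrative values, NOT the Book2.csv data)
fig_dat1_toy <- tibble(
  reviews.rating = c(3, 5, 4, 5, 1, 2, 5),
  sentiment = c("Neutral", "Positive", "Positive", "Positive",
                "Negative", "Negative", "Positive")
)

p1 <- fig_dat1_toy %>%
  # Fix the category order so the bars read Negative, Neutral, Positive
  mutate(sentiment = factor(sentiment,
                            levels = c("Negative", "Neutral", "Positive"))) %>%
  ggplot(aes(x = sentiment)) +
  geom_bar() +  # geom_bar() counts the rows in each category by default
  labs(
    x = "Sentiment",
    y = "Number of reviews",
    title = "Distribution of review sentiment"
  )

p1
```

Swapping `fig_dat1_toy` for `fig_dat1` should give the same chart for the real reviews, since only the `sentiment` column is used.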
For my second figure, I will create a line chart showing how customer sentiment changes over time. I will use the review date (reviews.date) and the sentiment category to examine trends in customer opinion.
fig_dat2 <- dat %>%
  select(reviews.date, sentiment)
fig_dat2
## # A tibble: 599 × 2
## reviews.date sentiment
## <dttm> <chr>
## 1 2017-09-03 00:00:00 Neutral
## 2 2017-06-06 00:00:00 Positive
## 3 2018-04-20 00:00:00 Positive
## 4 2017-11-02 17:33:31 Positive
## 5 2018-04-24 00:00:00 Positive
## 6 2016-12-14 00:00:00 Positive
## 7 2017-12-20 17:38:23 Positive
## 8 2017-07-14 00:00:00 Positive
## 9 2018-05-23 00:00:00 Positive
## 10 2018-01-12 00:00:00 Positive
## # ℹ 589 more rows
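A line chart needs one value per time point, so the individual reviews have to be aggregated first. The sketch below assumes monthly counts per sentiment category, which is one reasonable choice and not something the report has settled on. `fig_dat2_toy` and `monthly` are hypothetical names, and the dates are made up for illustration.

```r
library(dplyr)
library(ggplot2)
library(lubridate)
library(tibble)

# Toy stand-in for fig_dat2 (illustrative values, NOT the Book2.csv data)
fig_dat2_toy <- tibble(
  reviews.date = as.POSIXct(
    c("2017-06-06", "2017-06-20", "2017-09-03", "2018-04-20", "2018-04-24"),
    tz = "UTC"
  ),
  sentiment = c("Positive", "Negative", "Neutral", "Positive", "Positive")
)

# Round each review date down to the first of its month, then count
# the reviews per month and sentiment category
monthly <- fig_dat2_toy %>%
  mutate(month = floor_date(reviews.date, unit = "month")) %>%
  count(month, sentiment, name = "n_reviews")

p2 <- ggplot(monthly, aes(x = month, y = n_reviews, colour = sentiment)) +
  geom_line() +
  geom_point() +  # points keep categories with a single month visible
  labs(
    x = "Review month",
    y = "Number of reviews",
    colour = "Sentiment",
    title = "Review sentiment over time"
  )

p2
```

Monthly bins are a trade-off: with 599 reviews, daily counts would likely be too sparse to show a trend, while yearly counts would leave only a few points per line.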
For my third figure, I will create a boxplot comparing review ratings across sentiment categories. This will show how ratings are distributed within the positive, neutral, and negative groups.
fig_dat3 <- dat %>%
  select(sentiment, reviews.rating)
fig_dat3
## # A tibble: 599 × 2
## sentiment reviews.rating
## <chr> <dbl>
## 1 Neutral 3
## 2 Positive 5
## 3 Positive 4
## 4 Positive 5
## 5 Positive 5
## 6 Positive 5
## 7 Positive 5
## 8 Positive 4
## 9 Positive 5
## 10 Positive 5
## # ℹ 589 more rows
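The boxplot could be sketched as follows. `fig_dat3_toy` and `rating_ranges` are hypothetical names, and the rows are made up for illustration. Because sentiment is derived directly from the rating, each box can only span the ratings assigned to its category (4 or higher for Positive, exactly 3 for Neutral, 2 or lower for Negative), so the summary table is included as a quick check of that constraint.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Toy stand-in for fig_dat3 (illustrative values, NOT the Book2.csv data)
fig_dat3_toy <- tibble(
  sentiment = factor(
    c("Neutral", "Positive", "Positive", "Positive", "Negative", "Negative"),
    levels = c("Negative", "Neutral", "Positive")
  ),
  reviews.rating = c(3, 5, 4, 5, 1, 2)
)

# Minimum and maximum rating observed within each sentiment category
rating_ranges <- fig_dat3_toy %>%
  group_by(sentiment) %>%
  summarise(
    min_rating = min(reviews.rating),
    max_rating = max(reviews.rating),
    .groups = "drop"
  )
rating_ranges

p3 <- ggplot(fig_dat3_toy, aes(x = sentiment, y = reviews.rating)) +
  geom_boxplot() +
  labs(
    x = "Sentiment",
    y = "Review rating",
    title = "Review ratings by sentiment category"
  )

p3
```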