
---
title: "Data Review: Amazon Product Review Sentiment Analysis"
author: "GOLLANAPALLI DURGA KALYANI"
date: "2026-05-31"
output: html_document
---

Import Your Data

library(tidyverse)
dat <- read_csv("Book2.csv")

## Rows: 599 Columns: 24
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (16): id, name, asins, brand, categories, primaryCategories, imageURLs,...
## dbl   (3): reviews.id, reviews.numHelpful, reviews.rating
## lgl   (2): reviews.dateAdded, reviews.doRecommend
## dttm  (3): dateAdded, dateUpdated, reviews.date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Derive a three-level sentiment label from the star rating
dat <- dat %>%
  mutate(
    sentiment = case_when(
      reviews.rating >= 4 ~ "Positive",
      reviews.rating == 3 ~ "Neutral",
      reviews.rating <= 2 ~ "Negative"
    )
  )

dat

## # A tibble: 599 × 25
##    id       dateAdded           dateUpdated         name  asins brand categories
##    <chr>    <dttm>              <dttm>              <chr> <chr> <chr> <chr>     
##  1 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
##  2 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
##  3 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
##  4 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
##  5 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
##  6 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
##  7 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
##  8 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
##  9 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## 10 AVqVGZN… 2017-03-03 16:56:05 2018-10-25 16:36:31 "Ama… B00Z… Amaz… Computers…
## # ℹ 589 more rows
## # ℹ 18 more variables: primaryCategories <chr>, imageURLs <chr>, keys <chr>,
## #   manufacturer <chr>, manufacturerNumber <chr>, reviews.date <dttm>,
## #   reviews.dateAdded <lgl>, reviews.dateSeen <chr>, reviews.doRecommend <lgl>,
## #   reviews.id <dbl>, reviews.numHelpful <dbl>, reviews.rating <dbl>,
## #   reviews.sourceURLs <chr>, reviews.text <chr>, reviews.title <chr>,
## #   reviews.username <chr>, sourceURLs <chr>, sentiment <chr>

Part 1

For my first figure, I am going to create a bar chart showing the distribution of customer sentiment. I will use the review rating to classify each review as positive, neutral, or negative, and then plot the number of reviews in each sentiment category.

fig_dat1 <- dat %>%
  select(reviews.rating, sentiment)

fig_dat1

## # A tibble: 599 × 2
##    reviews.rating sentiment
##             <dbl> <chr>    
##  1              3 Neutral  
##  2              5 Positive 
##  3              4 Positive 
##  4              5 Positive 
##  5              5 Positive 
##  6              5 Positive 
##  7              5 Positive 
##  8              4 Positive 
##  9              5 Positive 
## 10              5 Positive 
## # ℹ 589 more rows
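
One way the bar chart could be drawn from this data is sketched below, assuming ggplot2. To keep the example self-contained, `toy_dat1` is a small hypothetical stand-in for `fig_dat1`, and the names `sentiment_counts` and `p1` are illustrative only. The factor step is an optional choice that fixes the bar order.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for fig_dat1; the real data comes from Book2.csv
toy_dat1 <- tibble(
  reviews.rating = c(3, 5, 4, 5, 1, 2, 5)
) %>%
  mutate(
    # Same rating-to-sentiment rule used for dat
    sentiment = case_when(
      reviews.rating >= 4 ~ "Positive",
      reviews.rating == 3 ~ "Neutral",
      reviews.rating <= 2 ~ "Negative"
    ),
    # Fix the bar order so it reads Negative, Neutral, Positive
    sentiment = factor(sentiment, levels = c("Negative", "Neutral", "Positive"))
  )

# Number of reviews per sentiment category (the bar heights)
sentiment_counts <- count(toy_dat1, sentiment)

# geom_bar() counts the rows in each category itself
p1 <- ggplot(toy_dat1, aes(x = sentiment)) +
  geom_bar() +
  labs(x = "Sentiment", y = "Number of reviews")
```

Replacing `toy_dat1` with `fig_dat1` should give the figure for the full data set; printing `p1` renders the plot.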

Part 2

For my second figure, I am going to create a line chart showing how customer sentiment changes over time. I will use the review date and sentiment category to examine trends in customer opinions.

fig_dat2 <- dat %>%
  select(reviews.date, sentiment)

fig_dat2

## # A tibble: 599 × 2
##    reviews.date        sentiment
##    <dttm>              <chr>    
##  1 2017-09-03 00:00:00 Neutral  
##  2 2017-06-06 00:00:00 Positive 
##  3 2018-04-20 00:00:00 Positive 
##  4 2017-11-02 17:33:31 Positive 
##  5 2018-04-24 00:00:00 Positive 
##  6 2016-12-14 00:00:00 Positive 
##  7 2017-12-20 17:38:23 Positive 
##  8 2017-07-14 00:00:00 Positive 
##  9 2018-05-23 00:00:00 Positive 
## 10 2018-01-12 00:00:00 Positive 
## # ℹ 589 more rows
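
A line chart needs one value per time point, so the reviews have to be aggregated first. The sketch below assumes monthly counts per sentiment category, which is only one possible choice (weekly counts or proportions would also work). `toy_dat2` is a hypothetical stand-in for `fig_dat2`, and `monthly_counts` and `p2` are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for fig_dat2; the real data comes from Book2.csv
toy_dat2 <- tibble(
  reviews.date = as.POSIXct(
    c("2017-06-06", "2017-06-20", "2017-09-03", "2018-04-20", "2018-04-24"),
    tz = "UTC"
  ),
  sentiment = c("Positive", "Positive", "Neutral", "Positive", "Negative")
)

# Collapse each timestamp to the first day of its month, then count
# reviews per month and sentiment category
monthly_counts <- toy_dat2 %>%
  mutate(month = as.Date(format(reviews.date, "%Y-%m-01"))) %>%
  count(month, sentiment)

# One line per sentiment category; points keep months visible even when
# a category has only a single observation
p2 <- ggplot(monthly_counts, aes(x = month, y = n, colour = sentiment)) +
  geom_line() +
  geom_point() +
  labs(x = "Review month", y = "Number of reviews", colour = "Sentiment")
```

Months with no reviews in a category simply have no row in `monthly_counts`, so the line connects the neighbouring months directly.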

Part 3

For my third figure, I am going to create a boxplot comparing review ratings across sentiment categories. This will show how ratings are distributed within the positive, neutral, and negative groups.

fig_dat3 <- dat %>%
  select(sentiment, reviews.rating)

fig_dat3

## # A tibble: 599 × 2
##    sentiment reviews.rating
##    <chr>              <dbl>
##  1 Neutral                3
##  2 Positive               5
##  3 Positive               4
##  4 Positive               5
##  5 Positive               5
##  6 Positive               5
##  7 Positive               5
##  8 Positive               4
##  9 Positive               5
## 10 Positive               5
## # ℹ 589 more rows
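
A possible version of the boxplot is sketched below, assuming ggplot2. `toy_dat3` is a hypothetical stand-in for `fig_dat3`, and `rating_summary` and `p3` are illustrative names. The summary table is included only as a numeric check on what the boxes should show.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for fig_dat3; the real data comes from Book2.csv
toy_dat3 <- tibble(
  reviews.rating = c(1, 2, 2, 3, 3, 4, 5, 5, 5)
) %>%
  mutate(
    # Same rating-to-sentiment rule used for dat
    sentiment = case_when(
      reviews.rating >= 4 ~ "Positive",
      reviews.rating == 3 ~ "Neutral",
      reviews.rating <= 2 ~ "Negative"
    ),
    sentiment = factor(sentiment, levels = c("Negative", "Neutral", "Positive"))
  )

# Range and median of the ratings inside each sentiment category
rating_summary <- toy_dat3 %>%
  group_by(sentiment) %>%
  summarise(
    min_rating = min(reviews.rating),
    median_rating = median(reviews.rating),
    max_rating = max(reviews.rating)
  )

# One box per sentiment category
p3 <- ggplot(toy_dat3, aes(x = sentiment, y = reviews.rating)) +
  geom_boxplot() +
  labs(x = "Sentiment", y = "Review rating")
```

Because sentiment is defined from the rating, the ranges of the three categories do not overlap, so the plot mainly shows the spread of ratings within each category.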