I decided to analyze customer reviews for Amazon products.
The dataset was downloaded from Kaggle.
The following packages were used for data manipulation, text mining, and visualization:
tidyverse: A collection of packages (including dplyr, tidyr, stringr, ggplot2, and readr) for cleaning, reshaping, and transforming data.
dplyr: Provides flexible data frame manipulation functions: filter(), select(), mutate(), arrange(), and summarize().
stringr: Enables string (text) manipulation: str_detect(), str_replace(), str_extract(), and str_split().
DT: Generates interactive tables, allowing for easy data exploration.
tidytext: Facilitates text mining in a tidy data format: unnest_tokens(), for tokenizing text into words or other units.
tidyr: Used for data reshaping, making it easier to handle wide or nested data: pivot_longer() and pivot_wider(), plus their older, superseded counterparts gather() and spread().
wordcloud: Creates word clouds, where word size reflects frequency or sentiment scores.
ggplot2: A powerful tool for data visualization, enabling a variety of plots and customization.
reshape2: acast() casts data from “long” to “wide” format, returning a matrix or array (its companion dcast() returns a data frame).
#install.packages("stringr")
#install.packages("wordcloud")
#install.packages("reshape2")
library(tidyverse)
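library(dplyr)     # also attached by library(tidyverse)
library(stringr)   # also attached by library(tidyverse)
library(DT)
library(tidytext)
library(tidyr)     # also attached by library(tidyverse)
library(ggplot2)   # also attached by library(tidyverse)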
library(wordcloud)
library(reshape2)
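The dplyr and stringr entries in the package list name several functions without showing them in action. A minimal sketch of what each one does, run on a few invented reviews (the `toy_reviews` tibble, its values, and the `positive`, `by_product`, and other object names are hypothetical, not taken from the Kaggle file):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not the Kaggle file; column names mimic the
# ones selected from amazon_reviews.csv later on.
toy_reviews <- tibble(
  product_name   = c("Echo Dot", "Kindle", "Fire TV", "Kindle"),
  reviews.rating = c(5, 4, 2, 5),
  review_text    = c("Great speaker, love it",
                     "Good screen but battery is weak",
                     "Stopped working after a week",
                     "Great for reading at night")
)

# dplyr: keep positive reviews, add a word count, keep three columns,
# then sort by word count (longest review first)
positive <- toy_reviews |>
  filter(reviews.rating >= 4) |>
  mutate(n_words = lengths(str_split(review_text, "\\s+"))) |>
  select(product_name, reviews.rating, n_words) |>
  arrange(desc(n_words))

# dplyr: one summary row per product
by_product <- toy_reviews |>
  group_by(product_name) |>
  summarize(mean_rating = mean(reviews.rating), n_reviews = n())

# stringr: detect, replace, extract, and split patterns in the text
has_great  <- str_detect(toy_reviews$review_text, "Great")
reworded   <- str_replace(toy_reviews$review_text, "Great", "Excellent")
first_word <- str_extract(toy_reviews$review_text, "^\\w+")
# Splitting on spaces keeps punctuation attached ("speaker,")
split_words <- str_split(toy_reviews$review_text[1], " ")[[1]]
```

Note that str_split() is a plain string splitter: it leaves case and punctuation untouched, which is why a dedicated tokenizer such as tidytext's unnest_tokens() is usually preferred for text mining.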
The dataset downloaded from Kaggle has been placed in the data folder. I kept only the columns likely to be useful for later analysis: product name, categories, the recommendation flag, rating, review text, and review title.
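# read_csv() comes from readr, which library(tidyverse) attaches
amazon_reviews <- read_csv("data/amazon_reviews.csv")
# Keep the useful columns, renaming name and reviews.text
amazon_reviews <- amazon_reviews |>
  select(product_name = name,
         categories,
         reviews.doRecommend,
         reviews.rating,
         review_text = reviews.text,
         reviews.title)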
datatable(amazon_reviews,
          options = list(pageLength = 3,
                         scrollX = TRUE))
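The tidytext, tidyr, and reshape2 entries in the package list describe tokenizing text and reshaping it between long and wide formats. As a minimal sketch, assuming reviews shaped like the review_text and reviews.rating columns selected above, those steps could look like this on three invented reviews (`toy_reviews`, the renamed `rating` column, and all counts are hypothetical, not results from the dataset):

```r
library(dplyr)
library(tidytext)
library(tidyr)
library(reshape2)

# Hypothetical toy reviews (not the Kaggle data). reviews.rating is
# renamed to a plain `rating` below to keep the cast formula simple.
toy_reviews <- tibble(
  reviews.rating = c(5, 1, 5),
  review_text    = c("Great sound, great price",
                     "Battery died fast",
                     "Great screen")
)

# tidytext: one row per word; lower-cases and strips punctuation by default
tokens <- toy_reviews |>
  rename(rating = reviews.rating) |>
  unnest_tokens(word, review_text)

# Long format: how often each word appears at each rating
word_counts <- tokens |>
  count(rating, word)

# tidyr: long -> wide tibble, one column per rating value
wide_counts <- word_counts |>
  pivot_wider(names_from = rating, values_from = n, values_fill = 0)

# reshape2: long -> wide matrix (rows = words, columns = ratings)
count_matrix <- acast(word_counts, word ~ rating, value.var = "n", fill = 0)
```

The two reshaping routes differ mainly in output type: pivot_wider() returns a tibble, while acast() returns a matrix with words as row names. A word-by-group matrix like `count_matrix` is the kind of input wordcloud::comparison.cloud() takes.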