I decided to analyze customer reviews for Amazon products.
The dataset was downloaded from Kaggle.
The following packages were used for data manipulation, text mining, and visualization:
tidyverse: For cleaning, reshaping, and transforming data.
dplyr: Provides flexible data frame manipulation functions: filter(), select(), mutate(), arrange(), and summarize().
stringr: Enables string (text) manipulation: str_detect(), str_replace(), str_extract(), and str_split().
DT: Generates interactive tables, allowing for easy data exploration.
tidytext: Facilitates text mining in a tidy data format: unnest_tokens(), for tokenizing text into words or other units.
tidyr: Used for data reshaping, making it easier to handle wide or nested data: gather(), spread(), pivot_longer(), and pivot_wider().
wordcloud: Creates word clouds, where word size reflects frequency or sentiment scores.
ggplot2: A powerful tool for data visualization, enabling a variety of plots and customization.
reshape2: acast() transforms data from “long” to “wide” format, returning a vector, matrix, or array (its sibling dcast() returns a data frame).
#install.packages("stringr")
#install.packages("wordcloud")
#install.packages("reshape2")
library(tidyverse)
library(dplyr)
library(stringr)
library(DT)
library(tidytext)
library(tidyr)
library(ggplot2)
library(wordcloud)
library(reshape2)
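As a quick illustration of how two of these functions behave, here is a minimal sketch on a made-up data frame. The object toy_reviews and its contents are hypothetical and are not taken from the Kaggle dataset; the sketch only assumes that dplyr, stringr, and tidytext are installed.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Hypothetical toy reviews, invented for illustration only
toy_reviews <- tibble(
  id = 1:2,
  review_text = c("Great tablet, great price", "Battery died quickly")
)

# One row per word; by default unnest_tokens() lowercases the text
# and strips punctuation
toy_words <- toy_reviews |>
  unnest_tokens(word, review_text)

# Count how often each word appears, most frequent first
toy_counts <- toy_words |>
  count(word, sort = TRUE)

# Flag reviews that mention "battery", ignoring case
toy_flagged <- toy_reviews |>
  mutate(mentions_battery = str_detect(review_text,
                                       regex("battery", ignore_case = TRUE)))
```

The same pattern (tokenize, then count or filter) should carry over to the real review text once it is loaded.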
The dataset downloaded from Kaggle has been placed in the
data folder. I have selected only the columns that may be
useful for future analysis.
amazon_reviews <- read_csv("data/amazon_reviews.csv")
amazon_reviews <- amazon_reviews |>
  select(product_name = name,
         categories,
         reviews.doRecommend,
         reviews.rating,
         review_text = reviews.text,
         reviews.title)
datatable(amazon_reviews,
          options = list(pageLength = 3,
                         scrollX = TRUE))
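The select() call above both keeps a subset of columns and renames some of them in one step (new_name = old_name). A minimal sketch of that behavior on a hypothetical stand-in for the raw file is shown below; toy_raw and its values are invented, and reviews.username is only an example of a column that gets dropped.

```r
library(dplyr)

# Hypothetical stand-in for the raw CSV, invented for illustration only
toy_raw <- tibble(
  name = c("Tablet A", "Speaker B"),
  reviews.rating = c(5, 3),
  reviews.text = c("Works well", "Too quiet"),
  reviews.username = c("user1", "user2")
)

# Keep three columns, renaming two of them while selecting;
# columns not listed (here reviews.username) are dropped
toy_selected <- toy_raw |>
  select(product_name = name,
         reviews.rating,
         review_text = reviews.text)

# Inspect the resulting column names
names(toy_selected)
```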