---
title: "Amazon Product Sales and Customer Review Analysis"
author: "Anjali Singh"
output: html_document
---
Amazon Product Sales and Customer Review Analysis
This project explores trends in Amazon product pricing, discounts, customer ratings, and product popularity using an Amazon e-commerce dataset. The visualizations aim to identify relationships between customer satisfaction, product pricing, and category-level performance. The project demonstrates the use of multiple visualization techniques in R to communicate insights clearly and effectively.
library(tidyverse)
library(plotly)
library(gganimate)
library(scales)
library(lubridate)
library(viridis)
amazon <- read.csv("amazon.csv")
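The charts in this report treat `rating`, `rating_count`, `actual_price`, and `discount_percentage` as numeric. Some public copies of Amazon product data store these columns as text with currency symbols, thousands separators, or percent signs. Whether `amazon.csv` does so is an assumption worth checking with `str(amazon)`. If it does, a cleaning step along these lines might come first. The `raw` data frame and its values are made up for illustration and are not taken from the dataset.

```r
library(dplyr)
library(readr)

# Hypothetical rows mimicking text-encoded columns; not taken from amazon.csv.
raw <- tibble(
  rating              = c("4.2", "3.9", "|"),
  rating_count        = c("24,269", "1,050", "87"),
  actual_price        = c("$1,099", "$349", "$12,000"),
  discount_percentage = c("64%", "43%", "10%")
)

# parse_number() drops the non-numeric characters around the first number.
# When no number is found (as for "|") it returns NA with a warning,
# which is suppressed here because NA is the intended result.
cleaned <- raw %>%
  mutate(across(
    c(rating, rating_count, actual_price, discount_percentage),
    ~ suppressWarnings(parse_number(.x))
  ))

str(cleaned)
```

Applied to the real data, the same `mutate(across(...))` call would replace `raw` with `amazon`. Rows that fail to parse become `NA`, which the later `na.rm = TRUE` arguments then skip.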
Bar Chart
Compare average customer ratings across product categories.
amazon %>%
  group_by(category) %>%
  summarise(avg_rating = mean(rating, na.rm = TRUE)) %>%
  # Keep the ten categories with the highest average rating
  slice_max(avg_rating, n = 10) %>%
  ggplot(aes(x = reorder(category, avg_rating), y = avg_rating)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top Product Categories by Average Rating",
    x = "Category",
    y = "Average Rating"
  )
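A category containing only one or two products can top an average-rating ranking by chance. One possible variant, sketched here on made-up data, also counts the products in each category and keeps only categories at or above a threshold. The `toy` data and the `min_products` cutoff of 2 are hypothetical values chosen for illustration.

```r
library(dplyr)

# Made-up example data; category names and ratings are hypothetical.
toy <- tibble(
  category = c("A", "A", "A", "B", "C", "C"),
  rating   = c(4.0, 4.4, 4.2, 5.0, 3.5, 3.9)
)

min_products <- 2  # hypothetical cutoff, not a recommendation

ranked <- toy %>%
  group_by(category) %>%
  summarise(
    avg_rating = mean(rating, na.rm = TRUE),
    n_products = n()
  ) %>%
  # Drop categories whose average rests on too few products
  filter(n_products >= min_products) %>%
  arrange(desc(avg_rating))

print(ranked)
```

In this toy example category "B" has the highest average but only one product, so it is filtered out before ranking.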
Histogram
Understand the spread of customer ratings.
amazon %>%
  ggplot(aes(x = rating)) +
  # Each bar covers a 0.1-wide rating interval
  geom_histogram(binwidth = 0.1, fill = "darkorange", color = "white") +
  labs(
    title = "Distribution of Product Ratings",
    x = "Rating",
    y = "Count"
  )
Box Plot
Compare discount patterns across categories.
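amazon %>%
  # Order categories by median discount, ignoring missing values,
  # so the ordering matches the center line of each box
  ggplot(aes(x = reorder(category, discount_percentage,
                         FUN = median, na.rm = TRUE),
             y = discount_percentage)) +
  geom_boxplot(fill = "purple") +
  coord_flip() +
  labs(
    title = "Discount Percentage by Category",
    x = "Category",
    y = "Discount Percentage"
  )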
Scatter Plot
Determine whether heavily discounted products receive higher ratings.
amazon %>%
  ggplot(aes(x = discount_percentage, y = rating)) +
  geom_point(alpha = 0.5, color = "darkgreen") +
  # Overlay a linear trend line with its confidence band
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Relationship Between Discount and Product Rating",
    x = "Discount Percentage",
    y = "Rating"
  )
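A fitted line shows the direction of a relationship but not its strength. One way to put numbers on it is a Pearson correlation together with the slope of the same linear model. The sketch below uses made-up values in a hypothetical `toy` data frame; it says nothing about what the real data shows.

```r
# Made-up values for illustration only; not drawn from amazon.csv.
toy <- data.frame(
  discount_percentage = c(10, 20, 30, 40, 50, 60),
  rating              = c(4.4, 4.1, 4.3, 4.0, 4.1, 3.9)
)

# Pearson correlation, skipping rows with missing values
r <- cor(toy$discount_percentage, toy$rating, use = "complete.obs")

# Same model that geom_smooth(method = "lm") fits: rating ~ discount
fit   <- lm(rating ~ discount_percentage, data = toy)
slope <- coef(fit)[["discount_percentage"]]

cat("Pearson r:", round(r, 2), " slope per percentage point:", signif(slope, 2), "\n")
```

With the real data, `toy` would be replaced by `amazon`. A correlation near zero would suggest that discount level tells little about rating, whatever the sign of the slope.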
Horizontal Bar Chart
Compare product prices across categories.
amazon %>%
  group_by(category) %>%
  summarise(avg_price = mean(actual_price, na.rm = TRUE)) %>%
  # Keep the ten categories with the highest average price
  slice_max(avg_price, n = 10) %>%
  ggplot(aes(x = reorder(category, avg_price), y = avg_price)) +
  geom_col(fill = "tomato") +
  coord_flip() +
  # Show prices with thousands separators
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Average Product Price by Category",
    x = "Category",
    y = "Average Price"
  )
Heatmap
Visualize combinations of ratings and discounts.
amazon %>%
  # Split each variable into five equal-width bins
  mutate(
    rating_group = cut(rating, breaks = 5),
    discount_group = cut(discount_percentage, breaks = 5)
  ) %>%
  # Count products in each rating/discount combination
  count(rating_group, discount_group) %>%
  ggplot(aes(rating_group, discount_group, fill = n)) +
  geom_tile() +
  scale_fill_viridis_c() +
  labs(
    title = "Heatmap of Ratings and Discounts",
    x = "Rating Group",
    y = "Discount Group"
  )
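With `breaks = 5`, `cut()` divides the observed range into five equal-width intervals, so the bin edges depend on the data and can produce awkward axis labels. Passing explicit break points is one alternative. The sketch below assumes ratings lie between 0 and 5, and the `ratings` vector is made up for illustration.

```r
# Made-up ratings; assumed to lie on a 0 to 5 scale.
ratings <- c(1.0, 2.5, 3.0, 3.9, 4.2, 5.0)

# Explicit edges give fixed, readable bins; include.lowest = TRUE
# closes the first interval so a value of exactly 0 is not dropped.
rating_group <- cut(
  ratings,
  breaks = c(0, 1, 2, 3, 4, 5),
  include.lowest = TRUE
)

table(rating_group)
```

Intervals are closed on the right by default, so a rating of exactly 3.0 falls in `(2,3]` rather than `(3,4]`.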
Lollipop Chart
Identify products with the largest number of reviews.
amazon %>%
  # Keep the ten rows with the highest rating counts
  arrange(desc(rating_count)) %>%
  slice_head(n = 10) %>%
  # Order products once so points and segments share the same axis order
  mutate(product_name = reorder(product_name, rating_count)) %>%
  ggplot(aes(x = product_name, y = rating_count)) +
  geom_segment(aes(xend = product_name, y = 0, yend = rating_count),
               color = "gray") +
  geom_point(size = 4, color = "blue") +
  coord_flip() +
  labs(
    title = "Top Reviewed Products",
    x = "Product",
    y = "Number of Reviews"
  )
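If the same product appears in more than one row, or if product names are very long, the chart can show overlapping points and crowded axis labels. Whether `amazon.csv` has either problem is an assumption to verify. A possible preparation step, sketched on made-up data, removes duplicate names and shortens the labels. The `toy` data and the 20-character width are hypothetical.

```r
library(dplyr)
library(stringr)

# Made-up rows; the second product is deliberately duplicated.
toy <- tibble(
  product_name = c(
    "A very long hypothetical product name for a cable",
    "Short name",
    "Short name"
  ),
  rating_count = c(500, 300, 300)
)

top <- toy %>%
  # Keep the first row for each product name
  distinct(product_name, .keep_all = TRUE) %>%
  # Take the most-rated products without returning extra tied rows
  slice_max(rating_count, n = 2, with_ties = FALSE) %>%
  # Shorten labels to at most 20 characters, ellipsis included
  mutate(label = str_trunc(product_name, width = 20))

print(top)
```

The `label` column could then be used on the axis in place of `product_name`.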
Interactive Scatter Plot
Create an interactive visualization showing price and ratings.
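p <- amazon %>%
  # text is not drawn by ggplot2; ggplotly() shows it in the hover tooltip
  ggplot(aes(x = actual_price,
             y = rating,
             text = product_name,
             color = category)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Interactive Price vs Rating Visualization",
    x = "Actual Price",
    y = "Rating"
  )
# Convert the static ggplot into an interactive plotly widget
plotly::ggplotly(p)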
This project demonstrates how data visualization can reveal patterns in Amazon e-commerce data. Using bar charts, a histogram, a box plot, scatter plots, a heatmap, a lollipop chart, and an interactive visualization, the analysis explores relationships among pricing, discounts, ratings, and customer engagement. The project highlights the importance of selecting appropriate visualizations to communicate data clearly and effectively.