Amazon E-Commerce Data Visualization Final Project

Project Title

Amazon Product Sales and Customer Review Analysis


Introduction

This project explores trends in Amazon product pricing, discounts, customer ratings, and product popularity using an Amazon e-commerce dataset. The visualizations aim to identify relationships between customer satisfaction, product pricing, and category-level performance. The project demonstrates the use of multiple visualization techniques in R to communicate insights clearly and effectively.


Suggested R Markdown Structure

---
title: "Amazon Product Sales and Customer Review Analysis"
author: "Anjali Singh"
output: html_document
---
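# Packages used by the plots below: tidyverse (dplyr, ggplot2), plotly for
# interactivity, scales for axis labels, viridis for color scales.
library(tidyverse)
library(plotly)
library(scales)
library(viridis)
# gganimate and lubridate are loaded but not used by the eight plots below.
library(gganimate)
library(lubridate)

# Read the dataset; amazon.csv is expected in the working directory.
amazon <- read.csv("amazon.csv")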
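The plots that follow call mean(), cut(), and reorder() on rating, actual_price, discount_percentage, and rating_count, so those columns need to be numeric. A minimal cleaning sketch follows, under the assumption (not confirmed by this document) that the CSV stores them as text with currency symbols, thousands separators, or percent signs, and that category is a pipe-delimited path. The toy rows and the helper to_number() are hypothetical and only illustrate the idea.

```r
library(dplyr)

# Toy rows standing in for amazon.csv; every value here is made up.
amazon_raw <- data.frame(
  category            = c("Electronics|Cables", "Home&Kitchen|Heaters", "Electronics|Headphones"),
  actual_price        = c("₹1,099", "₹349", "₹2,499"),
  discount_percentage = c("64%", "43%", "10%"),
  rating              = c("4.2", "4.0", "|"),
  rating_count        = c("24,269", "43,994", "1,208"),
  stringsAsFactors    = FALSE
)

# Hypothetical helper: keep digits and the decimal point, then convert.
# Strings with no digits become NA.
to_number <- function(x) as.numeric(gsub("[^0-9.]", "", x))

amazon <- amazon_raw %>%
  mutate(
    actual_price        = to_number(actual_price),
    discount_percentage = to_number(discount_percentage),
    rating              = to_number(rating),
    rating_count        = to_number(rating_count),
    # Keep only the top level of the category path (assumed "|" delimiter).
    category            = sub("\\|.*$", "", category)
  )

str(amazon)
```

If the real file already stores these columns as numbers, this step is unnecessary and can be skipped.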

Visualization 1 — Top Product Categories by Average Rating

Figure Type

Bar Chart

Goal

Compare average customer ratings across the ten highest-rated product categories.

# Average rating per category, keeping exactly the ten highest-rated categories.
amazon %>%
  group_by(category) %>%
  summarise(avg_rating = mean(rating, na.rm = TRUE)) %>%
  slice_max(avg_rating, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = reorder(category, avg_rating), y = avg_rating)) +
  geom_col(fill = "steelblue") +
  # Horizontal bars leave room for long category names.
  coord_flip() +
  labs(
    title = "Top Product Categories by Average Rating",
    x = "Category",
    y = "Average Rating"
  )
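An average computed from one or two products can put a tiny category at the top of the ranking. One possible refinement, sketched here on made-up toy data, is to require a minimum number of products per category before ranking. The threshold min_products is a hypothetical parameter, not something taken from the project.

```r
library(dplyr)

# Toy data; category labels and ratings are made up for illustration.
toy <- data.frame(
  category = c("A", "A", "A", "B", "C", "C"),
  rating   = c(4.0, 4.5, 3.5, 5.0, 3.0, 4.0)
)

min_products <- 2  # hypothetical threshold

category_summary <- toy %>%
  group_by(category) %>%
  summarise(
    avg_rating = mean(rating, na.rm = TRUE),
    n_products = n()
  ) %>%
  # Drop categories with too few products to give a stable average.
  filter(n_products >= min_products) %>%
  slice_max(avg_rating, n = 10, with_ties = FALSE)

category_summary
```

In this toy example category B has the highest average but only one product, so it is excluded from the ranking.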

Visualization 2 — Distribution of Product Ratings

Figure Type

Histogram

Goal

Understand the spread of customer ratings.

amazon %>%
  ggplot(aes(x = rating)) +
  # binwidth = 0.1 draws one bar per tenth of a rating point.
  geom_histogram(binwidth = 0.1, fill = "darkorange", color = "white") +
  labs(
    title = "Distribution of Product Ratings",
    x = "Rating",
    y = "Count"
  )

Visualization 3 — Discount Percentage by Category

Figure Type

Box Plot

Goal

Compare discount patterns across categories.

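# Sort categories by median discount; na.rm = TRUE keeps missing values
# from turning a category's ordering statistic into NA.
amazon %>%
  ggplot(aes(x = reorder(category, discount_percentage, FUN = median, na.rm = TRUE),
             y = discount_percentage)) +
  geom_boxplot(fill = "purple") +
  coord_flip() +
  labs(
    title = "Discount Percentage by Category",
    x = "Category",
    y = "Discount Percentage"
  )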

Visualization 4 — Relationship Between Rating and Discount

Figure Type

Scatter Plot

Goal

Determine whether heavily discounted products receive higher ratings.

amazon %>%
  ggplot(aes(x = discount_percentage, y = rating)) +
  # Semi-transparent points reduce overplotting.
  geom_point(alpha = 0.5, color = "darkgreen") +
  # Linear trend line; its slope shows the direction of the relationship.
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Relationship Between Discount and Product Rating",
    x = "Discount Percentage",
    y = "Rating"
  )
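The trend line gives a visual answer to the question in the goal. A numeric check could accompany it. The sketch below uses made-up toy values, not results from the Amazon data, and assumes both columns are numeric. It reports a Pearson correlation and the slope of the same linear model that geom_smooth(method = "lm") fits.

```r
# Toy data; the values are made up and do not come from the Amazon dataset.
toy <- data.frame(
  discount_percentage = c(10, 25, 40, 55, 70, 85),
  rating              = c(4.3, 4.1, 4.2, 3.9, 4.0, 3.8)
)

# Pearson correlation with a significance test.
ct <- cor.test(toy$discount_percentage, toy$rating)
ct$estimate  # correlation coefficient
ct$p.value   # p-value for the null hypothesis of zero correlation

# Same model as the trend line: rating as a linear function of discount.
fit <- lm(rating ~ discount_percentage, data = toy)
coef(fit)    # intercept and slope
```

A positive slope would support the idea that heavily discounted products receive higher ratings; a slope near zero or below zero would not.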

Visualization 5 — Average Actual Price by Category

Figure Type

Horizontal Bar Chart

Goal

Compare average actual prices across the ten most expensive product categories.

# Average actual price per category, keeping exactly the ten most expensive.
amazon %>%
  group_by(category) %>%
  summarise(avg_price = mean(actual_price, na.rm = TRUE)) %>%
  slice_max(avg_price, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = reorder(category, avg_price), y = avg_price)) +
  geom_col(fill = "tomato") +
  coord_flip() +
  # Format price labels with thousands separators.
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Average Product Price by Category",
    x = "Category",
    y = "Average Price"
  )

Visualization 6 — Heatmap of Ratings and Discounts

Figure Type

Heatmap

Goal

Visualize combinations of ratings and discounts.

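amazon %>%
  # Drop rows with a missing rating or discount so no NA tile appears.
  filter(!is.na(rating), !is.na(discount_percentage)) %>%
  mutate(
    # breaks = 5 splits each variable into five equal-width bins.
    rating_group = cut(rating, breaks = 5),
    discount_group = cut(discount_percentage, breaks = 5)
  ) %>%
  # Number of products in each rating/discount combination.
  count(rating_group, discount_group) %>%
  ggplot(aes(rating_group, discount_group, fill = n)) +
  geom_tile() +
  scale_fill_viridis_c() +
  labs(
    title = "Heatmap of Ratings and Discounts",
    x = "Rating Group",
    y = "Discount Group"
  )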
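With breaks = 5, cut() chooses the bin edges from the observed range, which tends to produce labels such as "(3.4,4.2]" that are hard to read on an axis. A possible alternative, sketched on made-up ratings, is to pass explicit edges and labels. The edges 1 to 5 assume a 1-to-5 rating scale, and the label strings are hypothetical.

```r
# Toy ratings; the values are made up for illustration.
ratings <- c(1.0, 2.5, 3.7, 4.2, 5.0)

rating_group <- cut(
  ratings,
  breaks = c(1, 2, 3, 4, 5),          # explicit bin edges
  labels = c("1-2", "2-3", "3-4", "4-5"),
  include.lowest = TRUE               # put a rating of exactly 1 in the first bin
)

table(rating_group)
```

The same pattern could be applied to discount_percentage with edges such as 0, 20, 40, 60, 80, 100.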

Visualization 7 — Top Reviewed Products

Figure Type

Lollipop Chart

Goal

Identify products with the largest number of reviews.

# Ten products with the highest rating_count, used here as the review count.
amazon %>%
  slice_max(rating_count, n = 10, with_ties = FALSE) %>%
  ggplot(aes(x = reorder(product_name, rating_count), y = rating_count)) +
  # Stem of each lollipop, drawn from zero up to the count.
  geom_segment(aes(xend = product_name, y = 0, yend = rating_count),
               color = "gray") +
  # Head of each lollipop.
  geom_point(size = 4, color = "blue") +
  coord_flip() +
  labs(
    title = "Top Reviewed Products",
    x = "Product",
    y = "Number of Reviews"
  )
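Product names in e-commerce listings are often long enough to crowd out the plot area. One option, sketched below with made-up names, is to shorten them with stringr::str_trunc() before plotting. The 30-character width is an arbitrary assumption.

```r
library(stringr)

# Made-up product names for illustration.
long_names <- c(
  "Braided USB Type-C to Lightning Fast Charging and Data Sync Cable, 1 Metre",
  "Desk Lamp"
)

# Names longer than 30 characters are cut and end in "..."; shorter ones are unchanged.
short_names <- str_trunc(long_names, width = 30)
short_names
```

In the pipeline above this would be a mutate(product_name = str_trunc(product_name, width = 30)) step before ggplot(). Two different products can end up with the same truncated name, in which case they would share one position on the axis, so it is worth checking for duplicates first.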

Visualization 8 — Interactive Plotly Visualization

Figure Type

Interactive Scatter Plot

Goal

Create an interactive visualization showing price and ratings.

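# Build a static ggplot first; the text aesthetic is not drawn by ggplot2
# but is picked up by plotly for the hover tooltip.
p <- amazon %>%
  ggplot(aes(x = actual_price,
             y = rating,
             text = product_name,
             color = category)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Interactive Price vs Rating Visualization",
    x = "Actual Price",
    y = "Rating"
  )

# Convert the ggplot object into an interactive plotly widget.
plotly::ggplotly(p)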
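By default ggplotly() shows every mapped aesthetic in the hover box. The tooltip argument can restrict and order what appears. The sketch below uses made-up rows and a hypothetical object name p_toy to show the pattern.

```r
library(ggplot2)
library(plotly)

# Toy data; names and values are made up for illustration.
toy <- data.frame(
  product_name = c("Cable A", "Heater B", "Headphones C"),
  actual_price = c(1099, 349, 2499),
  rating       = c(4.2, 4.0, 3.9)
)

p_toy <- ggplot(toy, aes(x = actual_price, y = rating, text = product_name)) +
  geom_point()

# Show only the product name, price, and rating on hover, in that order.
fig <- ggplotly(p_toy, tooltip = c("text", "x", "y"))
```

In an R Markdown chunk, a final line containing just fig would render the widget in the HTML output.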

Conclusion

This project demonstrates how data visualization can surface patterns in Amazon e-commerce data. Using bar charts, a histogram, a box plot, scatter plots, a heatmap, a lollipop chart, and an interactive Plotly figure, the analysis explores relationships among pricing, discounts, ratings, and customer engagement. It also shows why choosing an appropriate chart type matters for communicating data clearly.