This project explores an e-commerce dataset in detail, with a focus on the “Computers & Accessories” category. It covers data cleaning, preparation, analysis, and visualization.
Data Cleaning and Preparation: We begin by rectifying inconsistencies, handling missing values, and standardizing the dataset, particularly focusing on product pricing and customer ratings. This ensures a solid foundation for accurate analysis.
Data Analysis: The analysis phase delves into understanding pricing strategies, customer satisfaction, and product categorization. We explore the relationships between pricing, discounts, and customer ratings to uncover patterns that inform business strategy.
Data Visualization: Various visual tools like histograms, bar plots, and scatter plots are employed to illustrate key data trends and insights, making the analysis comprehensible and engaging.
The project aims to extract actionable insights from the dataset, demonstrating the value of a data-driven approach in understanding and optimizing e-commerce strategies, especially in product pricing and customer engagement.
This dataset comprises a selection of products from specific categories on Amazon India, available at https://www.amazon.in/. It encompasses details about the products, including their prices, discounts, and customer ratings.
Columns in the Dataset
This dataset consists of 16 columns, but I will primarily concentrate on the specific columns listed below for my analysis.
product_id: A unique identifier for each product. This can be useful for tracking and referring to specific products.
product_name: The name or title of the product.
category: Provides a hierarchical categorization of each product.
discounted_price: The price of the product after applying the discount.
actual_price: The original price of the product before any discounts.
discount_percentage: The percentage of discount applied to the product.
rating: The average customer rating for the product. This is crucial for understanding customer satisfaction and product quality.
rating_count: The number of ratings a product has received. This can indicate the popularity of, or customer engagement with, the product.
user_id: A list of unique identifiers for users who have reviewed the product.
user_name: Names of users who have reviewed the product.
img_link: A link to an image of the product.
#Read the data in the file
data1 <- read.csv('./Amazon_kaggle.csv')
#Displays the first few rows
head(data1, 3)
## product_id
## 1 B07JW9H4J1
## 2 B098NS6PVG
## 3 B096MSW6CT
## product_name
## 1 Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)
## 2 Ambrane Unbreakable 60W / 3A Fast Charging 1.5m Braided Type C Cable for Smartphones, Tablets, Laptops & other Type C devices, PD Technology, 480Mbps Data Sync, Quick Charge 3.0 (RCT15A, Black)
## 3 Sounce Fast Phone Charging Cable & Data Sync USB Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini & iOS Devices
## category
## 1 Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables
## 2 Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables
## 3 Computers&Accessories|Accessories&Peripherals|Cables&Accessories|Cables|USBCables
## discounted_price actual_price discount_percentage rating rating_count
## 1 ₹399 ₹1,099 64% 4.2 24,269
## 2 ₹199 ₹349 43% 4 43,994
## 3 ₹199 ₹1,899 90% 3.9 7,928
## about_product
## 1 High Compatibility : Compatible With iPhone 12, 11, X/XsMax/Xr ,iPhone 8/8 Plus,iPhone 7/7 Plus,iPhone 6s/6s Plus,iPhone 6/6 Plus,iPhone 5/5s/5c/se,iPad Pro,iPad Air 1/2,iPad mini 1/2/3,iPod nano7,iPod touch and more apple devices.|Fast Charge&Data Sync : It can charge and sync simultaneously at a rapid speed, Compatible with any charging adaptor, multi-port charging station or power bank.|Durability : Durable nylon braided design with premium aluminum housing and toughened nylon fiber wound tightly around the cord lending it superior durability and adding a bit to its flexibility.|High Security Level : It is designed to fully protect your device from damaging excessive current.Copper core thick+Multilayer shielding, Anti-interference, Protective circuit equipment.|WARRANTY: 12 months warranty and friendly customer services, ensures the long-time enjoyment of your purchase. If you meet any question or problem, please don't hesitate to contact us.
## 2 Compatible with all Type C enabled devices, be it an android smartphone (Mi, Samsung, Oppo, Vivo, Realme, OnePlus, etc), tablet, laptop (Macbook, Chromebook, etc)|Supports Quick Charging (2.0/3.0)|Unbreakable – Made of special braided outer with rugged interior bindings, it is ultra-durable cable that won’t be affected by daily rough usage|Ideal Length – It has ideal length of 1.5 meters which is neither too short like your typical 1meter cable or too long like a 2meters cable|Supports maximum 3A fast charging and 480 Mbps data transfer speed|6 months manufacturer warranty from the date of purchase
## 3 【 Fast Charger& Data Sync】-With built-in safety proctections and four-core copper wires promote maximum signal quality and strength and enhance charging & data transfer speed with up to 480 mb/s transferring speed.|【 Compatibility】-Compatible with iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini & iOS devices.|【 Sturdy & Durable】-The jacket and enforced connector made of TPE and premium copper, are resistant to repeatedly bending and coiling.|【 Ultra High Quality】: According to the experimental results, the fishbone design can accept at least 20,000 bending and insertion tests for extra protection and durability. Upgraded 3D aluminum connector and exclusive laser welding technology, which to ensure the metal part won't break and also have a tighter connection which fits well even with a protective case on and will never loose connection.|【 Good After Sales Service】-Our friendly and reliable customer service will respond to you within 24 hours ! you can purchase with confidence,and every sale includes a 365-day worry-free Service to prove the importance we set on quality.
## user_id
## 1 AG3D6O4STAQKAY2UVGEUV46KN35Q,AHMY5CWJMMK5BJRBBSNLYT3ONILA,AHCTC6ULH4XB6YHDY6PCH2R772LQ,AGYHHIERNXKA6P5T7CZLXKVPT7IQ,AG4OGOFWXJZTQ2HKYIOCOY3KXF2Q,AENGU523SXMOS7JPDTW52PNNVWGQ,AEQJHCVTNINBS4FKTBGQRQTGTE5Q,AFC3FFC5PKFF5PMA52S3VCHOZ5FQ
## 2 AECPFYFQVRUWC3KGNLJIOREFP5LQ,AGYYVPDD7YG7FYNBXNGXZJT525AQ,AHONIZU3ICIEHQIGQ6R2VFRSBXOQ,AFPHD2CRPDZMWMBL7WXRSVYWS5JA,AEZ346GX3HJ4O4XNRPHCNHXQURMQ,AEPSWFPNECKO34PUC7I56ITGXR6Q,AHWVEHR5DYLVFTO2KF3IZATFQSWQ,AH4QT33M55677I7ISQOAKEQWACYQ
## 3 AGU3BBQ2V2DDAMOAKGFAWDDQ6QHA,AESFLDV2PT363T2AQLWQOWZ4N3OA,AHTPQRIMGUD4BYR5YIHBH3CCGEFQ,AEUVWXYP5LT7PZLLZENEO2NODPBQ,AHC7MPW55DOO6WNCOQVA2VHOD26A,AFDI6FRPFBTNBG7BAEB7JDJSMKDQ,AFQKCEEEKXCOHTDG4WUN3XPPHJQQ,AHKUUFNMBZIDLSSPA4FEHIO2EC7Q
## user_name
## 1 Manav,Adarsh gupta,Sundeep,S.Sayeed Ahmed,jaspreet singh,Khaja moin,Anand,S.ARUMUGAM
## 2 ArdKn,Nirbhay kumar,Sagar Viswanathan,Asp,Placeholder,BharanI,sonia,Niam
## 3 Kunal,Himanshu,viswanath,sai niharka,saqib malik,Aashiq,Ramu Challa,Sanjay gupta
## review_id
## 1 R3HXWT0LRP0NMF,R2AJM3LFTLZHFO,R6AQJGUP6P86,R1KD19VHEDV0OR,R3C02RMYQMK6FC,R39GQRVBUZBWGY,R2K9EDOE15QIRJ,R3OI7YT648TL8I
## 2 RGIQEG07R9HS2,R1SMWZQ86XIN8U,R2J3Y1WL29GWDE,RYGGS0M09S3KY,R17KQRUTAN5DKS,R3AAQGS6HP2QUK,R1HDNOG6TO2CCA,R3PHKXYA5AFEOU
## 3 R3J3EQQ9TZI5ZJ,R3E7WBGK7ID0KV,RWU79XKQ6I1QF,R25X4TBMPY91LX,R27OK7G99VK0TR,R207CYDCHJJTCJ,R3PCU8XMU173BT,R1IMONDOWRNU5V
## review_title
## 1 Satisfied,Charging is really fast,Value for money,Product review,Good quality,Good product,Good Product,As of now seems good
## 2 A Good Braided Cable for Your Type C Device,Good quality product from ambrane,Super cable,As,Good quality,Good product,its good,Good quality for the price but one issue with my unit
## 3 Good speed for earlier versions,Good Product,Working good,Good for the price,Good,Worth for money,Working nice,it's a really nice product
## review_content
## 1 Looks durable Charging is fine tooNo complains,Charging is really fast, good product.,Till now satisfied with the quality.,This is a good product . The charging speed is slower than the original iPhone cable,Good quality, would recommend,https://m.media-amazon.com/images/W/WEBP_402378-T1/images/I/81---F1ZgHL._SY88.jpg,Product had worked well till date and was having no issue.Cable is also sturdy enough...Have asked for replacement and company is doing the same...,Value for money
## 2 I ordered this cable to connect my phone to Android Auto of car. The cable is really strong and the connection ports are really well made. I already has a Micro USB cable from Ambrane and it's still in good shape. I connected my phone to the car using the cable and it got connected well and no issues. I also connected it to the charging port and yes it has Fast Charging support.,It quality is good at this price and the main thing is that i didn't ever thought that this cable will be so long it's good one and charging power is too good and also supports fast charging,Value for money, with extra length👍,Good, working fine,Product quality is good,Good,very good,Bought for my daughter's old phone.Brand new cable it was not charging, I already repacked and requested for replacement.I checked again, and there was some green colour paste/fungus inside the micro USB connector. I cleaned with an alcoholic and starts working again.Checked the ampere of charging speed got around 1400ma-1500ma - not bad, came with braided 1.5m long cable, pretty impressive for the price.Can't blame the manufacturer.But quality issues by the distributor, they might have stored in very humid place.
## 3 Not quite durable and sturdy,https://m.media-amazon.com/images/W/WEBP_402378-T1/images/I/71rIggrbUCL._SY88.jpg,Working good,https://m.media-amazon.com/images/W/WEBP_402378-T1/images/I/61bKp9YO6wL._SY88.jpg,Product,Very nice product,Working well,It's a really nice product
## img_link
## 1 https://m.media-amazon.com/images/W/WEBP_402378-T1/images/I/51UsScvHQNL._SX300_SY300_QL70_FMwebp_.jpg
## 2 https://m.media-amazon.com/images/W/WEBP_402378-T2/images/I/31zOsqQOAOL._SY445_SX342_QL70_FMwebp_.jpg
## 3 https://m.media-amazon.com/images/W/WEBP_402378-T1/images/I/31IvNJZnmdL._SY445_SX342_QL70_FMwebp_.jpg
## product_link
## 1 https://www.amazon.in/Wayona-Braided-WN3LG1-Syncing-Charging/dp/B07JW9H4J1/ref=sr_1_1?qid=1672909124&s=electronics&sr=1-1
## 2 https://www.amazon.in/Ambrane-Unbreakable-Charging-Braided-Cable/dp/B098NS6PVG/ref=sr_1_2?qid=1672909124&s=electronics&sr=1-2
## 3 https://www.amazon.in/Sounce-iPhone-Charging-Compatible-Devices/dp/B096MSW6CT/ref=sr_1_3?qid=1672909124&s=electronics&sr=1-3
#Displays the column names
colnames(data1)
## [1] "product_id" "product_name" "category"
## [4] "discounted_price" "actual_price" "discount_percentage"
## [7] "rating" "rating_count" "about_product"
## [10] "user_id" "user_name" "review_id"
## [13] "review_title" "review_content" "img_link"
## [16] "product_link"
#Displays rows and columns
dimensions <- dim(data1)
dimensions
## [1] 1465 16
#Checking Data Types for each Column
sapply(data1, class)
## product_id product_name category discounted_price
## "character" "character" "character" "character"
## actual_price discount_percentage rating rating_count
## "character" "character" "character" "character"
## about_product user_id user_name review_id
## "character" "character" "character" "character"
## review_title review_content img_link product_link
## "character" "character" "character" "character"
All the columns are in the ‘character’ data type. Therefore, we need to convert the numerical columns to their appropriate data types and format them accordingly.
# Remove the thousands separator "," from the actual_price and discounted_price columns
data1$actual_price <- gsub(",", "", data1$actual_price)
data1$discounted_price <- gsub(",", "", data1$discounted_price)
# Strip the "₹" symbol and convert discounted_price and actual_price from character to numeric
data1$discounted_price <- as.numeric(sub("₹", "", data1$discounted_price))
data1$actual_price <- as.numeric(sub("₹", "", data1$actual_price))
# Strip "%" from discount_percentage and convert it to numeric
data1$discount_percentage <- as.numeric(gsub('%', '', data1$discount_percentage))
sapply(data1, class)
## product_id product_name category discounted_price
## "character" "character" "character" "numeric"
## actual_price discount_percentage rating rating_count
## "numeric" "numeric" "character" "character"
## about_product user_id user_name review_id
## "character" "character" "character" "character"
## review_title review_content img_link product_link
## "character" "character" "character" "character"
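The three conversions above follow the same strip-then-convert pattern. As a minimal sketch, that pattern could be wrapped in a small helper. The name parse_number_string is hypothetical and not part of the original analysis, and the helper assumes non-negative values, since it would also strip a minus sign.

```r
# Hypothetical helper (not part of the original analysis): remove everything
# except digits and the decimal point, then convert to numeric.
# Assumption: values are non-negative, because a leading "-" would be dropped.
parse_number_string <- function(x) {
  cleaned <- gsub("[^0-9.]", "", x)
  as.numeric(cleaned)
}

# Sample strings in the same formats as the price and discount columns
samples <- c("₹399", "₹1,099", "64%")
parsed <- parse_number_string(samples)
print(parsed)
```

A single helper keeps the cleaning rule in one place, at the cost of being less explicit about which symbol is removed from which column.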
# Divide by 100
data1$discount_percentage <- data1$discount_percentage / 100
# Display the 'discount_percentage' column
head(data1$discount_percentage)
## [1] 0.64 0.43 0.90 0.53 0.61 0.85
Let's inspect the rating column.
#Finds the count of all the different ratings in the "Rating" column
rating_counts <- table(data1$rating)
print(rating_counts)
##
## | 2 2.3 2.6 2.8 2.9 3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4 4.1 4.2 4.3
## 1 1 1 1 2 1 4 4 2 16 10 26 35 42 86 123 181 244 228 230
## 4.4 4.5 4.6 4.7 4.8 5
## 123 75 17 6 3 3
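The table above contains one entry, "|", that is not a number. Reading the frequency table works for a single stray value, but a pattern match is one way to flag such entries programmatically. The sketch below uses a made-up vector rather than the real column, and the regular expression assumes ratings are written as plain decimals such as "4" or "4.2".

```r
# Made-up ratings that mirror the kinds of values in the table above
ratings <- c("4.2", "4", "|", "3.9")

# TRUE where the value is not a plain number such as "4" or "4.2"
is_invalid <- !grepl("^[0-9]+(\\.[0-9]+)?$", ratings)

print(ratings[is_invalid])   # the offending values
print(which(is_invalid))     # their positions
```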
#Inspecting the strange row in the rating column having "|" as a rating
Strange_row <- data1[data1$rating == "|", ]
Strange_row
## product_id
## 1280 B08L12N5H1
## product_name
## 1280 Eureka Forbes car Vac 100 Watts Powerful Suction Vacuum Cleaner with Washable HEPA Filter, 3 Accessories,Compact,Light Weight & Easy to use (Black and Red)
## category
## 1280 Home&Kitchen|Kitchen&HomeAppliances|Vacuum,Cleaning&Ironing|Vacuums&FloorCare|Vacuums|HandheldVacuums
## discounted_price actual_price discount_percentage rating rating_count
## 1280 2099 2499 0.16 | 992
## about_product
## 1280 No Installation is provided for this product|100 Watts Powerful Motor|Powerful Suction|In-Built LED Torch|Range of accessories for different cleaning needs|Fit type: Universal Fit
## user_id
## 1280 AGTDSNT2FKVYEPDPXAA673AIS44A,AER2XFSWNN4LAUCJ55IY5SOMF7WA,AE3MSW6H3AL6F3ZGR5LCN5AHJO6A,AG5OL5WIIPJBY25HISJLM5K2UBTQ,AGHFSIBYVYXUGSNYUDAHBGOIZ3KQ,AHYH6AZT3U3U44CDW5Y563UYIIUA,AFLOAOURRZZZGFBF7F6IKGXRB6NQ,AGNWBYEVAIII4MPQNKN3LFVOHYZQ
## user_name
## 1280 Divya,Dr Nefario,Deekshith,Preeti,Prasanth R,Pradeep kashiram Tetgure.,Abhijin Janardhan,Prashant
## review_id
## 1280 R2KKTKM4M9RDVJ,R1O692MZOBTE79,R2WRSEWL56SOS4,R3VZRQJOKCBSH4,R2QI4626ASSCIT,R1TFFJ5ON6ATEO,R14JK9VQCXXEKU,R1V4J4B7RXHG8T
## review_title
## 1280 Decent product,doesn't pick up sand,Ok ok,Must Buy,Good one for basic use with normal suction power,Super,First review,Perfect product for my car
## review_content
## 1280 Does the job well,doesn't work on sand. though the suction is very good, the sand stays back no matter which lead you use.,Ok ok,Easy to use product (I used it for the first time without knowing the know-how of its use). Cleans almost 90% of your mess inside the car. Highly recommended.,Pros: look and feel, easy to hold and handle, Cord length, useful led light during night, normal suction power but good enough for general useCons: noise is bit loud, suction power could be better,Good,Hmm,Strudy design, quality accessories,reputated brand.I have seen some car vaccum cleaners especially the low cost one's where dust comes out of machine due to poor locking mechanism, so ofcourse you don't want the dust to be falling back when the sole purpose is to remove it!!!You get what you pay for, I have home vaccum clear from same company with has a good suction and does the job well, I wanted a battery pluggable clearner where I can use it anywhere.Believe me no vaccum cleaner is good enough for heavily soiled cars, so if you are expecting a commercial level cleaning the don't buy any.Car vaccum cleaner is a portable device which gives decent satisfaction but not a perfect, I have seen some people who do not take out mats and just shove the cleaner in it and say it's not good.Mats need to be take out which actually collects big dust and derbies, this poor small machine can't handle a 100% cleaning on that.If the car is heavily soiled the you must get it cleaned at a professional clearner then start using the small one.I found this to be doing the job when compared to my house vaccum clearner.One thing I observed was the wires was getting hot and has a soft component which makes it even thinner when it's hit but I think I can handle the job, accessories quality is good and fir and finishe is good.I did not seen any dust seeping through the compartment and it was well contained in the dust chamber.Suction is decent and you can give almost a clean and tidy look to your 
car with this machine, again a professional grade cleaners can doa good job which cost nothing less tha 15K.I would say worth a buy it used delligently by not expecting too much.I only hope if the company can provide a bag to carry it for the price or find one in your own which can fit your needs.,Perfect
## img_link
## 1280 https://m.media-amazon.com/images/W/WEBP_402378-T1/images/I/41lZEy8e9DL._SX300_SY300_QL70_FMwebp_.jpg
## product_link
## 1280 https://www.amazon.in/Eureka-Forbes-Vacuum-Cleaner-Washable/dp/B08L12N5H1/ref=sr_1_295?qid=1672923607&s=kitchen&sr=1-295
After searching for the product on the Amazon website, I found that its rating is 3.9, so I replace the invalid value with it.
# Replacing "|" with "3.9" and converting it to numeric
data1$rating <- as.numeric(gsub("\\|", "3.9", data1$rating))
Let’s remove the “,” separators and convert the rating_count column to numeric.
data1$rating_count <- as.numeric(sub(",", "", data1$rating_count))
## Warning: NAs introduced by coercion
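The coercion warning means that some strings could not be read as numbers. One possible cause, stated here as an assumption about the data rather than a confirmed finding, is that sub() replaces only the first match, while Indian digit grouping puts two commas in values of 1,00,000 and above. The sketch below uses made-up counts to show the difference between sub() and gsub().

```r
# Made-up counts; the last one uses Indian digit grouping with two commas
counts <- c("992", "24,269", "4,26,973")

first_only <- sub(",", "", counts)    # removes only the first comma
all_commas <- gsub(",", "", counts)   # removes every comma

print(first_only)
print(suppressWarnings(as.numeric(first_only)))  # third value becomes NA
print(as.numeric(all_commas))                    # all values convert
```

If the real column does contain such values, using gsub() in the conversion would avoid turning them into NA.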
Check for missing values
missing_values <- colSums(is.na(data1))
missing_values
## product_id product_name category discounted_price
## 0 0 0 0
## actual_price discount_percentage rating rating_count
## 0 0 0 49
## about_product user_id user_name review_id
## 0 0 0 0
## review_title review_content img_link product_link
## 0 0 0 0
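The output shows that rating_count contains 49 missing values (NA), at least some of which come from the failed coercion flagged in the warning; every other column is complete. Note that sub() removes only the first comma, so any count written with more than one comma would not convert. Now I want to create a new dataframe df1 that contains only the vital columns required for my analysis.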
# Created a new data frame 'df1' with selected columns
df1 <- subset(data1, select = c(product_id, product_name, category, discounted_price, actual_price, discount_percentage, rating, rating_count))
# Splitting the strings in the category column
category_split <- strsplit(data1$category, "|", fixed = TRUE)
# Category paths have different depths, so rbind() on the raw pieces would recycle
# values to pad the shorter rows. Keep only the first three levels of each path
# (NA where a level is missing).
category_split_df <- as.data.frame(do.call(rbind, lapply(category_split, function(x) x[1:3])))
# Naming the three hierarchical category columns
colnames(category_split_df) <- c('Category_1', 'Category_2', 'Category_3')
head(category_split_df)
## Category_1 Category_2 Category_3
## 1 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 2 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 3 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 4 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 5 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 6 Computers&Accessories Accessories&Peripherals Cables&Accessories
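Category paths can have different depths, and rbind() on vectors of unequal length recycles values to pad the shorter ones. A minimal sketch on made-up category strings shows one way to take a fixed number of levels safely. The names paths and levels_df are illustrative only.

```r
# Made-up category paths with different depths
paths <- c("A|B|C|D|E", "X|Y|Z", "P|Q")
parts <- strsplit(paths, "|", fixed = TRUE)

# Take the first three levels of each path; indexing past the end yields NA
first_three <- t(sapply(parts, function(x) x[1:3]))
levels_df <- as.data.frame(first_three, stringsAsFactors = FALSE)
colnames(levels_df) <- c("Category_1", "Category_2", "Category_3")

print(levels_df)
```

Padding with NA keeps a short path from silently borrowing values from its own beginning, which is what recycling would do.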
# Adding the categories to the new dataframe
df1$Category_1 <- category_split_df$Category_1
df1$Category_2 <- category_split_df$Category_2
df1$Category_3 <- category_split_df$Category_3
# Removing the 'category' column from 'df1'
df1$category <- NULL
#Display the new dataframe
head(df1)
## product_id
## 1 B07JW9H4J1
## 2 B098NS6PVG
## 3 B096MSW6CT
## 4 B08HDJ86NZ
## 5 B08CF3B7N1
## 6 B08Y1TFSP6
## product_name
## 1 Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)
## 2 Ambrane Unbreakable 60W / 3A Fast Charging 1.5m Braided Type C Cable for Smartphones, Tablets, Laptops & other Type C devices, PD Technology, 480Mbps Data Sync, Quick Charge 3.0 (RCT15A, Black)
## 3 Sounce Fast Phone Charging Cable & Data Sync USB Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini & iOS Devices
## 4 boAt Deuce USB 300 2 in 1 Type-C & Micro USB Stress Resistant, Tangle-Free, Sturdy Cable with 3A Fast Charging & 480mbps Data Transmission, 10000+ Bends Lifespan and Extended 1.5m Length(Martian Red)
## 5 Portronics Konnect L 1.2M Fast Charging 3A 8 Pin USB Cable with Charge & Sync Function for iPhone, iPad (Grey)
## 6 pTron Solero TB301 3A Type-C Data and Fast Charging Cable, Made in India, 480Mbps Data Sync, Strong and Durable 1.5-Meter Nylon Braided USB Cable for Type-C Devices for Charging Adapter (Black)
## discounted_price actual_price discount_percentage rating rating_count
## 1 399 1099 0.64 4.2 24269
## 2 199 349 0.43 4.0 43994
## 3 199 1899 0.90 3.9 7928
## 4 329 699 0.53 4.2 94363
## 5 154 399 0.61 4.2 16905
## 6 149 1000 0.85 3.9 24871
## Category_1 Category_2 Category_3
## 1 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 2 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 3 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 4 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 5 Computers&Accessories Accessories&Peripherals Cables&Accessories
## 6 Computers&Accessories Accessories&Peripherals Cables&Accessories
Now, let's clean up the strings in all the category columns.
# Checking the unique values in "Category 1"
category_1_counts <- table(df1$Category_1)
category_1_counts
##
## Car&Motorbike Computers&Accessories Electronics
## 1 453 526
## Health&PersonalCare Home&Kitchen HomeImprovement
## 1 448 2
## MusicalInstruments OfficeProducts Toys&Games
## 2 31 1
# install.packages("stringr")
library(stringr)
# Fixing Strings in the Category_1 Column
df1$Category_1 <- str_replace_all(df1$Category_1, c('&' = ' & ',
'OfficeProducts' = 'Office Products',
'MusicalInstruments' = 'Musical Instruments',
'HomeImprovement' = 'Home Improvement'))
# Checking the unique values in "Category 2"
category_2_counts <- table(df1$Category_2)
category_2_counts
##
## Accessories Accessories&Peripherals
## 14 381
## Arts&Crafts Cameras&Photography
## 1 16
## CarAccessories Components
## 1 5
## CraftMaterials Electrical
## 7 2
## ExternalDevices&DataStorage GeneralPurposeBatteries&BatteryChargers
## 18 14
## Headphones,Earbuds&Accessories Heating,Cooling&AirQuality
## 66 116
## HomeAudio HomeMedicalSupplies&Equipment
## 16 1
## HomeStorage&Organization HomeTheater,TV&Video
## 16 162
## Kitchen&Dining Kitchen&HomeAppliances
## 1 308
## Laptops Microphones
## 1 2
## Mobiles&Accessories Monitors
## 161 2
## NetworkingDevices OfficeElectronics
## 34 4
## OfficePaperProducts PowerAccessories
## 27 1
## Printers,Inks&Accessories Tablets
## 11 1
## WearableTechnology
## 76
#Fixing Strings in Category_2 column
df1$Category_2 <- gsub('&', ' & ', df1$Category_2)
df1$Category_2 <- gsub(',', ', ', df1$Category_2)
df1$Category_2 <- gsub('HomeAppliances', 'Home Appliances', df1$Category_2)
df1$Category_2 <- gsub('AirQuality', 'Air Quality', df1$Category_2)
df1$Category_2 <- gsub('WearableTechnology', 'Wearable Technology', df1$Category_2)
df1$Category_2 <- gsub('NetworkingDevices', 'Networking Devices', df1$Category_2)
df1$Category_2 <- gsub('OfficePaperProducts', 'Office Paper Products', df1$Category_2)
df1$Category_2 <- gsub('ExternalDevices', 'External Devices', df1$Category_2)
df1$Category_2 <- gsub('DataStorage', 'Data Storage', df1$Category_2)
df1$Category_2 <- gsub('HomeStorage', 'Home Storage', df1$Category_2)
df1$Category_2 <- gsub('HomeAudio', 'Home Audio', df1$Category_2)
df1$Category_2 <- gsub('GeneralPurposeBatteries', 'General Purpose Batteries', df1$Category_2)
df1$Category_2 <- gsub('BatteryChargers', 'Battery Chargers', df1$Category_2)
df1$Category_2 <- gsub('CraftMaterials', 'Craft Materials', df1$Category_2)
df1$Category_2 <- gsub('OfficeElectronics', 'Office Electronics', df1$Category_2)
df1$Category_2 <- gsub('PowerAccessories', 'Power Accessories', df1$Category_2)
df1$Category_2 <- gsub('CarAccessories', 'Car Accessories', df1$Category_2)
df1$Category_2 <- gsub('HomeMedicalSupplies', 'Home Medical Supplies', df1$Category_2)
df1$Category_2 <- gsub('HomeTheater', 'Home Theater', df1$Category_2)
# Checking the unique values in "Category 3"
category_3_counts <- table(df1$Category_3)
#Fixing Strings in Category_3 column
df1$Category_3 <- gsub('&', ' & ', df1$Category_3)
df1$Category_3 <- gsub(',', ', ', df1$Category_3)
df1$Category_3 <- gsub("([a-z])([A-Z])", "\\1 \\2", df1$Category_3)
df1$Category_3 <- gsub('PCGaming Peripherals', 'PC Gaming Peripherals', df1$Category_3)
df1$Category_3 <- gsub('USBHubs', 'USB Hubs', df1$Category_3)
df1$Category_3 <- gsub('USBGadgets', 'USB Gadgets', df1$Category_3)
head(df1)
## product_id
## 1 B07JW9H4J1
## 2 B098NS6PVG
## 3 B096MSW6CT
## 4 B08HDJ86NZ
## 5 B08CF3B7N1
## 6 B08Y1TFSP6
## product_name
## 1 Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)
## 2 Ambrane Unbreakable 60W / 3A Fast Charging 1.5m Braided Type C Cable for Smartphones, Tablets, Laptops & other Type C devices, PD Technology, 480Mbps Data Sync, Quick Charge 3.0 (RCT15A, Black)
## 3 Sounce Fast Phone Charging Cable & Data Sync USB Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini & iOS Devices
## 4 boAt Deuce USB 300 2 in 1 Type-C & Micro USB Stress Resistant, Tangle-Free, Sturdy Cable with 3A Fast Charging & 480mbps Data Transmission, 10000+ Bends Lifespan and Extended 1.5m Length(Martian Red)
## 5 Portronics Konnect L 1.2M Fast Charging 3A 8 Pin USB Cable with Charge & Sync Function for iPhone, iPad (Grey)
## 6 pTron Solero TB301 3A Type-C Data and Fast Charging Cable, Made in India, 480Mbps Data Sync, Strong and Durable 1.5-Meter Nylon Braided USB Cable for Type-C Devices for Charging Adapter (Black)
## discounted_price actual_price discount_percentage rating rating_count
## 1 399 1099 0.64 4.2 24269
## 2 199 349 0.43 4.0 43994
## 3 199 1899 0.90 3.9 7928
## 4 329 699 0.53 4.2 94363
## 5 154 399 0.61 4.2 16905
## 6 149 1000 0.85 3.9 24871
## Category_1 Category_2 Category_3
## 1 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 2 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 3 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 4 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 5 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 6 Computers & Accessories Accessories & Peripherals Cables & Accessories
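The camel-case pattern applied to Category_3 could, in principle, replace most of the one-by-one substitutions used for Category_2. The sketch below applies the same three steps to a few values copied from the Category_2 table. Names that join an acronym to a word, such as USBHubs, would still need manual handling, as in the code above.

```r
# Sample values copied from the Category_2 frequency table
raw <- c("Accessories&Peripherals",
         "Heating,Cooling&AirQuality",
         "HomeTheater,TV&Video")

spaced <- gsub("&", " & ", raw, fixed = TRUE)    # pad ampersands
spaced <- gsub(",", ", ", spaced, fixed = TRUE)  # add a space after commas
# Insert a space wherever a lowercase letter is followed by an uppercase one
spaced <- gsub("([a-z])([A-Z])", "\\1 \\2", spaced)

print(spaced)
```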
Let's remove any leading or trailing whitespace from the product_id column.
df1$product_id <- trimws(df1$product_id)
head(df1)
## product_id
## 1 B07JW9H4J1
## 2 B098NS6PVG
## 3 B096MSW6CT
## 4 B08HDJ86NZ
## 5 B08CF3B7N1
## 6 B08Y1TFSP6
## product_name
## 1 Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)
## 2 Ambrane Unbreakable 60W / 3A Fast Charging 1.5m Braided Type C Cable for Smartphones, Tablets, Laptops & other Type C devices, PD Technology, 480Mbps Data Sync, Quick Charge 3.0 (RCT15A, Black)
## 3 Sounce Fast Phone Charging Cable & Data Sync USB Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini & iOS Devices
## 4 boAt Deuce USB 300 2 in 1 Type-C & Micro USB Stress Resistant, Tangle-Free, Sturdy Cable with 3A Fast Charging & 480mbps Data Transmission, 10000+ Bends Lifespan and Extended 1.5m Length(Martian Red)
## 5 Portronics Konnect L 1.2M Fast Charging 3A 8 Pin USB Cable with Charge & Sync Function for iPhone, iPad (Grey)
## 6 pTron Solero TB301 3A Type-C Data and Fast Charging Cable, Made in India, 480Mbps Data Sync, Strong and Durable 1.5-Meter Nylon Braided USB Cable for Type-C Devices for Charging Adapter (Black)
## discounted_price actual_price discount_percentage rating rating_count
## 1 399 1099 0.64 4.2 24269
## 2 199 349 0.43 4.0 43994
## 3 199 1899 0.90 3.9 7928
## 4 329 699 0.53 4.2 94363
## 5 154 399 0.61 4.2 16905
## 6 149 1000 0.85 3.9 24871
## Category_1 Category_2 Category_3
## 1 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 2 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 3 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 4 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 5 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 6 Computers & Accessories Accessories & Peripherals Cables & Accessories
The ratings can be grouped into bands. Let's create a new column called Rating_Score that assigns each rating to a band.
# Adding Categories to the "Rating" Column
df1$Rating_Score <- ifelse(df1$rating < 2.0, 'Poor',
ifelse(df1$rating < 3.0, 'Below Average',
ifelse(df1$rating < 4.0, 'Average',
ifelse(df1$rating < 5.0, 'Above Average',
ifelse(df1$rating == 5.0, 'Excellent', NA)))))
head(df1)
## product_id
## 1 B07JW9H4J1
## 2 B098NS6PVG
## 3 B096MSW6CT
## 4 B08HDJ86NZ
## 5 B08CF3B7N1
## 6 B08Y1TFSP6
## product_name
## 1 Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)
## 2 Ambrane Unbreakable 60W / 3A Fast Charging 1.5m Braided Type C Cable for Smartphones, Tablets, Laptops & other Type C devices, PD Technology, 480Mbps Data Sync, Quick Charge 3.0 (RCT15A, Black)
## 3 Sounce Fast Phone Charging Cable & Data Sync USB Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini & iOS Devices
## 4 boAt Deuce USB 300 2 in 1 Type-C & Micro USB Stress Resistant, Tangle-Free, Sturdy Cable with 3A Fast Charging & 480mbps Data Transmission, 10000+ Bends Lifespan and Extended 1.5m Length(Martian Red)
## 5 Portronics Konnect L 1.2M Fast Charging 3A 8 Pin USB Cable with Charge & Sync Function for iPhone, iPad (Grey)
## 6 pTron Solero TB301 3A Type-C Data and Fast Charging Cable, Made in India, 480Mbps Data Sync, Strong and Durable 1.5-Meter Nylon Braided USB Cable for Type-C Devices for Charging Adapter (Black)
## discounted_price actual_price discount_percentage rating rating_count
## 1 399 1099 0.64 4.2 24269
## 2 199 349 0.43 4.0 43994
## 3 199 1899 0.90 3.9 7928
## 4 329 699 0.53 4.2 94363
## 5 154 399 0.61 4.2 16905
## 6 149 1000 0.85 3.9 24871
## Category_1 Category_2 Category_3
## 1 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 2 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 3 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 4 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 5 Computers & Accessories Accessories & Peripherals Cables & Accessories
## 6 Computers & Accessories Accessories & Peripherals Cables & Accessories
## Rating_Score
## 1 Above Average
## 2 Above Average
## 3 Average
## 4 Above Average
## 5 Above Average
## 6 Average
# Checking the data type of "Rating Score" column
column_type <- class(df1$Rating_Score)
column_type
## [1] "character"
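Because Rating_Score is stored as character, its bands have no inherent order, so tables and plots would sort them alphabetically. As an alternative sketch, not the method used above, cut() can produce the same bands as an ordered factor. The sample ratings are made up, and this version assumes no rating exceeds 5, since anything above 5 would also land in "Excellent".

```r
# Made-up sample ratings, one per band
ratings <- c(1.5, 2.9, 3.9, 4.2, 5.0)
band_labels <- c("Poor", "Below Average", "Average", "Above Average", "Excellent")

# right = FALSE gives intervals of the form [lower, upper), so the bands are
# [-Inf, 2), [2, 3), [3, 4), [4, 5) and [5, Inf); a rating of exactly 5 is "Excellent"
score <- cut(ratings,
             breaks = c(-Inf, 2, 3, 4, 5, Inf),
             labels = band_labels,
             right = FALSE,
             ordered_result = TRUE)

print(score)
```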
I want to create a new column that holds the difference between the actual and discounted prices. Let's call it Price_difference.
df1$Price_difference <- df1$actual_price - df1$discounted_price
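As a quick sanity check, the arithmetic can be reproduced on the six rows shown in the preview above. The sketch below rebuilds those rows by hand and verifies that the price difference, expressed as a share of the actual price, agrees with the stored discount_percentage. The column name implied_discount is a hypothetical helper, not part of the dataset.

```r
# First six rows, copied from the preview above
prices <- data.frame(
  discounted_price    = c(399, 199, 199, 329, 154, 149),
  actual_price        = c(1099, 349, 1899, 699, 399, 1000),
  discount_percentage = c(0.64, 0.43, 0.90, 0.53, 0.61, 0.85)
)

# Absolute saving per product
prices$Price_difference <- prices$actual_price - prices$discounted_price

# Saving as a share of the actual price, rounded to two decimals
# (implied_discount is a hypothetical helper column)
prices$implied_discount <- round(prices$Price_difference / prices$actual_price, 2)

prices

# TRUE when the implied share agrees with the stored discount_percentage
consistent <- isTRUE(all.equal(prices$implied_discount, prices$discount_percentage))
consistent
```

For the first row, for instance, 1099 - 399 = 700 and 700 / 1099 is about 0.64, which matches the stored discount.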
Finalizing the df1 dataframe
# Reorder columns by specifying the desired order
df1 <- df1[, c("product_id", "Category_1", "Category_2", "Category_3", "product_name", "discounted_price", "actual_price", "Price_difference", "discount_percentage", "rating", "Rating_Score", "rating_count")]
# Renaming all the columns
colnames(df1) <- c("Product_Id", "Category_1", "Category_2", "Category_3", "Product_Name",
"Discounted_Price", "Actual_Price", "Price_difference", "Discount_Percentage", "Rating",
"Rating_Score", "Rating_Count")
# Display the re-ordered columns
head(df1)
## Product_Id Category_1 Category_2
## 1 B07JW9H4J1 Computers & Accessories Accessories & Peripherals
## 2 B098NS6PVG Computers & Accessories Accessories & Peripherals
## 3 B096MSW6CT Computers & Accessories Accessories & Peripherals
## 4 B08HDJ86NZ Computers & Accessories Accessories & Peripherals
## 5 B08CF3B7N1 Computers & Accessories Accessories & Peripherals
## 6 B08Y1TFSP6 Computers & Accessories Accessories & Peripherals
## Category_3
## 1 Cables & Accessories
## 2 Cables & Accessories
## 3 Cables & Accessories
## 4 Cables & Accessories
## 5 Cables & Accessories
## 6 Cables & Accessories
## Product_Name
## 1 Wayona Nylon Braided USB to Lightning Fast Charging and Data Sync Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini (3 FT Pack of 1, Grey)
## 2 Ambrane Unbreakable 60W / 3A Fast Charging 1.5m Braided Type C Cable for Smartphones, Tablets, Laptops & other Type C devices, PD Technology, 480Mbps Data Sync, Quick Charge 3.0 (RCT15A, Black)
## 3 Sounce Fast Phone Charging Cable & Data Sync USB Cable Compatible for iPhone 13, 12,11, X, 8, 7, 6, 5, iPad Air, Pro, Mini & iOS Devices
## 4 boAt Deuce USB 300 2 in 1 Type-C & Micro USB Stress Resistant, Tangle-Free, Sturdy Cable with 3A Fast Charging & 480mbps Data Transmission, 10000+ Bends Lifespan and Extended 1.5m Length(Martian Red)
## 5 Portronics Konnect L 1.2M Fast Charging 3A 8 Pin USB Cable with Charge & Sync Function for iPhone, iPad (Grey)
## 6 pTron Solero TB301 3A Type-C Data and Fast Charging Cable, Made in India, 480Mbps Data Sync, Strong and Durable 1.5-Meter Nylon Braided USB Cable for Type-C Devices for Charging Adapter (Black)
## Discounted_Price Actual_Price Price_difference Discount_Percentage Rating
## 1 399 1099 700 0.64 4.2
## 2 199 349 150 0.43 4.0
## 3 199 1899 1700 0.90 3.9
## 4 329 699 370 0.53 4.2
## 5 154 399 245 0.61 4.2
## 6 149 1000 851 0.85 3.9
## Rating_Score Rating_Count
## 1 Above Average 24269
## 2 Above Average 43994
## 3 Average 7928
## 4 Above Average 94363
## 5 Above Average 16905
## 6 Average 24871
Category_1, Category_2, Category_3, Price_difference, and Rating_Score are the new variables created from existing columns.
1. Bar plot representing the product distribution across Category_1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Get top 5 main categories
most_main_items <- df1 %>%
count(Category_1) %>%
top_n(5, n) %>%
arrange(desc(n))
# Get top 10 sub categories
most_sub_items <- df1 %>%
count(Category_2) %>%
top_n(10, n) %>%
arrange(desc(n))
# Color palette
color_palette <- c("orange", "pink", "yellow", "skyblue", "brown", "green")
# Plotting
p1 <- ggplot(most_main_items, aes(x = reorder(Category_1, n), y = n, fill = Category_1)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_fill_manual(values = color_palette) +
labs(title = "Most Products by Main Category", x = "Product Main Category", y = "Count") # labels follow the aesthetics, not the flipped axes
p1
2. Bar plot representing the product distribution across Category_2
# Get top 5 main categories
most_main_items <- df1 %>%
count(Category_1) %>%
top_n(5, n) %>%
arrange(desc(n))
# Get top 10 sub categories
most_sub_items <- df1 %>%
count(Category_2) %>%
top_n(10, n) %>%
arrange(desc(n))
# Color palette
color_palette <- c("blue", "green", "red", "purple", "orange", "pink", "yellow", "skyblue", "brown", "grey")
# Plotting
p2 <- ggplot(most_sub_items, aes(x = reorder(Category_2, n), y = n, fill = Category_2)) +
geom_bar(stat = "identity") +
coord_flip() +
scale_fill_manual(values = color_palette) +
labs(title = "Most Products by Sub-Category", x = "Product Sub-Category", y = "Count") # labels follow the aesthetics, not the flipped axes
p2
It is evident that the “Electronics” category contains the highest number of products, including Accessories & Peripherals and Kitchen & Home Appliances subcategories.
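The counting step used for both bar plots can also be written with slice_max(), which dplyr documents as the replacement for the superseded top_n(). The sketch below runs on a small made-up data frame; the category values are illustrative only and do not come from the dataset.

```r
library(dplyr)

# Toy data standing in for df1; category values are illustrative only
toy <- data.frame(
  Category_2 = c("Cables", "Cables", "Cables", "Mice", "Mice", "Keyboards"),
  stringsAsFactors = FALSE
)

top_sub <- toy %>%
  count(Category_2, name = "n") %>%       # one row per sub-category with its product count
  slice_max(n, n = 2, with_ties = FALSE)  # keep the 2 largest counts, sorted descending

top_sub
```

Setting with_ties = FALSE guarantees exactly the requested number of rows; the default, like top_n(), keeps every row tied for the last place.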
1. Distribution of Ratings
ggplot(df1, aes(x = Rating)) +
geom_histogram(aes(y = after_stat(density)), # Histogram with density on the y-axis (replaces the deprecated ..density..)
binwidth = 0.3, # Adjust binwidth as needed
fill = "yellow",
color = "black") +
geom_density(col = "blue", lwd = 2) + # Density curve overlaid on the histogram
theme_minimal() +
labs(title = "Histogram with Density Plot of Ratings", x = "Ratings", y = "Density")
The histogram shows the distribution of product ratings. The distribution is roughly bell-shaped, with most of its mass toward the upper end of the scale, so the majority of products have ratings in the higher range.
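The visual impression of skew can be backed by a few summary statistics. The sketch below uses made-up ratings, not the dataset, and a hypothetical helper called skewness that computes the moment-based sample skewness in base R. When most ratings are high and a tail stretches toward low values, the mean falls below the median and the skewness is negative.

```r
# Toy ratings standing in for df1$Rating (illustrative values only)
ratings <- c(3.3, 3.8, 3.9, 3.9, 4.0, 4.1, 4.2, 4.2, 4.3, 4.4)

# Hypothetical helper: third central moment divided by the
# second central moment raised to the power 1.5
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

c(mean = mean(ratings), median = median(ratings), skewness = skewness(ratings))
```

On the real data the same call would be skewness(df1$Rating); a value near zero would support the "fairly normal" reading, while a clearly negative value would indicate a tail of low-rated products.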
2. Popular Products based on Rating Count
Let’s analyze the popularity of products based on the number of ratings received. We’ll identify and visualize the top products by rating count, which shows which products customers engage with most.
# Sorting the dataframe by Rating_Count
df_sorted_by_rating_count <- df1 %>%
arrange(desc(Rating_Count))
# Selecting the top products
top_10_products <- head(df_sorted_by_rating_count, 10)
# Define your custom labels here
custom_labels <- c("Prod A", "Prod B", "Prod C", "Prod D", "Prod E", "Prod F", "Prod G", "Prod H", "Prod I", "Prod J")
# Plotting the top 10 products based on Rating Count with custom labels
ggplot(top_10_products, aes(x = reorder(Product_Name, Rating_Count), y = Rating_Count)) +
geom_bar(stat = "identity", fill = "pink") +
coord_flip() + # Flip the coordinates so the product labels sit on the vertical axis
scale_x_discrete(labels = custom_labels) +
labs(title = 'Top 10 Products by Rating Count',
x = 'Product Name',
y = 'Rating Count') +
theme_minimal() +
ylim(c(0, 100000))
## Warning: Removed 4 rows containing missing values (`geom_bar()`).
where:
Prod A = boAt Deuce USB 300 2 in 1 Type-C & Micro USB Stress Resistant, Sturdy Cable with 3A Fast Charging & 480mbps Data Transmission, 10000+ Bends Lifespan and Extended 1.5m Length(Mercurial Black)
Prod B = boAt Rugged v3 Extra Tough Unbreakable Braided Micro USB Cable 1.5 Meter (Black)
Prod C = boAt Deuce USB 300 2 in 1 Type-C & Micro USB Stress Resistant, Tangle-Free, Sturdy Cable with 3A Fast Charging & 480mbps Data Transmission, 10000+ Bends Lifespan and Extended 1.5m Length(Martian Red)
Prod D = TP-Link USB Bluetooth Adapter for PC, 5.0 Bluetooth Dongle Receiver (UB500) Supports Windows 11/10/8.1/7 for Desktop, Laptop, Mouse, Keyboard, Printers, Headsets, Speakers, PS4/ Xbox Controllers
Prod E = boAt Rockerz 400 Bluetooth On Ear Headphones With Mic With Upto 8 Hours Playback & Soft Padded Ear Cushions(Grey/Green)
Prod F = Sennheiser CX 80S in-Ear Wired Headphones with in-line One-Button Smart Remote with Microphone Black
The bar chart above displays the top products by rating count. This visualization helps in identifying which products are most popular among customers in terms of engagement, as reflected by the number of ratings received.
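If a product name appears on more than one row, geom_bar(stat = "identity") stacks those rows into a single bar, so it can help to collapse duplicates before ranking. The sketch below shows one possible way to do that on a made-up data frame; the product names and counts are illustrative, and taking the maximum count per name is an assumption about how duplicates should be resolved.

```r
library(dplyr)

# Toy data standing in for df1; names and counts are illustrative only
toy <- data.frame(
  Product_Name = c("Cable A", "Cable A", "Adapter B", "Headphones C"),
  Rating_Count = c(900, 900, 500, 700),
  stringsAsFactors = FALSE
)

top_products <- toy %>%
  group_by(Product_Name) %>%
  # Assumption: duplicate listings share one count, so keep the largest per name
  summarise(Rating_Count = max(Rating_Count), .groups = "drop") %>%
  slice_max(Rating_Count, n = 2, with_ties = FALSE)  # top 2, sorted descending

top_products
```

In the plot itself, passing the limits to the coordinate system, for example coord_flip(ylim = c(0, 100000)), zooms the axis without discarding bars, whereas ylim() removes rows that fall outside the limits.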
Let’s analyze the pricing and discount data. Our analysis could include:
1. The distribution of actual and discounted prices.
2. The relationship between discount percentage and price.
3. How ratings impact pricing.
1. The distribution of actual and discounted prices.
I will plot the distribution of actual and discounted prices with histograms. This will give us an understanding of the range and common price points for these products.
# Creating separate dataframes for actual and discounted prices with the same column names
actual_prices <- df1 %>% select(Price = Actual_Price) %>% mutate(Price_Type = "Actual Price")
discounted_prices <- df1 %>% select(Price = Discounted_Price) %>% mutate(Price_Type = "Discounted Price")
# Combining the dataframes
combined_prices <- rbind(actual_prices, discounted_prices)
# Creating the histogram
ggplot(combined_prices, aes(x = Price, fill = Price_Type)) +
geom_histogram(aes(y = after_stat(count)), position = "identity", alpha = 0.6, binwidth = 650) + # alpha keeps both overlapping distributions visible; adjust binwidth as needed
scale_fill_manual(values = c("Actual Price" = "skyblue", "Discounted Price" = "pink")) +
labs(title = "Distribution of Actual and Discounted Prices",
x = "Price",
y = "Frequency") +
theme_minimal() +
theme(legend.position = "right") +
xlim(c(0, 10000))
## Warning: Removed 311 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 4 rows containing missing values (`geom_bar()`).
The histogram above shows the distribution of actual and discounted prices, restricted to prices up to 10,000. It indicates the typical price ranges and how they differ between the two price types. The discounted prices are generally lower, as expected.
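The "Removed ... rows containing non-finite values" warning arises because xlim() discards every price outside the axis limits before binning. A small sketch, using made-up prices rather than the dataset, shows how to count the affected rows and how quantiles describe the expensive tail that the plot leaves out.

```r
# Toy prices standing in for combined_prices$Price (illustrative values only)
price <- c(149, 399, 1099, 1899, 12000, 45000, NA)
upper <- 10000  # same upper limit as xlim(c(0, 10000))

# Rows that are missing or fall outside the axis limits are dropped before binning
n_dropped <- sum(is.na(price) | price < 0 | price > upper)
n_dropped

# Quantiles describe the full range, including the tail hidden by the axis limits
quantile(price, probs = c(0.5, 0.9, 1), na.rm = TRUE)
```

If the tail matters for the analysis, a log-scaled x-axis via scale_x_log10() is one alternative to truncating the axis.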
2. Correlation between actual price and discounted price
library(ggplot2)
library(corrplot)
## corrplot 0.92 loaded
# Create the scatter plot of Actual Price vs Discounted Price with defined axis limits
ggplot(df1, aes(x = Actual_Price, y = Discounted_Price)) +
geom_point(color = "brown") +
labs(x = 'Actual Price (Rupee India)', y = 'Discounted Price (Rupee India)',
title = 'Correlation between Actual Price & Discounted Price') +
theme_minimal() +
theme(plot.title = element_text(face = "bold")) +
xlim(0, 100000) +
ylim(0, 100000)
## Warning: Removed 1 rows containing missing values (`geom_point()`).
The scatter plot illustrates the relationship between the actual price and discounted price. There’s a general trend that as the actual price increases, the discounted price also increases. This plot helps to visualize how discounts are applied across different price ranges.
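The trend in the scatter plot can be summarised with a Pearson correlation coefficient and a simple linear fit. The sketch below uses only the six preview rows shown earlier, so its numbers are not representative of the full dataset; it is meant to show the calls, which would be applied to df1 in practice.

```r
# Six preview rows copied from the head(df1) output; not representative of all products
preview <- data.frame(
  Actual_Price     = c(1099, 349, 1899, 699, 399, 1000),
  Discounted_Price = c(399, 199, 199, 329, 154, 149)
)

# Pearson correlation between the two price columns
r <- cor(preview$Actual_Price, preview$Discounted_Price)
r

# Linear fit: the slope estimates how much the discounted price rises
# for each additional rupee of actual price
fit <- lm(Discounted_Price ~ Actual_Price, data = preview)
coef(fit)
```

A slope well below 1 would indicate that discounts grow in absolute terms as the actual price increases.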
3. Price and Discount Analysis
Next, I will investigate the relationship between the discount percentage and the actual price of the products, overlaying the ratings, by using a scatter plot.
x_range <- c(0, 150000)
y_range <- c(0, max(df1$Discount_Percentage, na.rm = TRUE))
# Creating the scatter plot with adjusted x and y axis limits
ggplot(df1, aes(x = Actual_Price, y = Discount_Percentage, color = Rating)) +
geom_point(alpha = 0.4) +
scale_color_gradient(low = "blue", high = "red") +
labs(title = 'Discount Percentage vs Actual Price (colored by Rating)',
x = 'Actual Price',
y = 'Discount Percentage') +
theme_minimal() +
theme(legend.position = "right") +
xlim(x_range) +
ylim(y_range)
The scatter plot illustrates the relationship between the actual price of products, their discount percentage, and customer ratings.
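One way to put numbers on what this scatter plot shows is to group products into price bands and compare the average discount per band. The sketch below uses the six preview rows shown earlier; the band boundaries (500 and 1000) and the column name Price_Band are assumptions chosen for illustration.

```r
# Six preview rows copied from the head(df1) output; not representative of all products
toy <- data.frame(
  Actual_Price        = c(349, 399, 699, 1000, 1099, 1899),
  Discount_Percentage = c(0.43, 0.61, 0.53, 0.85, 0.64, 0.90)
)

# Hypothetical price bands; each interval is closed on the right, e.g. (500, 1000]
toy$Price_Band <- cut(toy$Actual_Price,
                      breaks = c(0, 500, 1000, Inf),
                      labels = c("0-500", "501-1000", "1001+"))

# Mean discount percentage within each band
band_means <- tapply(toy$Discount_Percentage, toy$Price_Band, mean)
band_means
```

On these six rows the mean discount is 0.52, 0.69, and 0.77 for the three bands, but the full dataset would be needed before drawing any conclusion.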
# Use a single plotting panel and set the plot margins
par(mfrow=c(1,1))
par(mar=c(5,5,2,5))
selected_columns <- df1[, c("Rating", "Actual_Price", "Discounted_Price", "Discount_Percentage")]
# Calculating the correlation matrix
correlation_matrix <- cor(selected_columns, use="complete.obs")
# Plotting the correlation matrix
corrplot(correlation_matrix, method="color", type="upper", tl.col="black", tl.srt=45,
cl.pos="n", addCoef.col="black", number.cex=0.8, tl.cex=0.65)
title("Correlation Matrix for Rating, Price, and Discount")
The correlation matrix shows only weak correlations between rating and both the actual and discounted prices, which suggests that price is not a primary driver of customer ratings. Actual and discounted prices are strongly correlated with each other, consistent with discounts that are roughly proportional to the actual price, but neither shows a meaningful linear relationship with rating. Correlation alone cannot establish cause, so these results are best read as a hint that satisfaction depends less on the depth of the discount and more on perceived quality and overall value. On that reading, businesses may gain more from investing in product quality and service than from aggressive pricing alone.
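Pearson correlation, the default in cor(), measures linear association and can be sensitive to a few very expensive products. As a robustness check, a rank-based Spearman correlation can be computed alongside it. The sketch below uses only the six preview rows shown earlier, so the values are illustrative and not the report's results.

```r
# Six preview rows copied from the head(df1) output; not representative of all products
preview <- data.frame(
  Rating              = c(4.2, 4.0, 3.9, 4.2, 4.2, 3.9),
  Actual_Price        = c(1099, 349, 1899, 699, 399, 1000),
  Discounted_Price    = c(399, 199, 199, 329, 154, 149),
  Discount_Percentage = c(0.64, 0.43, 0.90, 0.53, 0.61, 0.85)
)

# Linear (Pearson) and rank-based (Spearman) correlation matrices
pearson  <- cor(preview, use = "complete.obs", method = "pearson")
spearman <- cor(preview, use = "complete.obs", method = "spearman")

# Large differences flag pairs where outliers or non-linearity affect the result
round(pearson - spearman, 2)
```

If both matrices agree on the full dataset, the weak link between price and rating is unlikely to be an artefact of extreme prices.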