The scraped product is “Samsung DC-E Series Commercial LED Displays 32-Inch Screen LED-Lit Monitor (DC32E)” from Amazon India, chosen because its product page allows the price to be scraped.
# Reference: "An introduction to web scraping using R"
# Link: "https://www.freecodecamp.org/news/an-introduction-to-web-scraping-using-r-40284110c848/"
Loading the necessary packages
library(xml2) #Used in web scraping
library(rvest) # "rvest is used in extracting the information we need from web pages."
library(stringr) #Used in data cleaning
Creating the url variable and reading the HTML content of the page
url <- "https://www.amazon.in/Samsung-Commercial-Displays-32-Inch-DC32E/dp/B01BJOF46K/ref=sr_1_3?keywords=samsung+tv+screen&qid=1637409358&s=electronics&sr=1-3"
webpage <- read_html(url)
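Because the live page can change or refuse automated requests, it can help to try the selectors on a small inline HTML string first. The sketch below is only an illustration: the HTML fragment and the `demo_*` names are made up and do not reproduce Amazon's real markup. It also shows what happens when a selector matches nothing.

```r
library(xml2)
library(rvest)

# Made-up HTML fragment standing in for a product page (not real Amazon markup)
demo_html <- paste0(
  "<html><body>",
  "<h1 id=\"title\">\n\n  Demo product  \n\n</h1>",
  "</body></html>"
)
demo_page <- read_html(demo_html)

# Selector that matches: one node, text still padded with whitespace
demo_title <- html_text(html_nodes(demo_page, "h1#title"))
trimws(demo_title)

# Selector that matches nothing: the result has length 0, not NA
demo_price <- html_text(html_nodes(demo_page, "span#priceblock_ourprice"))
length(demo_price)
```

A zero-length result is worth checking for before building a data frame, since `data.frame()` cannot combine a zero-length column with columns of length one.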
Scrape the product details from Amazon
The scraped data are as follows:
1 - Title: The title of the product.
2 - Price: The price of the product.
3 - Description: The description of the product.
4 - Rating: The user rating of the product.
#scrape title of the product
title_html <- html_nodes(webpage, "h1#title")
title <- html_text(title_html)
head(title)
## [1] "\n\n\n\n\n\n\n\n\nSamsung DC-E Series Commercial LED Displays 32-Inch Screen LED-Lit Monitor (DC32E)\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
#Removing the line-break characters (\r and \n)
title <- str_replace_all(title,"[\r\n]" , "")
title
## [1] "Samsung DC-E Series Commercial LED Displays 32-Inch Screen LED-Lit Monitor (DC32E)"
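The pattern `"[\r\n]"` removes line breaks only, so a title that also carries padding spaces or tabs would keep them. A minimal base R sketch of a more general cleaner follows; `clean_text` is a hypothetical helper name, and the input string is invented for illustration.

```r
# Hypothetical helper: trim the ends, then collapse any run of
# whitespace (spaces, tabs, line breaks) into a single space
clean_text <- function(x) {
  gsub("\\s+", " ", trimws(x))
}

# Invented example input with mixed whitespace
raw_title <- "\n\n\t  Samsung   DC-E Series \n Monitor \n\n"
clean_title <- clean_text(raw_title)
clean_title
```

Unlike deleting `\n` outright, collapsing to a single space avoids gluing two words together when a line break was the only separator between them.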
A. Price
#Price of the product
price_html <- html_nodes(webpage,"span#priceblock_ourprice")
price <- html_text(price_html)
price <- str_replace_all(price, "[\r\n]" , "")
price
## [1] "₹20,900.00"
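The price comes back as a character string with a currency symbol and a thousands separator, so it cannot be used in arithmetic as is. One possible way to convert it, sketched here in base R, is to drop everything except digits and the decimal point; `price_raw` and `price_num` are illustrative names, and the approach assumes the "1,234.56" number format shown above.

```r
# The string as scraped above
price_raw <- "₹20,900.00"

# Keep only digits and the decimal point, then convert to numeric
price_num <- as.numeric(gsub("[^0-9.]", "", price_raw))
price_num
```

This would misread prices written with a decimal comma, so the pattern needs adjusting for sites that use a different number format.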
B. Description
desc_html <- html_nodes(webpage, "div#productDescription")
desc <- html_text(desc_html)
desc <- str_replace_all(desc, "[\r\n\t]" , "")
desc <- str_trim(desc)
head(desc)
## [1] "Digital signage that offers 16/7 operation, 3 Years warranty with 2 HDMI 1.4 ports and components (CVBS common) to connect FHD set of boxes. Engage customers and vividly display business messaging with Samsung DCE Series SMART Signage, featuring high picture quality with 330 nit brightness, slim, sleek design for extended indoor use in retail stores, Quick Service Restaurants (QSRs), small businesses and corporate conference rooms and offices, DCE Series displays stunningly present a range of immersive and timely information with captivating picture quality. Easy and simple content management with embedded magicinfo Lite. Super clear coating, Temperature sensor,Pivot display,Button lock,Clock battery(168Hrs clock keeping),Built-in speaker, S/W,Auto source switching&recovery,RS232c,RJ45 MDC,Plug and play(DDC2B),Firmware update by network."
C. Rating
rate_html <- html_nodes(webpage, "span#acrPopover")
rate <- html_text(rate_html)
rate <- str_replace_all(rate, "[\r\n]" , "")
rate <- str_trim(rate)
rate
## [1] "4.5 out of 5 stars" "4.5 out of 5 stars"
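The selector matches two nodes here, so the same rating text is returned twice. A small sketch of one way to handle this before building the data frame: drop the repeat with `unique()` and pull out the leading number. The names `rate_raw`, `rate_unique` and `rating_num` are illustrative, and the regular expression assumes the "X out of 5 stars" wording shown above.

```r
# The two identical strings returned by the rating selector
rate_raw <- c("4.5 out of 5 stars", "4.5 out of 5 stars")

# Drop the repeated value
rate_unique <- unique(rate_raw)

# Extract the number in front of "out of" and convert it to numeric
rating_num <- as.numeric(sub("^([0-9.]+) out of.*$", "\\1", rate_unique))
rating_num
```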
Creating an R Data Frame
#Creating a dataframe
DataFrame<- data.frame("Title" = title, "Price" = price,"Description" = desc, "Rating" = rate)
DataFrame <- DataFrame[-2,] # Removing the duplicated row
DataFrame #Showing the data frame
## Title
## 1 Samsung DC-E Series Commercial LED Displays 32-Inch Screen LED-Lit Monitor (DC32E)
## Price
## 1 ₹20,900.00
## Description
## 1 Digital signage that offers 16/7 operation, 3 Years warranty with 2 HDMI 1.4 ports and components (CVBS common) to connect FHD set of boxes. Engage customers and vividly display business messaging with Samsung DCE Series SMART Signage, featuring high picture quality with 330 nit brightness, slim, sleek design for extended indoor use in retail stores, Quick Service Restaurants (QSRs), small businesses and corporate conference rooms and offices, DCE Series displays stunningly present a range of immersive and timely information with captivating picture quality. Easy and simple content management with embedded magicinfo Lite. Super clear coating, Temperature sensor,Pivot display,Button lock,Clock battery(168Hrs clock keeping),Built-in speaker, S/W,Auto source switching&recovery,RS232c,RJ45 MDC,Plug and play(DDC2B),Firmware update by network.
## Rating
## 1 4.5 out of 5 stars
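Removing row 2 works because the rating vector happens to have exactly two elements. A more defensive sketch, under the assumption that each field should contribute a single value, reduces every scraped vector to its first element and falls back to `NA` when a selector matched nothing. `first_or_na` is a hypothetical helper, and the input values below are invented.

```r
# Hypothetical helper: first element of a vector, or NA if it is empty
first_or_na <- function(x) {
  if (length(x) == 0) NA_character_ else x[[1]]
}

# Invented inputs: one normal field, one empty, one duplicated
demo_title <- "Demo product"
demo_price <- character(0)
demo_rate  <- c("4.5 out of 5 stars", "4.5 out of 5 stars")

demo_df <- data.frame(
  Title  = first_or_na(demo_title),
  Price  = first_or_na(demo_price),
  Rating = first_or_na(demo_rate),
  stringsAsFactors = FALSE
)
demo_df
```

With this shape the data frame always has one row, whether a selector returns zero, one or several matches.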
Export to CSV file for further analysis
write.csv(DataFrame,"/Users/salahkaf/Documents/Amazon_product.csv", row.names = FALSE)
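The absolute path above only exists on one machine. As a sketch of a more portable variant, the file name can be built with `file.path()` and the export checked by reading the file back. The temporary directory and the `demo_out`, `out_file` and `check` names are assumptions made for illustration.

```r
# Invented one-row data frame standing in for the scraped result
demo_out <- data.frame(
  Title  = "Demo product",
  Rating = "4.5 out of 5 stars",
  stringsAsFactors = FALSE
)

# Build the path from parts instead of hard-coding a user directory
out_file <- file.path(tempdir(), "Amazon_product.csv")
write.csv(demo_out, out_file, row.names = FALSE)

# Read the file back to confirm the export round-trips
check <- read.csv(out_file, stringsAsFactors = FALSE)
check
```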