Category,Item Name,Item ID,Brand,Price,Variation ID,Variation Details
Electronics,Smartphone,101,TechBrand,699.99,101-A,Color: Black, Storage: 64GB
Electronics,Smartphone,101,TechBrand,699.99,101-B,Color: White, Storage: 128GB
Electronics,Laptop,102,CompuBrand,1099.99,102-A,Color: Silver, Storage: 256GB
Electronics,Laptop,102,CompuBrand,1099.99,102-B,Color: Space Gray, Storage: 512GB
Home Appliances,Refrigerator,201,HomeCool,899.99,201-A,Color: Stainless Steel, Capacity: 20 cu ft
Home Appliances,Refrigerator,201,HomeCool,899.99,201-B,Color: White, Capacity: 18 cu ft
Home Appliances,Washing Machine,202,CleanTech,499.99,202-A,Type: Front Load, Capacity: 4.5 cu ft
Home Appliances,Washing Machine,202,CleanTech,499.99,202-B,Type: Top Load, Capacity: 5.0 cu ft
Clothing,T-Shirt,301,FashionCo,19.99,301-A,Color: Blue, Size: S
Clothing,T-Shirt,301,FashionCo,19.99,301-B,Color: Red, Size: M
Clothing,T-Shirt,301,FashionCo,19.99,301-C,Color: Green, Size: L
Clothing,Jeans,302,DenimWorks,49.99,302-A,Color: Dark Blue, Size: 32
Clothing,Jeans,302,DenimWorks,49.99,302-B,Color: Light Blue, Size: 34
Books,Fiction Novel,401,-,14.99,401-A,Format: Hardcover, Language: English
Books,Fiction Novel,401,-,14.99,401-B,Format: Paperback, Language: Spanish
Books,Non-Fiction Guide,402,-,24.99,402-A,Format: eBook, Language: English
Books,Non-Fiction Guide,402,-,24.99,402-B,Format: Paperback, Language: French
Sports Equipment,Basketball,501,SportsGear,29.99,501-A,Size: Size 7, Color: Orange
Sports Equipment,Tennis Racket,502,RacketPro,89.99,502-A,Material: Graphite, Color: Black
Sports Equipment,Tennis Racket,502,RacketPro,89.99,502-B,Material: Aluminum, Color: Silver
# The inventory data is structured for inventory analysis and contains the columns Category, Item Name, Item ID, Brand, Price, Variation ID, and Variation Details.
# The dataset has been converted into other formats such as JSON, HTML, and XML.
# The resulting files were saved, exported, and printed in this RMarkdown file.
# They can also be viewed and used through the following link: https://github.com/Jomifum/Assignment7D607
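The Variation Details values in the listing above contain commas and are not quoted, so a plain comma split would produce more than seven fields per row. The following is a minimal base-R sketch of one way to read such rows, assuming every row has exactly six commas before the details text. The names `csv_lines`, `pattern`, and `inventory_csv` are illustrative and do not come from the original analysis.

```r
# Three sample lines copied from the listing above (header plus two rows).
csv_lines <- c(
  "Category,Item Name,Item ID,Brand,Price,Variation ID,Variation Details",
  "Electronics,Smartphone,101,TechBrand,699.99,101-A,Color: Black, Storage: 64GB",
  "Clothing,T-Shirt,301,FashionCo,19.99,301-C,Color: Green, Size: L"
)

# Six comma-free fields, then everything that remains as the seventh field.
pattern <- paste0("^", strrep("([^,]*),", 6), "(.*)$")
matches <- regmatches(csv_lines, regexec(pattern, csv_lines))

# Drop the full-match element and stack the captured fields row by row.
fields <- do.call(rbind, lapply(matches, function(m) m[-1]))

# First row holds the column names; the rest are data rows.
inventory_csv <- as.data.frame(fields[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(inventory_csv) <- fields[1, ]
inventory_csv$Price <- as.numeric(inventory_csv$Price)

print(inventory_csv)
```

Quoting the last field in the source file would be the more robust fix, since a standard reader such as `read.csv()` could then handle it directly.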
##Transforming the dataset into JSON format:
# Load necessary library
library(jsonlite)
# Create a data frame with the inventory data (14 of the 20 rows listed above)
inventory_data <- data.frame(
  Category = c("Electronics", "Electronics", "Home Appliances", "Home Appliances", "Clothing", "Clothing", "Clothing", "Clothing", "Books", "Books", "Books", "Books", "Sports Equipment", "Sports Equipment"),
  ItemName = c("Smartphone", "Smartphone", "Refrigerator", "Washing Machine", "T-Shirt", "T-Shirt", "T-Shirt", "Jeans", "Fiction Novel", "Fiction Novel", "Non-Fiction Guide", "Non-Fiction Guide", "Basketball", "Tennis Racket"),
  ItemID = c(101, 101, 201, 202, 301, 301, 301, 302, 401, 401, 402, 402, 501, 502),
  Brand = c("TechBrand", "TechBrand", "HomeCool", "CleanTech", "FashionCo", "FashionCo", "FashionCo", "DenimWorks", "-", "-", "-", "-", "SportsGear", "RacketPro"),
  Price = c(699.99, 699.99, 899.99, 499.99, 19.99, 19.99, 19.99, 49.99, 14.99, 14.99, 24.99, 24.99, 29.99, 89.99),
  VariationID = c("101-A", "101-B", "201-A", "202-A", "301-A", "301-B", "301-C", "302-A", "401-A", "401-B", "402-A", "402-B", "501-A", "502-A"),
  VariationDetails = c("Color: Black, Storage: 64GB", "Color: White, Storage: 128GB", "Color: Stainless Steel, Capacity: 20 cu ft", "Type: Front Load, Capacity: 4.5 cu ft", "Color: Blue, Size: S", "Color: Red, Size: M", "Color: Green, Size: L", "Color: Dark Blue, Size: 32", "Format: Hardcover, Language: English", "Format: Paperback, Language: Spanish", "Format: eBook, Language: English", "Format: Paperback, Language: French", "Size: Size 7, Color: Orange", "Material: Graphite, Color: Black")
)
# Export the data to JSON format
write_json(inventory_data, "C:/Users/Dell/Downloads/jsonformatA7.js", pretty = TRUE)
# Print JSON for inspection
print(toJSON(inventory_data, pretty = TRUE))
## [
## {
## "Category": "Electronics",
## "ItemName": "Smartphone",
## "ItemID": 101,
## "Brand": "TechBrand",
## "Price": 699.99,
## "VariationID": "101-A",
## "VariationDetails": "Color: Black, Storage: 64GB"
## },
## {
## "Category": "Electronics",
## "ItemName": "Smartphone",
## "ItemID": 101,
## "Brand": "TechBrand",
## "Price": 699.99,
## "VariationID": "101-B",
## "VariationDetails": "Color: White, Storage: 128GB"
## },
## {
## "Category": "Home Appliances",
## "ItemName": "Refrigerator",
## "ItemID": 201,
## "Brand": "HomeCool",
## "Price": 899.99,
## "VariationID": "201-A",
## "VariationDetails": "Color: Stainless Steel, Capacity: 20 cu ft"
## },
## {
## "Category": "Home Appliances",
## "ItemName": "Washing Machine",
## "ItemID": 202,
## "Brand": "CleanTech",
## "Price": 499.99,
## "VariationID": "202-A",
## "VariationDetails": "Type: Front Load, Capacity: 4.5 cu ft"
## },
## {
## "Category": "Clothing",
## "ItemName": "T-Shirt",
## "ItemID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "VariationID": "301-A",
## "VariationDetails": "Color: Blue, Size: S"
## },
## {
## "Category": "Clothing",
## "ItemName": "T-Shirt",
## "ItemID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "VariationID": "301-B",
## "VariationDetails": "Color: Red, Size: M"
## },
## {
## "Category": "Clothing",
## "ItemName": "T-Shirt",
## "ItemID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "VariationID": "301-C",
## "VariationDetails": "Color: Green, Size: L"
## },
## {
## "Category": "Clothing",
## "ItemName": "Jeans",
## "ItemID": 302,
## "Brand": "DenimWorks",
## "Price": 49.99,
## "VariationID": "302-A",
## "VariationDetails": "Color: Dark Blue, Size: 32"
## },
## {
## "Category": "Books",
## "ItemName": "Fiction Novel",
## "ItemID": 401,
## "Brand": "-",
## "Price": 14.99,
## "VariationID": "401-A",
## "VariationDetails": "Format: Hardcover, Language: English"
## },
## {
## "Category": "Books",
## "ItemName": "Fiction Novel",
## "ItemID": 401,
## "Brand": "-",
## "Price": 14.99,
## "VariationID": "401-B",
## "VariationDetails": "Format: Paperback, Language: Spanish"
## },
## {
## "Category": "Books",
## "ItemName": "Non-Fiction Guide",
## "ItemID": 402,
## "Brand": "-",
## "Price": 24.99,
## "VariationID": "402-A",
## "VariationDetails": "Format: eBook, Language: English"
## },
## {
## "Category": "Books",
## "ItemName": "Non-Fiction Guide",
## "ItemID": 402,
## "Brand": "-",
## "Price": 24.99,
## "VariationID": "402-B",
## "VariationDetails": "Format: Paperback, Language: French"
## },
## {
## "Category": "Sports Equipment",
## "ItemName": "Basketball",
## "ItemID": 501,
## "Brand": "SportsGear",
## "Price": 29.99,
## "VariationID": "501-A",
## "VariationDetails": "Size: Size 7, Color: Orange"
## },
## {
## "Category": "Sports Equipment",
## "ItemName": "Tennis Racket",
## "ItemID": 502,
## "Brand": "RacketPro",
## "Price": 89.99,
## "VariationID": "502-A",
## "VariationDetails": "Material: Graphite, Color: Black"
## }
## ]
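JSON of the shape printed above can be parsed back into a data frame with `jsonlite::fromJSON()`. The sketch below uses a two-record stand-in string rather than the exported file, so the name `json_text` and its contents are assumptions made for illustration only.

```r
library(jsonlite)

# A two-record stand-in with the same shape as the JSON printed above.
json_text <- '[
  {"ItemName": "Smartphone", "ItemID": 101, "Price": 699.99, "VariationID": "101-A"},
  {"ItemName": "Jeans", "ItemID": 302, "Price": 49.99, "VariationID": "302-A"}
]'

# fromJSON() simplifies an array of flat objects into a data frame by default.
parsed <- fromJSON(json_text)
str(parsed)
```

Because each record is a flat object with the same keys, the result has one row per record and one column per key.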
##Importing the data from the JSON file:
# Load necessary library
library(stringi)
# Read raw JSON data
raw_json <- readLines("C:/Users/Dell/Downloads/jsonformatA7.js", warn = FALSE)
# Remove characters that are not valid UTF-8
cleaned_json <- stri_replace_all_fixed(raw_json, "\u0097", "", vectorize_all = TRUE) # Remove one specific unwanted character (U+0097)
cleaned_json <- iconv(cleaned_json, from = "UTF-8", to = "UTF-8//IGNORE") # Ignore invalid UTF-8
# Collapse the cleaned lines back into a single string
cleaned_json <- paste(cleaned_json, collapse = "\n")
# Print the cleaned JSON for inspection (cleaned_json is now a single string, so head() returns all of it)
print(head(cleaned_json, 20))
## [1] "[\n {\n \"Category\": \"Electronics\",\n \"ItemName\": \"Smartphone\",\n \"ItemID\": 101,\n \"Brand\": \"TechBrand\",\n \"Price\": 699.99,\n \"VariationID\": \"101-A\",\n \"VariationDetails\": \"Color: Black, Storage: 64GB\"\n },\n {\n \"Category\": \"Electronics\",\n \"ItemName\": \"Smartphone\",\n \"ItemID\": 101,\n \"Brand\": \"TechBrand\",\n \"Price\": 699.99,\n \"VariationID\": \"101-B\",\n \"VariationDetails\": \"Color: White, Storage: 128GB\"\n },\n {\n \"Category\": \"Home Appliances\",\n \"ItemName\": \"Refrigerator\",\n \"ItemID\": 201,\n \"Brand\": \"HomeCool\",\n \"Price\": 899.99,\n \"VariationID\": \"201-A\",\n \"VariationDetails\": \"Color: Stainless Steel, Capacity: 20 cu ft\"\n },\n {\n \"Category\": \"Home Appliances\",\n \"ItemName\": \"Washing Machine\",\n \"ItemID\": 202,\n \"Brand\": \"CleanTech\",\n \"Price\": 499.99,\n \"VariationID\": \"202-A\",\n \"VariationDetails\": \"Type: Front Load, Capacity: 4.5 cu ft\"\n },\n {\n \"Category\": \"Clothing\",\n \"ItemName\": \"T-Shirt\",\n \"ItemID\": 301,\n \"Brand\": \"FashionCo\",\n \"Price\": 19.99,\n \"VariationID\": \"301-A\",\n \"VariationDetails\": \"Color: Blue, Size: S\"\n },\n {\n \"Category\": \"Clothing\",\n \"ItemName\": \"T-Shirt\",\n \"ItemID\": 301,\n \"Brand\": \"FashionCo\",\n \"Price\": 19.99,\n \"VariationID\": \"301-B\",\n \"VariationDetails\": \"Color: Red, Size: M\"\n },\n {\n \"Category\": \"Clothing\",\n \"ItemName\": \"T-Shirt\",\n \"ItemID\": 301,\n \"Brand\": \"FashionCo\",\n \"Price\": 19.99,\n \"VariationID\": \"301-C\",\n \"VariationDetails\": \"Color: Green, Size: L\"\n },\n {\n \"Category\": \"Clothing\",\n \"ItemName\": \"Jeans\",\n \"ItemID\": 302,\n \"Brand\": \"DenimWorks\",\n \"Price\": 49.99,\n \"VariationID\": \"302-A\",\n \"VariationDetails\": \"Color: Dark Blue, Size: 32\"\n },\n {\n \"Category\": \"Books\",\n \"ItemName\": \"Fiction Novel\",\n \"ItemID\": 401,\n \"Brand\": \"-\",\n \"Price\": 14.99,\n \"VariationID\": \"401-A\",\n 
\"VariationDetails\": \"Format: Hardcover, Language: English\"\n },\n {\n \"Category\": \"Books\",\n \"ItemName\": \"Fiction Novel\",\n \"ItemID\": 401,\n \"Brand\": \"-\",\n \"Price\": 14.99,\n \"VariationID\": \"401-B\",\n \"VariationDetails\": \"Format: Paperback, Language: Spanish\"\n },\n {\n \"Category\": \"Books\",\n \"ItemName\": \"Non-Fiction Guide\",\n \"ItemID\": 402,\n \"Brand\": \"-\",\n \"Price\": 24.99,\n \"VariationID\": \"402-A\",\n \"VariationDetails\": \"Format: eBook, Language: English\"\n },\n {\n \"Category\": \"Books\",\n \"ItemName\": \"Non-Fiction Guide\",\n \"ItemID\": 402,\n \"Brand\": \"-\",\n \"Price\": 24.99,\n \"VariationID\": \"402-B\",\n \"VariationDetails\": \"Format: Paperback, Language: French\"\n },\n {\n \"Category\": \"Sports Equipment\",\n \"ItemName\": \"Basketball\",\n \"ItemID\": 501,\n \"Brand\": \"SportsGear\",\n \"Price\": 29.99,\n \"VariationID\": \"501-A\",\n \"VariationDetails\": \"Size: Size 7, Color: Orange\"\n },\n {\n \"Category\": \"Sports Equipment\",\n \"ItemName\": \"Tennis Racket\",\n \"ItemID\": 502,\n \"Brand\": \"RacketPro\",\n \"Price\": 89.99,\n \"VariationID\": \"502-A\",\n \"VariationDetails\": \"Material: Graphite, Color: Black\"\n }\n]"
##Transforming the dataset into HTML format:
# Install the package if necessary by running:
#install.packages("htmlTable")
# Load the necessary library
library(htmlTable)
# Create HTML table from the data frame
html_table <- htmlTable(inventory_data)
# Wrap the table in a minimal HTML document and export it to a file
html_output <- paste("<html><body>", html_table, "</body></html>", sep = "")
write(html_output, file = "C:/Users/Dell/Downloads/htmlformatA7.html")
# Print the first 500 characters of the HTML for inspection
print(substr(html_output, 1, 500))
## [1] "<html><body><table class='gmisc_table' style='border-collapse: collapse; margin-top: 1em; margin-bottom: 1em;' >\n<thead>\n<tr><th style='border-bottom: 1px solid grey; border-top: 2px solid grey;'></th>\n<th style='font-weight: 900; border-bottom: 1px solid grey; border-top: 2px solid grey; text-align: center;'>Category</th>\n<th style='font-weight: 900; border-bottom: 1px solid grey; border-top: 2px solid grey; text-align: center;'>ItemName</th>\n<th style='font-weight: 900; border-bottom: 1px soli"
#Importing the data from the HTML file:
# Load the rvest package
library(rvest)
# Import HTML data
html_data <- read_html("C:/Users/Dell/Downloads/htmlformatA7.html")
# Extract tables (if applicable)
tables <- html_data %>% html_table()
# Check if there are any tables extracted
if (length(tables) > 0) {
  # Print the first table
  print(tables[[1]])
} else {
  print("No tables found in the HTML file.")
}
## # A tibble: 20 × 7
## Category `Item Name` `Item ID` Brand Price `Variation ID`
## <chr> <chr> <int> <chr> <dbl> <chr>
## 1 Electronics Smartphone 101 TechBrand 700. 101-A
## 2 Electronics Smartphone 101 TechBrand 700. 101-B
## 3 Electronics Laptop 102 CompuBrand 1100. 102-A
## 4 Electronics Laptop 102 CompuBrand 1100. 102-B
## 5 Home Appliances Refrigerator 201 HomeCool 900. 201-A
## 6 Home Appliances Refrigerator 201 HomeCool 900. 201-B
## 7 Home Appliances Washing Machine 202 CleanTech 500. 202-A
## 8 Home Appliances Washing Machine 202 CleanTech 500. 202-B
## 9 Clothing T-Shirt 301 FashionCo 20.0 301-A
## 10 Clothing T-Shirt 301 FashionCo 20.0 301-B
## 11 Clothing T-Shirt 301 FashionCo 20.0 301-C
## 12 Clothing Jeans 302 DenimWorks 50.0 302-A
## 13 Clothing Jeans 302 DenimWorks 50.0 302-B
## 14 Books Fiction Novel 401 - 15.0 401-A
## 15 Books Fiction Novel 401 - 15.0 401-B
## 16 Books Non-Fiction Guide 402 - 25.0 402-A
## 17 Books Non-Fiction Guide 402 - 25.0 402-B
## 18 Sports Equipment Basketball 501 SportsGear 30.0 501-A
## 19 Sports Equipment Tennis Racket 502 RacketPro 90.0 502-A
## 20 Sports Equipment Tennis Racket 502 RacketPro 90.0 502-B
## # ℹ 1 more variable: `Variation Details` <chr>
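The Price column in the printed tibble appears as 700., 1100., and so on because tibble printing abbreviates numbers to a few significant digits; the stored values are not rounded. The sketch below illustrates this on a small hand-written HTML string, which is an assumption for illustration and not the exported file.

```r
library(rvest)

# A tiny stand-in HTML document with one table.
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>Item Name</th><th>Price</th></tr>",
  "<tr><td>Smartphone</td><td>699.99</td></tr>",
  "<tr><td>Jeans</td><td>49.99</td></tr>",
  "</table></body></html>"
)

# html_table() returns a list with one tibble per <table> element.
price_table <- html_table(read_html(html_text))[[1]]

print(price_table)        # tibble printing abbreviates the Price column
print(price_table$Price)  # the underlying numeric values are intact
```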
#Transforming the dataset into XML format:
# Load necessary library
library(xml2)
# Create an XML document with one <Item> element per data frame row
inventory_xml <- xml_new_root("Inventory")
for (i in seq_len(nrow(inventory_data))) {
  item_node <- xml_add_child(inventory_xml, "Item")
  xml_add_child(item_node, "Category", inventory_data$Category[i])
  xml_add_child(item_node, "ItemName", inventory_data$ItemName[i])
  xml_add_child(item_node, "ItemID", as.character(inventory_data$ItemID[i]))
  xml_add_child(item_node, "Brand", inventory_data$Brand[i])
  xml_add_child(item_node, "Price", as.character(inventory_data$Price[i]))
  variation_node <- xml_add_child(item_node, "Variation")
  xml_add_child(variation_node, "VariationID", inventory_data$VariationID[i])
  xml_add_child(variation_node, "VariationDetails", inventory_data$VariationDetails[i])
}
# Save the XML file
write_xml(inventory_xml, "C:/Users/Dell/Downloads/xmlformatA7.xml")
# Print the XML for inspection
cat(as.character(inventory_xml))
## <?xml version="1.0" encoding="UTF-8"?>
## <Inventory>
## <Item>
## <Category>Electronics</Category>
## <ItemName>Smartphone</ItemName>
## <ItemID>101</ItemID>
## <Brand>TechBrand</Brand>
## <Price>699.99</Price>
## <Variation>
## <VariationID>101-A</VariationID>
## <VariationDetails>Color: Black, Storage: 64GB</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Electronics</Category>
## <ItemName>Smartphone</ItemName>
## <ItemID>101</ItemID>
## <Brand>TechBrand</Brand>
## <Price>699.99</Price>
## <Variation>
## <VariationID>101-B</VariationID>
## <VariationDetails>Color: White, Storage: 128GB</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Home Appliances</Category>
## <ItemName>Refrigerator</ItemName>
## <ItemID>201</ItemID>
## <Brand>HomeCool</Brand>
## <Price>899.99</Price>
## <Variation>
## <VariationID>201-A</VariationID>
## <VariationDetails>Color: Stainless Steel, Capacity: 20 cu ft</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Home Appliances</Category>
## <ItemName>Washing Machine</ItemName>
## <ItemID>202</ItemID>
## <Brand>CleanTech</Brand>
## <Price>499.99</Price>
## <Variation>
## <VariationID>202-A</VariationID>
## <VariationDetails>Type: Front Load, Capacity: 4.5 cu ft</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Clothing</Category>
## <ItemName>T-Shirt</ItemName>
## <ItemID>301</ItemID>
## <Brand>FashionCo</Brand>
## <Price>19.99</Price>
## <Variation>
## <VariationID>301-A</VariationID>
## <VariationDetails>Color: Blue, Size: S</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Clothing</Category>
## <ItemName>T-Shirt</ItemName>
## <ItemID>301</ItemID>
## <Brand>FashionCo</Brand>
## <Price>19.99</Price>
## <Variation>
## <VariationID>301-B</VariationID>
## <VariationDetails>Color: Red, Size: M</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Clothing</Category>
## <ItemName>T-Shirt</ItemName>
## <ItemID>301</ItemID>
## <Brand>FashionCo</Brand>
## <Price>19.99</Price>
## <Variation>
## <VariationID>301-C</VariationID>
## <VariationDetails>Color: Green, Size: L</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Clothing</Category>
## <ItemName>Jeans</ItemName>
## <ItemID>302</ItemID>
## <Brand>DenimWorks</Brand>
## <Price>49.99</Price>
## <Variation>
## <VariationID>302-A</VariationID>
## <VariationDetails>Color: Dark Blue, Size: 32</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Books</Category>
## <ItemName>Fiction Novel</ItemName>
## <ItemID>401</ItemID>
## <Brand>-</Brand>
## <Price>14.99</Price>
## <Variation>
## <VariationID>401-A</VariationID>
## <VariationDetails>Format: Hardcover, Language: English</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Books</Category>
## <ItemName>Fiction Novel</ItemName>
## <ItemID>401</ItemID>
## <Brand>-</Brand>
## <Price>14.99</Price>
## <Variation>
## <VariationID>401-B</VariationID>
## <VariationDetails>Format: Paperback, Language: Spanish</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Books</Category>
## <ItemName>Non-Fiction Guide</ItemName>
## <ItemID>402</ItemID>
## <Brand>-</Brand>
## <Price>24.99</Price>
## <Variation>
## <VariationID>402-A</VariationID>
## <VariationDetails>Format: eBook, Language: English</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Books</Category>
## <ItemName>Non-Fiction Guide</ItemName>
## <ItemID>402</ItemID>
## <Brand>-</Brand>
## <Price>24.99</Price>
## <Variation>
## <VariationID>402-B</VariationID>
## <VariationDetails>Format: Paperback, Language: French</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Sports Equipment</Category>
## <ItemName>Basketball</ItemName>
## <ItemID>501</ItemID>
## <Brand>SportsGear</Brand>
## <Price>29.99</Price>
## <Variation>
## <VariationID>501-A</VariationID>
## <VariationDetails>Size: Size 7, Color: Orange</VariationDetails>
## </Variation>
## </Item>
## <Item>
## <Category>Sports Equipment</Category>
## <ItemName>Tennis Racket</ItemName>
## <ItemID>502</ItemID>
## <Brand>RacketPro</Brand>
## <Price>89.99</Price>
## <Variation>
## <VariationID>502-A</VariationID>
## <VariationDetails>Material: Graphite, Color: Black</VariationDetails>
## </Variation>
## </Item>
## </Inventory>
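The XML above repeats a full Item element for every variation. Since XML supports hierarchy, the variations of one product can instead be nested under a single Item. The following is a sketch of that alternative layout, assuming rows that share an ItemID belong to the same product; `inventory_small` and `grouped_xml` are illustrative names and a reduced column set is used to keep the example short.

```r
library(xml2)

# A reduced stand-in for inventory_data with three variations of two products.
inventory_small <- data.frame(
  ItemName = c("Smartphone", "Smartphone", "Basketball"),
  ItemID = c(101, 101, 501),
  VariationID = c("101-A", "101-B", "501-A"),
  stringsAsFactors = FALSE
)

grouped_xml <- xml_new_root("Inventory")

# split() yields one data frame per ItemID; each becomes a single <Item>.
for (rows in split(inventory_small, inventory_small$ItemID)) {
  item_node <- xml_add_child(grouped_xml, "Item")
  xml_add_child(item_node, "ItemName", rows$ItemName[1])
  xml_add_child(item_node, "ItemID", as.character(rows$ItemID[1]))
  # One <Variation> child per row of this product.
  for (j in seq_len(nrow(rows))) {
    variation_node <- xml_add_child(item_node, "Variation")
    xml_add_child(variation_node, "VariationID", rows$VariationID[j])
  }
}

cat(as.character(grouped_xml))
```

With this layout each Item can carry several Variation children, which is the shape that a reader collecting all Variation nodes per Item expects.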
#Importing the data from the XML file:
# Load the xml2, dplyr, and purrr packages
library(xml2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(purrr)
##
## Attaching package: 'purrr'
## The following object is masked from 'package:jsonlite':
##
## flatten
# Import XML data
xml_data <- read_xml("C:/Users/Dell/Downloads/xmlformatA7.xml")
# Convert to a data frame (if needed)
xml_df <- xml_data %>% xml_find_all(".//Item") %>%
  map_dfr(~{
    # Extract values
    category <- xml_text(xml_find_first(.x, "Category"))
    item_name <- xml_text(xml_find_first(.x, "ItemName"))
    item_id <- xml_text(xml_find_first(.x, "ItemID"))
    brand <- xml_text(xml_find_first(.x, "Brand"))
    price <- as.numeric(xml_text(xml_find_first(.x, "Price")))
    # Extract variations
    variations <- xml_find_all(.x, "Variation")
    variation_ids <- map_chr(variations, ~ xml_text(xml_find_first(.x, "VariationID")))
    variation_details <- map_chr(variations, ~ xml_text(xml_find_first(.x, "VariationDetails")))
    # Create tibble with a list column for variations
    tibble(
      Category = category,
      ItemName = item_name,
      ItemID = item_id,
      Brand = brand,
      Price = price,
      VariationID = list(variation_ids),
      VariationDetails = list(variation_details)
    )
  })
# Print the XML data frame
print(xml_df)
## # A tibble: 10 × 7
## Category ItemName ItemID Brand Price VariationID VariationDetails
## <chr> <chr> <chr> <chr> <dbl> <list> <list>
## 1 Electronics Smartphone 101 Tech… 700. <chr [2]> <chr [2]>
## 2 Electronics Laptop 102 Comp… 1100. <chr [2]> <chr [2]>
## 3 Home Appliances Refrigerat… 201 Home… 900. <chr [2]> <chr [2]>
## 4 Home Appliances Washing Ma… 202 Clea… 500. <chr [2]> <chr [2]>
## 5 Clothing T-Shirt 301 Fash… 20.0 <chr [3]> <chr [3]>
## 6 Clothing Jeans 302 Deni… 50.0 <chr [2]> <chr [2]>
## 7 Books Fiction No… 401 - 15.0 <chr [2]> <chr [2]>
## 8 Books Non-Fictio… 402 - 25.0 <chr [2]> <chr [2]>
## 9 Sports Equipment Basketball 501 Spor… 30.0 <chr [1]> <chr [1]>
## 10 Sports Equipment Tennis Rac… 502 Rack… 90.0 <chr [2]> <chr [2]>
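The data frame above stores each item's variations in list columns, so one row can hold several variation IDs. For analysis it is often easier to expand these back to one row per variation. The following base-R sketch shows one way to do that on a small stand-in; the names `nested`, `n_var`, and `long` are illustrative and not part of the original code.

```r
# A two-item stand-in with a list column, mirroring the shape of xml_df.
nested <- data.frame(
  ItemName = c("Smartphone", "Basketball"),
  Price = c(699.99, 29.99),
  stringsAsFactors = FALSE
)
nested$VariationID <- list(c("101-A", "101-B"), "501-A")

# Number of variations held in each row's list element.
n_var <- lengths(nested$VariationID)

# Repeat the item-level columns once per variation and flatten the list column.
long <- data.frame(
  ItemName = rep(nested$ItemName, times = n_var),
  Price = rep(nested$Price, times = n_var),
  VariationID = unlist(nested$VariationID),
  stringsAsFactors = FALSE
)

print(long)
```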
#Creating a Parquet file:
# Install the arrow package if it's not already installed
if (!requireNamespace("arrow", quietly = TRUE)) {
  install.packages("arrow")
}
# Load necessary libraries
library(arrow)
##
## Attaching package: 'arrow'
## The following object is masked from 'package:utils':
##
## timestamp
# Create a data frame with the inventory data
inventory_data <- data.frame(
  Category = c("Electronics", "Electronics", "Home Appliances", "Home Appliances", "Clothing", "Clothing", "Clothing", "Clothing", "Books", "Books", "Books", "Books", "Sports Equipment", "Sports Equipment"),
  ItemName = c("Smartphone", "Smartphone", "Refrigerator", "Washing Machine", "T-Shirt", "T-Shirt", "T-Shirt", "Jeans", "Fiction Novel", "Fiction Novel", "Non-Fiction Guide", "Non-Fiction Guide", "Basketball", "Tennis Racket"),
  ItemID = c(101, 101, 201, 202, 301, 301, 301, 302, 401, 401, 402, 402, 501, 502),
  Brand = c("TechBrand", "TechBrand", "HomeCool", "CleanTech", "FashionCo", "FashionCo", "FashionCo", "DenimWorks", "-", "-", "-", "-", "SportsGear", "RacketPro"),
  Price = c(699.99, 699.99, 899.99, 499.99, 19.99, 19.99, 19.99, 49.99, 14.99, 14.99, 24.99, 24.99, 29.99, 89.99),
  VariationID = c("101-A", "101-B", "201-A", "202-A", "301-A", "301-B", "301-C", "302-A", "401-A", "401-B", "402-A", "402-B", "501-A", "502-A"),
  VariationDetails = c("Color: Black, Storage: 64GB", "Color: White, Storage: 128GB", "Color: Stainless Steel, Capacity: 20 cu ft", "Type: Front Load, Capacity: 4.5 cu ft", "Color: Blue, Size: S", "Color: Red, Size: M", "Color: Green, Size: L", "Color: Dark Blue, Size: 32", "Format: Hardcover, Language: English", "Format: Paperback, Language: Spanish", "Format: eBook, Language: English", "Format: Paperback, Language: French", "Size: Size 7, Color: Orange", "Material: Graphite, Color: Black")
)
# Specify the path to save the Parquet file
parquet_file_path <- "C:/Users/Dell/Downloads/CUNYMartInventory.parquet"
# Write the data frame to a Parquet file
write_parquet(inventory_data, parquet_file_path)
# Read the Parquet file back to confirm it was saved correctly
parquet_data <- read_parquet(parquet_file_path)
# Print the Parquet data
print(parquet_data)
## # A tibble: 14 × 7
## Category ItemName ItemID Brand Price VariationID VariationDetails
## <chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
## 1 Electronics Smartphone 101 Tech… 700. 101-A Color: Black, S…
## 2 Electronics Smartphone 101 Tech… 700. 101-B Color: White, S…
## 3 Home Appliances Refrigerator 201 Home… 900. 201-A Color: Stainles…
## 4 Home Appliances Washing Mac… 202 Clea… 500. 202-A Type: Front Loa…
## 5 Clothing T-Shirt 301 Fash… 20.0 301-A Color: Blue, Si…
## 6 Clothing T-Shirt 301 Fash… 20.0 301-B Color: Red, Siz…
## 7 Clothing T-Shirt 301 Fash… 20.0 301-C Color: Green, S…
## 8 Clothing Jeans 302 Deni… 50.0 302-A Color: Dark Blu…
## 9 Books Fiction Nov… 401 - 15.0 401-A Format: Hardcov…
## 10 Books Fiction Nov… 401 - 15.0 401-B Format: Paperba…
## 11 Books Non-Fiction… 402 - 25.0 402-A Format: eBook, …
## 12 Books Non-Fiction… 402 - 25.0 402-B Format: Paperba…
## 13 Sports Equipment Basketball 501 Spor… 30.0 501-A Size: Size 7, C…
## 14 Sports Equipment Tennis Rack… 502 Rack… 90.0 502-A Material: Graph…
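Parquet stores data column by column, so a reader can load only the columns a given analysis needs. The sketch below illustrates this with the `col_select` argument of `arrow::read_parquet()`. It writes a small stand-in table to a temporary file, so `inventory_small`, `parquet_path`, and `prices` are illustrative names rather than objects from the code above.

```r
library(arrow)

# A three-row stand-in for the inventory data.
inventory_small <- data.frame(
  ItemName = c("Smartphone", "Jeans", "Basketball"),
  Price = c(699.99, 49.99, 29.99),
  VariationDetails = c(
    "Color: Black, Storage: 64GB",
    "Color: Dark Blue, Size: 32",
    "Size: Size 7, Color: Orange"
  ),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the sketch does not depend on a local path.
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(inventory_small, parquet_path)

# Read back only the two columns needed for a price summary.
prices <- read_parquet(parquet_path, col_select = c("ItemName", "Price"))
print(mean(prices$Price))
```

On a table this small the saving is negligible; the benefit of selective reads grows with the number of columns and rows.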
JSON is human-readable, supports nested structures, and is widely used in web applications, though as a text format it produces larger files than a binary format and isn't optimized for analytics. HTML is well suited to visual presentation and displays easily in browsers, but it is verbose and poorly suited to data analysis. XML supports hierarchical data and has a well-defined structure, but it is verbose and less human-readable. Parquet is efficient for large datasets, supports nested structures, and is optimized for analytics, but it is not human-readable and requires specific libraries. If the goal is efficient data storage and analytics, Parquet is the best choice: its performance on large datasets and complex structures outweighs its lack of human readability, which makes it well suited to inventory analysis and other data-driven tasks.
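As an illustration of the nested structures JSON supports, a Variation Details string such as "Color: Black, Storage: 64GB" could be split into key/value attributes and stored as a nested object instead of a flat string. The sketch below assumes every detail follows the "Key: Value, Key: Value" pattern seen in this dataset; `parse_details` is a hypothetical helper and not part of jsonlite.

```r
library(jsonlite)

# Hypothetical helper: turn "Key: Value, Key: Value" into a named list.
parse_details <- function(details) {
  # Split into "Key: Value" pairs, then split each pair into key and value.
  pairs <- strsplit(strsplit(details, ", ", fixed = TRUE)[[1]], ": ", fixed = TRUE)
  values <- vapply(pairs, function(p) p[2], character(1))
  names(values) <- vapply(pairs, function(p) p[1], character(1))
  as.list(values)
}

variation <- list(
  VariationID = "101-A",
  Attributes = parse_details("Color: Black, Storage: 64GB")
)

# auto_unbox = TRUE writes length-1 vectors as scalars instead of arrays.
print(toJSON(variation, auto_unbox = TRUE, pretty = TRUE))
```

A nested layout like this lets a consumer filter on an attribute such as Color without parsing the string again.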