R Markdown


**Data 607 Assignment: Working with JSON, HTML, XML, and Parquet in R**

You have received the following data from CUNYMart, located at 123 Example Street, Anytown, USA.

Category,Item Name,Item ID,Brand,Price,Variation ID,Variation Details
Electronics,Smartphone,101,TechBrand,699.99,101-A,Color: Black, Storage: 64GB
Electronics,Smartphone,101,TechBrand,699.99,101-B,Color: White, Storage: 128GB
Electronics,Laptop,102,CompuBrand,1099.99,102-A,Color: Silver, Storage: 256GB
Electronics,Laptop,102,CompuBrand,1099.99,102-B,Color: Space Gray, Storage: 512GB
Home Appliances,Refrigerator,201,HomeCool,899.99,201-A,Color: Stainless Steel, Capacity: 20 cu ft
Home Appliances,Refrigerator,201,HomeCool,899.99,201-B,Color: White, Capacity: 18 cu ft
Home Appliances,Washing Machine,202,CleanTech,499.99,202-A,Type: Front Load, Capacity: 4.5 cu ft
Home Appliances,Washing Machine,202,CleanTech,499.99,202-B,Type: Top Load, Capacity: 5.0 cu ft
Clothing,T-Shirt,301,FashionCo,19.99,301-A,Color: Blue, Size: S
Clothing,T-Shirt,301,FashionCo,19.99,301-B,Color: Red, Size: M
Clothing,T-Shirt,301,FashionCo,19.99,301-C,Color: Green, Size: L
Clothing,Jeans,302,DenimWorks,49.99,302-A,Color: Dark Blue, Size: 32
Clothing,Jeans,302,DenimWorks,49.99,302-B,Color: Light Blue, Size: 34
Books,Fiction Novel,401,-,14.99,401-A,Format: Hardcover, Language: English
Books,Fiction Novel,401,-,14.99,401-B,Format: Paperback, Language: Spanish
Books,Non-Fiction Guide,402,-,24.99,402-A,Format: eBook, Language: English
Books,Non-Fiction Guide,402,-,24.99,402-B,Format: Paperback, Language: French
Sports Equipment,Basketball,501,SportsGear,29.99,501-A,Size: Size 7, Color: Orange
Sports Equipment,Tennis Racket,502,RacketPro,89.99,502-A,Material: Graphite, Color: Black
Sports Equipment,Tennis Racket,502,RacketPro,89.99,502-B,Material: Aluminum, Color: Silver

This data will be used for inventory analysis at the retailer. You are required to prepare the data for analysis by formatting it in JSON, HTML, XML, and Parquet. Additionally, provide the pros and cons of each format. You must include R code for generating and importing the data into R.
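The rows above contain an unquoted comma inside the last field (for example `Color: Black, Storage: 64GB`), so a plain CSV reader would see eight fields instead of seven. The following is a minimal sketch, in base R, of one way to load such rows: split on commas and glue everything after the sixth comma back together. The helper name `split_row` and the two sample rows are illustrative assumptions, not part of the assignment.

```r
# Two sample rows copied from the assignment text.
raw_lines <- c(
  "Electronics,Smartphone,101,TechBrand,699.99,101-A,Color: Black, Storage: 64GB",
  "Clothing,Jeans,302,DenimWorks,49.99,302-A,Color: Dark Blue, Size: 32"
)

# Hypothetical helper: keep the first six fields as they are and merge the
# remaining pieces into a single "Variation Details" field.
# Assumes every row has at least `n_fields` comma-separated pieces.
split_row <- function(line, n_fields = 7) {
  pieces <- strsplit(line, ",", fixed = TRUE)[[1]]
  c(pieces[seq_len(n_fields - 1)],
    paste(pieces[n_fields:length(pieces)], collapse = ","))
}

# Stack the parsed rows into a character matrix, then into a data frame.
inventory <- as.data.frame(do.call(rbind, lapply(raw_lines, split_row)),
                           stringsAsFactors = FALSE)
names(inventory) <- c("Category", "Item Name", "Item ID", "Brand",
                      "Price", "Variation ID", "Variation Details")

# Convert the numeric columns from text.
inventory$Price <- as.numeric(inventory$Price)
inventory[["Item ID"]] <- as.integer(inventory[["Item ID"]])

inventory
```

The same data frame can then serve as the starting point for each export format.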

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(XML)
library(rvest)
## 
## Attaching package: 'rvest'
## 
## The following object is masked from 'package:readr':
## 
##     guess_encoding
library(RCurl)
## 
## Attaching package: 'RCurl'
## 
## The following object is masked from 'package:tidyr':
## 
##     complete
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:purrr':
## 
##     flatten
library(httr)
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## 
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following object is masked from 'package:purrr':
## 
##     compact
library(xml2)

# Create an HTML file:

|  | Category | Item Name | Item ID | Brand | Price | Variation ID | Color | Variation Details |
|---|---|---|---|---|---|---|---|---|
| 1 | Electronics | Smartphone | 101 | TechBrand | 699.99 | 101-A | Color: Black | Storage: 64GB |
| 2 | Electronics | Smartphone | 101 | TechBrand | 699.99 | 101-B | Color: White | Storage: 128GB |
| 3 | Electronics | Laptop | 102 | CompuBrand | 1099.99 | 102-A | Color: Silver | Storage: 256GB |
| 4 | Electronics | Laptop | 102 | CompuBrand | 1099.99 | 102-B | Color: Space Gray | Storage: 512GB |
| 5 | Home Appliances | Refrigerator | 201 | HomeCool | 899.99 | 201-A | Color: Stainless Steel | Capacity: 20 cu ft |
| 6 | Home Appliances | Refrigerator | 201 | HomeCool | 899.99 | 201-B | Color: White | Capacity: 18 cu ft |
| 7 | Home Appliances | Washing Machine | 202 | CleanTech | 499.99 | 202-A | Type: Front Load | Capacity: 4.5 cu ft |
| 8 | Home Appliances | Washing Machine | 202 | CleanTech | 499.99 | 202-B | Type: Top Load | Capacity: 5.0 cu ft |
| 9 | Clothing | T-Shirt | 301 | FashionCo | 19.99 | 301-A | Color: Blue | Size: S |
| 10 | Clothing | T-Shirt | 301 | FashionCo | 19.99 | 301-B | Color: Red | Size: M |
| 11 | Clothing | T-Shirt | 301 | FashionCo | 19.99 | 301-C | Color: Green | Size: L |
| 12 | Clothing | Jeans | 302 | DenimWorks | 49.99 | 302-A | Color: Dark Blue | Size: 32 |
| 13 | Clothing | Jeans | 302 | DenimWorks | 49.99 | 302-B | Color: Light Blue | Size: 34 |
| 14 | Books | Fiction Novel | 401 | - | 14.99 | 401-A | Format: Hardcover | Language: English |
| 15 | Books | Fiction Novel | 401 | - | 14.99 | 401-B | Format: Paperback | Language: Spanish |
| 16 | Books | Non-Fiction Guide | 402 | - | 24.99 | 402-A | Format: eBook | Language: English |
| 17 | Books | Non-Fiction Guide | 402 | - | 24.99 | 402-B | Format: Paperback | Language: French |
| 18 | Sports Equipment | Basketball | 501 | SportsGear | 29.99 | 501-A | Size: Size 7 | Color: Orange |
| 19 | Sports Equipment | Tennis Racket | 502 | RacketPro | 89.99 | 502-A | Material: Graphite | Color: Black |
| 20 | Sports Equipment | Tennis Racket | 502 | RacketPro | 89.99 | 502-B | Material: Aluminum | Color: Silver |

url <- getURL('https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/week7Assignment.html')
data_HTML <- url %>%
  read_html(encoding = 'UTF-8') %>%
  html_table(header = NA, trim = TRUE) %>%
  .[[1]]

data_HTML
## # A tibble: 20 × 9
##       `` Category        `Item Name` `Item ID` Brand  Price `Variation ID` Color
##    <int> <chr>           <chr>           <int> <chr>  <dbl> <chr>          <chr>
##  1     1 Electronics     Smartphone        101 Tech…  700.  101-A          Black
##  2     2 Electronics     Smartphone        101 Tech…  700.  101-B          White
##  3     3 Electronics     Laptop            102 Comp… 1100.  102-A          Silv…
##  4     4 Electronics     Laptop            102 Comp… 1100.  102-B          Spac…
##  5     5 Home Appliances Refrigerat…       201 Home…  900.  201-A          Stai…
##  6     6 Home Appliances Refrigerat…       201 Home…  900.  201-B          White
##  7     7 Home Appliances Washing Ma…       202 Clea…  500.  202-A          Fron…
##  8     8 Home Appliances Washing Ma…       202 Clea…  500.  202-B          Type…
##  9     9 Clothing        T-Shirt           301 Fash…   20.0 301-A          Blue 
## 10    10 Clothing        T-Shirt           301 Fash…   20.0 301-B          Red  
## 11    11 Clothing        T-Shirt           301 Fash…   20.0 301-C          Green
## 12    12 Clothing        Jeans             302 Deni…   50.0 302-A          Dark…
## 13    13 Clothing        Jeans             302 Deni…   50.0 302-B          Ligh…
## 14    14 Books           Fiction No…       401 -       15.0 401-A          Hard…
## 15    15 Books           Fiction No…       401 -       15.0 401-B          Pape…
## 16    16 Books           Non-Fictio…       402 -       25.0 402-A          eBook
## 17    17 Books           Non-Fictio…       402 -       25.0 402-B          Form…
## 18    18 Sports Equipme… Basketball        501 Spor…   30.0 501-A          Size…
## 19    19 Sports Equipme… Tennis Rac…       502 Rack…   90.0 502-A          Black
## 20    20 Sports Equipme… Tennis Rac…       502 Rack…   90.0 502-B          Silv…
## # ℹ 1 more variable: `Variation Details` <chr>
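
The import above reads an HTML file that already exists. A minimal sketch of the generating side is shown here, under the assumption that a plain `<table>` with no styling is enough: it builds the markup with `paste0()`, writes it to a temporary file, and reads it back with `rvest`. The toy data frame, the helper `to_cells`, and the temporary path are illustrative assumptions, and the sketch does no HTML escaping, so it only suits values without `<`, `>`, or `&`.

```r
library(rvest)

# Toy data frame standing in for the full inventory (assumed values).
inventory <- data.frame(
  Category = c("Electronics", "Clothing"),
  Item     = c("Smartphone", "Jeans"),
  Price    = c(699.99, 49.99),
  stringsAsFactors = FALSE
)

# Hypothetical helper: wrap each value in <th> or <td> tags and join them.
to_cells <- function(values, tag) {
  paste0("<", tag, ">", values, "</", tag, ">", collapse = "")
}

# One header row, then one <tr> per data row.
header_row <- paste0("<tr>", to_cells(names(inventory), "th"), "</tr>")
body_rows <- vapply(seq_len(nrow(inventory)), function(i) {
  values <- vapply(inventory, function(col) as.character(col[i]), character(1))
  paste0("<tr>", to_cells(values, "td"), "</tr>")
}, character(1))

html_doc <- paste0("<table>", header_row,
                   paste(body_rows, collapse = ""), "</table>")

# Write the file, then import it again the same way a downloaded file is read.
path <- tempfile(fileext = ".html")
writeLines(html_doc, path)
round_trip <- html_table(read_html(path))[[1]]

round_trip
```

Because `html_table()` converts column types by default, `Price` should come back as a number rather than text.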
# Rename all nine columns in one assignment
names(data_HTML) <- c('x', 'catagory', 'itemName', 'itemID', 'brand',
                      'price', 'variationID', 'color', 'variationDetails')
data_HTML
## # A tibble: 20 × 9
##        x catagory         itemName         itemID brand  price variationID color
##    <int> <chr>            <chr>             <int> <chr>  <dbl> <chr>       <chr>
##  1     1 Electronics      Smartphone          101 Tech…  700.  101-A       Black
##  2     2 Electronics      Smartphone          101 Tech…  700.  101-B       White
##  3     3 Electronics      Laptop              102 Comp… 1100.  102-A       Silv…
##  4     4 Electronics      Laptop              102 Comp… 1100.  102-B       Spac…
##  5     5 Home Appliances  Refrigerator        201 Home…  900.  201-A       Stai…
##  6     6 Home Appliances  Refrigerator        201 Home…  900.  201-B       White
##  7     7 Home Appliances  Washing Machine     202 Clea…  500.  202-A       Fron…
##  8     8 Home Appliances  Washing Machine     202 Clea…  500.  202-B       Type…
##  9     9 Clothing         T-Shirt             301 Fash…   20.0 301-A       Blue 
## 10    10 Clothing         T-Shirt             301 Fash…   20.0 301-B       Red  
## 11    11 Clothing         T-Shirt             301 Fash…   20.0 301-C       Green
## 12    12 Clothing         Jeans               302 Deni…   50.0 302-A       Dark…
## 13    13 Clothing         Jeans               302 Deni…   50.0 302-B       Ligh…
## 14    14 Books            Fiction Novel       401 -       15.0 401-A       Hard…
## 15    15 Books            Fiction Novel       401 -       15.0 401-B       Pape…
## 16    16 Books            Non-Fiction Gui…    402 -       25.0 402-A       eBook
## 17    17 Books            Non-Fiction Gui…    402 -       25.0 402-B       Form…
## 18    18 Sports Equipment Basketball          501 Spor…   30.0 501-A       Size…
## 19    19 Sports Equipment Tennis Racket       502 Rack…   90.0 502-A       Black
## 20    20 Sports Equipment Tennis Racket       502 Rack…   90.0 502-B       Silv…
## # ℹ 1 more variable: variationDetails <chr>
# Drop incomplete rows, treat empty color cells as missing, and fill them downward
data_HTML <- na.omit(data_HTML) %>%
  mutate(color = na_if(color, '')) %>%
  fill(color, .direction = 'down') %>%
  # Replace the 'Size: S' and 'Size: M' entries with 'Storage: 64GB'
  mutate(variationDetails = str_replace(variationDetails, 'Size: S', 'Storage: 64GB'),
         variationDetails = str_replace(variationDetails, 'Size: M', 'Storage: 64GB')) %>%
  group_by(itemName)

# Print the cleaned tibble
data_HTML
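
The cleaning step relies on two calls working together: `na_if()` turns empty strings into `NA`, and `fill()` then copies the last non-missing value downward. A small sketch on a made-up data frame (the column values are assumptions chosen only to show the effect):

```r
library(dplyr)
library(tidyr)

# Toy data: the second row has an empty color cell.
toy <- data.frame(
  item  = c("A", "A", "B"),
  color = c("Black", "", "Blue"),
  stringsAsFactors = FALSE
)

cleaned <- toy %>%
  mutate(color = na_if(color, "")) %>%   # "" becomes NA
  fill(color, .direction = "down")       # NA takes the value from the row above

cleaned$color
```

Without the `na_if()` step, `fill()` would leave the empty string untouched, since it only replaces `NA` values.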

JSON:

[
  {"Category": " Electronics", "Item Name": " Smartphone", "Item ID": "101", "Brand": " TechBrand", "Price": "699.99", " Variation ID": " 101-A", " Color": " Color: Black", " Variation Details": " Storage: 64GB"},
  {"Category": " Electronics", "Item Name": "Smartphone", "Item ID": "101", "Brand": "TechBrand", "Price": "699.99", " Variation ID": "101-B", " Color": "Color: White", " Variation Details": " Storage: 128GB"},
  {"Category": " Electronics", "Item Name": "Laptop", "Item ID": "102", "Brand": "CompuBrand", "Price": "1099.99", " Variation ID": "102-A", " Color": "Color: Silver", " Variation Details": " Storage: 256GB"},
  {"Category": " Electronics", "Item Name": "Laptop", "Item ID": "102", "Brand": "CompuBrand", "Price": "1099.99", " Variation ID": "102-B", " Color": "Color: Space Gray", " Variation Details": " Storage: 512GB"},
  {"Category": " Home Appliances", "Item Name": "Refrigerator", "Item ID": "201", "Brand": "HomeCool", "Price": "899.99", " Variation ID": "201-A", " Color": "Color: Stainless Steel", " Variation Details": " Capacity: 20 cu ft"},
  {"Category": " Home Appliances", "Item Name": "Refrigerator", "Item ID": "201", "Brand": "HomeCool", "Price": "899.99", " Variation ID": "201-B", " Color": "Color: White", " Variation Details": " Capacity: 18 cu ft"},
  {"Category": " Home Appliances", "Item Name": "Washing Machine", "Item ID": "202", "Brand": "CleanTech", "Price": "499.99", " Variation ID": "202-A", " Color": "Type: Front Load", " Variation Details": " Capacity: 4.5 cu ft"},
  {"Category": " Home Appliances", "Item Name": "Washing Machine", "Item ID": "202", "Brand": "CleanTech", "Price": "499.99", " Variation ID": "202-B", " Color": "Type: Top Load", " Variation Details": " Capacity: 5.0 cu ft"},
  {"Category": " Clothing", "Item Name": "T-Shirt", "Item ID": "301", "Brand": "FashionCo", "Price": "19.99", " Variation ID": "301-A", " Color": "Color: Blue", " Variation Details": " Size: S"},
  {"Category": " Clothing", "Item Name": "T-Shirt", "Item ID": "301", "Brand": "FashionCo", "Price": "19.99", " Variation ID": "301-B", " Color": "Color: Red", " Variation Details": " Size: M"},
  {"Category": " Clothing", "Item Name": "T-Shirt", "Item ID": "301", "Brand": "FashionCo", "Price": "19.99", " Variation ID": "301-C", " Color": "Color: Green", " Variation Details": " Size: L"},
  {"Category": " Clothing", "Item Name": "Jeans", "Item ID": "302", "Brand": "DenimWorks", "Price": "49.99", " Variation ID": "302-A", " Color": "Color: Dark Blue", " Variation Details": " Size: 32"},
  {"Category": " Clothing", "Item Name": "Jeans", "Item ID": "302", "Brand": "DenimWorks", "Price": "49.99", " Variation ID": "302-B", " Color": "Color: Light Blue", " Variation Details": " Size: 34"},
  {"Category": " Books", "Item Name": "Fiction Novel", "Item ID": "401", "Brand": "-", "Price": "14.99", " Variation ID": "401-A", " Color": "Format: Hardcover", " Variation Details": " Language: English"},
  {"Category": " Books", "Item Name": "Fiction Novel", "Item ID": "401", "Brand": "-", "Price": "14.99", " Variation ID": "401-B", " Color": "Format: Paperback", " Variation Details": " Language: Spanish"},
  {"Category": " Books", "Item Name": "Non-Fiction Guide", "Item ID": "402", "Brand": "-", "Price": "24.99", " Variation ID": "402-A", " Color": "Format: eBook", " Variation Details": " Language: English"},
  {"Category": " Books", "Item Name": "Non-Fiction Guide", "Item ID": "402", "Brand": "-", "Price": "24.99", " Variation ID": "402-B", " Color": "Format: Paperback", " Variation Details": " Language: French"},
  {"Category": " Sports Equipment", "Item Name": "Basketball", "Item ID": "501", "Brand": "SportsGear", "Price": "29.99", " Variation ID": "501-A", " Color": "Size: Size 7", " Variation Details": " Color: Orange"},
  {"Category": " Sports Equipment", "Item Name": "Tennis Racket", "Item ID": "502", "Brand": "RacketPro", "Price": "89.99", " Variation ID": "502-A", " Color": "Material: Graphite", " Variation Details": " Color: Black"},
  {"Category": " Sports Equipment", "Item Name": "Tennis Racket", "Item ID": "502", "Brand": "RacketPro", "Price": "89.99", " Variation ID": "502-B", " Color": "Material: Aluminum", " Variation Details": " Color: Silver"}
]
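
The JSON above was prepared as a file. As a hedged sketch of how such a file could instead be generated from R, the example below writes a small data frame with `jsonlite::write_json()` and reads it back with `fromJSON()`. The toy values and the temporary path are assumptions for illustration; unlike the file above, numbers are stored as JSON numbers rather than strings.

```r
library(jsonlite)

# Toy data frame standing in for the full inventory (assumed values).
inventory <- data.frame(
  Category  = c("Electronics", "Clothing"),
  `Item ID` = c(101L, 302L),
  Price     = c(699.99, 49.99),
  check.names = FALSE,          # keep the space in "Item ID"
  stringsAsFactors = FALSE
)

# Write an array of objects, one object per row.
path <- tempfile(fileext = ".json")
write_json(inventory, path, pretty = TRUE)

# Read the file back; an array of flat objects simplifies to a data frame.
round_trip <- fromJSON(path)

str(round_trip)
```

Storing `Price` and `Item ID` as numbers avoids the later type conversion that string-valued JSON requires.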

dataJson <- read_json("https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/week7Assignment.json")
 
dataJson
## [[1]]
## [[1]]$Category
## [1] " Electronics"
## 
## [[1]]$`Item Name`
## [1] " Smartphone"
## 
## [[1]]$`Item ID`
## [1] "101"
## 
## [[1]]$Brand
## [1] " TechBrand"
## 
## [[1]]$Price
## [1] "699.99"
## 
## [[1]]$` Variation ID`
## [1] " 101-A"
## 
## [[1]]$` Color`
## [1] "Black"
## 
## [[1]]$` Variation Details`
## [1] "  Storage: 64GB"
## 
## 
## [[2]]
## [[2]]$Category
## [1] " Electronics"
## 
## [[2]]$`Item Name`
## [1] "Smartphone"
## 
## [[2]]$`Item ID`
## [1] "101"
## 
## [[2]]$Brand
## [1] "TechBrand"
## 
## [[2]]$Price
## [1] "699.99"
## 
## [[2]]$` Variation ID`
## [1] "101-B"
## 
## [[2]]$` Color`
## [1] "White"
## 
## [[2]]$` Variation Details`
## [1] " Storage: 128GB"
## 
## 
## [[3]]
## [[3]]$Category
## [1] " Electronics"
## 
## [[3]]$`Item Name`
## [1] "Laptop"
## 
## [[3]]$`Item ID`
## [1] "102"
## 
## [[3]]$Brand
## [1] "CompuBrand"
## 
## [[3]]$Price
## [1] "1099.99"
## 
## [[3]]$` Variation ID`
## [1] "102-A"
## 
## [[3]]$` Color`
## [1] "Silver"
## 
## [[3]]$` Variation Details`
## [1] " Storage: 256GB"
## 
## 
## [[4]]
## [[4]]$Category
## [1] " Electronics"
## 
## [[4]]$`Item Name`
## [1] "Laptop"
## 
## [[4]]$`Item ID`
## [1] "102"
## 
## [[4]]$Brand
## [1] "CompuBrand"
## 
## [[4]]$Price
## [1] "1099.99"
## 
## [[4]]$` Variation ID`
## [1] "102-B"
## 
## [[4]]$` Color`
## [1] "Space Gray"
## 
## [[4]]$` Variation Details`
## [1] " Storage: 512GB"
## 
## 
## [[5]]
## [[5]]$Category
## [1] " Home Appliances"
## 
## [[5]]$`Item Name`
## [1] "Refrigerator"
## 
## [[5]]$`Item ID`
## [1] "201"
## 
## [[5]]$Brand
## [1] "HomeCool"
## 
## [[5]]$Price
## [1] "899.99"
## 
## [[5]]$` Variation ID`
## [1] "201-A"
## 
## [[5]]$` Color`
## [1] "Stainless Steel"
## 
## [[5]]$` Variation Details`
## [1] " Capacity: 20 cu ft"
## 
## 
## [[6]]
## [[6]]$Category
## [1] " Home Appliances"
## 
## [[6]]$`Item Name`
## [1] "Refrigerator"
## 
## [[6]]$`Item ID`
## [1] "201"
## 
## [[6]]$Brand
## [1] "HomeCool"
## 
## [[6]]$Price
## [1] "899.99"
## 
## [[6]]$` Variation ID`
## [1] "201-B"
## 
## [[6]]$` Color`
## [1] "White"
## 
## [[6]]$` Variation Details`
## [1] " Capacity: 18 cu ft"
## 
## 
## [[7]]
## [[7]]$Category
## [1] " Home Appliances"
## 
## [[7]]$`Item Name`
## [1] "Washing Machine"
## 
## [[7]]$`Item ID`
## [1] "202"
## 
## [[7]]$Brand
## [1] "CleanTech"
## 
## [[7]]$Price
## [1] "499.99"
## 
## [[7]]$` Variation ID`
## [1] "202-A"
## 
## [[7]]$` Color`
## [1] "Front Load"
## 
## [[7]]$` Variation Details`
## [1] " Capacity: 4.5 cu ft"
## 
## 
## [[8]]
## [[8]]$Category
## [1] " Home Appliances"
## 
## [[8]]$`Item Name`
## [1] "Washing Machine"
## 
## [[8]]$`Item ID`
## [1] "202"
## 
## [[8]]$Brand
## [1] "CleanTech"
## 
## [[8]]$Price
## [1] "499.99"
## 
## [[8]]$` Variation ID`
## [1] "202-B"
## 
## [[8]]$` Color`
## [1] "Top Load"
## 
## [[8]]$` Variation Details`
## [1] " Capacity: 5.0 cu ft"
## 
## 
## [[9]]
## [[9]]$Category
## [1] " Clothing"
## 
## [[9]]$`Item Name`
## [1] "T-Shirt"
## 
## [[9]]$`Item ID`
## [1] "301"
## 
## [[9]]$Brand
## [1] "FashionCo"
## 
## [[9]]$Price
## [1] "19.99"
## 
## [[9]]$` Variation ID`
## [1] "301-A"
## 
## [[9]]$` Color`
## [1] "Blue"
## 
## [[9]]$` Variation Details`
## [1] " Size: S"
## 
## 
## [[10]]
## [[10]]$Category
## [1] " Clothing"
## 
## [[10]]$`Item Name`
## [1] "T-Shirt"
## 
## [[10]]$`Item ID`
## [1] "301"
## 
## [[10]]$Brand
## [1] "FashionCo"
## 
## [[10]]$Price
## [1] "19.99"
## 
## [[10]]$` Variation ID`
## [1] "301-B"
## 
## [[10]]$` Color`
## [1] "Red"
## 
## [[10]]$` Variation Details`
## [1] " Size: M"
## 
## 
## [[11]]
## [[11]]$Category
## [1] " Clothing"
## 
## [[11]]$`Item Name`
## [1] "T-Shirt"
## 
## [[11]]$`Item ID`
## [1] "301"
## 
## [[11]]$Brand
## [1] "FashionCo"
## 
## [[11]]$Price
## [1] "19.99"
## 
## [[11]]$` Variation ID`
## [1] "301-C"
## 
## [[11]]$` Color`
## [1] "Green"
## 
## [[11]]$` Variation Details`
## [1] " Size: L"
## 
## 
## [[12]]
## [[12]]$Category
## [1] " Clothing"
## 
## [[12]]$`Item Name`
## [1] "Jeans"
## 
## [[12]]$`Item ID`
## [1] "302"
## 
## [[12]]$Brand
## [1] "DenimWorks"
## 
## [[12]]$Price
## [1] "49.99"
## 
## [[12]]$` Variation ID`
## [1] "302-A"
## 
## [[12]]$` Color`
## [1] "Dark Blue"
## 
## [[12]]$` Variation Details`
## [1] " Size: 32"
## 
## 
## [[13]]
## [[13]]$Category
## [1] " Clothing"
## 
## [[13]]$`Item Name`
## [1] "Jeans"
## 
## [[13]]$`Item ID`
## [1] "302"
## 
## [[13]]$Brand
## [1] "DenimWorks"
## 
## [[13]]$Price
## [1] "49.99"
## 
## [[13]]$` Variation ID`
## [1] "302-B"
## 
## [[13]]$` Color`
## [1] "Light Blue"
## 
## [[13]]$` Variation Details`
## [1] " Size: 34"
## 
## 
## [[14]]
## [[14]]$Category
## [1] " Books"
## 
## [[14]]$`Item Name`
## [1] "Fiction Novel"
## 
## [[14]]$`Item ID`
## [1] "401"
## 
## [[14]]$Brand
## [1] "-"
## 
## [[14]]$Price
## [1] "14.99"
## 
## [[14]]$` Variation ID`
## [1] "401-A"
## 
## [[14]]$` Color`
## [1] "Hardcover"
## 
## [[14]]$` Variation Details`
## [1] " Language: English"
## 
## 
## [[15]]
## [[15]]$Category
## [1] " Books"
## 
## [[15]]$`Item Name`
## [1] "Fiction Novel"
## 
## [[15]]$`Item ID`
## [1] "401"
## 
## [[15]]$Brand
## [1] "-"
## 
## [[15]]$Price
## [1] "14.99"
## 
## [[15]]$` Variation ID`
## [1] "401-B"
## 
## [[15]]$` Color`
## [1] "Paperback"
## 
## [[15]]$` Variation Details`
## [1] " Language: Spanish"
## 
## 
## [[16]]
## [[16]]$Category
## [1] " Books"
## 
## [[16]]$`Item Name`
## [1] "Non-Fiction Guide"
## 
## [[16]]$`Item ID`
## [1] "402"
## 
## [[16]]$Brand
## [1] "-"
## 
## [[16]]$Price
## [1] "24.99"
## 
## [[16]]$` Variation ID`
## [1] "402-A"
## 
## [[16]]$` Color`
## [1] "eBook"
## 
## [[16]]$` Variation Details`
## [1] " Language: English"
## 
## 
## [[17]]
## [[17]]$Category
## [1] " Books"
## 
## [[17]]$`Item Name`
## [1] "Non-Fiction Guide"
## 
## [[17]]$`Item ID`
## [1] "402"
## 
## [[17]]$Brand
## [1] "-"
## 
## [[17]]$Price
## [1] "24.99"
## 
## [[17]]$` Variation ID`
## [1] "402-B"
## 
## [[17]]$` Color`
## [1] "Paperback"
## 
## [[17]]$` Variation Details`
## [1] " Language: French"
## 
## 
## [[18]]
## [[18]]$Category
## [1] " Sports Equipment"
## 
## [[18]]$`Item Name`
## [1] "Basketball"
## 
## [[18]]$`Item ID`
## [1] "501"
## 
## [[18]]$Brand
## [1] "SportsGear"
## 
## [[18]]$Price
## [1] "29.99"
## 
## [[18]]$` Variation ID`
## [1] "501-A"
## 
## [[18]]$` Color`
## [1] "Orange"
## 
## [[18]]$` Variation Details`
## [1] "Size: Size 7"
## 
## 
## [[19]]
## [[19]]$Category
## [1] " Sports Equipment"
## 
## [[19]]$`Item Name`
## [1] "Tennis Racket"
## 
## [[19]]$`Item ID`
## [1] "502"
## 
## [[19]]$Brand
## [1] "RacketPro"
## 
## [[19]]$Price
## [1] "89.99"
## 
## [[19]]$` Variation ID`
## [1] "502-A"
## 
## [[19]]$` Color`
## [1] "Black"
## 
## [[19]]$` Variation Details`
## [1] "Material: Graphite"
## 
## 
## [[20]]
## [[20]]$Category
## [1] " Sports Equipment"
## 
## [[20]]$`Item Name`
## [1] "Tennis Racket"
## 
## [[20]]$`Item ID`
## [1] "502"
## 
## [[20]]$Brand
## [1] "RacketPro"
## 
## [[20]]$Price
## [1] "89.99"
## 
## [[20]]$` Variation ID`
## [1] "502-B"
## 
## [[20]]$` Color`
## [1] "Silver "
## 
## [[20]]$` Variation Details`
## [1] "Material: Aluminum"
json_dirty <- sapply(dataJson, `[`)
knitr::kable(json_dirty)
|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Category | Electronics | Electronics | Electronics | Electronics | Home Appliances | Home Appliances | Home Appliances | Home Appliances | Clothing | Clothing | Clothing | Clothing | Clothing | Books | Books | Books | Books | Sports Equipment | Sports Equipment | Sports Equipment |
| Item Name | Smartphone | Smartphone | Laptop | Laptop | Refrigerator | Refrigerator | Washing Machine | Washing Machine | T-Shirt | T-Shirt | T-Shirt | Jeans | Jeans | Fiction Novel | Fiction Novel | Non-Fiction Guide | Non-Fiction Guide | Basketball | Tennis Racket | Tennis Racket |
| Item ID | 101 | 101 | 102 | 102 | 201 | 201 | 202 | 202 | 301 | 301 | 301 | 302 | 302 | 401 | 401 | 402 | 402 | 501 | 502 | 502 |
| Brand | TechBrand | TechBrand | CompuBrand | CompuBrand | HomeCool | HomeCool | CleanTech | CleanTech | FashionCo | FashionCo | FashionCo | DenimWorks | DenimWorks | - | - | - | - | SportsGear | RacketPro | RacketPro |
| Price | 699.99 | 699.99 | 1099.99 | 1099.99 | 899.99 | 899.99 | 499.99 | 499.99 | 19.99 | 19.99 | 19.99 | 49.99 | 49.99 | 14.99 | 14.99 | 24.99 | 24.99 | 29.99 | 89.99 | 89.99 |
| Variation ID | 101-A | 101-B | 102-A | 102-B | 201-A | 201-B | 202-A | 202-B | 301-A | 301-B | 301-C | 302-A | 302-B | 401-A | 401-B | 402-A | 402-B | 501-A | 502-A | 502-B |
| Color | Black | White | Silver | Space Gray | Stainless Steel | White | Front Load | Top Load | Blue | Red | Green | Dark Blue | Light Blue | Hardcover | Paperback | eBook | Paperback | Orange | Black | Silver |
| Variation Details | Storage: 64GB | Storage: 128GB | Storage: 256GB | Storage: 512GB | Capacity: 20 cu ft | Capacity: 18 cu ft | Capacity: 4.5 cu ft | Capacity: 5.0 cu ft | Size: S | Size: M | Size: L | Size: 32 | Size: 34 | Language: English | Language: Spanish | Language: English | Language: French | Size: Size 7 | Material: Graphite | Material: Aluminum |

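
The table above has one row per field and one column per record, which is the transpose of the usual analysis layout. A minimal base R sketch of an alternative, assuming every record is a flat list with the same fields: flatten each record to a named character vector and stack the vectors as rows. The two sample records are made-up stand-ins for the parsed JSON list.

```r
# Two records shaped like the elements of the parsed JSON list (assumed values).
records <- list(
  list(Category = "Electronics", Price = "699.99"),
  list(Category = "Clothing",    Price = "49.99")
)

# unlist() turns each record into a named character vector;
# rbind() stacks them so that each record becomes one row.
by_row <- as.data.frame(do.call(rbind, lapply(records, unlist)),
                        stringsAsFactors = FALSE)

by_row
```

This keeps every value as text, so numeric columns still need an explicit conversion afterwards.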
url <- getURL("https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/week7Assignment.json")
dataJson <- url %>%
fromJSON() %>%
as.data.frame()  
dataJson
##             Category         Item Name Item ID      Brand   Price  Variation ID
## 1        Electronics        Smartphone     101  TechBrand  699.99         101-A
## 2        Electronics        Smartphone     101  TechBrand  699.99         101-B
## 3        Electronics            Laptop     102 CompuBrand 1099.99         102-A
## 4        Electronics            Laptop     102 CompuBrand 1099.99         102-B
## 5    Home Appliances      Refrigerator     201   HomeCool  899.99         201-A
## 6    Home Appliances      Refrigerator     201   HomeCool  899.99         201-B
## 7    Home Appliances   Washing Machine     202  CleanTech  499.99         202-A
## 8    Home Appliances   Washing Machine     202  CleanTech  499.99         202-B
## 9           Clothing           T-Shirt     301  FashionCo   19.99         301-A
## 10          Clothing           T-Shirt     301  FashionCo   19.99         301-B
## 11          Clothing           T-Shirt     301  FashionCo   19.99         301-C
## 12          Clothing             Jeans     302 DenimWorks   49.99         302-A
## 13          Clothing             Jeans     302 DenimWorks   49.99         302-B
## 14             Books     Fiction Novel     401          -   14.99         401-A
## 15             Books     Fiction Novel     401          -   14.99         401-B
## 16             Books Non-Fiction Guide     402          -   24.99         402-A
## 17             Books Non-Fiction Guide     402          -   24.99         402-B
## 18  Sports Equipment        Basketball     501 SportsGear   29.99         501-A
## 19  Sports Equipment     Tennis Racket     502  RacketPro   89.99         502-A
## 20  Sports Equipment     Tennis Racket     502  RacketPro   89.99         502-B
##              Color    Variation Details
## 1            Black        Storage: 64GB
## 2            White       Storage: 128GB
## 3           Silver       Storage: 256GB
## 4       Space Gray       Storage: 512GB
## 5  Stainless Steel   Capacity: 20 cu ft
## 6            White   Capacity: 18 cu ft
## 7       Front Load  Capacity: 4.5 cu ft
## 8         Top Load  Capacity: 5.0 cu ft
## 9             Blue              Size: S
## 10             Red              Size: M
## 11           Green              Size: L
## 12       Dark Blue             Size: 32
## 13      Light Blue             Size: 34
## 14       Hardcover    Language: English
## 15       Paperback    Language: Spanish
## 16           eBook    Language: English
## 17       Paperback     Language: French
## 18          Orange         Size: Size 7
## 19           Black   Material: Graphite
## 20         Silver    Material: Aluminum
str(dataJson)
## 'data.frame':    20 obs. of  8 variables:
##  $ Category          : chr  " Electronics" " Electronics" " Electronics" " Electronics" ...
##  $ Item Name         : chr  " Smartphone" "Smartphone" "Laptop" "Laptop" ...
##  $ Item ID           : chr  "101" "101" "102" "102" ...
##  $ Brand             : chr  " TechBrand" "TechBrand" "CompuBrand" "CompuBrand" ...
##  $ Price             : chr  "699.99" "699.99" "1099.99" "1099.99" ...
##  $  Variation ID     : chr  " 101-A" "101-B" "102-A" "102-B" ...
##  $  Color            : chr  "Black" "White" "Silver" "Space Gray" ...
##  $  Variation Details: chr  "  Storage: 64GB" " Storage: 128GB" " Storage: 256GB" " Storage: 512GB" ...
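The `str()` output above shows leading spaces in several values (" Electronics", "  Storage: 64GB") and in some column names (" Variation ID"). A minimal sketch of one way to strip them with base R follows. The `raw` data frame is a hypothetical two-row stand-in for `dataJson`, not the real import, and `clean` is an illustrative name.

```r
# Hypothetical miniature of dataJson: names and values carry stray spaces.
raw <- data.frame(
  "Category"           = c(" Electronics", " Electronics"),
  " Variation ID"      = c(" 101-A", "101-B"),
  " Variation Details" = c("  Storage: 64GB", " Storage: 128GB"),
  check.names = FALSE,        # keep the names exactly as written
  stringsAsFactors = FALSE
)

clean <- raw

# Trim whitespace from the column names.
names(clean) <- trimws(names(clean))

# Trim whitespace from every character column; leave other types untouched.
clean[] <- lapply(clean, function(col) {
  if (is.character(col)) trimws(col) else col
})

str(clean)
```

Trimming before any grouping or joining matters here because " Smartphone" and "Smartphone" would otherwise be treated as two different items.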

XML:

Electronics Smartphone 101 TechBrand 699.99 101-A Black Storage: 64GB Electronics Smartphone 101 TechBrand 699.99 101-B White Storage: 128GB Electronics Laptop 102 CompuBrand 1099.99 102-A Silver Storage: 256GB Electronics Laptop 102 CompuBrand 1099.99 102-B Space Gray Storage: 512GB Home Appliances Refrigerator 201 HomeCool 899.99 201-A Stainless Steel Capacity: 20 cu ft Home Appliances Refrigerator 201 HomeCool 899.99 201-B Color: White Capacity: 18 cu ft Home Appliances Washing Machine 202 CleanTech 499.99 202-A Type: Front Load Capacity: 4.5 cu ft Home Appliances Washing Machine 202 CleanTech 499.99 202-B Type: Top Load Capacity: 5.0 cu ft Clothing T-Shirt 301 FashionCo 19.99 301-A Color: Blue Size: S Clothing T-Shirt 301 FashionCo 19.99 301-B Color: Red Size: M Clothing T-Shirt 301 FashionCo 19.99 301-C Color: Green Size: L Clothing Jeans 302 DenimWorks 49.99 302-A Color: Dark Blue Size: 32 Clothing Jeans 302 DenimWorks 49.99 302-B Color: Light Blue Size: 34 Books Fiction Novel 401 - 14.99 401-A Format: Hardcover Language: English Books Fiction Novel 401 - 14.99 401-B Format: Paperback Language: Spanish Books Non-Fiction Guide 402 - 24.99 402-A Format: eBook Language: English Books Non-Fiction Guide 402 - 24.99 402-B Format: Paperback Language: French Sports Equipment Basketball 501 SportsGear 29.99 501-A Size: Size 7 Color: Orange Sports Equipment Tennis Racket 502 RacketPro 89.99 502-A Material: Graphite Color: Black Sports Equipment Tennis Racket 502 RacketPro 89.99 502-B Material: Aluminum Color: Silver

XmlUrl <- getURL('https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/week7.xml')
data_XML <- XmlUrl %>%
  xmlParse() %>%
  xmlRoot()
data_XML
## <CUNYMart>
##   <Category id="1">
##     <Category> Electronics</Category>
##     <ItemName> Smartphone</ItemName>
##     <ItemID>101</ItemID>
##     <Brand> TechBrand</Brand>
##     <Price>699.99</Price>
##     <VariationID> 101-A</VariationID>
##     <Color>Black</Color>
##     <VariationDetails>  Storage: 64GB</VariationDetails>
##   </Category>
##   <Category id="2">
##     <Category> Electronics</Category>
##     <ItemName>Smartphone</ItemName>
##     <ItemID>101</ItemID>
##     <Brand>TechBrand</Brand>
##     <Price>699.99</Price>
##     <VariationID>101-B</VariationID>
##     <Color> White</Color>
##     <VariationDetails> Storage: 128GB</VariationDetails>
##   </Category>
##   <Category id="3">
##     <Category> Electronics</Category>
##     <ItemName>Laptop</ItemName>
##     <ItemID>102</ItemID>
##     <Brand>CompuBrand</Brand>
##     <Price>1099.99</Price>
##     <VariationID>102-A</VariationID>
##     <Color>Silver</Color>
##     <VariationDetails> Storage: 256GB</VariationDetails>
##   </Category>
##   <Category id="4">
##     <Category> Electronics</Category>
##     <ItemName>Laptop</ItemName>
##     <ItemID>102</ItemID>
##     <Brand>CompuBrand</Brand>
##     <Price>1099.99</Price>
##     <VariationID>102-B</VariationID>
##     <Color>Space Gray</Color>
##     <VariationDetails> Storage: 512GB</VariationDetails>
##   </Category>
##   <Category id="5">
##     <Category> Home Appliances</Category>
##     <ItemName>Refrigerator</ItemName>
##     <ItemID>201</ItemID>
##     <Brand>HomeCool</Brand>
##     <Price>899.99</Price>
##     <VariationID>201-A</VariationID>
##     <Color/>
##     <VariationDetails> Capacity: 20 cu ft</VariationDetails>
##   </Category>
##   <Category id="6">
##     <Category> Home Appliances</Category>
##     <ItemName>Refrigerator</ItemName>
##     <ItemID>201</ItemID>
##     <Brand>HomeCool</Brand>
##     <Price>899.99</Price>
##     <VariationID>201-B</VariationID>
##     <Color>White</Color>
##     <VariationDetails> Capacity: 18 cu ft</VariationDetails>
##   </Category>
##   <Category id="7">
##     <Category> Home Appliances</Category>
##     <ItemName>Washing Machine</ItemName>
##     <ItemID>202</ItemID>
##     <Brand>CleanTech</Brand>
##     <Price>499.99</Price>
##     <VariationID>202-A</VariationID>
##     <Color/>
##     <VariationDetails> Capacity: 4.5 cu ft</VariationDetails>
##   </Category>
##   <Category id="8">
##     <Category> Home Appliances</Category>
##     <ItemName>Washing Machine</ItemName>
##     <ItemID>202</ItemID>
##     <Brand>CleanTech</Brand>
##     <Price>499.99</Price>
##     <VariationID>202-B</VariationID>
##     <Color/>
##     <VariationDetails> Capacity: 5.0 cu ft</VariationDetails>
##   </Category>
##   <Category id="9">
##     <Category> Clothing</Category>
##     <ItemName>T-Shirt</ItemName>
##     <ItemID>301</ItemID>
##     <Brand>FashionCo</Brand>
##     <Price>19.99</Price>
##     <VariationID>301-A</VariationID>
##     <Color>Blue</Color>
##     <VariationDetails> Size: S</VariationDetails>
##   </Category>
##   <Category id="10">
##     <Category> Clothing</Category>
##     <ItemName>T-Shirt</ItemName>
##     <ItemID>301</ItemID>
##     <Brand>FashionCo</Brand>
##     <Price>19.99</Price>
##     <VariationID>301-B</VariationID>
##     <Color>Red</Color>
##     <VariationDetails> Size: M</VariationDetails>
##   </Category>
##   <Category id="11">
##     <Category> Clothing</Category>
##     <ItemName>T-Shirt</ItemName>
##     <ItemID>301</ItemID>
##     <Brand>FashionCo</Brand>
##     <Price>19.99</Price>
##     <VariationID>301-C</VariationID>
##     <Color>Green</Color>
##     <VariationDetails> Size: L</VariationDetails>
##   </Category>
##   <Category id="12">
##     <Category> Clothing</Category>
##     <ItemName>Jeans</ItemName>
##     <ItemID>302</ItemID>
##     <Brand>DenimWorks</Brand>
##     <Price>49.99</Price>
##     <VariationID>302-A</VariationID>
##     <Color>Dark Blue</Color>
##     <VariationDetails> Size: 32</VariationDetails>
##   </Category>
##   <Category id="13">
##     <Category> Clothing</Category>
##     <ItemName>Jeans</ItemName>
##     <ItemID>302</ItemID>
##     <Brand>DenimWorks</Brand>
##     <Price>49.99</Price>
##     <VariationID>302-B</VariationID>
##     <Color>Light Blue</Color>
##     <VariationDetails> Size: 34</VariationDetails>
##   </Category>
##   <Category id="14">
##     <Category> Books</Category>
##     <ItemName>Fiction Novel</ItemName>
##     <ItemID>401</ItemID>
##     <Brand>-</Brand>
##     <Price>14.99</Price>
##     <VariationID>401-A</VariationID>
##     <Color/>
##     <VariationDetails> Language: English</VariationDetails>
##   </Category>
##   <Category id="15">
##     <Category> Books</Category>
##     <ItemName>Fiction Novel</ItemName>
##     <ItemID>401</ItemID>
##     <Brand>-</Brand>
##     <Price>14.99</Price>
##     <VariationID>401-B</VariationID>
##     <Color/>
##     <VariationDetails> Language: Spanish</VariationDetails>
##   </Category>
##   <Category id="16">
##     <Category> Books</Category>
##     <ItemName>Non-Fiction Guide</ItemName>
##     <ItemID>402</ItemID>
##     <Brand>-</Brand>
##     <Price>24.99</Price>
##     <VariationID>402-A</VariationID>
##     <Color/>
##     <VariationDetails> Language: English</VariationDetails>
##   </Category>
##   <Category id="17">
##     <Category> Books</Category>
##     <ItemName>Non-Fiction Guide</ItemName>
##     <ItemID>402</ItemID>
##     <Brand>-</Brand>
##     <Price>24.99</Price>
##     <VariationID>402-B</VariationID>
##     <Color/>
##     <VariationDetails> Language: French</VariationDetails>
##   </Category>
##   <Category id="18">
##     <Category> Sports Equipment</Category>
##     <ItemName>Basketball</ItemName>
##     <ItemID>501</ItemID>
##     <Brand>SportsGear</Brand>
##     <Price>29.99</Price>
##     <VariationID>501-A</VariationID>
##     <Color>Orange</Color>
##     <VariationDetails> Size 7</VariationDetails>
##   </Category>
##   <Category id="19">
##     <Category> Sports Equipment</Category>
##     <ItemName>Tennis Racket</ItemName>
##     <ItemID>502</ItemID>
##     <Brand>RacketPro</Brand>
##     <Price>89.99</Price>
##     <VariationID>502-A</VariationID>
##     <Color>Black</Color>
##     <VariationDetails>Material: Graphite</VariationDetails>
##   </Category>
##   <Category id="20">
##     <Category> Sports Equipment</Category>
##     <ItemName>Tennis Racket</ItemName>
##     <ItemID>502</ItemID>
##     <Brand>RacketPro</Brand>
##     <Price>89.99</Price>
##     <VariationID>502-B</VariationID>
##     <Color>Silver</Color>
##     <VariationDetails>Aluminum</VariationDetails>
##   </Category>
## </CUNYMart>
XmlUrl <- getURL('https://raw.githubusercontent.com/asadny82/Data607/refs/heads/main/week7.xml')
data_XML <- XmlUrl %>%
  xmlParse() %>%
  xmlRoot() %>%
  xmlToDataFrame(stringsAsFactors = FALSE)
data_XML
##             Category          ItemName ItemID      Brand   Price VariationID
## 1        Electronics        Smartphone    101  TechBrand  699.99       101-A
## 2        Electronics        Smartphone    101  TechBrand  699.99       101-B
## 3        Electronics            Laptop    102 CompuBrand 1099.99       102-A
## 4        Electronics            Laptop    102 CompuBrand 1099.99       102-B
## 5    Home Appliances      Refrigerator    201   HomeCool  899.99       201-A
## 6    Home Appliances      Refrigerator    201   HomeCool  899.99       201-B
## 7    Home Appliances   Washing Machine    202  CleanTech  499.99       202-A
## 8    Home Appliances   Washing Machine    202  CleanTech  499.99       202-B
## 9           Clothing           T-Shirt    301  FashionCo   19.99       301-A
## 10          Clothing           T-Shirt    301  FashionCo   19.99       301-B
## 11          Clothing           T-Shirt    301  FashionCo   19.99       301-C
## 12          Clothing             Jeans    302 DenimWorks   49.99       302-A
## 13          Clothing             Jeans    302 DenimWorks   49.99       302-B
## 14             Books     Fiction Novel    401          -   14.99       401-A
## 15             Books     Fiction Novel    401          -   14.99       401-B
## 16             Books Non-Fiction Guide    402          -   24.99       402-A
## 17             Books Non-Fiction Guide    402          -   24.99       402-B
## 18  Sports Equipment        Basketball    501 SportsGear   29.99       501-A
## 19  Sports Equipment     Tennis Racket    502  RacketPro   89.99       502-A
## 20  Sports Equipment     Tennis Racket    502  RacketPro   89.99       502-B
##         Color     VariationDetails
## 1       Black        Storage: 64GB
## 2       White       Storage: 128GB
## 3      Silver       Storage: 256GB
## 4  Space Gray       Storage: 512GB
## 5               Capacity: 20 cu ft
## 6       White   Capacity: 18 cu ft
## 7              Capacity: 4.5 cu ft
## 8              Capacity: 5.0 cu ft
## 9        Blue              Size: S
## 10        Red              Size: M
## 11      Green              Size: L
## 12  Dark Blue             Size: 32
## 13 Light Blue             Size: 34
## 14               Language: English
## 15               Language: Spanish
## 16               Language: English
## 17                Language: French
## 18     Orange               Size 7
## 19      Black   Material: Graphite
## 20     Silver             Aluminum
names(data_XML) <- c('catagory', 'itemName', 'itemID', 'brand',
                     'price', 'variationID', 'color', 'variationDetails')
data_HTML
## # A tibble: 20 × 9
## # Groups:   itemName [10]
##        x catagory         itemName         itemID brand  price variationID color
##    <int> <chr>            <chr>             <int> <chr>  <dbl> <chr>       <chr>
##  1     1 Electronics      Smartphone          101 Tech…  700.  101-A       Black
##  2     2 Electronics      Smartphone          101 Tech…  700.  101-B       White
##  3     3 Electronics      Laptop              102 Comp… 1100.  102-A       Silv…
##  4     4 Electronics      Laptop              102 Comp… 1100.  102-B       Spac…
##  5     5 Home Appliances  Refrigerator        201 Home…  900.  201-A       Stai…
##  6     6 Home Appliances  Refrigerator        201 Home…  900.  201-B       White
##  7     7 Home Appliances  Washing Machine     202 Clea…  500.  202-A       Fron…
##  8     8 Home Appliances  Washing Machine     202 Clea…  500.  202-B       Type…
##  9     9 Clothing         T-Shirt             301 Fash…   20.0 301-A       Blue 
## 10    10 Clothing         T-Shirt             301 Fash…   20.0 301-B       Red  
## 11    11 Clothing         T-Shirt             301 Fash…   20.0 301-C       Green
## 12    12 Clothing         Jeans               302 Deni…   50.0 302-A       Dark…
## 13    13 Clothing         Jeans               302 Deni…   50.0 302-B       Ligh…
## 14    14 Books            Fiction Novel       401 -       15.0 401-A       Hard…
## 15    15 Books            Fiction Novel       401 -       15.0 401-B       Pape…
## 16    16 Books            Non-Fiction Gui…    402 -       25.0 402-A       eBook
## 17    17 Books            Non-Fiction Gui…    402 -       25.0 402-B       Form…
## 18    18 Sports Equipment Basketball          501 Spor…   30.0 501-A       Size…
## 19    19 Sports Equipment Tennis Racket       502 Rack…   90.0 502-A       Black
## 20    20 Sports Equipment Tennis Racket       502 Rack…   90.0 502-B       Silv…
## # ℹ 1 more variable: variationDetails <chr>
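# Turn blank colors into NA, fill each gap with the color from the row above,
# rewrite two of the size labels, and group the rows by item name
data <- na.omit(data_XML) %>%
  mutate(color = na_if(color, '')) %>%
  fill(color, .direction = 'down') %>%
  mutate(variationDetails = str_replace(variationDetails, 'Size: S', 'Storage: 64GB'),
         variationDetails = str_replace(variationDetails, 'Size: M', 'Storage: 64GB')) %>%
  group_by(itemName)
data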
## # A tibble: 20 × 8
## # Groups:   itemName [11]
##    catagory       itemName itemID brand price variationID color variationDetails
##    <chr>          <chr>    <chr>  <chr> <chr> <chr>       <chr> <chr>           
##  1 " Electronics" " Smart… 101    " Te… 699.… " 101-A"    "Bla… "  Storage: 64G…
##  2 " Electronics" "Smartp… 101    "Tec… 699.… "101-B"     " Wh… " Storage: 128G…
##  3 " Electronics" "Laptop" 102    "Com… 1099… "102-A"     "Sil… " Storage: 256G…
##  4 " Electronics" "Laptop" 102    "Com… 1099… "102-B"     "Spa… " Storage: 512G…
##  5 " Home Applia… "Refrig… 201    "Hom… 899.… "201-A"     "Spa… " Capacity: 20 …
##  6 " Home Applia… "Refrig… 201    "Hom… 899.… "201-B"     "Whi… " Capacity: 18 …
##  7 " Home Applia… "Washin… 202    "Cle… 499.… "202-A"     "Whi… " Capacity: 4.5…
##  8 " Home Applia… "Washin… 202    "Cle… 499.… "202-B"     "Whi… " Capacity: 5.0…
##  9 " Clothing"    "T-Shir… 301    "Fas… 19.99 "301-A"     "Blu… " Storage: 64GB"
## 10 " Clothing"    "T-Shir… 301    "Fas… 19.99 "301-B"     "Red" " Size: M"      
## 11 " Clothing"    "T-Shir… 301    "Fas… 19.99 "301-C"     "Gre… " Size: L"      
## 12 " Clothing"    "Jeans"  302    "Den… 49.99 "302-A"     "Dar… " Size: 32"     
## 13 " Clothing"    "Jeans"  302    "Den… 49.99 "302-B"     "Lig… " Size: 34"     
## 14 " Books"       "Fictio… 401    "-"   14.99 "401-A"     "Lig… " Language: Eng…
## 15 " Books"       "Fictio… 401    "-"   14.99 "401-B"     "Lig… " Language: Spa…
## 16 " Books"       "Non-Fi… 402    "-"   24.99 "402-A"     "Lig… " Language: Eng…
## 17 " Books"       "Non-Fi… 402    "-"   24.99 "402-B"     "Lig… " Language: Fre…
## 18 " Sports Equi… "Basket… 501    "Spo… 29.99 "501-A"     "Ora… " Size 7"       
## 19 " Sports Equi… "Tennis… 502    "Rac… 89.99 "502-A"     "Bla… "Material: Grap…
## 20 " Sports Equi… "Tennis… 502    "Rac… 89.99 "502-B"     "Sil… "Aluminum"
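In the output above, rows whose `<Color/>` element was empty now carry the color of the row before them (for example, 201-A shows the same "Spa…" value as 102-B). If that borrowing is not wanted, one alternative is to mark the blanks as missing and stop there. The sketch below uses base R on a hypothetical three-row excerpt; `items` is an illustrative name, not an object from this report.

```r
# Hypothetical excerpt: "" marks a row whose <Color/> element was empty.
items <- data.frame(
  variationID = c("102-B", "201-A", "201-B"),
  color       = c("Space Gray", "", "White"),
  stringsAsFactors = FALSE
)

# Mark blanks as missing instead of copying the previous row's value.
items$color[items$color == ""] <- NA_character_

items

# Count how many colors are still unknown.
sum(is.na(items$color))
```

Keeping the `NA` makes it visible that the source file did not supply a color for that variation.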
data <- data %>%
  gather('Item','product', 2:3) %>%
  spread(Item, product) 
data
## # A tibble: 20 × 8
##    catagory       brand price variationID color variationDetails itemID itemName
##    <chr>          <chr> <chr> <chr>       <chr> <chr>            <chr>  <chr>   
##  1 " Books"       "-"   14.99 "401-A"     "Lig… " Language: Eng… 401    "Fictio…
##  2 " Books"       "-"   14.99 "401-B"     "Lig… " Language: Spa… 401    "Fictio…
##  3 " Books"       "-"   24.99 "402-A"     "Lig… " Language: Eng… 402    "Non-Fi…
##  4 " Books"       "-"   24.99 "402-B"     "Lig… " Language: Fre… 402    "Non-Fi…
##  5 " Clothing"    "Den… 49.99 "302-A"     "Dar… " Size: 32"      302    "Jeans" 
##  6 " Clothing"    "Den… 49.99 "302-B"     "Lig… " Size: 34"      302    "Jeans" 
##  7 " Clothing"    "Fas… 19.99 "301-A"     "Blu… " Storage: 64GB" 301    "T-Shir…
##  8 " Clothing"    "Fas… 19.99 "301-B"     "Red" " Size: M"       301    "T-Shir…
##  9 " Clothing"    "Fas… 19.99 "301-C"     "Gre… " Size: L"       301    "T-Shir…
## 10 " Electronics" " Te… 699.… " 101-A"    "Bla… "  Storage: 64G… 101    " Smart…
## 11 " Electronics" "Com… 1099… "102-A"     "Sil… " Storage: 256G… 102    "Laptop"
## 12 " Electronics" "Com… 1099… "102-B"     "Spa… " Storage: 512G… 102    "Laptop"
## 13 " Electronics" "Tec… 699.… "101-B"     " Wh… " Storage: 128G… 101    "Smartp…
## 14 " Home Applia… "Cle… 499.… "202-A"     "Whi… " Capacity: 4.5… 202    "Washin…
## 15 " Home Applia… "Cle… 499.… "202-B"     "Whi… " Capacity: 5.0… 202    "Washin…
## 16 " Home Applia… "Hom… 899.… "201-A"     "Spa… " Capacity: 20 … 201    "Refrig…
## 17 " Home Applia… "Hom… 899.… "201-B"     "Whi… " Capacity: 18 … 201    "Refrig…
## 18 " Sports Equi… "Rac… 89.99 "502-A"     "Bla… "Material: Grap… 502    "Tennis…
## 19 " Sports Equi… "Rac… 89.99 "502-B"     "Sil… "Aluminum"       502    "Tennis…
## 20 " Sports Equi… "Spo… 29.99 "501-A"     "Ora… " Size 7"        501    "Basket…

The pros and cons of each format:

Each file can be loaded from the remote source, but not in the same way: RCurl was used for the HTML and XML files, while the JSON text was passed straight to fromJSON(). Each format also needs some manual effort before it becomes a usable data frame.

The HTML data needed to be converted to numbers, reshaped from a wide to a long format, and unnested from there. The XML file was first imported as an XML object, so the data had to be extracted with xmlParse(), xmlRoot(), and xmlToDataFrame().

The three resulting data frames are almost identical. The main difference is how numeric values are parsed from the source files. html_table() from the rvest package converts numbers to numeric columns automatically, whereas xmlToDataFrame() returns every column as character, so the XML values must be converted by hand.
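The numeric-parsing difference can be closed with an explicit conversion step. The following is a minimal sketch in base R, assuming a data frame shaped like the XML result, where every column is character. The `items` data frame is a hypothetical three-row stand-in, not the real `data_XML`.

```r
# Hypothetical stand-in for a data frame returned by xmlToDataFrame():
# every column arrives as character.
items <- data.frame(
  itemID      = c("101", "102", "201"),
  price       = c("699.99", "1099.99", "899.99"),
  variationID = c("101-A", "102-A", "201-A"),
  stringsAsFactors = FALSE
)

# Convert only the columns that hold numbers; identifiers such as
# variationID stay as text.
items$itemID <- as.integer(items$itemID)
items$price  <- as.numeric(items$price)

str(items)
```

After this step the XML-derived columns have the same types that the `data_HTML` tibble shows for `itemID` (`<int>`) and `price` (`<dbl>`).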