Overview

In this lab, we have been tasked with creating many different file types from a list of inventory at a local store. These file types include JSON, HTML, XML and Parquet. Once the data is transformed into the correct format, I will export it to the file type for easy storage. Additionally I will go over the pros and cons of each file type.

Load Packages

In this lab, I researched many different packages to help format data appropriately for export.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(arrow)
## Warning: package 'arrow' was built under R version 4.4.3
## 
## Attaching package: 'arrow'
## 
## The following object is masked from 'package:lubridate':
## 
##     duration
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
library(repurrrsive)
## Warning: package 'repurrrsive' was built under R version 4.4.3
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## 
## The following object is masked from 'package:purrr':
## 
##     flatten
library(XML)
library(xml2)
library(tableHTML)
## Warning: package 'tableHTML' was built under R version 4.4.3

Once these were installed I could begin.

Import and Cleaning

I first saved the data given in a pdf into a CSV file, which I then saved to Github to import.

lab7data <- read.csv("https://raw.githubusercontent.com/scrummett/DATA607/refs/heads/main/lab7data.csv", row.names = NULL)

However, due to the number of columns and the number of values separated by commas, I must adjust the table.

colnames(lab7data) <- c(colnames(lab7data)[-1], "Variation.Details2")
lab7data <- lab7data |> 
  mutate(Variation.Details = paste(Variation.Details, Variation.Details2, sep = ",")) |> 
  select(!Variation.Details2)

Here I have shifted the column names left by one, named the furthest right column, then combined the values from the two columns containing item details into one column. From here we can create our different files.

JSON

I will start with our easiest files to code, that being a JSON file. Fortunately, the package “jsonlite” makes this incredibly easy for us.

toJSON(lab7data)
## [{"Category":" Electronics","Item.Name":"Smartphone","Item.ID":101,"Brand":"TechBrand","Price":699.99,"Variation.ID":"101-A","Variation.Details":"Color: Black, Storage: 64GB"},{"Category":" Electronics","Item.Name":"Smartphone","Item.ID":101,"Brand":"TechBrand","Price":699.99,"Variation.ID":"101-B","Variation.Details":"Color: White, Storage: 128GB"},{"Category":" Electronics","Item.Name":"Laptop","Item.ID":102,"Brand":"CompuBrand","Price":1099.99,"Variation.ID":"102-A","Variation.Details":"Color: Silver, Storage: 256GB"},{"Category":" Electronics","Item.Name":"Laptop","Item.ID":102,"Brand":"CompuBrand","Price":1099.99,"Variation.ID":"102-B","Variation.Details":"Color: Space Gray, Storage: 512GB"},{"Category":" Home Appliances","Item.Name":"Refrigerator","Item.ID":201,"Brand":"HomeCool","Price":899.99,"Variation.ID":"201-A","Variation.Details":"Color: Stainless Steel, Capacity: 20 cu ft"},{"Category":" Home Appliances","Item.Name":"Refrigerator","Item.ID":201,"Brand":"HomeCool","Price":899.99,"Variation.ID":"201-B","Variation.Details":"Color: White, Capacity: 18 cu ft"},{"Category":" Home Appliances","Item.Name":"Washing Machine","Item.ID":202,"Brand":"CleanTech","Price":499.99,"Variation.ID":"202-A","Variation.Details":"Type: Front Load, Capacity: 4.5 cu ft"},{"Category":" Home Appliances","Item.Name":"Washing Machine","Item.ID":202,"Brand":"CleanTech","Price":499.99,"Variation.ID":"202-B","Variation.Details":"Type: Top Load, Capacity: 5.0 cu ft"},{"Category":" Clothing","Item.Name":"T-Shirt","Item.ID":301,"Brand":"FashionCo","Price":19.99,"Variation.ID":"301-A","Variation.Details":"Color: Blue, Size: S"},{"Category":" Clothing","Item.Name":"T-Shirt","Item.ID":301,"Brand":"FashionCo","Price":19.99,"Variation.ID":"301-B","Variation.Details":"Color: Red, Size: M"},{"Category":" Clothing","Item.Name":"T-Shirt","Item.ID":301,"Brand":"FashionCo","Price":19.99,"Variation.ID":"301-C","Variation.Details":"Color: Green, Size: L"},{"Category":" Clothing","Item.Name":"Jeans","Item.ID":302,"Brand":"DenimWorks","Price":49.99,"Variation.ID":"302-A","Variation.Details":"Color: Dark Blue, Size: 32"},{"Category":" Clothing","Item.Name":"Jeans","Item.ID":302,"Brand":"DenimWorks","Price":49.99,"Variation.ID":"302-B","Variation.Details":"Color: Light Blue, Size: 34"},{"Category":" Books","Item.Name":"Fiction Novel","Item.ID":401,"Brand":"-","Price":14.99,"Variation.ID":"401-A","Variation.Details":"Format: Hardcover, Language: English"},{"Category":" Books","Item.Name":"Fiction Novel","Item.ID":401,"Brand":"-","Price":14.99,"Variation.ID":"401-B","Variation.Details":"Format: Paperback, Language: Spanish"},{"Category":" Books","Item.Name":"Non-Fiction Guide","Item.ID":402,"Brand":"-","Price":24.99,"Variation.ID":"402-A","Variation.Details":"Format: eBook, Language: English"},{"Category":" Books","Item.Name":"Non-Fiction Guide","Item.ID":402,"Brand":"-","Price":24.99,"Variation.ID":"402-B","Variation.Details":"Format: Paperback, Language: French"},{"Category":" Sports Equipment","Item.Name":"Basketball","Item.ID":501,"Brand":"SportsGear","Price":29.99,"Variation.ID":"501-A","Variation.Details":"Size: Size 7, Color: Orange"},{"Category":" Sports Equipment","Item.Name":"Tennis Racket","Item.ID":502,"Brand":"RacketPro","Price":89.99,"Variation.ID":"502-A","Variation.Details":"Material: Graphite, Color: Black"},{"Category":" Sports Equipment","Item.Name":"Tennis Racket","Item.ID":502,"Brand":"RacketPro","Price":89.99,"Variation.ID":"502-B","Variation.Details":"Material: Aluminum, Color: Silver"}]

This converts our data into JSON format, from which we can export it.

exportJSON <- toJSON(lab7data)
write(exportJSON, "C:/Users/crumm/OneDrive/Documents/MSDS Spring 2025/DATA607 Labs/lab7data.JSON")
JSONtest <- fromJSON("C:/Users/crumm/OneDrive/Documents/MSDS Spring 2025/DATA607 Labs/lab7data.JSON")
print(JSONtest)
##             Category         Item.Name Item.ID      Brand   Price Variation.ID
## 1        Electronics        Smartphone     101  TechBrand  699.99        101-A
## 2        Electronics        Smartphone     101  TechBrand  699.99        101-B
## 3        Electronics            Laptop     102 CompuBrand 1099.99        102-A
## 4        Electronics            Laptop     102 CompuBrand 1099.99        102-B
## 5    Home Appliances      Refrigerator     201   HomeCool  899.99        201-A
## 6    Home Appliances      Refrigerator     201   HomeCool  899.99        201-B
## 7    Home Appliances   Washing Machine     202  CleanTech  499.99        202-A
## 8    Home Appliances   Washing Machine     202  CleanTech  499.99        202-B
## 9           Clothing           T-Shirt     301  FashionCo   19.99        301-A
## 10          Clothing           T-Shirt     301  FashionCo   19.99        301-B
## 11          Clothing           T-Shirt     301  FashionCo   19.99        301-C
## 12          Clothing             Jeans     302 DenimWorks   49.99        302-A
## 13          Clothing             Jeans     302 DenimWorks   49.99        302-B
## 14             Books     Fiction Novel     401          -   14.99        401-A
## 15             Books     Fiction Novel     401          -   14.99        401-B
## 16             Books Non-Fiction Guide     402          -   24.99        402-A
## 17             Books Non-Fiction Guide     402          -   24.99        402-B
## 18  Sports Equipment        Basketball     501 SportsGear   29.99        501-A
## 19  Sports Equipment     Tennis Racket     502  RacketPro   89.99        502-A
## 20  Sports Equipment     Tennis Racket     502  RacketPro   89.99        502-B
##                             Variation.Details
## 1                 Color: Black, Storage: 64GB
## 2                Color: White, Storage: 128GB
## 3               Color: Silver, Storage: 256GB
## 4           Color: Space Gray, Storage: 512GB
## 5  Color: Stainless Steel, Capacity: 20 cu ft
## 6            Color: White, Capacity: 18 cu ft
## 7       Type: Front Load, Capacity: 4.5 cu ft
## 8         Type: Top Load, Capacity: 5.0 cu ft
## 9                        Color: Blue, Size: S
## 10                        Color: Red, Size: M
## 11                      Color: Green, Size: L
## 12                 Color: Dark Blue, Size: 32
## 13                Color: Light Blue, Size: 34
## 14       Format: Hardcover, Language: English
## 15       Format: Paperback, Language: Spanish
## 16           Format: eBook, Language: English
## 17        Format: Paperback, Language: French
## 18                Size: Size 7, Color: Orange
## 19           Material: Graphite, Color: Black
## 20          Material: Aluminum, Color: Silver

Here we have saved it as a JSON file, as well as called that same file to test it to see if our save has worked. This table demonstrates that we have saved it correctly.

Parquet

Next is saving our data to a parquet file, which is made simple by the package “arrow”.

write_parquet(lab7data, "C:/Users/crumm/OneDrive/Documents/MSDS Spring 2025/DATA607 Labs/lab7data.parquet")

Here we have simply taken our data and saved it as a parquet file.

lab7parquet <- open_dataset("C:/Users/crumm/OneDrive/Documents/MSDS Spring 2025/DATA607 Labs/lab7data.parquet")
head(lab7parquet) |> collect()
## # A tibble: 6 × 7
##   Category          Item.Name Item.ID Brand Price Variation.ID Variation.Details
##   <chr>             <chr>       <int> <chr> <dbl> <chr>        <chr>            
## 1 " Electronics"    Smartpho…     101 Tech…  700. 101-A        Color: Black, St…
## 2 " Electronics"    Smartpho…     101 Tech…  700. 101-B        Color: White, St…
## 3 " Electronics"    Laptop        102 Comp… 1100. 102-A        Color: Silver, S…
## 4 " Electronics"    Laptop        102 Comp… 1100. 102-B        Color: Space Gra…
## 5 " Home Appliance… Refriger…     201 Home…  900. 201-A        Color: Stainless…
## 6 " Home Appliance… Refriger…     201 Home…  900. 201-B        Color: White, Ca…

Again we have called that saved file and presented it as a table, showing that we have successfully created a parquet file.

HTML

Again, this is made simple ny a package, this package being called “tableHTML”. It takes data from a table and converts it into a table in html.

lab7HTML <- tableHTML(lab7data)
write_tableHTML(lab7HTML, file = "C:/Users/crumm/OneDrive/Documents/MSDS Spring 2025/DATA607 Labs/lab7data.html")
print(lab7HTML)
## 
## <table style="border-collapse:collapse;" class=table_6610 border=1>
## <thead>
## <tr>
##   <th id="tableHTML_header_1"> </th>
##   <th id="tableHTML_header_2">Category</th>
##   <th id="tableHTML_header_3">Item.Name</th>
##   <th id="tableHTML_header_4">Item.ID</th>
##   <th id="tableHTML_header_5">Brand</th>
##   <th id="tableHTML_header_6">Price</th>
##   <th id="tableHTML_header_7">Variation.ID</th>
##   <th id="tableHTML_header_8">Variation.Details</th>
## </tr>
## </thead>
## <tbody>
## <tr>
##   <td id="tableHTML_rownames">1</td>
##   <td id="tableHTML_column_1"> Electronics</td>
##   <td id="tableHTML_column_2">Smartphone</td>
##   <td id="tableHTML_column_3">101</td>
##   <td id="tableHTML_column_4">TechBrand</td>
##   <td id="tableHTML_column_5">699.99</td>
##   <td id="tableHTML_column_6">101-A</td>
##   <td id="tableHTML_column_7">Color: Black, Storage: 64GB</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">2</td>
##   <td id="tableHTML_column_1"> Electronics</td>
##   <td id="tableHTML_column_2">Smartphone</td>
##   <td id="tableHTML_column_3">101</td>
##   <td id="tableHTML_column_4">TechBrand</td>
##   <td id="tableHTML_column_5">699.99</td>
##   <td id="tableHTML_column_6">101-B</td>
##   <td id="tableHTML_column_7">Color: White, Storage: 128GB</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">3</td>
##   <td id="tableHTML_column_1"> Electronics</td>
##   <td id="tableHTML_column_2">Laptop</td>
##   <td id="tableHTML_column_3">102</td>
##   <td id="tableHTML_column_4">CompuBrand</td>
##   <td id="tableHTML_column_5">1099.99</td>
##   <td id="tableHTML_column_6">102-A</td>
##   <td id="tableHTML_column_7">Color: Silver, Storage: 256GB</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">4</td>
##   <td id="tableHTML_column_1"> Electronics</td>
##   <td id="tableHTML_column_2">Laptop</td>
##   <td id="tableHTML_column_3">102</td>
##   <td id="tableHTML_column_4">CompuBrand</td>
##   <td id="tableHTML_column_5">1099.99</td>
##   <td id="tableHTML_column_6">102-B</td>
##   <td id="tableHTML_column_7">Color: Space Gray, Storage: 512GB</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">5</td>
##   <td id="tableHTML_column_1"> Home Appliances</td>
##   <td id="tableHTML_column_2">Refrigerator</td>
##   <td id="tableHTML_column_3">201</td>
##   <td id="tableHTML_column_4">HomeCool</td>
##   <td id="tableHTML_column_5">899.99</td>
##   <td id="tableHTML_column_6">201-A</td>
##   <td id="tableHTML_column_7">Color: Stainless Steel, Capacity: 20 cu ft</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">6</td>
##   <td id="tableHTML_column_1"> Home Appliances</td>
##   <td id="tableHTML_column_2">Refrigerator</td>
##   <td id="tableHTML_column_3">201</td>
##   <td id="tableHTML_column_4">HomeCool</td>
##   <td id="tableHTML_column_5">899.99</td>
##   <td id="tableHTML_column_6">201-B</td>
##   <td id="tableHTML_column_7">Color: White, Capacity: 18 cu ft</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">7</td>
##   <td id="tableHTML_column_1"> Home Appliances</td>
##   <td id="tableHTML_column_2">Washing Machine</td>
##   <td id="tableHTML_column_3">202</td>
##   <td id="tableHTML_column_4">CleanTech</td>
##   <td id="tableHTML_column_5">499.99</td>
##   <td id="tableHTML_column_6">202-A</td>
##   <td id="tableHTML_column_7">Type: Front Load, Capacity: 4.5 cu ft</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">8</td>
##   <td id="tableHTML_column_1"> Home Appliances</td>
##   <td id="tableHTML_column_2">Washing Machine</td>
##   <td id="tableHTML_column_3">202</td>
##   <td id="tableHTML_column_4">CleanTech</td>
##   <td id="tableHTML_column_5">499.99</td>
##   <td id="tableHTML_column_6">202-B</td>
##   <td id="tableHTML_column_7">Type: Top Load, Capacity: 5.0 cu ft</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">9</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">T-Shirt</td>
##   <td id="tableHTML_column_3">301</td>
##   <td id="tableHTML_column_4">FashionCo</td>
##   <td id="tableHTML_column_5">19.99</td>
##   <td id="tableHTML_column_6">301-A</td>
##   <td id="tableHTML_column_7">Color: Blue, Size: S</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">10</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">T-Shirt</td>
##   <td id="tableHTML_column_3">301</td>
##   <td id="tableHTML_column_4">FashionCo</td>
##   <td id="tableHTML_column_5">19.99</td>
##   <td id="tableHTML_column_6">301-B</td>
##   <td id="tableHTML_column_7">Color: Red, Size: M</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">11</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">T-Shirt</td>
##   <td id="tableHTML_column_3">301</td>
##   <td id="tableHTML_column_4">FashionCo</td>
##   <td id="tableHTML_column_5">19.99</td>
##   <td id="tableHTML_column_6">301-C</td>
##   <td id="tableHTML_column_7">Color: Green, Size: L</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">12</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">Jeans</td>
##   <td id="tableHTML_column_3">302</td>
##   <td id="tableHTML_column_4">DenimWorks</td>
##   <td id="tableHTML_column_5">49.99</td>
##   <td id="tableHTML_column_6">302-A</td>
##   <td id="tableHTML_column_7">Color: Dark Blue, Size: 32</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">13</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">Jeans</td>
##   <td id="tableHTML_column_3">302</td>
##   <td id="tableHTML_column_4">DenimWorks</td>
##   <td id="tableHTML_column_5">49.99</td>
##   <td id="tableHTML_column_6">302-B</td>
##   <td id="tableHTML_column_7">Color: Light Blue, Size: 34</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">14</td>
##   <td id="tableHTML_column_1"> Books</td>
##   <td id="tableHTML_column_2">Fiction Novel</td>
##   <td id="tableHTML_column_3">401</td>
##   <td id="tableHTML_column_4">-</td>
##   <td id="tableHTML_column_5">14.99</td>
##   <td id="tableHTML_column_6">401-A</td>
##   <td id="tableHTML_column_7">Format: Hardcover, Language: English</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">15</td>
##   <td id="tableHTML_column_1"> Books</td>
##   <td id="tableHTML_column_2">Fiction Novel</td>
##   <td id="tableHTML_column_3">401</td>
##   <td id="tableHTML_column_4">-</td>
##   <td id="tableHTML_column_5">14.99</td>
##   <td id="tableHTML_column_6">401-B</td>
##   <td id="tableHTML_column_7">Format: Paperback, Language: Spanish</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">16</td>
##   <td id="tableHTML_column_1"> Books</td>
##   <td id="tableHTML_column_2">Non-Fiction Guide</td>
##   <td id="tableHTML_column_3">402</td>
##   <td id="tableHTML_column_4">-</td>
##   <td id="tableHTML_column_5">24.99</td>
##   <td id="tableHTML_column_6">402-A</td>
##   <td id="tableHTML_column_7">Format: eBook, Language: English</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">17</td>
##   <td id="tableHTML_column_1"> Books</td>
##   <td id="tableHTML_column_2">Non-Fiction Guide</td>
##   <td id="tableHTML_column_3">402</td>
##   <td id="tableHTML_column_4">-</td>
##   <td id="tableHTML_column_5">24.99</td>
##   <td id="tableHTML_column_6">402-B</td>
##   <td id="tableHTML_column_7">Format: Paperback, Language: French</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">18</td>
##   <td id="tableHTML_column_1"> Sports Equipment</td>
##   <td id="tableHTML_column_2">Basketball</td>
##   <td id="tableHTML_column_3">501</td>
##   <td id="tableHTML_column_4">SportsGear</td>
##   <td id="tableHTML_column_5">29.99</td>
##   <td id="tableHTML_column_6">501-A</td>
##   <td id="tableHTML_column_7">Size: Size 7, Color: Orange</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">19</td>
##   <td id="tableHTML_column_1"> Sports Equipment</td>
##   <td id="tableHTML_column_2">Tennis Racket</td>
##   <td id="tableHTML_column_3">502</td>
##   <td id="tableHTML_column_4">RacketPro</td>
##   <td id="tableHTML_column_5">89.99</td>
##   <td id="tableHTML_column_6">502-A</td>
##   <td id="tableHTML_column_7">Material: Graphite, Color: Black</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">20</td>
##   <td id="tableHTML_column_1"> Sports Equipment</td>
##   <td id="tableHTML_column_2">Tennis Racket</td>
##   <td id="tableHTML_column_3">502</td>
##   <td id="tableHTML_column_4">RacketPro</td>
##   <td id="tableHTML_column_5">89.99</td>
##   <td id="tableHTML_column_6">502-B</td>
##   <td id="tableHTML_column_7">Material: Aluminum, Color: Silver</td>
## </tr>
## </tbody>
## </table>
cat(lab7HTML)
## 
## <table style="border-collapse:collapse;" class=table_6610 border=1>
## <thead>
## <tr>
##   <th id="tableHTML_header_1"> </th>
##   <th id="tableHTML_header_2">Category</th>
##   <th id="tableHTML_header_3">Item.Name</th>
##   <th id="tableHTML_header_4">Item.ID</th>
##   <th id="tableHTML_header_5">Brand</th>
##   <th id="tableHTML_header_6">Price</th>
##   <th id="tableHTML_header_7">Variation.ID</th>
##   <th id="tableHTML_header_8">Variation.Details</th>
## </tr>
## </thead>
## <tbody>
## <tr>
##   <td id="tableHTML_rownames">1</td>
##   <td id="tableHTML_column_1"> Electronics</td>
##   <td id="tableHTML_column_2">Smartphone</td>
##   <td id="tableHTML_column_3">101</td>
##   <td id="tableHTML_column_4">TechBrand</td>
##   <td id="tableHTML_column_5">699.99</td>
##   <td id="tableHTML_column_6">101-A</td>
##   <td id="tableHTML_column_7">Color: Black, Storage: 64GB</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">2</td>
##   <td id="tableHTML_column_1"> Electronics</td>
##   <td id="tableHTML_column_2">Smartphone</td>
##   <td id="tableHTML_column_3">101</td>
##   <td id="tableHTML_column_4">TechBrand</td>
##   <td id="tableHTML_column_5">699.99</td>
##   <td id="tableHTML_column_6">101-B</td>
##   <td id="tableHTML_column_7">Color: White, Storage: 128GB</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">3</td>
##   <td id="tableHTML_column_1"> Electronics</td>
##   <td id="tableHTML_column_2">Laptop</td>
##   <td id="tableHTML_column_3">102</td>
##   <td id="tableHTML_column_4">CompuBrand</td>
##   <td id="tableHTML_column_5">1099.99</td>
##   <td id="tableHTML_column_6">102-A</td>
##   <td id="tableHTML_column_7">Color: Silver, Storage: 256GB</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">4</td>
##   <td id="tableHTML_column_1"> Electronics</td>
##   <td id="tableHTML_column_2">Laptop</td>
##   <td id="tableHTML_column_3">102</td>
##   <td id="tableHTML_column_4">CompuBrand</td>
##   <td id="tableHTML_column_5">1099.99</td>
##   <td id="tableHTML_column_6">102-B</td>
##   <td id="tableHTML_column_7">Color: Space Gray, Storage: 512GB</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">5</td>
##   <td id="tableHTML_column_1"> Home Appliances</td>
##   <td id="tableHTML_column_2">Refrigerator</td>
##   <td id="tableHTML_column_3">201</td>
##   <td id="tableHTML_column_4">HomeCool</td>
##   <td id="tableHTML_column_5">899.99</td>
##   <td id="tableHTML_column_6">201-A</td>
##   <td id="tableHTML_column_7">Color: Stainless Steel, Capacity: 20 cu ft</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">6</td>
##   <td id="tableHTML_column_1"> Home Appliances</td>
##   <td id="tableHTML_column_2">Refrigerator</td>
##   <td id="tableHTML_column_3">201</td>
##   <td id="tableHTML_column_4">HomeCool</td>
##   <td id="tableHTML_column_5">899.99</td>
##   <td id="tableHTML_column_6">201-B</td>
##   <td id="tableHTML_column_7">Color: White, Capacity: 18 cu ft</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">7</td>
##   <td id="tableHTML_column_1"> Home Appliances</td>
##   <td id="tableHTML_column_2">Washing Machine</td>
##   <td id="tableHTML_column_3">202</td>
##   <td id="tableHTML_column_4">CleanTech</td>
##   <td id="tableHTML_column_5">499.99</td>
##   <td id="tableHTML_column_6">202-A</td>
##   <td id="tableHTML_column_7">Type: Front Load, Capacity: 4.5 cu ft</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">8</td>
##   <td id="tableHTML_column_1"> Home Appliances</td>
##   <td id="tableHTML_column_2">Washing Machine</td>
##   <td id="tableHTML_column_3">202</td>
##   <td id="tableHTML_column_4">CleanTech</td>
##   <td id="tableHTML_column_5">499.99</td>
##   <td id="tableHTML_column_6">202-B</td>
##   <td id="tableHTML_column_7">Type: Top Load, Capacity: 5.0 cu ft</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">9</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">T-Shirt</td>
##   <td id="tableHTML_column_3">301</td>
##   <td id="tableHTML_column_4">FashionCo</td>
##   <td id="tableHTML_column_5">19.99</td>
##   <td id="tableHTML_column_6">301-A</td>
##   <td id="tableHTML_column_7">Color: Blue, Size: S</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">10</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">T-Shirt</td>
##   <td id="tableHTML_column_3">301</td>
##   <td id="tableHTML_column_4">FashionCo</td>
##   <td id="tableHTML_column_5">19.99</td>
##   <td id="tableHTML_column_6">301-B</td>
##   <td id="tableHTML_column_7">Color: Red, Size: M</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">11</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">T-Shirt</td>
##   <td id="tableHTML_column_3">301</td>
##   <td id="tableHTML_column_4">FashionCo</td>
##   <td id="tableHTML_column_5">19.99</td>
##   <td id="tableHTML_column_6">301-C</td>
##   <td id="tableHTML_column_7">Color: Green, Size: L</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">12</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">Jeans</td>
##   <td id="tableHTML_column_3">302</td>
##   <td id="tableHTML_column_4">DenimWorks</td>
##   <td id="tableHTML_column_5">49.99</td>
##   <td id="tableHTML_column_6">302-A</td>
##   <td id="tableHTML_column_7">Color: Dark Blue, Size: 32</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">13</td>
##   <td id="tableHTML_column_1"> Clothing</td>
##   <td id="tableHTML_column_2">Jeans</td>
##   <td id="tableHTML_column_3">302</td>
##   <td id="tableHTML_column_4">DenimWorks</td>
##   <td id="tableHTML_column_5">49.99</td>
##   <td id="tableHTML_column_6">302-B</td>
##   <td id="tableHTML_column_7">Color: Light Blue, Size: 34</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">14</td>
##   <td id="tableHTML_column_1"> Books</td>
##   <td id="tableHTML_column_2">Fiction Novel</td>
##   <td id="tableHTML_column_3">401</td>
##   <td id="tableHTML_column_4">-</td>
##   <td id="tableHTML_column_5">14.99</td>
##   <td id="tableHTML_column_6">401-A</td>
##   <td id="tableHTML_column_7">Format: Hardcover, Language: English</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">15</td>
##   <td id="tableHTML_column_1"> Books</td>
##   <td id="tableHTML_column_2">Fiction Novel</td>
##   <td id="tableHTML_column_3">401</td>
##   <td id="tableHTML_column_4">-</td>
##   <td id="tableHTML_column_5">14.99</td>
##   <td id="tableHTML_column_6">401-B</td>
##   <td id="tableHTML_column_7">Format: Paperback, Language: Spanish</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">16</td>
##   <td id="tableHTML_column_1"> Books</td>
##   <td id="tableHTML_column_2">Non-Fiction Guide</td>
##   <td id="tableHTML_column_3">402</td>
##   <td id="tableHTML_column_4">-</td>
##   <td id="tableHTML_column_5">24.99</td>
##   <td id="tableHTML_column_6">402-A</td>
##   <td id="tableHTML_column_7">Format: eBook, Language: English</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">17</td>
##   <td id="tableHTML_column_1"> Books</td>
##   <td id="tableHTML_column_2">Non-Fiction Guide</td>
##   <td id="tableHTML_column_3">402</td>
##   <td id="tableHTML_column_4">-</td>
##   <td id="tableHTML_column_5">24.99</td>
##   <td id="tableHTML_column_6">402-B</td>
##   <td id="tableHTML_column_7">Format: Paperback, Language: French</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">18</td>
##   <td id="tableHTML_column_1"> Sports Equipment</td>
##   <td id="tableHTML_column_2">Basketball</td>
##   <td id="tableHTML_column_3">501</td>
##   <td id="tableHTML_column_4">SportsGear</td>
##   <td id="tableHTML_column_5">29.99</td>
##   <td id="tableHTML_column_6">501-A</td>
##   <td id="tableHTML_column_7">Size: Size 7, Color: Orange</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">19</td>
##   <td id="tableHTML_column_1"> Sports Equipment</td>
##   <td id="tableHTML_column_2">Tennis Racket</td>
##   <td id="tableHTML_column_3">502</td>
##   <td id="tableHTML_column_4">RacketPro</td>
##   <td id="tableHTML_column_5">89.99</td>
##   <td id="tableHTML_column_6">502-A</td>
##   <td id="tableHTML_column_7">Material: Graphite, Color: Black</td>
## </tr>
## <tr>
##   <td id="tableHTML_rownames">20</td>
##   <td id="tableHTML_column_1"> Sports Equipment</td>
##   <td id="tableHTML_column_2">Tennis Racket</td>
##   <td id="tableHTML_column_3">502</td>
##   <td id="tableHTML_column_4">RacketPro</td>
##   <td id="tableHTML_column_5">89.99</td>
##   <td id="tableHTML_column_6">502-B</td>
##   <td id="tableHTML_column_7">Material: Aluminum, Color: Silver</td>
## </tr>
## </tbody>
## </table>

Here we have saved the file locally, and also confirmed the table has been saved as HTML.

XML

I could not find any package that would simply convert our data into an XML file, so instead I wrote a function that would manually convert it into a more readable XML format.

xml_data <- xml_new_root("lab7data")
lab7data |> 
  pmap(function(Category, Item.Name, Item.ID, Brand, Price, Variation.ID, Variation.Details) {
    node <- xml_add_child(xml_data, "Inventory")
    xml_add_child(node, "Category", Category)
    xml_add_child(node, "Item.Name", Item.Name)
    xml_add_child(node, "Item.ID", Item.ID)
    xml_add_child(node, "Brand", Brand)
    xml_add_child(node, "Price", Price)
    xml_add_child(node, "Variation.ID", Variation.ID)
    xml_add_child(node, "Variation.Details", Variation.Details)
  })
## [[1]]
## {xml_node}
## <Variation.Details>
## 
## [[2]]
## {xml_node}
## <Variation.Details>
## 
## [[3]]
## {xml_node}
## <Variation.Details>
## 
## [[4]]
## {xml_node}
## <Variation.Details>
## 
## [[5]]
## {xml_node}
## <Variation.Details>
## 
## [[6]]
## {xml_node}
## <Variation.Details>
## 
## [[7]]
## {xml_node}
## <Variation.Details>
## 
## [[8]]
## {xml_node}
## <Variation.Details>
## 
## [[9]]
## {xml_node}
## <Variation.Details>
## 
## [[10]]
## {xml_node}
## <Variation.Details>
## 
## [[11]]
## {xml_node}
## <Variation.Details>
## 
## [[12]]
## {xml_node}
## <Variation.Details>
## 
## [[13]]
## {xml_node}
## <Variation.Details>
## 
## [[14]]
## {xml_node}
## <Variation.Details>
## 
## [[15]]
## {xml_node}
## <Variation.Details>
## 
## [[16]]
## {xml_node}
## <Variation.Details>
## 
## [[17]]
## {xml_node}
## <Variation.Details>
## 
## [[18]]
## {xml_node}
## <Variation.Details>
## 
## [[19]]
## {xml_node}
## <Variation.Details>
## 
## [[20]]
## {xml_node}
## <Variation.Details>
write_xml(xml_data, "C:/Users/crumm/OneDrive/Documents/MSDS Spring 2025/DATA607 Labs/lab7data.xml")
read_xml_data <- read_xml("C:/Users/crumm/OneDrive/Documents/MSDS Spring 2025/DATA607 Labs/lab7data.xml")
cat(as.character(read_xml_data))
## <?xml version="1.0" encoding="UTF-8"?>
## <lab7data>
##   <Inventory>
##     <Category> Electronics</Category>
##     <Item.Name>Smartphone</Item.Name>
##     <Item.ID>101</Item.ID>
##     <Brand>TechBrand</Brand>
##     <Price>699.99</Price>
##     <Variation.ID>101-A</Variation.ID>
##     <Variation.Details>Color: Black, Storage: 64GB</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Electronics</Category>
##     <Item.Name>Smartphone</Item.Name>
##     <Item.ID>101</Item.ID>
##     <Brand>TechBrand</Brand>
##     <Price>699.99</Price>
##     <Variation.ID>101-B</Variation.ID>
##     <Variation.Details>Color: White, Storage: 128GB</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Electronics</Category>
##     <Item.Name>Laptop</Item.Name>
##     <Item.ID>102</Item.ID>
##     <Brand>CompuBrand</Brand>
##     <Price>1099.99</Price>
##     <Variation.ID>102-A</Variation.ID>
##     <Variation.Details>Color: Silver, Storage: 256GB</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Electronics</Category>
##     <Item.Name>Laptop</Item.Name>
##     <Item.ID>102</Item.ID>
##     <Brand>CompuBrand</Brand>
##     <Price>1099.99</Price>
##     <Variation.ID>102-B</Variation.ID>
##     <Variation.Details>Color: Space Gray, Storage: 512GB</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Home Appliances</Category>
##     <Item.Name>Refrigerator</Item.Name>
##     <Item.ID>201</Item.ID>
##     <Brand>HomeCool</Brand>
##     <Price>899.99</Price>
##     <Variation.ID>201-A</Variation.ID>
##     <Variation.Details>Color: Stainless Steel, Capacity: 20 cu ft</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Home Appliances</Category>
##     <Item.Name>Refrigerator</Item.Name>
##     <Item.ID>201</Item.ID>
##     <Brand>HomeCool</Brand>
##     <Price>899.99</Price>
##     <Variation.ID>201-B</Variation.ID>
##     <Variation.Details>Color: White, Capacity: 18 cu ft</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Home Appliances</Category>
##     <Item.Name>Washing Machine</Item.Name>
##     <Item.ID>202</Item.ID>
##     <Brand>CleanTech</Brand>
##     <Price>499.99</Price>
##     <Variation.ID>202-A</Variation.ID>
##     <Variation.Details>Type: Front Load, Capacity: 4.5 cu ft</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Home Appliances</Category>
##     <Item.Name>Washing Machine</Item.Name>
##     <Item.ID>202</Item.ID>
##     <Brand>CleanTech</Brand>
##     <Price>499.99</Price>
##     <Variation.ID>202-B</Variation.ID>
##     <Variation.Details>Type: Top Load, Capacity: 5.0 cu ft</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Clothing</Category>
##     <Item.Name>T-Shirt</Item.Name>
##     <Item.ID>301</Item.ID>
##     <Brand>FashionCo</Brand>
##     <Price>19.99</Price>
##     <Variation.ID>301-A</Variation.ID>
##     <Variation.Details>Color: Blue, Size: S</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Clothing</Category>
##     <Item.Name>T-Shirt</Item.Name>
##     <Item.ID>301</Item.ID>
##     <Brand>FashionCo</Brand>
##     <Price>19.99</Price>
##     <Variation.ID>301-B</Variation.ID>
##     <Variation.Details>Color: Red, Size: M</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Clothing</Category>
##     <Item.Name>T-Shirt</Item.Name>
##     <Item.ID>301</Item.ID>
##     <Brand>FashionCo</Brand>
##     <Price>19.99</Price>
##     <Variation.ID>301-C</Variation.ID>
##     <Variation.Details>Color: Green, Size: L</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Clothing</Category>
##     <Item.Name>Jeans</Item.Name>
##     <Item.ID>302</Item.ID>
##     <Brand>DenimWorks</Brand>
##     <Price>49.99</Price>
##     <Variation.ID>302-A</Variation.ID>
##     <Variation.Details>Color: Dark Blue, Size: 32</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Clothing</Category>
##     <Item.Name>Jeans</Item.Name>
##     <Item.ID>302</Item.ID>
##     <Brand>DenimWorks</Brand>
##     <Price>49.99</Price>
##     <Variation.ID>302-B</Variation.ID>
##     <Variation.Details>Color: Light Blue, Size: 34</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Books</Category>
##     <Item.Name>Fiction Novel</Item.Name>
##     <Item.ID>401</Item.ID>
##     <Brand>-</Brand>
##     <Price>14.99</Price>
##     <Variation.ID>401-A</Variation.ID>
##     <Variation.Details>Format: Hardcover, Language: English</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Books</Category>
##     <Item.Name>Fiction Novel</Item.Name>
##     <Item.ID>401</Item.ID>
##     <Brand>-</Brand>
##     <Price>14.99</Price>
##     <Variation.ID>401-B</Variation.ID>
##     <Variation.Details>Format: Paperback, Language: Spanish</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Books</Category>
##     <Item.Name>Non-Fiction Guide</Item.Name>
##     <Item.ID>402</Item.ID>
##     <Brand>-</Brand>
##     <Price>24.99</Price>
##     <Variation.ID>402-A</Variation.ID>
##     <Variation.Details>Format: eBook, Language: English</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Books</Category>
##     <Item.Name>Non-Fiction Guide</Item.Name>
##     <Item.ID>402</Item.ID>
##     <Brand>-</Brand>
##     <Price>24.99</Price>
##     <Variation.ID>402-B</Variation.ID>
##     <Variation.Details>Format: Paperback, Language: French</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Sports Equipment</Category>
##     <Item.Name>Basketball</Item.Name>
##     <Item.ID>501</Item.ID>
##     <Brand>SportsGear</Brand>
##     <Price>29.99</Price>
##     <Variation.ID>501-A</Variation.ID>
##     <Variation.Details>Size: Size 7, Color: Orange</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Sports Equipment</Category>
##     <Item.Name>Tennis Racket</Item.Name>
##     <Item.ID>502</Item.ID>
##     <Brand>RacketPro</Brand>
##     <Price>89.99</Price>
##     <Variation.ID>502-A</Variation.ID>
##     <Variation.Details>Material: Graphite, Color: Black</Variation.Details>
##   </Inventory>
##   <Inventory>
##     <Category> Sports Equipment</Category>
##     <Item.Name>Tennis Racket</Item.Name>
##     <Item.ID>502</Item.ID>
##     <Brand>RacketPro</Brand>
##     <Price>89.99</Price>
##     <Variation.ID>502-B</Variation.ID>
##     <Variation.Details>Material: Aluminum, Color: Silver</Variation.Details>
##   </Inventory>
## </lab7data>

Additionally, here we have saved the file, then called it. This makes sure our save was a success.

Pros and Cons

JSON - Some pros of are that data is stored in smaller size among small and medium datasets, and is often easy to read by actual people. Some cons are that it cannot support dates as data, as well as you cannot make comments within it.

Parquet - Storing data in columns rather than rows allows for more efficient data storage, therefore these files are typically smaller in size. Data can be queried faster as well due to being stored in columns. However, unlike JSON files, the data is not readable by humans.

HTML - This is the universal standard - all browsers support it and thus can be easily accessed. Data can also be presented in highly customizable tables. However, other languages such as JavaScript are needed to present information in more compelling ways. Additionally, quite a lot of coded is needed to store data that is simply stored in a CSV, or JSON file.

XML - Some pros are that it supports Unicode, and therefore can be accessed across languages. Additionally, you can perform changes to the data without affecting data presentation, and it is human readable. Unfortunately, it is fairly complex and this makes for larger sized files, and there everything is stored as text rather than any other data types.

##Conclusion