In this analysis, we’re working with the CUNYMart inventory dataset, which has been provided as plain text. Our goal is to import this dataset and convert it into several useful formats: JSON, HTML, XML, and Parquet. We’ll explore how each format can be beneficial depending on the context and discuss their pros and cons.
First, we need to bring the raw dataset into R. Since we received the data as text, we start by storing it in R as a single string. Here’s the raw dataset:
# Raw text input
raw_text <- "
Category,Item Name,Item ID,Brand,Price,Variation ID,Variation Details
Electronics,Smartphone,101,TechBrand,699.99,101-A,Color: Black, Storage: 64GB
Electronics,Smartphone,101,TechBrand,699.99,101-B,Color: White, Storage: 128GB
Electronics,Laptop,102,CompuBrand,1099.99,102-A,Color: Silver, Storage: 256GB
Electronics,Laptop,102,CompuBrand,1099.99,102-B,Color: Space Gray, Storage: 512GB
Home Appliances,Refrigerator,201,HomeCool,899.99,201-A,Color: Stainless Steel, Capacity: 20 cu ft
Home Appliances,Refrigerator,201,HomeCool,899.99,201-B,Color: White, Capacity: 18 cu ft
Home Appliances,Washing Machine,202,CleanTech,499.99,202-A,Type: Front Load, Capacity: 4.5 cu ft
Home Appliances,Washing Machine,202,CleanTech,499.99,202-B,Type: Top Load, Capacity: 5.0 cu ft
Clothing,T-Shirt,301,FashionCo,19.99,301-A,Color: Blue, Size: S
Clothing,T-Shirt,301,FashionCo,19.99,301-B,Color: Red, Size: M
Clothing,T-Shirt,301,FashionCo,19.99,301-C,Color: Green, Size: L
Clothing,Jeans,302,DenimWorks,49.99,302-A,Color: Dark Blue, Size: 32
Clothing,Jeans,302,DenimWorks,49.99,302-B,Color: Light Blue, Size: 34
Books,Fiction Novel,401,-,14.99,401-A,Format: Hardcover, Language: English
Books,Fiction Novel,401,-,14.99,401-B,Format: Paperback, Language: Spanish
Books,Non-Fiction Guide,402,-,24.99,402-A,Format: eBook, Language: English
Books,Non-Fiction Guide,402,-,24.99,402-B,Format: Paperback, Language: French
Sports Equipment,Basketball,501,SportsGear,29.99,501-A,Size: Size 7, Color: Orange
Sports Equipment,Tennis Racket,502,RacketPro,89.99,502-A,Material: Graphite, Color: Black
Sports Equipment,Tennis Racket,502,RacketPro,89.99,502-B,Material: Aluminum, Color: Silver
"
Now that we have the text in R, the next step is to convert it into a structured data frame. A data frame is simply a table that organizes our data into rows and columns.
# Converting raw text to a data frame
library(readr)
# Using `read_csv` to read the text as a CSV
cunymart_data <- read_csv(raw_text)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 20 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Category, Item Name, Brand, Variation ID, Variation Details
## dbl (2): Item ID, Price
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the data frame
print(cunymart_data)
## # A tibble: 20 × 7
## Category `Item Name` `Item ID` Brand Price `Variation ID`
## <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 Electronics Smartphone 101 TechBrand 700. 101-A
## 2 Electronics Smartphone 101 TechBrand 700. 101-B
## 3 Electronics Laptop 102 CompuBrand 1100. 102-A
## 4 Electronics Laptop 102 CompuBrand 1100. 102-B
## 5 Home Appliances Refrigerator 201 HomeCool 900. 201-A
## 6 Home Appliances Refrigerator 201 HomeCool 900. 201-B
## 7 Home Appliances Washing Machine 202 CleanTech 500. 202-A
## 8 Home Appliances Washing Machine 202 CleanTech 500. 202-B
## 9 Clothing T-Shirt 301 FashionCo 20.0 301-A
## 10 Clothing T-Shirt 301 FashionCo 20.0 301-B
## 11 Clothing T-Shirt 301 FashionCo 20.0 301-C
## 12 Clothing Jeans 302 DenimWorks 50.0 302-A
## 13 Clothing Jeans 302 DenimWorks 50.0 302-B
## 14 Books Fiction Novel 401 - 15.0 401-A
## 15 Books Fiction Novel 401 - 15.0 401-B
## 16 Books Non-Fiction Guide 402 - 25.0 402-A
## 17 Books Non-Fiction Guide 402 - 25.0 402-B
## 18 Sports Equipment Basketball 501 SportsGear 30.0 501-A
## 19 Sports Equipment Tennis Racket 502 RacketPro 90.0 502-A
## 20 Sports Equipment Tennis Racket 502 RacketPro 90.0 502-B
## # ℹ 1 more variable: `Variation Details` <chr>
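At this point, we’ve turned the text into an organized table that we can work with. The parsing warning appears because the `Variation Details` values contain unquoted commas, so each data row has eight comma-separated fields against seven column names. As the JSON output further down confirms, readr kept the extra piece as part of the last column, so no data was lost.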
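One way to avoid the parsing warning altogether is to split each line on its first six commas only and keep everything after them as the final field. The following is a minimal base R sketch of that idea, not the approach used in this report; `sample_text` is a hypothetical two-row excerpt and `split_row` is a helper introduced here for illustration. It assumes every row has at least as many fields as the header.

```r
# Hypothetical two-row excerpt of the raw text, for illustration only
sample_text <- "Category,Item Name,Item ID,Brand,Price,Variation ID,Variation Details
Electronics,Smartphone,101,TechBrand,699.99,101-A,Color: Black, Storage: 64GB
Clothing,Jeans,302,DenimWorks,49.99,302-B,Color: Light Blue, Size: 34"

# Split one line into exactly `n_cols` fields; any extra commas stay in the last field
split_row <- function(line, n_cols) {
  parts <- strsplit(line, ",", fixed = TRUE)[[1]]
  head_fields <- parts[seq_len(n_cols - 1)]
  tail_field <- paste(parts[n_cols:length(parts)], collapse = ",")
  c(head_fields, tail_field)
}

text_lines <- strsplit(trimws(sample_text), "\n", fixed = TRUE)[[1]]
header <- strsplit(text_lines[1], ",", fixed = TRUE)[[1]]
rows <- lapply(text_lines[-1], split_row, n_cols = length(header))

# Assemble a data frame and convert the numeric columns
parsed <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(parsed) <- header
parsed[["Item ID"]] <- as.integer(parsed[["Item ID"]])
parsed[["Price"]] <- as.numeric(parsed[["Price"]])

print(parsed[["Variation Details"]])
```

If the source file can be changed, quoting the `Variation Details` values is the more conventional fix, since standard CSV readers then treat the embedded commas as data.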
JSON (JavaScript Object Notation) is a popular format for exchanging data between systems, especially for web applications. It’s lightweight and easy for machines to parse. Let’s convert our data into JSON format:
library(jsonlite)
# Convert to JSON format
cunymart_json <- toJSON(cunymart_data, pretty = TRUE)
cat(cunymart_json)
## [
## {
## "Category": "Electronics",
## "Item Name": "Smartphone",
## "Item ID": 101,
## "Brand": "TechBrand",
## "Price": 699.99,
## "Variation ID": "101-A",
## "Variation Details": "Color: Black, Storage: 64GB"
## },
## {
## "Category": "Electronics",
## "Item Name": "Smartphone",
## "Item ID": 101,
## "Brand": "TechBrand",
## "Price": 699.99,
## "Variation ID": "101-B",
## "Variation Details": "Color: White, Storage: 128GB"
## },
## {
## "Category": "Electronics",
## "Item Name": "Laptop",
## "Item ID": 102,
## "Brand": "CompuBrand",
## "Price": 1099.99,
## "Variation ID": "102-A",
## "Variation Details": "Color: Silver, Storage: 256GB"
## },
## {
## "Category": "Electronics",
## "Item Name": "Laptop",
## "Item ID": 102,
## "Brand": "CompuBrand",
## "Price": 1099.99,
## "Variation ID": "102-B",
## "Variation Details": "Color: Space Gray, Storage: 512GB"
## },
## {
## "Category": "Home Appliances",
## "Item Name": "Refrigerator",
## "Item ID": 201,
## "Brand": "HomeCool",
## "Price": 899.99,
## "Variation ID": "201-A",
## "Variation Details": "Color: Stainless Steel, Capacity: 20 cu ft"
## },
## {
## "Category": "Home Appliances",
## "Item Name": "Refrigerator",
## "Item ID": 201,
## "Brand": "HomeCool",
## "Price": 899.99,
## "Variation ID": "201-B",
## "Variation Details": "Color: White, Capacity: 18 cu ft"
## },
## {
## "Category": "Home Appliances",
## "Item Name": "Washing Machine",
## "Item ID": 202,
## "Brand": "CleanTech",
## "Price": 499.99,
## "Variation ID": "202-A",
## "Variation Details": "Type: Front Load, Capacity: 4.5 cu ft"
## },
## {
## "Category": "Home Appliances",
## "Item Name": "Washing Machine",
## "Item ID": 202,
## "Brand": "CleanTech",
## "Price": 499.99,
## "Variation ID": "202-B",
## "Variation Details": "Type: Top Load, Capacity: 5.0 cu ft"
## },
## {
## "Category": "Clothing",
## "Item Name": "T-Shirt",
## "Item ID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "Variation ID": "301-A",
## "Variation Details": "Color: Blue, Size: S"
## },
## {
## "Category": "Clothing",
## "Item Name": "T-Shirt",
## "Item ID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "Variation ID": "301-B",
## "Variation Details": "Color: Red, Size: M"
## },
## {
## "Category": "Clothing",
## "Item Name": "T-Shirt",
## "Item ID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "Variation ID": "301-C",
## "Variation Details": "Color: Green, Size: L"
## },
## {
## "Category": "Clothing",
## "Item Name": "Jeans",
## "Item ID": 302,
## "Brand": "DenimWorks",
## "Price": 49.99,
## "Variation ID": "302-A",
## "Variation Details": "Color: Dark Blue, Size: 32"
## },
## {
## "Category": "Clothing",
## "Item Name": "Jeans",
## "Item ID": 302,
## "Brand": "DenimWorks",
## "Price": 49.99,
## "Variation ID": "302-B",
## "Variation Details": "Color: Light Blue, Size: 34"
## },
## {
## "Category": "Books",
## "Item Name": "Fiction Novel",
## "Item ID": 401,
## "Brand": "-",
## "Price": 14.99,
## "Variation ID": "401-A",
## "Variation Details": "Format: Hardcover, Language: English"
## },
## {
## "Category": "Books",
## "Item Name": "Fiction Novel",
## "Item ID": 401,
## "Brand": "-",
## "Price": 14.99,
## "Variation ID": "401-B",
## "Variation Details": "Format: Paperback, Language: Spanish"
## },
## {
## "Category": "Books",
## "Item Name": "Non-Fiction Guide",
## "Item ID": 402,
## "Brand": "-",
## "Price": 24.99,
## "Variation ID": "402-A",
## "Variation Details": "Format: eBook, Language: English"
## },
## {
## "Category": "Books",
## "Item Name": "Non-Fiction Guide",
## "Item ID": 402,
## "Brand": "-",
## "Price": 24.99,
## "Variation ID": "402-B",
## "Variation Details": "Format: Paperback, Language: French"
## },
## {
## "Category": "Sports Equipment",
## "Item Name": "Basketball",
## "Item ID": 501,
## "Brand": "SportsGear",
## "Price": 29.99,
## "Variation ID": "501-A",
## "Variation Details": "Size: Size 7, Color: Orange"
## },
## {
## "Category": "Sports Equipment",
## "Item Name": "Tennis Racket",
## "Item ID": 502,
## "Brand": "RacketPro",
## "Price": 89.99,
## "Variation ID": "502-A",
## "Variation Details": "Material: Graphite, Color: Black"
## },
## {
## "Category": "Sports Equipment",
## "Item Name": "Tennis Racket",
## "Item ID": 502,
## "Brand": "RacketPro",
## "Price": 89.99,
## "Variation ID": "502-B",
## "Variation Details": "Material: Aluminum, Color: Silver"
## }
## ]
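To illustrate the claim that JSON is easy for machines to parse, the sketch below converts a small data frame to JSON and reads it straight back with `fromJSON()`. The `inventory` data frame is a hypothetical two-row stand-in for `cunymart_data`, used only so the example runs on its own.

```r
library(jsonlite)

# Hypothetical two-row stand-in for cunymart_data
inventory <- data.frame(
  Category = c("Electronics", "Clothing"),
  `Item Name` = c("Smartphone", "Jeans"),
  Price = c(699.99, 49.99),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Serialize to JSON, then parse the JSON text back into a data frame
json_text <- toJSON(inventory, pretty = TRUE)
restored <- fromJSON(json_text)

# Column names (including the one with a space) and values should survive
print(names(restored))
print(all.equal(restored$Price, inventory$Price))
```

One caveat worth keeping in mind: `toJSON()` limits numbers to four decimal digits by default, so data with higher precision may need `digits = NA`.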
HTML (HyperText Markup Language) is primarily used for displaying data in web browsers. Converting the dataset to HTML allows us to easily display it as a table on a webpage:
library(xtable)
# Convert to HTML format
cunymart_html <- print(xtable(cunymart_data), type = "html", print.results = FALSE)
cat(cunymart_html)
## <!-- html table generated in R 4.3.3 by xtable 1.8-4 package -->
## <!-- Sun Oct 20 10:36:30 2024 -->
## <table border=1>
## <tr> <th> </th> <th> Category </th> <th> Item Name </th> <th> Item ID </th> <th> Brand </th> <th> Price </th> <th> Variation ID </th> <th> Variation Details </th> </tr>
## <tr> <td align="right"> 1 </td> <td> Electronics </td> <td> Smartphone </td> <td align="right"> 101.00 </td> <td> TechBrand </td> <td align="right"> 699.99 </td> <td> 101-A </td> <td> Color: Black, Storage: 64GB </td> </tr>
## <tr> <td align="right"> 2 </td> <td> Electronics </td> <td> Smartphone </td> <td align="right"> 101.00 </td> <td> TechBrand </td> <td align="right"> 699.99 </td> <td> 101-B </td> <td> Color: White, Storage: 128GB </td> </tr>
## <tr> <td align="right"> 3 </td> <td> Electronics </td> <td> Laptop </td> <td align="right"> 102.00 </td> <td> CompuBrand </td> <td align="right"> 1099.99 </td> <td> 102-A </td> <td> Color: Silver, Storage: 256GB </td> </tr>
## <tr> <td align="right"> 4 </td> <td> Electronics </td> <td> Laptop </td> <td align="right"> 102.00 </td> <td> CompuBrand </td> <td align="right"> 1099.99 </td> <td> 102-B </td> <td> Color: Space Gray, Storage: 512GB </td> </tr>
## <tr> <td align="right"> 5 </td> <td> Home Appliances </td> <td> Refrigerator </td> <td align="right"> 201.00 </td> <td> HomeCool </td> <td align="right"> 899.99 </td> <td> 201-A </td> <td> Color: Stainless Steel, Capacity: 20 cu ft </td> </tr>
## <tr> <td align="right"> 6 </td> <td> Home Appliances </td> <td> Refrigerator </td> <td align="right"> 201.00 </td> <td> HomeCool </td> <td align="right"> 899.99 </td> <td> 201-B </td> <td> Color: White, Capacity: 18 cu ft </td> </tr>
## <tr> <td align="right"> 7 </td> <td> Home Appliances </td> <td> Washing Machine </td> <td align="right"> 202.00 </td> <td> CleanTech </td> <td align="right"> 499.99 </td> <td> 202-A </td> <td> Type: Front Load, Capacity: 4.5 cu ft </td> </tr>
## <tr> <td align="right"> 8 </td> <td> Home Appliances </td> <td> Washing Machine </td> <td align="right"> 202.00 </td> <td> CleanTech </td> <td align="right"> 499.99 </td> <td> 202-B </td> <td> Type: Top Load, Capacity: 5.0 cu ft </td> </tr>
## <tr> <td align="right"> 9 </td> <td> Clothing </td> <td> T-Shirt </td> <td align="right"> 301.00 </td> <td> FashionCo </td> <td align="right"> 19.99 </td> <td> 301-A </td> <td> Color: Blue, Size: S </td> </tr>
## <tr> <td align="right"> 10 </td> <td> Clothing </td> <td> T-Shirt </td> <td align="right"> 301.00 </td> <td> FashionCo </td> <td align="right"> 19.99 </td> <td> 301-B </td> <td> Color: Red, Size: M </td> </tr>
## <tr> <td align="right"> 11 </td> <td> Clothing </td> <td> T-Shirt </td> <td align="right"> 301.00 </td> <td> FashionCo </td> <td align="right"> 19.99 </td> <td> 301-C </td> <td> Color: Green, Size: L </td> </tr>
## <tr> <td align="right"> 12 </td> <td> Clothing </td> <td> Jeans </td> <td align="right"> 302.00 </td> <td> DenimWorks </td> <td align="right"> 49.99 </td> <td> 302-A </td> <td> Color: Dark Blue, Size: 32 </td> </tr>
## <tr> <td align="right"> 13 </td> <td> Clothing </td> <td> Jeans </td> <td align="right"> 302.00 </td> <td> DenimWorks </td> <td align="right"> 49.99 </td> <td> 302-B </td> <td> Color: Light Blue, Size: 34 </td> </tr>
## <tr> <td align="right"> 14 </td> <td> Books </td> <td> Fiction Novel </td> <td align="right"> 401.00 </td> <td> - </td> <td align="right"> 14.99 </td> <td> 401-A </td> <td> Format: Hardcover, Language: English </td> </tr>
## <tr> <td align="right"> 15 </td> <td> Books </td> <td> Fiction Novel </td> <td align="right"> 401.00 </td> <td> - </td> <td align="right"> 14.99 </td> <td> 401-B </td> <td> Format: Paperback, Language: Spanish </td> </tr>
## <tr> <td align="right"> 16 </td> <td> Books </td> <td> Non-Fiction Guide </td> <td align="right"> 402.00 </td> <td> - </td> <td align="right"> 24.99 </td> <td> 402-A </td> <td> Format: eBook, Language: English </td> </tr>
## <tr> <td align="right"> 17 </td> <td> Books </td> <td> Non-Fiction Guide </td> <td align="right"> 402.00 </td> <td> - </td> <td align="right"> 24.99 </td> <td> 402-B </td> <td> Format: Paperback, Language: French </td> </tr>
## <tr> <td align="right"> 18 </td> <td> Sports Equipment </td> <td> Basketball </td> <td align="right"> 501.00 </td> <td> SportsGear </td> <td align="right"> 29.99 </td> <td> 501-A </td> <td> Size: Size 7, Color: Orange </td> </tr>
## <tr> <td align="right"> 19 </td> <td> Sports Equipment </td> <td> Tennis Racket </td> <td align="right"> 502.00 </td> <td> RacketPro </td> <td align="right"> 89.99 </td> <td> 502-A </td> <td> Material: Graphite, Color: Black </td> </tr>
## <tr> <td align="right"> 20 </td> <td> Sports Equipment </td> <td> Tennis Racket </td> <td align="right"> 502.00 </td> <td> RacketPro </td> <td align="right"> 89.99 </td> <td> 502-B </td> <td> Material: Aluminum, Color: Silver </td> </tr>
## </table>
### Pros of HTML:
Perfect for displaying data on web pages.
Easy for humans to read in a browser.
### Cons of HTML:
Not designed for storing large datasets.
Bulky compared to other formats like JSON or Parquet.
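In the HTML output above, `Item ID` is rendered as `101.00` and an extra column of row numbers is added, because xtable formats numeric columns with two decimals and prints row names by default. The sketch below shows one possible way to adjust both, using the `digits` argument of `xtable()` and the `include.rownames` argument of `print()`; the `inventory` data frame is a hypothetical stand-in for `cunymart_data`.

```r
library(xtable)

# Hypothetical two-row stand-in for cunymart_data
inventory <- data.frame(
  `Item ID` = c(101, 302),
  Price = c(699.99, 49.99),
  check.names = FALSE
)

# `digits` takes one entry for the row-name column plus one per data column:
# 0 decimals for the row names and Item ID, 2 decimals for Price
html_table <- print(
  xtable(inventory, digits = c(0, 0, 2)),
  type = "html",
  include.rownames = FALSE,
  print.results = FALSE
)
cat(html_table)
```

Converting `Item ID` to an integer column before calling `xtable()` would be another way to drop the decimals.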
XML (eXtensible Markup Language) is another format for structured data, often used for exchanging information between systems. It’s similar to JSON but is more verbose and has strict rules for structure:
library(XML)
xml_doc <- newXMLDoc()
root <- newXMLNode("inventory", doc = xml_doc)
# Add each row as an <item> node
for (i in seq_len(nrow(cunymart_data))) {
  item_node <- newXMLNode("item", parent = root)
  for (col in names(cunymart_data)) {
    # XML element names cannot contain spaces, so "Item Name" becomes "Item_Name"
    tag_name <- gsub(" ", "_", col, fixed = TRUE)
    newXMLNode(tag_name, as.character(cunymart_data[[col]][i]), parent = item_node)
  }
}
# Save XML
cunymart_xml <- saveXML(xml_doc)
cat(cunymart_xml)
## <?xml version="1.0"?>
## <inventory>
## <item>
## <Category>Electronics</Category>
## <Item_Name>Smartphone</Item_Name>
## <Item_ID>101</Item_ID>
## <Brand>TechBrand</Brand>
## <Price>699.99</Price>
## <Variation_ID>101-A</Variation_ID>
## <Variation_Details>Color: Black, Storage: 64GB</Variation_Details>
## </item>
## <item>
## <Category>Electronics</Category>
## <Item_Name>Smartphone</Item_Name>
## <Item_ID>101</Item_ID>
## <Brand>TechBrand</Brand>
## <Price>699.99</Price>
## <Variation_ID>101-B</Variation_ID>
## <Variation_Details>Color: White, Storage: 128GB</Variation_Details>
## </item>
## <item>
## <Category>Electronics</Category>
## <Item_Name>Laptop</Item_Name>
## <Item_ID>102</Item_ID>
## <Brand>CompuBrand</Brand>
## <Price>1099.99</Price>
## <Variation_ID>102-A</Variation_ID>
## <Variation_Details>Color: Silver, Storage: 256GB</Variation_Details>
## </item>
## <item>
## <Category>Electronics</Category>
## <Item_Name>Laptop</Item_Name>
## <Item_ID>102</Item_ID>
## <Brand>CompuBrand</Brand>
## <Price>1099.99</Price>
## <Variation_ID>102-B</Variation_ID>
## <Variation_Details>Color: Space Gray, Storage: 512GB</Variation_Details>
## </item>
## <item>
## <Category>Home Appliances</Category>
## <Item_Name>Refrigerator</Item_Name>
## <Item_ID>201</Item_ID>
## <Brand>HomeCool</Brand>
## <Price>899.99</Price>
## <Variation_ID>201-A</Variation_ID>
## <Variation_Details>Color: Stainless Steel, Capacity: 20 cu ft</Variation_Details>
## </item>
## <item>
## <Category>Home Appliances</Category>
## <Item_Name>Refrigerator</Item_Name>
## <Item_ID>201</Item_ID>
## <Brand>HomeCool</Brand>
## <Price>899.99</Price>
## <Variation_ID>201-B</Variation_ID>
## <Variation_Details>Color: White, Capacity: 18 cu ft</Variation_Details>
## </item>
## <item>
## <Category>Home Appliances</Category>
## <Item_Name>Washing Machine</Item_Name>
## <Item_ID>202</Item_ID>
## <Brand>CleanTech</Brand>
## <Price>499.99</Price>
## <Variation_ID>202-A</Variation_ID>
## <Variation_Details>Type: Front Load, Capacity: 4.5 cu ft</Variation_Details>
## </item>
## <item>
## <Category>Home Appliances</Category>
## <Item_Name>Washing Machine</Item_Name>
## <Item_ID>202</Item_ID>
## <Brand>CleanTech</Brand>
## <Price>499.99</Price>
## <Variation_ID>202-B</Variation_ID>
## <Variation_Details>Type: Top Load, Capacity: 5.0 cu ft</Variation_Details>
## </item>
## <item>
## <Category>Clothing</Category>
## <Item_Name>T-Shirt</Item_Name>
## <Item_ID>301</Item_ID>
## <Brand>FashionCo</Brand>
## <Price>19.99</Price>
## <Variation_ID>301-A</Variation_ID>
## <Variation_Details>Color: Blue, Size: S</Variation_Details>
## </item>
## <item>
## <Category>Clothing</Category>
## <Item_Name>T-Shirt</Item_Name>
## <Item_ID>301</Item_ID>
## <Brand>FashionCo</Brand>
## <Price>19.99</Price>
## <Variation_ID>301-B</Variation_ID>
## <Variation_Details>Color: Red, Size: M</Variation_Details>
## </item>
## <item>
## <Category>Clothing</Category>
## <Item_Name>T-Shirt</Item_Name>
## <Item_ID>301</Item_ID>
## <Brand>FashionCo</Brand>
## <Price>19.99</Price>
## <Variation_ID>301-C</Variation_ID>
## <Variation_Details>Color: Green, Size: L</Variation_Details>
## </item>
## <item>
## <Category>Clothing</Category>
## <Item_Name>Jeans</Item_Name>
## <Item_ID>302</Item_ID>
## <Brand>DenimWorks</Brand>
## <Price>49.99</Price>
## <Variation_ID>302-A</Variation_ID>
## <Variation_Details>Color: Dark Blue, Size: 32</Variation_Details>
## </item>
## <item>
## <Category>Clothing</Category>
## <Item_Name>Jeans</Item_Name>
## <Item_ID>302</Item_ID>
## <Brand>DenimWorks</Brand>
## <Price>49.99</Price>
## <Variation_ID>302-B</Variation_ID>
## <Variation_Details>Color: Light Blue, Size: 34</Variation_Details>
## </item>
## <item>
## <Category>Books</Category>
## <Item_Name>Fiction Novel</Item_Name>
## <Item_ID>401</Item_ID>
## <Brand>-</Brand>
## <Price>14.99</Price>
## <Variation_ID>401-A</Variation_ID>
## <Variation_Details>Format: Hardcover, Language: English</Variation_Details>
## </item>
## <item>
## <Category>Books</Category>
## <Item_Name>Fiction Novel</Item_Name>
## <Item_ID>401</Item_ID>
## <Brand>-</Brand>
## <Price>14.99</Price>
## <Variation_ID>401-B</Variation_ID>
## <Variation_Details>Format: Paperback, Language: Spanish</Variation_Details>
## </item>
## <item>
## <Category>Books</Category>
## <Item_Name>Non-Fiction Guide</Item_Name>
## <Item_ID>402</Item_ID>
## <Brand>-</Brand>
## <Price>24.99</Price>
## <Variation_ID>402-A</Variation_ID>
## <Variation_Details>Format: eBook, Language: English</Variation_Details>
## </item>
## <item>
## <Category>Books</Category>
## <Item_Name>Non-Fiction Guide</Item_Name>
## <Item_ID>402</Item_ID>
## <Brand>-</Brand>
## <Price>24.99</Price>
## <Variation_ID>402-B</Variation_ID>
## <Variation_Details>Format: Paperback, Language: French</Variation_Details>
## </item>
## <item>
## <Category>Sports Equipment</Category>
## <Item_Name>Basketball</Item_Name>
## <Item_ID>501</Item_ID>
## <Brand>SportsGear</Brand>
## <Price>29.99</Price>
## <Variation_ID>501-A</Variation_ID>
## <Variation_Details>Size: Size 7, Color: Orange</Variation_Details>
## </item>
## <item>
## <Category>Sports Equipment</Category>
## <Item_Name>Tennis Racket</Item_Name>
## <Item_ID>502</Item_ID>
## <Brand>RacketPro</Brand>
## <Price>89.99</Price>
## <Variation_ID>502-A</Variation_ID>
## <Variation_Details>Material: Graphite, Color: Black</Variation_Details>
## </item>
## <item>
## <Category>Sports Equipment</Category>
## <Item_Name>Tennis Racket</Item_Name>
## <Item_ID>502</Item_ID>
## <Brand>RacketPro</Brand>
## <Price>89.99</Price>
## <Variation_ID>502-B</Variation_ID>
## <Variation_Details>Material: Aluminum, Color: Silver</Variation_Details>
## </item>
## </inventory>
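On the receiving side of a data exchange, an XML document like this can be parsed back into a data frame. The sketch below is one possible way to do that with `xmlParse()` and `xmlToDataFrame()` from the XML package. It uses a hypothetical two-item document, and it assumes element names without spaces (for example `Item_Name`), because XML does not allow spaces in element names and a parser rejects them.

```r
library(XML)

# Hypothetical two-item document; tag names use underscores instead of spaces
xml_text <- paste0(
  "<inventory>",
  "<item><Category>Electronics</Category><Item_Name>Smartphone</Item_Name><Price>699.99</Price></item>",
  "<item><Category>Clothing</Category><Item_Name>Jeans</Item_Name><Price>49.99</Price></item>",
  "</inventory>"
)

# Parsing fails with an error if the text is not well-formed XML
parsed_doc <- xmlParse(xml_text, asText = TRUE)

# Each <item> becomes a row; every column arrives as character
restored <- xmlToDataFrame(parsed_doc, stringsAsFactors = FALSE)
restored$Price <- as.numeric(restored$Price)
print(restored)
```

Unlike JSON, XML carries no distinction between numbers and strings in plain element content, which is why `Price` has to be converted explicitly here.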
Parquet is a columnar storage format often used in big data processing. It’s optimized for storing and reading large datasets efficiently:
library(arrow)
##
## Attaching package: 'arrow'
## The following object is masked from 'package:utils':
##
## timestamp
# Convert to Parquet format
write_parquet(cunymart_data, "cunymart_data.parquet")
# Reading the data back from the Parquet file
cunymart_parquet <- read_parquet("cunymart_data.parquet")
# Display the data
print(cunymart_parquet)
## # A tibble: 20 × 7
## Category `Item Name` `Item ID` Brand Price `Variation ID`
## * <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 Electronics Smartphone 101 TechBrand 700. 101-A
## 2 Electronics Smartphone 101 TechBrand 700. 101-B
## 3 Electronics Laptop 102 CompuBrand 1100. 102-A
## 4 Electronics Laptop 102 CompuBrand 1100. 102-B
## 5 Home Appliances Refrigerator 201 HomeCool 900. 201-A
## 6 Home Appliances Refrigerator 201 HomeCool 900. 201-B
## 7 Home Appliances Washing Machine 202 CleanTech 500. 202-A
## 8 Home Appliances Washing Machine 202 CleanTech 500. 202-B
## 9 Clothing T-Shirt 301 FashionCo 20.0 301-A
## 10 Clothing T-Shirt 301 FashionCo 20.0 301-B
## 11 Clothing T-Shirt 301 FashionCo 20.0 301-C
## 12 Clothing Jeans 302 DenimWorks 50.0 302-A
## 13 Clothing Jeans 302 DenimWorks 50.0 302-B
## 14 Books Fiction Novel 401 - 15.0 401-A
## 15 Books Fiction Novel 401 - 15.0 401-B
## 16 Books Non-Fiction Guide 402 - 25.0 402-A
## 17 Books Non-Fiction Guide 402 - 25.0 402-B
## 18 Sports Equipment Basketball 501 SportsGear 30.0 501-A
## 19 Sports Equipment Tennis Racket 502 RacketPro 90.0 502-A
## 20 Sports Equipment Tennis Racket 502 RacketPro 90.0 502-B
## # ℹ 1 more variable: `Variation Details` <chr>
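Because Parquet stores data column by column, a reader can load only the columns it needs. The sketch below illustrates this with the `col_select` argument of `read_parquet()`; the `inventory` data frame and the temporary file path are hypothetical stand-ins so the example does not touch `cunymart_data.parquet`.

```r
library(arrow)

# Hypothetical two-row stand-in for cunymart_data
inventory <- data.frame(
  Category = c("Electronics", "Clothing"),
  Brand = c("TechBrand", "DenimWorks"),
  Price = c(699.99, 49.99),
  stringsAsFactors = FALSE
)

# Write to a temporary file so nothing in the working directory is overwritten
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(inventory, parquet_path)

# Read back only two of the three columns
prices <- read_parquet(parquet_path, col_select = c("Category", "Price"))
print(prices)
```

On a 20-row table the saving is negligible, but on wide tables with millions of rows, skipping unused columns can reduce both read time and memory use.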
In this report, we took the CUNYMart inventory dataset, imported it from text, and converted it into several useful formats: JSON, HTML, XML, and Parquet. Each format serves a specific purpose, depending on whether the data is for storage, machine-to-machine communication, or display on a webpage. JSON is lightweight and well suited to exchanging records between applications, Parquet’s columnar layout suits storing and reading large datasets, HTML is best for human-readable display in a browser, and XML offers a stricter, more verbose structure for data exchange.