Delimited text retail inventory data was provided to prepare for analysis by formatting it in JSON, HTML, XML, and Parquet. An overview of the pros and cons of each of these file formats is also included.
Create string of all data provided without the first header row
txt_data = '
Electronics,Smartphone,101,TechBrand,699.99,101-A,Color: Black, Storage: 64GB
Electronics,Smartphone,101,TechBrand,699.99,101-B,Color: White, Storage: 128GB
Electronics,Laptop,102,CompuBrand,1099.99,102-A,Color: Silver, Storage: 256GB
Electronics,Laptop,102,CompuBrand,1099.99,102-B,Color: Space Gray, Storage: 512GB
Home Appliances,Refrigerator,201,HomeCool,899.99,201-A,Color: Stainless Steel, Capacity:20 cu ft
Home Appliances,Refrigerator,201,HomeCool,899.99,201-B,Color: White, Capacity: 18 cu ft
Home Appliances,Washing Machine,202,CleanTech,499.99,202-A,Type: Front Load, Capacity:4.5 cu ft
Home Appliances,Washing Machine,202,CleanTech,499.99,202-B,Type: Top Load, Capacity:5.0 cu ft
Clothing,T-Shirt,301,FashionCo,19.99,301-A,Color: Blue, Size: S
Clothing,T-Shirt,301,FashionCo,19.99,301-B,Color: Red, Size: M
Clothing,T-Shirt,301,FashionCo,19.99,301-C,Color: Green, Size: L
Clothing,Jeans,302,DenimWorks,49.99,302-A,Color: Dark Blue, Size: 32
Clothing,Jeans,302,DenimWorks,49.99,302-B,Color: Light Blue, Size: 34
Books,Fiction Novel,401,-,14.99,401-A,Format: Hardcover, Language: English
Books,Fiction Novel,401,-,14.99,401-B,Format: Paperback, Language: Spanish
Books,Non-Fiction Guide,402,-,24.99,402-A,Format: eBook, Language: English
Books,Non-Fiction Guide,402,-,24.99,402-B,Format: Paperback, Language: French
Sports Equipment,Basketball,501,SportsGear,29.99,501-A,Size: Size 7, Color: Orange
Sports Equipment,Tennis Racket,502,RacketPro,89.99,502-A,Material: Graphite, Color: Black
Sports Equipment,Tennis Racket,502,RacketPro,89.99,502-B,Material: Aluminum, Color: Silver'
Input text data into data frame, define column names, and merge the last two Variation_Details columns since this column was incorrectly separated when transformed into a data table as result of this column including a comma
df = read.table(
text = txt_data,
sep = ',',
header = FALSE,
col.names = c(
'Category', 'Item_Name', 'Item ID', 'Brand',
'Price', 'Variation_ID', 'Variation_Details_1',
'Variation_Details_2'))
df = df |>
unite('Variation_Details', 'Variation_Details_1', 'Variation_Details_2', sep = ", ")
Prepare and output JSON file from data frame
json_data = toJSON(df, pretty=T)
write(json_data, file = "DATA607_Week 7 Assignment_JSON.json")
json_data
## [
## {
## "Category": "Electronics",
## "Item_Name": "Smartphone",
## "Item.ID": 101,
## "Brand": "TechBrand",
## "Price": 699.99,
## "Variation_ID": "101-A",
## "Variation_Details": "Color: Black, Storage: 64GB"
## },
## {
## "Category": "Electronics",
## "Item_Name": "Smartphone",
## "Item.ID": 101,
## "Brand": "TechBrand",
## "Price": 699.99,
## "Variation_ID": "101-B",
## "Variation_Details": "Color: White, Storage: 128GB"
## },
## {
## "Category": "Electronics",
## "Item_Name": "Laptop",
## "Item.ID": 102,
## "Brand": "CompuBrand",
## "Price": 1099.99,
## "Variation_ID": "102-A",
## "Variation_Details": "Color: Silver, Storage: 256GB"
## },
## {
## "Category": "Electronics",
## "Item_Name": "Laptop",
## "Item.ID": 102,
## "Brand": "CompuBrand",
## "Price": 1099.99,
## "Variation_ID": "102-B",
## "Variation_Details": "Color: Space Gray, Storage: 512GB"
## },
## {
## "Category": "Home Appliances",
## "Item_Name": "Refrigerator",
## "Item.ID": 201,
## "Brand": "HomeCool",
## "Price": 899.99,
## "Variation_ID": "201-A",
## "Variation_Details": "Color: Stainless Steel, Capacity:20 cu ft"
## },
## {
## "Category": "Home Appliances",
## "Item_Name": "Refrigerator",
## "Item.ID": 201,
## "Brand": "HomeCool",
## "Price": 899.99,
## "Variation_ID": "201-B",
## "Variation_Details": "Color: White, Capacity: 18 cu ft"
## },
## {
## "Category": "Home Appliances",
## "Item_Name": "Washing Machine",
## "Item.ID": 202,
## "Brand": "CleanTech",
## "Price": 499.99,
## "Variation_ID": "202-A",
## "Variation_Details": "Type: Front Load, Capacity:4.5 cu ft"
## },
## {
## "Category": "Home Appliances",
## "Item_Name": "Washing Machine",
## "Item.ID": 202,
## "Brand": "CleanTech",
## "Price": 499.99,
## "Variation_ID": "202-B",
## "Variation_Details": "Type: Top Load, Capacity:5.0 cu ft"
## },
## {
## "Category": "Clothing",
## "Item_Name": "T-Shirt",
## "Item.ID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "Variation_ID": "301-A",
## "Variation_Details": "Color: Blue, Size: S"
## },
## {
## "Category": "Clothing",
## "Item_Name": "T-Shirt",
## "Item.ID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "Variation_ID": "301-B",
## "Variation_Details": "Color: Red, Size: M"
## },
## {
## "Category": "Clothing",
## "Item_Name": "T-Shirt",
## "Item.ID": 301,
## "Brand": "FashionCo",
## "Price": 19.99,
## "Variation_ID": "301-C",
## "Variation_Details": "Color: Green, Size: L"
## },
## {
## "Category": "Clothing",
## "Item_Name": "Jeans",
## "Item.ID": 302,
## "Brand": "DenimWorks",
## "Price": 49.99,
## "Variation_ID": "302-A",
## "Variation_Details": "Color: Dark Blue, Size: 32"
## },
## {
## "Category": "Clothing",
## "Item_Name": "Jeans",
## "Item.ID": 302,
## "Brand": "DenimWorks",
## "Price": 49.99,
## "Variation_ID": "302-B",
## "Variation_Details": "Color: Light Blue, Size: 34"
## },
## {
## "Category": "Books",
## "Item_Name": "Fiction Novel",
## "Item.ID": 401,
## "Brand": "-",
## "Price": 14.99,
## "Variation_ID": "401-A",
## "Variation_Details": "Format: Hardcover, Language: English"
## },
## {
## "Category": "Books",
## "Item_Name": "Fiction Novel",
## "Item.ID": 401,
## "Brand": "-",
## "Price": 14.99,
## "Variation_ID": "401-B",
## "Variation_Details": "Format: Paperback, Language: Spanish"
## },
## {
## "Category": "Books",
## "Item_Name": "Non-Fiction Guide",
## "Item.ID": 402,
## "Brand": "-",
## "Price": 24.99,
## "Variation_ID": "402-A",
## "Variation_Details": "Format: eBook, Language: English"
## },
## {
## "Category": "Books",
## "Item_Name": "Non-Fiction Guide",
## "Item.ID": 402,
## "Brand": "-",
## "Price": 24.99,
## "Variation_ID": "402-B",
## "Variation_Details": "Format: Paperback, Language: French"
## },
## {
## "Category": "Sports Equipment",
## "Item_Name": "Basketball",
## "Item.ID": 501,
## "Brand": "SportsGear",
## "Price": 29.99,
## "Variation_ID": "501-A",
## "Variation_Details": "Size: Size 7, Color: Orange"
## },
## {
## "Category": "Sports Equipment",
## "Item_Name": "Tennis Racket",
## "Item.ID": 502,
## "Brand": "RacketPro",
## "Price": 89.99,
## "Variation_ID": "502-A",
## "Variation_Details": "Material: Graphite, Color: Black"
## },
## {
## "Category": "Sports Equipment",
## "Item_Name": "Tennis Racket",
## "Item.ID": 502,
## "Brand": "RacketPro",
## "Price": 89.99,
## "Variation_ID": "502-B",
## "Variation_Details": "Material: Aluminum, Color: Silver"
## }
## ]
Prepare and output XML file from data frame
df_to_xml(
df, fields="tags",
record.tag = "Item", root.node='Inventory',
xml.file = "DATA607_Week 7 Assignment_XML.xml")
## {xml_document}
## <Inventory encoding="UTF-8">
## [1] <Item>\n <Category>Electronics</Category>\n <Item_Name>Smartphone</Ite ...
## [2] <Item>\n <Category>Electronics</Category>\n <Item_Name>Smartphone</Ite ...
## [3] <Item>\n <Category>Electronics</Category>\n <Item_Name>Laptop</Item_Na ...
## [4] <Item>\n <Category>Electronics</Category>\n <Item_Name>Laptop</Item_Na ...
## [5] <Item>\n <Category>Home Appliances</Category>\n <Item_Name>Refrigerato ...
## [6] <Item>\n <Category>Home Appliances</Category>\n <Item_Name>Refrigerato ...
## [7] <Item>\n <Category>Home Appliances</Category>\n <Item_Name>Washing Mac ...
## [8] <Item>\n <Category>Home Appliances</Category>\n <Item_Name>Washing Mac ...
## [9] <Item>\n <Category>Clothing</Category>\n <Item_Name>T-Shirt</Item_Name ...
## [10] <Item>\n <Category>Clothing</Category>\n <Item_Name>T-Shirt</Item_Name ...
## [11] <Item>\n <Category>Clothing</Category>\n <Item_Name>T-Shirt</Item_Name ...
## [12] <Item>\n <Category>Clothing</Category>\n <Item_Name>Jeans</Item_Name>\ ...
## [13] <Item>\n <Category>Clothing</Category>\n <Item_Name>Jeans</Item_Name>\ ...
## [14] <Item>\n <Category>Books</Category>\n <Item_Name>Fiction Novel</Item_N ...
## [15] <Item>\n <Category>Books</Category>\n <Item_Name>Fiction Novel</Item_N ...
## [16] <Item>\n <Category>Books</Category>\n <Item_Name>Non-Fiction Guide</It ...
## [17] <Item>\n <Category>Books</Category>\n <Item_Name>Non-Fiction Guide</It ...
## [18] <Item>\n <Category>Sports Equipment</Category>\n <Item_Name>Basketball ...
## [19] <Item>\n <Category>Sports Equipment</Category>\n <Item_Name>Tennis Rac ...
## [20] <Item>\n <Category>Sports Equipment</Category>\n <Item_Name>Tennis Rac ...
Prepare and output HTML table from data frame
html_data = df |>
tableHTML(
rownames=F,
border=1,
widths = rep(150, ncol(df)), ) |>
add_css_header(css = list(
c('background-color', 'color'),
c('#2d3c52', 'white')
), headers = 1:ncol(df)) |>
add_css_row(css = list(
c('font-family', 'font-size'),
c('arial', '14px')))
write(html_data, file = "DATA607_Week 7 Assignment_HTML.html")
html_data
| Category | Item_Name | Item.ID | Brand | Price | Variation_ID | Variation_Details |
|---|---|---|---|---|---|---|
| Electronics | Smartphone | 101 | TechBrand | 699.99 | 101-A | Color: Black, Storage: 64GB |
| Electronics | Smartphone | 101 | TechBrand | 699.99 | 101-B | Color: White, Storage: 128GB |
| Electronics | Laptop | 102 | CompuBrand | 1099.99 | 102-A | Color: Silver, Storage: 256GB |
| Electronics | Laptop | 102 | CompuBrand | 1099.99 | 102-B | Color: Space Gray, Storage: 512GB |
| Home Appliances | Refrigerator | 201 | HomeCool | 899.99 | 201-A | Color: Stainless Steel, Capacity:20 cu ft |
| Home Appliances | Refrigerator | 201 | HomeCool | 899.99 | 201-B | Color: White, Capacity: 18 cu ft |
| Home Appliances | Washing Machine | 202 | CleanTech | 499.99 | 202-A | Type: Front Load, Capacity:4.5 cu ft |
| Home Appliances | Washing Machine | 202 | CleanTech | 499.99 | 202-B | Type: Top Load, Capacity:5.0 cu ft |
| Clothing | T-Shirt | 301 | FashionCo | 19.99 | 301-A | Color: Blue, Size: S |
| Clothing | T-Shirt | 301 | FashionCo | 19.99 | 301-B | Color: Red, Size: M |
| Clothing | T-Shirt | 301 | FashionCo | 19.99 | 301-C | Color: Green, Size: L |
| Clothing | Jeans | 302 | DenimWorks | 49.99 | 302-A | Color: Dark Blue, Size: 32 |
| Clothing | Jeans | 302 | DenimWorks | 49.99 | 302-B | Color: Light Blue, Size: 34 |
| Books | Fiction Novel | 401 | - | 14.99 | 401-A | Format: Hardcover, Language: English |
| Books | Fiction Novel | 401 | - | 14.99 | 401-B | Format: Paperback, Language: Spanish |
| Books | Non-Fiction Guide | 402 | - | 24.99 | 402-A | Format: eBook, Language: English |
| Books | Non-Fiction Guide | 402 | - | 24.99 | 402-B | Format: Paperback, Language: French |
| Sports Equipment | Basketball | 501 | SportsGear | 29.99 | 501-A | Size: Size 7, Color: Orange |
| Sports Equipment | Tennis Racket | 502 | RacketPro | 89.99 | 502-A | Material: Graphite, Color: Black |
| Sports Equipment | Tennis Racket | 502 | RacketPro | 89.99 | 502-B | Material: Aluminum, Color: Silver |
Prepare and output parquet file from data frame
write_parquet(df, sink = "DATA607_Week 7 Assignment_PARQUET.parquet")
parquet_data = read_parquet("DATA607_Week 7 Assignment_PARQUET.parquet")
parquet_data
## # A tibble: 20 × 7
## Category Item_Name Item.ID Brand Price Variation_ID Variation_Details
## <chr> <chr> <int> <chr> <dbl> <chr> <chr>
## 1 Electronics Smartpho… 101 Tech… 700. 101-A Color: Black, S…
## 2 Electronics Smartpho… 101 Tech… 700. 101-B Color: White, S…
## 3 Electronics Laptop 102 Comp… 1100. 102-A Color: Silver, …
## 4 Electronics Laptop 102 Comp… 1100. 102-B Color: Space Gra…
## 5 Home Appliances Refriger… 201 Home… 900. 201-A Color: Stainless…
## 6 Home Appliances Refriger… 201 Home… 900. 201-B Color: White, C…
## 7 Home Appliances Washing … 202 Clea… 500. 202-A Type: Front Load…
## 8 Home Appliances Washing … 202 Clea… 500. 202-B Type: Top Load, …
## 9 Clothing T-Shirt 301 Fash… 20.0 301-A Color: Blue, Si…
## 10 Clothing T-Shirt 301 Fash… 20.0 301-B Color: Red, Siz…
## 11 Clothing T-Shirt 301 Fash… 20.0 301-C Color: Green, S…
## 12 Clothing Jeans 302 Deni… 50.0 302-A Color: Dark Blue…
## 13 Clothing Jeans 302 Deni… 50.0 302-B Color: Light Blu…
## 14 Books Fiction … 401 - 15.0 401-A Format: Hardcove…
## 15 Books Fiction … 401 - 15.0 401-B Format: Paperbac…
## 16 Books Non-Fict… 402 - 25.0 402-A Format: eBook, …
## 17 Books Non-Fict… 402 - 25.0 402-B Format: Paperbac…
## 18 Sports Equipme… Basketba… 501 Spor… 30.0 501-A Size: Size 7, C…
## 19 Sports Equipme… Tennis R… 502 Rack… 90.0 502-A Material: Graphi…
## 20 Sports Equipme… Tennis R… 502 Rack… 90.0 502-B Material: Alumin…
| File_Format | PROS | CONS |
|---|---|---|
| Parquet |
|
|
| JSON |
|
|
| XML |
|
|
| HTML |
|
|