Assignment 7:working with JSON, HTML, XML, and Parquet in R

Introduction

For this assignment I had to prepare data that was provided from the CUNYMart inventory, which is located at located at 123 Example Street,Anytown, USA. To prepare the data for analysis, I had to turn the given data into JSon, HTML, XML, and Parquet files and then read them in R.

Loading packages

library(tidyverse)
library("arrow")
library(XML)
library(RCurl)
library(pdftools)
library(data.table)
library(jsonlite)

Preparing data:

I created the data frame in R from the data frame we were provided for the assignment on CUNYMart’s inventory.

DF_prep<- data.frame(
  Category = c("Electronics", "Electronics", "Electronics", "Electronics",
               "Home Appliances", "Home Appliances", "Home Appliances", "Home Appliances",
               "Clothing", "Clothing", "Clothing", "Clothing", "Clothing", 
               "Books", "Books", "Books", "Books", 
               "Sports Equipment", "Sports Equipment", "Sports Equipment"),
  Item_Name = c("Smartphone", "Smartphone", "Laptop", "Laptop",
                "Refrigerator", "Refrigerator", "Washing Machine", "Washing Machine",
                "T-Shirt", "T-Shirt", "T-Shirt", "Jeans", "Jeans",
                "Fiction Novel", "Fiction Novel", "Non-Fiction Guide", "Non-Fiction Guide",
                "Basketball", "Tennis Racket", "Tennis Racket"),
  Item_ID = c(101, 101, 102, 102,
              201, 201, 202, 202,
              301, 301, 301, 302, 302,
              401, 401, 402, 402,
              501, 502, 502),
  Brand = c("TechBrand", "TechBrand", "CompuBrand", "CompuBrand",
            "HomeCool", "HomeCool", "CleanTech", "CleanTech",
            "FashionCo", "FashionCo", "FashionCo", "DenimWorks", "DenimWorks",
            NA, NA, NA, NA,
            "SportsGear", "RacketPro", "RacketPro"),
  Price = c(699.99, 699.99, 1099.99, 1099.99,
            899.99, 899.99, 499.99, 499.99,
            19.99, 19.99, 19.99, 49.99, 49.99,
            14.99, 14.99, 24.99, 24.99,
            29.99, 89.99, 89.99),
  Variation_ID = c("101-A", "101-B", "102-A", "102-B",
                   "201-A", "201-B", "202-A", "202-B",
                   "301-A", "301-B", "301-C", "302-A", "302-B",
                   "401-A", "401-B", "402-A", "402-B",
                   "501-A", "502-A", "502-B"),
  Variation_Details = c("Color: Black, Storage: 64GB", "Color: White, Storage: 128GB",
                        "Color: Silver, Storage: 256GB", "Color: Space Gray, Storage: 512GB",
                        "Color: Stainless Steel, Capacity: 20 cu ft", "Color: White, Capacity: 18 cu ft",
                        "Type: Front Load, Capacity: 4.5 cu ft", "Type: Top Load, Capacity: 5.0 cu ft",
                        "Color: Blue, Size: S", "Color: Red, Size: M", "Color: Green, Size: L",
                        "Color: Dark Blue, Size: 32", "Color: Light Blue, Size: 34",
                        "Format: Hardcover, Language: English", "Format: Paperback, Language: Spanish",
                        "Format: eBook, Language: English", "Format: Paperback, Language: French",
                        "Size: Size 7, Color: Orange", "Material: Graphite, Color: Black", "Material: Aluminum, Color: Silver")
)

HTML

**The data frame was created into a HTML file using the textEditor application. To create the HTML manually I started with , then in the next line I used “table” (to create a table within the HTML), to create a row I used “tr”, for the first row which was my header I used

for each cell follow up the name of each column (ex.”th” Category “th”), for the following rows I used “td” as the tag for the data that belongs in the specific row and column, then to close the table I placed another “table” tag,and finally ended the file with “html”. I uploaded the html file into github, to obtain the raw data. I used getURL function to load the raw html file into R.**

html_DF<-getURL('https://raw.githubusercontent.com/Andreina-A/Assignment_7_Data607/refs/heads/main/dataframe.html')
cat(html_DF) # The view of the HTML code
## <html>
## <table>
##     <tr>
##         <th>Category</th>
##         <th>Item Name</th>
##         <th>Item ID</th>
##         <th>Brand</th>
##         <th>Price</th>
##         <th>Variation ID</th>
##         <th>Variation Details</th>
##     </tr>
##     <tr>
##         <td>Electronics</td>
##         <td>Smartphone</td>
##         <td>101</td>
##         <td>TechBrand</td>
##         <td>699.99</td>
##         <td>101-A</td>
##         <td>Color: Black, Storage: 64GB</td>
##     </tr>
##     <tr>
##         <td>Electronics</td>
##         <td>Smartphone</td>
##         <td>101</td>
##         <td>TechBrand</td>
##         <td>699.99</td>
##         <td>101-B</td>
##         <td>Color: White, Storage: 128GB</td>
##     </tr>
##     <tr>
##         <td>Electronics</td>
##         <td>Laptop</td>
##         <td>102</td>
##         <td>CompuBrand</td>
##         <td>1099.99</td>
##         <td>102-A</td>
##         <td>Color: Silver, Storage: 256GB</td>
##     </tr>
##     <tr>
##         <td>Electronics</td>
##         <td>Laptop</td>
##         <td>102</td>
##         <td>CompuBrand</td>
##         <td>1099.99</td>
##         <td>102-B</td>
##         <td>Color: Space Gray, Storage: 512GB</td>
##     </tr>
##     <tr>
##         <td>Home Appliances</td>
##         <td>Refrigerator</td>
##         <td>201</td>
##         <td>HomeCool</td>
##         <td>899.99</td>
##         <td>201-A</td>
##         <td>Color: Stainless Steel, Capacity: 20 cu ft</td>
##     </tr>
##     <tr>
##         <td>Home Appliances</td>
##         <td>Refrigerator</td>
##         <td>201</td>
##         <td>HomeCool</td>
##         <td>899.99</td>
##         <td>201-B</td>
##         <td>Color: White, Capacity: 18 cu ft</td>
##     </tr>
##     <tr>
##         <td>Home Appliances</td>
##         <td>Washing Machine</td>
##         <td>202</td>
##         <td>CleanTech</td>
##         <td>499.99</td>
##         <td>202-A</td>
##         <td>Type: Front Load, Capacity: 4.5 cu ft</td>
##     </tr>
##     <tr>
##         <td>Home Appliances</td>
##         <td>Washing Machine</td>
##         <td>202</td>
##         <td>CleanTech</td>
##         <td>499.99</td>
##         <td>202-B</td>
##         <td>Type: Top Load, Capacity: 5.0 cu ft</td>
##     </tr>
##     <tr>
##         <td>Clothing</td>
##         <td>T-Shirt</td>
##         <td>301</td>
##         <td>FashionCo</td>
##         <td>19.99</td>
##         <td>301-A</td>
##         <td>Color: Blue, Size: S</td>
##     </tr>
##     <tr>
##         <td>Clothing</td>
##         <td>T-Shirt</td>
##         <td>301</td>
##         <td>FashionCo</td>
##         <td>19.99</td>
##         <td>301-B</td>
##         <td>Color: Red, Size: M</td>
##     </tr>
##     <tr>
##         <td>Clothing</td>
##         <td>T-Shirt</td>
##         <td>301</td>
##         <td>FashionCo</td>
##         <td>19.99</td>
##         <td>301-C</td>
##         <td>Color: Green, Size: L</td>
##     </tr>
##     <tr>
##         <td>Clothing</td>
##         <td>Jeans</td>
##         <td>302</td>
##         <td>DenimWorks</td>
##         <td>49.99</td>
##         <td>302-A</td>
##         <td>Color: Dark Blue, Size: 32</td>
##     </tr>
##     <tr>
##         <td>Clothing</td>
##         <td>Jeans</td>
##         <td>302</td>
##         <td>DenimWorks</td>
##         <td>49.99</td>
##         <td>302-B</td>
##         <td>Color: Light Blue, Size: 34</td>
##     </tr>
##     <tr>
##         <td>Books</td>
##         <td>Fiction Novel</td>
##         <td>401</td>
##         <td>-</td>
##         <td>14.99</td>
##         <td>401-A</td>
##         <td>Format: Hardcover, Language: English</td>
##     </tr>
##     <tr>
##         <td>Books</td>
##         <td>Fiction Novel</td>
##         <td>401</td>
##         <td>-</td>
##         <td>14.99</td>
##         <td>401-B</td>
##         <td>Format: Paperback, Language: Spanish</td>
##     </tr>
##     <tr>
##         <td>Books</td>
##         <td>Non-Fiction Guide</td>
##         <td>402</td>
##         <td>-</td>
##         <td>24.99</td>
##         <td>402-A</td>
##         <td>Format: eBook, Language: English</td>
##     </tr>
##     <tr>
##         <td>Books</td>
##         <td>Non-Fiction Guide</td>
##         <td>402</td>
##         <td>-</td>
##         <td>24.99</td>
##         <td>402-B</td>
##         <td>Format: Paperback, Language: French</td>
##     </tr>
##     <tr>
##         <td>Sports Equipment</td>
##         <td>Basketball</td>
##         <td>501</td>
##         <td>SportsGear</td>
##         <td>29.99</td>
##         <td>501-A</td>
##         <td>Size: Size 7, Color: Orange</td>
##     </tr>
##     <tr>
##         <td>Sports Equipment</td>
##         <td>Tennis Racket</td>
##         <td>502</td>
##         <td>RacketPro</td>
##         <td>89.99</td>
##         <td>502-A</td>
##         <td>Material: Graphite, Color: Black</td>
##     </tr>
##     <tr>
##         <td>Sports Equipment</td>
##         <td>Tennis Racket</td>
##         <td>502</td>
##         <td>RacketPro</td>
##         <td>89.99</td>
##         <td>502-B</td>
##         <td>Material: Aluminum, Color: Silver</td>
##     </tr>
## </table>
## 
## 
## 
## </html>

To extract the data from the HTML file, I used readHTMLTable( a function from the XML package).

html_df <-readHTMLTable(html_DF,header=TRUE)
class(html_df)# Used calls function to see is the data extracted came out as a data frame, which it came out as a list instead.
## [1] "list"

To find the data frame, checked the class of the NULL object.

class(html_df$`NULL`)# I used NULL because it was the only generated option after I enter $
## [1] "data.frame"

I created the data frame variable with by extracted the NULL object from the HTML. Now the data is ready to use in vairbale HTML_dataframe

html_dataframe<-html_df$`NULL`
html_dataframe
##            Category         Item Name Item ID      Brand   Price Variation ID
## 1       Electronics        Smartphone     101  TechBrand  699.99        101-A
## 2       Electronics        Smartphone     101  TechBrand  699.99        101-B
## 3       Electronics            Laptop     102 CompuBrand 1099.99        102-A
## 4       Electronics            Laptop     102 CompuBrand 1099.99        102-B
## 5   Home Appliances      Refrigerator     201   HomeCool  899.99        201-A
## 6   Home Appliances      Refrigerator     201   HomeCool  899.99        201-B
## 7   Home Appliances   Washing Machine     202  CleanTech  499.99        202-A
## 8   Home Appliances   Washing Machine     202  CleanTech  499.99        202-B
## 9          Clothing           T-Shirt     301  FashionCo   19.99        301-A
## 10         Clothing           T-Shirt     301  FashionCo   19.99        301-B
## 11         Clothing           T-Shirt     301  FashionCo   19.99        301-C
## 12         Clothing             Jeans     302 DenimWorks   49.99        302-A
## 13         Clothing             Jeans     302 DenimWorks   49.99        302-B
## 14            Books     Fiction Novel     401          -   14.99        401-A
## 15            Books     Fiction Novel     401          -   14.99        401-B
## 16            Books Non-Fiction Guide     402          -   24.99        402-A
## 17            Books Non-Fiction Guide     402          -   24.99        402-B
## 18 Sports Equipment        Basketball     501 SportsGear   29.99        501-A
## 19 Sports Equipment     Tennis Racket     502  RacketPro   89.99        502-A
## 20 Sports Equipment     Tennis Racket     502  RacketPro   89.99        502-B
##                             Variation Details
## 1                 Color: Black, Storage: 64GB
## 2                Color: White, Storage: 128GB
## 3               Color: Silver, Storage: 256GB
## 4           Color: Space Gray, Storage: 512GB
## 5  Color: Stainless Steel, Capacity: 20 cu ft
## 6            Color: White, Capacity: 18 cu ft
## 7       Type: Front Load, Capacity: 4.5 cu ft
## 8         Type: Top Load, Capacity: 5.0 cu ft
## 9                        Color: Blue, Size: S
## 10                        Color: Red, Size: M
## 11                      Color: Green, Size: L
## 12                 Color: Dark Blue, Size: 32
## 13                Color: Light Blue, Size: 34
## 14       Format: Hardcover, Language: English
## 15       Format: Paperback, Language: Spanish
## 16           Format: eBook, Language: English
## 17        Format: Paperback, Language: French
## 18                Size: Size 7, Color: Orange
## 19           Material: Graphite, Color: Black
## 20          Material: Aluminum, Color: Silver

XML

XML was manually created similarly to HTML, just with a difference in tags which are more straightforward, the column title themselves were used in the tags.

DF_XML<-getURL('https://raw.githubusercontent.com/Andreina-A/Assignment_7_Data607/refs/heads/main/Data_xml.xml')
cat(DF_XML)#XML code
## <?xml version="1.0" encoding="UTF-8" ?>
## <Items>
##     <Item>
##         <Category>Electronics</Category>
##         <ItemName>Smartphone</ItemName>
##         <ItemID>101</ItemID>
##         <Brand>TechBrand</Brand>
##         <Price>699.99</Price>
##         <VariationID>101-A</VariationID>
##         <VariationDetails>
##             <Color>Black</Color>
##             <Storage>64GB</Storage>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Electronics</Category>
##         <ItemName>Smartphone</ItemName>
##         <ItemID>101</ItemID>
##         <Brand>TechBrand</Brand>
##         <Price>699.99</Price>
##         <VariationID>101-B</VariationID>
##         <VariationDetails>
##             <Color>White</Color>
##             <Storage>128GB</Storage>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Electronics</Category>
##         <ItemName>Laptop</ItemName>
##         <ItemID>102</ItemID>
##         <Brand>CompuBrand</Brand>
##         <Price>1099.99</Price>
##         <VariationID>102-A</VariationID>
##         <VariationDetails>
##             <Color>Silver</Color>
##             <Storage>256GB</Storage>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Electronics</Category>
##         <ItemName>Laptop</ItemName>
##         <ItemID>102</ItemID>
##         <Brand>CompuBrand</Brand>
##         <Price>1099.99</Price>
##         <VariationID>102-B</VariationID>
##         <VariationDetails>
##             <Color>Space Gray</Color>
##             <Storage>512GB</Storage>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Home Appliances</Category>
##         <ItemName>Refrigerator</ItemName>
##         <ItemID>201</ItemID>
##         <Brand>HomeCool</Brand>
##         <Price>899.99</Price>
##         <VariationID>201-A</VariationID>
##         <VariationDetails>
##             <Color>Stainless Steel</Color>
##             <Capacity>20 cu ft</Capacity>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Home Appliances</Category>
##         <ItemName>Refrigerator</ItemName>
##         <ItemID>201</ItemID>
##         <Brand>HomeCool</Brand>
##         <Price>899.99</Price>
##         <VariationID>201-B</VariationID>
##         <VariationDetails>
##             <Color>White</Color>
##             <Capacity>18 cu ft</Capacity>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Home Appliances</Category>
##         <ItemName>Washing Machine</ItemName>
##         <ItemID>202</ItemID>
##         <Brand>CleanTech</Brand>
##         <Price>499.99</Price>
##         <VariationID>202-A</VariationID>
##         <VariationDetails>
##             <Type>Front Load</Type>
##             <Capacity>4.5 cu ft</Capacity>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Home Appliances</Category>
##         <ItemName>Washing Machine</ItemName>
##         <ItemID>202</ItemID>
##         <Brand>CleanTech</Brand>
##         <Price>499.99</Price>
##         <VariationID>202-B</VariationID>
##         <VariationDetails>
##             <Type>Top Load</Type>
##             <Capacity>5.0 cu ft</Capacity>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Clothing</Category>
##         <ItemName>T-Shirt</ItemName>
##         <ItemID>301</ItemID>
##         <Brand>FashionCo</Brand>
##         <Price>19.99</Price>
##         <VariationID>301-A</VariationID>
##         <VariationDetails>
##             <Color>Blue</Color>
##             <Size>S</Size>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Clothing</Category>
##         <ItemName>T-Shirt</ItemName>
##         <ItemID>301</ItemID>
##         <Brand>FashionCo</Brand>
##         <Price>19.99</Price>
##         <VariationID>301-B</VariationID>
##         <VariationDetails>
##             <Color>Red</Color>
##             <Size>M</Size>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Clothing</Category>
##         <ItemName>T-Shirt</ItemName>
##         <ItemID>301</ItemID>
##         <Brand>FashionCo</Brand>
##         <Price>19.99</Price>
##         <VariationID>301-C</VariationID>
##         <VariationDetails>
##             <Color>Green</Color>
##             <Size>L</Size>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Clothing</Category>
##         <ItemName>Jeans</ItemName>
##         <ItemID>302</ItemID>
##         <Brand>DenimWorks</Brand>
##         <Price>49.99</Price>
##         <VariationID>302-A</VariationID>
##         <VariationDetails>
##             <Color>Dark Blue</Color>
##             <Size>32</Size>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Clothing</Category>
##         <ItemName>Jeans</ItemName>
##         <ItemID>302</ItemID>
##         <Brand>DenimWorks</Brand>
##         <Price>49.99</Price>
##         <VariationID>302-B</VariationID>
##         <VariationDetails>
##             <Color>Light Blue</Color>
##             <Size>34</Size>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Books</Category>
##         <ItemName>Fiction Novel</ItemName>
##         <ItemID>401</ItemID>
##         <Brand>-</Brand>
##         <Price>14.99</Price>
##         <VariationID>401-A</VariationID>
##         <VariationDetails>
##             <Format>Hardcover</Format>
##             <Language>English</Language>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Books</Category>
##         <ItemName>Fiction Novel</ItemName>
##         <ItemID>401</ItemID>
##         <Brand>-</Brand>
##         <Price>14.99</Price>
##         <VariationID>401-B</VariationID>
##         <VariationDetails>
##             <Format>Paperback</Format>
##             <Language>Spanish</Language>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Books</Category>
##         <ItemName>Non-Fiction Guide</ItemName>
##         <ItemID>402</ItemID>
##         <Brand>-</Brand>
##         <Price>24.99</Price>
##         <VariationID>402-A</VariationID>
##         <VariationDetails>
##             <Format>eBook</Format>
##             <Language>English</Language>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Books</Category>
##         <ItemName>Non-Fiction Guide</ItemName>
##         <ItemID>402</ItemID>
##         <Brand>-</Brand>
##         <Price>24.99</Price>
##         <VariationID>402-B</VariationID>
##         <VariationDetails>
##             <Format>Paperback</Format>
##             <Language>French</Language>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Sports Equipment</Category>
##         <ItemName>Basketball</ItemName>
##         <ItemID>501</ItemID>
##         <Brand>SportsGear</Brand>
##         <Price>29.99</Price>
##         <VariationID>501-A</VariationID>
##         <VariationDetails>
##             <Size>Size 7</Size>
##             <Color>Orange</Color>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Sports Equipment</Category>
##         <ItemName>Tennis Racket</ItemName>
##         <ItemID>502</ItemID>
##         <Brand>RacketPro</Brand>
##         <Price>89.99</Price>
##         <VariationID>502-A</VariationID>
##         <VariationDetails>
##             <Material>Graphite</Material>
##             <Color>Black</Color>
##         </VariationDetails>
##     </Item>
##     <Item>
##         <Category>Sports Equipment</Category>
##         <ItemName>Tennis Racket</ItemName>
##         <ItemID>502</ItemID>
##         <Brand>RacketPro</Brand>
##         <Price>89.99</Price>
##         <VariationID>502-B</VariationID>
##         <VariationDetails>
##             <Material>Aluminum</Material>
##             <Color>Silver</Color>
##         </VariationDetails>
##     </Item>
## </Items>

Used the class function to see if the XML file will readily show as a data frame

class(DF_XML)# output was a character
## [1] "character"

To obtain the data frame from the XML file I used the xmlToDataFrame function, now the data is ready to be used for analysis.

XML_df <-xmlToDataFrame(DF_XML)
XML_df
##            Category          ItemName ItemID      Brand   Price VariationID
## 1       Electronics        Smartphone    101  TechBrand  699.99       101-A
## 2       Electronics        Smartphone    101  TechBrand  699.99       101-B
## 3       Electronics            Laptop    102 CompuBrand 1099.99       102-A
## 4       Electronics            Laptop    102 CompuBrand 1099.99       102-B
## 5   Home Appliances      Refrigerator    201   HomeCool  899.99       201-A
## 6   Home Appliances      Refrigerator    201   HomeCool  899.99       201-B
## 7   Home Appliances   Washing Machine    202  CleanTech  499.99       202-A
## 8   Home Appliances   Washing Machine    202  CleanTech  499.99       202-B
## 9          Clothing           T-Shirt    301  FashionCo   19.99       301-A
## 10         Clothing           T-Shirt    301  FashionCo   19.99       301-B
## 11         Clothing           T-Shirt    301  FashionCo   19.99       301-C
## 12         Clothing             Jeans    302 DenimWorks   49.99       302-A
## 13         Clothing             Jeans    302 DenimWorks   49.99       302-B
## 14            Books     Fiction Novel    401          -   14.99       401-A
## 15            Books     Fiction Novel    401          -   14.99       401-B
## 16            Books Non-Fiction Guide    402          -   24.99       402-A
## 17            Books Non-Fiction Guide    402          -   24.99       402-B
## 18 Sports Equipment        Basketball    501 SportsGear   29.99       501-A
## 19 Sports Equipment     Tennis Racket    502  RacketPro   89.99       502-A
## 20 Sports Equipment     Tennis Racket    502  RacketPro   89.99       502-B
##           VariationDetails
## 1                Black64GB
## 2               White128GB
## 3              Silver256GB
## 4          Space Gray512GB
## 5  Stainless Steel20 cu ft
## 6            White18 cu ft
## 7      Front Load4.5 cu ft
## 8        Top Load5.0 cu ft
## 9                    BlueS
## 10                    RedM
## 11                  GreenL
## 12             Dark Blue32
## 13            Light Blue34
## 14        HardcoverEnglish
## 15        PaperbackSpanish
## 16            eBookEnglish
## 17         PaperbackFrench
## 18            Size 7Orange
## 19           GraphiteBlack
## 20          AluminumSilver

JSON

In order to get json code I used the data frame I created in R, using the toJSON function. Once the code was generated I created a json file using the texteditor application, and lastly I uploaded the file into github. The json codes reminds me of a python dictionary, which seems easy to code manually and easy to read.

Df_Json <- toJSON(DF_prep, pretty=TRUE)
cat(Df_Json)
## [
##   {
##     "Category": "Electronics",
##     "Item_Name": "Smartphone",
##     "Item_ID": 101,
##     "Brand": "TechBrand",
##     "Price": 699.99,
##     "Variation_ID": "101-A",
##     "Variation_Details": "Color: Black, Storage: 64GB"
##   },
##   {
##     "Category": "Electronics",
##     "Item_Name": "Smartphone",
##     "Item_ID": 101,
##     "Brand": "TechBrand",
##     "Price": 699.99,
##     "Variation_ID": "101-B",
##     "Variation_Details": "Color: White, Storage: 128GB"
##   },
##   {
##     "Category": "Electronics",
##     "Item_Name": "Laptop",
##     "Item_ID": 102,
##     "Brand": "CompuBrand",
##     "Price": 1099.99,
##     "Variation_ID": "102-A",
##     "Variation_Details": "Color: Silver, Storage: 256GB"
##   },
##   {
##     "Category": "Electronics",
##     "Item_Name": "Laptop",
##     "Item_ID": 102,
##     "Brand": "CompuBrand",
##     "Price": 1099.99,
##     "Variation_ID": "102-B",
##     "Variation_Details": "Color: Space Gray, Storage: 512GB"
##   },
##   {
##     "Category": "Home Appliances",
##     "Item_Name": "Refrigerator",
##     "Item_ID": 201,
##     "Brand": "HomeCool",
##     "Price": 899.99,
##     "Variation_ID": "201-A",
##     "Variation_Details": "Color: Stainless Steel, Capacity: 20 cu ft"
##   },
##   {
##     "Category": "Home Appliances",
##     "Item_Name": "Refrigerator",
##     "Item_ID": 201,
##     "Brand": "HomeCool",
##     "Price": 899.99,
##     "Variation_ID": "201-B",
##     "Variation_Details": "Color: White, Capacity: 18 cu ft"
##   },
##   {
##     "Category": "Home Appliances",
##     "Item_Name": "Washing Machine",
##     "Item_ID": 202,
##     "Brand": "CleanTech",
##     "Price": 499.99,
##     "Variation_ID": "202-A",
##     "Variation_Details": "Type: Front Load, Capacity: 4.5 cu ft"
##   },
##   {
##     "Category": "Home Appliances",
##     "Item_Name": "Washing Machine",
##     "Item_ID": 202,
##     "Brand": "CleanTech",
##     "Price": 499.99,
##     "Variation_ID": "202-B",
##     "Variation_Details": "Type: Top Load, Capacity: 5.0 cu ft"
##   },
##   {
##     "Category": "Clothing",
##     "Item_Name": "T-Shirt",
##     "Item_ID": 301,
##     "Brand": "FashionCo",
##     "Price": 19.99,
##     "Variation_ID": "301-A",
##     "Variation_Details": "Color: Blue, Size: S"
##   },
##   {
##     "Category": "Clothing",
##     "Item_Name": "T-Shirt",
##     "Item_ID": 301,
##     "Brand": "FashionCo",
##     "Price": 19.99,
##     "Variation_ID": "301-B",
##     "Variation_Details": "Color: Red, Size: M"
##   },
##   {
##     "Category": "Clothing",
##     "Item_Name": "T-Shirt",
##     "Item_ID": 301,
##     "Brand": "FashionCo",
##     "Price": 19.99,
##     "Variation_ID": "301-C",
##     "Variation_Details": "Color: Green, Size: L"
##   },
##   {
##     "Category": "Clothing",
##     "Item_Name": "Jeans",
##     "Item_ID": 302,
##     "Brand": "DenimWorks",
##     "Price": 49.99,
##     "Variation_ID": "302-A",
##     "Variation_Details": "Color: Dark Blue, Size: 32"
##   },
##   {
##     "Category": "Clothing",
##     "Item_Name": "Jeans",
##     "Item_ID": 302,
##     "Brand": "DenimWorks",
##     "Price": 49.99,
##     "Variation_ID": "302-B",
##     "Variation_Details": "Color: Light Blue, Size: 34"
##   },
##   {
##     "Category": "Books",
##     "Item_Name": "Fiction Novel",
##     "Item_ID": 401,
##     "Price": 14.99,
##     "Variation_ID": "401-A",
##     "Variation_Details": "Format: Hardcover, Language: English"
##   },
##   {
##     "Category": "Books",
##     "Item_Name": "Fiction Novel",
##     "Item_ID": 401,
##     "Price": 14.99,
##     "Variation_ID": "401-B",
##     "Variation_Details": "Format: Paperback, Language: Spanish"
##   },
##   {
##     "Category": "Books",
##     "Item_Name": "Non-Fiction Guide",
##     "Item_ID": 402,
##     "Price": 24.99,
##     "Variation_ID": "402-A",
##     "Variation_Details": "Format: eBook, Language: English"
##   },
##   {
##     "Category": "Books",
##     "Item_Name": "Non-Fiction Guide",
##     "Item_ID": 402,
##     "Price": 24.99,
##     "Variation_ID": "402-B",
##     "Variation_Details": "Format: Paperback, Language: French"
##   },
##   {
##     "Category": "Sports Equipment",
##     "Item_Name": "Basketball",
##     "Item_ID": 501,
##     "Brand": "SportsGear",
##     "Price": 29.99,
##     "Variation_ID": "501-A",
##     "Variation_Details": "Size: Size 7, Color: Orange"
##   },
##   {
##     "Category": "Sports Equipment",
##     "Item_Name": "Tennis Racket",
##     "Item_ID": 502,
##     "Brand": "RacketPro",
##     "Price": 89.99,
##     "Variation_ID": "502-A",
##     "Variation_Details": "Material: Graphite, Color: Black"
##   },
##   {
##     "Category": "Sports Equipment",
##     "Item_Name": "Tennis Racket",
##     "Item_ID": 502,
##     "Brand": "RacketPro",
##     "Price": 89.99,
##     "Variation_ID": "502-B",
##     "Variation_Details": "Material: Aluminum, Color: Silver"
##   }
## ]

Now that the file is created, I imported file from the github url using the fromjson function in R, which will parse the file back into a data frame.

Json_df<-fromJSON("https://raw.githubusercontent.com/Andreina-A/Assignment_7_Data607/refs/heads/main/json_df.json")
class(Json_df)
## [1] "data.frame"

**Using the data.table function I will create a variable called json _df to have the table of the json file. Now the json data dataframe is ready to be used for analysis.**

Json_df<-data.table(Json_df)
Json_df
##             Category         Item_Name Item_ID      Brand   Price Variation_ID
##               <char>            <char>   <int>     <char>   <num>       <char>
##  1:      Electronics        Smartphone     101  TechBrand  699.99        101-A
##  2:      Electronics        Smartphone     101  TechBrand  699.99        101-B
##  3:      Electronics            Laptop     102 CompuBrand 1099.99        102-A
##  4:      Electronics            Laptop     102 CompuBrand 1099.99        102-B
##  5:  Home Appliances      Refrigerator     201   HomeCool  899.99        201-A
##  6:  Home Appliances      Refrigerator     201   HomeCool  899.99        201-B
##  7:  Home Appliances   Washing Machine     202  CleanTech  499.99        202-A
##  8:  Home Appliances   Washing Machine     202  CleanTech  499.99        202-B
##  9:         Clothing           T-Shirt     301  FashionCo   19.99        301-A
## 10:         Clothing           T-Shirt     301  FashionCo   19.99        301-B
## 11:         Clothing           T-Shirt     301  FashionCo   19.99        301-C
## 12:         Clothing             Jeans     302 DenimWorks   49.99        302-A
## 13:         Clothing             Jeans     302 DenimWorks   49.99        302-B
## 14:            Books     Fiction Novel     401       <NA>   14.99        401-A
## 15:            Books     Fiction Novel     401       <NA>   14.99        401-B
## 16:            Books Non-Fiction Guide     402       <NA>   24.99        402-A
## 17:            Books Non-Fiction Guide     402       <NA>   24.99        402-B
## 18: Sports Equipment        Basketball     501 SportsGear   29.99        501-A
## 19: Sports Equipment     Tennis Racket     502  RacketPro   89.99        502-A
## 20: Sports Equipment     Tennis Racket     502  RacketPro   89.99        502-B
##             Category         Item_Name Item_ID      Brand   Price Variation_ID
##                              Variation_Details
##                                         <char>
##  1:                Color: Black, Storage: 64GB
##  2:               Color: White, Storage: 128GB
##  3:              Color: Silver, Storage: 256GB
##  4:          Color: Space Gray, Storage: 512GB
##  5: Color: Stainless Steel, Capacity: 20 cu ft
##  6:           Color: White, Capacity: 18 cu ft
##  7:      Type: Front Load, Capacity: 4.5 cu ft
##  8:        Type: Top Load, Capacity: 5.0 cu ft
##  9:                       Color: Blue, Size: S
## 10:                        Color: Red, Size: M
## 11:                      Color: Green, Size: L
## 12:                 Color: Dark Blue, Size: 32
## 13:                Color: Light Blue, Size: 34
## 14:       Format: Hardcover, Language: English
## 15:       Format: Paperback, Language: Spanish
## 16:           Format: eBook, Language: English
## 17:        Format: Paperback, Language: French
## 18:                Size: Size 7, Color: Orange
## 19:           Material: Graphite, Color: Black
## 20:          Material: Aluminum, Color: Silver
##                              Variation_Details

Parquet

To create a parquet file I used the data frame I created in the beginning of this R markdown, to write a parquet file to save into my computer. I didn’t used the github to import this file becasuwe github wasn’t able to read the file, instead it just downloaded to file back to my computer.

write_parquet(DF_prep, "CUNYMart_Inventory.parquet")

#read parquet file

To read the parquet file I used the read_parquet function (which is from the “arrow” package)

Parquet_df<-read_parquet("CUNYMart_Inventory.parquet")
class(Parquet_df) #class function showed that Parquet_df is a data frame table
## [1] "tbl_df"     "tbl"        "data.frame"
head(Parquet_df)
## # A tibble: 6 × 7
##   Category        Item_Name   Item_ID Brand Price Variation_ID Variation_Details
##   <chr>           <chr>         <dbl> <chr> <dbl> <chr>        <chr>            
## 1 Electronics     Smartphone      101 Tech…  700. 101-A        Color: Black, St…
## 2 Electronics     Smartphone      101 Tech…  700. 101-B        Color: White, St…
## 3 Electronics     Laptop          102 Comp… 1100. 102-A        Color: Silver, S…
## 4 Electronics     Laptop          102 Comp… 1100. 102-B        Color: Space Gra…
## 5 Home Appliances Refrigerat…     201 Home…  900. 201-A        Color: Stainless…
## 6 Home Appliances Refrigerat…     201 Home…  900. 201-B        Color: White, Ca…

Conlusion

HTML, XML, and Json are file formats that can be created with coding in which are human readable, while Parquet requires a machine such as a R library in order to read and write than the Parquet file. I wasn’t able to load the Parquet file into github to see the coding if there is any. In my opinion Json would be the easiest file to write because you have to use the tags nor a library like parquet to create a file. I would as parquet would have less chance of mistakes as you will be creating the data frame first in a library which you can readily fix any errors without brackets or tags being in the way, and then you just create the parquet file. When comparing files sizes from descending order XML had the highest size of 7.13KB, HTML was 4.49KB, parquet was 4.37KB, and Json had a size of 4.3KB. I would say XML is has tags that are easier to understand than HTML, but XML taks up alot of space. Overall, I would prefer workign with Json files as they are space efficent and easier to write and read.