Data 607 Assignment_7

Row

json

df.json <- fromJSON("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.json")

df.json
$`Book List`
                                                           title
1                                                 Everybody Lies
2 Algorithms to Live By: The Computer Science of Human Decisions
3                                          Cien años de soledad 
                         authors                publisher        isbn_10
1       Seth Stephens-Davidowitz         Dey Street Books 978-1502585455
2 Brian Christian, Tom Griffiths Picador; Reprint edition     1250118360
3         Gabriel García Márquez          Vintage Espanol     0307474720
  pages
1   352
2   368
3   496

HTML

data <- getURLContent("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.html")
df.html <- readHTMLTable(data)

df.html
$`NULL`
                                                           title
1                                                 Everybody Lies
2 Algorithms to Live By: The Computer Science of Human Decisions
3                                           Cien años de soledad
                        authors                publisher        isbn_10
1      Seth Stephens-Davidowitz         Dey Street Books 978-1502585455
2 Brian Christian,Tom Griffiths Picador; Reprint edition     1250118360
3        Gabriel García Márquez          Vintage Espanol     0307474720
  pages
1   352
2   368
3   496

XML

data <- getURLContent("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.xml")

df.xml <- xmlToDataFrame(data)
df.xml 
                                                           title
1                                                 Everybody Lies
2 Algorithms to Live By: The Computer Science of Human Decisions
3                                          Cien años de soledad 
                         authors                publisher        isbn_10
1       Seth Stephens-Davidowitz         Dey Street Books 978-1502585455
2 Brian Christian, Tom Griffiths Picador; Reprint edition     1250118360
3         Gabriel García Márquez          Vintage Espanol     0307474720
  pages
1   352
2   368
3   496

Data Analysis

In this assignment we read the same book list from three different formats (JSON, HTML, and XML) and then compare the resulting data frames.

#JSON
str(df.json)
List of 1
 $ Book List:'data.frame':  3 obs. of  5 variables:
  ..$ title    : chr [1:3] "Everybody Lies" "Algorithms to Live By: The Computer Science of Human Decisions" "Cien años de soledad "
  ..$ authors  :List of 3
  .. ..$ : chr "Seth Stephens-Davidowitz"
  .. ..$ : chr [1:2] "Brian Christian" "Tom Griffiths"
  .. ..$ : chr "Gabriel García Márquez"
  ..$ publisher: chr [1:3] "Dey Street Books" "Picador; Reprint edition" "Vintage Espanol"
  ..$ isbn_10  : chr [1:3] "978-1502585455" "1250118360" "0307474720"
  ..$ pages    : int [1:3] 352 368 496
#HTML
str(df.html)
List of 1
 $ NULL:'data.frame':   3 obs. of  5 variables:
  ..$ title    : Factor w/ 3 levels "Algorithms to Live By: The Computer Science of Human Decisions",..: 3 1 2
  ..$ authors  : Factor w/ 3 levels "Brian Christian,Tom Griffiths",..: 3 1 2
  ..$ publisher: Factor w/ 3 levels "Dey Street Books",..: 1 2 3
  ..$ isbn_10  : Factor w/ 3 levels "0307474720","1250118360",..: 3 2 1
  ..$ pages    : Factor w/ 3 levels "352","368","496": 1 2 3
#XML
str(df.xml)
'data.frame':   3 obs. of  5 variables:
 $ title    : Factor w/ 3 levels "Algorithms to Live By: The Computer Science of Human Decisions",..: 3 1 2
 $ authors  : Factor w/ 3 levels "Brian Christian, Tom Griffiths",..: 3 1 2
 $ publisher: Factor w/ 3 levels "Dey Street Books",..: 1 2 3
 $ isbn_10  : Factor w/ 3 levels "0307474720","1250118360",..: 3 2 1
 $ pages    : Factor w/ 3 levels "352","368","496": 1 2 3
# Use compare() from the compare package to check whether the three results are equal

compare(df.html,df.json)
FALSE [FALSE]
compare(df.html,df.xml)
FALSE [FALSE]
compare(df.xml,df.json)
FALSE

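Printed side by side, the three results look nearly identical, but compare() returns FALSE for every pair. The str() output suggests why. The JSON and HTML results are lists that wrap the data frame (under the names Book List and NULL, respectively), while the XML result is a bare data frame. The JSON columns are character and integer, with authors stored as a list, whereas the HTML and XML columns are all factors. Some values also differ slightly, such as the trailing space in the third JSON title and the missing space after the comma in the HTML authors field.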

---
title: "JSON, XML, HTML"
output: 
  flexdashboard::flex_dashboard:
    source_code: embed
    theme: yeti
    orientation: rows
    vertical_layout: fill
    
---

```{r}
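library(dplyr)
library(jsonlite)  # provides fromJSON(); rjson is not loaded because its fromJSON() would clash with this one
library(XML)       # readHTMLTable(), xmlToDataFrame()
library(RCurl)     # getURLContent()
library(compare)   # compare()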

```
Data 607 Assignment_7
==============

Inputs {.sidebar data-width=250}
-----------------------------------------------------------------------

### Assignment 7
### Anthony Munoz

Row {.tabset .tabset-fade}
-----------

### json
```{r echo=T}

df.json <- fromJSON("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.json")

df.json
```



### HTML
```{r echo=T}
data <- getURLContent("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.html")
df.html <- readHTMLTable(data)

df.html
```
### XML
```{r echo=T}
data <- getURLContent("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.xml")

df.xml <- xmlToDataFrame(data)
df.xml 
```
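
The HTML result above is a list of tables and, like the XML result, stores every column as a factor. The following is a minimal sketch, not what this assignment ran, of how the `XML` package can return plain data frames with character columns instead. The inline `html_text` and `xml_text` documents are hypothetical stand-ins for `Book_list.html` and `Book_list.xml`, and the `which`, `header`, and `stringsAsFactors` arguments are assumed to behave as documented for `readHTMLTable()` and `xmlToDataFrame()`.

```r
library(XML)

# Hypothetical inline documents standing in for Book_list.html and Book_list.xml.
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>title</th><th>pages</th></tr>",
  "<tr><td>Book A</td><td>352</td></tr>",
  "<tr><td>Book B</td><td>368</td></tr>",
  "</table></body></html>"
)
xml_text <- paste0(
  "<books>",
  "<book><title>Book A</title><pages>352</pages></book>",
  "<book><title>Book B</title><pages>368</pages></book>",
  "</books>"
)

# which = 1 returns the first table as a data frame instead of a list of tables.
html_books <- readHTMLTable(
  htmlParse(html_text, asText = TRUE),
  which = 1, header = TRUE, stringsAsFactors = FALSE
)

# Each <book> node becomes one row; child element names become column names.
xml_books <- xmlToDataFrame(
  xmlParse(xml_text, asText = TRUE),
  stringsAsFactors = FALSE
)

# Both readers return text, so numeric columns are converted explicitly.
html_books$pages <- as.integer(html_books$pages)
xml_books$pages  <- as.integer(xml_books$pages)

str(html_books)
str(xml_books)
```

Reading the columns as character first and converting `pages` afterwards avoids the factor-to-integer pitfall, where `as.integer()` on a factor returns level codes rather than the printed numbers.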

### Data Analysis

In this assignment we read the same book list from three different formats (JSON, HTML, and XML) and then compare the resulting data frames.


```{r echo=T}
#JSON
str(df.json)
#HTML
str(df.html)
#XML
str(df.xml)

# Use compare() from the compare package to check whether the three results are equal

compare(df.html,df.json)
compare(df.html,df.xml)
compare(df.xml,df.json)


```

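Printed side by side, the three results look nearly identical, but `compare()` returns FALSE for every pair. The `str()` output suggests why. The JSON and HTML results are lists that wrap the data frame (under the names `Book List` and `NULL`, respectively), while the XML result is a bare data frame. The JSON columns are character and integer, with `authors` stored as a list, whereas the HTML and XML columns are all factors. Some values also differ slightly, such as the trailing space in the third JSON title and the missing space after the comma in the HTML `authors` field.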
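
One way to check whether the three sources carry the same information is to bring each result to a common shape before comparing. The sketch below uses base R only and small made-up data frames (`json_books`, `html_books`) that mimic the structures shown by `str()`; the helper `normalize_books()` is a hypothetical name introduced here, not part of any package, and the toy titles and authors are not the real book list.

```r
# Toy stand-in for the JSON result: character/integer columns, authors as a list,
# and a trailing space in one title.
json_books <- data.frame(
  title = c("Book A ", "Book B"),
  pages = c(352L, 368L),
  stringsAsFactors = FALSE
)
json_books$authors <- list("Ann Author", c("Bob Writer", "Cy Coauthor"))

# Toy stand-in for the HTML/XML result: every column is a factor,
# and co-authors are joined by a comma without a space.
html_books <- data.frame(
  title   = factor(c("Book A", "Book B")),
  authors = factor(c("Ann Author", "Bob Writer,Cy Coauthor")),
  pages   = factor(c("352", "368"))
)

# Hypothetical helper: coerce any of the shapes above to one canonical layout.
normalize_books <- function(df) {
  authors <- df$authors
  if (!is.list(authors)) authors <- as.list(as.character(authors))
  data.frame(
    title = trimws(as.character(df$title)),
    # Split on commas, trim each name, and rejoin with ", ".
    authors = vapply(
      authors,
      function(a) paste(trimws(unlist(strsplit(as.character(a), ","))), collapse = ", "),
      character(1),
      USE.NAMES = FALSE
    ),
    # Go through character so factor labels, not level codes, are converted.
    pages = as.integer(as.character(df$pages)),
    stringsAsFactors = FALSE
  )
}

json_norm <- normalize_books(json_books)
html_norm <- normalize_books(html_books)

identical(json_books, html_books)  # FALSE: types and values differ
identical(json_norm, html_norm)    # TRUE: same content after normalization
```

With the real results, the data frame would first need to be pulled out of its list wrapper, for example `df.json[["Book List"]]` and `df.html[[1]]`, before being passed to a helper like this.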