df.json <- fromJSON("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.json")
df.json$`Book List`
                                                           title
1                                                 Everybody Lies
2 Algorithms to Live By: The Computer Science of Human Decisions
3                                           Cien años de soledad
                         authors                publisher        isbn_10
1       Seth Stephens-Davidowitz         Dey Street Books 978-1502585455
2 Brian Christian, Tom Griffiths Picador; Reprint edition     1250118360
3         Gabriel García Márquez          Vintage Espanol     0307474720
  pages
1   352
2   368
3   496
data <- getURLContent("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.html")
df.html <- readHTMLTable(data)
df.html$`NULL`
                                                           title
1                                                 Everybody Lies
2 Algorithms to Live By: The Computer Science of Human Decisions
3                                           Cien años de soledad
                        authors                publisher        isbn_10
1      Seth Stephens-Davidowitz         Dey Street Books 978-1502585455
2 Brian Christian,Tom Griffiths Picador; Reprint edition     1250118360
3        Gabriel García Márquez          Vintage Espanol     0307474720
  pages
1   352
2   368
3   496
data <- getURLContent("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.xml")
df.xml <- xmlToDataFrame(data)
df.xml
                                                           title
1                                                 Everybody Lies
2 Algorithms to Live By: The Computer Science of Human Decisions
3                                           Cien años de soledad
                         authors                publisher        isbn_10
1       Seth Stephens-Davidowitz         Dey Street Books 978-1502585455
2 Brian Christian, Tom Griffiths Picador; Reprint edition     1250118360
3         Gabriel García Márquez          Vintage Espanol     0307474720
  pages
1   352
2   368
3   496
In this assignment we read the same book list from three formats (JSON, HTML, and XML) and then compare the resulting data frames.
#JSON
str(df.json)
List of 1
$ Book List:'data.frame': 3 obs. of 5 variables:
..$ title : chr [1:3] "Everybody Lies" "Algorithms to Live By: The Computer Science of Human Decisions" "Cien años de soledad "
..$ authors :List of 3
.. ..$ : chr "Seth Stephens-Davidowitz"
.. ..$ : chr [1:2] "Brian Christian" "Tom Griffiths"
.. ..$ : chr "Gabriel García Márquez"
..$ publisher: chr [1:3] "Dey Street Books" "Picador; Reprint edition" "Vintage Espanol"
..$ isbn_10 : chr [1:3] "978-1502585455" "1250118360" "0307474720"
..$ pages : int [1:3] 352 368 496
#HTML
str(df.html)
List of 1
$ NULL:'data.frame': 3 obs. of 5 variables:
..$ title : Factor w/ 3 levels "Algorithms to Live By: The Computer Science of Human Decisions",..: 3 1 2
..$ authors : Factor w/ 3 levels "Brian Christian,Tom Griffiths",..: 3 1 2
..$ publisher: Factor w/ 3 levels "Dey Street Books",..: 1 2 3
..$ isbn_10 : Factor w/ 3 levels "0307474720","1250118360",..: 3 2 1
..$ pages : Factor w/ 3 levels "352","368","496": 1 2 3
#XML
str(df.xml)
'data.frame': 3 obs. of 5 variables:
$ title : Factor w/ 3 levels "Algorithms to Live By: The Computer Science of Human Decisions",..: 3 1 2
$ authors : Factor w/ 3 levels "Brian Christian, Tom Griffiths",..: 3 1 2
$ publisher: Factor w/ 3 levels "Dey Street Books",..: 1 2 3
$ isbn_10 : Factor w/ 3 levels "0307474720","1250118360",..: 3 2 1
$ pages : Factor w/ 3 levels "352","368","496": 1 2 3
# use the compare package to check whether the data frames are equal
compare(df.html,df.json)
FALSE [FALSE]
compare(df.html,df.xml)
FALSE [FALSE]
compare(df.xml,df.json)
FALSE
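The three tables look almost identical when printed, but compare() returns FALSE for every pair. The str() output shows why: the objects differ in structure and type rather than in content. df.json and df.html are lists that each hold one data frame (named `Book List` and `NULL`), while df.xml is a bare data frame. The JSON import keeps title, publisher, and isbn_10 as character, pages as integer, and authors as a list column, whereas the HTML and XML imports store every column as a factor. There are also small value differences: the JSON title "Cien años de soledad " has a trailing space, and the HTML authors entry "Brian Christian,Tom Griffiths" has no space after the comma.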
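One way to see where two imports diverge is to look at column classes and at what `all.equal()` reports. The sketch below uses base R only. The two small data frames are hypothetical stand-ins that copy a few values and column types from the `str()` output; they are not the downloaded objects, and the names `json_like` and `html_like` are illustrative.

```r
# Hypothetical stand-ins mirroring the str() output; not the downloaded data.
json_like <- data.frame(
  title = c("Everybody Lies", "Cien años de soledad "),  # trailing space, as in the JSON import
  pages = c(352L, 496L),
  stringsAsFactors = FALSE
)
html_like <- data.frame(
  title = factor(c("Everybody Lies", "Cien años de soledad")),
  pages = factor(c("352", "496"))
)

# Column classes: character/integer on one side, factor on the other.
json_classes <- sapply(json_like, class)
html_classes <- sapply(html_like, class)
print(json_classes)
print(html_classes)

# all.equal() returns TRUE when the objects match, otherwise a character
# vector with one message per difference it finds.
diffs <- all.equal(json_like, html_like)
print(diffs)
```

Reading the messages in `diffs` is usually quicker than comparing two `str()` listings by eye, since each message names the column that differs.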
---
title: "JSON, XML, HTML"
output:
  flexdashboard::flex_dashboard:
    source_code: embed
    theme: yeti
    orientation: rows
    vertical_layout: fill
---
```{r}
library(dplyr)
# jsonlite supplies the fromJSON() used below; loading rjson as well only adds a masked duplicate
library(jsonlite)
library(XML)
library(RCurl)
library(compare)
```
Data 607 Assignment_7
==============
Inputs {.sidebar data-width=250}
-----------------------------------------------------------------------
### Assignment 7
### Anthony Munoz
Row {.tabset .tabset-fade}
-----------
### json
```{r echo=T}
df.json <- fromJSON("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.json")
df.json
```
### HTML
```{r echo=T}
data <- getURLContent("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.html")
df.html <- readHTMLTable(data)
df.html
```
### XML
```{r echo=T}
data <- getURLContent("https://raw.githubusercontent.com/Anth350z/DATA-607/master/Assignment_6/Book_list.xml")
df.xml <- xmlToDataFrame(data)
df.xml
```
### Data Analysis
In this assignment we read the same book list from three formats (JSON, HTML, and XML) and then compare the resulting data frames.
```{r echo=T}
#JSON
str(df.json)
#HTML
str(df.html)
#XML
str(df.xml)
# use the compare package to check whether the data frames are equal
compare(df.html,df.json)
compare(df.html,df.xml)
compare(df.xml,df.json)
```
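The three tables look almost identical when printed, but `compare()` returns FALSE for every pair. The `str()` output shows why: the objects differ in structure and type rather than in content. `df.json` and `df.html` are lists that each hold one data frame (named `Book List` and `NULL`), while `df.xml` is a bare data frame. The JSON import keeps title, publisher, and isbn_10 as character, pages as integer, and authors as a list column, whereas the HTML and XML imports store every column as a factor. There are also small value differences: the JSON title "Cien años de soledad " has a trailing space, and the HTML authors entry "Brian Christian,Tom Griffiths" has no space after the comma.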
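If the goal is to confirm that the three sources carry the same content, one option is to bring each result into a common shape before comparing. The following is a minimal sketch in base R, not the assignment's method: `normalize_books()` is a hypothetical helper, the three input data frames are hand-built stand-ins that copy values and types from the `str()` output, and only three of the five columns are included to keep it short.

```r
# Hand-built stand-ins (assumption: they mimic the imported objects' types).
titles <- c("Everybody Lies",
            "Algorithms to Live By: The Computer Science of Human Decisions",
            "Cien años de soledad")

json_df <- data.frame(title = c(titles[1:2], paste0(titles[3], " ")),  # trailing space
                      pages = c(352L, 368L, 496L),
                      stringsAsFactors = FALSE)
json_df$authors <- list("Seth Stephens-Davidowitz",
                        c("Brian Christian", "Tom Griffiths"),
                        "Gabriel García Márquez")                       # list column

html_df <- data.frame(title   = factor(titles),
                      authors = factor(c("Seth Stephens-Davidowitz",
                                         "Brian Christian,Tom Griffiths",
                                         "Gabriel García Márquez")),
                      pages   = factor(c("352", "368", "496")))

xml_df <- html_df
xml_df$authors <- factor(c("Seth Stephens-Davidowitz",
                           "Brian Christian, Tom Griffiths",
                           "Gabriel García Márquez"))

# Hypothetical helper: plain character/integer columns, trimmed strings,
# and authors joined as "A, B" whether they arrive as a list or one string.
normalize_books <- function(df) {
  authors <- vapply(df$authors, function(a) {
    parts <- trimws(unlist(strsplit(as.character(a), ",")))
    paste(parts, collapse = ", ")
  }, character(1))
  data.frame(title   = trimws(as.character(df$title)),
             authors = unname(authors),
             pages   = as.integer(as.character(df$pages)),
             stringsAsFactors = FALSE)
}

json_vs_html <- identical(normalize_books(json_df), normalize_books(html_df))
html_vs_xml  <- identical(normalize_books(html_df), normalize_books(xml_df))
print(c(json_vs_html = json_vs_html, html_vs_xml = html_vs_xml))
```

With the real objects, the data frame would first have to be pulled out of its list wrapper, for example ``df.json$`Book List` `` and `df.html[[1]]`, before passing it to a helper like this.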