library(abind)
library(gtable)
library(markdown)
library(prettyunits)
library(promises)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(RCurl)
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.3 v stringr 1.4.0
## v tidyr 1.1.2 v forcats 0.5.0
## v readr 1.4.0
## -- Conflicts ------------------------------------------------------------------- tidyverse_conflicts() --
## x tidyr::complete() masks RCurl::complete()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(XML)
library(knitr)
library(rjson)
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following object is masked from 'package:purrr':
##
## compact
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
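The message above warns that attaching plyr after dplyr masks several dplyr verbs. One way around this, shown here only as a minimal sketch, is to leave plyr unattached and call `ldply` through its namespace. The `records` list is a hypothetical stand-in for what `xmlToList()` returns; it is not the real file.

```r
# Attach dplyr only; reach into plyr with `::` so no dplyr verb is masked.
library(dplyr)

# Hypothetical stand-in for the nested list that xmlToList() produces.
records <- list(
  book = list(id = "1", title = "Data Wrangling with R"),
  book = list(id = "2", title = "Learning Web Design")
)

# plyr::ldply turns each list element into a one-row data frame and stacks them.
# It adds an .id column holding the element names, which is dropped here.
books <- plyr::ldply(records, data.frame) %>%
  select(-.id)

# mutate() still resolves to dplyr because plyr was never attached.
books <- mutate(books, id = as.integer(as.character(id)))
```

With this pattern the order of the `library()` calls no longer matters for the two packages, because only one of them sits on the search path.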
#JSON
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following objects are masked from 'package:rjson':
##
## fromJSON, toJSON
## The following object is masked from 'package:purrr':
##
## flatten
books_json <- fromJSON("https://raw.githubusercontent.com/Darstolk/DATA607_07/main/books_jason")
books_json <- bind_rows(books_json, .id = 'Author')
books_json
## # A tibble: 3 x 4
## ID Title Author ISBN
## <int> <chr> <chr> <chr>
## 1 1 Data Wrangling with R Bradley C. Boehmke 0135133106
## 2 2 Learning Web Design Jennifer Robbins 3319455982
## 3 3 Programming Skills for Data Science Michael Freeman 1491960205
#HTML
# Download the page and parse its first table into a data frame
dasbuch_html <- readHTMLTable(
  getURL("https://raw.githubusercontent.com/Darstolk/DATA607_07/main/dasbuch.html"),
  header = TRUE, which = 1)
class(dasbuch_html)
## [1] "data.frame"
knitr::kable(dasbuch_html)
| ID | Title | Author | ISBN |
|---|---|---|---|
| 1 | Data Wrangling with R | Bradley C. Boehmke | 3319455982 |
| 2 | Learning Web Design | Jennifer Robbins | 1491960205 |
| 3 | Programming Skills for Data Science | Michael Freeman | 0135133106 |
#XML
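# Download the XML, convert it to a nested list, and stack one row per book;
# ldply adds an .id column with the element names, which is dropped
dasbuch_zwei <- ldply(xmlToList(getURL("https://raw.githubusercontent.com/Darstolk/DATA607_07/main/dasbuch_xml.xml")), data.frame) %>%
  select(-.id)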
class(dasbuch_zwei)
## [1] "data.frame"
knitr::kable(dasbuch_zwei)
| id | title | author | isbn |
|---|---|---|---|
| 1 | Data Wrangling with R | Bradley C. Boehmke | 3319455982 |
| 2 | Learning Web Design | Jennifer Robbins | 1491960205 |
| 3 | Programming Skills for Data Science | Michael Freeman | 0135133106 |
#Conclusion
Each file format stores the same data a little differently. It took me a while to learn the differences and to realize that HTML is not so different from XML. I do not yet know the full reason, though both are tag-based markup languages that describe data with nested elements.

The subject of data is vast, and it takes years of experience just to find your bearings in the most basic techniques for analyzing data in a meaningful and useful way, so that the results can later support more complex work. JSON is yet another part of this technology stack. I spent a good share of time learning how to build this type of file. My attempts to add more information beyond the title, author, and ISBN bore no fruit, and I gave up after a prolonged effort.

All I can say in the end is that these files are no joke to work with. They call for serious technical knowledge backed by a solid education. I used the books here merely as titles for this exercise; I read them as reference material, and even so their content takes a long time to digest.
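On the difficulty of building the JSON file by hand: one option, sketched here and not the method used in this exercise, is to build the table as a data frame and let jsonlite write the JSON. The `Notes` column and its placeholder values are hypothetical and only show where an extra field would go. The other values are copied from the HTML table above.

```r
library(jsonlite)

# Book table as printed in the HTML section; ISBNs stay character so that
# leading zeros survive.
books <- data.frame(
  ID = 1:3,
  Title = c("Data Wrangling with R", "Learning Web Design",
            "Programming Skills for Data Science"),
  Author = c("Bradley C. Boehmke", "Jennifer Robbins", "Michael Freeman"),
  ISBN = c("3319455982", "1491960205", "0135133106"),
  stringsAsFactors = FALSE
)

# Hypothetical extra field: any additional column becomes one more key
# in every JSON record.
books$Notes <- rep("placeholder", nrow(books))

# Serialize to a JSON array of records, then parse it back.
json_text <- toJSON(books, pretty = TRUE)
round_trip <- fromJSON(json_text)

# writeLines(json_text, "books.json") would save the result to disk.
```

Because `toJSON()` writes one object per row, adding a field is a matter of adding a column, and `fromJSON()` turns the array of records back into a data frame.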
dasbuch_html == dasbuch_zwei
## ID Title Author ISBN
## [1,] TRUE TRUE TRUE TRUE
## [2,] TRUE TRUE TRUE TRUE
## [3,] TRUE TRUE TRUE TRUE
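The comparison above covers only the HTML and XML tables. A minimal sketch of extending it to the JSON table follows, with the values typed in from the tables printed in this document, not downloaded. The helper names `normalize` and `same_isbn` are made up for this illustration.

```r
# Values copied from the printed tables; only Title and ISBN are needed here.
titles <- c("Data Wrangling with R", "Learning Web Design",
            "Programming Skills for Data Science")
html_tbl <- data.frame(Title = titles,
                       ISBN = c("3319455982", "1491960205", "0135133106"),
                       stringsAsFactors = FALSE)
xml_tbl  <- data.frame(title = titles,
                       isbn = c("3319455982", "1491960205", "0135133106"),
                       stringsAsFactors = FALSE)
json_tbl <- data.frame(Title = titles,
                       ISBN = c("0135133106", "3319455982", "1491960205"),
                       stringsAsFactors = FALSE)

# Lower-case the column names and force character columns so that tables
# from different parsers become comparable.
normalize <- function(df) {
  names(df) <- tolower(names(df))
  df[] <- lapply(df, as.character)
  df
}

# Join two tables on title and report, per title, whether the ISBNs agree.
same_isbn <- function(a, b) {
  merged <- merge(normalize(a), normalize(b), by = "title",
                  suffixes = c("_a", "_b"))
  setNames(merged$isbn_a == merged$isbn_b, merged$title)
}

html_vs_xml  <- same_isbn(html_tbl, xml_tbl)
html_vs_json <- same_isbn(html_tbl, json_tbl)
```

With the values as printed, `html_vs_xml` agrees for every title while `html_vs_json` disagrees for every title, which suggests the ISBNs in the JSON file are assigned to different books than in the other two files.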