Task:
Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting.
Reading HTML data into R from GitHub
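# Packages the code in this document relies on
library(RCurl)    # getURL(), getURLContent()
library(XML)      # readHTMLTable(), xmlParse(), xmlToDataFrame()
library(dplyr)    # %>% pipe (assumed; magrittr provides it as well)
library(stringr)  # str_replace()
library(knitr)    # kable()
# Download the raw HTML as a character string
html_url <- getURL("https://raw.githubusercontent.com/mandiemannz/Data-607--Fall-18/master/Bookshtml.html")
# Read the HTML table and convert it to a data frame
html_data <- html_url %>%
  readHTMLTable() %>%
  data.frame()
head(html_data)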
## NULL.Title
## 1 Python Crash Course: A Hands-On, Project-Based Introduction to Programming
## 2 R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
## 3 Machine Learning with R
## NULL.Author NULL.Cover.Type NULL.Subject
## 1 Eric Matthes Paperback Programming
## 2 Hadley Wickham and Garrett Grolemund Paperback Programming
## 3 Brett Lantz Paperback Programming
## NULL.Pages
## 1 525
## 2 492
## 3 424
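# Strip the "NULL." prefix from each column name
colnames(html_data) <- str_replace(colnames(html_data), "NULL\\.", "")
# Replace the first remaining dot with a space ("Cover.Type" becomes "Cover Type")
colnames(html_data) <- str_replace(colnames(html_data), "\\.", " ")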
kable(html_data)
| Title | Author | Cover Type | Subject | Pages |
|---|---|---|---|---|
| Python Crash Course: A Hands-On, Project-Based Introduction to Programming | Eric Matthes | Paperback | Programming | 525 |
| R for Data Science: Import, Tidy, Transform, Visualize, and Model Data | Hadley Wickham and Garrett Grolemund | Paperback | Programming | 492 |
| Machine Learning with R | Brett Lantz | Paperback | Programming | 424 |
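The column-name cleanup above works because each name contains at most one dot after the prefix is removed; `str_replace` changes only the first match. As a minimal base-R sketch of the same cleanup that would also handle names with several dots, the vector below copies the column names printed by `head(html_data)`. The names `raw_names`, `no_prefix`, and `clean_names` are illustrative and not part of the original code.

```r
# Column names as printed by head(html_data) before renaming
raw_names <- c("NULL.Title", "NULL.Author", "NULL.Cover.Type",
               "NULL.Subject", "NULL.Pages")

# Drop the leading "NULL." prefix (anchored with ^ so only a prefix is removed)
no_prefix <- sub("^NULL\\.", "", raw_names)

# Replace every remaining dot with a space; fixed = TRUE treats "." literally
clean_names <- gsub(".", " ", no_prefix, fixed = TRUE)

print(clean_names)
```

Assigning `clean_names` to `colnames(html_data)` would then replace both `str_replace` calls.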
Reading XML data into R from GitHub
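# Download the raw XML as a character string
xml_url <- getURL("https://raw.githubusercontent.com/mandiemannz/Data-607--Fall-18/master/booksxml.xml")
# Parse the XML and convert its child nodes to a data frame
xml_data <- xml_url %>%
  xmlParse() %>%
  xmlToDataFrame()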
kable(xml_data)
| Python Crash Course: A Hands-On, Project-Based Introduction to Programming | Eric Matthes | 525 | Programming | paperback |
| R for Data Science: Import, Tidy, Transform, Visualize, and Model Data | Hadley Wickham and Garrett Grolemund | 492 | Programming | paperback |
| Machine Learning with R | Brett Lantz | 424 | Programming | paperback |
Reading JSON data into R from GitHub
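# Download the raw JSON text
json_data <- getURLContent("https://raw.githubusercontent.com/mandiemannz/Data-607--Fall-18/master/json")
# Parse the JSON; fromJSON() comes from a JSON package (for example jsonlite or rjson) that must be loaded first
json_data_frame <- fromJSON(json_data)
# Convert each element of `books` to a data frame and stack the results
json_data_frame <- do.call("rbind", lapply(json_data_frame$books, data.frame, stringsAsFactors = FALSE))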
kable(json_data_frame)
| book.title | Python Crash Course: A Hands-On, Project-Based Introduction to Programming | R for Data Science: Import, Tidy, Transform, Visualize, and Model Data | Machine Learning with R |
| book.author | Eric Matthes | Hadley Wickham and Garrett Grolemund | Brett Lantz |
| book.pages | 525 | 492 | 424 |
| book.category | Programming | Programming | Programming |
| book.cover_type | paperback | paperback | paperback |
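The JSON table above has one row per attribute and one column per book. A minimal sketch of turning such a layout into one row per book is shown below. It uses a hand-built stand-in rather than the downloaded file: `json_like`, its column names `V1` to `V3`, and the shortened titles are assumptions made for illustration.

```r
# Stand-in for the JSON result: attributes as rows, books as columns.
# Column names V1..V3 are hypothetical; titles are shortened.
json_like <- data.frame(
  V1 = c("Python Crash Course", "Eric Matthes", "525", "Programming", "paperback"),
  V2 = c("R for Data Science", "Hadley Wickham and Garrett Grolemund", "492",
         "Programming", "paperback"),
  V3 = c("Machine Learning with R", "Brett Lantz", "424", "Programming", "paperback"),
  row.names = c("book.title", "book.author", "book.pages",
                "book.category", "book.cover_type"),
  stringsAsFactors = FALSE
)

# t() returns a character matrix with books as rows; convert back to a data frame
books_df <- as.data.frame(t(json_like), stringsAsFactors = FALSE)

# Drop the "book." prefix and the leftover V1..V3 row names
names(books_df) <- sub("^book\\.", "", names(books_df))
rownames(books_df) <- NULL

# Transposing leaves every column as text, so restore the page counts as integers
books_df$pages <- as.integer(books_df$pages)

print(books_df)
```

Because `t()` goes through a matrix, every column comes back as character, which is why the page counts are converted explicitly at the end.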
Conclusion:
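The HTML and XML data frames hold the same data with one row per book, but they are not strictly identical: the columns appear in a different order, and the cover type is capitalized differently ("Paperback" in the HTML result, "paperback" in the XML result). The JSON data frame is laid out differently. It is transposed, with one row per attribute and a separate column for each of the three books.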
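One way to check how close the HTML and XML results are is to bring both to a common column order, naming, and letter case before comparing them. The sketch below uses small hand-built stand-ins for the two data frames instead of the downloaded files. The shortened titles, the XML column names, and the `normalize` helper with its name maps are assumptions made for illustration.

```r
titles  <- c("Python Crash Course", "R for Data Science", "Machine Learning with R")
authors <- c("Eric Matthes", "Hadley Wickham and Garrett Grolemund", "Brett Lantz")

# Stand-in for the HTML result (column names as shown after renaming)
html_like <- data.frame(
  Title = titles, Author = authors, `Cover Type` = "Paperback",
  Subject = "Programming", Pages = c("525", "492", "424"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Stand-in for the XML result (column names are assumed)
xml_like <- data.frame(
  title = titles, author = authors, pages = c("525", "492", "424"),
  category = "Programming", cover_type = "paperback",
  stringsAsFactors = FALSE
)

# Hypothetical helper: select columns in a fixed order, rename them,
# lower-case the cover type, and store page counts as integers
normalize <- function(df, name_map) {
  out <- df[, names(name_map), drop = FALSE]
  names(out) <- unname(name_map)
  out$cover <- tolower(out$cover)
  out$pages <- as.integer(out$pages)
  rownames(out) <- NULL
  out
}

html_map <- c(Title = "title", Author = "author", Pages = "pages",
              Subject = "subject", "Cover Type" = "cover")
xml_map  <- c(title = "title", author = "author", pages = "pages",
              category = "subject", cover_type = "cover")

raw_equal        <- identical(html_like, xml_like)
normalized_equal <- isTRUE(all.equal(normalize(html_like, html_map),
                                     normalize(xml_like, xml_map)))
print(c(raw = raw_equal, normalized = normalized_equal))
```

The name maps do double duty: their names pick the source columns, and their order fixes the column order of the normalized result.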