The tables and data frames differ slightly. While creating the XML file I replaced the spaces in the column names with underscores. The JSON file has a top-level heading "Graphic Novels" and titles its first column "Name", whereas the HTML table titles that column "Graphic Novel". Dates are also formatted differently: the XML file uses YYYY-MM-DD, while the JSON and HTML files use M/D/YY. After creating the XML table manually as suggested (the version used here), I also tested the XML export function in Access; its default settings produced complications, but the exercise was informative.
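The naming and date differences described above can be reconciled after import. The following is a minimal sketch in base R, not part of the original workflow: the data frame `gn` is a hypothetical stand-in holding two rows of values copied from the JSON and HTML files, and the choice of underscores and ISO dates simply mirrors the XML file.

```r
# Hypothetical stand-in for a table parsed from the JSON or HTML file.
gn <- data.frame(
  "Name" = c("The Sandman", "Watchmen"),
  "First Issue Date" = c("1/1/89", "1/1/86"),
  "Last Issue Date" = c("3/1/96", "12/1/87"),
  check.names = FALSE,       # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# Replace each run of spaces or dots with a single underscore.
names(gn) <- gsub("[ .]+", "_", names(gn))

# Parse M/D/YY text into Date objects; %y maps 69-99 to 1969-1999.
date_cols <- c("First_Issue_Date", "Last_Issue_Date")
gn[date_cols] <- lapply(gn[date_cols], as.Date, format = "%m/%d/%y")

print(gn)
```

The same two steps should apply to any of the three data frames, since the pattern also matches the dotted names that R generates.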
Link: https://raw.githubusercontent.com/sigmasigmaiota/GraphicNovels/master/GraphicNovels.xml
RCurl downloads the file from GitHub and XML parses it; I then convert the parsed document to a list and bind the records into a data frame. kableExtra displays the result.
library(XML)
library(RCurl)
library(kableExtra)
#set url, filename.
xmlurl<-"https://raw.githubusercontent.com/sigmasigmaiota/GraphicNovels/master/GraphicNovels.xml"
#RCurl.
xmlfile<-getURL(xmlurl)
#Parse the XML file.
xml.table <- xmlParse(xmlfile,useInternalNodes=TRUE)
xml.table2<-xmlToList(xml.table)
#Convert to dataframe.
GN.xml<-do.call(rbind.data.frame, xml.table2)
rownames(GN.xml)<-NULL
kable(GN.xml)%>%
kable_styling()
| Name | Publisher | First_Issue_Date | Last_Issue_Date | Author | CoAuthor_1 | CoAuthor_2 |
|---|---|---|---|---|---|---|
| The Sandman | DC | 1989-01-01 | 1996-03-01 | Neil Gaiman | Sam Kieth | Mike Dringenberg |
| Watchmen | DC | 1984-02-01 | 1987-09-01 | Allen Moore | Dave Gibbons | John Higgins |
| The Swamp Thing | DC | 1986-01-01 | 1987-12-01 | Allen Moore | Stephen Bissette | Jon Totleben |
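As an alternative to converting to a list and binding rows, the XML package also provides `xmlToDataFrame`. The sketch below is an assumption about how it could be applied here, not the method used above; `xml_text` is a hypothetical inline sample with the same element layout as the file, cut down to two fields.

```r
library(XML)

# Hypothetical inline sample mirroring the layout of GraphicNovels.xml.
xml_text <- "<GraphicNovels>
  <Graphic_Novel><Name>The Sandman</Name><Publisher>DC</Publisher></Graphic_Novel>
  <Graphic_Novel><Name>Watchmen</Name><Publisher>DC</Publisher></Graphic_Novel>
</GraphicNovels>"

# asText = TRUE marks the argument as XML content rather than a file name.
doc <- xmlParse(xml_text, asText = TRUE)

# Select the record nodes explicitly, then build one row per record
# and one column per child element.
records <- getNodeSet(doc, "//Graphic_Novel")
gn <- xmlToDataFrame(nodes = records, stringsAsFactors = FALSE)

print(gn)
```

Selecting the nodes with an XPath expression keeps the conversion independent of any whitespace between elements.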
The file as downloaded from GitHub, printed from the parsed document.
print(xml.table)
## <?xml version="1.0"?>
## <GraphicNovels>
## <Graphic_Novel>
## <Name>The Sandman</Name>
## <Publisher>DC</Publisher>
## <First_Issue_Date>1989-01-01</First_Issue_Date>
## <Last_Issue_Date>1996-03-01</Last_Issue_Date>
## <Author>Neil Gaiman</Author>
## <CoAuthor_1>Sam Kieth</CoAuthor_1>
## <CoAuthor_2>Mike Dringenberg</CoAuthor_2>
## </Graphic_Novel>
## <Graphic_Novel>
## <Name>Watchmen</Name>
## <Publisher>DC</Publisher>
## <First_Issue_Date>1984-02-01</First_Issue_Date>
## <Last_Issue_Date>1987-09-01</Last_Issue_Date>
## <Author>Allen Moore</Author>
## <CoAuthor_1>Dave Gibbons</CoAuthor_1>
## <CoAuthor_2>John Higgins</CoAuthor_2>
## </Graphic_Novel>
## <Graphic_Novel>
## <Name>The Swamp Thing</Name>
## <Publisher>DC</Publisher>
## <First_Issue_Date>1986-01-01</First_Issue_Date>
## <Last_Issue_Date>1987-12-01</Last_Issue_Date>
## <Author>Allen Moore</Author>
## <CoAuthor_1>Stephen Bissette</CoAuthor_1>
## <CoAuthor_2>Jon Totleben</CoAuthor_2>
## </Graphic_Novel>
## </GraphicNovels>
##
Link: https://raw.githubusercontent.com/sigmasigmaiota/GraphicNovels/master/GraphicNovels.json
RCurl downloads the JSON file from GitHub and jsonlite parses it.
library(jsonlite)
#set url, filename.
jsonurl<-"https://raw.githubusercontent.com/sigmasigmaiota/GraphicNovels/master/GraphicNovels.json"
#Download the file as text with RCurl.
jsonfile<-getURL(jsonurl)
#Parse the downloaded JSON text.
jsontable<- fromJSON(jsonfile)
#Drop the top-level name so it is not prefixed to the column names.
GN.json<-as.data.frame(unname(jsontable))
kable(GN.json)%>%
kable_styling()
| Name | Publisher | First.Issue.Date | Last.Issue.Date | Author | CoAuthor.1 | CoAuthor.2 |
|---|---|---|---|---|---|---|
| The Sandman | DC | 1/1/89 | 3/1/96 | Neil Gaiman | Sam Kieth | Mike Dringenberg |
| Watchmen | DC | 1/1/86 | 12/1/87 | Allen Moore | Dave Gibbons | John Higgins |
| The Swamp Thing | DC | 2/1/84 | 9/1/87 | Allen Moore | Stephen Bissette | Jon Totleben |
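The dotted column names in this table (for example `First.Issue.Date`) appear to come from `as.data.frame`, which makes names syntactically valid, rather than from the JSON itself. A minimal sketch of one way to keep the original keys, assuming the records sit under the top-level key "Graphic Novels" as in the file; `json_text` is a hypothetical inline sample with two fields.

```r
library(jsonlite)

# Hypothetical inline sample mirroring the layout of GraphicNovels.json.
json_text <- '{"Graphic Novels": [
  {"Name": "The Sandman", "First Issue Date": "1/1/89"},
  {"Name": "Watchmen", "First Issue Date": "1/1/86"}
]}'

# fromJSON returns a list with one element per top-level key.
parsed <- fromJSON(json_text)

# Extracting the element by name yields the records as a data frame
# and keeps the keys, spaces included, as column names.
gn <- parsed[["Graphic Novels"]]
print(names(gn))

# For comparison: converting the unnamed list lets R rewrite the names.
print(names(as.data.frame(unname(parsed))))
```

Columns whose names contain spaces need backticks when accessed with `$`, for example ``gn$`First Issue Date` ``.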
The file, as it looks after download from GitHub, before parsing.
print(jsonfile)
## [1] "{\r\n\t\"Graphic Novels\": [{\r\n\t\t\t\"Name\": \"The Sandman\",\r\n\t\t\t\"Publisher\": \"DC\",\r\n\t\t\t\"First Issue Date\": \"1/1/89\",\r\n\t\t\t\"Last Issue Date\": \"3/1/96\",\r\n\t\t\t\"Author\": \"Neil Gaiman\",\r\n\t\t\t\"CoAuthor 1\": \"Sam Kieth\",\r\n\t\t\t\"CoAuthor 2\": \"Mike Dringenberg\"\r\n\t\t},\r\n\t\t{\r\n\t\t\t\"Name\": \"Watchmen\",\r\n\t\t\t\"Publisher\": \"DC\",\r\n\t\t\t\"First Issue Date\": \"1/1/86\",\r\n\t\t\t\"Last Issue Date\": \"12/1/87\",\r\n\t\t\t\"Author\": \"Allen Moore\",\r\n\t\t\t\"CoAuthor 1\": \"Dave Gibbons\",\r\n\t\t\t\"CoAuthor 2\": \"John Higgins\"\r\n\t\t},\r\n\t\t{\r\n\t\t\t\"Name\": \"The Swamp Thing\",\r\n\t\t\t\"Publisher\": \"DC\",\r\n\t\t\t\"First Issue Date\": \"2/1/84\",\r\n\t\t\t\"Last Issue Date\": \"9/1/87\",\r\n\t\t\t\"Author\": \"Allen Moore\",\r\n\t\t\t\"CoAuthor 1\": \"Stephen Bissette\",\r\n\t\t\t\"CoAuthor 2\": \"Jon Totleben\"\r\n\t\t}\r\n\t]\r\n}"
Link: https://raw.githubusercontent.com/sigmasigmaiota/GraphicNovels/master/GraphicNovels.html
RCurl downloads the file; readHTMLTable from the XML package parses the table, and list.clean from rlist removes empty entries from the resulting list.
library(rlist)
htmlurl<-"https://raw.githubusercontent.com/sigmasigmaiota/GraphicNovels/master/GraphicNovels.html"
htmlfile<-getURL(htmlurl)
#Parse the HTML tables into a list of data frames.
htmltable2<- readHTMLTable(htmlfile)
htmltable2<-list.clean(htmltable2,fun=is.null,recursive=FALSE)
#Drop the list names so they are not prefixed to the column names.
GN.html<-as.data.frame(unname(htmltable2))
kable(GN.html)%>%
kable_styling()
| Graphic.Novel | Publisher | First.Issue.Date | Last.Issue.Date | Author | CoAuthor.1 | CoAuthor.2 |
|---|---|---|---|---|---|---|
| The Sandman | DC | 1/1/89 | 3/1/96 | Neil Gaiman | Sam Kieth | Mike Dringenberg |
| Watchmen | DC | 1/1/86 | 12/1/87 | Allen Moore | Dave Gibbons | John Higgins |
| The Swamp Thing | DC | 2/1/84 | 9/1/87 | Allen Moore | Stephen Bissette | Jon Totleben |
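When the page holds a single table, `readHTMLTable` can return it directly through its `which` argument, which may make the cleaning and unnaming steps unnecessary. This is a sketch under that assumption, not the approach used above; `html_text` is a hypothetical inline sample with the same row structure as the file.

```r
library(XML)

# Hypothetical inline sample mirroring the table in GraphicNovels.html.
html_text <- "<html><body><table>
  <tr><th>Graphic Novel</th><th>Publisher</th></tr>
  <tr><td>The Sandman</td><td>DC</td></tr>
  <tr><td>Watchmen</td><td>DC</td></tr>
</table></body></html>"

# asText = TRUE marks the argument as HTML content rather than a file name.
doc <- htmlParse(html_text, asText = TRUE)

# which = 1 returns the first table as a data frame instead of a list.
gn <- readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)

print(gn)
```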
The file, as it looks after download from GitHub, before parsing.
print(htmlfile)
## [1] "<!DOCTYPE html>\r\n<html lang=\"en\"> \r\n<head>\r\n<meta charset=\"utf-8\"/>\r\n<title>Graphic Novels</title>\r\n<style>\r\ntable {\r\n font-family: arial, sans-serif;\r\n border-collapse: collapse;\r\n width: 100%;\r\n}\r\ntd, th {\r\n border: 1px solid #dddddd;\r\n text-align: left;\r\n padding: 8px;\r\n}\r\ntr:nth-child(even) {\r\n background-color: #dddddd;\r\n}\r\n</style>\r\n</head>\r\n<body>\r\n<h2>Graphic Novels</h2>\r\n<table>\r\n <tr>\r\n <th>Graphic Novel</th>\r\n <th>Publisher</th>\r\n <th>First Issue Date</th>\r\n <th>Last Issue Date</th>\r\n <th>Author</th>\r\n <th>CoAuthor 1</th>\r\n <th>CoAuthor 2</th>\r\n </tr>\r\n <tr>\r\n <td>The Sandman</td>\r\n <td>DC</td>\r\n <td>1/1/89</td>\r\n <td>3/1/96</td>\r\n <td>Neil Gaiman</td>\r\n <td>Sam Kieth</td>\r\n <td>Mike Dringenberg</td>\r\n </tr>\r\n <tr>\r\n <td>Watchmen</td>\r\n <td>DC</td>\r\n <td>1/1/86</td>\r\n <td>12/1/87</td>\r\n <td>Allen Moore</td>\r\n <td>Dave Gibbons</td>\r\n <td>John Higgins</td>\r\n </tr>\r\n <tr>\r\n <td>The Swamp Thing</td>\r\n <td>DC</td>\r\n <td>2/1/84</td>\r\n <td>9/1/87</td>\r\n <td>Allen Moore</td>\r\n <td>Stephen Bissette</td>\r\n <td>Jon Totleben</td>\r\n </tr>\r\n</table>\r\n</body>\r\n</html>"