# xmlParse('https://github.com/ebhtra/msds-607/blob/main/wk7_formats/dictionaries.xml')
# The XML package can't read https URLs, so parse a local copy of the file instead
(xml2df <- xmlToDataFrame('dictionaries.xml'))
## title
## 1 diccionario Salamanca de la lengua española
## 2 American Heritage Dictionary
## 3 Diccionari Escolar
## editors year
## 1 PilarPeña PérezMaríadel Rosario Calderón SotoMercedesEsteban García 2002
## 2 MarkBoyerPamelaDeVinneDoloresHarris 1991
## 3 MontserratMartín EnrileFerranBallester MateosLaiaCabal Guarro 2007
## langs edition
## 1 spanish-spanish 6
## 2 english-english 2
## 3 catalan-spanishspanish-catalan 2
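Nothing like a whole bunch of long Spanish names (and that's only 3 editors per row) to highlight that this method is easy to run but needs some work afterwards: `xmlToDataFrame()` pastes the text of sibling child elements together with no separator, so the individual editors and languages still have to be split apart.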
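One way to keep the siblings apart is to walk the nodes with XPath rather than flattening everything in one call. This is only a sketch: the inline XML is a hypothetical stand-in for `dictionaries.xml`, and the element names (`dictionary`, `editors`, `editor`, `first`, `last`) are assumptions, since the real file's tags aren't shown here.

```r
library(XML)

# Hypothetical stand-in for dictionaries.xml; all element names are assumed.
xml_text <- '<dictionaries>
  <dictionary>
    <title>American Heritage Dictionary</title>
    <editors>
      <editor><first>Mark</first><last>Boyer</last></editor>
      <editor><first>Pamela</first><last>DeVinne</last></editor>
    </editors>
    <year>1991</year>
  </dictionary>
</dictionaries>'

doc <- xmlParse(xml_text, asText = TRUE)
dict_nodes <- getNodeSet(doc, "//dictionary")

# For each dictionary: join the parts of each editor's name with a space,
# then join the editors themselves with ", ".
editors <- sapply(dict_nodes, function(d) {
  ed_nodes <- getNodeSet(d, "./editors/editor")
  full_names <- sapply(ed_nodes, function(e) {
    paste(xpathSApply(e, "./*", xmlValue), collapse = " ")
  })
  paste(full_names, collapse = ", ")
})

xml_df <- data.frame(
  title = sapply(dict_nodes, function(d) xpathSApply(d, "./title", xmlValue)),
  editors = editors,
  stringsAsFactors = FALSE
)
xml_df
```

The same pattern would apply to the languages, with the XPath adjusted to whatever the real tags are.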
Url <- 'dictionaries.html' # Again, https URL not working for this
readHTMLTable(Url, encoding = "UTF-8")[[1]] # first (and only) table on the page
## title
## 1 diccionario Salamanca de la lengua española
## 2 American Heritage Dictionary
## 3 Diccionari Escolar
## editors
## 1 Pilar Peña Pérez, María del Rosario Calderón Soto, Mercedes Esteban García
## 2 Mark Boyer, Pamela DeVinne, Dolores Harris
## 3 Montserrat Martín Enrile, Ferran Ballester Mateos, Laia Cabal Guarro
## languages year edition
## 1 spanish-spanish 2002 6
## 2 english-english 1991 2
## 3 catalan-spanish, spanish-catalan 2007 2
This one looks a lot nicer, but only because I combined each dictionary's editors into a single cell, and did the same for its languages, since an HTML table seemed to force me to when I was constructing it.
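If the individual editors are needed again, the combined cells can be split back out. A minimal base-R sketch, using a hand-built data frame (`tbl` is a hypothetical name) with two of the rows above and assuming the names are always separated by a comma:

```r
# Two rows copied from the table above, typed in by hand for illustration.
tbl <- data.frame(
  title = c("American Heritage Dictionary", "Diccionari Escolar"),
  editors = c(
    "Mark Boyer, Pamela DeVinne, Dolores Harris",
    "Montserrat Martín Enrile, Ferran Ballester Mateos, Laia Cabal Guarro"
  ),
  stringsAsFactors = FALSE
)

# Split on a comma plus any following whitespace; the result is a list column
# holding one character vector of editors per dictionary.
tbl$editors <- strsplit(tbl$editors, ",\\s*")

lengths(tbl$editors) # number of editors per row
tbl$editors[[1]]     # editors of the first dictionary
```

This would break on a name that itself contains a comma, so it is only safe for data like this.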
jd <- fromJSON("https://raw.githubusercontent.com/ebhtra/msds-607/main/wk7_formats/dictionaries.json")
jd <- data.frame(jd)
jd
## Dictionaries.title
## 1 diccionario Salamanca de la lengua española
## 2 American Heritage Dictionary
## 3 Diccionari Escolar
## Dictionaries.editors
## 1 Pilar Peña Pérez, María del Rosario Calderón Soto, Mercedes Esteban García
## 2 Mark Boyer, Pamela DeVinne, Dolores Harris
## 3 Montserrat Martín Enrile, Ferran Ballester Mateos, Laia Cabal Guarro
## Dictionaries.languages Dictionaries.year Dictionaries.edition
## 1 spanish-spanish 2002 6
## 2 english-english 1991 2
## 3 catalan-spanish, spanish-catalan 2007 2
Remove the column prefixes ('Dictionaries' was the outermost key in the JSON):
names(jd) <- sub("^Dictionaries\\.", "", names(jd)) # strip the leading "Dictionaries." from each name
kbl(jd) # View() doesn't knit, so use kbl() to show the lists
| title | editors | languages | year | edition |
|---|---|---|---|---|
| diccionario Salamanca de la lengua española | Pilar Peña Pérez, María del Rosario Calderón Soto, Mercedes Esteban García | spanish-spanish | 2002 | 6 |
| American Heritage Dictionary | Mark Boyer, Pamela DeVinne, Dolores Harris | english-english | 1991 | 2 |
| Diccionari Escolar | Montserrat Martín Enrile, Ferran Ballester Mateos, Laia Cabal Guarro | catalan-spanish, spanish-catalan | 2007 | 2 |
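An alternative to renaming the columns afterwards is to pull the inner element out of the parsed JSON before it is ever wrapped in `data.frame()`. The sketch below assumes `fromJSON` comes from `jsonlite` and that the file is a single object whose `Dictionaries` key holds an array of records; the inline JSON is a cut-down, hypothetical version of the real file.

```r
library(jsonlite)

# Hypothetical, shortened stand-in for dictionaries.json.
json_text <- '{"Dictionaries": [
  {"title": "American Heritage Dictionary", "year": 1991, "edition": 2},
  {"title": "Diccionari Escolar", "year": 2007, "edition": 2}
]}'

parsed <- fromJSON(json_text)

# jsonlite simplifies the array of records into a data frame, so indexing the
# outer key directly gives un-prefixed column names.
jd2 <- parsed$Dictionaries
names(jd2)
```

Indexing by name also avoids hard-coding the length of the prefix.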