This query pulls down a set of identifiers and descriptors for INEGI's socio economic data.
library(RODBC)
con <- odbcConnect("gisdb")
d <- sqlQuery(con, "select * from inegi.ids")
d <- data.frame(id = d$var_id, variable = d$variable)
The json package writes out R objects as JSON. However this does not produce exactly the format needed by default.
library(rjson)
toJSON(head(d))
## [1] "{\"id\":[3103004004,5300000069,3103004005,5300000079,3103004003,5300000059],\"variable\":[\"Porcentaje de hombres ocupados en el sector primario\",\"Total de personal no dependiente de la raz\\u00f3n social. Sector 48-49. Transportes.\",\"Porcentaje de hombres ocupados en el sector secundario\",\"Personal remunerado. Sector 48-49. Transportes.\",\"Porcentaje de poblaci\\u00f3n total ocupada en el sector terciario\",\"Total de personal dependiente de la raz\\u00f3n social. Sector 48-49. Transportes.\"]}"
We need the values to be repeated in this sort of format
toJSON(d[1, ])
## [1] "{\"id\":3103004004,\"variable\":\"Porcentaje de hombres ocupados en el sector primario\"}"
So applying a function across the rows produces the data store format.
f <- function(x) cat(toJSON(x), ",")
sink("/var/www/apps/inegi/INEGI.json")
cat("[")
apply(d, 1, f)
cat("]")
sink()
One problem arose in this case. The final line includes an extra comma and a NULL value. It looks like this
“Total de gastos por consumo de bienes y servicios. Sector 21. Miner\u00eda.”} ,{“id”:“1010000033”,“variable”:“Subestaciones de transmisi\u00f3n de energ\u00eda el\u00e9ctrica”} ,NULL ]
The easiest way to deal with this in this case was just to open the store in a text editor and remove the last comma and NULL.
“Total de gastos por consumo de bienes y servicios. Sector 21. Miner\u00eda.”} ,{“id”:“1010000033”,“variable”:“Subestaciones de transmisi\u00f3n de energ\u00eda el\u00e9ctrica”} ]