Building a JSON data store

This query pulls down a set of identifiers and descriptors for INEGI's socio economic data.

library(RODBC)
con <- odbcConnect("gisdb")


d <- sqlQuery(con, "select * from inegi.ids")
d <- data.frame(id = d$var_id, variable = d$variable)

Using the json package

The json package writes out R objects as JSON. However this does not produce exactly the format needed by default.

library(rjson)
toJSON(head(d))

## [1] "{\"id\":[3103004004,5300000069,3103004005,5300000079,3103004003,5300000059],\"variable\":[\"Porcentaje de hombres ocupados en el sector primario\",\"Total de personal no dependiente de la raz\\u00f3n social. Sector 48-49. Transportes.\",\"Porcentaje de hombres ocupados en el sector secundario\",\"Personal remunerado. Sector 48-49. Transportes.\",\"Porcentaje de poblaci\\u00f3n total ocupada en el sector terciario\",\"Total de personal dependiente de la raz\\u00f3n social. Sector 48-49. Transportes.\"]}"

We need the values to be repeated in this sort of format

toJSON(d[1, ])

## [1] "{\"id\":3103004004,\"variable\":\"Porcentaje de hombres ocupados en el sector primario\"}"

So applying a function across the rows produces the data store format.


f <- function(x) cat(toJSON(x), ",")
sink("/var/www/apps/inegi/INEGI.json")
cat("[")
apply(d, 1, f)
cat("]")
sink()

One problem arose in this case. The final line includes an extra comma and a NULL value. It looks like this

“Total de gastos por consumo de bienes y servicios. Sector 21. Miner\u00eda.”} ,{“id”:“1010000033”,“variable”:“Subestaciones de transmisi\u00f3n de energ\u00eda el\u00e9ctrica”} ,NULL ]

The easiest way to deal with this in this case was just to open the store in a text editor and remove the last comma and NULL.

“Total de gastos por consumo de bienes y servicios. Sector 21. Miner\u00eda.”} ,{“id”:“1010000033”,“variable”:“Subestaciones de transmisi\u00f3n de energ\u00eda el\u00e9ctrica”} ]