This vignette shows how to use the new CSV API to the BEACON database. It's a really quick example using R. If you want to add example code for Matlab, we can probably do that! Email me at david.holstius@berkeley.edu.
Why you might want to use the API:
First we make a request to the server. It might take a few seconds to respond if you're requesting a new URL. The httr R package is simple and easy but not fast (this is a demo!). Responses are cached on the server, so second and third requests should be faster.
You can reuse the get_month() and get_day() functions without needing to understand the guts. I've wrapped them into a beacon R package hoping that people find this dataflow useful.
Both get_month(...) and get_day(...) fetch data from the API, load it into a data.frame (R's native format), and parse the timestamps, which by default are stored in UTC. They'll show up in the timezone you provide, which by default is “America/Los_Angeles”. Let's peek at the data:
suppressPackageStartupMessages(library(beacon))
daily_means <- get_month("Vaisala_CO2_ppm", 2012, 10)
## Downloaded 185 datapoints from /avg/Vaisala_CO2_ppm/2012/10/1d/stacked.csv
head(daily_means)
## metric timestamp value site
## 1 Vaisala_CO2_ppm 2012-09-30 10:00:26 454.0 BurckES
## 2 Vaisala_CO2_ppm 2012-09-30 10:00:27 408.4 WestEd
## 3 Vaisala_CO2_ppm 2012-09-30 10:00:28 437.1 InternationalES
## 4 Vaisala_CO2_ppm 2012-09-30 10:00:31 455.5 KorematsuES
## 5 Vaisala_CO2_ppm 2012-09-30 10:00:32 367.7 SkylineHS
## 6 Vaisala_CO2_ppm 2012-09-30 10:00:33 429.3 StElizabethHS
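If you're curious about the guts, the timestamp handling described above can be sketched in base R. This is only an illustration, not the beacon package's actual code. The CSV text below is made up: the column names, sites, and values are borrowed from the output above, but the timestamps and their format are assumptions. It parses the timestamps as UTC and then displays them in “America/Los_Angeles”.

```r
# Made-up sample in the "stacked" layout (timestamp format is an assumption)
csv_text <- "metric,timestamp,value,site
Vaisala_CO2_ppm,2012-10-01 07:00:00,454.0,BurckES
Vaisala_CO2_ppm,2012-10-01 07:00:00,408.4,WestEd"

dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Interpret the stored timestamps as UTC ...
dat$timestamp <- as.POSIXct(dat$timestamp, tz = "UTC")

# ... then change only the display timezone (the instant itself is unchanged)
attr(dat$timestamp, "tzone") <- "America/Los_Angeles"

format(dat$timestamp[1], "%Y-%m-%d %H:%M")
## [1] "2012-10-01 00:00"
```

Pacific time is 7 hours behind UTC in October, so 07:00 UTC shows up as local midnight.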
Unfortunately there are still 0s and -999s in the database, which means the averaged values will be artificially low. We need to clean those out. But here's a quick plot with those artifacts evident:
suppressPackageStartupMessages(require(ggplot2))
suppressPackageStartupMessages(require(scales))
fig <- qplot(timestamp, value, color = site, geom = "line", data = daily_means)
fig <- fig + scale_y_continuous("Vaisala_CO2_ppm", limits = c(0, 600))
fig <- fig + scale_x_datetime("America/Los_Angeles", breaks = date_breaks("1 week"),
    minor_breaks = date_breaks("1 day"), labels = date_format("%a %m/%d"))
show(fig + ggtitle("Daily Means"))
## Warning: Removed 14 rows containing missing values (geom_path).
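To see why the 0s and -999s mentioned above drag the averages down, here is a tiny worked example. The readings are made up, and this is just one way the cleaning could be done, not how the database actually does it.

```r
# Made-up raw readings for one site; 0 and -999 stand in for bad or missing data
raw <- c(400, 410, 0, -999)

mean(raw)
## [1] -47.25

# Drop the placeholder values before averaging
clean <- raw[!(raw %in% c(0, -999))]

mean(clean)
## [1] 405
```

Note that the filtering has to happen before the averaging. Once the placeholders are folded into a daily or hourly mean, they can't be separated back out of that mean.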
OK, let's break it down. That URL had three important parts:
The metric name is Vaisala_CO2_ppm. This is the same name that the metric has in our time-series database, OpenTSDB. We still need to load in data for more metrics (like board_V).
The date range is the month of October 2012 (“2012/10”) and the averaging interval is daily (“1d”). You can also request hourly averages for a whole month using get_month(metric, year, month, "1h"), and hourly (“1h”) or 5-minute (“5m”) averages for a single day using get_day(metric, year, month, day, "5m"). For example:
hourly_means <- get_day("Vaisala_CO2_ppm", 2012, 10, 1)
## Downloaded 174 datapoints from
## /avg/Vaisala_CO2_ppm/2012/10/01/1h/stacked.csv
fig <- qplot(timestamp, value, color = site, geom = "line", data = hourly_means)
fig <- fig + scale_y_continuous("Vaisala_CO2_ppm")
fig <- fig + scale_x_datetime("America/Los_Angeles")
show(fig + ggtitle("Hourly Means"))
Both get_month(...) and get_day(...) will actually fetch a little more data than is needed, so if you need exact date ranges, use subset(...) to trim the data based on the timestamps.
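Here's a small sketch of that trimming step. The data frame is made up to mimic an hourly result that spills over on both sides of October 1. Only the timestamp and value column names come from the output shown earlier.

```r
tz <- "America/Los_Angeles"

# Made-up hourly data: 30 hours starting at 22:00 on September 30
hourly <- data.frame(
  timestamp = seq(as.POSIXct("2012-09-30 22:00:00", tz = tz),
                  by = "1 hour", length.out = 30),
  value = 400 + seq_len(30)
)

# Keep exactly October 1, local time: start inclusive, end exclusive
start <- as.POSIXct("2012-10-01 00:00:00", tz = tz)
end   <- as.POSIXct("2012-10-02 00:00:00", tz = tz)
trimmed <- subset(hourly, timestamp >= start & timestamp < end)

nrow(trimmed)
## [1] 24
```

Building the cutoffs in the same timezone as the data keeps the comparison easy to read, although POSIXct values compare as absolute instants either way.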
Since these are public-facing URLs, they are limited in the amount of data they return, so (a) the server doesn't freak out and (b) the caching performs well. Let me know if you want to be able to request larger ranges at higher resolution.
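The URL paths printed in the download messages above follow a visible pattern, which can be sketched as a small helper. The name beacon_path is hypothetical (it is not part of the beacon package), the host is left out because it isn't shown here, and zero-padding the month is an assumption based on the padded day (“01”) in the hourly example.

```r
# Hypothetical helper: rebuilds the path portion of the CSV URLs seen above
beacon_path <- function(metric, year, month, day = NULL, interval = "1d") {
  date_part <- if (is.null(day)) {
    sprintf("%d/%02d", year, month)            # whole month, e.g. "2012/10"
  } else {
    sprintf("%d/%02d/%02d", year, month, day)  # single day, e.g. "2012/10/01"
  }
  sprintf("/avg/%s/%s/%s/stacked.csv", metric, date_part, interval)
}

beacon_path("Vaisala_CO2_ppm", 2012, 10)
## [1] "/avg/Vaisala_CO2_ppm/2012/10/1d/stacked.csv"

beacon_path("Vaisala_CO2_ppm", 2012, 10, 1, "1h")
## [1] "/avg/Vaisala_CO2_ppm/2012/10/01/1h/stacked.csv"
```

Both results match the paths reported by get_month(...) and get_day(...) earlier.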
You can install the API bindings for R using devtools. If you don't have devtools, this will install it for you:
install.packages("devtools")
Once you have that, you can install the beacon package from my GitHub repository:
library(devtools)
# Current versions of devtools expect the "user/repo" form
install_github("holstius/beacon")
Then you can load it at any time using:
library(beacon)
The format is “stacked” (also called “long”). Talk to me if you need support for “unstacked” (also called “wide”). The pandas project has awesome support for stacking/unstacking data in Python and an excellent description of the difference:
http://pandas.pydata.org/pandas-docs/dev/reshaping.html#reshaping-by-stacking-and-unstacking
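If you'd rather stay in R, the same stacked-to-unstacked conversion can be sketched with base R's reshape(). The data frame below is made up (the site names come from the output above, the values and dates are invented), and this is just one of several ways to do it.

```r
# Made-up stacked ("long") data: one row per timestamp and site
long <- data.frame(
  timestamp = rep(c("2012-10-01", "2012-10-02"), each = 2),
  site      = rep(c("BurckES", "WestEd"), times = 2),
  value     = c(454.0, 408.4, 450.1, 410.2),
  stringsAsFactors = FALSE
)

# Unstacked ("wide"): one row per timestamp, one value column per site
wide <- reshape(long, idvar = "timestamp", timevar = "site", direction = "wide")

names(wide)
## [1] "timestamp"     "value.BurckES" "value.WestEd"
```

The stacked layout tends to work better with ggplot2 (as in the plots above), while the wide layout is handy for comparing sites side by side.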