Most organizations generate their reports through a database / Excel combination. Excel is not suited for repetitive work that involves large amounts of data. The result is a resource-intensive and error-prone process, that outputs ugly plots.
The good news is that the data pipeline can be completely automated in R.
This means that text, plots and tables update when the underlying data changes. The benefits of this approach are:
The approach is shown with criminality statistics in Houston, USA.
The aim here is not to show the R code itself, but the output as seen by the user. In this case a HTML file has been generated and put on a server. The HTML file can however also be sent by email as a standalone file. It is also possible to generate PDF, Word, or Excel files. In those cases the interactivity is lost.
The data covers offenses in 2010 in the Houston area. The following crimes are included: murder, robbery, aggravated assault, burglary, auto theft, theft, rape.
A count of the data is presented below.
The data are analysed by weekday and by hour. The graph below shows the type of offense per day of the week.
As you can see, the offenses peak at different times of the day. For example, burglary peaks at 8 o’clock, whereas autotheft peaks at 22 o’clock
What happens if the underlying data of this analysis change? Let’s assign random hours for auto theft and burglary and replot, followed by a re-run of the paragraph above:
As you can see, the new offenses peak at different times of the day. For example, burglary peaks at 17 o’clock, whereas autotheft peaks at 11 o’clock. The new hours were programmatically inserted in the sentences, it was not necessary to inspect plots, code or data manually.
Since the coordinates of the crimes are recorded, a map can be made. In this case we concentrate on downtown Houston and violent crimes (robbery, aggravated assault, rape and murder).
The advantage of html output is that you can include interactive plots. Here is a timeline for burglaries on Sundays. Hover your mouse over the plot to see the values. You can select the period under consideration by dragging the selectors under the plot. The box in the lower left corner shows the rolling average value. It is set at 1 day and you can experiment with values for weekly [7] or monthly [30] plots.
Your users may want to see the data in raw or processed form directly. It is possible to include sortable and searchable tables in the output, as shown below.
Here is an example of an interactive map. You can zoom in with “+” and “-”. The offenses are clustered, meaning that you can see the area covered by hovering your mouse over the points. Clicking on the points zooms in, up to the level of the individual marker. Clicking on that marker will give you the offense name at that particular location.