1 Why your reporting process is broken

Most organizations generate their reports through a database / Excel combination. Excel is not suited for repetitive work that involves large amounts of data. The result is a resource-intensive and error-prone process, that outputs ugly plots.

1.1 How to solve it

The good news is that the data pipeline can be completely automated in R.

This means that text, plots and tables update when the underlying data changes. The benefits of this approach are:

  • Large reduction of resources that are required to do standard work;
  • Improved standardization;
  • Reduced error potential;
  • Marginal costs of updated reporting are close to zero.
  • Excel plots are ugly. R plots are not.
  • Excel output is static. R output can be made interactive.

1.2 About this demo report

The approach is shown with criminality statistics in Houston, USA.

The aim here is not to show the R code itself, but the output as seen by the user. In this case a HTML file has been generated and put on a server. The HTML file can however also be sent by email as a standalone file. It is also possible to generate PDF, Word, or Excel files. In those cases the interactivity is lost.

2 The data: criminal statistics of Houston

The data covers offenses in 2010 in the Houston area. The following crimes are included: murder, robbery, aggravated assault, burglary, auto theft, theft, rape.

A count of the data is presented below.

Offense count

3 When did it happen?

3.1 Facetted plots per offense type

The data are analysed by weekday and by hour. The graph below shows the type of offense per day of the week.

Offense count per hour

As you can see, the offenses peak at different times of the day. For example, burglary peaks at 8 o’clock, whereas autotheft peaks at 22 o’clock

3.2 Automatic updating of results (auto plotting)

What happens if the underlying data of this analysis change? Let’s assign random hours for auto theft and burglary and replot, followed by a re-run of the paragraph above:

Offense count per hour - reduced

As you can see, the new offenses peak at different times of the day. For example, burglary peaks at 17 o’clock, whereas autotheft peaks at 11 o’clock. The new hours were programmatically inserted in the sentences, it was not necessary to inspect plots, code or data manually.

4 Where did it happen?

Since the coordinates of the crimes are recorded, a map can be made. In this case we concentrate on downtown Houston and violent crimes (robbery, aggravated assault, rape and murder).

Locations of violent crimes

5 Interactive auto plotting: involving your users

5.1 Interactive timeline

The advantage of html output is that you can include interactive plots. Here is a timeline for burglaries on Sundays. Hover your mouse over the plot to see the values. You can select the period under consideration by dragging the selectors under the plot. The box in the lower left corner shows the rolling average value. It is set at 1 day and you can experiment with values for weekly [7] or monthly [30] plots.

5.2 Interactive table

Your users may want to see the data in raw or processed form directly. It is possible to include sortable and searchable tables in the output, as shown below.

5.3 Interactive maps

Here is an example of an interactive map. You can zoom in with “+” and “-”. The offenses are clustered, meaning that you can see the area covered by hovering your mouse over the points. Clicking on the points zooms in, up to the level of the individual marker. Clicking on that marker will give you the offense name at that particular location.