This is a prototype R Markdown Notebook which summarises the key facts and figures of Ontario.ca. It is a proof of concept that summarises a script which pulls directly from the Google Analytics API to produce a report.

Pulling from the API

Not contained in this notebook is a call to the Google Analytics API to download the following parameters from three ODS Google Analytics properties: Onesite, Onepage (both Ontario.ca) and Roll-up (gov.on.ca).

date_range = c("2010-01-01", "2018-08-19"),
metrics = c("sessions", "avgSessionDuration"), 
dimensions = c("date", "deviceCategory", "country", "language"),
anti_sample = TRUE)

Instead of reproducing those calls, the results are merged into one flat file and stored on disk in a file named ga_data_raw.rds.

ga_data_raw <- read_rds("ga_data_raw.rds")
head(ga_data_raw)

Fixing the ODS GA Install in the Backend

Unfortunately, for historical reasons, Google Analytics install on Ontario.ca is split between two GA properties: Onepage - Master and Onesite - Master. Those two properties have the following number of rows in the table:

table(ga_data_raw$viewName)
## 
## Onepage - Master Onesite - Master Roll-up - Master 
##          1092849           830028          3235370

We are only interested in the combined total of these two GA properties, so we make a new table named ga_data that sums all the traffic between the two properties and dumps all the other columns:

ga_data <-  # this cleans up the raw inputs from the API by combining the Ontario.ca properties and renaming the properties
  ga_data_raw %>%
  select(-avgSessionDuration, -viewId) %>%  # Combine `Onesite - Master` and `Onepage - Master`` into `Ontario.ca`
  spread(viewName, sessions, fill = 0) %>%
  mutate(`Ontario.ca` = `Onepage - Master` + `Onesite - Master`) %>%
  rename(`gov.on.ca` = `Roll-up - Master`) %>%
  select(-`Onepage - Master`, -`Onesite - Master`) %>%  # Remove components
  gather(viewName, sessions, c(`Ontario.ca`, `gov.on.ca`)) %>%  # Clean up into tidy data
  filter(sessions != 0) %>%  # Remove 0 rows, so lines aren't drawn on the chart
  mutate(yearmonth = as_date(parse_date_time(paste0(year(date), month(date)), "ym"))) %>%  # create a year-month date column
  group_by(yearmonth, viewName) %>%
  summarise(sessions = sum(sessions)) %>%
  write_rds("ga_data.rds")
head(ga_data)

Now that we have a tidy table, we can produce our report.

Example Report

Traffic data for Ontario.ca begins on , and this Factsheet contains data up to -. The website had the goal of migrating traffic from the constilation of gov.on.ca websites to Ontario.ca. This factsheet summarises the performance towards that goal.

ga_data %>%
  ggplot() +
  geom_line(aes(x = yearmonth, y = sessions, color = viewName, group = viewName)) +
  theme_minimal() +  # formatting lines
  labs(
    title = "The long work of Ontario.ca transitions",
    subtitle = "Ontario.ca (Onepage + Onesite) slowly taking traffic away from gov.on.ca",
    caption = "Data from Google Analytics, monthly figures\nalex.lougheed@Ontario.ca",
    x = "",
    y = "Sessions",
    color = "Property"
  ) +
  theme(legend.position = "bottom") +
  scale_x_date(date_labels = "'%y", date_breaks = "1 year") +
  scale_y_continuous(labels = comma)

The key performance indicator for the ODS is the percent of traffic shifting from gov.on.ca to Ontario.ca.

ga_data %>%
  spread(viewName, sessions) %>%
  mutate(traffic_shift_kpi = `Ontario.ca` / (`Ontario.ca` + `gov.on.ca`)) %>%
  filter(!is.na(traffic_shift_kpi)) %>%  # Do not consider time when Ontario.ca did not exist and kpi = NA
  ggplot() +
  geom_line(aes(x = yearmonth, y = traffic_shift_kpi)) +
  theme_minimal() +  # formatting lines
  labs(
    title = "The long work of Ontario.ca transitions",
    subtitle = "Ontario.ca's slowly taking up traffic formerly belonging to gov.on.ca",
    caption = "Data from Google Analytics, monthly figures\nalex.lougheed@Ontario.ca",
    x = "",
    y = "Percent of traffic",
    color = ""
  ) +
  theme(legend.position = "bottom") +
  scale_x_date(date_labels = "'%y", date_breaks = "1 year") +
  scale_y_continuous(labels = percent)

We can also produce tables in these types of reports, such as the last 9 months of the KPI.

ga_data %>%
  spread(viewName, sessions) %>%
  mutate(traffic_shift_kpi = `Ontario.ca` / (`Ontario.ca` + `gov.on.ca`)) %>%
  filter(!is.na(traffic_shift_kpi)) %>%
  filter(yearmonth > today()-months(9)) %>%
  kable(digits = 2)
yearmonth gov.on.ca Ontario.ca traffic_shift_kpi
2017-12-01 4180776 4814117 0.54
2018-01-01 5773173 7054606 0.55
2018-02-01 4613114 6161148 0.57
2018-03-01 4732410 6710891 0.59
2018-04-01 4229004 6133631 0.59
2018-05-01 4065967 6651769 0.62
2018-06-01 3691552 6405524 0.63
2018-07-01 3655742 7477206 0.67
2018-08-01 1959498 4385741 0.69