Below are some tests comparing the speed of the Adobe Analytics API to the Google Analytics API. For the comparison, we’ve looked at two sites (one small and one quite large) that both run Adobe Analytics and Google Analytics side by side. To keep the comparison, well, comparable, the queries stick to standard dimensions and metrics so that the “same” data is pulled from both systems. All queries pull data for the last 365 days.
Specifically, the test compares the Adobe Analytics Reporting API with the Google Analytics Reporting API. A few notes about the setup:
- This is focused on the free version of Google Analytics; for the large site, that means sampling comes into play. To keep the comparison apples-to-apples, the Google Analytics queries use the anti_sample = TRUE flag in googleAnalyticsR. That’s, basically, a bit of a hack to work around sampling: the package splits the request into multiple smaller API calls. That shows in the results.
- A separate comparison could pit processing of Adobe Analytics data feed data against Google Analytics 360 data that has been pushed into BigQuery. That’s wayyyy beyond the scope of this assessment, though.
The current test just runs each query once. Eventually, I may update this to run each query multiple times (and possibly on different days of the week and at different times of day) to get a more robust set of data. But, I’m starting simple! I haven’t included the setup code here, as that’s a little tedious. But, I’ve tried to include all of the salient code for transparency.
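Running each query several times could look something like the sketch below. `time_repeated()` is a hypothetical helper, not part of RSiteCatalyst or googleAnalyticsR: it takes any zero-argument function (for example, a closure around `daily_data_aa()`), times it `n` times with base R's `system.time()`, and summarizes the elapsed seconds. `Sys.sleep()` stands in for a real API call so the sketch runs without credentials.

```r
# Hypothetical helper: time a query function several times and summarize
# the elapsed (wall-clock) seconds. Not part of any analytics package.
time_repeated <- function(query_fun, n = 5) {
  elapsed <- vapply(seq_len(n), function(i) {
    # "elapsed" is the wall-clock component of the proc_time result
    system.time(query_fun())[["elapsed"]]
  }, numeric(1))
  c(runs = n,
    median = median(elapsed),
    min = min(elapsed),
    max = max(elapsed))
}

# Stand-in for an API call; with real credentials this could be, e.g.,
# function() daily_data_ga(ga_view_id, start_date_year, end_date)
fake_query <- function() Sys.sleep(0.1)

timings <- time_repeated(fake_query, n = 3)
print(timings)
```

Reporting the median rather than a single run makes one slow response (a busy queue, a network hiccup) less likely to decide the comparison.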
The Queries
This is certainly something that can be fiddled with a bit, but, for the initial test, I set up two queries:
- Daily unique visitors (users), visits (sessions), and page views (pageviews)
- Visits (sessions) broken down by device type and browser
For now, I haven’t added any segments to either query – either inline (dynamic) or named. But, that may be something to add in the future.
The functions below can then be run on different accounts to make these queries:
```r
library(RSiteCatalyst)     # QueueOvertime(), QueueRanked()
library(googleAnalyticsR)  # google_analytics_4(), order_type()

## COMPARISON #1: DAILY METRICS

# Adobe Analytics
daily_data_aa <- function(rsid, date_from, date_to){
  QueueOvertime(rsid,
                date.from = date_from,
                date.to = date_to,
                metrics = c("uniquevisitors", "visits", "pageviews"),
                date.granularity = "day")
}

# Google Analytics
# This includes the anti_sample flag to ensure more of an apples-to-apples comparison
daily_data_ga <- function(view_id, date_from, date_to){
  google_analytics_4(view_id,
                     date_range = c(date_from, date_to),
                     metrics = c("users", "sessions", "pageviews"),
                     dimensions = "date",
                     anti_sample = TRUE)
}

## COMPARISON #2: DEVICE TYPE + BROWSER AGGREGATION

# Adobe Analytics
ranked_data_aa <- function(rsid, date_from, date_to){
  QueueRanked(rsid,
              date.from = date_from,
              date.to = date_to,
              metrics = "visits",
              elements = c("mobiledevicetype", "browsertype"),
              top = c(25, 25))
}

# Google Analytics
# This includes the anti_sample flag to ensure more of an apples-to-apples comparison.
# One extra step is required here to get the data to come back similarly: specifying the order.
order_ga <- order_type("sessions", sort_order = "DESCENDING", orderType = "VALUE")

ranked_data_ga <- function(view_id, date_from, date_to){
  google_analytics_4(view_id,
                     date_range = c(date_from, date_to),
                     metrics = "sessions",
                     dimensions = c("deviceCategory", "browser"),
                     order = order_ga,
                     anti_sample = TRUE)
}
```
Small Site First!
We’ll start with a small site. The first query is the daily data for 365 days. A bit of the data itself is included below to show that the two platforms return comparable results. The recorded time covers only making the query and retrieving its results; any post-processing afterward is not counted.
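The timing relies on base R's `system.time()`, which returns user, system, and elapsed (wall-clock) seconds; the `[[3]]` used in the notebook source when reporting times picks out the elapsed value. A minimal illustration, with `Sys.sleep()` and a tiny data frame standing in for an API call and its report:

```r
# Only the expression passed to system.time() is timed; the assignment
# inside it still creates `result` in the calling environment.
timing <- system.time(result <- {
  Sys.sleep(0.2)           # stand-in for making and retrieving an API query
  data.frame(x = 1:3)      # stand-in for the returned report
})

# Post-processing outside system.time() is not counted in `timing`
result$y <- result$x * 2

# Element 3 of the proc_time object is elapsed (wall-clock) seconds
elapsed_seconds <- timing[[3]]
print(identical(elapsed_seconds, timing[["elapsed"]]))
print(round(elapsed_seconds, 2))
```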
Adobe Analytics: Daily Trend - Small Site
```r
# Adobe Analytics
time_daily_aa <- system.time(aa_data <- daily_data_aa(aa_rsid, start_date_year, end_date))
```

```
[1] "Requesting URL attempt #1"
[1] "Requesting URL attempt #2"
[1] "Received overtime report."
```

```r
aa_data <- aa_data %>%
  select(datetime, uniquevisitors, visits, pageviews) %>%
  mutate(datetime = as.Date(datetime))
kable(head(aa_data))
```

| datetime | uniquevisitors | visits | pageviews |
|:--|--:|--:|--:|
| 2016-07-12 | 1493 | 1600 | 2156 |
| 2016-07-13 | 1484 | 1597 | 2124 |
| 2016-07-14 | 1527 | 1608 | 2163 |
| 2016-07-15 | 1108 | 1187 | 1572 |
| 2016-07-16 | 338 | 365 | 494 |
| 2016-07-17 | 542 | 583 | 779 |
Google Analytics: Daily Trend - Small Site
```r
# Google Analytics
time_daily_ga <- system.time(ga_data <- daily_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
```

| date | users | sessions | pageviews |
|:--|--:|--:|--:|
| 2016-07-12 | 1507 | 1628 | 2188 |
| 2016-07-13 | 1479 | 1620 | 2126 |
| 2016-07-14 | 1505 | 1596 | 2132 |
| 2016-07-15 | 1096 | 1187 | 1589 |
| 2016-07-16 | 335 | 365 | 490 |
| 2016-07-17 | 549 | 598 | 786 |
This query took 13.12 seconds for the Adobe Analytics API and 1.66 seconds for the Google Analytics API.
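To put a number on "comparable," one option is to join the two daily tables by date and compute the percentage difference per metric. The sketch below hard-codes the first six days of Adobe visits and Google Analytics sessions from the small-site tables above; with the live data frames, the join would be on `aa_data$datetime` and `ga_data$date` (column names taken from the code above).

```r
# First six days of small-site visits (AA) and sessions (GA), from the tables above
aa <- data.frame(date = as.Date("2016-07-12") + 0:5,
                 visits = c(1600, 1597, 1608, 1187, 365, 583))
ga <- data.frame(date = as.Date("2016-07-12") + 0:5,
                 sessions = c(1628, 1620, 1596, 1187, 365, 598))

# Join on date and express the GA figure relative to the AA figure (in %)
both <- merge(aa, ga, by = "date")
both$pct_diff <- 100 * (both$sessions - both$visits) / both$visits
print(transform(both, pct_diff = round(pct_diff, 1)))
```

On these six days, sessions sit within about 3% of visits; the same pattern extends to users vs. unique visitors and to pageviews by swapping in those columns.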
Now, let’s pull the device type and browser data – also for a year.
Adobe Analytics: Ranked Data - Small Site
```r
# Adobe Analytics
time_ranked_aa <- system.time(aa_data <- ranked_data_aa(aa_rsid, start_date_year, end_date))
```

```
[1] "Requesting URL attempt #1"
[1] "Requesting URL attempt #2"
[1] "Received ranked report."
```

```r
# Not counted in the time calculation: this just makes the output a bit more
# apples-to-apples with Google Analytics (e.g., relabeling "Other" as "Desktop").
aa_data <- aa_data %>%
  arrange(desc(visits)) %>%
  mutate(mobiledevicetype = ifelse(mobiledevicetype == "Other", "Desktop", mobiledevicetype)) %>%
  select(mobiledevicetype, browsertype, visits)
kable(head(aa_data))
```

| mobiledevicetype | browsertype | visits |
|:--|:--|--:|
| Desktop | Google | 284833 |
| Desktop | Microsoft | 60961 |
| Desktop | Mozilla | 39910 |
| Mobile Phone | Apple | 16831 |
| Desktop | Apple | 13570 |
| Mobile Phone | Google | 13483 |
Google Analytics: Ranked Data - Small Site
```r
# Google Analytics
time_ranked_ga <- system.time(ga_data <- ranked_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
```

| deviceCategory | browser | sessions |
|:--|:--|--:|
| desktop | Chrome | 284983 |
| desktop | Internet Explorer | 50327 |
| desktop | Firefox | 39297 |
| mobile | Safari | 14351 |
| mobile | Chrome | 13939 |
| desktop | Safari | 13184 |
This query took 12.8 seconds for the Adobe Analytics API and 1.59 seconds for the Google Analytics API.
Now, A Much Larger Site
This is a much larger site – primarily an Adobe Analytics shop, but they also have Google Analytics implemented (lightly), so it suits our purposes. Again, the first query is the daily data for 365 days. We’re not going to show the code here, because it’s identical to the code above.
Adobe Analytics: Daily Trend - Large Site
```
[1] "Requesting URL attempt #1"
[1] "Requesting URL attempt #2"
[1] "Requesting URL attempt #3"
[1] "Received overtime report."
```

| datetime | uniquevisitors | visits | pageviews |
|:--|--:|--:|--:|
| 2016-07-12 | 90375 | 114619 | 692623 |
| 2016-07-13 | 97333 | 122789 | 737202 |
| 2016-07-14 | 94911 | 119724 | 708964 |
| 2016-07-15 | 76312 | 94919 | 547828 |
| 2016-07-16 | 11084 | 12459 | 70326 |
| 2016-07-17 | 9013 | 10003 | 48970 |
Google Analytics: Daily Trend - Large Site
| date | users | sessions | pageviews |
|:--|--:|--:|--:|
| 2016-07-12 | 92946 | 122265 | 711739 |
| 2016-07-13 | 99901 | 130705 | 740745 |
| 2016-07-14 | 97663 | 127825 | 721988 |
| 2016-07-15 | 78651 | 101623 | 564974 |
| 2016-07-16 | 11226 | 12812 | 71551 |
| 2016-07-17 | 9145 | 10250 | 49598 |
This query took 22.55 seconds for the Adobe Analytics API and 2.15 seconds for the Google Analytics API.
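The "Requesting URL attempt #N" messages come from RSiteCatalyst's queued reports: the report request is queued, and the client checks back until the report is ready (the large-site daily report needed a third attempt). One plausible contributor to Adobe's higher times, then, is that waiting between checks puts a floor under the total. The simulation below is not RSiteCatalyst code; `poll_until_ready()`, the 0.2-second interval, and the `ready_after` parameter are all made up for illustration.

```r
# Hypothetical queue-and-poll loop: the "report" becomes ready on attempt
# `ready_after`, and the client waits a fixed interval before each check.
poll_until_ready <- function(ready_after, interval_seconds = 0.2, max_attempts = 10) {
  for (attempt in seq_len(max_attempts)) {
    Sys.sleep(interval_seconds)                  # wait before checking the queue
    message("Requesting URL attempt #", attempt)
    if (attempt >= ready_after) {
      message("Received report.")
      return(attempt)                            # report ready on this attempt
    }
  }
  stop("Report not ready after ", max_attempts, " attempts")
}

# Three attempts cost at least three waiting intervals, however small the report
timing <- system.time(attempts <- poll_until_ready(ready_after = 3))
print(attempts)
print(round(timing[["elapsed"]], 2))
```

If the real interval between checks is several seconds, two or three attempts alone could account for much of the time measured on the Adobe side.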
Now, let’s pull the device type and browser data – also for a year.
Adobe Analytics: Ranked Data - Large Site
```
[1] "Requesting URL attempt #1"
[1] "Requesting URL attempt #2"
[1] "Received ranked report."
```

| mobiledevicetype | browsertype | visits |
|:--|:--|--:|
| Desktop | Microsoft | 14624108 |
| Desktop | Google | 13528832 |
| Desktop | Mozilla | 4192844 |
| Tablet | Apple | 1132420 |
| Desktop | Apple | 734011 |
| Mobile Phone | Apple | 298861 |
Google Analytics: Ranked Data - Large Site
| deviceCategory | browser | sessions |
|:--|:--|--:|
| desktop | Chrome | 14255211 |
| desktop | Internet Explorer | 13774114 |
| desktop | Firefox | 4308947 |
| desktop | Edge | 1522778 |
| tablet | Safari | 1174571 |
| desktop | Safari | 754205 |
This query took 11.5 seconds for the Adobe Analytics API and 80.47 seconds for the Google Analytics API.
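The two tools label these dimensions differently: Adobe's `browsertype` reports the browser vendor (Google, Microsoft, Mozilla, Apple), while Google Analytics' `browser` reports the browser itself (Chrome, Internet Explorer, Edge, Firefox, Safari), and the device labels differ in case and wording. A sketch of rolling the large-site Google Analytics rows up to Adobe-style labels follows; the lookup tables are hand-built assumptions that cover only the values shown here.

```r
# Large-site Google Analytics rows, copied from the table above
ga <- data.frame(
  deviceCategory = c("desktop", "desktop", "desktop", "desktop", "tablet", "desktop"),
  browser = c("Chrome", "Internet Explorer", "Firefox", "Edge", "Safari", "Safari"),
  sessions = c(14255211, 13774114, 4308947, 1522778, 1174571, 754205),
  stringsAsFactors = FALSE
)

# Hypothetical, partial lookups from GA labels to Adobe-style labels
vendor_lookup <- c("Chrome" = "Google", "Internet Explorer" = "Microsoft",
                   "Edge" = "Microsoft", "Firefox" = "Mozilla", "Safari" = "Apple")
device_lookup <- c("desktop" = "Desktop", "mobile" = "Mobile Phone",
                   "tablet" = "Tablet")

ga$mobiledevicetype <- unname(device_lookup[ga$deviceCategory])
ga$browsertype <- unname(vendor_lookup[ga$browser])

# Sum sessions per device/vendor pair and sort descending, like the Adobe table
rolled <- aggregate(sessions ~ mobiledevicetype + browsertype, data = ga, FUN = sum)
rolled <- rolled[order(-rolled$sessions), ]
print(rolled, row.names = FALSE)
```

After the rollup, Internet Explorer and Edge collapse into a single Desktop/Microsoft row, which is closer to how Adobe groups them. Any browser missing from the lookup would become `NA` and be silently dropped by `aggregate()`, so the lookup needs an entry for every value that appears.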
Results Summary
Below is a summary of the results.
| Query | Adobe Analytics Time (seconds) | Google Analytics Time (seconds) |
|:--|--:|--:|
| Small Site: Daily Trends | 13.1 | 1.7 |
| Small Site: Ranked Values | 12.8 | 1.6 |
| Large Site: Daily Trends | 22.5 | 2.1 |
| Large Site: Ranked Values | 11.5 | 80.5 |
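To read the summary as ratios rather than raw seconds, the table can be rebuilt as a data frame and the Google Analytics time divided by the Adobe Analytics time. The numbers below are the rounded values from the summary table.

```r
results <- data.frame(
  query = c("Small Site: Daily Trends", "Small Site: Ranked Values",
            "Large Site: Daily Trends", "Large Site: Ranked Values"),
  aa_time = c(13.1, 12.8, 22.5, 11.5),
  ga_time = c(1.7, 1.6, 2.1, 80.5),
  stringsAsFactors = FALSE
)

# Ratio > 1 means Google Analytics was slower; < 1 means it was faster
results$ga_to_aa <- round(results$ga_time / results$aa_time, 2)
print(results)
```

Read this way, Google Analytics was roughly 8 to 11 times faster on the first three queries and about 7 times slower on the large-site ranked query.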
Google Analytics got killed on the ranked data for the large site, taking roughly seven times as long as Adobe Analytics. With the “anti-sampling” flag in the query, googleAnalyticsR had to make many calls to the API to avoid sampling on a site that size. If the site were a GA360 site with a custom table set up, the query would have come back much faster (but custom tables only go back 30 days, so, even with GA360, this may be a tough one).
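The cost of the anti-sampling approach is easiest to see as arithmetic: splitting one date range into smaller batches turns one logical query into one API request per batch. The sketch below only illustrates that trade-off; `split_date_range()` and the fixed batch sizes are hypothetical, and as I understand it googleAnalyticsR chooses its own batches based on session volume rather than a fixed number of days.

```r
# Hypothetical helper: split a date range into consecutive batches of
# `batch_days` days (the last batch may be shorter).
split_date_range <- function(date_from, date_to, batch_days) {
  starts <- seq(date_from, date_to, by = batch_days)
  ends <- pmin(starts + batch_days - 1, date_to)
  data.frame(start = starts, end = ends)
}

# 365 days, with the start date matching the first rows shown above
date_to <- as.Date("2017-07-11")
date_from <- date_to - 364

# One request per batch: smaller batches mean more API calls
for (batch_days in c(365, 30, 7, 1)) {
  batches <- split_date_range(date_from, date_to, batch_days)
  cat(batch_days, "day batches ->", nrow(batches), "API calls\n")
}
```

Each extra request adds its own round trip, which is why a query that is instant when unsampled can slow down sharply once it has to be broken up.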
---
title: "Adobe Analytics / Google Analytics API Speed"
output: html_notebook
---

Below are some tests comparing the speed of the Adobe Analytics API to the Google Analytics API. For the comparison, we've looked at two sites (one small and one quite large) that both run Adobe Analytics and Google Analytics side by side. To keep the comparison, well, comparable, the queries stick to standard dimensions and metrics so that the "same" data is pulled from both systems. All queries pull data for the last 365 days.

Specifically, the test compares the Adobe Analytics Reporting API with the Google Analytics Reporting API. A few notes about the setup:

* This is focused on the free version of Google Analytics; for the large site, that means sampling comes into play. To keep the comparison apples-to-apples, the Google Analytics queries use the `anti_sample = TRUE` flag in `googleAnalyticsR`. That's, basically, a bit of a hack to work around sampling: the package splits the request into multiple smaller API calls. That shows in the results.

* A separate comparison could pit processing of Adobe Analytics data feed data against Google Analytics 360 data that has been pushed into BigQuery. That's wayyyy beyond the scope of this assessment, though.

The current test just runs each query once. Eventually, I may update this to run each query multiple times (and possibly on different days of the week and at different times of day) to get a more robust set of data. But, I'm starting simple! I haven't included the setup code here, as that's a little tedious. But, I've tried to include all of the salient code for transparency.

```{r setup, include=FALSE}

# Get Google Analytics credentials and set them. This is just for the user running
# the file and will work across multiple sites. For Adobe, separate credentials
# are needed for each site.
ga_client_id <- Sys.getenv("GA_CLIENT_ID")
ga_client_secret <- Sys.getenv("GA_CLIENT_SECRET")

options(googleAuthR.client_id = ga_client_id)
options(googleAuthR.client_secret = ga_client_secret) 

library(tidyverse)
library(googleAnalyticsR)
library(RSiteCatalyst)
library(knitr)

end_date <- Sys.Date() - 1
start_date_year <- end_date-364

# Authorize Google Analytics
ga_auth()

# Initialize a data frame that will log all the results
results_log <- data.frame(testname = character(), aa_time = numeric(), ga_time = numeric(), stringsAsFactors = FALSE)

```

## The Queries

This is certainly something that can be fiddled with a bit, but, for the initial test, I set up two queries:

* Daily **unique visitors** (users), **visits** (sessions), and **page views** (pageviews)
* **Visits** (sessions) broken down by **device type** and **browser**

For now, I haven't added any segments to either query -- either inline (dynamic) or named. But, that may be something to add in the future.

The functions below can then be run on different accounts to make these queries:

```{r data_functions, echo=TRUE}

## COMPARISON #1: DAILY METRICS

# Adobe Analytics
daily_data_aa <- function(rsid, date_from, date_to){
  QueueOvertime(rsid,
                date.from = date_from,
                date.to = date_to,
                metrics = c("uniquevisitors", "visits", "pageviews"),
                date.granularity = "day")
}

# Google Analytics
# This includes the anti_sample flag to ensure more of an apples-to-apples comparison
daily_data_ga <- function(view_id, date_from, date_to){
  google_analytics_4(view_id,
                     date_range = c(date_from, date_to),
                     metrics = c("users", "sessions", "pageviews"),
                     dimensions = "date",
                     anti_sample = TRUE)
}

## COMPARISON #2: DEVICE TYPE + BROWSER AGGREGATION

# Adobe Analytics
ranked_data_aa <- function(rsid, date_from, date_to){
  QueueRanked(rsid,
              date.from = date_from,
              date.to = date_to,
              metrics = "visits",
              elements = c("mobiledevicetype", "browsertype"),
              top = c(25, 25))
}

# Google Analytics
# This includes the anti_sample flag to ensure more of an apples-to-apples comparison.
# One extra step is required here to get the data to come back similarly: specifying the order.

order_ga <- order_type("sessions", sort_order = "DESCENDING", orderType = "VALUE")

ranked_data_ga <- function(view_id, date_from, date_to){
  google_analytics_4(view_id,
                     date_range = c(date_from, date_to),
                     metrics = "sessions",
                     dimensions = c("deviceCategory", "browser"),
                     order = order_ga,
                     anti_sample = TRUE)
}

```

## Small Site First!

```{r setup_small, include=FALSE}
# Set up the tests for the small site

# Adobe Analytics
aa_username <- Sys.getenv("ADOBE_API_USERNAME")
aa_secret <- Sys.getenv("ADOBE_API_SECRET")
aa_rsid <-Sys.getenv("ADOBE_RSID")

# Authorize Adobe Analytics
SCAuth(aa_username, aa_secret)

# Google Analytics
ga_view_id <- Sys.getenv("GA_VIEW_ID")

```

We'll start with a small site. The first query is the daily data for 365 days. A bit of the data itself is included below to show that the two platforms return comparable results. The recorded time covers only making the query and retrieving its results; any post-processing afterward is not counted.

### Adobe Analytics: Daily Trend - Small Site

```{r daily_data_small_aa, echo=TRUE, message=FALSE}
# Adobe Analytics
time_daily_aa <- system.time(aa_data <- daily_data_aa(aa_rsid, start_date_year, end_date))
aa_data <- aa_data %>% 
  select(datetime, uniquevisitors, visits, pageviews) %>% 
  mutate(datetime = as.Date(datetime))
kable(head(aa_data))
```

### Google Analytics: Daily Trend - Small Site

```{r daily_data_small_ga, echo=TRUE, message=FALSE}
# Google Analytics
time_daily_ga <- system.time(ga_data <- daily_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
```

This query took **`r round(time_daily_aa[[3]],2)`** seconds for the Adobe Analytics API and **`r round(time_daily_ga[[3]],2)`** seconds for the Google Analytics API.

```{r log_daily_small, include=FALSE}
# Log the results
results_log[nrow(results_log)+1,] <- c("Small Site: Daily Trends", time_daily_aa[[3]], time_daily_ga[[3]])
```
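Logging with `c()` turns each new row into a character vector, so the time columns in `results_log` end up as text, which is why the summary chunk calls `as.numeric()` before rounding. A sketch of a hypothetical `log_result()` helper that appends a one-row data frame instead and keeps the times numeric (the example log is named `demo_log` so it does not clobber the real one):

```r
# Same structure as results_log in the setup chunk
demo_log <- data.frame(testname = character(), aa_time = numeric(),
                       ga_time = numeric(), stringsAsFactors = FALSE)

# Hypothetical helper: append one test's elapsed times as a typed row
log_result <- function(log, testname, aa_timing, ga_timing) {
  rbind(log, data.frame(testname = testname,
                        aa_time = aa_timing[[3]],   # elapsed seconds
                        ga_time = ga_timing[[3]],
                        stringsAsFactors = FALSE))
}

# Example with stand-in timings instead of real API calls
t_aa <- system.time(Sys.sleep(0.1))
t_ga <- system.time(Sys.sleep(0.05))
demo_log <- log_result(demo_log, "Small Site: Daily Trends", t_aa, t_ga)
str(demo_log)
```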

Now, let's pull the **device type** and **browser** data -- also for a year.

### Adobe Analytics: Ranked Data - Small Site

```{r ranked_data_small_aa, echo=TRUE, message=FALSE, warning=FALSE}
# Adobe Analytics
time_ranked_aa <- system.time(aa_data <- ranked_data_aa(aa_rsid, start_date_year, end_date))

# Not counted in the time calculation: this just makes the output a bit more
# apples-to-apples with Google Analytics (e.g., relabeling "Other" as "Desktop").
aa_data <- aa_data %>% 
  arrange(desc(visits)) %>% 
  mutate(mobiledevicetype = ifelse(mobiledevicetype == "Other", "Desktop", mobiledevicetype)) %>%
  select(mobiledevicetype, browsertype, visits)

kable(head(aa_data))
```

### Google Analytics: Ranked Data - Small Site

```{r ranked_data_small_ga, echo=TRUE, message=FALSE, warning=FALSE}
# Google Analytics
time_ranked_ga <- system.time(ga_data <- ranked_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
```

This query took **`r round(time_ranked_aa[[3]],2)`** seconds for the Adobe Analytics API and **`r round(time_ranked_ga[[3]],2)`** seconds for the Google Analytics API.

```{r log_ranked_small, include=FALSE}
# Log the results
results_log[nrow(results_log)+1,] <- c("Small Site: Ranked Values", time_ranked_aa[[3]], time_ranked_ga[[3]])
```

## Now, A Much Larger Site

```{r setup_large, include=FALSE}
# Set up the tests for the larger site

# Adobe Analytics
aa_username <- Sys.getenv("ADOBE_API_USERNAME_M")
aa_secret <- Sys.getenv("ADOBE_API_SECRET_M")
aa_rsid <-Sys.getenv("ADOBE_RSID_M")

# Authorize Adobe Analytics
SCAuth(aa_username, aa_secret)

# Google Analytics
ga_view_id <- Sys.getenv("GA_VIEW_ID_M")

```

This is a much larger site -- primarily an Adobe Analytics shop, but they also have Google Analytics implemented (lightly), so it suits our purposes. Again, the first query is the daily data for 365 days. We're not going to show the code here, because it's identical to the code above.

### Adobe Analytics: Daily Trend - Large Site

```{r daily_data_large_aa, echo=FALSE, message=FALSE}
# Adobe Analytics
time_daily_large_aa <- system.time(aa_data <- daily_data_aa(aa_rsid, start_date_year, end_date))
aa_data <- aa_data %>% 
  select(datetime, uniquevisitors, visits, pageviews) %>% 
  mutate(datetime = as.Date(datetime))
kable(head(aa_data))
```

### Google Analytics: Daily Trend - Large Site

```{r daily_data_large_ga, echo=FALSE, message=FALSE}
# Google Analytics
time_daily_large_ga <- system.time(ga_data <- daily_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
```

This query took **`r round(time_daily_large_aa[[3]],2)`** seconds for the Adobe Analytics API and **`r round(time_daily_large_ga[[3]],2)`** seconds for the Google Analytics API.

```{r log_daily_large, include=FALSE}
# Log the results
results_log[nrow(results_log)+1,] <- c("Large Site: Daily Trends", time_daily_large_aa[[3]], time_daily_large_ga[[3]])
```

Now, let's pull the **device type** and **browser** data -- also for a year.

### Adobe Analytics: Ranked Data - Large Site

```{r ranked_data_large_aa, echo=FALSE, message=FALSE, warning=FALSE}
# Adobe Analytics
time_ranked_large_aa <- system.time(aa_data <- ranked_data_aa(aa_rsid, start_date_year, end_date))

# Not counted in the time calculation: this just makes the output a bit more
# apples-to-apples with Google Analytics (e.g., relabeling "Other" as "Desktop").
aa_data <- aa_data %>% 
  arrange(desc(visits)) %>% 
  mutate(mobiledevicetype = ifelse(mobiledevicetype == "Other", "Desktop", mobiledevicetype)) %>%
  select(mobiledevicetype, browsertype, visits)

kable(head(aa_data))
```

### Google Analytics: Ranked Data - Large Site

```{r ranked_data_large_ga, echo=FALSE, message=FALSE, warning=FALSE}

# Google Analytics
time_ranked_large_ga <- system.time(ga_data <- ranked_data_ga(ga_view_id, start_date_year, end_date))

ga_data <- ga_data %>% 
  arrange(-sessions)

kable(head(ga_data))
```

This query took **`r round(time_ranked_large_aa[[3]],2)`** seconds for the Adobe Analytics API and **`r round(time_ranked_large_ga[[3]],2)`** seconds for the Google Analytics API.

```{r log_ranked_large, include=FALSE}
# Log the results
results_log[nrow(results_log)+1,] <- c("Large Site: Ranked Values", time_ranked_large_aa[[3]], time_ranked_large_ga[[3]])
```

## Results Summary

Below is a summary of the results.

```{r results, echo=FALSE}

results_log <- results_log %>% 
  mutate(aa_time = round(as.numeric(aa_time),1), ga_time = round(as.numeric(ga_time),1))

names(results_log) <- c("Query", "Adobe Analytics Time (seconds)", "Google Analytics Time (seconds)")

kable(results_log)

# # Convert to long
# results_long <- gather(results_log, platform, time, `Adobe Analytics Time (seconds)`, `Google Analytics Time (seconds)`)
# 
# results_plot <- ggplot(data = results_long, mapping = aes(x=`Query`, y=time)) +
#   geom_bar(stat="identity")
# 
# results_plot

```

Google Analytics got killed on the ranked data for the large site, taking roughly seven times as long as Adobe Analytics. With the "anti-sampling" flag in the query, `googleAnalyticsR` had to make many calls to the API to avoid sampling on a site that size. If the site were a GA360 site with a custom table set up, the query would have come back much faster (but custom tables only go back 30 days, so, even with GA360, this may be a tough one).
