Below are some tests comparing the speed of the Adobe Analytics API to the Google Analytics API. To do the comparison, we’ve looked at two sites (one small and one quite large) that each run both Adobe Analytics and Google Analytics. To keep the comparison, well, comparable, the queries stick to standard dimensions and metrics so that the “same” data is being pulled from both systems. The queries all pull the last 365 days.

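A few notes about the test:

* This is focused on the free version of Google Analytics; in the case of the large site, that means sampling comes into play, and, to make it an apples-to-apples comparison, the query uses the anti_sample = TRUE flag in googleAnalyticsR. That’s, basically, a bit of a hack to work around Google Analytics sampling. That shows in the results.

* A separate comparison could be comparing the crunching of Adobe Analytics data feed data to Google Analytics 360 data that has been pushed into BigQuery. That’s wayyyy beyond the scope of this assessment, though.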

The current test just runs each query once. Eventually, I may update this to run each query multiple times (and possibly try at different days of the week and times of day) to get a more robust set of data. But, I’m starting simple! I haven’t included the setup code here, as that’s a little tedious. But, I’ve tried to include all of the salient code for transparency purposes.

The Queries

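This is certainly something that can be fiddled with a bit, but, for the initial test, I set up two queries:

* Daily unique visitors (users), visits (sessions), and page views (pageviews)
* Visits (sessions) broken down by device type and browser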

For now, I haven’t added any segments to either query – either inline (dynamic) or named. But, that may be something to add in the future.

The functions below can then be run on different accounts to make these queries:

## COMPARISON #1: DAILY METRICS
# Adobe Analytics
daily_data_aa <- function(rsid, date_from, date_to){
  aa_data <- QueueOvertime(rsid,
                         date.from = date_from,
                         date.to = date_to,
                         metrics = c("uniquevisitors", "visits", "pageviews"),
                         date.granularity = "day")
}
# Google Analytics
# This sets anti_sample = TRUE (unsampled data) to ensure more of an apples-to-apples comparison
daily_data_ga <- function(view_id, date_from, date_to){
  ga_data <- google_analytics_4(view_id,
                                date_range = c(date_from, date_to),
                                metrics = c("users", "sessions", "pageviews"),
                                dimensions = "date",
                                anti_sample = TRUE)
}
## COMPARISON #2: DEVICE TYPE + BROWSER AGGREGATION
# Adobe Analytics
ranked_data_aa <- function(rsid, date_from, date_to){
  aa_data <- QueueRanked(rsid,
                         date.from = date_from,
                         date.to = date_to,
                         metrics = "visits",
                         elements = c("mobiledevicetype", "browsertype"),
                         top = c(25, 25))
}
# Google Analytics
# This sets anti_sample = TRUE (unsampled data) to ensure more of an apples-to-apples
# comparison. One extra step is required here to get the data to come back in a
# similar order: specifying the sort order.
order_ga <- order_type("sessions", sort_order = "DESCENDING", orderType = "VALUE")
ranked_data_ga <- function(view_id, date_from, date_to){
  ga_data <- google_analytics_4(view_id,
                                date_range = c(date_from, date_to),
                                metrics = "sessions",
                                dimensions = c("deviceCategory", "browser"),
                                order = order_ga,
                                anti_sample = TRUE)
}

Small Site First!

We’ll start with a small site. The first query is the daily data for 365 days. A bit of the data itself is included below to show that the two platforms return comparable results. The recorded time covers only making the query and retrieving its results.

Adobe Analytics: Daily Trend - Small Site

# Adobe Analytics
time_daily_aa <- system.time(aa_data <- daily_data_aa(aa_rsid, start_date_year, end_date))
[1] "Requesting URL attempt #1"
[1] "Requesting URL attempt #2"
[1] "Received overtime report."
aa_data <- aa_data %>% 
  select(datetime, uniquevisitors, visits, pageviews) %>% 
  mutate(datetime = as.Date(datetime))
kable(head(aa_data))
datetime uniquevisitors visits pageviews
2016-07-12 1493 1600 2156
2016-07-13 1484 1597 2124
2016-07-14 1527 1608 2163
2016-07-15 1108 1187 1572
2016-07-16 338 365 494
2016-07-17 542 583 779

Google Analytics: Daily Trend - Small Site

# Google Analytics
time_daily_ga <- system.time(ga_data <- daily_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
date users sessions pageviews
2016-07-12 1507 1628 2188
2016-07-13 1479 1620 2126
2016-07-14 1505 1596 2132
2016-07-15 1096 1187 1589
2016-07-16 335 365 490
2016-07-17 549 598 786

This query took 13.12 seconds for the Adobe Analytics API and 1.66 seconds for the Google Analytics API.

Now, let’s pull the device type and browser data – also for a year.

Adobe Analytics: Ranked Data - Small Site

# Adobe Analytics
time_ranked_aa <- system.time(aa_data <- ranked_data_aa(aa_rsid, start_date_year, end_date))
[1] "Requesting URL attempt #1"
[1] "Requesting URL attempt #2"
[1] "Received ranked report."
# We're not counting this in the time calculation -- just making it a bit more apples-to-apples with
# Google Analytics.
aa_data <- aa_data %>% 
  arrange(-visits) %>% 
  mutate(mobiledevicetype = ifelse(mobiledevicetype == "Other", "Desktop", mobiledevicetype)) %>%
  select(mobiledevicetype, browsertype, visits)
kable(head(aa_data))
mobiledevicetype browsertype visits
Desktop Google 284833
Desktop Microsoft 60961
Desktop Mozilla 39910
Mobile Phone Apple 16831
Desktop Apple 13570
Mobile Phone Google 13483

Google Analytics: Ranked Data - Small Site

# Google Analytics
time_ranked_ga <- system.time(ga_data <- ranked_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
deviceCategory browser sessions
desktop Chrome 284983
desktop Internet Explorer 50327
desktop Firefox 39297
mobile Safari 14351
mobile Chrome 13939
desktop Safari 13184

This query took 12.8 seconds for the Adobe Analytics API and 1.59 seconds for the Google Analytics API.

Now, A Much Larger Site

This is a much larger site – primarily an Adobe Analytics shop, but they also have Google Analytics implemented (lightly), so it suits our purposes. Again, the first query is the daily data for 365 days. We’re not going to show the code here, because it’s identical to the code above.

Adobe Analytics: Daily Trend - Large Site

[1] "Requesting URL attempt #1"
[1] "Requesting URL attempt #2"
[1] "Requesting URL attempt #3"
[1] "Received overtime report."
datetime uniquevisitors visits pageviews
2016-07-12 90375 114619 692623
2016-07-13 97333 122789 737202
2016-07-14 94911 119724 708964
2016-07-15 76312 94919 547828
2016-07-16 11084 12459 70326
2016-07-17 9013 10003 48970

Google Analytics: Daily Trend - Large Site

date users sessions pageviews
2016-07-12 92946 122265 711739
2016-07-13 99901 130705 740745
2016-07-14 97663 127825 721988
2016-07-15 78651 101623 564974
2016-07-16 11226 12812 71551
2016-07-17 9145 10250 49598

This query took 22.55 seconds for the Adobe Analytics API and 2.15 seconds for the Google Analytics API.

Now, let’s pull the device type and browser data – also for a year.

Adobe Analytics: Ranked Data - Large Site

[1] "Requesting URL attempt #1"
[1] "Requesting URL attempt #2"
[1] "Received ranked report."
mobiledevicetype browsertype visits
Desktop Microsoft 14624108
Desktop Google 13528832
Desktop Mozilla 4192844
Tablet Apple 1132420
Desktop Apple 734011
Mobile Phone Apple 298861

Google Analytics: Ranked Data - Large Site

deviceCategory browser sessions
desktop Chrome 14255211
desktop Internet Explorer 13774114
desktop Firefox 4308947
desktop Edge 1522778
tablet Safari 1174571
desktop Safari 754205

This query took 11.5 seconds for the Adobe Analytics API and 80.47 seconds for the Google Analytics API.

Results Summary

Below is a summary of the results.

| Query | Adobe Analytics Time (seconds) | Google Analytics Time (seconds) |
|:---|---:|---:|
| Small Site: Daily Trends | 13.1 | 1.7 |
| Small Site: Ranked Values | 12.8 | 1.6 |
| Large Site: Daily Trends | 22.5 | 2.1 |
| Large Site: Ranked Values | 11.5 | 80.5 |

Google Analytics got killed on the ranked data for the large site. With the “anti-sampling” flag set, the query required many calls to the API, and that is what drove the time up. If the site were a GA360 site with a custom table set up, it would have come back much faster (but custom tables only go back 30 days, so, even with GA360, this may be a tough one).

---
title: "Adobe Analytics / Google Analytics API Speed"
output: html_notebook
---

Below are some tests comparing the speed of the Adobe Analytics API to the Google Analytics API. To do the comparison, we've looked at two sites -- one small and one quite large -- that each run both Adobe Analytics and Google Analytics. To keep the comparison, well, comparable, the queries stick to standard dimensions and metrics so that the "same" data is being pulled from both systems. The queries all pull the last 365 days.

A few notes about the test:

* This is focused on the free version of Google Analytics; in the case of the large site, that means sampling comes into play, and, to make it an apples-to-apples comparison, the query uses the `anti_sample = TRUE` flag in `googleAnalyticsR`. That's, basically, a bit of a hack to work around Google Analytics sampling. That shows in the results.

* A separate comparison could be comparing the crunching of Adobe Analytics data feed data to Google Analytics 360 data that has been pushed into BigQuery. That's wayyyy beyond the scope of this assessment, though.

The current test just runs each query once. Eventually, I may update this to run each query multiple times (and possibly try at different days of the week and times of day) to get a more robust set of data. But, I'm starting simple! I haven't included the setup code here, as that's a little tedious. But, I've tried to include all of the salient code for transparency purposes.
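
As a rough idea of what the "run each query multiple times" version might look like, here is a minimal sketch. The helper `time_repeated()` and the stand-in `fake_query()` are hypothetical names that are not part of this notebook; a real run would pass in something like `function() daily_data_aa(aa_rsid, start_date_year, end_date)` instead of the sleep.

```r
# Hypothetical helper (not part of the original notebook): run a query function
# several times and summarise the elapsed wall-clock seconds.
time_repeated <- function(query_fun, n_runs = 5) {
  elapsed <- vapply(seq_len(n_runs), function(i) {
    system.time(query_fun())[["elapsed"]]
  }, numeric(1))
  c(min = min(elapsed), median = median(elapsed), max = max(elapsed))
}

# Stand-in for an API call so the sketch runs without any credentials
fake_query <- function() Sys.sleep(0.05)

timings <- time_repeated(fake_query, n_runs = 3)
print(timings)
```

Reporting the median alongside the minimum and maximum would make a single slow call (a queued Adobe report, say) easier to tell apart from a consistently slow API.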

```{r setup, include=FALSE}

# Get Google Analytics credentials and set them. This is just for the user running
# the file and will work across multiple sites. For Adobe, separate credentials
# are needed for each site.
ga_client_id <- Sys.getenv("GA_CLIENT_ID")
ga_client_secret <- Sys.getenv("GA_CLIENT_SECRET")

options(googleAuthR.client_id = ga_client_id)
options(googleAuthR.client_secret = ga_client_secret) 

library(tidyverse)
library(googleAnalyticsR)
library(RSiteCatalyst)
library(knitr)

end_date <- Sys.Date() - 1
start_date_year <- end_date-364

# Authorize Google Analytics
ga_auth()

# Initialize a data frame that will log all the results
results_log <- data.frame(testname = character(), aa_time = numeric(), ga_time = numeric(), stringsAsFactors = FALSE)

```

## The Queries

This is certainly something that can be fiddled with a bit, but, for the initial test, I set up two queries:

* Daily **unique visitors** (users), **visits** (sessions), and **page views** (pageviews)
* **Visits** (sessions) broken down by **device type** and **browser**

For now, I haven't added any segments to either query -- either inline (dynamic) or named. But, that may be something to add in the future.

The functions below can then be run on different accounts to make these queries:

```{r data_functions, echo=TRUE}

## COMPARISON #1: DAILY METRICS

# Adobe Analytics
daily_data_aa <- function(rsid, date_from, date_to){
  aa_data <- QueueOvertime(rsid,
                         date.from = date_from,
                         date.to = date_to,
                         metrics = c("uniquevisitors", "visits", "pageviews"),
                         date.granularity = "day")
}

# Google Analytics
# This sets anti_sample = TRUE (unsampled data) to ensure more of an apples-to-apples comparison
daily_data_ga <- function(view_id, date_from, date_to){
  ga_data <- google_analytics_4(view_id,
                                date_range = c(date_from, date_to),
                                metrics = c("users", "sessions", "pageviews"),
                                dimensions = "date",
                                anti_sample = TRUE)
}

## COMPARISON #2: DEVICE TYPE + BROWSER AGGREGATION

# Adobe Analytics
ranked_data_aa <- function(rsid, date_from, date_to){
  aa_data <- QueueRanked(rsid,
                         date.from = date_from,
                         date.to = date_to,
                         metrics = "visits",
                         elements = c("mobiledevicetype", "browsertype"),
                         top = c(25, 25))
}

# Google Analytics
# This sets anti_sample = TRUE (unsampled data) to ensure more of an apples-to-apples
# comparison. One extra step is required here to get the data to come back in a
# similar order: specifying the sort order.

order_ga <- order_type("sessions", sort_order = "DESCENDING", orderType = "VALUE")

ranked_data_ga <- function(view_id, date_from, date_to){
  ga_data <- google_analytics_4(view_id,
                                date_range = c(date_from, date_to),
                                metrics = "sessions",
                                dimensions = c("deviceCategory", "browser"),
                                order = order_ga,
                                anti_sample = TRUE)
}

```

## Small Site First!

```{r setup_small, include=FALSE}
# Set up the tests for the small site

# Adobe Analytics
aa_username <- Sys.getenv("ADOBE_API_USERNAME")
aa_secret <- Sys.getenv("ADOBE_API_SECRET")
aa_rsid <- Sys.getenv("ADOBE_RSID")

# Authorize Adobe Analytics
SCAuth(aa_username, aa_secret)

# Google Analytics
ga_view_id <- Sys.getenv("GA_VIEW_ID")

```

We'll start with a small site. The first query is the daily data for 365 days. A bit of the data itself is included below to show that the two platforms return comparable results. The recorded time covers only making the query and retrieving its results.
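
The timings reported throughout use the third element of the `system.time()` result, which is the elapsed (wall-clock) time rather than CPU time. For an API call that mostly waits on a remote server, elapsed time is the figure that matters. A small sketch, using a short sleep as a stand-in for a real query:

```r
# Time a stand-in operation (a short sleep) instead of a real API call
timing <- system.time(Sys.sleep(0.1))

# The first three elements are user CPU, system CPU, and elapsed time
print(names(timing)[1:3])

# Indexing by name is equivalent to the [[3]] used in this notebook
elapsed_by_name <- timing[["elapsed"]]
elapsed_by_position <- timing[[3]]
print(elapsed_by_name)
```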

### Adobe Analytics: Daily Trend - Small Site

```{r daily_data_small_aa, echo=TRUE, message=FALSE}
# Adobe Analytics
time_daily_aa <- system.time(aa_data <- daily_data_aa(aa_rsid, start_date_year, end_date))
aa_data <- aa_data %>% 
  select(datetime, uniquevisitors, visits, pageviews) %>% 
  mutate(datetime = as.Date(datetime))
kable(head(aa_data))
```

### Google Analytics: Daily Trend - Small Site

```{r daily_data_small_ga, echo=TRUE, message=FALSE}
# Google Analytics
time_daily_ga <- system.time(ga_data <- daily_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
```

This query took **`r round(time_daily_aa[[3]],2)`** seconds for the Adobe Analytics API and **`r round(time_daily_ga[[3]],2)`** seconds for the Google Analytics API.

```{r log_daily_small, include=FALSE}
# Log the results
results_log[nrow(results_log)+1,] <- c("Small Site: Daily Trends", time_daily_aa[[3]], time_daily_ga[[3]])
```

Now, let's pull the **device type** and **browser** data -- also for a year.

### Adobe Analytics: Ranked Data - Small Site

```{r ranked_data_small_aa, echo=TRUE, message=FALSE, warning=FALSE}
# Adobe Analytics
time_ranked_aa <- system.time(aa_data <- ranked_data_aa(aa_rsid, start_date_year, end_date))

# We're not counting this in the time calculation -- just making it a bit more apples-to-apples with
# Google Analytics.
aa_data <- aa_data %>% 
  arrange(-visits) %>% 
  mutate(mobiledevicetype = ifelse(mobiledevicetype == "Other", "Desktop", mobiledevicetype)) %>%
  select(mobiledevicetype, browsertype, visits)

kable(head(aa_data))
```

### Google Analytics: Ranked Data - Small Site

```{r ranked_data_small_ga, echo=TRUE, message=FALSE, warning=FALSE}
# Google Analytics
time_ranked_ga <- system.time(ga_data <- ranked_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
```

This query took **`r round(time_ranked_aa[[3]],2)`** seconds for the Adobe Analytics API and **`r round(time_ranked_ga[[3]],2)`** seconds for the Google Analytics API.

```{r log_ranked_small, include=FALSE}
# Log the results
results_log[nrow(results_log)+1,] <- c("Small Site: Ranked Values", time_ranked_aa[[3]], time_ranked_ga[[3]])
```

## Now, A Much Larger Site

```{r setup_large, include=FALSE}
# Set up the tests for the larger site

# Adobe Analytics
aa_username <- Sys.getenv("ADOBE_API_USERNAME_M")
aa_secret <- Sys.getenv("ADOBE_API_SECRET_M")
aa_rsid <- Sys.getenv("ADOBE_RSID_M")

# Authorize Adobe Analytics
SCAuth(aa_username, aa_secret)

# Google Analytics
ga_view_id <- Sys.getenv("GA_VIEW_ID_M")

```

This is a much larger site -- primarily an Adobe Analytics shop, but they also have Google Analytics implemented (lightly), so it suits our purposes. Again, the first query is the daily data for 365 days. We're not going to show the code here, because it's identical to the code above.

### Adobe Analytics: Daily Trend - Large Site

```{r daily_data_large_aa, echo=FALSE, message=FALSE}
# Adobe Analytics
time_daily_large_aa <- system.time(aa_data <- daily_data_aa(aa_rsid, start_date_year, end_date))
aa_data <- aa_data %>% 
  select(datetime, uniquevisitors, visits, pageviews) %>% 
  mutate(datetime = as.Date(datetime))
kable(head(aa_data))
```

### Google Analytics: Daily Trend - Large Site

```{r daily_data_large_ga, echo=FALSE, message=FALSE}
# Google Analytics
time_daily_large_ga <- system.time(ga_data <- daily_data_ga(ga_view_id, start_date_year, end_date))
kable(head(ga_data))
```

This query took **`r round(time_daily_large_aa[[3]],2)`** seconds for the Adobe Analytics API and **`r round(time_daily_large_ga[[3]],2)`** seconds for the Google Analytics API.

```{r log_daily_large, include=FALSE}
# Log the results
results_log[nrow(results_log)+1,] <- c("Large Site: Daily Trends", time_daily_large_aa[[3]], time_daily_large_ga[[3]])
```

Now, let's pull the **device type** and **browser** data -- also for a year.

### Adobe Analytics: Ranked Data - Large Site

```{r ranked_data_large_aa, echo=FALSE, message=FALSE, warning=FALSE}
# Adobe Analytics
time_ranked_large_aa <- system.time(aa_data <- ranked_data_aa(aa_rsid, start_date_year, end_date))

# We're not counting this in the time calculation -- just making it a bit more apples-to-apples with
# Google Analytics.
aa_data <- aa_data %>% 
  arrange(-visits) %>% 
  mutate(mobiledevicetype = ifelse(mobiledevicetype == "Other", "Desktop", mobiledevicetype)) %>%
  select(mobiledevicetype, browsertype, visits)

kable(head(aa_data))
```

### Google Analytics: Ranked Data - Large Site

```{r ranked_data_large_ga, echo=FALSE, message=FALSE, warning=FALSE}

# Google Analytics
time_ranked_large_ga <- system.time(ga_data <- ranked_data_ga(ga_view_id, start_date_year, end_date))

ga_data <- ga_data %>% 
  arrange(-sessions)

kable(head(ga_data))
```

This query took **`r round(time_ranked_large_aa[[3]],2)`** seconds for the Adobe Analytics API and **`r round(time_ranked_large_ga[[3]],2)`** seconds for the Google Analytics API.

```{r log_ranked_large, include=FALSE}
# Log the results
results_log[nrow(results_log)+1,] <- c("Large Site: Ranked Values", time_ranked_large_aa[[3]], time_ranked_large_ga[[3]])
```

## Results Summary

Below is a summary of the results.

```{r results, echo=FALSE}

results_log <- results_log %>% 
  mutate(aa_time = round(as.numeric(aa_time),1), ga_time = round(as.numeric(ga_time),1))

names(results_log) <- c("Query", "Adobe Analytics Time (seconds)", "Google Analytics Time (seconds)")

kable(results_log)

# # Convert to long
# results_long <- gather(results_log, platform, time, `Adobe Analytics Time (seconds)`, `Google Analytics Time (seconds)`)
# 
# results_plot <- ggplot(data = results_long, mapping = aes(x=`Query`, y=time)) +
#   geom_bar(stat="identity")
# 
# results_plot

```
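
To put the gaps in perspective, the timings can be turned into a simple ratio. The sketch below hard-codes the values from the single run recorded in this document (13.1 / 1.7, 12.8 / 1.6, 22.5 / 2.1, and 11.5 / 80.5 seconds), so a fresh run would give different figures; the column name `aa_to_ga` is just an illustrative choice.

```r
# Timings (seconds) copied from the single run recorded in this document
results <- data.frame(
  query   = c("Small Site: Daily Trends", "Small Site: Ranked Values",
              "Large Site: Daily Trends", "Large Site: Ranked Values"),
  aa_time = c(13.1, 12.8, 22.5, 11.5),
  ga_time = c(1.7, 1.6, 2.1, 80.5)
)

# Ratio > 1 means Adobe Analytics was slower; < 1 means Google Analytics was slower
results$aa_to_ga <- round(results$aa_time / results$ga_time, 1)
print(results)
```

On this run, Adobe Analytics took roughly 8 to 11 times as long as Google Analytics on three of the four queries, and the relationship flipped on the large-site ranked query.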

Google Analytics got killed on the ranked data for the large site. With the "anti-sampling" flag set, the query required many calls to the API, and that is what drove the time up. If the site were a GA360 site with a custom table set up, it would have come back much faster (but custom tables only go back 30 days, so, even with GA360, this may be a tough one).
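
To get a feel for why avoiding sampling multiplies the number of API calls, here is an illustration that splits a year into fixed seven-day windows, each of which would need its own request. This is an assumption-laden sketch, not the batching logic `googleAnalyticsR` actually uses for `anti_sample = TRUE`; the function `split_date_range()` and the seven-day window are hypothetical.

```r
# Illustration only: split a date range into consecutive fixed-size windows.
# This is NOT the algorithm googleAnalyticsR uses; it just shows how one
# year-long query can fan out into many smaller requests.
split_date_range <- function(date_from, date_to, days_per_batch) {
  date_from <- as.Date(date_from)
  date_to <- as.Date(date_to)
  starts <- seq(date_from, date_to, by = days_per_batch)
  # Cap the final window so it does not run past the requested end date
  ends <- pmin(starts + days_per_batch - 1, date_to)
  data.frame(start = starts, end = ends)
}

# A 365-day range in hypothetical 7-day windows: one API request per row
batches <- split_date_range("2016-07-12", "2017-07-11", days_per_batch = 7)
print(nrow(batches))
print(head(batches, 3))
```

With one request per window, a query that a small site answers in a single call turns into dozens of calls on a large site, which is consistent with the jump in time seen above.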
