Let's Pull Some Data

With four commands in a script, we're going to pull daily data from 01.01.2016 through 08.11.2016 (the end date in the code is simply "yesterday," so it will move forward if you run this later).

## Load the package we need to pull Google Analytics
library(googleAnalyticsR)
## Authenticate (gotta have access to the data!)
ga_auth()
## Specify the view we're going to use data from
view_id <- 81054104
## Get the data!
web_data <- google_analytics_4(view_id, 
                                date_range = c("2016-01-01", as.character(Sys.Date()-1)),
                                metrics = c("sessions","pageviews",
                                            "entrances","bounces","hits","totalEvents"),
                                dimensions = c("date","deviceCategory",
                                               "channelGrouping"),
                                anti_sample = TRUE)

Here's what the data looks like.

date        deviceCategory  channelGrouping  sessions  pageviews  entrances  bounces  hits  totalEvents
2016-01-01  desktop         (Other)                12         17         12        6    21            4
2016-01-01  desktop         Direct                840        906        827      795  1028           92
2016-01-01  desktop         Display               320        473        318      187   698          221
2016-01-01  desktop         Email                   4         11          4        1    18            7
2016-01-01  desktop         Organic Search        113        428        113       15   791          361
2016-01-01  desktop         Paid Search           690       1174        687      391  1799          606
2016-01-01  desktop         Print                  71        107         71       18   160           53
2016-01-01  desktop         Referral               77        225         76       18   396          162
2016-01-01  desktop         Social                  6          6          6        3    10            4
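
If you want to eyeball your own pull the same way, a quick check in the console (this isn't one of the original four commands, just base R's head()):

## Preview the first ten rows of the data we just pulled
head(web_data, 10)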

Let's correlate the metrics!

Three lines of code (plus loading the dplyr and corrgram packages):

## Load the packages we need (dplyr for the manipulation, corrgram for the plot)
library(dplyr)
library(corrgram)
## We only want the metrics (not the dimensions), so let's make a "metrics_only" object.
metrics_only <- select(web_data, -date, -deviceCategory, -channelGrouping)
## Some quick cleanup (because the column names don't come out of Google prettily).
names(metrics_only) <- c("Sessions","Pageviews","Entrances","Bounces","Hits","Total Events")
## Now, create a "corrgram" of the data.
corrgram(metrics_only, 
         lower.panel = NULL, 
         upper.panel = panel.pts, 
         gap = 1.1)

And…here's our corrgram (01.01.2016 to 08.11.2016)

What if we wanted the correlation coefficients?

We plotted the original corrgram with this call:

corrgram(metrics_only, 
         lower.panel = NULL, 
         upper.panel = panel.pts, 
         gap = 1.1)

Let's make a slight tweak to get the correlation coefficients included:

corrgram(metrics_only, 
         lower.panel = panel.cor, 
         upper.panel = panel.pts, 
         gap = 1.1)

The Result (01.01.2016 to 08.11.2016)

Correlate Sessions Across Channels

Start by "pivoting" the data to get channels across columns.

## Load tidyr for spread() (dplyr is already loaded from the correlation step)
library(tidyr)
## Get only desktop rows, and the date, channelGrouping, and sessions columns,
## then spread the channels across the columns
pivoted <- web_data %>% 
  filter(deviceCategory == "desktop") %>% 
  select(date, channelGrouping, sessions) %>%
  spread(channelGrouping, sessions)
## Figure out what the top 10 channelGroupings are by total sessions
## (top_n() selects by the last column -- total_sessions -- by default)
top10_channels <- group_by(web_data, channelGrouping) %>% 
  summarise(total_sessions = sum(sessions)) %>% 
  arrange(-total_sessions) %>%
  top_n(10) %>%
  select(channelGrouping)
## Take ALL the pivoted data and keep JUST the date column and the top 10 channels
pivoted <- pivoted[, c("date", as.character(top10_channels$channelGrouping))]
## Get rid of any NAs and replace them with 0s
pivoted[is.na(pivoted)] <- 0

That Gives Us Data That Looks Like This (01.01.2016 to 08.11.2016)

date        Paid Search  Direct  Display  Print  Organic Search  Referral  Video  Social  (Other)  Email
2016-01-01          690     840      320     71             113        77      2       6       12      4
2016-01-02          506     854      197    105             126        66      0       6        9     10
2016-01-03          510     935      281     79             147        68      0       5       14     13
2016-01-04          682     901      353    111             246       132      2      16       22     17
2016-01-05          526     847      190    199             237       100      2      11        7     16
2016-01-06          521     907       86    129             299       114     10       5       12     24
2016-01-07          563     896       79    128             286       121      3      16       15     26
2016-01-08          578     855       67    150             221       108      1       6       11     18
2016-01-09          705    1033       60    116             145       103      1       6       10     21
Now We Can Generate a Corrgram of Sessions Between Channels
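
The original deck just shows the resulting plot. Here's a minimal sketch of the call that would produce it, assuming the pivoted data frame built above (the [-1] drops the date column so only the channel session counts get correlated):

corrgram(pivoted[-1], 
         lower.panel = NULL, 
         upper.panel = panel.pts, 
         gap = 1.1)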

And…We Can Add Correlation Coefficients!
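
Again, only the plot appears in the deck; the same panel.cor tweak from earlier would add the coefficients (a sketch assuming the same pivoted object):

corrgram(pivoted[-1], 
         lower.panel = panel.cor, 
         upper.panel = panel.pts, 
         gap = 1.1)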

Anomaly Detection of Organic Search

Extract just the Desktop / Organic Search data:

## Filter to just the desktop / organic search data and keep only the date and sessions columns
organic_search <- filter(web_data,
                         deviceCategory == "desktop", 
                         channelGrouping == "Organic Search") %>%
                  select(date, sessions)

## Convert the date column to POSIXct, which AnomalyDetectionTs() requires
organic_search$date <- as.POSIXct(organic_search$date)
## Load the AnomalyDetection package
library(AnomalyDetection)
## Run anomaly detection on that data frame
organic_search_anomalies <- AnomalyDetectionTs(organic_search, plot = TRUE)
## Plot the data with the anomalies highlighted
organic_search_anomalies$plot

The Result: Anomaly Detection of Organic Search

Time-Series Trend of All Channels (Sessions)

## Make a "Time-Series" Object
web_data_ts <- ts(pivoted[-1], frequency = 7)
## Time-series are set up to have useful plots
par(col="#003399", cex.lab=0.8, cex.main=0.9)
plot(web_data_ts, axes = FALSE, main = "Daily Sessions by Channel",
     xlab = "Date")

The Result: Time-Series Trend of All Channels (Sessions)

Decomposition of Organic Search
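
The deck only shows the decomposition plot. Here's a minimal sketch of one way to produce it, assuming the organic_search data frame from the anomaly-detection step and base R's decompose() (the original may well have used stl() or another method instead):

## Build a weekly (frequency = 7) time-series of desktop organic search sessions
organic_search_ts <- ts(organic_search$sessions, frequency = 7)
## Split it into seasonal, trend, and remainder components and plot all three
plot(decompose(organic_search_ts))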

Holt-Winters Forecast for Organic Search
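
As with the decomposition, only the forecast plot made it into the deck. A minimal sketch using base R's HoltWinters(), assuming the organic_search_ts object created just above (the original forecast may have been built with the forecast package instead):

## Fit a Holt-Winters model with weekly seasonality
organic_search_hw <- HoltWinters(organic_search_ts)
## Forecast the next 28 days with prediction intervals
organic_search_fc <- predict(organic_search_hw, n.ahead = 28,
                             prediction.interval = TRUE)
## Plot the fitted series and the forecast together
plot(organic_search_hw, organic_search_fc)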