Intro: Review of Airline Site Performance

Online shoppers everywhere prefer fast sites. How well are the airlines doing at optimizing their booking experiences for page speed? In this file, we take a first look at where airline sites currently stand and at which implementation changes could improve matters the most.

This is quite preliminary; we are working on an extended post for the flexponsive blog. If you have any feedback on what we’re doing, please contact flexponsive!

The dataset

We use the same sample as in our study of airline web visits and passenger volume, which contains 40 airlines - the 10 leading carriers in the world and a number of European LCCs (see the earlier post for more discussion and a list of airlines included).

For each of these airlines, we now attempt to retrieve performance metrics using the Google Page Speed Insights API (PSI). We are able to retrieve PSI reports for all but one airline, which apparently blocks the PSI request. Moreover, one airline page was consistently erroring out, so 38 airlines remain in the sample.

Parsing Google Page Speed Insights with GNU R

This sample code shows how to parse the JSON response of the PSI API using the RCurl and jsonlite packages in R. This is relatively straightforward, with one “gotcha”: the structure of the pageStats object in the PSI response depends on the technologies used on the page, so individual keys (like flashResponseBytes) may be missing.
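To make the gotcha concrete, here is a minimal sketch in base R of one way to pad a pageStats object with zero-valued defaults. The two site objects, their values, and the helper fill_stats are hypothetical and for illustration only; real PSI responses carry more keys.

```r
# Hypothetical pageStats objects for two sites; the second lacks the Flash key.
# The values are illustrative only, not real PSI output.
site_a <- list(numberResources = 80, flashResponseBytes = 1200, imageResponseBytes = 500000)
site_b <- list(numberResources = 45, imageResponseBytes = 300000)

# Every key we expect to see, defaulting to zero.
all_keys <- c(numberResources = 0, imageResponseBytes = 0, flashResponseBytes = 0)

# Add any absent key with its default value, then order by name so that
# every site yields a vector with the same layout.
fill_stats <- function(stats, defaults) {
  stats <- unlist(stats)
  absent <- setdiff(names(defaults), names(stats))
  filled <- c(stats, defaults[absent])
  filled[order(names(filled))]
}

# Both rows now have identical columns, so they stack into a matrix.
rows <- rbind(fill_stats(site_a, all_keys), fill_stats(site_b, all_keys))
print(rows)
```

Because every padded vector has the same names in the same order, the per-site results can be stacked without any further alignment.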

Commented source code here:

library(RCurl);
library(jsonlite);

# Returns the PSI JSON for a domain: downloads it if required (SLOW),
# otherwise reads it from the local cache.
fetchPSI <- function(domain) {
  psiEndpoint <- paste("https://www.googleapis.com/pagespeedonline/v1/runPagespeed?url=http://", domain, sep ="");
  localFile <- paste("out/SM_psi/", domain, ".json", sep="");
  if(!file.exists(localFile)) {
    Sys.sleep(0.5);
    print(paste("Downloading PSI for:", localFile, sep=" "));
    psi<-getURL(psiEndpoint);
    cat(psi, file = localFile, append = FALSE)
  }
  psi <- readLines(localFile);
  return(psi);
}

# The pageStats object varies by site, so we define a zero default for every known key
defaultPageStats = c(
  "numberResources" = 0,
  "numberHosts" = 0,
  "textResponseBytes" = 0,
  "totalRequestBytes" = 0,
  "numberStaticResources" = 0,
  "htmlResponseBytes" = 0,
  "cssResponseBytes"= 0,
  "imageResponseBytes" = 0,
  "javascriptResponseBytes" = 0,
  "flashResponseBytes" = 0,
  "otherResponseBytes" = 0,
  "numberJsResources" = 0,
  "numberCssResources" = 0);

parse_domain_psi <- function(domain) {
  psi <- fromJSON(fetchPSI(domain));
  if (is.null(psi$score)) {
    return(NULL);
  }
  
  # we want to get impacts for each rule and page stats
  impacts<- lapply(psi$formattedResults$ruleResults,
    function(result) {
      return(as.numeric(result$ruleImpact));
  });
  
  # and stats on the individual page items (how many, how heavy - for each type)
  # --------------------------------------------------------------------------------------------
  # BEWARE - the JSON object returned by PSI has a different structure depending on the
  # site, e.g. may be missing individual keys (like flashResponseBytes) if a certain technology
  # is not used. 
  # --------------------------------------------------------------------------------------------
  # To keep everything nicely rectangular, we merge the response for the individual
  # site with a list containing all possible keys with value zero for each
  pageStats <- sort(merge.list(unlist(psi$pageStats),defaultPageStats));
  res<- append(
      unlist(impacts), 
      unlist(pageStats[order(names(pageStats))])
  );

  res[["URL"]] <- domain;
  res[["score"]] <- psi$score;
  return(res);
}
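The sections below work on a data frame called dds with one row per airline. How it is assembled is not shown here, so the following is only a sketch of one plausible way, assuming each call returns a named character vector, or NULL when no score is available. parse_stub, its values, and the example domains are made-up stand-ins for parse_domain_psi and the real airline list. Columns such as domain and language.selector, which are used later, would have to come from elsewhere.

```r
# Stand-in for parse_domain_psi(): returns a named character vector per domain,
# or NULL when no PSI score is available. All values are made up.
parse_stub <- function(domain) {
  if (domain == "blocked.example") return(NULL)
  c(OptimizeImages = "3.5", imageResponseBytes = "400000", URL = domain, score = "70")
}

domains <- c("a.example", "blocked.example", "b.example")
rows <- lapply(domains, parse_stub)
rows <- Filter(Negate(is.null), rows)  # drop airlines without a usable report

# One row per airline; every column is character at this point.
dds <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Convert every column except the URL back to numeric.
numeric_cols <- setdiff(names(dds), "URL")
dds[numeric_cols] <- lapply(dds[numeric_cols], as.numeric)
str(dds)
```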

Airline Booking Pages: Page Size

Let’s take a look at airline page size, excluding airlines that have a language selector page.

# also compute the aggregate page size
pageSizeItems <-  c("htmlResponseBytes", "cssResponseBytes", "imageResponseBytes", "javascriptResponseBytes", "flashResponseBytes", "otherResponseBytes")
dds$totalResponseBytes <- rowSums(dds[,pageSizeItems]);

# airlines without language selector
dds_nols <- dds[which(dds$language.selector == 0),]

q<-as.data.frame(sapply(dds_nols[,pageSizeItems],mean))
colnames(q) <- c("meanBytes");
q$meanMB <- q$meanBytes / (1024^2);
q$type <- rownames(q);
# drop technologies never used: flash
# (a logical filter is safe even when no row is zero; -which() would drop every row)
q <- q[q$meanMB > 0,]

q<- q[order(-q$meanBytes),];

q$type <- rownames(q)
q$type <- reorder(q$type, - q$meanBytes)

# add values inside the pie
q$pos <- with(q, ave(meanMB, FUN = function(x) cumsum(x) - 0.5*x))
q$label <- paste(round(q$meanMB,2), "MB");
q$axisLabels <- round(cumsum(q$meanMB),2)
q$axisLabels[4] <- ''
# hide the label on slices smaller than 0.30 MB
q$label[q$meanMB < 0.30] <- ''

library(ggplot2);

ggplot(q) +
  geom_bar(aes(x="", y = meanMB, fill = type), stat = "identity") +
  scale_fill_discrete(
    name="",
    labels = paste(
      gsub("ResponseBytes", "", q$type),
      ' (', round(q$meanBytes/sum(q$meanBytes)*100), '%)  ', sep="")) +
  scale_y_continuous(
    breaks = cumsum(q$meanMB),
    labels = q$axisLabels
  ) +
  geom_text(aes(x="",y = pos, label = label)) +
  coord_polar("y", start = 0) + ylab("") + xlab("") + 
  ggtitle("Download Size of Airline Booking Sites") +
  theme(panel.background = element_rect(fill = "white"), 
    text = element_text(size=20, colour="black"), legend.position ="bottom")

Find the resulting figure here: Airline page size (pie chart)

On average, an airline page weighs 3.13 MB, and JavaScript and images account for 75% of that.
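As a rough illustration of how such figures are derived, the sketch below computes the mean size per resource type, each type's share of the total, and the mean total in MB. The toy data frame and its numbers are hypothetical and are not the measured airline values.

```r
# Toy data: two hypothetical sites, sizes in bytes (not measured values).
toy <- data.frame(
  htmlResponseBytes       = c(100000, 300000),
  javascriptResponseBytes = c(900000, 1100000),
  imageResponseBytes      = c(1500000, 500000)
)

mean_bytes <- sapply(toy, mean)                           # mean size per resource type
share_pct  <- round(100 * mean_bytes / sum(mean_bytes))   # share of the mean total, in percent
total_mb   <- sum(mean_bytes) / 1024^2                    # mean total page size in MB

print(share_pct)
print(round(total_mb, 2))
```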

Airline Site Performance Issues: An Empirical Look

rules<-c("AvoidLandingPageRedirects", "EnableGzipCompression", 
"LeverageBrowserCaching", "MainResourceServerResponseTime", "MinifyCss", 
"MinifyHTML", "MinifyJavaScript", "MinimizeRenderBlockingResources", 
"OptimizeImages", "PrioritizeVisibleContent");

v<-sapply(rules, FUN = function(x) {
  return(median(dds[,x], na.rm=T));
});
print(sort(v, decreasing=T));
## MinimizeRenderBlockingResources          LeverageBrowserCaching 
##                        12.00000                        11.73752 
##                  OptimizeImages                MinifyJavaScript 
##                         3.84940                         1.14445 
##           EnableGzipCompression                      MinifyHTML 
##                         0.99075                         0.20925 
##                       MinifyCss       AvoidLandingPageRedirects 
##                         0.18150                         0.00000 
##  MainResourceServerResponseTime        PrioritizeVisibleContent 
##                         0.00000                         0.00000

library(reshape);
mdata<-melt(dds[,append(rules,"domain")], id="domain")
mdata$v2 <- factor(mdata$variable, levels= names(sort(v, decreasing=F)));

ggplot(mdata, aes(x = v2 , y=value)) + 
  geom_boxplot(outlier.size=1, outlier.color="red", outlier.shape=16, notch=F) +
  theme(panel.background = element_rect(fill = "white"), 
    text = element_text(size=16, colour="black"),
    axis.text.y = element_text(colour = "black")) + 
  coord_flip() +
  ylab("Rule Impact") + xlab("Page Speed Rule") + ylim(0,25)
## Warning: Removed 17 rows containing non-finite values (stat_boxplot).

ggsave('/tmp/airline-page-size.png');
## Warning: Removed 17 rows containing non-finite values (stat_boxplot).
