This short note accompanies the “The voice of users - analysing mobile app reviews” post on UXBooth and shows how to download user app reviews from the iTunes Store and Google Play.

How to Download Apple App Store Reviews with GNU R

Apple provides an RSS feed to retrieve user reviews from the iOS app store. This feed makes it quite easy to download reviews for further analysis. Here’s some code to get you started.

What you need to know before you start:
- You can only go back 10 pages for a given filter, so you may want to save old reviews locally.
- You need to retrieve reviews separately for each App Store country.
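Because the feed only reaches back 10 pages, one option is to keep a local archive and add to it on every run. The following is a minimal sketch, not part of the original code: the helper name update_archive and the choice of review_id plus country as the key are assumptions, and it expects a data frame shaped like the df.reviews table built later in this note.

```r
# Hypothetical helper: append freshly fetched reviews to a local CSV archive.
# Assumes columns named review_id and country identify a review uniquely.
update_archive <- function(new_reviews, path) {
  # Store everything as text so old and new rows combine without type clashes
  new_reviews[] <- lapply(new_reviews, as.character)
  if (file.exists(path)) {
    old_reviews <- read.csv(path, colClasses = "character",
                            stringsAsFactors = FALSE)
    combined <- rbind(old_reviews, new_reviews)
  } else {
    combined <- new_reviews
  }
  # Keep the first copy of each (review_id, country) pair
  key <- paste(combined$review_id, combined$country, sep = "|")
  combined <- combined[!duplicated(key), , drop = FALSE]
  write.csv(combined, path, row.names = FALSE)
  invisible(combined)
}
```

Calling update_archive(df.reviews, "ios_reviews.csv") after each download would then grow the archive without storing the same review twice (the file name is only an example).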

To find the reviews, just insert the right values into the following URL format: https://itunes.apple.com/CODE/rss/customerreviews/page=1/id=APPID/sortby=mostrecent/FILETYPE

CODE is an App Store territory code. The complete list is available here.

APPID is a number unique to each app (9 digits in the examples below).

FILETYPE can be xml or json.
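The three placeholders can be filled in programmatically. This is a small sketch under stated assumptions: the helper name build_review_url and its argument names are made up for illustration, and the page check reflects the 10-page limit mentioned above.

```r
# Hypothetical helper: fill the URL template for one page of reviews.
build_review_url <- function(country, app_id, page = 1, file_type = "json") {
  stopifnot(file_type %in% c("xml", "json"), page >= 1, page <= 10)
  # "%.0f" prints large numeric ids in full; plain coercion could give "3e+08"
  app_id <- sprintf("%.0f", as.numeric(app_id))
  sprintf(
    "https://itunes.apple.com/%s/rss/customerreviews/page=%d/id=%s/sortby=mostrecent/%s",
    country, as.integer(page), app_id, file_type
  )
}

build_review_url("US", 324684580)
```

Formatting the id explicitly is a defensive choice: R prints some round numbers in scientific notation when they are pasted into a string, which would produce a broken URL.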

Find the appIds you are interested in

Before you can start downloading the reviews, you will need to find out the appId for the apps you are interested in. The appId is the number that follows “id” in the app’s App Store URL.
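If you have an app’s App Store page URL at hand, the identifier can be pulled out of it with a regular expression. A minimal sketch, assuming the URL contains the id as /id followed by digits; the helper name extract_app_id and the sample URL are illustrative only.

```r
# Hypothetical helper: pull the numeric id out of an App Store URL.
# Assumes the id appears in the path as "/id<digits>".
extract_app_id <- function(store_url) {
  hit <- regmatches(store_url, regexpr("/id[0-9]+", store_url))
  if (length(hit) == 0) {
    return(NA_character_)  # no id found in this URL
  }
  sub("^/id", "", hit)
}

extract_app_id("https://itunes.apple.com/us/app/spotify-music/id324684580?mt=8")
# returns "324684580"
```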

require(RJSONIO)
## Loading required package: RJSONIO
df.apps <- data.frame(
  appName = c("spotify","facebook"), # app names
  appId = c(324684580,284882215), # ids of the apps we want to fetch
  stringsAsFactors = FALSE
);
countries <- c("US","GB"); # GB is the territory code for the United Kingdom; you can add more app store countries here if you like

# c("AE","AG","AI","AL","AM","AO","AR","AT","AU","AZ","BB","BE","BF","BG","BH","BJ","BM","BN","BO","BR","BS","BT","BW","BY","BZ","CA","CG","CH","CL","CN","CO","CR","CV","CY","CZ","DE","DK","DM","DO","DZ","EC","EE","EG","ES","FI","FJ","FM","FR","GB","GD","GH","GM","GR","GT","GW","GY","HK","HN","HR","HU","ID","IE","IL","IN","IS","IT","JM","JO","JP","KE","KG","KH","KN","KR","KW","KY","KZ","LA","LB","LC","LK","LR","LT","LU","LV","MD","MG","MK","ML","MN","MO","MR","MS","MT","MU","MW","MX","MY","MZ","NA","NE","NG","NI","NL","NO","NP","NZ","OM","PA","PE","PG","PH","PK","PL","PT","PW","PY","QA","RO","RU","SA","SB","SC","SE","SG","SI","SK","SL","SN","SR","ST","SV","SZ","TC","TD","TH","TJ","TM","TN","TR","TT","TW","TZ","UA","UG","US","UY","UZ","VC","VE","VG","VN","YE","ZA","ZW") #A complete list of iTunes territories as of January 2016

For our example, this data frame then looks like so:

## Loading required package: knitr
appName   appId
spotify   324684580
facebook  284882215

Fetch All Reviews for an iOS App

df.reviews <- data.frame(
  review_id = character(0),
  appName = character(0),
  appId = character(0),
  appVersion = character(0),
  country = character(0),
  author = character(0),
  rating = numeric(0),
  title = character(0),
  content = character(0),
  stringsAsFactors=FALSE
);

fetch_all_reviews <- function(appName, appId, country) {
  for (page in 1:10) { # the feed only goes back 10 pages
    url <- paste0('https://itunes.apple.com/', country, '/rss/customerreviews/page=', page,
                  '/id=', appId, '/sortby=mostrecent/json');
    response <- fromJSON(url);
    entries <- response$feed$entry;
    if (length(entries) == 0) {
      # no entries on this page: we have run out of reviews
      break;
    }
    for (i in seq_along(entries)) {
      item <- entries[[i]];
      if (is.null(item$author)) {
        # doesn't have author? then this entry is not a review.
        next;
      }
      review <- c(
        item$id,
        appName,
        appId,
        item$`im:version`,
        country,
        item$author$name,
        item$`im:rating`,
        item$title,
        item$content$label
      );
      # appends to the global df.reviews; review is a character vector,
      # so convert rating with as.numeric() before doing arithmetic on it
      df.reviews[nrow(df.reviews)+1,] <<- review;
    }
  }
}

To check what the results look like, let’s retrieve all reviews for the first app in our dataset (which happens to be Spotify) in the first country we listed (US):

fetch_all_reviews(df.apps[1,]$appName, df.apps[1,]$appId, countries[1]);
kable(head(df.reviews,n=5));
| review_id | appName | appId | appVersion | country | author | rating | title | content |
|---|---|---|---|---|---|---|---|---|
| 1333215925 | spotify | 324684580 | 4.9.0 | US | Phatt Babii | 5 | I love it! | I love that i can save and play songs offline |
| 1333215439 | spotify | 324684580 | 4.9.0 | US | Gianna Sayers | 5 | Spotify | Spotify is a great app! It allows you to listen to music without having to purchase it. One thing I would suggest is updating the app so when you start your music you could have a timer that turns your music off after a certain time limit. I know I like to listen to music as a fall asleep at night and some nights I wake up with my music still playing. |
| 1333212466 | spotify | 324684580 | 4.9.0 | US | thatonebrandon | 5 | cool | cool |
| 1333210536 | spotify | 324684580 | 4.9.0 | US | Tp18484 | 5 | Good app | Pretty good |
| 1333205361 | spotify | 324684580 | 4.9.0 | US | Bryan Escalona | 5 | Great App | Awesome Music Listening App |

(Only the first few results are shown here to save space.)
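fetch_all_reviews() grows the global df.reviews one row at a time. An alternative is to turn each entry into a one-row data frame and bind all rows once at the end. The sketch below is not the original code: entry_to_row and entries_to_df are hypothetical names, and the mock entries are invented to mimic the nested fields the function above reads, so the real feed may need adjustments.

```r
# Hypothetical helper: turn one feed entry into a one-row data frame.
entry_to_row <- function(item, appName, appId, country) {
  if (is.null(item$author)) {
    return(NULL)  # no author: this entry is not a review
  }
  # Flatten a possibly nested field to a single string, NA if it is missing
  field <- function(x) {
    v <- unlist(x, use.names = FALSE)
    if (length(v) == 0) NA_character_ else as.character(v[1])
  }
  data.frame(
    review_id  = field(item$id),
    appName    = appName,
    appId      = as.character(appId),
    appVersion = field(item$`im:version`),
    country    = country,
    author     = field(item$author$name),
    rating     = as.numeric(field(item$`im:rating`)),
    title      = field(item$title),
    content    = field(item$content$label),
    stringsAsFactors = FALSE
  )
}

# Build all rows first, then bind them once
entries_to_df <- function(entries, appName, appId, country) {
  rows <- lapply(entries, entry_to_row,
                 appName = appName, appId = appId, country = country)
  do.call(rbind, Filter(Negate(is.null), rows))
}

# Mock entries, invented for illustration (not real feed output)
mock_entries <- list(
  list(title = list(label = "App metadata")),
  list(id = list(label = "1"), author = list(name = list(label = "Alice")),
       `im:version` = list(label = "4.9.0"), `im:rating` = list(label = "5"),
       title = list(label = "Great"), content = list(label = "Love it")),
  list(id = list(label = "2"), author = list(name = list(label = "Bob")),
       `im:version` = list(label = "4.9.0"), `im:rating` = list(label = "3"),
       title = list(label = "OK"), content = list(label = "Works fine"))
)
reviews <- entries_to_df(mock_entries, "spotify", 324684580, "US")
```

Returning a data frame instead of assigning with <<- keeps the function free of side effects and stores rating as a number.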

Putting it All Together: Bulk App Store Review Download with R

Now it’s time to prepare a coffee, before doing the final step: loop over all the countries you are interested in, for all the apps you need to find out more about. Then enjoy:

bulk_review_download <- function() {
  for (country in countries) {
    by(df.apps, 1:nrow(df.apps), function(app) {
      print(paste("Downloading", app$appName, "for", country, sep=" "));
      fetch_all_reviews(app$appName, app$appId, country);
    });
  }
}
# bulk_review_download()

Google Play:

The best way to access Android user reviews is through an unofficial API. The reviews can be accessed directly through the website, or downloaded in bulk with web parsing software (or with R), following the instructions and using the API key provided at registration. The API costs just 10 euros per month but is not free of bugs, so the downloaded reviews need careful checking (e.g. to exclude duplicate entries). Despite those inconveniences it is still a great tool. Sample code to fetch reviews from Google Play using the API can be found below.

In the first step a connection with the Mashape API is established. To deal with the API’s minor bugs, a number of fixes are included in the code.

library(RCurl);
library(jsonlite);

urlPattern <- 'https://gplaystore.p.mashape.com/applicationReviews?id=%s&lang=en&page=%d';
progressMessage <- '%s : page %d downloaded; %d reviews. First author: %s';
headers <- c(
  'X-Mashape-Key' = 'YOUR MASHAPE KEY',
  'Accept' = 'application/json'
)

helper_tryFetchJSON <- function(pageUrl) {
  out <- tryCatch({
      # WARNING: ssl.verifyhost = 0L and ssl.verifypeer = 0L turn off SSL certificate
      # verification. This may not be a good idea for security reasons, but it works around
      # the issues the RCurl package has with SSL on Windows. On Unix-based systems the
      # code should run without those two arguments.
      got <- getURL(pageUrl, httpheader = headers, ssl.verifyhost = 0L, ssl.verifypeer = 0L);
      pageJSON <- fromJSON(got);
      return(pageJSON);
  },
  error=function(cond) {
    message(paste("error downloading ", pageUrl));
    message(cond);
    return(NA);
  });
  return(out);
}

fetchAllReviews <- function(appId) {
  page<-1;
  results <- NULL;
  repeat 
  {
    pageUrl <- sprintf(urlPattern, appId, page);
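    attempt <- 0;
    repeat {
      pageJSON <- helper_tryFetchJSON(pageUrl);
      # the helper returns a bare NA only on failure; is.na() on a successful
      # result would return one value per cell, which if() cannot handle
      if (!identical(pageJSON, NA)) {
        break; # success!
      } else {
        attempt <- attempt+1;
        if (attempt <= 20) {
          Sys.sleep(2); # wait a little before retrying
        }
        else {
          stop("The internet is not working. Turn off/on?")
        }
      }
    }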
    if (!is.null(pageJSON$error)) {
      # this page got an error, most likely it means no more reviews
      break;
    }

    # R does not like nested lists in dataframes
    pageContent <- flatten(pageJSON)
    pageContent$isoDate <- as.Date(pageJSON$date, format = "%B %d, %Y")

    print(sprintf(progressMessage, appId, page, nrow(pageContent), pageContent[1,"author.name"]));
    # append new page to existing results
    results <- rbind(results, pageContent, make.row.names = F);
    page <- page+1;

    if (nrow(pageJSON) < 40) {
      # API bug: when we fetch a page after the end of results, don't get an error right away -
      # just repeats the last page of results
      break;
    }
  }
  print('-----------------------------------------------');
  return(results);
}
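The retry loop inside fetchAllReviews() can also be written as a small reusable function. This is a sketch of the same idea rather than a drop-in replacement: with_retries and its parameters are hypothetical names, and it gives up after max_tries attempts instead of reproducing the exact attempt count used above.

```r
# Hypothetical helper: call fetch_fn() until it returns something other than
# a bare NA, waiting between attempts; give up after max_tries failures.
with_retries <- function(fetch_fn, max_tries = 20, wait_seconds = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- fetch_fn()
    if (!identical(result, NA)) {
      return(result)  # success
    }
    if (attempt < max_tries) {
      Sys.sleep(wait_seconds)  # pause before the next attempt
    }
  }
  stop("giving up after ", max_tries, " failed attempts")
}
```

Inside fetchAllReviews() it could be used as pageJSON <- with_retries(function() helper_tryFetchJSON(pageUrl)).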

Once the connection is established we can select the apps we are interested in and download the data.

appIds <- c("com.spotify.music", "com.facebook.katana"); #Two popular apps used as an example

z <- lapply(appIds, function(appId) {
  reviews <- fetchAllReviews(appId);
  reviews$appId <- appId;
  return(reviews);
});


outDs <- do.call("rbind", z);
write.csv(outDs, './androidFetch.csv', row.names = F);
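Since the API can repeat the last page of results, it is worth dropping duplicate rows before analysing the data. A minimal sketch: drop_repeated_reviews is a hypothetical helper, and the column names in the mock data are assumptions about what the API returns (only author.name and date appear in the code above).

```r
# Hypothetical helper: drop rows that repeat an earlier row on the key columns.
drop_repeated_reviews <- function(reviews, key_cols = names(reviews)) {
  stopifnot(all(key_cols %in% names(reviews)))
  repeated <- duplicated(reviews[, key_cols, drop = FALSE])
  reviews[!repeated, , drop = FALSE]
}

# Mock data, invented for illustration
mock_reviews <- data.frame(
  author.name = c("Alice", "Bob", "Bob"),
  date = c("January 5, 2016", "January 6, 2016", "January 6, 2016"),
  body = c("Nice", "Crashes a lot", "Crashes a lot"),
  stringsAsFactors = FALSE
)
clean_reviews <- drop_repeated_reviews(mock_reviews)
```

Applied to the real data, outDs <- drop_repeated_reviews(outDs) would go just before the write.csv() call. By default every column is compared; pass key_cols to compare only a subset.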