This is a vignette about taking control of the data Google collects on us so that we can use it when and how we want. As a first step, download your Location History from your Google account; it comes as a JSON file.
The aim is to import that file and convert it into a more useful format, ready for whatever purpose you have in mind. In this vignette the output is a map, but the real benefit is having the data in an easily available dataframe in R. That lets you include it in a broader data analysis exercise, such as looking for associations with other variables or mapping associations with other people.
Along the way I want to mention some of the difficulties I had and mistakes I made, for all those who are learning too!
First of all there are a number of packages required to load, manipulate and map the data:
library(jsonlite)
library(dplyr)
library(leaflet)
library(leaflet.extras)
The next step is to load the JSON file into R, which can be more difficult than it sounds. I started with the rjson package, but that did not let me import into an R dataframe easily. From reading I saw that the jsonlite package might be better, so that is what is loaded above, and importing was very straightforward. The code below first imports the data and then extracts the variables, which all sit at the second level under the ‘locations’ element of the file.
##Read the export; the 'locations' element of the result holds one row per reading
LocData = fromJSON(txt = "Loc History med.json")
LocF = LocData$locations
head(LocF, 10)
## timestampMs latitudeE7 longitudeE7 accuracy altitude verticalAccuracy
## 1 1525490082191 -339035154 1511418917 16 62 2
## 2 1525489703185 -339034859 1511419409 17 62 2
## 3 1525489325482 -339035326 1511418678 16 62 2
## 4 1525488965330 -339035326 1511418678 16 62 2
## 5 1525488604168 -339049977 1511437787 500 NA NA
## 6 1525488243774 -339049977 1511437787 500 NA NA
## 7 1525487871132 -339035386 1511418635 16 61 2
## 8 1525487500936 -339035386 1511418635 16 61 2
## 9 1525487483075 -339049977 1511437787 500 NA NA
## 10 1525487104204 -339049977 1511437787 500 NA NA
## activity velocity heading
## 1 NULL NA NA
## 2 NULL NA NA
## 3 NULL NA NA
## 4 NULL NA NA
## 5 NULL NA NA
## 6 NULL NA NA
## 7 NULL NA NA
## 8 NULL NA NA
## 9 NULL NA NA
## 10 NULL NA NA
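To see why the locations element comes back as a dataframe, here is a minimal sketch that parses a two-record JSON string shaped like the export. The string toy_json is made up for illustration: the values are copied from the first two rows above and only four of the fields are included. It assumes that timestampMs is stored as a quoted string in the file, which would explain why it needs as.numeric() later on.

```r
library(jsonlite)

# Hypothetical two-record extract, shaped like the 'locations' array in the export
toy_json <- '{
  "locations": [
    {"timestampMs": "1525490082191", "latitudeE7": -339035154,
     "longitudeE7": 1511418917, "accuracy": 16},
    {"timestampMs": "1525489703185", "latitudeE7": -339034859,
     "longitudeE7": 1511419409, "accuracy": 17}
  ]
}'

toy <- fromJSON(toy_json)

# fromJSON() simplifies an array of same-shaped records into a dataframe
toy_loc <- toy$locations
class(toy_loc)
str(toy_loc)
```

Each key in the records becomes a column, so the second-level variables can be reached directly as toy_loc$latitudeE7 and so on.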
At this stage the timestamp in particular doesn’t look very human friendly, and for that matter the latitude and longitude are not as we would normally see them. The timestamp is in Unix epoch time, here the number of milliseconds since midnight UTC at the start of 1 January 1970 (one of the more intriguing pieces of trivia I have learnt in data science so far!), which is why the code below divides it by 1000 to get seconds. The latitude and longitude are stored as degrees multiplied by 10,000,000, hence the E7 suffix.
##Convert the position and time stamps into a more readable form
LocGood <- LocF %>%
  mutate(lat = latitudeE7 / 10000000, lon = longitudeE7 / 10000000) %>%
  mutate(timestampMs = as.numeric(timestampMs)) %>%
  mutate(Date = as.POSIXct(timestampMs/1000, origin="1970-01-01"))
##Extract just the location and time vectors as that is my main interest here, and add the accuracy reading
##for location in case I need to check for dodgy data
LocGood1 = LocGood[, c("lat", "lon", "Date", "accuracy")]
head(LocGood1,10)
## lat lon Date accuracy
## 1 -33.90352 151.1419 2018-05-05 13:14:42 16
## 2 -33.90349 151.1419 2018-05-05 13:08:23 17
## 3 -33.90353 151.1419 2018-05-05 13:02:05 16
## 4 -33.90353 151.1419 2018-05-05 12:56:05 16
## 5 -33.90500 151.1438 2018-05-05 12:50:04 500
## 6 -33.90500 151.1438 2018-05-05 12:44:03 500
## 7 -33.90354 151.1419 2018-05-05 12:37:51 16
## 8 -33.90354 151.1419 2018-05-05 12:31:40 16
## 9 -33.90500 151.1438 2018-05-05 12:31:23 500
## 10 -33.90500 151.1438 2018-05-05 12:25:04 500
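As a check on the conversion, the first record can be worked through by hand. This sketch uses only base R. The explicit tz arguments are my addition and not part of the code above, which relies on the computer's local time zone; Australia/Sydney is assumed because it reproduces the times in the printed output.

```r
# Values copied from the first row of the raw data
ts_ms  <- 1525490082191
lat_e7 <- -339035154
lon_e7 <- 1511418917

# Milliseconds -> seconds, counted from 1970-01-01 00:00:00 UTC
ts_utc <- as.POSIXct(ts_ms / 1000, origin = "1970-01-01", tz = "UTC")

# E7 coordinates are degrees multiplied by 10^7
lat <- lat_e7 / 1e7
lon <- lon_e7 / 1e7

format(ts_utc, "%Y-%m-%d %H:%M:%S")                           # "2018-05-05 03:14:42"
format(ts_utc, "%Y-%m-%d %H:%M:%S", tz = "Australia/Sydney")  # "2018-05-05 13:14:42"
c(lat = lat, lon = lon)
```

The ten-hour gap between the two formatted times is the Sydney offset from UTC in May, so the same file would print different clock times on a computer set to another time zone.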
Now the time and place are in a format that can be easily referenced and used for any analysis. In this case there are 66,441 observations covering the six months ending in May 2018 (when I switched off the location tracker on my phone :)).
There appear to be a number of ways to visualise location data, but the leaflet package seems to be both powerful and popular. The code below takes a base map from the provider CartoDB, via leaflet's addProviderTiles function, and then builds a heat map of my movements during these six months over the top, using addHeatmap from leaflet.extras. It also adds clustered markers for the individual points. leaflet offers many base map providers, so you can choose the look and feel of the picture.
myMap = leaflet(LocGood1) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  fitBounds(~min(lon), ~min(lat), ~max(lon), ~max(lat)) %>%
  addHeatmap(lng = ~lon, lat = ~lat, group = "HeatMap", blur = 20, max = 0.01, radius = 15) %>%
  addMarkers(data = LocGood1, ~lon, ~lat, clusterOptions = markerClusterOptions(), group = "Points")
myMap
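The group arguments in the map code name the two layers, but the map has no control for switching between them. One possible addition, not part of the original code, is leaflet's addLayersControl(). The sketch below uses a small made-up dataframe, toy_pts, so that it runs on its own.

```r
library(leaflet)
library(leaflet.extras)

# Hypothetical stand-in for LocGood1: three points taken from the output above
toy_pts <- data.frame(
  lat = c(-33.90352, -33.90349, -33.90500),
  lon = c(151.1419, 151.1419, 151.1438)
)

toyMap <- leaflet(toy_pts) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addHeatmap(lng = ~lon, lat = ~lat, group = "HeatMap",
             blur = 20, max = 0.01, radius = 15) %>%
  addMarkers(lng = ~lon, lat = ~lat,
             clusterOptions = markerClusterOptions(), group = "Points") %>%
  # Checkboxes to toggle each overlay group on and off
  addLayersControl(overlayGroups = c("HeatMap", "Points"),
                   options = layersControlOptions(collapsed = FALSE))

# Type toyMap at the console to display the map
```

With the control in place the heat map and the clustered markers can be viewed separately, which makes stray points easier to spot.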
Remember that I have not examined the raw data so far, so creating a map at a fairly early stage is actually a good way to check for outliers. The first map shows five location readings in Japan during this time, although I did not go there. Before finalising my movement map those outliers need to be removed. This is easily done by dropping the rows of the dataframe where the latitude is positive, i.e. in the northern hemisphere, and then re-mapping.
##Keep only the southern hemisphere readings, then redraw the map
LocGood1 = LocGood1[LocGood1$lat <= 0,]
myMap = leaflet(LocGood1) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  fitBounds(~min(lon), ~min(lat), ~max(lon), ~max(lat)) %>%
  addHeatmap(lng = ~lon, lat = ~lat, group = "HeatMap", blur = 20, max = 0.01, radius = 15) %>%
  addMarkers(data = LocGood1, ~lon, ~lat, clusterOptions = markerClusterOptions(), group = "Points")
myMap
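Filtering on the sign of the latitude works here only because every genuine reading is in the southern hemisphere. A more general sketch, offered as an alternative rather than what was done above, keeps only the rows inside a bounding box. The box limits and the toy_pts rows are made up for illustration; the last row is a rough Tokyo coordinate standing in for the stray readings.

```r
# Hypothetical sample: two Sydney readings plus one stray northern-hemisphere point
toy_pts <- data.frame(
  lat = c(-33.90352, -33.90500, 35.68),
  lon = c(151.1419, 151.1438, 139.69),
  accuracy = c(16, 500, 1200)
)

# Assumed bounding box, roughly covering NSW; adjust to your own travels
box <- list(lat_min = -38, lat_max = -28, lon_min = 140, lon_max = 154)

inside <- with(toy_pts,
               lat >= box$lat_min & lat <= box$lat_max &
               lon >= box$lon_min & lon <= box$lon_max)

# which() drops rows where the test is NA instead of returning NA rows
toy_clean <- toy_pts[which(inside), ]
toy_clean
```

A box also catches stray readings that fall in the right hemisphere but the wrong place, which the latitude test alone would miss.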
The re-drawn map indeed shows the world that in these six months I was a relative home-body, not leaving the state of NSW or venturing too far from the coast. But at least I now have granular and easily usable data to examine my movements in that time more closely in relation to other topics.
Smythe, J. Map your Google Location Data with R Shiny. https://www.cultureofinsight.com/blog/2018/01/31/2018-01-31-map-your-google-location-data-with-r-shiny/
RDocumentation (2019). addProviderTiles. https://www.rdocumentation.org/packages/leaflet/versions/2.0.2/topics/addProviderTiles
CRAN (2019). Getting started with JSON and jsonlite. https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
Shirin’s playgRound (December 2016). How to map your Google location history with R. https://www.r-bloggers.com/how-to-map-your-google-location-history-with-r/