---
title: "Google Travel History: ANLY 512 Data Visualization Final Project"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    source: embed
---

```{r setup, include=FALSE}
library(markdown)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
library(jsonlite)
library(raster)
x <- fromJSON("~/Downloads/Takeout 2/Location History/Location.json")
str(x)

loc = x$locations
loc$time = as.POSIXct(as.numeric(x$locations$timestampMs)/1000, origin = "1970-01-01")
loc$lat = loc$latitudeE7 / 1e7
loc$lon = loc$longitudeE7 / 1e7

head(loc)
nrow(loc)
# Install packages once from the console rather than on every knit, e.g.
# devtools::install_github("rstudio/rmarkdown")
min(loc$time)
max(loc$time)

# as.Date() on a POSIXct value takes a time zone, not a format string; the default is UTC
loc$date <- as.Date(loc$time)
loc$year <- year(loc$date)
loc$month_year <- as.yearmon(loc$date)

points_p_day <- data.frame(table(loc$date), group = "day")
points_p_month <- data.frame(table(loc$month_year), group = "month")
points_p_year <- data.frame(table(loc$year), group = "year")

nrow(points_p_day)

nrow(points_p_month)

nrow(points_p_year)

library(ggplot2)
library(ggmap)
```

-----------------------------------------------------------------------
### How many data points did Google collect about me?

```{r}
my_theme <- function(base_size = 12, base_family = "sans"){
  theme_grey(base_size = base_size, base_family = base_family) +
    theme(
      axis.text = element_text(size = 9),
      axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
      axis.title = element_text(size = 14),
      panel.grid.major = element_line(color = "grey"),
      panel.grid.minor = element_blank(),
      panel.background = element_rect(fill = "aliceblue"),
      strip.background = element_rect(fill = "lightgrey", color = "grey", size = 1),
      strip.text = element_text(face = "bold", size = 12, color = "navy"),
      legend.position = "right",
      legend.background = element_blank(),
      panel.spacing = unit(.5, "lines"),
      panel.border = element_rect(color = "grey", fill = NA, size = 0.5)
    )
}
points <- rbind(points_p_day[, -1], points_p_month[, -1], points_p_year[, -1])

ggplot(points, aes(x = group, y = Freq)) + 
  geom_point(position = position_jitter(width = 0.2), alpha = 0.3) + 
  geom_boxplot(aes(color = group), size = 1, outlier.colour = NA) + 
  facet_grid(group ~ ., scales = "free") + my_theme() +
  theme(
    legend.position = "none",
    strip.placement = "outside",
    strip.background = element_blank(),
    strip.text = element_blank(),
    axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0.5)
  ) +
  labs(
    x = "",
    y = "Number of data points",
    subtitle = "Number of data points per day, month and year"
  )

```

***

The inspiration for this project was the "Quantified Self" article shared in class. 
The Quantified Self (QS) is a movement that aims to leverage the synergy of wearables, analytics, and "Big Data". It exploits the ease and convenience of data acquisition through the Internet of Things (IoT) to feed a growing obsession with personal informatics and quotidian data. 

After looking into all my personal data, the only data I found on myself was my Google Maps location history. Using Google Takeout, I extracted my travel data for the years 2010 to 2015. Around that time I had also bought my first Android smartphone with Google Maps on it. This data fascinates me because I had a traveling job in India. The first question: how much data did Google Maps collect on me on a daily, monthly, and yearly basis?

- On a daily basis Google collected between 0 and 1500 data points (median ~250)

- On a monthly basis between 0 and 40,000 (median ~1000)

- On a yearly basis between 10,000 and 250,000 (median ~40,000)
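
The ranges and medians above were read off the boxplots; they can also be computed directly. A minimal sketch, assuming a data frame shaped like `points` (columns `Freq` and `group`). The `toy_points` values are made up for illustration and are not my data:

```r
# Toy stand-in for the `points` data frame (hypothetical values)
toy_points <- data.frame(
  Freq  = c(0, 250, 1500, 0, 1000, 40000, 10000, 40000, 250000),
  group = rep(c("day", "month", "year"), each = 3)
)

# Minimum, median and maximum number of data points per group
count_summary <- aggregate(
  Freq ~ group,
  data = toy_points,
  FUN  = function(v) c(min = min(v), median = median(v), max = max(v))
)
count_summary
```

Swapping `toy_points` for the real `points` data frame should give the exact figures instead of values estimated from the plot.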
    
-----------------------------------------------------------------------
### How accurate is the location data?

```{r}
accuracy <- data.frame(accuracy = loc$accuracy, group = ifelse(loc$accuracy < 800, "High", ifelse(loc$accuracy < 5000, "Medium", "Low")))

accuracy$group <- factor(accuracy$group, levels = c("High", "Medium", "Low"))

ggplot(accuracy, aes(x = accuracy, fill = group)) + 
  geom_histogram() + 
  facet_grid(group ~ ., scales="free") + 
  my_theme() +
  theme(
    legend.position = "none",
    strip.placement = "outside",
    strip.background = element_blank(),
    axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0.5)
  ) +
  labs(
    x = "Accuracy in metres",
    y = "Count",
    subtitle = "Histogram of accuracy of location points"
  )
```

***

About 95% of the data falls into the High (accuracy radius below 800 m) and Medium (below 5,000 m) categories. 

- Low accuracy was probably recorded in areas with bad satellite reception

- Even the low-accuracy readings are not too far from 0, so I would consider the overall accuracy to be pretty good. 
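
The share of points per accuracy category can be checked numerically rather than estimated from the histogram. A minimal sketch using the same thresholds as the plot (below 800 m is High, below 5,000 m is Medium, everything else is Low). The `toy_accuracy` vector is hypothetical and stands in for `loc$accuracy`:

```r
# Toy accuracy radii in metres (hypothetical values, not my data)
toy_accuracy <- c(10, 25, 40, 120, 650, 900, 1500, 3200, 4800, 12000)

# Left-closed intervals: [-Inf, 800) High, [800, 5000) Medium, [5000, Inf) Low
accuracy_group <- cut(
  toy_accuracy,
  breaks = c(-Inf, 800, 5000, Inf),
  labels = c("High", "Medium", "Low"),
  right  = FALSE
)

# Percentage of points in each category
accuracy_share <- round(100 * prop.table(table(accuracy_group)), 1)
accuracy_share
```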

### Where all did I travel in India?

```{r}

# Note: get_map() with the default Google source needs an API key registered via register_google()
India <- get_map(location = 'India', zoom = 5)

ggmap(India) + geom_point(data = loc, aes(x = lon, y = lat), alpha = 0.1, color = "blue") + 
  theme(legend.position = "right") + 
  labs(
    x = "Longitude", 
    y = "Latitude")

```

***

Since I had a traveling job, we can see a good number of points all over the country. However, some places I visited do not show up on the map:

- Many more places I visited in North East India were not recorded

- I was also not able to plot any of my US travel data. My plots were limited to Asia and Europe.



### Where all did I travel in my home city and what was the accuracy of my GPS locations? 

```{r}

Mumbai <- get_map(location = 'Mumbai', zoom = 10)

ggmap(Mumbai) + 
  stat_summary_2d(geom = "tile", bins = 100, data = loc, aes(x = lon, y = lat, z = accuracy), alpha = 0.3) + 
  scale_fill_gradient(low = "blue", high = "red", guide = guide_legend(title = "Accuracy")) +
  labs(
    x = "Longitude", 
    y = "Latitude", 
    title = "Location history data points around Mumbai")

```

***

The coverage of points within the city of Mumbai is pretty good.

- Since I lived in Navi Mumbai, about 15 miles (24 km) from the city of Mumbai, a lot of road travel into, out of, and within Mumbai can be seen

- The color scale shows the accuracy radius in metres: blue tiles have a small radius (high accuracy), red tiles a large radius (low accuracy)


### How many kilometers have I traveled from 2012 to 2015?

```{r}

# Filter on the rows' own timestamps in a single step, so the logical index always matches the data frame being subset
loc3 <- subset(loc, time > as.POSIXct('2009-01-01 0:00:01') & time < as.POSIXct('2015-12-22 23:59:59'))

# Shifting vectors for latitude and longitude to include end position
shift.vec <- function(vec, shift){
  if (length(vec) <= abs(shift)){
    rep(NA ,length(vec))
  } else {
    if (shift >= 0) {
      c(rep(NA, shift), vec[1:(length(vec) - shift)]) }
    else {
      c(vec[(abs(shift) + 1):length(vec)], rep(NA, abs(shift)))
    }
  }
}

loc3$lat.p1 <- shift.vec(loc3$lat, -1)
loc3$lon.p1 <- shift.vec(loc3$lon, -1)

# Calculating distances between consecutive points (in metres) with the function pointDistance from the 'raster' package.
# With lonlat = TRUE the coordinates must be given in (longitude, latitude) order.

loc3$dist.to.prev <- pointDistance(cbind(loc3$lon.p1, loc3$lat.p1),
                                   cbind(loc3$lon, loc3$lat),
                                   lonlat = TRUE)
# Summing the distances per month (ignoring the NA of the last point) and converting metres to km

distance_p_month <- aggregate(loc3$dist.to.prev, by = list(month_year = as.factor(loc3$month_year)), FUN = sum, na.rm = TRUE)
distance_p_month$x <- distance_p_month$x * 0.001
ggplot(distance_p_month[-1, ], aes(x = month_year, y = x,  fill = month_year)) + 
  geom_bar(stat = "identity")  + 
  guides(fill = "none") +
  my_theme() +
  labs(
    x = "",
    y = "Distance in kilometers",
    title = "Distance traveled per month from 2012 to 2015"
  )


```

***

My work travel was really busy from 2014 to 2015, when I was traveling 4 days a week all over the country.

- There was a peak in January 2014, when I traveled 12,000 km

- I wish I could track whether my product sales increased with my extensive travel for presentations. The company could then measure whether the money spent was worth it.

In conclusion, based on the visualizations, the following can be said:

1. My monthly travel varied a lot with my work schedule, which is why the quartiles of the monthly data point counts are so unevenly spaced.

2. Overall GPS accuracy was pretty good.

3. A lot of data was missing for places I had traveled to.

4. On average, in the months when I traveled for work, I covered more than 4,000 km.
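
The monthly distances above come from summing the distance between consecutive location points. As a cross-check that does not depend on the 'raster' package, here is a minimal sketch of the haversine formula in base R. `haversine_km` is a hypothetical helper, the Earth radius of 6371 km is a spherical approximation, and the coordinates are approximate city centres rather than points from my location history:

```r
# Great-circle distance in km between (lon1, lat1) and (lon2, lat2), given in degrees.
# Assumes a spherical Earth with a mean radius of 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Approximate coordinates: Mumbai -> Delhi -> Bengaluru
lon <- c(72.88, 77.21, 77.59)
lat <- c(19.08, 28.61, 12.97)
n <- length(lon)

# Distance of each leg (point i to point i + 1), then the total for the trip
leg_km <- haversine_km(lon[-n], lat[-n], lon[-1], lat[-1])
total_km <- sum(leg_km)
```

Because the helper is vectorized, the same call should work on the full `lon` and `lat` columns, and the per-leg distances can then be summed by month.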