Introduction


The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and “Big Data”. This movement exploits the ease and convenience of data acquisition through the internet of things (IoT) to feed the growing obsession with personal informatics and quotidian data.

The goal of this project is to answer the following five questions based on my location history and my financial history.

  1. What are the different places I have visited in India?
  2. How much did I travel each month last year?
  3. What is my monthly expenditure for 2018?
  4. Where did I spend the most money last year?
  5. How much did I spend on Gas each month?

Data Prep

Sample finance dataset

| Trans..Date | Month | Post.Date | Description | Amount | Category |
|---|---|---|---|---|---|
| 6/20/2018 | 6 | 6/20/2018 | CARROLL AUTOMOTIVE PITTSBURGH PA | 83.46 | Automotive |
| 12/14/2018 | 12 | 12/14/2018 | CARROLL AUTOMOTIVE PITTSBURGH PA | 33.17 | Automotive |
| 1/10/2018 | 1 | 1/10/2018 | SPEEDWAY 02424 CAMBRIDGE MA | 30.78 | Gasoline |
| 1/26/2018 | 1 | 1/26/2018 | SUNOCO 0016867401 BRIGHTON MA | 10.21 | Gasoline |
| 1/31/2018 | 1 | 1/31/2018 | SUNOCO 0016867401 BRIGHTON MA | 4.13 | Gasoline |
| 2/2/2018 | 2 | 2/2/2018 | SUNOCO 0016867401 BRIGHTON MA | 5.14 | Gasoline |
| 2/2/2018 | 2 | 2/2/2018 | SPEEDWAY 02493 ALLSTON MA | 5.61 | Gasoline |
| 2/4/2018 | 2 | 2/4/2018 | SHELL 57544187107 CAMBRIDGE MA | 5.19 | Gasoline |
| 2/17/2018 | 2 | 2/17/2018 | SUNOCO 0016867401 BRIGHTON MA | 35.53 | Gasoline |
| 2/18/2018 | 2 | 2/18/2018 | SUNOCO 0491247300 ALLENTOWN PA | 32.66 | Gasoline |
| 2/25/2018 | 2 | 2/25/2018 | SPEEDWAY 02901 MO MONROEVILLE PA | 33.97 | Gasoline |
| 3/15/2018 | 3 | 3/15/2018 | SUNOCO 0513148700 PITTSBURGH PA | 34.77 | Gasoline |
| 3/30/2018 | 3 | 3/30/2018 | SUNOCO 0363227005 PITTSBURGH PA | 10.07 | Gasoline |
| 4/4/2018 | 4 | 4/4/2018 | GET GO #3130 MOUNT LEBANONPA | 34.85 | Gasoline |
| 4/7/2018 | 4 | 4/7/2018 | SHEETZ 022500002253498 BREEZEWOOD PA | 3.17 | Gasoline |
| 4/19/2018 | 4 | 4/19/2018 | GET GO #3130 MOUNT LEBANONPA | 37.85 | Gasoline |
| 5/10/2018 | 5 | 5/10/2018 | BP#85547191256 BKEYE LK BUCKEYE LAKE OH | 32.94 | Gasoline |
| 5/12/2018 | 5 | 5/14/2018 | SHELL 51311550068 CAMBRIDGE CITIN | 30.61 | Gasoline |
| 5/14/2018 | 5 | 5/14/2018 | SUNOCO 0363227005 PITTSBURGH PA | 36.54 | Gasoline |
| 5/26/2018 | 5 | 5/26/2018 | SUNOCO 0784322002 PITTSBURGH PA | 33.89 | Gasoline |

I used two datasets for this project:

  1. My Google Location History

  2. My financial data for 2018 (credit card spending for the year)

On the left you can see a sample of the financial data.

For Q4 and Q5, I had to do some data manipulation; the code is included in the respective sections.

Q1. What are the different cities I have visited in India?


The image on the left shows all the places I have visited in India over the last four years. The point in the center is my hometown, and the point up north is the Kashmir region. The other two points are Mumbai (west) and New Delhi (north-central). I passed through these two cities to take connecting flights.

Q2. How much did I travel each month last year?


Next, I wanted to analyze more recent location data. The graph on the left shows the total distance traveled per month in 2018, in kilometers. I traveled significantly more in December 2018; the spike is due to my trip to India that month.

Next, I wanted to see whether there is a correlation between my spending and my travel.

Q3. What is my monthly expenditure for 2018?


To see my spending history, I used my credit card statement for the year 2018.

From the chart we can see that I spent a significantly higher amount in November 2018. I booked my tickets to India in November, hence the spike in that month. Although it is not surprising, the data suggest a relationship between my spending and my travel history: the spending spike in November precedes the travel spike in December.

Q4. Where did I spend most of my money?


The pie chart on the left shows the different categories in which I spent money and the amount spent in each category in 2018.

We can see that I spent the most on Travel/Entertainment.

Q5. How much did I spend on Gas each month?


Through the first four questions, I analyzed my travel history and major travel expenses. For the last question, I wanted to check my gasoline expenses. The line graph on the left shows my monthly gasoline expenditure. We can see that in the months when I am on vacation, I spend less on gas (as I am not driving).

---
title: "ANLY 512 - Final Project"
author: "Sumeet Sharma"
date: "April 15 2019"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
library("jsonlite")
library(ggmap)
library(raster)
library(readr)
```

### Introduction

```{r}
knitr::include_graphics("C:/Users/Sumeet/OneDrive/HU/ANLY 512-91-O 2019- Spring Data Viz/Final Project/G.png")
```

***

The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and "Big Data". This movement exploits the ease and convenience of data acquisition through the internet of things (IoT) to feed the growing obsession with personal informatics and quotidian data.

The goal of this project is to answer the following five questions based on my location history and my financial history.

1. What are the different places I have visited in India?
2. How much did I travel each month last year?
3. What is my monthly expenditure for 2018?
4. Where did I spend the most money last year?
5. How much did I spend on Gas each month?

### Data Prep

```{r}
YearEndSummary <- read.csv("C:/Users/Sumeet/OneDrive/HU/ANLY 512-91-O 2019- Spring Data Viz/Final Project/YearEndSummary.csv")

kable(YearEndSummary[1:20,], caption="Sample finance dataset")
```

*** 
I used two datasets for this project:

1. My Google Location History

2. My financial data for 2018 (credit card spending for the year)

On the left you can see a sample of the financial data.

For Q4 and Q5, I had to do some data manipulation; the code is included in the respective sections.
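
The sample table suggests that `Month` holds the month number of `Trans..Date`. How that column was produced is not shown here, so the following is only a minimal sketch of one way to derive it in base R. The helper name `add_month` and the extra `MonthLabel` column are hypothetical and not part of the original data.

```r
# Hypothetical helper: derive the month number and a month label
# from transaction dates stored as "m/d/Y" strings.
add_month <- function(df, date_col = "Trans..Date") {
  d <- as.Date(df[[date_col]], format = "%m/%d/%Y")
  df$Month <- as.integer(format(d, "%m"))
  # Ordered factor so that plots list months Jan..Dec instead of alphabetically
  df$MonthLabel <- factor(month.abb[df$Month], levels = month.abb)
  df
}

# Three transactions copied from the sample table
sample_rows <- data.frame(
  Trans..Date = c("6/20/2018", "12/14/2018", "1/10/2018"),
  Amount = c(83.46, 33.17, 30.78),
  stringsAsFactors = FALSE
)
sample_rows <- add_month(sample_rows)
```

Keeping the label as a factor with `month.abb` levels means the month order does not have to be restated in every plot.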

### Q1. What are the different cities I have visited in India?

```{r}
ds <- fromJSON('C:/Users/Sumeet/OneDrive/HU/ANLY 512-91-O 2019- Spring Data Viz/Final Project/LocationHistory.json')
loc <- ds$locations
loc$time = as.POSIXct(as.numeric(ds$locations$timestampMs)/1000, origin = "1970-01-01")
loc$lat = loc$latitudeE7 / 1e7
loc$lon = loc$longitudeE7 / 1e7
loc$date <- as.Date(loc$time, tz = "UTC") # for POSIXct input the second argument is a time zone, not a format string
loc$year <- year(loc$date)
loc$month_year <- as.yearmon(loc$date)
# Google Maps API key removed after knitting for privacy; register a key with register_google() before calling get_map()
india <- get_map(location = 'India', zoom = 4)
ggmap(india) + geom_point(data = loc, aes(x = lon, y = lat), alpha = 0.5, color = "red") + 
  theme(legend.position = "right") + 
  labs(
    x = "Longitude", 
    y = "Latitude", 
    title = "Location history data points in India",
    caption = "\nA simple point plot shows recorded positions.")

```

***

The image on the left shows all the places I have visited in India over the last four years. The point in the center is my hometown, and the point up north is the Kashmir region. The other two points are Mumbai (west) and New Delhi (north-central). I passed through these two cities to take connecting flights.


### Q2. How much did I travel each month last year?

```{r}
# Keep only 2018 records; both conditions are evaluated on the same data frame
loc3 <- subset(loc, time > as.POSIXct('2018-01-01 00:00:01') &
                    time < as.POSIXct('2018-12-22 23:59:59'))
# Shifting vectors for latitude and longitude to include end position
shift.vec <- function(vec, shift){
  if (length(vec) <= abs(shift)){
    rep(NA ,length(vec))
  } else {
    if (shift >= 0) {
      c(rep(NA, shift), vec[1:(length(vec) - shift)]) }
    else {
      c(vec[(abs(shift) + 1):length(vec)], rep(NA, abs(shift)))
    }
  }
}

loc3$lat.p1 <- shift.vec(loc3$lat, -1)
loc3$lon.p1 <- shift.vec(loc3$lon, -1)

# Calculating distances between consecutive points (in metres) with the function pointDistance from the 'raster' package.
# pointDistance expects coordinates as (x, y), i.e. (longitude, latitude), and accepts two-column matrices.

loc3$dist.to.prev <- pointDistance(cbind(loc3$lon.p1, loc3$lat.p1),
                                   cbind(loc3$lon, loc3$lat),
                                   lonlat = TRUE) # Parameter 'lonlat' has to be TRUE!
# distance in km; na.rm = TRUE skips the last point, which has no successor
distance_p_month <- aggregate(loc3$dist.to.prev, by = list(month_year = as.factor(loc3$month_year)), FUN = sum, na.rm = TRUE)
distance_p_month$x <- distance_p_month$x*0.001
#selecting only 2018 months
distance_p_month <- distance_p_month[1:12,]
#extracting only month
distance_p_month$month <- substr(distance_p_month$month_year, start = 1, stop = 3)
#changing order for the graph
distance_p_month$month <- factor(distance_p_month$month, levels = distance_p_month$month)


ggplot(distance_p_month, aes(x=month, y=x)) +   geom_point() + 
  geom_segment( aes(x=month, xend=month, y=0, yend=x)) + 
  labs ( x="", y = "Distance in KM", title = "Distance traveled per month in 2018")

```

***
Next, I wanted to analyze more recent location data. The graph on the left shows the total distance traveled per month in 2018, in kilometers. I traveled significantly more in December 2018; the spike is due to my trip to India that month.

Next, I wanted to see whether there is a correlation between my spending and my travel.
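
The monthly totals are built by summing the distance between each recorded position and the next one. The dashboard computes those distances with `pointDistance` from the raster package. The sketch below swaps in the haversine formula on a spherical Earth (mean radius 6,371 km, an assumption) purely to illustrate the idea in base R. The names `haversine_m` and `path_length_m` are hypothetical.

```r
# Great-circle distance in metres between points given in decimal degrees.
# Assumes a spherical Earth with a mean radius of 6,371 km.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding slightly above 1 before asin
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Total path length: sum of the distances from each point to the next one.
path_length_m <- function(lon, lat) {
  n <- length(lon)
  if (n < 2) return(0)
  sum(haversine_m(lon[-n], lat[-n], lon[-1], lat[-1]))
}
```

For example, `path_length_m(c(0, 1, 2), c(0, 0, 0)) / 1000` gives roughly 222 km, two degrees of longitude along the equator. Results will differ slightly from `pointDistance`, which does not assume this simple sphere.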

### Q3. What is my monthly expenditure for 2018?
```{r}
library(ggplot2)
ggplot(YearEndSummary, aes(x = Month, y=Amount)) + 
  geom_bar(stat = "identity", fill = "Blue")+
  scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
  labs(title = "Spending per Month in 2018", x = "Month", y = "Amount($)") +
  theme_minimal()
```

***

To see my spending history, I used my credit card statement for the year 2018. 

From the chart we can see that I spent a significantly higher amount in November 2018. I booked my tickets to India in November, hence the spike in that month. Although it is not surprising, the data suggest a relationship between my spending and my travel history: the spending spike in November precedes the travel spike in December.
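
Because the tickets were booked a month before the trip, a same-month comparison could miss the relationship between spending and travel. The sketch below shows one way to check this, correlating spending in one month with distance traveled a given number of months later. The helper `lagged_cor` is hypothetical, and the numbers are made up for illustration; they are not the 2018 figures.

```r
# Pearson correlation between spending in month t and distance in month t + lag.
lagged_cor <- function(spend, dist, lag = 0) {
  stopifnot(length(spend) == length(dist), lag >= 0, lag < length(spend) - 1)
  n <- length(spend)
  cor(spend[seq_len(n - lag)], dist[(lag + 1):n])
}

# Made-up illustrative values, NOT the real 2018 data:
# a spending spike in month 4 followed by a travel spike in month 5.
spend_toy <- c(10, 12, 11, 50, 13, 12)
dist_toy  <- c(100, 110, 105, 120, 900, 115)

same_month <- lagged_cor(spend_toy, dist_toy, lag = 0)
next_month <- lagged_cor(spend_toy, dist_toy, lag = 1)
```

With the toy values, `next_month` is much higher than `same_month`. With only twelve monthly points and a single large spike, a correlation like this is weak evidence at best and should be read as descriptive.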

### Q4. Where did I spend most of my money?

```{r}
#money spent per category
sumCategory <- aggregate(YearEndSummary$Amount, by=list(Category=YearEndSummary$Category), FUN=sum)

ggplot(sumCategory, aes(x="", y=x, fill=Category)) + geom_bar(stat="identity", width=1) + coord_polar("y", start=0)+ labs(x = NULL, y = NULL, fill = NULL, title = "Spending by Category")
```

***

The pie chart on the left shows the different categories in which I spent money and the amount spent in each category in 2018.

We can see that I spent the most on Travel/Entertainment.
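
A pie chart makes it hard to compare slices of similar size, so a sorted table of totals and percentage shares can back up the reading. The following is a minimal sketch using five rows copied from the sample table in the Data Prep section, not the full statement. The helper name `category_share` is hypothetical.

```r
# Five rows copied from the sample table in the Data Prep section
sample_rows <- data.frame(
  Category = c("Automotive", "Automotive", "Gasoline", "Gasoline", "Gasoline"),
  Amount   = c(83.46, 33.17, 30.78, 10.21, 4.13)
)

# Hypothetical helper: total and percentage share per category, largest first
category_share <- function(df) {
  totals <- aggregate(Amount ~ Category, data = df, FUN = sum)
  totals$Share <- round(100 * totals$Amount / sum(totals$Amount), 1)
  totals[order(-totals$Amount), ]
}

shares <- category_share(sample_rows)
```

Applied to the full `YearEndSummary` data frame, the first row of the result would name the category with the highest spending.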

### Q5. How much did I spend on Gas each month?
```{r}
# Money spent on gasoline per month
Gas <- YearEndSummary %>% filter(Category == "Gasoline")
GasMonth <- aggregate(Gas$Amount, by = list(Month = Gas$Month), FUN = sum)
ggplot(data = GasMonth, aes(x = Month, y = x, group = 1)) +
  geom_line() +
  geom_point() +
  scale_x_discrete(limits = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
  labs(title = "Money spent on gas per month", x = "Month", y = "Amount($)")
```

***

Through the first four questions, I analyzed my travel history and major travel expenses. For the last question, I wanted to check my gasoline expenses. The line graph on the left shows my monthly gasoline expenditure. We can see that in the months when I am on vacation, I spend less on gas (as I am not driving).
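
A month with no gasoline purchases at all would be missing from a plain per-month sum, so the line would simply skip it. The sketch below shows one way to keep every month and report 0 for the empty ones. It uses only the January to March gasoline rows copied from the sample table, so the zeros it reports for later months reflect that subset, not the real data. The helper name `monthly_total` is hypothetical.

```r
# Gasoline rows for January to March copied from the sample table
gas <- data.frame(
  Month  = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3),
  Amount = c(30.78, 10.21, 4.13, 5.14, 5.61, 5.19,
             35.53, 32.66, 33.97, 34.77, 10.07)
)

# Hypothetical helper: sum per month, keeping months without purchases as 0
monthly_total <- function(df) {
  month <- factor(df$Month, levels = 1:12, labels = month.abb)
  totals <- tapply(df$Amount, month, sum)  # empty months come back as NA
  totals[is.na(totals)] <- 0
  data.frame(Month  = factor(names(totals), levels = month.abb),
             Amount = as.numeric(totals))
}

gas_by_month <- monthly_total(gas)
```

The result always has twelve rows in calendar order, which makes a dip in a vacation month visible as a point on the line and not as a gap.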