---
title: "ANLY 512 - Final Project"
author: "Sumeet Sharma"
date: "April 15 2019"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
library("jsonlite")
library(ggmap)
library(raster)
library(readr)
```
### Introduction
```{r}
knitr::include_graphics("C:/Users/Sumeet/OneDrive/HU/ANLY 512-91-O 2019- Spring Data Viz/Final Project/G.png")
```
***
The Quantified Self (QS) is a movement that aims to leverage the synergy of wearables, analytics, and "Big Data". It exploits the ease and convenience of data acquisition through the Internet of Things (IoT) to feed the growing obsession with personal informatics and quotidian data.
The goal of this project is to answer the following five questions based on my location history and my financial data history:

1. What are the different places I have visited in India?
2. How much did I travel each month last year?
3. What is my monthly expenditure for 2018?
4. Where did I spend the most money last year?
5. How much did I spend on gas each month?
### Data Prep
```{r}
YearEndSummary <- read.csv("C:/Users/Sumeet/OneDrive/HU/ANLY 512-91-O 2019- Spring Data Viz/Final Project/YearEndSummary.csv")
kable(YearEndSummary[1:20, ], caption = "Sample finance dataset")
```
***
I used two data sources for this project:

1. My Google Location History
2. My financial data for 2018 (credit card spending for the year 2018)

On the left you can see a sample of the financial data.

For Q4 and Q5, I had to do some data manipulation (the code for which is included within the respective sections).
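As an illustration only (the Q4 and Q5 sections themselves use `aggregate()`), the same kind of manipulation can be sketched with dplyr, assuming the column names shown in the sample table (`Category`, `Month`, `Amount`):

```{r, eval=FALSE}
# Illustrative sketch only: total spend per category, and gasoline spend per
# month, using dplyr verbs instead of aggregate(). Assumes YearEndSummary has
# the columns Category, Month, and Amount as in the sample table.
spend_by_category <- YearEndSummary %>%
  group_by(Category) %>%
  summarise(Total = sum(Amount))

gas_by_month <- YearEndSummary %>%
  filter(Category == "Gasoline") %>%
  group_by(Month) %>%
  summarise(Total = sum(Amount))
```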
### Q1. What are the different places I have visited in India?
```{r}
ds <- fromJSON('C:/Users/Sumeet/OneDrive/HU/ANLY 512-91-O 2019- Spring Data Viz/Final Project/LocationHistory.json')
loc <- ds$locations
loc$time <- as.POSIXct(as.numeric(loc$timestampMs) / 1000, origin = "1970-01-01")
loc$lat = loc$latitudeE7 / 1e7
loc$lon = loc$longitudeE7 / 1e7
loc$date <- as.Date(loc$time, '%Y/%m/%d')
loc$year <- year(loc$date)
loc$month_year <- as.yearmon(loc$date)
# the Google Maps API key was removed after knitting, for privacy
india <- get_map(location = 'India', zoom = 4)
ggmap(india) +
  geom_point(data = loc, aes(x = lon, y = lat), alpha = 0.5, color = "red") +
  theme(legend.position = "right") +
  labs(x = "Longitude",
       y = "Latitude",
       title = "Location history data points in India",
       caption = "\nA simple point plot shows recorded positions.")
```
***
The map on the left shows all the places I have visited in India over the last four years. The point in the center is my hometown, and the point in the far north is in the Kashmir region. The other two points are Mumbai (west) and New Delhi (north-center); I visited those cities only to take connecting flights.
### Q2. How much did I travel each month last year?
```{r}
# keep only the 2018 location records
loc3 <- subset(loc, time > as.POSIXct('2018-01-01 00:00:01') &
                    time < as.POSIXct('2018-12-22 23:59:59'))
# Shift a vector by `shift` positions, padding with NA; used to pair each
# recorded position with the next one.
shift.vec <- function(vec, shift) {
  if (length(vec) <= abs(shift)) {
    rep(NA, length(vec))
  } else if (shift >= 0) {
    c(rep(NA, shift), vec[1:(length(vec) - shift)])
  } else {
    c(vec[(abs(shift) + 1):length(vec)], rep(NA, abs(shift)))
  }
}
loc3$lat.p1 <- shift.vec(loc3$lat, -1)
loc3$lon.p1 <- shift.vec(loc3$lon, -1)
# Calculating distances between consecutive points (in metres) with
# pointDistance() from the 'raster' package. With lonlat = TRUE,
# pointDistance() expects coordinates in (longitude, latitude) order.
loc3$dist.to.prev <- apply(loc3, 1, FUN = function(row) {
  pointDistance(c(as.numeric(as.character(row["lon.p1"])),
                  as.numeric(as.character(row["lat.p1"]))),
                c(as.numeric(as.character(row["lon"])),
                  as.numeric(as.character(row["lat"]))),
                lonlat = TRUE)
})
# distance in km
distance_p_month <- aggregate(loc3$dist.to.prev, by = list(month_year = as.factor(loc3$month_year)), FUN = sum)
distance_p_month$x <- distance_p_month$x*0.001
# keep only the 2018 months
distance_p_month <- distance_p_month[1:12, ]
# extract the month abbreviation
distance_p_month$month <- substr(distance_p_month$month_year, start = 1, stop = 3)
# fix the factor order so the plot follows calendar order
distance_p_month$month <- factor(distance_p_month$month, levels = distance_p_month$month)
ggplot(distance_p_month, aes(x = month, y = x)) +
  geom_point() +
  geom_segment(aes(x = month, xend = month, y = 0, yend = x)) +
  labs(x = "", y = "Distance in km", title = "Distance traveled per month in 2018")
```
***
Next, I wanted to analyze more recent location data. The graph on the left shows the total distance traveled per month in 2018, in kilometers. We can see that I traveled significantly more during December 2018; the spike is due to my trip to India that month.
Next, I wanted to see whether there is a correlation between my spending and my travel.
### Q3. What is my monthly expenditure for 2018?
```{r}
ggplot(YearEndSummary, aes(x = factor(Month), y = Amount)) +
  geom_bar(stat = "identity", fill = "blue") +
  scale_x_discrete(labels = function(m) month.abb[as.integer(m)]) +
  labs(title = "Spending per Month in 2018", x = "Month", y = "Amount ($)") +
  theme_minimal()
```
***
To see my spending history, I used my credit card statement for the year 2018.
From the chart we can see that I spent a significantly higher amount in November 2018. I booked my tickets to India that month, hence the spike. Therefore, even though it is obvious, from the data I can conclude that there is a correlation between my spending and my travel history.
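As a rough check (a sketch only, assuming both `distance_p_month` from Q2 and `YearEndSummary` cover all twelve months of 2018 in calendar order), the monthly totals can be compared directly:

```{r, eval=FALSE}
# Sketch: correlation between monthly distance traveled (from Q2) and monthly
# spending. Assumes both tables cover Jan-Dec 2018 and are in calendar order.
monthly_spend <- aggregate(YearEndSummary$Amount,
                           by = list(Month = YearEndSummary$Month), FUN = sum)
cor(distance_p_month$x, monthly_spend$x)
```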
### Q4. Where did I spend most of my money?
```{r}
# money spent per category
sumCategory <- aggregate(YearEndSummary$Amount, by = list(Category = YearEndSummary$Category), FUN = sum)
ggplot(sumCategory, aes(x = "", y = x, fill = Category)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) +
  labs(x = NULL, y = NULL, fill = NULL, title = "Spending by Category")
```
***
The pie chart on the left shows the different categories in which I spent money and the amount spent in each category for the year 2018.

Thus, we can see that I spent the most on Travel/Entertainment.
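Because exact amounts are hard to read off a pie chart, the category totals can also be printed in descending order (a small sketch using the `sumCategory` table built above):

```{r, eval=FALSE}
# Sketch: list the per-category totals from sumCategory in descending order,
# so the largest category (Travel/Entertainment) is easy to verify.
sumCategory[order(-sumCategory$x), ]
```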
### Q5. How much did I spend on Gas each month?
```{r}
# money spent on gasoline per month
Gas <- YearEndSummary %>% filter(Category == "Gasoline")
GasMonth <- aggregate(Gas$Amount, by = list(Month = Gas$Month), FUN = sum)
ggplot(data = GasMonth, aes(x = Month, y = x, group = 1)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  labs(title = "Money spent on gas per month", x = "Month", y = "Amount ($)")
```
***
Through the first four questions, I analyzed my travel history and my big travel expenses. For the last question, I wanted to check my gasoline expenses. The line graph on the left shows the monthly gasoline expenditure. We can see that in the months when I am on vacation, I spend less money on gas (as I am not driving).
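To eyeball that inverse relationship, the monthly gas totals can be placed next to the monthly distances from Q2 (a sketch only, assuming `GasMonth$Month` holds numeric months 1-12 and `distance_p_month` covers Jan-Dec 2018 in calendar order):

```{r, eval=FALSE}
# Sketch: put monthly gas spending next to monthly distance traveled.
# Assumes GasMonth$Month is the numeric month (1-12) and distance_p_month
# holds the twelve 2018 months in calendar order.
comparison <- data.frame(Month    = month.abb[GasMonth$Month],
                         GasSpend = GasMonth$x,
                         Distance = distance_p_month$x[GasMonth$Month])
comparison
```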