Overview of the Quantified Self movement


A project for ANLY512: Data Visualization AT Hitch

The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and “Big Data”. Designing this project around the QS movement makes perfect sense because it offers the opportunity to be the data and question provider, the data analyst, the vis designer, and the end user all at once. In this individual project, I use two years of data on spending and payments made with my Discover credit card.

The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in this class. Additionally, I will create a summary that answers the following questions based on the data collected: 1) What’s the total spending per month in 2017? 2) In April, August and December, what’s the spending in each category? 3) What’s the subtotal spending per transaction type in 2017? 4) For Merchandise, how does the spending vary from month to month? 5) Compared to 2018, what’s the difference?

Data preparation

Sample raw data in 2017

| Trans..Date | Post.Date | Month | Year | Description | Amount | Category |
|---|---|---|---|---|---|---|
| 1/3/2017 | 1/3/2017 | 1 | 2017 | AMAZON.COM AMZN.COM/BILLWANWO0HI9DROP | 5.82 | Merchandise |
| 1/5/2017 | 1/5/2017 | 1 | 2017 | PAYMENT - THANK YOU | -1458.00 | Payments and Credits |
| 1/7/2017 | 1/7/2017 | 1 | 2017 | TARGET.COM * 800-591-3869 MN | 2.96 | Merchandise |
| 1/7/2017 | 1/7/2017 | 1 | 2017 | TARGET.COM * 800-591-3869 MN | 49.72 | Merchandise |
| 1/9/2017 | 1/9/2017 | 1 | 2017 | LOWE’S OF MILFORD, MA MILFORD MA | 16.20 | Home Improvement |
| 1/9/2017 | 1/9/2017 | 1 | 2017 | STOP & SHOP 0040 MILFORD MA00903R | 4.38 | Supermarkets |
| 1/9/2017 | 1/9/2017 | 1 | 2017 | TARGET MILFORD MA | 14.10 | Supermarkets |
| 1/9/2017 | 1/9/2017 | 1 | 2017 | TJMAXX #0619 MILFORD MA | 8.50 | Merchandise |
| 1/10/2017 | 1/10/2017 | 1 | 2017 | TARGET.COM * 800-591-3869 MN | 19.48 | Merchandise |
| 1/11/2017 | 1/11/2017 | 1 | 2017 | SHELL 57544922503 WESTBOROUGH MA | 65.03 | Gasoline |
| 1/11/2017 | 1/11/2017 | 1 | 2017 | WAL-MART SC - #2158 NORTHBOROUGH MA | 71.32 | Merchandise |
| 1/13/2017 | 1/13/2017 | 1 | 2017 | DUNKIN #339844 Q35 MARLBOROUGH MA | 1.06 | Restaurants |
| 1/13/2017 | 1/15/2017 | 1 | 2017 | TARGET MARLBOROUGH MA | -11.99 | Payments and Credits |
| 1/15/2017 | 1/15/2017 | 1 | 2017 | H MART BURLINGTON BURLINGTON MA | 123.81 | Supermarkets |
| 1/16/2017 | 1/16/2017 | 1 | 2017 | TARGET WESTBOROUGH MA | 21.24 | Supermarkets |

Sample raw data of 2017 and 2018

| Category | Amount | Year |
|---|---|---|
| Automotive | 0.00 | 2017 |
| Department Stores | 265.67 | 2017 |
| Education | 0.00 | 2017 |
| Gasoline | 287.63 | 2017 |
| Government Services | 0.00 | 2017 |
| Home Improvement | 101.04 | 2017 |
| Medical Services | 28.88 | 2017 |
| Merchandise | 7427.75 | 2017 |
| Other/Miscellaneous | 0.00 | 2017 |
| Restaurants | 725.83 | 2017 |
| Services | 77.63 | 2017 |
| Supermarkets | 1207.49 | 2017 |
| Travel/Entertainment | 412.51 | 2017 |
| Wholesale Clubs | 0.00 | 2017 |
| Automotive | 0.00 | 2018 |

Q1: What’s the total spending per month in 2017?


Since Month is categorical and Amount is continuous, and I aimed to compare the total amount in each month, a bar chart was used to summarize the data. I summed the amount spent in each month and used a bar chart to display the totals. The plot shows that average monthly spending was around $800. Total spending was highest in April, August and December.

Q2: In April, August and December, what’s the spending in each category?


From the plots, we can see that the top 3 categories I spent in most frequently were Merchandise, Supermarkets and Restaurants. I checked the detailed transaction history: almost fifty percent of my April spending went to a purchase at IKEA, and most merchandise purchases were made online.

Q3: What’s the subtotal spending per transaction type in 2017?


The category names are long and cannot be displayed along the bottom axis, so a rotated dot plot was created. Here we see that the category I spent the most in was Merchandise, at around $7500, which was much higher than any other category. The 2nd and 3rd highest categories were Supermarkets and Restaurants, at around $1200 and $730 respectively.

Q4: For Merchandise, how does the spending vary from month to month?


We see that for half of the year, the spending on Merchandise was less than $500. However, in spring and winter, it was almost double that of the other months.

Q5: Compared to 2018, what’s the difference?



Since the purpose was to compare the spending between 2017 and 2018, a grouped bar chart was used. From the plot we can see that overall YTD spending in 2018 is much lower than in 2017, especially for Merchandise. The main reason is that I opened two more credit cards this year to take advantage of different rebate policies, whereas only one credit card was used in 2017.

Conclusion

Based on the visual analytics, the following conclusions can be drawn:

  1. I spent the most money on merchandise, supermarkets and restaurants. I didn’t travel a lot in 2017.

  2. Monthly spending on merchandise was $750 on average. Summer and winter seemed to be the peak seasons for online shopping.

  3. There was no big change in my spending habits between 2017 and 2018.

  4. Although I mostly used the Discover credit card, I also opened a Citibank card and a Chase card in mid/fall 2018. The comparison may not be that accurate since the data for this year is not complete.

---
title: "ANLY 512 Final Project"
author: "Fuyan Li"
date: "Dec. 2018"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate) 
setwd("C:/Users/fli/Desktop/MyWork/Harrisburg/ANLY512/Final Project/")
```

### Overview of the Quantified Self movement

```{r}
knitr::include_graphics("/Users/fli/Desktop/MyWork/Harrisburg/ANLY512/Final Project/spending.jpg")
```



***
A project for ANLY512: Data Visualization
AT Hitch

The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and "Big Data". Designing this project around the QS movement makes perfect sense because it offers the opportunity to be the data and question provider, the data analyst, the vis designer, and the end user all at once. In this individual project, I use two years of data on spending and payments made with my Discover credit card.

The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in this class. Additionally, I will create a summary that answers the following questions based on the data collected:

1. What's the total spending per month in 2017?
2. In April, August and December, what's the spending in each category?
3. What's the subtotal spending per transaction type in 2017?
4. For Merchandise, how does the spending vary from month to month?
5. Compared to 2018, what's the difference?

### Data preparation

```{r}
# Transaction-level export for 2017
data <- read.csv("C:/Users/fli/Desktop/MyWork/Harrisburg/ANLY512/Final Project/Discover-2017-YearEndSummary.csv")
kable(data[1:15,], caption="Sample raw data in 2017")
# Per-category totals for 2017 and 2018 (year to date)
comp <- read.csv("C:/Users/fli/Desktop/MyWork/Harrisburg/ANLY512/Final Project/Discover-2018-YearToDateSummary.csv")
kable(comp[1:15,], caption="Sample raw data of 2017 and 2018")
```

***

There are 7 variables listed:

- Trans. Date: Transaction date
- Post Date
- Month
- Description: Detailed description of the purchase
- Amount: The amount spent
- Category
- Year: 2017 or 2018
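
As a rough illustration of how these variables might be prepared before plotting, the sketch below uses a few made-up rows rather than the real statement. It parses the date strings, derives the month number, and sets aside the "Payments and Credits" rows so that only purchases remain. The object names `toy` and `purchases` are hypothetical, and dropping the credit rows is an assumption of this sketch; the charts in this dashboard use the data as exported.

```r
# Toy rows in the same shape as the statement export (values are made up)
toy <- data.frame(
  Trans.Date = c("1/3/2017", "1/5/2017", "2/7/2017"),
  Amount     = c(5.82, -1458.00, 2.96),
  Category   = c("Merchandise", "Payments and Credits", "Merchandise"),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings into Date objects
toy$Date <- as.Date(toy$Trans.Date, format = "%m/%d/%Y")

# Derive the month number (1-12) from the parsed date
toy$Month <- as.integer(format(toy$Date, "%m"))

# Keep purchases only; payments and refunds are stored as negative amounts
purchases <- subset(toy, Category != "Payments and Credits")
purchases
```

Base R is used here so the sketch runs without any extra packages.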


### Q1: What's the total spending per month in 2017?

```{r}
# Bar chart of 2017 spending by month. With stat = "identity" the individual
# transaction amounts are stacked, so each bar's height is the monthly total.
p <- ggplot(data, aes(x = Month, y = Amount)) +
  geom_bar(stat = "identity", fill = "LightBlue") +
  scale_x_discrete(limits = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
  labs(title = "Monthly Spending 2017", x = "Month", y = "Amount $") +
  theme_bw()
p
```

***
Since Month is categorical and Amount is continuous, and I aimed to compare the total amount in each month, a bar chart was used to summarize the data.
I summed the amount spent in each month and used a bar chart to display the totals.
The plot shows that average monthly spending was around $800. Total spending was highest in April, August and December.
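
The monthly totals and their average can also be computed directly instead of being read off the chart. The following is a minimal sketch on made-up numbers, assuming a data frame with a numeric `Month` and an `Amount` column; `toy` and `monthly` are hypothetical names.

```r
# Toy transactions (made-up values); Month is the month number
toy <- data.frame(
  Month  = c(1, 1, 2, 2, 2),
  Amount = c(10.00, 5.50, 20.00, 1.25, 3.75)
)

# Sum Amount within each month
monthly <- aggregate(Amount ~ Month, data = toy, FUN = sum)

# Label months with their abbreviations, kept in calendar order
monthly$MonthName <- factor(month.abb[monthly$Month], levels = month.abb)

monthly
mean(monthly$Amount)  # average of the monthly totals
```

Applied to the real data, the same two steps would give the per-month totals shown by the bars and the average quoted above.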

### Q2: In April, August and December, what's the spending in each category?

```{r}
# One dot per transaction, grouped by category, for each of the three peak months
data2 <- subset(data, Month == 4)
p1 <- ggplot(data2, aes(Amount, Category))
p2 <- p1 + geom_point(aes(colour = factor(Category))) +
  labs(title = "Transactions in Apr.2017", x = "Amount $", y = "Category") +
  theme_bw()
p2
data3 <- subset(data, Month == 8)
p3 <- ggplot(data3, aes(Amount, Category))
p4 <- p3 + geom_point(aes(colour = factor(Category))) +
  labs(title = "Transactions in Aug.2017", x = "Amount $", y = "Category") +
  theme_bw()
p4
data4 <- subset(data, Month == 12)
# Use the December subset here (the original reused the August data)
p5 <- ggplot(data4, aes(Amount, Category))
p6 <- p5 + geom_point(aes(colour = factor(Category))) +
  labs(title = "Transactions in Dec.2017", x = "Amount $", y = "Category") +
  theme_bw()
p6
```

***
From the plots, we can see that the top 3 categories I spent in most frequently were Merchandise, Supermarkets and Restaurants.
I checked the detailed transaction history: almost fifty percent of my April spending went to a purchase at IKEA, and most merchandise purchases were made online.
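
A share like the IKEA figure can be checked by matching the merchant name in `Description` and dividing by the month's total. This is only a sketch on made-up rows: the descriptions and amounts below are invented, and the names `toy`, `april`, `is_ikea` and `share` are hypothetical.

```r
# Toy transactions (descriptions and amounts are made up)
toy <- data.frame(
  Month       = c(4, 4, 4, 8),
  Description = c("IKEA STORE", "TARGET MILFORD MA", "SHELL STATION", "IKEA STORE"),
  Amount      = c(50, 30, 20, 10),
  stringsAsFactors = FALSE
)

# Restrict to April, then flag rows whose description mentions the merchant
april   <- subset(toy, Month == 4)
is_ikea <- grepl("IKEA", april$Description, fixed = TRUE)

# Merchant's share of the month's total spending
share <- sum(april$Amount[is_ikea]) / sum(april$Amount)
share
```

Matching on a fixed substring is a simplification; real statement descriptions may spell the same merchant in several ways.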


### Q3: What's the subtotal spending per transaction type in 2017?

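```{r}
# Sum the amounts within each category; aggregate() names the summed column "x"
agg2 <- aggregate(data$Amount, by=list(Category=data$Category), FUN=sum)
# Sort ascending so the largest category is drawn at the top of the dot chart
order_agg2  <- agg2[order(agg2$x),]
dotchart(order_agg2$x, labels=order_agg2$Category, cex=0.7, color="Purple", main="Subtotal $ per category in 2017")
```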

***
The category names are long and cannot be displayed along the bottom axis, so a rotated dot plot was created.
Here we see that the category I spent the most in was Merchandise, at around $7500, which was much higher than any other category. The 2nd and 3rd highest categories were Supermarkets and Restaurants, at around $1200 and $730 respectively.

### Q4: For Merchandise, how does the spending vary from month to month?

```{r}
# Keep Merchandise transactions only, then stack them into monthly bars
data1 <- subset(data, Category == 'Merchandise')
p <- ggplot(data1, aes(x = Month, y = Amount)) +
  geom_bar(stat = "identity", fill = "Orange") +
  scale_x_discrete(limits = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
  labs(title = "Merchandise spending per month", x = "Month", y = "Amount $") +
  theme_minimal()
p
```

***
We see that for half of the year, the spending on Merchandise was less than $500. However, in spring and winter, it was almost double that of the other months.
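
The "half of the year under $500" reading can be verified numerically by totalling Merchandise per month and counting how many months fall below the threshold. The sketch below uses made-up amounts; `toy`, `merch_monthly` and the `Under500` column are hypothetical names.

```r
# Toy Merchandise transactions (made-up values)
toy <- data.frame(
  Month    = c(1, 1, 2, 3, 4, 4),
  Amount   = c(200, 150, 400, 900, 600, 450),
  Category = "Merchandise"
)

# Total Merchandise spending per month
merch_monthly <- aggregate(Amount ~ Month, data = toy, FUN = sum)

# Flag months whose total is below $500
merch_monthly$Under500 <- merch_monthly$Amount < 500

merch_monthly
mean(merch_monthly$Under500)  # fraction of months under $500
```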



### Q5: Compared to 2018, what's the difference?

```{r echo=FALSE, warning=FALSE}
# Treat Year as a factor so each year gets its own fill colour and dodged bar
comp$Year <- as.factor(comp$Year)
ggplot(data = comp, aes(x = Category, y = Amount)) +
  geom_bar(aes(fill = Year), position = "dodge", stat = "identity") +
  labs(title = "Spending 2017 vs 2018", x = "Category", y = "Amount $") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```


***
Since the purpose was to compare the spending between 2017 and 2018, a grouped bar chart was used.
From the plot we can see that overall YTD spending in 2018 is much lower than in 2017, especially for Merchandise.
The main reason is that I opened two more credit cards this year to take advantage of different rebate policies, whereas only one credit card was used in 2017.


### Conclusion
Based on the visual analytics, the following conclusions can be drawn:

1. I spent the most money on merchandise, supermarkets and restaurants. I didn't travel a lot in 2017.

2. Monthly spending on merchandise was $750 on average. Summer and winter seemed to be the peak seasons for online shopping.

3. There was no big change in my spending habits between 2017 and 2018.

4. Although I mostly used the Discover credit card, I also opened a Citibank card and a Chase card in mid/fall 2018. The comparison may not be that accurate since the data for this year is not complete.