Overview of the Quantified Self movement

A project for ANLY512: Data Visualization

The data I collected covers one year, from April 2017 to March 2018. In the last step, I compare the two years (2017 and 2018). Although the two periods are not the same length, the comparison gives a clear picture of what I have spent since the Discover account was opened.

The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using a data-driven approach, I will create a summary that answers the following questions based on the data collected:

  1. What is the total spending by month?
  2. Which categories cost the most?
  3. For each category, what is the spending by month?
  4. In July 2017, December 2017 and March 2018, what is the spending in each category?
  5. Comparison: how does spending differ between 2017 and 2018?

Data preparation

Sample raw data

| Trans..Date | Post.Date | Month | Description | Amount | Category |
|:------------|:----------|------:|:------------|-------:|:---------|
| 4/11/17 | 4/11/17 | 4 | DUANE READE #14306 JERSEY CITY NJ | 10.33 | Merchandise |
| 4/11/17 | 4/11/17 | 4 | JIAN BING MAN. NEW YORK NY | 10.88 | Restaurants |
| 4/12/17 | 4/12/17 | 4 | KOBEYAKI 2 NEW YORK NY | 10.85 | Restaurants |
| 4/13/17 | 4/13/17 | 4 | DUANE READE #14306 JERSEY CITY NJ | 15.28 | Merchandise |
| 4/13/17 | 4/13/17 | 4 | KOBEQUE NEW YORK NY | 16.32 | Restaurants |
| 4/14/17 | 4/14/17 | 4 | AMC ONLINE #9640 888-440-4262 KS | 50.58 | Travel/ Entertainment |
| 4/14/17 | 4/14/17 | 4 | FRESCA DELI JERSEY CITY NJ | 19.76 | Supermarkets |
| 4/14/17 | 4/14/17 | 4 | MUJI FIFTH AVE NEW YORK NY | 31.31 | Merchandise |
| 4/14/17 | 4/14/17 | 4 | SQ *ARANJIRA NEWYORK NY0001152921507569922470 | 40.28 | Merchandise |
| 4/14/17 | 4/14/17 | 4 | SUNRISE MART - 41ST ST NEW YORK NY | 11.99 | Supermarkets |
| 4/15/17 | 4/15/17 | 4 | AMC EMPIRE 25 #0552 NEW YORK NY | 5.09 | Travel/ Entertainment |
| 4/15/17 | 4/15/17 | 4 | SHANGHAI BEST JERSEY CITY NJ | 47.72 | Restaurants |
| 4/16/17 | 4/16/17 | 4 | DUANE READE #14123 NEW YORK NY | 9.34 | Merchandise |
| 4/16/17 | 4/16/17 | 4 | H MART NEW YORK NY01604R | 19.96 | Supermarkets |
| 4/16/17 | 4/16/17 | 4 | HER NAME IS HAN NEW YORK NY | 36.57 | Restaurants |

Sample raw data by category

| Category | Amount | Year |
|:---------|-------:|-----:|
| Awards and Rebate Credits | 3 | 2017 |
| Awards and Rebate Credits | 1 | 2018 |
| Department Stores | 1 | 2017 |
| Department Stores | 1 | 2018 |
| Education | 3 | 2017 |
| Education | 0 | 2018 |
| Home Improvement | 1 | 2017 |
| Home Improvement | 0 | 2018 |
| Merchandise | 64 | 2017 |
| Merchandise | 18 | 2018 |
| Payments and Credits | 16 | 2017 |
| Payments and Credits | 4 | 2018 |
| Restaurants | 123 | 2017 |
| Restaurants | 8 | 2018 |
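
The column headers in the sample (`Trans..Date`, `Post.Date`) are what `read.csv()` produces when it passes the file's header through `make.names()`, which replaces characters that are not valid in R names, such as spaces, with dots. The sketch below assumes the raw export's header reads `Trans. Date,Post Date,Month,Description,Amount,Category` (an assumption, since the original file is not shown) and uses three rows copied from the sample table in place of the real CSV. It parses the `m/d/yy` dates and relabels the numeric `Month` in the April-to-March order of the study period; `MonthLabel` is a hypothetical column name.

```r
# Three rows copied from the "Sample raw data" table. The header line is an
# assumption about how the raw Discover export names its columns.
raw_csv <- "Trans. Date,Post Date,Month,Description,Amount,Category
4/11/17,4/11/17,4,DUANE READE #14306 JERSEY CITY NJ,10.33,Merchandise
4/14/17,4/14/17,4,AMC ONLINE #9640 888-440-4262 KS,50.58,Travel/ Entertainment
4/16/17,4/16/17,4,HER NAME IS HAN NEW YORK NY,36.57,Restaurants"

# check.names = TRUE (the default) sends the header through make.names(),
# so "Trans. Date" becomes "Trans..Date" and "Post Date" becomes "Post.Date".
# read.csv() also sets comment.char = "", so the "#" in descriptions is kept.
data <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Parse the m/d/yy strings into Date objects.
data$Trans..Date <- as.Date(data$Trans..Date, format = "%m/%d/%y")
data$Post.Date   <- as.Date(data$Post.Date, format = "%m/%d/%y")

# Label month numbers (4 = April) in study order: April 2017 to March 2018.
month_order <- c("Apr", "May", "Jun", "Jul", "Aug", "Sep",
                 "Oct", "Nov", "Dec", "Jan", "Feb", "Mar")
data$MonthLabel <- factor(month.abb[data$Month], levels = month_order)

str(data)
```

Ordering the factor levels explicitly keeps April first on every chart axis, instead of the calendar order January to December.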

Q1: What is the total spending by month?


To compare the total amount spent in each month, a bar chart was used to summarize the data; each bar shows the total amount (in USD) spent in that month. From the plot, average monthly spending is around 500 dollars, and total spending was highest in July 2017, December 2017, and March 2018.
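
The monthly totals behind this chart can also be computed directly, which is a quick way to check a figure such as the roughly 500-dollar monthly average. A minimal base-R sketch, assuming a data frame with the numeric `Month` and `Amount` columns from the sample table; `monthly_totals()` is a hypothetical helper, and four April rows from the sample stand in for the full year.

```r
# Hypothetical helper: total Amount per month, ordered April to March.
monthly_totals <- function(df) {
  month_order <- c("Apr", "May", "Jun", "Jul", "Aug", "Sep",
                   "Oct", "Nov", "Dec", "Jan", "Feb", "Mar")
  # Map month numbers (4 = April) to labels in study order.
  label <- factor(month.abb[df$Month], levels = month_order)
  # split() keeps empty factor levels, so months without rows sum to 0.
  vapply(split(df$Amount, label), sum, numeric(1))
}

# Four rows copied from the "Sample raw data" table (all April 2017).
sample_rows <- data.frame(
  Month  = c(4, 4, 4, 4),
  Amount = c(10.33, 10.88, 10.85, 15.28)
)

totals <- monthly_totals(sample_rows)
print(totals)

# Average monthly spending, counting only months that have transactions.
print(mean(totals[totals != 0]))
```

Applied to the full export, the same helper should return one total per month of the study period.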

Q2: Which three categories cost the most?


For the same reason, a bar chart was used here. The category names are long and cannot be displayed along the horizontal axis, so the bar chart was rotated to run horizontally. Merchandise is the category I spend the most money on, at around $5,000, much higher than any other category. Restaurants is second, at around $1,300, and Supermarkets is third, at around $600.
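
The ranking can be read off the chart, but it can also be computed. A minimal base-R sketch using seven rows copied from the sample table; `top_categories()` is a hypothetical helper name. On this small slice of April 11 to 14 the order differs from the full-year ranking described above, since it covers only a few days.

```r
# Hypothetical helper: the n categories with the largest total Amount.
top_categories <- function(df, n = 3) {
  # Sum Amount within each category (as.character guards against factors).
  totals <- vapply(split(df$Amount, as.character(df$Category)), sum, numeric(1))
  head(sort(totals, decreasing = TRUE), n)
}

# Seven rows copied from the "Sample raw data" table (April 11-14, 2017).
sample_rows <- data.frame(
  Category = c("Merchandise", "Restaurants", "Restaurants", "Merchandise",
               "Restaurants", "Travel/ Entertainment", "Supermarkets"),
  Amount   = c(10.33, 10.88, 10.85, 15.28, 16.32, 50.58, 19.76)
)

top <- top_categories(sample_rows)
print(top)
```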

Q3: For each category, what is the spending by month?


For most months, spending stayed at a similar level. However, in July 2017, December 2017 and March 2018, it was higher than in the other months.
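
The per-category charts answer the same question as a single category-by-month table. A minimal base-R sketch using `xtabs()`, with five rows copied from the sample table; the `MonthLabel` column is a hypothetical addition that orders months from April to March.

```r
# Five rows copied from the "Sample raw data" table (all April 2017).
sample_rows <- data.frame(
  Month    = c(4, 4, 4, 4, 4),
  Category = c("Merchandise", "Restaurants", "Merchandise",
               "Supermarkets", "Restaurants"),
  Amount   = c(10.33, 10.88, 31.31, 11.99, 47.72)
)

# Hypothetical MonthLabel column: month numbers as labels, April first.
month_order <- c("Apr", "May", "Jun", "Jul", "Aug", "Sep",
                 "Oct", "Nov", "Dec", "Jan", "Feb", "Mar")
sample_rows$MonthLabel <- factor(month.abb[sample_rows$Month], levels = month_order)

# Rows are categories, columns are months; each cell is the summed Amount.
# Empty factor levels are kept, so every month appears as a column.
by_cat_month <- xtabs(Amount ~ Category + MonthLabel, data = sample_rows)
print(by_cat_month)
```

For the charts themselves, ggplot2's `facet_wrap(~ Category)` draws one panel per category from a single `ggplot()` call, which avoids repeating the plotting code for each category.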

Q4: In July 2017, December 2017 and March 2018, what is the spending in each category?


The plots show that, once again, Merchandise accounts for the most spending, except in July 2017, when Restaurants accounts for the most. Checking the detailed transactions shows that I ordered too much salad from Pret A Manger.
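
The same comparison can be made numerically by finding the largest category in each highlighted month. A minimal sketch; `top_category_by_month()` is a hypothetical helper, and matching on the month number alone assumes each number occurs in only one year, which holds for data running from April 2017 to March 2018. Five April rows from the sample table stand in for the full export.

```r
# Hypothetical helper: for each month number, the category with the
# largest total Amount that month (NA if the month has no rows).
top_category_by_month <- function(df, months) {
  result <- vapply(months, function(m) {
    rows <- df[df$Month == m, ]
    if (nrow(rows) == 0) return(NA_character_)
    totals <- vapply(split(rows$Amount, as.character(rows$Category)), sum, numeric(1))
    names(totals)[which.max(totals)]
  }, character(1))
  setNames(result, month.abb[months])
}

# Five rows copied from the "Sample raw data" table (all April 2017).
sample_rows <- data.frame(
  Month    = c(4, 4, 4, 4, 4),
  Category = c("Merchandise", "Restaurants", "Merchandise",
               "Merchandise", "Restaurants"),
  Amount   = c(10.33, 10.88, 15.28, 31.31, 47.72)
)

# April has data; July has none in this sample, so it comes back as NA.
res <- top_category_by_month(sample_rows, c(4, 7))
print(res)
```

With the full data frame, the call would be `top_category_by_month(data, c(7, 12, 3))`.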

Q5: Comparison: how does spending differ between 2017 and 2018?


Even though the time periods are not the same, I would like to compare each category across the two years. Since the purpose was to compare spending between 2017 and 2018, a grouped bar chart was used. The 2018 data covers only 3 months, while the 2017 data covers 9 months. In any case, the plot shows that spending in 2018 is much lower than in 2017, especially on Restaurants. The main reason is that the card was newly opened in 2017, and I was trying to use it more. What's more, I do not have much money to eat out, shop at department stores, travel, etc. Starting in 2018, I have to pay my tuition myself, so there is no extra money to spend and then pay off the balance :(
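
Because the 2018 figures cover 3 months and the 2017 figures cover 9, part of the gap between the bars can come from the length of each period. A minimal sketch of putting both years on a per-month basis, using values copied from the "Sample raw data by category" table and the month counts stated above; the `PerMonth` column is a hypothetical addition.

```r
# Values copied from the "Sample raw data by category" table.
comp <- data.frame(
  Category = rep(c("Merchandise", "Payments and Credits", "Restaurants"), each = 2),
  Amount   = c(64, 18, 16, 4, 123, 8),
  Year     = rep(c(2017, 2018), times = 3)
)

# Months covered by each year of the study period, as stated in the text:
# April-December 2017 (9 months) and January-March 2018 (3 months).
months_covered <- c("2017" = 9, "2018" = 3)

# Divide each yearly figure by the number of months it covers.
comp$PerMonth <- unname(comp$Amount / months_covered[as.character(comp$Year)])

# Categories as rows, years as columns, per-month values in the cells.
print(xtabs(PerMonth ~ Category + Year, data = comp))
```

Plotting `PerMonth` instead of `Amount` in the grouped bar chart would compare the two years at the same time scale.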

---
title: "ANLY 512 Final Project"
author: "Linfang Li"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate) 
```

### Overview of the Quantified Self movement

A project for ANLY512: Data Visualization

**The data I collected covers one year, from April 2017 to March 2018. In the last step, I compare the two years (2017 and 2018). Although the two periods are not the same length, the comparison gives a clear picture of what I have spent since the Discover account was opened.**

The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using a data-driven approach, I will create a summary that answers the following questions based on the data collected:

1) What is the total spending by month?
2) Which categories cost the most?
3) For each category, what is the spending by month?
4) In July 2017, December 2017 and March 2018, what is the spending in each category?
5) Comparison: how does spending differ between 2017 and 2018?

### Data preparation

```{r}
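# Raw Discover transaction export: one row per transaction
data <- read.csv("/Users/Wen/Downloads/Discover-AllAvailable-20180814.CSV")
kable(head(data, 15), caption = "Sample raw data")
# Category summary by year, used for the 2017 vs 2018 comparison in Q5
comp <- read.csv("/Users/Wen/Documents/Discover-AllAvailable-Category.CSV")
kable(head(comp, 14), caption = "Sample raw data by category")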
```

***

There are 7 variables we are interested in:

- Trans. Date: Transaction date
- Post Date: Posting date
- **Month (categorical)**
- Description: Detailed description of the purchase
- **Amount: The total amount spent (continuous)**
- Category: Spending category
- Year: 2017 or 2018 (the data runs from April 2017 to March 2018)


  
### Q1: What is the total spending by month?

```{r}
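# Month is stored as a number (4 = April), so label it and order the axis
# from April 2017 to March 2018
month_order <- c("Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar")
data$MonthLabel <- factor(month.abb[as.integer(as.character(data$Month))],
                          levels = month_order)

# geom_col() stacks all transactions in a month, so each bar is the monthly total
p <- ggplot(data, aes(x = MonthLabel, y = Amount)) +
  geom_col(fill = "LightGreen") +
  scale_x_discrete(drop = FALSE) +  # keep months with no spending on the axis
  labs(title = "Spending per Month from 2017 April to 2018 March", x = "Month", y = "Amount") +
  theme_minimal()
p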
```

***
To compare the total amount spent in each month, a bar chart was used to summarize the data.
Each bar shows the total amount (in USD) spent in that month.
From the plot, average monthly spending is around 500 dollars. Total spending was highest in July 2017, December 2017, and March 2018.

### Q2: Which three categories cost the most?

```{r}
# Horizontal bars, because the category names are too long for the x axis;
# geom_col() stacks all transactions in a category into one total bar
p <- ggplot(data, aes(x = Category, y = Amount)) +
  geom_col(fill = "LightGreen") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p
```

***
For the same reason, a bar chart was used here. The category names are long and cannot be displayed along the horizontal axis, so the bar chart was rotated to run horizontally.
Merchandise is the category I spend the most money on, at around $5,000, much higher than any other category. Restaurants is second, at around $1,300, and Supermarkets is third, at around $600.

### Q3: For each category, what is the spending by month?

```{r}
# Month is stored as a number (4 = April); label it in study order
# (April 2017 to March 2018)
month_order <- c("Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar")
data$MonthLabel <- factor(month.abb[as.integer(as.character(data$Month))],
                          levels = month_order)

categories <- c("Merchandise", "Restaurants", "Awards and Rebate Credits",
                "Department Stores", "Education", "Home Improvement", "Services",
                "Supermarkets", "Travel/ Entertainment", "Payments and Credits")

# One monthly bar chart per category, each titled with its own category name
for (cat_name in categories) {
  p <- ggplot(subset(data, Category == cat_name), aes(x = MonthLabel, y = Amount)) +
    geom_col(fill = "LightGreen") +
    scale_x_discrete(drop = FALSE) +  # keep months with no spending on the axis
    labs(title = cat_name, x = "Month", y = "Amount") +
    theme_minimal()
  print(p)  # plots created inside a loop must be printed explicitly
}
```

***
For most months, spending stayed at a similar level. However, in July 2017, December 2017 and March 2018, it was higher than in the other months.

### Q4: In July 2017, December 2017 and March 2018, what is the spending in each category?

```{r}
# Per-category spending in the three highest-spending months. Matching on the
# month number alone assumes each month occurs once in April 2017 to March 2018.
highlight <- c("7" = "July 2017", "12" = "December 2017", "3" = "March 2018")
for (m in names(highlight)) {
  p <- ggplot(subset(data, Month == m), aes(x = Category, y = Amount)) +
    geom_col(fill = "LightGreen") +
    labs(title = paste("Spending of", highlight[[m]], "per Category"),
         x = "Category", y = "Amount") +
    coord_flip()
  print(p)  # plots created inside a loop must be printed explicitly
}
```

***
The plots show that, once again, Merchandise accounts for the most spending, except in July 2017, when Restaurants accounts for the most.
Checking the detailed transactions shows that I ordered too much salad from Pret A Manger.

### Q5: Comparison: how does spending differ between 2017 and 2018?

```{r}
comp$Year <- as.factor(comp$Year)  # treat Year as a group, not a number
ggplot(data = comp, aes(x = Category, y = Amount, fill = Year)) +
  geom_col(position = "dodge") +   # side-by-side bars for each year
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```


***
Even though the time periods are not the same, I would like to compare each category across the two years.
Since the purpose was to compare spending between 2017 and 2018, a grouped bar chart was used.
The 2018 data covers only 3 months, while the 2017 data covers 9 months.
In any case, the plot shows that spending in 2018 is much lower than in 2017, especially on Restaurants.
The main reason is that the card was newly opened in 2017, and I was trying to use it more. What's more, I do not have much money to eat out, shop at department stores, travel, etc. Starting in 2018, I have to pay my tuition myself, so there is no extra money to spend and then pay off the balance :(