Project Description

The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses a collection of 2 years of data on spending and payments captured by discover credit card.

The scope of thie project is visualize my personal creditcard expense using the flexdashboard,ggplots and other visualization package in RStudio.

To begin with, I would like to list five questions that I will use the data visualization to depict and unfold my analysis

  1. What is the total spending by month in for the past year?
  2. What is the top 10 merchant that I vist the most for the past year?
  3. What is the top 10 merchat that I spent the most for the past year?
  4. In which dates I spend the most in past year ?
  5. What is the top 10 Merchant Location I spend the most by month for the past year?
  6. What is the total expense per week of day for the past year?

Data Preprocessing

Data Description:

Sample data
Date Reference_Number Payee Address Amount
2019-05-25 24431069145400620001877 MITSUWA-GONBEI NJ EDGEWATER NJ EDGEWATER NJ 13.22
2019-05-25 24431069145207106801195 MITSUWA-ITO-EN NJ EDGEWATER NJ EDGEWATER NJ 9.75
2019-05-27 24427339146710010793368 CHICK-FIL-A #01064 NEWARK DE NEWARK DE 21.67
2019-05-27 24391229146091003000029 MICROSOFT - 16 NEWARK NEWARK DE NEWARK DE 599.00
2019-05-27 24492159145855376953941 SQ *MR.WISH NEWARK DE NEWARK DE 5.00
2019-05-27 24445009145500571247401 TST* SUNMERRY BAKERY - FTFORT LEE NJ FORT LEE NJ 19.00
2019-05-28 24431069148091687000873 TEA HEART INC LIVERMORE CA LIVERMORE CA 10.38
2019-05-28 24431069148207515900569 PRADA LIVERMORE CA LIVERMORE CA 437.00
2019-05-28 24138299148286199901233 WETZEL’S PRETZELS-LIV2 LIVERMORE CA LIVERMORE CA 17.99
2019-05-29 24231689149091871000222 PUNTA CANA RESTAURANT FORT LEE NJ FORT LEE NJ 38.46
2019-05-29 24431069148859104547019 GODIVA 266 NEWARK DE NEWARK DE 6.95
2019-05-30 24559309149900015400271 WOK BAR FORT LEE NJ FORT LEE NJ 52.00
2019-05-30 24692169150100737105009 THE UPS STORE 3070 FORT LEE NJ FORT LEE NJ 10.16
2019-05-31 24022449150900011100593 HAYWOOD SMOKEHOUSE WAYNESVILLE NC WAYNESVILLE NC 94.00
2019-05-31 24760629150470002499563 90319 Laundry Card FORT LEE NJ FORT LEE NJ 15.00
2019-05-31 24299109150004059245919 MARATHON PETRO192591 ASHEVILLE NC ASHEVILLE NC 1.92
2019-05-31 24299109150004059245927 MARATHON PETRO192591 ASHEVILLE NC ASHEVILLE NC 6.41
2019-06-01 24445009151300442198789 CKE*SMOKY MOUNTAIN T 410 GATLINBURG TN GATLINBURG TN 116.00
2019-06-01 24493989151206499100582 SIERRA NEVADA TAPRESTNC MILLS RIVER NC MILLS RIVER NC 82.00
2019-06-03 24492159153740279809987 SQ *BEAN N BEAN FORT LEE NJ FORT LEE NJ 3.73
2019-06-03 24431069153083353331874 AMZN MKTP US*M69GN9N01 AMAMZN.COM/BILLWA AMZN.COM/BILL WA 22.37

Q1:What is the total spending by month in for the past year?


From this graph above, I find that I will pay less during the spring. I might spend a lot during the pre holiday such as November, Februry. My expense spike on August,since I just move my apartment during that month.

Q2:What is the top 10 merchant that I vist the most for the past year?


The top visit is Costco, where I shop for most of my grocery.

Q3:What is the top 10 merchat that I spent the most for the past year?


The top expense is also Costco. But, we can also see the car insurance and other one time purchase on my list.

Q4:In which dates I spend the most in past year ?


In this graph, I can zoom in to see the specific date, I make high amount expense. For example, on 2020-02-16 I purchase a new car insurance on Sunday, but the cost is posted on 2020-02-17.

Q5:What is the top 10 Merchant Location I spend the most by month for the past year?


From this graph, I can see where I mostly visit and make purchase. As a commuter between New York and New Jersey, it really make sense to me.

Q6:What is the total expense per week of day for the past year?


From this graph, I can see the Monday category spike, However, I notices that all Sunday charge is post on Mondays. However, I do notice that I didn’t spend a lot on Saturday and Friday.

---
title: "Anly512_Quantified_Self"
author: "JIANWEI LI"
date: "4/21/2020"

output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source_code: embed
    vertical_layout: fill
---

```{r setup,include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(flexdashboard)
library(knitr)
library(readxl)
library(dplyr)
library(tidyverse)
library(ggplot2)

```


### Project Description

The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses a collection of 2 years of data on spending and payments captured by discover credit card.

The scope of thie project is visualize my personal creditcard expense using the flexdashboard,ggplots and other visualization package in RStudio. 

To begin with, I would like to list five questions that I will use the data visualization to depict and unfold my analysis 

1) What is the total spending by month in for the past year?
2) What is the top 10 merchant that I vist the most for the past year?
3) What is the top 
10 merchat that I spent the most for the past year?
4) In which dates I spend the most in past year ?
5) What is the top 10 Merchant Location I spend the most by month for the past year?
6) What is the total expense per week of day for the past year?

### Data Preprocessing

Data Description:

- Date: Transaction Posted Date
- Reference_Number: Unique Transaction reference Number
- Payee: Merchant's Name
- Address: Merchant's Location
- Amount: Payment 

```{r input,echo=FALSE}
setwd('C:/Users/JIANWEI LI/Documents/Rstudio/HU/512/FinalProject/')

data <-
    list.files(path = "./data/",
               pattern = "*.csv", 
               full.names = T) %>% 
    map_df(~read_csv(., col_types = cols(.default = "c"))) 


data$`Posted Date`<-as.Date(data$`Posted Date`,"%m/%d/%Y")
data$Amount<- as.numeric(data$Amount)
data$Amount = -data$Amount
data<-data[data$Amount>0,]

# sort by Posted Date
data<-data[order(data$`Posted Date`),]
colnames(data)[1]<-'Date'
colnames(data)[2]<-'Reference_Number'

kable(data[80:100,],caption = 'Sample data')

```


### Q1:What is the total spending by month in for the past year?
```{r, echo=FALSE}

library(lubridate)
library(zoo)
data$month<-month(as.POSIXlt(data$Date,format = '%d/%m/%Y'))
data$year<- year(as.POSIXlt(data$Date,format = '%d/%m/%Y'))



monthlyAmount<-aggregate(Amount~year+month,data,sum)

monthlyAmount<- monthlyAmount[
  with(monthlyAmount,order(year,month)),
]



monthlyAmount<-within(monthlyAmount, Date <- sprintf("%d-%02d", year, month))

monthlyAmount<- monthlyAmount[2:13,]


p<- ggplot(monthlyAmount, aes(x = month, y=Amount)) + 
  geom_bar(stat = "identity", fill = "Blue")+
  scale_x_discrete(limits=monthlyAmount$Date) +
  labs(title = "Spending per Month", x = "Month", y = "Total Amount") +
  theme_minimal()
p+theme(axis.text.x=element_text(angle=90))


```

***

From this graph above, I find that I will pay less during the spring. I might spend a lot during the pre holiday such as November, Februry. My expense spike on August,since I just move my apartment during that month.

### Q2:What is the top 10 merchant that I vist the most for the past year?
```{r, echo=FALSE}


q2<-aggregate(Amount~Payee,data,FUN = length)
q2<-q2[order(-q2$Amount),]
q2<-q2[c(1:10),]

ggplot(q2, mapping = aes(x = reorder(Payee, Amount), Amount)) + 
  geom_bar(stat = "identity")+
  labs(title = "Top 10 Visit Merchant", x = "Merchant", y = "Total Amount") +
  coord_flip()+
  theme_minimal()


```

***

The top visit is Costco, where I shop for most of my grocery.


### Q3:What is the top 10 merchat that I spent the most for the past year?
```{r, echo=FALSE}

q3<-aggregate(Amount~Payee,data,FUN = sum)
q3<-q3[order(-q3$Amount),]
q3<-q3[c(1:10),]

ggplot(q3, mapping = aes(x = reorder(Payee, Amount), Amount)) + 
  geom_bar(stat = "identity")+
  labs(title = "Top 10 Merchant by Expense", x = "Merchant", y = "Total Amount") +
  coord_flip()+
  theme_minimal()

```

***

The top expense is also Costco. But, we can also see the car insurance and other one time purchase on my list.

### Q4:In which dates I spend the most in past year ?
```{r, echo=FALSE}

q4<-aggregate(Amount~Date,data,FUN = sum)
q4<-q4[order(-q4$Amount),]
q4<-q4[c(1:20),]

ggplot(q4, mapping = aes(x = reorder(Date, Amount), Amount)) + 
  geom_bar(stat = "identity")+
  labs(title = "Top 20 Date by Expense", x = "Date", y = "Total Amount") +
  coord_flip()+
  theme_minimal()

```

***

In this graph, I can zoom in to see the specific date, I make high amount expense. For example, on 2020-02-16 I purchase a new car insurance on Sunday, but the cost is posted on 2020-02-17. 

### Q5:What is the top 10 Merchant Location I spend the most by month for the past year?
```{r, echo=FALSE}

q5<-aggregate(Amount~Address,data,FUN = length)
q5<-q5[order(-q5$Amount),]
q5<-q5[c(1:10),]

ggplot(q5, mapping = aes(x = reorder(Address, Amount), Amount)) + 
  geom_bar(stat = "identity")+
  labs(title = "Top 10 Visit Merchant", x = "Merchant", y = "Total Amount") +
  coord_flip()+
  theme_minimal()

```

***

From this graph, I can see where I mostly visit and make purchase. As a commuter between New York and New Jersey, it really make sense to me.

### Q6:What is the total expense per week of day for the past year?
```{r, echo=FALSE}
library(dbplyr)

data$weekday<- weekdays(as.Date(data$Date))

q6<-aggregate(Amount~weekday,data,FUN = sum)
q6<-q6[order(-q6$Amount),]



ggplot(q6, mapping = aes(x = reorder(weekday, Amount), Amount)) + 
  geom_bar(stat = "identity")+
  labs(title = "Top 10 Visit Merchant", x = "Merchant", y = "Total Amount") +
  coord_flip()+
  theme_minimal()

```

***

From this graph, I can see the Monday category spike, However, I notices that all Sunday charge is post on Mondays. However, I do notice that I didn't spend a lot on Saturday and Friday.