For this project, I am using a collection of 1 years of data on spending on my credit card, and my goal of this project is to collect, analyze and visualize the credit card data using the tools and methods covered in the class, and give answers to the questions below:
The scope of thie project is visualize my personal creditcard expense using the flexdashboard,ggplots and other visualization package in RStudio. To begin with, I would like to list five questions that I will use the data visualization to depict and unfold my analysis
Data Description:
| Date | Reference_Number | Payee | Address | Amount |
|---|---|---|---|---|
| 2019-05-25 | 24431069145400620001877 | MITSUWA-GONBEI NJ EDGEWATER NJ | EDGEWATER NJ | 13.22 |
| 2019-05-25 | 24431069145207106801195 | MITSUWA-ITO-EN NJ EDGEWATER NJ | EDGEWATER NJ | 9.75 |
| 2019-05-27 | 24427339146710010793368 | CHICK-FIL-A #01064 NEWARK DE | NEWARK DE | 21.67 |
| 2019-05-27 | 24391229146091003000029 | MICROSOFT - 16 NEWARK NEWARK DE | NEWARK DE | 599.00 |
| 2019-05-27 | 24492159145855376953941 | SQ *MR.WISH NEWARK DE | NEWARK DE | 5.00 |
| 2019-05-27 | 24445009145500571247401 | TST* SUNMERRY BAKERY - FTFORT LEE NJ | FORT LEE NJ | 19.00 |
| 2019-05-28 | 24431069148091687000873 | TEA HEART INC LIVERMORE CA | LIVERMORE CA | 10.38 |
| 2019-05-28 | 24431069148207515900569 | PRADA LIVERMORE CA | LIVERMORE CA | 437.00 |
| 2019-05-28 | 24138299148286199901233 | WETZEL’S PRETZELS-LIV2 LIVERMORE CA | LIVERMORE CA | 17.99 |
| 2019-05-29 | 24231689149091871000222 | PUNTA CANA RESTAURANT FORT LEE NJ | FORT LEE NJ | 38.46 |
| 2019-05-29 | 24431069148859104547019 | GODIVA 266 NEWARK DE | NEWARK DE | 6.95 |
| 2019-05-30 | 24559309149900015400271 | WOK BAR FORT LEE NJ | FORT LEE NJ | 52.00 |
| 2019-05-30 | 24692169150100737105009 | THE UPS STORE 3070 FORT LEE NJ | FORT LEE NJ | 10.16 |
| 2019-05-31 | 24022449150900011100593 | HAYWOOD SMOKEHOUSE WAYNESVILLE NC | WAYNESVILLE NC | 94.00 |
| 2019-05-31 | 24760629150470002499563 | 90319 Laundry Card FORT LEE NJ | FORT LEE NJ | 15.00 |
| 2019-05-31 | 24299109150004059245919 | MARATHON PETRO192591 ASHEVILLE NC | ASHEVILLE NC | 1.92 |
| 2019-05-31 | 24299109150004059245927 | MARATHON PETRO192591 ASHEVILLE NC | ASHEVILLE NC | 6.41 |
| 2019-06-01 | 24445009151300442198789 | CKE*SMOKY MOUNTAIN T 410 GATLINBURG TN | GATLINBURG TN | 116.00 |
| 2019-06-01 | 24493989151206499100582 | SIERRA NEVADA TAPRESTNC MILLS RIVER NC | MILLS RIVER NC | 82.00 |
| 2019-06-03 | 24492159153740279809987 | SQ *BEAN N BEAN FORT LEE NJ | FORT LEE NJ | 3.73 |
| 2019-06-03 | 24431069153083353331874 | AMZN MKTP US*M69GN9N01 AMAMZN.COM/BILLWA | AMZN.COM/BILL WA | 22.37 |
From this graph above, I find that I will pay less during the spring. I might spend a lot during the pre holiday such as November, Februry. My expense spike on August,since I just move my apartment during that month.
The top visit is Costco, where I shop for most of my grocery.
The top expense is also Costco. But, we can also see the car insurance and other one time purchase on my list.
In this graph, I can zoom in to see the specific date, I make high amount expense. For example, on 2020-02-16 I purchase a new car insurance on Sunday, but the cost is posted on 2020-02-17.
From this graph, I can see where I mostly visit and make purchase. As a commuter between New York and New Jersey, it really make sense to me.
From this graph, I can see the Monday category spike, However, I notices that all Sunday charge is post on Mondays. However, I do notice that I didn’t spend a lot on Saturday and Friday.
---
title: "Anly512_Quantified_Self"
author: "JIANWEI LI"
date: "4/21/2020"
output:
flexdashboard::flex_dashboard:
storyboard: true
social: menu
source_code: embed
vertical_layout: fill
---
```{r setup,include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(flexdashboard)
library(knitr)
library(readxl)
library(dplyr)
library(tidyverse)
library(ggplot2)
```
### Project Description
For this project, I am using a collection of 1 years of data on spending on my credit card, and my goal of this project is to collect, analyze and visualize the credit card data using the tools and methods covered in the class, and give answers to the questions below:
The scope of thie project is visualize my personal creditcard expense using the flexdashboard,ggplots and other visualization package in RStudio. To begin with, I would like to list five questions that I will use the data visualization to depict and unfold my analysis
1) What is the total spending by month in for the past year?
2) What is the top 10 merchant that I vist the most for the past year?
3) What is the top
10 merchat that I spent the most for the past year?
4) In which dates I spend the most in past year ?
5) What is the top 10 Merchant Location I spend the most by month for the past year?
6) What is the total expense per week of day for the past year?
### Data Preprocessing
Data Description:
- Date: Transaction Posted Date
- Reference_Number: Unique Transaction reference Number
- Payee: Merchant's Name
- Address: Merchant's Location
- Amount: Payment
```{r input,echo=FALSE}
setwd('C:/Users/JIANWEI LI/Documents/Rstudio/HU/512/FinalProject/')
data <-
list.files(path = "./data/",
pattern = "*.csv",
full.names = T) %>%
map_df(~read_csv(., col_types = cols(.default = "c")))
data$`Posted Date`<-as.Date(data$`Posted Date`,"%m/%d/%Y")
data$Amount<- as.numeric(data$Amount)
data$Amount = -data$Amount
data<-data[data$Amount>0,]
# sort by Posted Date
data<-data[order(data$`Posted Date`),]
colnames(data)[1]<-'Date'
colnames(data)[2]<-'Reference_Number'
kable(data[80:100,],caption = 'Sample data')
```
### Q1:What is the total spending by month in for the past year?
```{r, echo=FALSE}
library(lubridate)
library(zoo)
data$month<-month(as.POSIXlt(data$Date,format = '%d/%m/%Y'))
data$year<- year(as.POSIXlt(data$Date,format = '%d/%m/%Y'))
monthlyAmount<-aggregate(Amount~year+month,data,sum)
monthlyAmount<- monthlyAmount[
with(monthlyAmount,order(year,month)),
]
monthlyAmount<-within(monthlyAmount, Date <- sprintf("%d-%02d", year, month))
monthlyAmount<- monthlyAmount[2:13,]
p<- ggplot(monthlyAmount, aes(x = month, y=Amount)) +
geom_bar(stat = "identity", fill = "Blue")+
scale_x_discrete(limits=monthlyAmount$Date) +
labs(title = "Spending per Month", x = "Month", y = "Total Amount") +
theme_minimal()
p+theme(axis.text.x=element_text(angle=90))
```
***
From this graph above, I find that I will pay less during the spring. I might spend a lot during the pre holiday such as November, Februry. My expense spike on August,since I just move my apartment during that month.
### Q2:What is the top 10 merchant that I vist the most for the past year?
```{r, echo=FALSE}
q2<-aggregate(Amount~Payee,data,FUN = length)
q2<-q2[order(-q2$Amount),]
q2<-q2[c(1:10),]
ggplot(q2, mapping = aes(x = reorder(Payee, Amount), Amount)) +
geom_bar(stat = "identity")+
labs(title = "Top 10 Visit Merchant", x = "Merchant", y = "Total Amount") +
coord_flip()+
theme_minimal()
```
***
The top visit is Costco, where I shop for most of my grocery.
### Q3:What is the top 10 merchat that I spent the most for the past year?
```{r, echo=FALSE}
q3<-aggregate(Amount~Payee,data,FUN = sum)
q3<-q3[order(-q3$Amount),]
q3<-q3[c(1:10),]
ggplot(q3, mapping = aes(x = reorder(Payee, Amount), Amount)) +
geom_bar(stat = "identity")+
labs(title = "Top 10 Merchant by Expense", x = "Merchant", y = "Total Amount") +
coord_flip()+
theme_minimal()
```
***
The top expense is also Costco. But, we can also see the car insurance and other one time purchase on my list.
### Q4:In which dates I spend the most in past year ?
```{r, echo=FALSE}
q4<-aggregate(Amount~Date,data,FUN = sum)
q4<-q4[order(-q4$Amount),]
q4<-q4[c(1:20),]
ggplot(q4, mapping = aes(x = reorder(Date, Amount), Amount)) +
geom_bar(stat = "identity")+
labs(title = "Top 20 Date by Expense", x = "Date", y = "Total Amount") +
coord_flip()+
theme_minimal()
```
***
In this graph, I can zoom in to see the specific date, I make high amount expense. For example, on 2020-02-16 I purchase a new car insurance on Sunday, but the cost is posted on 2020-02-17.
### Q5:What is the top 10 Merchant Location I spend the most by month for the past year?
```{r, echo=FALSE}
q5<-aggregate(Amount~Address,data,FUN = length)
q5<-q5[order(-q5$Amount),]
q5<-q5[c(1:10),]
ggplot(q5, mapping = aes(x = reorder(Address, Amount), Amount)) +
geom_bar(stat = "identity")+
labs(title = "Top 10 Visit Merchant", x = "Merchant", y = "Total Amount") +
coord_flip()+
theme_minimal()
```
***
From this graph, I can see where I mostly visit and make purchase. As a commuter between New York and New Jersey, it really make sense to me.
### Q6:What is the total expense per week of day for the past year?
```{r, echo=FALSE}
library(dbplyr)
data$weekday<- weekdays(as.Date(data$Date))
q6<-aggregate(Amount~weekday,data,FUN = sum)
q6<-q6[order(-q6$Amount),]
ggplot(q6, mapping = aes(x = reorder(weekday, Amount), Amount)) +
geom_bar(stat = "identity")+
labs(title = "Top 10 Visit Merchant", x = "Merchant", y = "Total Amount") +
coord_flip()+
theme_minimal()
```
***
From this graph, I can see the Monday category spike, However, I notices that all Sunday charge is post on Mondays. However, I do notice that I didn't spend a lot on Saturday and Friday.