Overview

Project Objective

Develop a visualization dashboard based on a series of data about your own life. As someone who is horrible at keep track of my own spending habit, I hope to use this opportunity to take a closer and deeper look into my spending patterns, percentages and amounts, therefore I choose the data of 6-month credit card payment records from Oct-2022 to Mar-2023. The hope is to discover some insights on how I can more efficiently use my money, and where I can reduce my expense effectively.

Data Collection Methods

I managed to pull my credit card statement with an export of transaction date and post date, description, category, type and amount. I then further process the Excel file to clean up unwanted rows and columns to have a more focused data collection of Transaction Date / Category / Sales Amount.

Project Questions

  1. What’s my frequency of using credit card each month?
  2. Is there a relationship between transaction counts and total spending?
  3. What’s are the categories that I spend the most money in?
  4. Is there any hidden spending patterns in the categories?
  5. Pick 2 top spending categories and compare the spending patterns for insights on spending habits.

Usage Frequency

Column

Month

Number vs. Amount

Column

Transaction counts vs. total spending

Spending Categories

Column

Categories

Spending Pattern

Column

Shopping

Food & Drink

Groceries

Comparing Patterns

Column

Comparing Food&Drink / Travel

---
title: "ANLY 512 - Course Project"
author: "Ye Shen"
output: 
 flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: embed
date: "2023-04-17"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_knit$set(root.dir = "/Users/SHenmou/Downloads")
library(flexdashboard)
library(DT)
library(ggplot2)
library(dplyr)
library(dygraphs)
library(tidyverse)
library(lubridate)
library(plotly)

data=read.csv("/Users/SHenmou/Downloads/credit.csv")
data$Transaction.Date <- as.Date(data$Transaction.Date, format="%m/%d/%y")
```


Overview
===================================== 

### Project Objective
Develop a visualization dashboard based on a series of data about your own life. As someone who is horrible at keep track of my own spending habit, I hope to use this opportunity to take a closer and deeper look into my spending patterns, percentages and amounts, therefore I choose the data of 6-month credit card payment records from Oct-2022 to Mar-2023. The hope is to discover some insights on how I can more efficiently use my money, and where I can reduce my expense effectively. 

### Data Collection Methods
I managed to pull my credit card statement with an export of transaction date and post date, description, category, type and amount. I then further process the Excel file to clean up unwanted rows and columns to have a more focused data collection of Transaction Date / Category / Sales Amount.

### Project Questions
1. What's my frequency of using credit card each month? 
2. Is there a relationship between transaction counts and total spending?
3. What's are the categories that I spend the most money in?
4. Is there any hidden spending patterns in the categories?
5. Pick 2 top spending categories and compare the spending patterns for insights on spending habits.



```{r cars}
```


Usage Frequency
===================================== 

Inputs {.sidebar}
-------------------------------------

### Question
What's my frequency of using credit card in each month? 

### Summary
From the data and the visualization we can see my usage frequency of my credit card varies from as high as 88 counts to as low as 40 counts, but has an average of 65 counts each month. This shows that I at least uses my credit card twice daily. 

It is shown on the visualization that the time of spending occurred more during the first 3 months. This reflects the excitement and increase of usage since I just opened this credit card in September 2022. After then my usage count is showing a trend of decrease, cooling off from the purchases made in previous months. 

However, A travel was planned for Feb so we see the spike of usage counts in the month of February. From there I also noticed a pattern: in the month of Oct and Feb, where the travel spending occurs more frequently, we also see food and drink usage counts increases in the 2 months. This makes sense as I am much more likely to eat out while traveling. 



Column {.tabset}
-------------------------------------
### Month
```{r pressure, echo=FALSE}
data$Transaction.Month.Year <- lubridate::floor_date(data$Transaction.Date, "month")

bar1 <- ggplot (data, aes(x = Transaction.Month.Year, fill = Category)) +
  geom_bar() +
  theme_bw() +
  theme(axis.text.x=element_text(angle = -45, hjust = 0)) +
  labs(title = "Credit Card Usage By Month", 
       x = "Month-Year", 
       y = "Count")
bar1
```



Number vs. Amount
===================================== 

Inputs {.sidebar}
-------------------------------------

### Question
Is there a relationship between transaction counts and total spending?

### Summary
Being curious about the summary from the last question, I hope to dig deeper to see whether there's a connection between the transaction amount of the day and the spending value. By comparing the visualization of "Transaction per day" and "Amount spend", we sees that higher transaction counts does not translate to higher amount of spending. This is reasonable since my credit card is not only used to book travel tickets (singular, high value transaction) but also used to purchase subway tickets while traveling to NYC (repetitive, low value transactions).

We can also observe that in most days where the credit card was used, the usage counts mostly lays under 5, with lots of 2s and 3d, which corresponds to the calculation in previous page. 




Column {.tabset}
-------------------------------------
### Transaction counts vs. total spending

```{r}
scale = 100
ggplot(data = data)+
  geom_bar(aes(x = Transaction.Date))+ 
  xlab("Date")+
  ylab("Transactions per Day (black)")+
  geom_line(aes(x = Transaction.Date, y = Amount/scale), size = 0.5, color="red", group = 1)+
  scale_y_continuous(sec.axis = sec_axis(~./scale, name = "Amount Spent (Thousands, red)"))+
  theme_bw()
  
```




Spending Categories
===================================== 

Inputs {.sidebar}
-------------------------------------

### Question
What's are the categories that I spend the most money in?

### Summary
It's a straight forward observation that I spend the most money on the categories of traveling and shopping, with the third one being food and drink.

My most instinctive take-away is that the amount spend on shopping could very much be reduced. Unlike traveling expense which is highly focused on transportation and lodging which are more strategically planned ahead, spending on shopping happens more spontaneously thus is harder to intervene without setting straight my spending mindset. 
Therefore I think working on reducing the amount I spend on shopping can effectively reduce my overall cost. Reducing shopping spending allows me to allocate more towards traveling to have better travel experiences. 

On the other hand, the amount I spend on food and drink is almost triple the amount of grocery shopping. This is also a good alarm to hint at me that if I spend more time cooking at home which is more economical anyway, I can reduce my chance of eating out, thus saving a good amount in the food and drink category. 

Column {.tabset}
-------------------------------------
### Categories
```{r}
bar2 <- ggplot(data, aes(x="Category", y = Amount, fill = Category)) +
  geom_bar(stat="identity", width=1) +
  theme_bw() +
  labs(title = "Credit Card Spending By Category", 
       x = "Cateogry")

pie <- bar2 + 
  coord_polar("y", start = 0) +
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid  = element_blank())
pie

```

Spending Pattern
===================================== 

Inputs {.sidebar}
-------------------------------------

### Question
Is there any hidden spending patterns in the categories?

### Summary
Shopping Category:
Around the end of November I invested in a digital embroidery machine as partly my own birthday gift and partly an investment for my streetwear brand. By observing that singular spike we can see that I actually have a quite controlled spending habit when it comes to shopping.

Food & Drink:
From the visualization we can almost see that most data bounces between close to zero and 75, with a few exceptions spiking over 150 and even over 300. Data close to zero are most likely beverage purchases outside.Since this is a joint credit card of my partner and I, we can see the average spending of us eating out sits at 52. This means that in average, we each spend 26 dollars for a meal when we eat out. In the current state of inflation, it seems like a bad idea economically to keep eating out. From this pattern in general, I can see major potential of saving by eating out less. 

Groceries:
Grocery shopping, on the other hand, seems to form a quite periodically consistent pattern, which correspond to me doing grocery shopping about once a week. It would be worth conducting a test of a month with all grocery spending instead of eating-out spending.



Column {.tabset}
-------------------------------------

### Shopping
```{r}
shopping <- data[data$Category == "Shopping", ]
ggplot(shopping, aes(x=Transaction.Date, y=Amount)) +
  geom_line(color="cyan", size=1, alpha=0.9, linetype=1) +
  ggtitle("Card Charges in Shopping From Oct 2022 to Mar 2023")
```


### Food & Drink
```{r}
FoodandDrink <- data[data$Category == "Food & Drink", ]
ggplot(FoodandDrink, aes(x=Transaction.Date, y=Amount)) +
  geom_line(color="pink", size=1, alpha=0.9, linetype=1) +
  ggtitle("Card Charges in Food & Drink From Oct 2022 to Mar 2023")
```


### Groceries
```{r}
Groceries <- data[data$Category == "Groceries", ]
ggplot(Groceries, aes(x=Transaction.Date, y=Amount)) +
  geom_line(color="green", size=1, alpha=0.9, linetype=1) +
  ggtitle("Card Charges in Groceries From Oct 2022 to Mar 2023")
```



Comparing Patterns
===================================== 

Inputs {.sidebar}
-------------------------------------

### Question
Pick 2 top spending categories and compare the spending patterns for insights on spending habits.

### Summary
From the data and the visualization we can see while travel category has more concentrated higher expenses that reads flights and hotel bookings, Food&Drink has a much higher frequency and consistency with lower mean, as shown by lines indicating average spend of 52 on dining and 234 for traveling.

Moreover, I aimed to look for a relation between travel spending and Food&Drink spending by observing how the data points are groups. However, there shows no sign that more and higher F&D expenses occurred during when travel expenses were made. This insights consolidated the theory mentioned before that much savings and cost reduction can be worked on in the category of Food & Drink during regular times (non-travel) and allocate the spending towards travel time to bettwe treat myself without burning a hole in my pocket.


Column {.tabset}
-------------------------------------
### Comparing Food&Drink / Travel

```{r}
spend_food <- subset(data, Category == "Food & Drink")
food <- plot_ly(spend_food, x = spend_food$"Transaction.Date", y = spend_food$Amount, name = "Food & Drink", type = 'scatter', mode = 'markers')
spend_travel <- subset(data, Category == "Travel")
food %>% add_trace(spend_travel, x = spend_travel$"Transaction.Date", y = spend_travel$Amount, name ="Travel",type = 'scatter',mode = 'markers') %>%
  add_lines(
    y = mean(spend_food$Amount),
    x = range(spend_food$"Transaction.Date"),
    name ="Food & Drink Mean Spending",
    line = list(
      color = "blue"
    ),
    inherit = FALSE,
    showlegend = TRUE
  )%>%
  add_lines(
    y = mean(spend_travel$Amount),
    x = range(spend_food$"Transaction.Date"),
    name ="Travel Mean Spending",
    line = list(
      color = "orange"
    ),
    inherit = FALSE,
    showlegend = TRUE
  )%>%  
  layout(xaxis = list(title = 'Dates'),
         yaxis = list(title = 'Spendings by category'))
```