---
title: "The Power of Credit Card - Final Project ANLY 512"
author: "Yang Lu, Jiayi Zhang, Yingwen Xue"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    social: menu
    storyboard: true
    source: embed
    vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
#install.packages("ggplot2")
library(flexdashboard)
library(ggplot2)
library(tidyverse)
```
### Introduction
To get a feel for how **credit cards** shape our spending habits, and how those habits can turn into wasted money, we collected the credit card purchase history of the three members of our group. In this project we first look at the distribution of spending to explore those habits, and then dig deeper into the data to find out which factors most strongly affect people's spending.
The dataset was downloaded from chase.com and covers **1.1.2018 to 12.31.2018**. The source is our online bank accounts, and it was surprisingly easy to get fairly clean datasets in CSV format.
Next, let's take a look at the dataset and the five questions that follow.
### Data variables and Five questions
- **Variables**
    - ID = Identifier of the group member who owns the card
    - PostDate = Exact date the transaction was settled
    - Month = Represented by number; e.g. April is 4.
    - Weekday = Day of the week of the transaction
    - Amount
    - Category: Travel, Shopping, Food & Drink, Groceries, Entertainment, Health & Wellness, Bills & Utilities
- **Questions**
    - How is the spending amount affected by the month? Is it related to holidays or to other reasons?
    - What do these people's typical purchases look like in terms of category?
    - How often do they use their credit cards?
    - On which day of the week do people tend to spend more?
    - Do these three members have different eating habits? Or a different food culture in their cities?
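
The Month and Weekday columns used in the charts can be derived from PostDate. The following is a minimal sketch, assuming PostDate is stored as text in month/day/year form; the sample dates are made up for illustration and are not taken from our data.

```r
# Hypothetical posting dates (illustrative only, not from the real export)
toy <- data.frame(PostDate = c("01/02/2018", "04/13/2018", "12/24/2018"),
                  stringsAsFactors = FALSE)

# Assumption: dates are written as month/day/year
d <- as.Date(toy$PostDate, format = "%m/%d/%Y")

# Month as a number, e.g. April is 4
toy$Month <- as.integer(format(d, "%m"))

# Weekday as a locale-independent number: 1 = Monday ... 7 = Sunday
toy$Weekday <- as.integer(format(d, "%u"))

toy
```

Using the numeric `%u` code instead of weekday names keeps the result independent of the system language and gives the bars a natural Monday-to-Sunday order.
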
### How is the spending amount affected by the month? Is it related to holidays or to other reasons?
```{r}
# Load the prepared credit card dataset
data <- read.csv(file="C:/Users/jlu/Downloads/ALNY_512_Raw.csv", header=TRUE)
# Keep only rows with a negative Amount and flip the sign so spending is positive
data <- subset(data, data$Amount<0)
data$Amount <- data$Amount*-1
# 1. Total expense per month and individual (Category = month, Group.2 = ID)
summon0 <- aggregate(data$Amount, by=list(Category=data$Month, data$ID), FUN=sum)
summon <- summon0[order(summon0$Group.2, summon0$x),]
summon$Category <- as.factor(summon$Category)
summon$id <- as.factor(summon$Group.2)
# Grouped bar chart: one bar per individual within each month
p <- ggplot(data=summon, aes(x=Category, y=x, fill=id)) +
  geom_bar(stat="identity", color="black", position=position_dodge()) +
  theme_minimal() +
  labs(title="Total in Month - 01/01/2018 to 12/31/2018", y = "Total Expense", x = "Month")
# Use custom colors
p + scale_fill_manual(values=c("#E69F00", "#56B4E9", "#009E73"))
```
***
- How is the spending amount affected by the month? Is it related to holidays or to other reasons?
To answer this question, we use a bar chart to compare the three people's total spending by month.
The chart shows that more money is spent in November, December, January and August. Since the main holidays (Thanksgiving, Christmas and New Year's Eve) fall in November, December and January, we conclude that people's spending appears closely related to the holidays, with August as the one peak that these holidays do not explain.
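
One way to put a number on the holiday effect is to compute the share of annual spending that falls in November, December and January. This is a minimal sketch with made-up monthly totals; the amounts and the `holiday_months` vector are hypothetical and are not our results.

```r
# Hypothetical monthly totals for one individual (illustrative numbers only)
monthly <- data.frame(Month = 1:12,
                      Amount = c(900, 400, 450, 500, 480, 520,
                                 510, 850, 470, 490, 950, 1100))

# Assumption: the months that contain the main holidays
holiday_months <- c(11, 12, 1)

# Share of the yearly total spent in the holiday months
holiday_share <- sum(monthly$Amount[monthly$Month %in% holiday_months]) /
  sum(monthly$Amount)

# If spending were spread evenly, three months would account for 3/12 = 25%
round(holiday_share, 3)
```

A share clearly above 25% would support the holiday explanation; a share near 25% would not.
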
### What do these people's typical purchases look like in terms of category?
```{r}
# Total expense per category and individual (Group.2 = ID)
sumcat0 <- aggregate(data$Amount, by=list(Category=data$Category, data$ID), FUN=sum)
sumcat <- sumcat0[order(sumcat0$x),] # sort by amount
sumcat$Category <- as.factor(sumcat$Category)
sumcat$id <- as.factor(sumcat$Group.2)
# Dot plot: grouped by individual, sorted by amount, colored by individual
grps <- sumcat$id
my_cols <- c("#E69F00", "#56B4E9", "#009E73")
dotchart(sumcat$x, labels=sumcat$Category, cex=0.65, groups=grps,
         main="Sum of Purchase Category - 01/01/2018 to 12/31/2018", xlab="Amount Spent",
         gcolor="black", color=my_cols[grps])
```
***
- What do these people's typical purchases look like in terms of category?
We all know that people differ greatly in where they spend their money. What does that look like? Since our observers now live in different cities, we can also ask whether the city is a factor behind the differences in spending distribution.
As the plot shows, shopping is the top category for all individuals.
Travel takes the 2nd or 3rd position on each list, but it makes up a different proportion of each person's total. The first individual's travel spending is extremely high compared with the others.
Compared with the second individual, the first and the third might live in more expensive places, since they paid more on Bills & Utilities.
There is another interesting finding: when an individual has a higher utilities (rent) bill, his gas spending is lower, and vice versa. A probable explanation is that people who live close to work pay higher rent but save on the cost of driving, while people who choose to live farther away for cheaper rent spend more on driving. This is consistent with house and apartment prices usually being higher near workplaces and business districts.
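
Because the three individuals spend different totals, category amounts are easier to compare as a proportion of each person's own total. This is a minimal sketch in base R, using made-up numbers rather than our data.

```r
# Hypothetical category totals for two individuals (illustrative numbers only)
toy <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2),
  Category = rep(c("Shopping", "Travel", "Bills & Utilities"), times = 2),
  Amount = c(500, 300, 200, 600, 100, 300)
)

# ave() returns the group total for every row, so the division
# gives each category's share within that individual's spending
toy$Share <- toy$Amount / ave(toy$Amount, toy$ID, FUN = sum)

# Largest share first within each individual
toy[order(toy$ID, -toy$Share), ]
```
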
### How often do they use their credit cards?
```{r}
library(dplyr)
library(magrittr)
# Number of transactions per month and individual (stored in column n)
freq <- data %>% count(Month, ID)
freq <- freq[order(freq$ID, freq$Month),]
freq$Month <- as.factor(freq$Month)
freq$ID <- as.factor(freq$ID)
# Line chart: one line per individual
p <- ggplot(freq, aes(x=Month, y=n, group=ID)) +
  geom_line(aes(color=ID)) +
  geom_point(aes(color=ID)) +
  theme_minimal() +
  labs(title="Frequency in Month - 01/01/2018 to 12/31/2018", y = "Frequency Count", x = "Month")
# Use custom color palettes
p + scale_color_manual(values=c("#E69F00", "#56B4E9", "#009E73"))
```
***
- How often do they use their credit cards?
In this part, we look at how often the three observers use their cards. Is the frequency related to the total amount of spending? The answer will help us to better
It is related to the amount spent. Individual 1 used the card more frequently during the holiday seasons, and so did individual 2.
Individual 3's frequency cycle also matched his spending.
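
The link between how often a card is used and how much is spent can be summarized with a correlation coefficient per individual. This is a minimal sketch with made-up monthly figures for one person; the numbers are hypothetical.

```r
# Hypothetical monthly figures for one individual (illustrative numbers only)
toy <- data.frame(Month = 1:6,
                  n = c(30, 18, 20, 22, 19, 28),             # number of transactions
                  Amount = c(900, 400, 450, 500, 480, 850))  # total amount spent

# Pearson correlation between transaction count and amount spent;
# values near 1 mean months with more transactions are also the costlier months
r <- cor(toy$n, toy$Amount)
r
```
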
### On which day of the week do people tend to spend more?
```{r}
# Total expense per weekday and individual (Category = weekday, Group.2 = ID)
sumweek0 <- aggregate(data$Amount, by=list(Category=data$Weekday, data$ID), FUN=sum)
sumweek <- sumweek0[order(sumweek0$Group.2, sumweek0$x),]
sumweek$Category <- as.factor(sumweek$Category)
sumweek$id <- as.factor(sumweek$Group.2)
# Grouped bar chart: one bar per individual within each weekday
p <- ggplot(data=sumweek, aes(x=Category, y=x, fill=id)) +
  geom_bar(stat="identity", color="black", position=position_dodge()) +
  theme_minimal() +
  labs(title="Total in Week - 01/01/2018 to 12/31/2018", y = "Total Expense", x = "Weekday")
# Use custom colors
p + scale_fill_manual(values=c("#E69F00", "#56B4E9", "#009E73"))
```
***
- On which day of the week do people tend to spend more?
Please note: there are no Sunday transaction records, since banks are closed on Sundays.
People's mood might strongly affect their spending behavior. Do people in a happy Friday mood spend more than on a blue Tuesday? Or is it the other way around? Or do people simply use their credit card regardless of their mood and the day of the week?
Since no transactions are posted on Sunday, Sunday spending shows up in the Monday data.
People tend to spend more on weekends than on weekdays, which matches common sense.
The chart also clearly shows that people tend to spend more on Friday. After finishing a five-day work week, people might go out for dinner or happy hour with friends, which largely drives up their spending on Friday. In contrast, people spend the least on Wednesday, and a lower mid-week mood is probably the reason.
In a nutshell, the chart suggests that total spending is related to people's mood. When people are excited they tend to spend more mindlessly, and vice versa. This finding matters, because it suggests that managing mood better could reduce mindless spending and wasted money.
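
A weekday can have a high total either because there are more purchases or because the purchases are larger. Comparing totals with the average amount per transaction helps tell these apart. This is a minimal sketch with made-up transactions; the numeric weekday coding (1 = Monday ... 6 = Saturday) is an assumption for illustration.

```r
# Hypothetical transactions (illustrative only); Weekday: 1 = Monday ... 6 = Saturday
toy <- data.frame(Weekday = c(1, 1, 3, 5, 5, 5, 6),
                  Amount  = c(20, 35, 10, 60, 45, 25, 40))

# Total spent and average amount per transaction for each weekday
total_by_day <- tapply(toy$Amount, toy$Weekday, sum)
mean_by_day  <- tapply(toy$Amount, toy$Weekday, mean)

# Weekday with the highest total spending
top_day <- names(which.max(total_by_day))

total_by_day
mean_by_day
top_day
```
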
### Do these three members have different eating habits? Or a different food culture in their cities?
```{r}
# Food & Drink expense as a share of total expense per individual
datax <- subset(data, data$Category=='Food & Drink')
sum0 <- aggregate(datax$Amount, by=list(datax$ID), FUN=sum)  # food total per ID
sumall <- aggregate(data$Amount, by=list(data$ID), FUN=sum)  # overall total per ID
total <- merge(sum0, sumall, by="Group.1")  # x.x = food total, x.y = overall total
total$ID <- as.factor(total$Group.1)
total$sub <- total$x.x
total$total <- total$x.y
total$ratiofood <- total$x.x/total$x.y
total$ratiorest <- 1-total$x.x/total$x.y
total <- total[, c("sub","total", "ratiofood", "ratiorest", "ID")]
# load packages
library(ggplot2)
library(ggalt) # provides geom_dumbbell(); devtools::install_github("hrbrmstr/ggalt")
library(dplyr) # for arrange() & filter()
# manually re-assign the shares to df
# (country = individual, ages_18_to_34 = food share, ages_35 = non-food share)
df <- tibble(country=c("1", "2", "3"),
             ages_18_to_34=c(0.143, 0, 0.017),
             ages_35=c(0.857, 1, 0.983),
             diff=(ages_35-ages_18_to_34)*100)
# we want to keep the order in the plot, so we use a factor for country
df <- arrange(df, desc(diff))
df$country <- factor(df$country, levels=rev(df$country))
# we only want the first line values with "%" symbols (to avoid chart junk)
# quick hack; there is a more efficient way to do this
percent_first <- function(x) {
x <- sprintf("%d%%", round(x*100))
x[2:length(x)] <- sub("%$", "", x[2:length(x)])
x
}
gg <- ggplot()
# doing this vs y axis major grid line
gg <- gg + geom_segment(data=df, aes(y=country, yend=country, x=0, xend=1), color="#b2b2b2", size=0.15)
# dumbbells: x = food share (left point), xend = non-food share (right point)
gg <- gg + geom_dumbbell(data=df, aes(y=country, x=ages_18_to_34, xend=ages_35),
                         size=1.5, color="#b2b2b2", point.size.l=3, point.size.r=3,
                         point.colour.l="#edae52", point.colour.r="#9fb059")
# legend labels above the points, on the row of individual 1, colored like the points
gg <- gg + geom_text(data=filter(df, country=="1"),
                     aes(x=ages_35, y=country, label="Non-Food"),
                     color="#9fb059", size=2, vjust=-1, fontface="bold", family="Calibri")
gg <- gg + geom_text(data=filter(df, country=="1"),
                     aes(x=ages_18_to_34, y=country, label="Food"),
                     color="#edae52", size=2, vjust=-1, fontface="bold", family="Calibri")
# percentage values below the points
gg <- gg + geom_text(data=df, aes(x=ages_35, y=country, label=percent_first(ages_35)),
                     color="#9fb059", size=2.75, vjust=2.5, family="Calibri")
gg <- gg + geom_text(data=df, color="#edae52", size=2.75, vjust=2.5, family="Calibri",
                     aes(x=ages_18_to_34, y=country, label=percent_first(ages_18_to_34)))
# difference column
gg <- gg + geom_rect(data=df, aes(xmin=1.05, xmax=1.175, ymin=-Inf, ymax=Inf), fill="white")
gg <- gg + geom_text(data=df, aes(label=diff, y=country, x=1.1125), fontface="bold", size=3, family="Calibri")
gg <- gg + geom_text(data=filter(df, country=="1"), aes(x=1.1125, y=country, label="DIFF"),
color="#7a7d7e", size=1.5, vjust=-1, fontface="bold", family="Calibri")
gg <- gg + scale_x_continuous(expand=c(0,0), limits=c(0, 1.175))
gg <- gg + scale_y_discrete(expand=c(0.075,0))
gg <- gg + labs(x=NULL, y=NULL, title="The Food and Non-food Gap by Different People")
gg <- gg + theme_bw(base_family="Calibri")
gg <- gg + theme(panel.grid.major=element_blank())
gg <- gg + theme(panel.grid.minor=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank())
gg <- gg + theme(axis.text.x=element_blank())
gg <- gg + theme(plot.title=element_text(face="bold"))
gg <- gg + theme(plot.subtitle=element_text(face="italic", size=9, margin=margin(b=12)))
gg <- gg + theme(plot.caption=element_text(size=7, margin=margin(t=12), color="#7a7d7e"))
# note: theme_minimal() is a complete theme, so it replaces the theme() adjustments made above
gg <- gg + theme_minimal()
gg
```
***
- Do these three members have different eating habits? Or a different food culture in their cities?
In this part, we use a dumbbell plot to visualize the data, which proves to be a nice tool for showing the differences in people's eating habits.
From the chart, the food percentage of individual 1 is lower than that of the other two. Individual 3 shows the most obvious gap compared with the others.
The first person is probably using another credit card with more cash back for restaurants.
Considering the gap chart, we can easily conclude that people do have different spending habits when it comes to eating. The difference might be caused by internal reasons, such as individual lifestyle, or by external reasons, such as the city's culture.
Let's recall the category analysis: individual 1 spends more on living expenses, which matches the percentages as well.
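
The food and non-food shares behind a dumbbell plot can also be computed directly from the transactions instead of being typed in by hand. This is a minimal sketch with made-up transactions; the data and the `shares` table are hypothetical and do not reproduce the percentages in the chart.

```r
# Hypothetical transactions (illustrative only)
toy <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3, 3),
  Category = c("Food & Drink", "Shopping", "Travel",
               "Shopping", "Groceries", "Food & Drink", "Shopping"),
  Amount = c(20, 50, 30, 40, 60, 10, 90)
)

# Multiplying by the logical zeroes out non-food rows, so an individual
# with no Food & Drink purchases stays in the result with a total of 0
food <- tapply(toy$Amount * (toy$Category == "Food & Drink"), toy$ID, sum)
all_spend <- tapply(toy$Amount, toy$ID, sum)

shares <- data.frame(ID = names(all_spend),
                     food = as.numeric(food / all_spend),
                     nonfood = as.numeric(1 - food / all_spend))

# Gap between non-food and food shares, in percentage points
shares$diff <- (shares$nonfood - shares$food) * 100
shares
```

Merging a food-only subtotal with the overall total would drop any individual who has no Food & Drink rows, which is why this sketch sums over all rows instead.
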