Date Description Amount Who Category
1 2017-07-01 NINTENDO OF AMERICA-US 21.24 P Play; Recreation
2 2017-07-01 Amazon.com 13.98 P Other: miscellaneous
3 2017-07-01 AMAZON MKTPLACE PMTS 17.90 P Other: miscellaneous
4 2017-07-01 HANNAFORD #8017 59.97 XP Grocery
5 2017-07-04 SEPHORA 356 91.91 X Beauty and Fashion;
6 2017-07-04 NORDSTROM RACK #0547 21.97 X Beauty and Fashion;
7 2017-07-04 MBTA BOYLSTON 5.50 P Transporation related
8 2017-07-04 GULF OIL 92038495 26.75 P Transporation related
9 2017-07-04 TST* CROSS BRIDGE NOODLE 38.19 XP Food and Drink
10 2017-07-04 STARBUCKS STORE 49088 15.14 XP Food and Drink
Category n() sum(Amount) mean(Amount) sd(Amount)
1 Beauty and Fashion; 33 3977.91 120.54273 165.25486
2 Food and Drink 167 4230.45 25.33204 32.21409
3 Grocery 39 1805.96 46.30667 26.71442
4 Learning 7 785.36 112.19429 238.30006
5 Other: miscellaneous 53 1631.49 30.78283 36.47715
6 Play; Recreation 27 2340.06 86.66889 125.46066
7 Transporation related 26 1013.60 38.98462 87.38352
---
title: "Final Project-- Xizi Tong"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    theme: spacelab
---
```{r setup, include=FALSE}
library(flexdashboard)
library("shinythemes")
```
```{r}
library(ggplot2)
library(dplyr)
library(data.table)
library(ggthemes)
library(knitr)
library(plotly)
library(readxl)
library(lubridate)
library(doBy)
```
### Spending Habit Analysis
```{r}
spending <- read_excel("E:/512 Hitch/final project/spending from 7-12.xlsx")
spending<-as.data.frame(spending)
colnames(spending)=c("Date","Description","Amount","Who","Category","Online","Month","Day")
head(spending[,c(1,2,3,4,5)],10)
```
***
The data set comes from our bank statements from July to December of last year. My boyfriend and I share the same Chase card, so some of the purchases were made by him or by both of us. The Who column has three values: X means a purchase made by me, P means a purchase made by my boyfriend, and XP means a purchase we made together.
### Q1: Do I really spend more money than my boyfriend P does?
```{r}
# Total spending per person, including the HU tuition rows
pl1 <- ggplot(spending, aes(x = Who, y = Amount)) +
  geom_col() +
  ggtitle("Purchasing Behavior--Total Amount with HU") +
  labs(x = "Who made this purchase", y = "Amount of Money Spent")
pl1

# Same chart after dropping the "HU cost" category
spending2 <- filter(spending, Category != "HU cost")
pl2 <- ggplot(spending2, aes(x = Who, y = Amount)) +
  geom_col() +
  ggtitle("Purchasing Behavior--Total Amount without HU") +
  labs(x = "Who made this purchase", y = "Amount of Money Spent")
pl2
```
***
- Graph 1 includes my tuition.
- Graph 2 is our daily expenditure without tuition.

I always thought that I have less savings than my boyfriend because I have to pay tuition at Harrisburg, but it turns out that I spend much more than he does even without the cost from HU.
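
As a rough cross-check on the bar heights: `geom_col()` stacks one segment per row, so each bar's height is the sum of `Amount` for that person. A minimal sketch of the same totals in base R, using only the ten rows shown by `head()` (the `sample_rows` name is mine, and ten rows say nothing about the full six months):

```r
# The ten rows printed by head(); not the full data set
sample_rows <- data.frame(
  Who    = c("P", "P", "P", "XP", "X", "X", "P", "P", "XP", "XP"),
  Amount = c(21.24, 13.98, 17.90, 59.97, 91.91, 21.97, 5.50, 26.75, 38.19, 15.14)
)

# Sum Amount within each value of Who (rows come back sorted: P, X, XP)
totals <- aggregate(Amount ~ Who, data = sample_rows, FUN = sum)
totals
```

Running the same `aggregate()` call on `spending` and on `spending2` would give the numbers behind Graph 1 and Graph 2.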
### Q2: On a monthly basis, how is the spending distributed among X, P and XP?
```{r}
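# Monthly total per person
per_month_who <- spending2 %>%
  select(Amount, Who, Month) %>%
  group_by(Month, Who) %>%
  summarise(Amount = sum(Amount))

ggplot(per_month_who, aes(x = Month, y = Amount)) +
  # Grey background: dropping the Who column repeats the overall monthly total in every panel
  geom_bar(data = per_month_who[, -2], fill = "grey", alpha = .8, stat = "identity") +
  # Coloured foreground: the monthly total for the panel's own group
  geom_bar(aes(fill = factor(Who)), stat = "identity") +
  facet_wrap(~ Who)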
```
***
The second graph is a monthly spending bar chart divided into three panels, one each for X, P and XP. The grey bars in the background represent total spending. In most months I spent more than P, the exceptions being July and September. My spending in September could be considered an outlier because it was much lower than in other months; I clearly remember that my savings were at an all-time low and I could hardly pay off the credit card balance. I also notice that XP is sometimes higher than both X and P. In August, our total expenditure was very high because we usually take our vacation in August and it was counted as a shared expense.
### Why do I spend more than he does? What is the cause for this?
```{r}
spending2$Category <- as.factor(spending2$Category)
# Stack each person's total by category
pl3 <- ggplot(spending2, aes(x = Who, y = Amount)) +
  geom_col(aes(fill = Category)) +
  labs(x = "Who spent the money", y = "Money Spent ($)")
pl3
```
***
To understand why I spend more, I need to look at the categories. I have 7 categories in total:

- Beauty and fashion (mostly clothes, shoes, cosmetic products, accessories, etc.)
- Food and drink
- Grocery
- Learning (any learning-related costs other than those incurred at Harrisburg)
- Other, miscellaneous
- Play, recreation (vacations, movies, amusement parks, video games, etc.)
- Transportation (parking fees, insurance, subway fares, car repair and maintenance, etc.)

This chart shows how each person's spending is distributed across the categories. A large portion of my spending goes to beauty and fashion, which my boyfriend does not have; without it, my expenditure would be much lower than his. I also notice that he tends to spend more on food and recreation. One explanation is that I often bring salad from home while he tends to eat out. He also likes to spend money on video games, which is a cost that I don't have.
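
The split behind the stacked bars can also be read as a table. A minimal sketch, assuming only the ten rows shown by `head()` (category labels are copied exactly as they appear in the data; `sample_rows`, `by_who_category` and `share` are names I made up for illustration):

```r
# The ten rows printed by head(); not the full data set
sample_rows <- data.frame(
  Who      = c("P", "P", "P", "XP", "X", "X", "P", "P", "XP", "XP"),
  Category = c("Play; Recreation", "Other: miscellaneous", "Other: miscellaneous",
               "Grocery", "Beauty and Fashion;", "Beauty and Fashion;",
               "Transporation related", "Transporation related",
               "Food and Drink", "Food and Drink"),
  Amount   = c(21.24, 13.98, 17.90, 59.97, 91.91, 21.97, 5.50, 26.75, 38.19, 15.14)
)

# Dollars spent by each person in each category
by_who_category <- xtabs(Amount ~ Who + Category, data = sample_rows)

# Share of each person's own spending that goes to each category (rows sum to 1)
share <- round(prop.table(by_who_category, margin = 1), 2)
share
```

In these ten rows all of X's spending falls under beauty and fashion, which is the same pattern the chart shows for the full period, though a sample this small proves nothing by itself.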
### Q3: What is the distribution of these categories? Which categories have more outliers?
```{r}
# One box per category, pooling X, P and XP
base <- plot_ly(spending2, y = ~Amount, color = ~Category, type = "box") %>%
  layout(title = "Spending Boxplot",
         xaxis = list(title = "Category"),
         yaxis = list(title = "Spending ($)"))
base
```
***
- The most expensive purchases come from beauty and fashion.
- Play and recreation has many outliers, while food and drink, other, and grocery are comparatively consistent.

Combining X, P and XP, I then drew a boxplot for each category to see how its amounts are distributed. According to the chart, beauty and fashion, learning, and play and recreation have more outliers than the other categories, which means that we tend to make occasional expensive purchases in these categories. Beauty and fashion has the highest average, probably because most of the clothes and cosmetic products I bought are quite expensive. Food and drink has the lowest average, which makes sense since most lunches cost less than 15 dollars and they make up most of the purchases in this category.
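
For readers unfamiliar with how a boxplot decides what counts as an outlier, here is a minimal sketch of the common 1.5 x IQR rule. The amounts are hypothetical values chosen for illustration, not rows from the statement, and plotly may compute quartiles slightly differently from `quantile()`:

```r
# Hypothetical purchase amounts for one category (illustration only)
amounts <- c(10, 12, 15, 18, 20, 22, 25, 300)

# Quartiles and interquartile range
q   <- quantile(amounts, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
outliers <- amounts[amounts < lower | amounts > upper]
outliers
```

A single large purchase among many small ones is flagged, which is the pattern I would expect for a category such as play and recreation.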
### Q4: On a monthly basis, does the amount in these categories change a lot?
```{r}
# Same boxplot as before, animated with one frame per month
base2 <- plot_ly(spending2, y = ~Amount, color = ~Category, type = "box",
                 frame = ~Month) %>%
  layout(title = "Spending Boxplot from July to December",
         xaxis = list(title = "Month"),
         yaxis = list(title = "Spending")) %>%
  animation_button(
    x = 0.1, xanchor = "right", y = -0.2, yanchor = "bottom"
  ) %>%
  animation_slider(
    currentvalue = list(prefix = "Month ", font = list(color = "red"))
  ) %>%
  animation_opts(
    2000, easing = "elastic", redraw = FALSE
  )
base2
```
***
The amounts in each category vary from month to month, but some categories, such as food and drink and grocery, are comparatively more stable than play and recreation or learning.
### To confirm my thought, I plot the bar charts below
```{r}
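# Treat Month as a discrete variable so each month gets its own bar
spending2$Month <- as.factor(spending2$Month)

# Necessities: food and drink together with grocery
table1 <- subset(spending2, Category == "Food and Drink" | Category == "Grocery")
pl7 <- table1 %>%
  ggplot(aes(Month, Amount)) +
  geom_col(aes(fill = Category)) +
  labs(title = "Food/Drink and Grocery")
pl7

# Discretionary: play and recreation
Play <- subset(spending2, Category == "Play; Recreation")
pl8 <- Play %>%
  ggplot(aes(Month, Amount)) +
  geom_col(aes(fill = Category)) +
  labs(title = "Play and Recreation")
pl8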
```
***
- To confirm my thoughts, I drew one plot of food and drink combined with grocery, because both are spending on necessities and thus more stable. The two also offset each other: if we buy more groceries, we are less likely to eat out because we have food in the fridge. The other plot shows monthly spending on play and recreation. Recreation is mostly made up of one-time purchases, and we can control it if we want to. For example, we did not spend much on play and recreation in September because my finances were tight, but it was very hard to reduce spending on necessities.
### Q5: What are my most expensive purchases?
```{r}
# Sort purchases from largest to smallest and keep the top ten
amount_sorted <- spending2[order(-spending2$Amount), ]
df1 <- head(
  amount_sorted[, c("Who", "Month", "Amount", "Description", "Category")],
  10
)

# Axis settings that hide the long merchant names under the bars
mylegend <- list(
  title = "Description",
  zeroline = FALSE,
  showline = FALSE,
  showticklabels = FALSE,
  showgrid = FALSE
)

df1$Description <- factor(df1$Description)

# Bar chart; hovering over a bar shows who made the purchase and what it was
plot_ly(df1, x = ~Description, y = ~Amount,
        type = "bar",
        text = ~paste("Who: ", Who,
                      "Description: ", Description)) %>%
  layout(title = "Top 10 Most Expensive",
         xaxis = mylegend,
         yaxis = list(title = "Amount"),
         showlegend = FALSE)
```
***
- My most expensive purchase from July to December last year was my Canada Goose winter coat, followed by the registration fee for the CFA; the insurance payment comes third.
- Pinlun's most expensive purchase was his One-Plus phone, followed by the gift he bought for his family from LLBean; the third is the gift he bought me for my birthday.
### Q6: How many purchases do we make in each category in total?
```{r}
groupeddata<-group_by(spending2,Category)
df3<-summarise(groupeddata,n(),sum(Amount),mean(Amount),sd(Amount))
as.data.frame(df3)
```
***
- As expected, together we made 206 food and grocery purchases (167 food and drink plus 39 grocery), the highest count of all categories. We also made 33 purchases in the beauty and fashion category at an average of about 120 dollars each, which is the highest average. In 6 months, we spent almost 4,000 dollars on beauty and fashion. That works out to more than 5 purchases every month. If we can cut this to 4 purchases every month, that is 24 purchases instead of 33 over the six months, and at about 120 dollars each we could presumably save around 9*120=1080 dollars.
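
The saving from that cut can be estimated in more than one way. A minimal sketch of one reading, assuming "4 purchases a month" means 24 purchases over the six months instead of 33, and that each avoided purchase costs the category average from the summary table (both assumptions are mine, and the variable names are made up for illustration):

```r
# Figures taken from the Beauty and Fashion row of the summary table
n_purchases   <- 33
mean_amount   <- 120.54273
months        <- 6

# Current pace: purchases per month
per_month <- n_purchases / months

# Target pace of 4 per month over the same six months
target_purchases  <- 4 * months
avoided_purchases <- n_purchases - target_purchases

# Rough saving if each avoided purchase costs the category average
estimated_saving <- avoided_purchases * mean_amount
round(estimated_saving, 2)
```

Because the standard deviation in this category is larger than the mean, the real saving would depend heavily on which purchases are skipped.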