Quantified Self Finance Management
ANLY512: Data Visualization Final Project
Self-quantifying is a way to combine technology into data inception of a person’s daily life aspects. Based on the Wikipedia definition, self-quantifying can be collecting data in terms of people ’s food consuming, quality of surrounding air, mood and etc or can defines in the other terms of self-tracking, auto-analytics, personal informatics and etc. Gary Wolf has described quantified self is “self-knowledge through self-tracking with technology”. Quantified Self also has known as self-knowledge through numbers and it is a great assistance to find out how our life events and things are going or passed. Since visualizing data is a powerful way to derive high-level insights about the underlying patterns in the data and provides helpful clues as to where we need to dig down deeper for the gold [Data Mining Applications with R by Yanchang Zhao, Yonghua Cen], so mixture of recording our self-life routine and data visualization can give us the powerful insight and knowledge of how we act. Plotting and graphing the life rubric can show us the more crystal and clear view of how we live. Do we need to change our life style or keep and stay on current track. I collected data from my credit cards and checking account expenses from October 2017 to March 2018.I am going to find the answers of following questions 1.What are the total expenses per month and Which month has the maximum expense? 2.What is the maximum purchase for?(in each month) 3.Compare the rate of total expenses per income. 4.How does the season impact on utility purchase? 5.What is composition of moving expenses? 6.Is there any correlation between dining out and grocery? 7.What are the top 10 expenses?
[1] 7014.80 7091.33 7393.26 7507.54 8146.68 9531.86
1.What are the total expenses per month and Which month has the maximum expense?
Donut chart shows the percentage of total purchases per category over 6 months.The bar graphs show the total amount of purchase per month over 6 months. Since we purchased for a house in January 2017 and had to pay two mortgages in March 2018, so this month bar graph has the maximum height and maximum expense(except health insurance and college which are the fixed amount) belongs to this months and mortgage overall has the big portion of expenses.
[1] October November March January December February
Levels: December February January March November October
2.What is the maximum purchase for?(in each month)
This bar plot shows what has been the most purchase in each month with the cost of that purchase. it seems most of expenses are related with grocery if something does not happen unexpectedly.Leakaging roof and hospital bill and yearly house insurance were 3 unexpected events which covered the most expenses in 3 months.(I eliminated Months,TotalExpensesperMonths,Health.Insurance,Mortgage and College, because all these numeric variables(except Month) are fixed amount and may effect the result)
3.Compare the rate of total expenses per income.
All three graphs show the amount of purchase per each month has the ascending rate, and in December, January, February and March are even more than the total of income and even box plot has shown the median of total amount of purchase over 6 months is higher than the total income.This is because of some surprised events and need to be managed in future.
4.How does the season impact on utility purchase?
This plot clearly shows that each season significantly has effect on utility cost and as is shown, during the cold weather period the amount of utility purchase goes high and during warmer months the cost of utility is going down.
[1] 10981.00 140.00 1425.15 644.00 792.00 182.66
5.What is composition of moving expenses?
Obviously moving to new house is along with some cost. I was wondering which category that is related with moving has the maximum cost. This colorful buble chart shows the big portion of expenses blong to the Mortgage, since we need to pay the rent of current house we had and the first mortgage of new house.
6.Is there any correlation between dining out and grocery?
Since the big part of each month belongs to grocery purcheses Even though includes many other things not only food, I was wondering if there is any correlation between Grocery and dining out.Two graphs show there is negative correlation between these two.
7.What are the top 10 expenses?
According to the part 3, I wanted to find the first 10 of most expenses over 6 months to manage my shopping list and cost. Here are listed first 10 purchases which need to take care of.
---
title: "Personal Financial Status"
output:
flexdashboard::flex_dashboard:
social: menu
source_code: embed
orientation: rows
storyboard: true
---
```{r setup, include=FALSE}
library(flexdashboard)
```
```{r}
setwd("C:/Users/User/Desktop/ANLY512/Final Project")
```
###Overview of the Quantified Self Finance Management
######Photo credit:Google

***
ANLY512: Data Visualization Final Project
Self-quantifying is a way to combine technology into data inception of a person's daily life aspects. Based on the Wikipedia definition, self-quantifying can be collecting data in terms of people 's food consuming, quality of surrounding air, mood and etc or can defines in the other terms of self-tracking, auto-analytics, personal informatics and etc.
Gary Wolf has described quantified self is "self-knowledge through self-tracking with technology". Quantified Self also has known as self-knowledge through numbers and it is a great assistance to find out how our life events and things are going or passed.
Since visualizing data is a powerful way to derive high-level insights about the underlying patterns in the data and provides helpful clues as to where we need to dig down deeper for the gold [Data Mining Applications with R by Yanchang Zhao, Yonghua Cen], so mixture of recording our self-life routine and data visualization can give us the powerful insight and knowledge of how we act. Plotting and graphing the life rubric can show us the more crystal and clear view of how we live. Do we need to change our life style or keep and stay on current track.
I collected data from my credit cards and checking account expenses from October 2017 to March 2018.I am going to find the answers of following questions
1.What are the total expenses per month and Which month has the maximum expense?
2.What is the maximum purchase for?(in each month)
3.Compare the rate of total expenses per income.
4.How does the season impact on utility purchase?
5.What is composition of moving expenses?
6.Is there any correlation between dining out and grocery?
7.What are the top 10 expenses?
### Q1.Total Expenses(Fig.1 Percentage Of Total Expenses Over 6 Months))
```{r}
NewExpenses=read.csv("New6MonthExpensesDataset.csv")
```
```{r}
library("ggplot2")
library("plotly")
```
```{r}
library("reshape")
md=melt(NewExpenses[,])
library("cowplot")
library("flexdashboard")
library("dplyr")
library("plotly")
md=as.data.frame(md[-c(1:6,169:174),-1])
colnames(md)=c("type","amount")
type=c(NewExpenses[-c(1:6),c(3:29)])
p <- md %>% group_by(type)
f=p%>% summarize(amount=sum(md[,2]))
q <- p %>% plot_ly(labels = ~ type , values = ~amount) %>% add_pie(hole = 0.6)
q
```
### Q1.Total Expenses(Fig.2 and Fig.3 Total Expenses Over 6 Months)
```{r}
NewExpenses$TotalExpensesperMonths=(NewExpenses$Grocery)+(NewExpenses$Utility)+(NewExpenses$Mortgage)+(NewExpenses$Trash.and.sewer)+(NewExpenses$Entertainment)+(NewExpenses$Book)+(NewExpenses$College)+(NewExpenses$Internet)+(NewExpenses$Maintenance)+(NewExpenses$Cloth)+(NewExpenses$Toy.and.Gift)+(NewExpenses$Dining.Ot)+(NewExpenses$Fuel)+(NewExpenses$Car.Leasing)+(NewExpenses$Car.Insurance)+(NewExpenses$Health.Insurance)+(NewExpenses$House.Insurance)+(NewExpenses$Doctor.and.Medication.Bills)+(NewExpenses$Saving.for.buying.stock)+(NewExpenses$Moving.Expenses)+(NewExpenses$Fitness)+(NewExpenses$Music.Class)+(NewExpenses$Gymnastic.Class)+(NewExpenses$Gardening.and.Painting.Equipments)+(NewExpenses$Furniture.Finance.Purchase)+(NewExpenses$School.Expenses)+(NewExpenses$Other.Checks.and.Fees)
NewExpenses$TotalExpensesperMonths
par(mfrow = c(1,2),mar=c(4,0,4,0)+2, las=2,mgp=c(3, 0.5, 0))
b=data.frame(Month=c("October","November","December","January", "February", "March"),NewExpenses$TotalExpensesperMonths)
plot(b$NewExpenses.TotalExpensesperMonths, type = "b", xlab="Month", xaxt="n", ylab="Total Expenses", main="Total Expenses per months",ylim= c(6000, 10000))
ticks <- as.character(b$Month)
val2 <- dim(b)[1]
axis(1, at=seq(1:val2), labels=ticks, las=2 )
c<- barplot(NewExpenses$TotalExpensesperMonths,names.arg = as.character(b$Month),col=topo.colors(12),xlab="Month",ylab="Total Expenses", main="Total Expenses per months", ylim= c(0, 10000))
text(c, 3, labels=NewExpenses$TotalExpensesperMonths, srt = 90, pos = 4.9, cex = 0.8, col = "red")
```
***
**1.What are the total expenses per month and Which month has the maximum expense?**
Donut chart shows the percentage of total purchases per category over 6 months.The bar graphs show the total amount of purchase per month over 6 months.
Since we purchased for a house in January 2017 and had to pay two mortgages in March 2018, so this month bar graph has the maximum height and maximum expense(except health insurance and college which are the fixed amount) belongs to this months and mortgage overall has the big portion of expenses.
### Q2.Maximum Purchases In Each Month
```{r}
par(mar=c(4,6,4,0))
df = NewExpenses
df = df[ , !(names(df) %in% c("Months","TotalExpensesperMonths",
"Health.Insurance","Mortgage","College"))]
Month_names <- NewExpenses$Months
max_values <- apply(X=df, MARGIN=1, FUN=max)
Expense_names <- colnames(df)[apply(df,1,which.max)]
sorted_values <- sort(max_values, index.return=TRUE)
Month_names[sorted_values$ix]
xx <- barplot(sorted_values$x, main="Most expenses each month", horiz=TRUE,
names.arg=Month_names[sorted_values$ix], las=2)
text_labels= paste(Expense_names[sorted_values$ix], sorted_values$x, sep="=")
text(200, seq(0.5,12,1.2), label = text_labels, pos = 3, cex = 0.8, col = "red")
```
***
**2.What is the maximum purchase for?(in each month)**
This bar plot shows what has been the most purchase in each month with the cost of that purchase. it seems most of expenses are related with grocery if something does not happen unexpectedly.Leakaging roof and hospital bill and yearly house insurance were 3 unexpected events which covered the most expenses in 3 months.(I eliminated Months,TotalExpensesperMonths,Health.Insurance,Mortgage and College, because all these numeric variables(except Month) are fixed amount and may effect the result)
### Q3.Total Expenses VS Income(Fig.1)
```{r}
NewExpenses$TotalExpensesperMonths=(NewExpenses$Grocery)+(NewExpenses$Utility)+(NewExpenses$Mortgage)+(NewExpenses$Trash.and.sewer)+(NewExpenses$Entertainment)+(NewExpenses$Book)+(NewExpenses$College)+(NewExpenses$Internet)+(NewExpenses$Maintenance)+(NewExpenses$Cloth)+(NewExpenses$Toy.and.Gift)+(NewExpenses$Dining.Ot)+(NewExpenses$Fuel)+(NewExpenses$Car.Leasing)+(NewExpenses$Car.Insurance)+(NewExpenses$Health.Insurance)+(NewExpenses$House.Insurance)+(NewExpenses$Doctor.and.Medication.Bills)+(NewExpenses$Saving.for.buying.stock)+(NewExpenses$Moving.Expenses)+(NewExpenses$Fitness)+(NewExpenses$Music.Class)+(NewExpenses$Gymnastic.Class)+(NewExpenses$Gardening.and.Painting.Equipments)+(NewExpenses$Furniture.Finance.Purchase)+(NewExpenses$School.Expenses)+(NewExpenses$Other.Checks.and.Fees)
income=7276
```
```{r}
library(plotly)
Months <- c("October", "November", "December","January","February","March")
TotalExpensesperMonths <- c(7014.80, 7091.33, 7393.26,7507.54,8146.68,9531.86)
income <- c(7276,7276,7276,7276,7276,7276)
data <- data.frame(Months, TotalExpensesperMonths, income)
p <- plot_ly(data, x = ~Months, y = ~TotalExpensesperMonths, type ='bar', name = 'TotalExpensesperMonths') %>%
add_trace(y = ~income, name = 'income') %>%
layout(yaxis = list(title = 'Count'), barmode = 'group' )
p
```
### Q3.Total Expenses VS Income(Fig.2 and Fig.3)
```{r}
par(mfrow = c(1,2),mar=c(4,0,4,0)+2)
H <- c(7014.80,7091.33,7393.26, 7507.54, 8146.68, 9531.86)
M <- c("OCT","NOV","DEC","JAN","FEB","MAR")
barplot(H,names.arg = M,xlab = "Month",ylab = "Total Purchase of Each MOnth",col = "blue",
main = "Total Purchases Vs Income",border = "red",ylim = c(0,10000))
abline(h=7276)
boxplot(NewExpenses$TotalExpensesperMonths, col = "blue" , main="")
abline(h = 7276)
```
***
**3.Compare the rate of total expenses per income.**
All three graphs show the amount of purchase per each month has the ascending rate, and in December, January, February and March are even more than the total of income and even box plot has shown the median of total amount of purchase over 6 months is higher than the total income.This is because of some surprised events and need to be managed in future.
### Q4.Seasonal Impact On Utility Cost
```{r}
b=data.frame(Month=c("October","November","December","January", "February", "March"),NewExpenses$Utility)
h=plot(b$NewExpenses.Utility, type = "b", xlab="Month", xaxt="n", ylab="Total Utility Expenses", main="Total Utility Expenses Per Month",ylim= c(0,300))
ticks <- as.character(b$Month)
val2 <- dim(b)[1]
axis(1, at=seq(1:val2), labels=ticks, las=2 )
```
***
**4.How does the season impact on utility purchase?**
This plot clearly shows that each season significantly has effect on utility cost and as is shown, during the cold weather period the amount of utility purchase goes high and during warmer months the cost of utility is going down.
### Q5.Moving Expenses
```{r}
library(plotly)
TotalMovingExpenses= c(sum(NewExpenses$Mortgage),sum(NewExpenses$Trash.and.sewer),sum(NewExpenses$Maintenance),sum(NewExpenses$House.Insurance),sum(NewExpenses$Moving.Expenses),sum(NewExpenses$Gardening.and.Painting.Equipments))
TotalMovingExpenses
q=colSums(Filter(is.numeric, NewExpenses[,c(5,6,11,19,22,26)]))
w <- as.factor(c('Mortgage', 'Trash.and.sewer', 'Maintenance' ,'ouse.Insurance','Moving.Expenses',
'Gardening.and.Painting.Equipments'))
p <- plot_ly(NewExpenses, x = ~w, y = ~TotalMovingExpenses, text = ~w,color = ~w, type = 'scatter', mode = 'markers', size = ~q, colors = 'Paired',
marker = list(opacity = 0.5, sizemode = 'diameter')) %>%
layout(title = 'Cost Of Moving To New House',
xaxis = list(showgrid = FALSE),
yaxis = list(showgrid = FALSE),
showlegend = TRUE)
p
```
***
**5.What is composition of moving expenses?**
Obviously moving to new house is along with some cost. I was wondering which category that is related with moving has the maximum cost. This colorful buble chart shows the big portion of expenses blong to the Mortgage, since we need to pay the rent of current house we had and the first mortgage of new house.
### Q6.Dinning Out Cost VS Grocery Purchase(Fig.1)
```{r}
library(GGally)
ggpairs(NewExpenses,
columns = c("Grocery", "Dining.Ot"),
upper = list(continuous = wrap("cor",
size = 5)),
lower = list(continuous = "smooth"))
```
### Q7.Dinning Out Cost VS Grocery Purchase(Fig.2)
```{r}
library(PerformanceAnalytics)
my_data <- NewExpenses[, c(3,14)]
chart.Correlation(my_data, histogram=TRUE, pch=19)
```
***
**6.Is there any correlation between dining out and grocery?**
Since the big part of each month belongs to grocery purcheses Even though includes many other things not only food, I was wondering if there is any correlation between Grocery and dining out.Two graphs show there is negative correlation between these two.
### Q7.Top 10 Purchases Over 6 Months
```{r}
a=colSums(Filter(is.numeric, NewExpenses[,-c(1:2,30)]))
library(dplyr)
b=as.data.frame(table(a))
a_sort=arrange(b,desc(a))
Top10Expenses=a_sort[1:10,]
Top10Expenses["Category"] <- 0
for ( i in 1:10){
if(Top10Expenses[i,1]==10981.00){Top10Expenses[i,3]<-"Mortgage"}
if(Top10Expenses[i,1]==10080.00){Top10Expenses[i,3]<-"Health.Insurance"}
if(Top10Expenses[i,1]==8400.00){Top10Expenses[i,3]<-"College"}
if(Top10Expenses[i,1]==3129.56){Top10Expenses[i,3]<-"Grocery"}
if(Top10Expenses[i,1]==2224.38){Top10Expenses[i,3]<-"Car.Leasing"}
if(Top10Expenses[i,1]==1425.15){Top10Expenses[i,3]<-"Maintenance"}
if(Top10Expenses[i,1]==1175.00){Top10Expenses[i,3]<-" Doctor.and.Medication.Bills"}
if(Top10Expenses[i,1]==1129.57){Top10Expenses[i,3]<-"Dining.Ot"}
if(Top10Expenses[i,1]==1124.53){Top10Expenses[i,3]<-"Utility"}
if(Top10Expenses[i,1]==792.00){Top10Expenses[i,3]<-"Moving.Expenses"}
}
View(Top10Expenses)
sortTop10Expenses=Top10Expenses[,c(1,3)]
library(ggplot2)
ggplot(sortTop10Expenses,aes(x=Category, y=a )) +
geom_boxplot()+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+labs(title="Top 10 Most Expenses in 6 Months",x="Top 10 Categories Of Purchases",y="Total Amount Of Purchase in 6 months")
```
***
**7.What are the top 10 expenses?**
According to the part 3, I wanted to find the first 10 of most expenses over 6 months to manage my shopping list and cost. Here are listed first 10 purchases which need to take care of.