Output of the data preparation chunk:
[1] 507   6
day_week
Sun Mon Tue Wed Thu Fri Sat
 87  90  68  60  70  78  54
  Transaction.Date  Post.Date   Description               Category               Type  Amount  Month  Day  Year  day_week
2 12/30/2019        12/31/2019  FILA OUTLET #1541         Shopping               Sale   21.63  Dec    30   2019  Mon
3 12/30/2019        12/31/2019  FILA OUTLET #1541         Shopping               Sale   21.63  Dec    30   2019  Mon
4 12/31/2019        12/31/2019  ROB WILEY PC              Professional Services  Sale  150.00  Dec    31   2019  Tue
6 12/29/2019        12/30/2019  SIP SAAM THAI RESTAURANT  Food & Drink           Sale   24.27  Dec    29   2019  Sun
7 12/28/2019        12/30/2019  MAIN EVENT - AUSTIN NO    Entertainment          Sale   26.24  Dec    28   2019  Sat
8 12/29/2019        12/30/2019  WHOLEFDS DOM # 10316      Groceries              Sale    4.70  Dec    29   2019  Sun
[1] 468  10
---
title: "ANLY 512 Final Project"
author: "Venkata Pulipati"
date: "02/19/2020"
output:
  flexdashboard::flex_dashboard:
    theme: sandstone
    social: menu
    source_code: embed
    orientation: rows
    vertical_layout: fill
---
```{r setup, include=FALSE}
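knitr::opts_chunk$set(echo = FALSE)
# setwd() inside a chunk does not persist to later chunks when knitting,
# so set the root directory for all chunks instead.
knitr::opts_knit$set(root.dir = "/Users/venkatasarath/Downloads/HU/anly_512")
library(flexdashboard)
library(knitr)
library(tidyverse)  # attaches ggplot2, dplyr and tidyr
library(lubridate)  # wday() for the day of the week
library(plotly)     # ggplotly() for the interactive charts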
```
Background
===========
### **Introduction**
The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and “Big Data”. This movement exploits the ease and convenience of data acquisition through the internet of things (IoT) to feed the growing obsession of personal informatics and quotidian data. The website http://quantifiedself.com/ is a great place to start to understand more about the QS movement.
The value of the QS for our class is that its core mandate is to visualize and generate questions and insights about a topic that is of immense importance to most people – themselves. It also produces a wealth of data in a variety of forms. Therefore, designing this project around the QS movement makes perfect sense because it offers you the opportunity to be both the data and question provider, the data analyst, the vis designer, and the end user. This means you will be in the unique position of being capable of providing feedback and direction at all points along the data visualization/analysis life cycle.
Row {data-height=400}
----------------------
### **Objective**
The main motivation for choosing the questions below is to inspect my own spending behaviour. Reflecting on the visualizations should help me make more conscious spending decisions.

- What is the spending trend by day of the week and by month?
- What is the spending trend by category?
- In the two highest-spending months, how much was spent in each category?
- Among the top three spending categories, what is each category's share of the spending in each month?
- How is the money spent on food & drink (restaurants) distributed over the days of the week?
Data Preparation
================
The dataset is a year-long statement of my credit card, which I use regularly to pay for my needs and wants. It contains transactions from 2019 and had to be cleaned and prepared for visualization. On first inspection, the data has 6 variables and 507 observations. The 6 variables are:

- Transaction.Date
- Post.Date
- Description
- Category
- Type
- Amount

For convenience in the visualizations, the transaction date is split into month, day and year. In addition, the day of the week is extracted from the "transaction date" values using the "lubridate" library and added to the dataset as a column alongside month, day and year. At this point the dataset has 10 variables.
The transaction types are sales, payments and returns. Since I am interested in my spending behavior, the observations (rows) for payments and returns are removed. Finally, the month numbers are converted to month names for better labelling in the charts.
After preprocessing, the dataset has 468 observations and 10 variables.
```{r}
# Load the 2019 credit card statement
data <- read.csv("./Chase3165_Activity20190101_20191231_20200218.csv",
                 stringsAsFactors = FALSE)
dim(data)
# Split the transaction date (mm/dd/yyyy) into Month, Day and Year columns
df <- data.frame(date = data$Transaction.Date, stringsAsFactors = FALSE) %>%
  separate(date, sep = "/", into = c("Month", "Day", "Year"))
# Day of the week as a labelled factor (Sun ... Sat)
day_week <- wday(as.Date(data$Transaction.Date, format = "%m/%d/%Y"), label = TRUE)
table(day_week)
df_fin_data <- cbind(data, df, day_week)
# Keep purchases only: drop card payments and returns
df_fin_data <- df_fin_data[!(df_fin_data$Type %in% c("Payment", "Return")), ]
# Work with the magnitude of each amount, regardless of its sign in the export
df_fin_data$Amount <- abs(df_fin_data$Amount)
# Month numbers to names, stored as a factor so charts follow calendar order
df_fin_data$Month <- factor(month.abb[as.integer(df_fin_data$Month)], levels = month.abb)
head(df_fin_data)
dim(df_fin_data)
```
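The preparation steps can be tried out on a small example before running them on the full statement. The following is a minimal sketch in base R, not the chunk used by the dashboard: the three-row `toy` data frame and the `purchases` name are hypothetical, and the negative sign on sale amounts is an assumption about how the export stores purchases.

```r
# Hypothetical three-row statement with the columns used in the cleaning steps
toy <- data.frame(
  Transaction.Date = c("12/30/2019", "07/04/2019", "03/15/2019"),
  Type             = c("Sale", "Payment", "Sale"),
  Amount           = c(-21.63, 500.00, -4.70),  # signs are an assumption
  stringsAsFactors = FALSE
)

# Parse the date once, then derive the calendar fields from it
dates <- as.Date(toy$Transaction.Date, format = "%m/%d/%Y")
toy$Month <- factor(month.abb[as.integer(format(dates, "%m"))], levels = month.abb)
toy$Day   <- as.integer(format(dates, "%d"))
toy$Year  <- as.integer(format(dates, "%Y"))

# "%w" gives the weekday as a number (0 = Sunday); map it to fixed labels
day_labels   <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
toy$day_week <- factor(day_labels[as.integer(format(dates, "%w")) + 1],
                       levels = day_labels)

# Keep purchases only and store the amounts as positive values
purchases <- toy[!(toy$Type %in% c("Payment", "Return")), ]
purchases$Amount <- abs(purchases$Amount)

dim(purchases)
```

Storing `Month` and `day_week` as factors with explicit levels keeps the chart axes in calendar order instead of alphabetical order.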
Visualizations P1{.storyboard}
=============================
Spending Pattern by Category & time? {.tabset data-width=1000}
-----------------------------------
### Spending Pattern by Day of the week
```{r}
# geom_col() stacks every transaction, so each bar is the total spent on that day
SpentByWeek <- ggplot(df_fin_data, aes(x = day_week, y = Amount)) +
  geom_col(fill = "gold") +
  labs(title = "Spending by day of week", x = "Day", y = "Amount spent") +
  theme_dark()
SpentByWeek
```
### Spending Pattern by Month
```{r}
# Each bar is the total amount spent in that month
SpentByMonth <- ggplot(df_fin_data, aes(x = Month, y = Amount)) +
  geom_col(fill = "gold") +
  labs(title = "Spending by Month", x = "Month", y = "Amount spent") +
  theme_dark()
SpentByMonth
```
### Spending Pattern by Category
```{r}
# Horizontal bars keep the category names readable
SpendByCat <- ggplot(df_fin_data, aes(x = Category, y = Amount)) +
  geom_col(fill = "gold") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip() +
  theme_dark()
SpendByCat
```
### Insights
Visualizing spending by day of the week and by month gives a basic overview of my spending pattern along two time dimensions. I chose bar charts because they show the cumulative amount spent in each period.
In the day-of-week chart, the highest cumulative amount was spent on Mondays, followed by Fridays and Sundays; the lowest was on Tuesdays.
In the monthly chart, spending was highest in December, followed by July and November, and lowest in March and May.
Spending by category is shown as a horizontal bar chart, because the category names are easier to read when they are listed one below another. The top categories are shopping, travel and food & drink; the lowest are education, professional services and home. The next charts look more closely at the top three categories.
The category view says a lot about my current lifestyle. I would rather spend more on education and groceries and lead a healthy lifestyle, yet my spending on food & drink is more than double my spending on groceries. This reflects a habit of eating out rather than cooking at home, and seeing it should help me move toward a healthier lifestyle from now on.
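The comparisons read off the bar charts can also be checked numerically. The sketch below uses a hypothetical `toy` data frame in place of `df_fin_data` (the amounts are made up for illustration) and computes the total per category and the ratio of restaurant spending to grocery spending with base R.

```r
# Hypothetical purchases; the dashboard's own data frame is df_fin_data
toy <- data.frame(
  Category = c("Food & Drink", "Groceries", "Food & Drink", "Shopping", "Groceries"),
  Amount   = c(24.27, 4.70, 30.00, 21.63, 10.30),
  stringsAsFactors = FALSE
)

# Total spent per category, largest first
by_category <- aggregate(Amount ~ Category, data = toy, FUN = sum)
by_category <- by_category[order(-by_category$Amount), ]
by_category

# Ratio of restaurant spending to grocery spending
totals <- tapply(toy$Amount, toy$Category, sum)
ratio  <- totals[["Food & Drink"]] / totals[["Groceries"]]
ratio
```

A ratio above 2 would correspond to the "more than double" observation made from the chart.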
Visualizations P2{.storyboard}
==============================
Highest spending months and highest spent categories
-----------------------------------------------------
### Highest-spending month (December) by category
```{r}
# December was the highest-spending month
data2 <- subset(df_fin_data, Month == "Dec")
dec_spending <- ggplot(data2, aes(x = Category, y = Amount)) +
  geom_col(fill = "gold") +
  labs(title = "Spending by Category in December", x = "Category", y = "Amount") +
  coord_flip() +
  theme_dark()
dec_spending
```
### Second-highest-spending month (July) by category
```{r}
# July was the second-highest-spending month
data3 <- subset(df_fin_data, Month == "Jul")
jul_spending <- ggplot(data3, aes(x = Category, y = Amount)) +
  geom_col(fill = "gold") +
  labs(title = "Spending by Category in July", x = "Category", y = "Amount") +
  coord_flip() +
  theme_dark()
jul_spending
```
### Top three categories monthly spent
```{r}
# position = "fill" rescales each month to 1, so the bars show category shares
top3 <- df_fin_data[df_fin_data$Category %in% c("Shopping", "Food & Drink", "Travel"), ]
ggplot(top3, aes(x = Month, y = Amount, fill = Category)) +
  geom_col(width = 0.7, position = "fill") +
  labs(x = "Month", y = "Share of amount spent",
       title = "Stacked Plot of amount spent on top 3 categories") +
  coord_flip()
```
### Insights
In both July and December, the amount spent on "food & drink" is more than double the amount spent on "groceries". July also includes a significant amount in the "Automotive" category. Although July is one of the top spending months, automotive spending is not a consistent expense; with better preparation for it, spending in that month could have been much lower.
No specific pattern can be observed in the stacked plot, but it is interesting that in some months shopping and travel make up the entire spending among the top three categories.
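The shares drawn by the stacked plot can be computed directly as a table. This is a minimal sketch on a hypothetical `toy` data frame with made-up amounts, using base R `xtabs()` and `prop.table()`; it is one way to obtain the proportions, not the computation ggplot2 performs internally.

```r
# Hypothetical purchases in the top three categories
toy <- data.frame(
  Month    = c("Jul", "Jul", "Jul", "Dec", "Dec"),
  Category = c("Shopping", "Travel", "Food & Drink", "Shopping", "Food & Drink"),
  Amount   = c(50, 30, 20, 60, 40),
  stringsAsFactors = FALSE
)

# Month-by-category totals, then divide each row by its month total
totals <- xtabs(Amount ~ Month + Category, data = toy)
shares <- prop.table(totals, margin = 1)
round(shares, 2)
```

Each row of `shares` sums to 1, which matches a bar filled to its full width in the stacked plot. A category with no purchases in a month appears as 0.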
Visualizations P3{.storyboard}
==================
### **Top Category by Day of the Week**
```{r}
tp_categories <- ggplot(df_fin_data, aes(x = day_week, y = Amount)) +
geom_point(aes(col=Category, size=Amount)) +
labs(title = "Top Categories Spent by day of the week", x = "Day", y = "Amount")
(tp_categories1 <- ggplotly(tp_categories))
```
### **Top Category by Month**
```{r}
purchases1 <- ggplot(df_fin_data, aes(x =Month, y = Amount)) +
geom_point(aes(col=Category, size=Amount)) +
labs(title = "Top Categories spent by month", x = "Month", y = "Amount")
(ggpurchases1 <- ggplotly(purchases1))
```
### **Insights**
I chose "geom_point()" here because I am comparing a continuous variable (Amount) against categorical variables (Month and Day of the Week). Rendering the charts with plotly makes them interactive, so I can hide or select categories and focus on the ones I am interested in.
In "Top Categories Spent by day of the week", the food & drink points are dense on weekdays and on Saturday. This made me realize that I need to manage my time better on workdays and avoid eating out.
Interestingly, although "Food & Drink" is among the top three spending categories, there are three months in which I managed to avoid eating out. In the months when I did eat out, the scatter points are very dense, which shows that my resistance to eating out is very low.
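The density of points in the scatter plots can be backed up with simple counts. The sketch below assumes a hypothetical `toy` data frame with made-up rows; it counts restaurant purchases per day of the week and lists the months without any such purchase.

```r
# Hypothetical purchases; factor levels keep days and months with zero counts
toy <- data.frame(
  Category = c("Food & Drink", "Food & Drink", "Shopping", "Food & Drink", "Travel"),
  day_week = factor(c("Mon", "Mon", "Sat", "Sat", "Sun"),
                    levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
  Month    = factor(c("Jan", "Jan", "Feb", "Mar", "Feb"), levels = month.abb),
  Amount   = c(12, 8, 40, 25, 100)
)

food <- toy[toy$Category == "Food & Drink", ]

# Number of restaurant purchases per day of the week
visits_by_day <- table(food$day_week)
visits_by_day

# Months without any restaurant purchase
months_without <- names(which(table(food$Month) == 0))
months_without
```

On the real data, `months_without` would name the months referred to above in which eating out was avoided.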
Conclusions
==================
These visualizations helped me understand my spending pattern to a certain extent. They also show that using my time more effectively would support my goal of spending wisely and leading a healthy life.
I developed this approach to visualization around my desire for a conscious lifestyle and the question of how closely my actions match it. Financial education is essential, and analyzing our own finances brings out reflections that make us think, question and motivate ourselves to improve.