Background

Introduction

The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and “Big Data”. This movement exploits the ease and convenience of data acquisition through the internet of things (IoT) to feed the growing obsession of personal informatics and quotidian data. The website http://quantifiedself.com/ is a great place to start to understand more about the QS movement.

The value of the QS for our class is that its core mandate is to visualize and generate questions and insights about a topic that is of immense importance to most people – themselves. It also produces a wealth of data in a variety of forms. Therefore, designing this project around the QS movement makes perfect sense because it offers you the opportunity to be both the data and question provider, the data analyst, the vis designer, and the end user. This means you will be in the unique position of being capable of providing feedback and direction at all points along the data visualization/analysis life cycle.

Objective

The main motivation for choosing the questions below is to examine my own spending behaviour, so that what the visualizations reveal can help me make more conscious decisions.

What is the spending trend by day of the week and by month?
What is the spending trend by category?
In the two highest-spending months, what is the spending in each category?
Among the top three spending categories, what is each category's share in each month?
How much is spent on food & drink (restaurants) on each day of the week?

Data Preparation

The dataset is a year-long statement from my credit card, which I use regularly to pay for both needs and wants. It contains transactions from 2019 and had to be cleaned and prepared for visualization. On first inspection, the data has 6 variables and 507 observations. The 6 variables are:

Transaction.Date
Post.Date
Description
Category
Type
Amount

For convenience in visualization, the transaction date is split into month, day and year. In addition, the day of the week is extracted from the transaction date using the “lubridate” library and added as a column alongside the month, day and year variables. At this point the dataset has 10 variables.

The transaction types are sales (money spent), returns and payments. Since I am interested in my spending behavior, the observations (rows) associated with payments and returns are removed. Finally, the month numbers are converted to month names for better labelling in the visualizations.

After preprocessing, the dataset has 468 observations and 10 variables.

[1] 507   6

day_week
Sun Mon Tue Wed Thu Fri Sat 
 87  90  68  60  70  78  54

  Transaction.Date  Post.Date              Description
2       12/30/2019 12/31/2019        FILA OUTLET #1541
3       12/30/2019 12/31/2019        FILA OUTLET #1541
4       12/31/2019 12/31/2019             ROB WILEY PC
6       12/29/2019 12/30/2019 SIP SAAM THAI RESTAURANT
7       12/28/2019 12/30/2019   MAIN EVENT - AUSTIN NO
8       12/29/2019 12/30/2019     WHOLEFDS DOM # 10316
               Category Type Amount Month Day Year day_week
2              Shopping Sale  21.63   Dec  30 2019      Mon
3              Shopping Sale  21.63   Dec  30 2019      Mon
4 Professional Services Sale 150.00   Dec  31 2019      Tue
6          Food & Drink Sale  24.27   Dec  29 2019      Sun
7         Entertainment Sale  26.24   Dec  28 2019      Sat
8             Groceries Sale   4.70   Dec  29 2019      Sun

[1] 468  10

Visualizations P1

Spending Pattern by Category & time?

Spending Pattern by Day of the week

Spending Pattern by Month

Spending Pattern by Category

Insights

Visualizing spending by day of the week and by month gives a basic overview of how spending is distributed across the week and across the year. I used bar charts to show the cumulative amount spent along these two time dimensions. Interestingly, the bar chart shows that the highest cumulative amount was spent on Mondays, followed by Fridays and Sundays, and the least on Tuesdays. In the monthly view, spending was highest in December, followed by July and November, and lowest in March and May.

Here, spending by category is shown as a horizontal bar chart, because the categories are easier to read when they are listed one above the other. The top spending categories are shopping, travel and food & drink; the lowest are education, professional services and home. The next charts look more closely at the top three spending categories.

The view by category is particularly insightful, because it gives me an idea of the lifestyle I am leading. At this point I would rather spend more on education and groceries and lead a healthy lifestyle, yet spending on food & drink is more than double the spending on groceries. This says a lot about my eating habits, namely eating out rather than cooking at home, and the visualization can help me move toward a healthier lifestyle from now on.

Visualizations P2

Highest spending months and highest spent categories

Spending by category in the top spending month

Spending by category in the second-highest spending month

Monthly spending in the top three categories

Insights

In both July and December, the amount spent on “food & drink” is more than double the amount spent on “groceries.” In addition, July includes a significant amount in the “Automotive” category. Although July was one of the top spending months, “Automotive” is not a consistent expense. With better preparation to control it, spending in that month could have been much lower.

No specific pattern can be observed in the stacked plot, but it is interesting that in some months shopping and travel alone fill the entire bar among the top 3 categories.

Visualizations P3

Top Category by Day of the Week

Top Category by Month

Insights

I chose “geom_point()” here because I am comparing a continuous variable (the amount) against categorical variables, Month and Day of the Week. The charts are interactive, so categories can be selected or removed to focus on the ones I am interested in.

In “Top Categories Spent by day of the week”, the food & drink points are dense on weekdays as well as on Saturday. This made me realize that I need to manage my time more effectively on workdays and avoid eating out.

An interesting observation is that, although “Food & Drink” is among the top three spending categories, there are three months in which I managed to avoid spending on eating out. In the months in which I did eat out, the points are much denser, which shows that my resistance to eating out is very low.

Conclusions

These visualizations helped me understand my spending pattern to a certain extent. They also point to the value of managing my time effectively, which would support my desire to spend wisely and lead a healthy life.

This approach to visualization was developed around my wish for a conscious lifestyle and the question of how closely my actions match it. Financial education is essential, and analyzing one's own finances brings out reflections that make us think, question and feel motivated to improve.

---
title: "ANLY 512 Final Project"
author: "Venkata Pulipati"
date: "02/19/2020"
output: 
  flexdashboard::flex_dashboard:
    theme: sandstone
    social: menu
    source_code: embed
    orientation: rows
    vertical_layout: fill
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(epitools)
library(dplyr)
library(plotly)
library(base)
library(tidyr) 
library(lubridate)
library(dygraphs)
setwd("/Users/venkatasarath/Downloads/HU/anly_512")
```

Background
===========

### **Introduction**

The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and “Big Data”. This movement exploits the ease and convenience of data acquisition through the internet of things (IoT) to feed the growing obsession of personal informatics and quotidian data. The website http://quantifiedself.com/ is a great place to start to understand more about the QS movement.

The value of the QS for our class is that its core mandate is to visualize and generate questions and insights about a topic that is of immense importance to most people – themselves. It also produces a wealth of data in a variety of forms. Therefore, designing this project around the QS movement makes perfect sense because it offers you the opportunity to be both the data and question provider, the data analyst, the vis designer, and the end user. This means you will be in the unique position of being capable of providing feedback and direction at all points along the data visualization/analysis life cycle.

Row {data-height=400}
----------------------
### **Objective**

The main motivation for choosing the questions below is to examine my own spending behaviour, so that what the visualizations reveal can help me make more conscious decisions.

- What is the spending trend by day of the week and by month?
- What is the spending trend by category?
- In the two highest-spending months, what is the spending in each category?
- Among the top three spending categories, what is each category's share in each month?
- How much is spent on food & drink (restaurants) on each day of the week?

Data Preparation
================

The dataset is a year-long statement from my credit card, which I use regularly to pay for both needs and wants. It contains transactions from 2019 and had to be cleaned and prepared for visualization. On first inspection, the data has 6 variables and 507 observations. The 6 variables are:

- Transaction.Date
- Post.Date
- Description
- Category
- Type
- Amount

For convenience in visualization, the transaction date is split into month, day and year. In addition, the day of the week is extracted from the transaction date using the "lubridate" library and added as a column alongside the month, day and year variables. At this point the dataset has 10 variables.

The transaction types are sales (money spent), returns and payments. Since I am interested in my spending behavior, the observations (rows) associated with payments and returns are removed. Finally, the month numbers are converted to month names for better labelling in the visualizations.

After preprocessing, the dataset has 468 observations and 10 variables.

```{r}
# Load the 2019 card statement
data <- read.csv("./Chase3165_Activity20190101_20191231_20200218.csv")
dim(data)
# Split the MM/DD/YYYY transaction date into Month, Day and Year columns
df <- data.frame(date = data$Transaction.Date, stringsAsFactors = FALSE)
df <- df %>%
  separate(date, sep = "/", into = c("Month", "Day", "Year"))
# Derive the abbreviated day of the week with lubridate
day_week <- wday(as.Date(data$Transaction.Date, "%m/%d/%Y"), label = TRUE)
table(day_week)
fin_data <- cbind(data, df, day_week)
df_fin_data <- as.data.frame(fin_data)
# Keep purchases only: drop payments and returns
df_fin_data <- df_fin_data[!(df_fin_data$Type %in% c("Payment", "Return")), ]
# Work with positive amounts regardless of the sign used in the export
df_fin_data$Amount <- abs(df_fin_data$Amount)
# Replace month numbers with abbreviated names, kept in calendar order for plotting
df_fin_data$Month <- factor(month.abb[as.integer(df_fin_data$Month)], levels = month.abb)
head(df_fin_data)
dim(df_fin_data)
```
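
As a point of comparison, the same date handling can be sketched with lubridate's own parsers instead of splitting the date string. This is only a minimal illustration: `toy` and its values are made up and are not read from the statement, and the sketch assumes the dates are in MM/DD/YYYY form.

```r
library(lubridate)

# Hypothetical toy data standing in for the statement (not the real file)
toy <- data.frame(
  Transaction.Date = c("12/30/2019", "07/04/2019"),
  Amount = c(-21.63, -40.00),
  stringsAsFactors = FALSE
)

# Parse the MM/DD/YYYY strings once, then derive each date part from the Date
parsed <- mdy(toy$Transaction.Date)
toy$Month    <- factor(month.abb[month(parsed)], levels = month.abb)  # calendar order
toy$Day      <- day(parsed)
toy$Year     <- year(parsed)
toy$day_week <- wday(parsed, label = TRUE)  # ordered factor; labels depend on locale
toy$Amount   <- abs(toy$Amount)             # work with positive amounts

toy
```

Storing `Month` as a factor with `month.abb` as its levels keeps the months in calendar order on a plot axis; a plain character column would be sorted alphabetically.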

Visualizations P1{.storyboard}
=============================
  
Spending Pattern by Category & time? {.tabset data-width=1000}
-----------------------------------

### Spending Pattern by Day of the week
```{r}
SpentByWeek <- ggplot(df_fin_data, aes(x = day_week, y=Amount)) + 
  geom_bar(stat = "identity", fill = "gold") +
  labs(title = "Spending by day of week", x = "Day", y = "Amount spent") +
  theme_dark()
SpentByWeek
```

### Spending Pattern by Month
```{r}
SpentByMonth <- ggplot(df_fin_data, aes(x = Month, y=Amount)) + 
  geom_bar(stat = "identity", fill = "gold") +
  labs(title = "Spending by Month", x = "Month", y = "Amount spent") +
  theme_dark()
SpentByMonth
```

### Spending Pattern by Category
```{r}
SpendByCat <- ggplot(df_fin_data, aes(x = Category, y = Amount)) +
  geom_bar(stat = "identity", fill = "gold") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()+
  theme_dark()
SpendByCat
```

### Insights
Visualizing spending by day of the week and by month gives a basic overview of how spending is distributed across the week and across the year. I used bar charts to show the cumulative amount spent along these two time dimensions.
Interestingly, the bar chart shows that the highest cumulative amount was spent on Mondays, followed by Fridays and Sundays, and the least on Tuesdays.
In the monthly view, spending was highest in December, followed by July and November, and lowest in March and May.
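
Rankings read off the bar charts can be cross-checked numerically by summing the amounts per group. The following is a minimal sketch, assuming a data frame shaped like `df_fin_data`; the `toy` rows and their values are made up for illustration.

```r
library(dplyr)

# Hypothetical rows with the same column names as df_fin_data
toy <- data.frame(
  day_week = c("Mon", "Mon", "Fri", "Tue", "Sun"),
  Month    = c("Dec", "Jul", "Dec", "Mar", "Nov"),
  Amount   = c(150, 40, 60, 5, 30),
  stringsAsFactors = FALSE
)

# Cumulative spending per weekday, largest first (what the bar heights encode)
by_day <- toy %>%
  group_by(day_week) %>%
  summarise(total = sum(Amount)) %>%
  arrange(desc(total))

# The same summary per month
by_month <- toy %>%
  group_by(Month) %>%
  summarise(total = sum(Amount)) %>%
  arrange(desc(total))

by_day
by_month
```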

Here, spending by category is shown as a horizontal bar chart, because the categories are easier to read when they are listed one above the other. The top spending categories are shopping, travel and food & drink; the lowest are education, professional services and home. The next charts look more closely at the top three spending categories.

The view by category is particularly insightful, because it gives me an idea of the lifestyle I am leading. At this point I would rather spend more on education and groceries and lead a healthy lifestyle, yet spending on food & drink is more than double the spending on groceries. This says a lot about my eating habits, namely eating out rather than cooking at home, and the visualization can help me move toward a healthier lifestyle from now on.
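
The "more than double" comparison can be made explicit as a ratio of category totals. This is a sketch with hypothetical amounts, using only base R and the column names of `df_fin_data`; `totals` and `ratio` are names introduced here for illustration.

```r
# Hypothetical amounts; column names follow df_fin_data
toy <- data.frame(
  Category = c("Food & Drink", "Food & Drink", "Groceries", "Shopping"),
  Amount   = c(24.27, 30.00, 20.00, 21.63),
  stringsAsFactors = FALSE
)

# Total per category, sorted from highest to lowest
totals <- sort(sapply(split(toy$Amount, toy$Category), sum), decreasing = TRUE)

# How many times larger restaurant spending is than grocery spending
ratio <- totals[["Food & Drink"]] / totals[["Groceries"]]

totals
ratio
```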

Visualizations P2{.storyboard}
==============================

Highest spending months and highest spent categories
-----------------------------------------------------

### Spending by category in the top spending month
```{r}
data2<- subset(df_fin_data, Month=='Dec')
dec_spending <- ggplot(data2, aes(x = Category, y = Amount)) +
  geom_bar(stat = "identity", fill = "gold")+
  labs(title = "Spending by Category in December", x = "Category", y = "Amount") +
  coord_flip()+
  theme_dark()
dec_spending
```

### Spending by category in the second-highest spending month
```{r}
data3<- subset(df_fin_data, Month=='Jul' )
jul_spending <- ggplot(data3, aes(x = Category, y = Amount)) +
  geom_bar(stat = "identity", fill = "gold")+
  labs(title = "Spending by Category in July", x = "Category", y = "Amount") +
  coord_flip()+
  theme_dark()
jul_spending
```

### Monthly spending in the top three categories
```{r}
ggplot(df_fin_data[df_fin_data$Category %in% c("Shopping", "Food & Drink", "Travel"), ], aes(x = Month, y = Amount, fill = Category)) +
  geom_bar(stat = "identity", width = 0.7, position = "fill") +
  labs(x = "Month", y = "Share of spending", title = "Stacked Plot of amount spent on top 3 categories") +
  coord_flip()
```

### Insights

In both July and December, the amount spent on "food & drink" is more than double the amount spent on "groceries." In addition, July includes a significant amount in the "Automotive" category. Although July was one of the top spending months, "Automotive" is not a consistent expense. With better preparation to control it, spending in that month could have been much lower.

No specific pattern can be observed in the stacked plot, but it is interesting that in some months shopping and travel alone fill the entire bar among the top 3 categories.
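
Because the stacked plot is drawn with `position = "fill"`, each bar shows shares rather than amounts. The sketch below computes such shares directly; the data is hypothetical, and `top3`, `total` and `share` are names introduced here for illustration.

```r
library(dplyr)

top3 <- c("Shopping", "Food & Drink", "Travel")

# Hypothetical rows; column names follow df_fin_data
toy <- data.frame(
  Month    = c("Jul", "Jul", "Jul", "Dec", "Dec", "Dec"),
  Category = c("Shopping", "Travel", "Automotive", "Shopping", "Food & Drink", "Travel"),
  Amount   = c(50, 50, 300, 100, 60, 40),
  stringsAsFactors = FALSE
)

shares <- toy %>%
  filter(Category %in% top3) %>%           # keep only the top three categories
  group_by(Month, Category) %>%
  summarise(total = sum(Amount), .groups = "drop") %>%
  group_by(Month) %>%
  mutate(share = total / sum(total)) %>%   # each month's shares sum to 1
  ungroup()

shares
```

In this toy example July has no Food & Drink rows, so Shopping and Travel together make up the whole bar for that month.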

Visualizations P3{.storyboard}
==================

### **Top Category by Day of the Week**
```{r}
tp_categories <- ggplot(df_fin_data, aes(x = day_week, y = Amount)) +
  geom_point(aes(col=Category, size=Amount)) +
  labs(title = "Top Categories Spent by day of the week", x = "Day", y = "Amount")
 
(tp_categories1 <- ggplotly(tp_categories))

```

### **Top Category by Month**
```{r}
purchases1 <- ggplot(df_fin_data, aes(x =Month, y = Amount)) +
  geom_point(aes(col=Category, size=Amount)) +
    labs(title = "Top Categories spent by month", x = "Month", y = "Amount")
 
(ggpurchases1 <- ggplotly(purchases1))

```

### **Insights**
I chose "geom_point()" here because I am comparing a continuous variable (the amount) against categorical variables, Month and Day of the Week. The charts are interactive, so categories can be selected or removed to focus on the ones I am interested in.

In "Top Categories Spent by day of the week", the food & drink points are dense on weekdays as well as on Saturday. This made me realize that I need to manage my time more effectively on workdays and avoid eating out.

An interesting observation is that, although "Food & Drink" is among the top three spending categories, there are three months in which I managed to avoid spending on eating out. In the months in which I did eat out, the points are much denser, which shows that my resistance to eating out is very low.
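
Point density in the scatter plots corresponds to transaction counts, which can also be tabulated. This is a minimal sketch with hypothetical rows, counting Food & Drink purchases per weekday and listing the months that have none; `food`, `food_by_day` and `months_without_food` are names introduced here.

```r
# Hypothetical rows; column names follow df_fin_data
toy <- data.frame(
  Category = c("Food & Drink", "Food & Drink", "Food & Drink", "Shopping", "Travel"),
  day_week = c("Mon", "Mon", "Sat", "Sun", "Fri"),
  Month    = c("Jan", "Feb", "Feb", "Mar", "Apr"),
  stringsAsFactors = FALSE
)

food <- toy[toy$Category == "Food & Drink", ]

# Number of Food & Drink purchases on each weekday
food_by_day <- table(food$day_week)

# Months that appear in the data but have no Food & Drink purchase
months_without_food <- setdiff(unique(toy$Month), unique(food$Month))

food_by_day
months_without_food
```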

Conclusions
==================

These visualizations helped me understand my spending pattern to a certain extent. They also point to the value of managing my time effectively, which would support my desire to spend wisely and lead a healthy life.

This approach to visualization was developed around my wish for a conscious lifestyle and the question of how closely my actions match it. Financial education is essential, and analyzing one's own finances brings out reflections that make us think, question and feel motivated to improve.