| Trans..Date | Post.Date | Description | Amount | Category | month | year |
|---|---|---|---|---|---|---|
| 09/04/2016 | 09/04/2016 | DIRECTPAY MINIMUM PAYMENTSEE DETAILS OF YOUR NEXT DIRECTPAY BELOW | -35.00 | Payments and Credits | 9 | 2016 |
| 10/04/2016 | 10/04/2016 | DIRECTPAY MINIMUM PAYMENTSEE DETAILS OF YOUR NEXT DIRECTPAY BELOW | -35.00 | Payments and Credits | 10 | 2016 |
| 11/04/2016 | 11/04/2016 | DIRECTPAY MINIMUM PAYMENTSEE DETAILS OF YOUR NEXT DIRECTPAY BELOW | -35.00 | Payments and Credits | 11 | 2016 |
| 12/04/2016 | 12/04/2016 | DIRECTPAY MINIMUM PAYMENTSEE DETAILS OF YOUR NEXT DIRECTPAY BELOW | -35.00 | Payments and Credits | 12 | 2016 |
| 01/03/2017 | 01/03/2017 | CASHBACK BONUS REDEMPTION PYMT/STMT CRDT | -4.02 | Awards and Rebate Credits | 1 | 2017 |
| 01/04/2017 | 01/04/2017 | DIRECTPAY MINIMUM PAYMENTSEE DETAILS OF YOUR NEXT DIRECTPAY BELOW | -30.98 | Payments and Credits | 1 | 2017 |
| 02/04/2017 | 02/04/2017 | DIRECTPAY MINIMUM PAYMENTSEE DETAILS OF YOUR NEXT DIRECTPAY BELOW | -35.00 | Payments and Credits | 2 | 2017 |
| 02/05/2017 | 02/05/2017 | INTEREST CHARGE ON PURCHASES | 11.82 | Interest | 2 | 2017 |
| 02/08/2017 | 02/04/2017 | ACH PAYMENT ADJUSTMENT | 35.00 | Payments and Credits | 2 | 2017 |
| 02/08/2017 | 02/04/2017 | ADJUSTMENT - INTEREST CHARGE | -0.04 | Payments and Credits | 2 | 2017 |
| 02/08/2017 | 02/04/2017 | DPP PAYMENT REPRESENTMENT - THANK YOU | -35.00 | Payments and Credits | 2 | 2017 |
| 02/08/2017 | 02/08/2017 | RETURNED CHECK CHARGE | 27.00 | Fees | 2 | 2017 |
| 03/04/2017 | 03/04/2017 | LATE FEE | 27.00 | Fees | 3 | 2017 |
| 03/05/2017 | 03/05/2017 | INTEREST CHARGE ON PURCHASES | 10.73 | Interest | 3 | 2017 |
| 03/08/2017 | 03/08/2017 | INTERNET PAYMENT - THANK YOU | -752.32 | Payments and Credits | 3 | 2017 |
---
title: "ANLY 512 Final Project"
author: "Raavi Anvesh"
date: "August, 2018"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
setwd("C:\\Users\\raavi\\Dropbox\\Harrisburg University\\Semester 2\\Data Visualization\\Project")
```
### Overview of the Quantified Self movement
Image Credit: Google

***
A project for ANLY512: Data Visualization

The Quantified Self movement grew out of the rise of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses the last 24 months of spending and payment data captured by a Discover card.

The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Using this data-driven approach, I also summarize the results by answering the following questions:

1) What is the total spending by month in the last 24 months?
2) Which category cost the most in the last 24 months?
3) For Merchandise, what is the spending by month in the last 24 months?
4) In March and November, what is the spending in each category?
5) Compared to 2017, how did spending change?
### Data preparation
```{r}
# Load the card export; keep text columns as character rather than factor
data <- read.csv("transactions.csv", stringsAsFactors = FALSE)
# Parse the posting date once, then derive numeric month and year columns
post_date <- as.Date(data$Post.Date, format = "%m/%d/%Y")
data$month <- month(post_date)
data$year <- year(post_date)
kable(data[1:15, ], caption = "Sample raw data")
```
***
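There are 7 variables we are interested in:

- Trans. Date: the transaction date
- Post Date: the date the transaction was posted to the account
- Description: detailed description of the purchase
- Amount: the transaction amount in dollars (charges are positive; payments and credits are negative)
- Category: the spending category assigned to the transaction
- Month: month of the post date (1 to 12)
- Year: year of the post date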
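
Because `Amount` mixes charges with payments and credits, it can help to flag which rows count as spending before charting. The following is a minimal sketch in base R, using a few rows transcribed from the sample raw data. It assumes that the `Payments and Credits` and `Awards and Rebate Credits` categories are not spending; the names `tx`, `non_spending`, and `is_spending` are illustrative and not part of the project code.

```r
# Rows transcribed from the sample raw data (Post.Date, Amount, Category)
tx <- data.frame(
  Post.Date = c("01/03/2017", "01/04/2017", "02/05/2017", "02/08/2017", "03/08/2017"),
  Amount    = c(-4.02, -30.98, 11.82, 27.00, -752.32),
  Category  = c("Awards and Rebate Credits", "Payments and Credits",
                "Interest", "Fees", "Payments and Credits"),
  stringsAsFactors = FALSE
)

# Derive numeric month and year from the posting date
post_date <- as.Date(tx$Post.Date, format = "%m/%d/%Y")
tx$month <- as.integer(format(post_date, "%m"))
tx$year  <- as.integer(format(post_date, "%Y"))

# Assumption: these two categories are money coming back, not spending
non_spending <- c("Payments and Credits", "Awards and Rebate Credits")
tx$is_spending <- !(tx$Category %in% non_spending)

total_spent <- sum(tx$Amount[tx$is_spending])    # interest and fees charged
total_paid  <- -sum(tx$Amount[!tx$is_spending])  # payments and credits received
```

Filtering on category rather than on the sign of `Amount` is a design choice: the sample data contains a positive `ACH PAYMENT ADJUSTMENT` row in `Payments and Credits`, which a sign-based filter would count as spending.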
### Q1: What is the total spending by month in the last 24 months?
```{r}
# Stacked bars add up every transaction in a month (both years combined).
# Note: Amount includes negative payments and credits, which stack below zero.
p <- ggplot(data, aes(x = factor(month.abb[month], levels = month.abb), y = Amount)) +
  geom_col(fill = "blue") +
  scale_x_discrete(drop = FALSE) +  # keep all 12 months on the axis
  labs(title = "Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p
```
***
Since Month is categorical and Amount is continuous, and I wanted to compare the total amount in each month, I used a bar chart to summarize the data.
Each bar shows the total amount spent in that month.
The plot shows that average monthly spending is around 800 dollars, and that total spending is highest in March and November.
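
The monthly totals behind a chart like this can also be computed explicitly, which makes an "average per month" figure easy to check. The following is a minimal sketch in base R on rows transcribed from the sample raw data (February and March 2017 only). It assumes that payment and credit rows should be excluded from spending; `tx`, `spend`, and `monthly` are illustrative names.

```r
# Rows transcribed from the sample raw data (February and March 2017)
tx <- data.frame(
  month    = c(2, 2, 2, 2, 3, 3, 3),
  Amount   = c(11.82, 35.00, -35.00, 27.00, 27.00, 10.73, -752.32),
  Category = c("Interest", "Payments and Credits", "Payments and Credits", "Fees",
               "Fees", "Interest", "Payments and Credits"),
  stringsAsFactors = FALSE
)

# Assumption: payments and credits are not spending, so drop them first
spend <- tx[tx$Category != "Payments and Credits", ]

# One row per month with the summed amount
monthly <- aggregate(Amount ~ month, data = spend, FUN = sum)

# Ordered month labels so a plot keeps calendar order rather than alphabetical
monthly$label <- factor(month.abb[monthly$month], levels = month.abb)

# Average of the monthly totals
avg_monthly <- mean(monthly$Amount)
```

The resulting `monthly` data frame could be passed to `ggplot()` in place of the raw transactions, so each bar is drawn from a single pre-computed total.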
### Q2: Which category cost the most in the last 24 months?
```{r}
# Total amount per category; the bars are flipped so long category names stay readable
p <- ggplot(data, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p
```
***
For the same reason, a bar chart is used here. The category names are too long to display along the bottom axis, so the chart is rotated.
The category I spent the most on is Travel/Entertainment, at around $1000, which is much higher than any other category. The second-highest category is either Supermarkets or Merchandise, at around $200.
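
When two bars are too close to tell apart by eye, ranking the category totals numerically settles the question. The following is a minimal sketch in base R on a few rows transcribed from the sample raw data. Those rows contain only fees, interest, and payments, so the result illustrates the mechanics, not the project's finding; `tx`, `totals`, and `ranked` are illustrative names.

```r
# Rows transcribed from the sample raw data
tx <- data.frame(
  Amount   = c(11.82, 27.00, 27.00, 10.73, -35.00, -752.32),
  Category = c("Interest", "Fees", "Fees", "Interest",
               "Payments and Credits", "Payments and Credits"),
  stringsAsFactors = FALSE
)

# Sum the amounts within each category
totals <- tapply(tx$Amount, tx$Category, sum)

# Largest total first; negative totals (payments) sort to the end
ranked <- sort(totals, decreasing = TRUE)

top_category <- names(ranked)[1]
```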
### Q3: For Merchandise, what is the spending by month in the last 24 months?
```{r}
# Keep only Merchandise transactions, then total them by month
data1 <- subset(data, Category == "Merchandise")
p <- ggplot(data1, aes(x = factor(month.abb[month], levels = month.abb), y = Amount)) +
  geom_col(fill = "blue") +
  scale_x_discrete(drop = FALSE) +  # keep all 12 months on the axis
  labs(title = "Merchandise Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p
```
***
For most months, spending on Merchandise was less than 50 dollars. In April, June, and July, however, it was higher than in the other months.
### Q4: In March and November, what is the spending in each category?
```{r}
# Spending per category for a single month. month is numeric, so 4 selects April;
# change the number to chart a different month.
data2 <- subset(data, month == 4)
p1 <- ggplot(data2, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p1
```
```{r}
# Same chart for month 6 (June)
data3 <- subset(data, month == 6)
p2 <- ggplot(data3, aes(x = Category, y = Amount)) +
  geom_col(fill = "darkgreen") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p2
```
***
The plots show that Travel/Entertainment is still the largest category. In June, the second-largest category is Merchandise.
Travel/Entertainment and Merchandise spending is high because I moved to a new place where I needed a rental car, and I started a new job where formal clothing is mandatory, so I had to purchase office clothes.
### Q5: Compared to 2017, how did spending change?
```{r}
# Sum the amounts per category and year first: dodged bars drawn from raw
# transactions overlap each other instead of adding up
comp <- data %>%
  group_by(Category, Year = factor(year)) %>%
  summarise(Amount = sum(Amount)) %>%
  ungroup()
ggplot(comp, aes(x = Category, y = Amount, fill = Year)) +
  geom_col(position = "dodge") +
  labs(title = "Spending per Category by Year", x = "Category", y = "Amount") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```
***
Since the purpose was to compare spending across the years covered by the last 24 months, a grouped bar chart was used.
The plot shows that overall spending increased in 2018. The main reason is that I got a job after a long time.
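
A year-over-year comparison can also be read from a small table of category totals per year, which puts exact numbers behind the bars. The following is a minimal sketch in base R on the 15 rows shown in the sample raw data. Those rows cover only late 2016 and early 2017, so the sketch illustrates the mechanics and says nothing about 2018; `tx`, `by_year`, and `change` are illustrative names.

```r
# The 15 rows of the sample raw data (Amount, Category, year)
tx <- data.frame(
  Amount = c(-35.00, -35.00, -35.00, -35.00, -4.02, -30.98, -35.00, 11.82,
             35.00, -0.04, -35.00, 27.00, 27.00, 10.73, -752.32),
  Category = c(rep("Payments and Credits", 4), "Awards and Rebate Credits",
               "Payments and Credits", "Payments and Credits", "Interest",
               "Payments and Credits", "Payments and Credits",
               "Payments and Credits", "Fees", "Fees", "Interest",
               "Payments and Credits"),
  year = c(rep(2016, 4), rep(2017, 11)),
  stringsAsFactors = FALSE
)

# Category-by-year table of summed amounts; missing combinations become 0
by_year <- xtabs(Amount ~ Category + year, data = tx)

# Change per category from 2016 to 2017
change <- by_year[, "2017"] - by_year[, "2016"]
```

On the full dataset, the same table with columns for 2017 and 2018 would give the per-category differences that the grouped bar chart shows visually.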