The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and “Big Data”. Designing this project around the QS movement makes perfect sense because it offers the opportunity to be both the data and question provider, the data analyst, the vis designer, and the end user.In this group project, we used a collection of a years’ worth of data on spending and payments through discover credit card.
The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in this class. We have attempted to answer the following questions based on the data:
What’s the total spending per month in 2022?
In the second quarter, how’s the spending in each category?
What’s the total spending per category in 2022?
For Merchandise, how does the spending vary from month to month?
For Services, how does the spending vary from month to month?
| Trans..date | Post.date | Month | Year | Description | Amount | Category |
|---|---|---|---|---|---|---|
| 1/1/22 | 1/1/22 | 1 | 2022 | COMPASSION INTERNATION 800-336-7676 CO | 76.00 | Services |
| 1/1/22 | 1/1/22 | 1 | 2022 | HP *INSTANT INK 855-785-2777 CAT 54DOVMNS221231225746 | 4.27 | Merchandise |
| 1/1/22 | 1/1/22 | 1 | 2022 | VESTA *AT&T PREPAID 866-608-3007 ORY6ST5N9JP02 | 44.40 | Services |
| 1/5/22 | 1/5/22 | 1 | 2022 | APPLE.COM/BILL 866-712-7753 CAAPPLE PAY ENDING IN 7474MXG1KXV3HXA0 | 20.38 | Merchandise |
| 1/7/22 | 1/7/22 | 1 | 2022 | APPLE.COM/BILL 866-712-7753 CAAPPLE PAY ENDING IN 7474MXG1LFY2XFA0 | 18.35 | Merchandise |
| 1/9/22 | 1/9/22 | 1 | 2022 | APPLE.COM/BILL 111-111-1111 CAAPPLE PAY ENDING IN 7474 | 7.99 | Merchandise |
| 1/9/22 | 1/9/22 | 1 | 2022 | APPLE.COM/BILL 866-712-7753 CAAPPLE PAY ENDING IN 7474MXG1LK6Q68A0 | 10.99 | Merchandise |
| 1/14/22 | 1/14/22 | 1 | 2022 | LATE FEE | 35.00 | Fees |
| 1/15/22 | 1/15/22 | 1 | 2022 | APPLE.COM/BILL 866-712-7753 CAAPPLE PAY ENDING IN 7474MXG1LWT266A0 | 32.39 | Merchandise |
| 1/18/22 | 1/18/22 | 1 | 2022 | APPLE.COM/BILL 866-712-7753 CAAPPLE PAY ENDING IN 7474MXG1M06N3GA0 | 60.47 | Merchandise |
| 1/18/22 | 1/18/22 | 1 | 2022 | INTERNET PAYMENT - THANK YOU | -577.76 | Payments and Credits |
| 1/24/22 | 1/24/22 | 1 | 2022 | APPLE.COM/BILL 111-111-1111 CAAPPLE PAY ENDING IN 7474 | 6.99 | Merchandise |
| 1/25/22 | 1/25/22 | 1 | 2022 | NETFLIX.COM NETFLIX.COM CA9220832671303507 | 16.73 | Services |
| 1/31/22 | 1/31/22 | 1 | 2022 | VESTA *AT&T PREPAID 866-608-3007 ORYGNA2N8QZG2 | 44.40 | Services |
| 2/1/22 | 2/1/22 | 2 | 2022 | COMPASSION INTERNATION 800-336-7676 CO | 76.00 | Services |
There are 7 variables listed:
Tran. Date: Transaction Date
Post Date
Month
Description: Detailed Description of purchases
Amount: The total amount spent
Category
Year: 2022
Since Month is categorical and Amount is continuous, a bar chart was used to summarize the data. We summarized the total amount spent in each month and used the bar chart to displays the numbers.Total spending was the maximum in the month of May, followed by April and August.There are negative values since this also includes payment data.
We looked at each month’s spent category for Q2. From the plot, we can see that top 3 categories I spent on in April was Services. This makes sense since I was moving houses and I made a large payment through my credit card to the moving company. Most of the spending in May and June was in the Travel/entertainment bucket.
The name of each category is long and can not be displayed at the bottom, so a rotated dot plot was created. Here We see the category I spent most was Merchandise, followed by Services and Travel/Entertainment.
Maximum spending on Merchandise is in April, May and August and this is also closely relflected in the Overall Spending data. We can see that merchandise spending is actually driving overall data up where the same months show the most expenditure for all categories.
Maximum expenditure on Services has been in March/April (the months of the house move, tax season, and renewal of renters insurance!)
Based on the visual analytics, following conclusions can be drawn:
I spent most money on merchandise and services.
I saw a big jump in expenditure in the months of March and April when I was moving houses.
For the rest of the year there wasnt a significant change in spending habits.
This doesn’t give a complete picture of my spending habits though, since I have another credit card which I primarily use for Restaurants to collect points. Which is why when we saw total spending by category restuarants didnt feature among the top categories but I personally know it is my top expenditure every month!
So this a good visualization to understand how I spent money on my Discover card. However, in order to have a holistic view of spending, one would need to collate data from all credit/debit cards.
---
title: "Project"
author: "Samkit Dhanki and Sonakshi Khuntia"
date: "`r Sys.Date()`"
output:
flexdashboard::flex_dashboard:
storyboard: true
social: menu
source: embed
orientation: columns
vertical_layout: fill
editor_options:
markdown:
wrap: sentence
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
```
### A project for ANLY512: Data Visualization
------------------------------------------------------------------------
The Quantified Self (QS) is a movement motivated to leverage the synergy of wearables, analytics, and "Big Data".
Designing this project around the QS movement makes perfect sense because it offers the opportunity to be both the data and question provider, the data analyst, the vis designer, and the end user.In this group project, we used a collection of a years' worth of data on spending and payments through discover credit card.
The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in this class.
We have attempted to answer the following questions based on the data:
1) What's the total spending per month in 2022?
2) In the second quarter, how's the spending in each category?
3) What's the total spending per category in 2022?
4) For Merchandise, how does the spending vary from month to month?
5) For Services, how does the spending vary from month to month?
------------------------------------------------------------------------
### Data preparation
```{r}
data <- read.csv("Spendingdata.csv")
kable(data[1:15,], caption="Sample raw data in 2022")
```
------------------------------------------------------------------------
- There are 7 variables listed:
- Tran.
Date: Transaction Date
- Post Date
- Month
- Description: Detailed Description of purchases
- Amount: The total amount spent
- Category
- Year: 2022
### Q1: What's the total spending per month in 2022?
```{r}
# plot
library(ggplot2)
fill <- "gold1"
line <- "goldenrod2"
p<- ggplot(data, aes(x = Month, y=Amount)) +
geom_bar(stat = "identity", fill = "red")+
scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
labs(title = "Monthly Spending 2022", x = "Month", y = "Amount $") +
theme_bw()
p
```
------------------------------------------------------------------------
Since Month is categorical and Amount is continuous, a bar chart was used to summarize the data. We summarized the total amount spent in each month and used the bar chart to displays the numbers.Total spending was the maximum in the month of May, followed by April and August.There are negative values since this also includes payment data.
### Q2: In the second quarter, how's the spending in each category?
```{r}
library(ggplot2)
data2 <- subset(data, Month=='4')
p1 <- ggplot(data2, aes(Amount,Category ))
p2 <-p1 + geom_point(aes(colour = factor(Category)))+
labs(title = "Transactions in Apr.2022", x = "Amount $", y = "Category") +
theme_bw()
p2
data3 <- subset(data, Month=='5')
p3 <- ggplot(data3, aes(Amount,Category ))
p4 <-p3 + geom_point(aes(colour = factor(Category)))+
labs(title = "Transactions in May.2022", x = "Amount $", y = "Category") +
theme_bw()
p4
data4 <- subset(data, Month=='6')
p5 <- ggplot(data3, aes(Amount,Category ))
p6 <-p5 + geom_point(aes(colour = factor(Category)))+
labs(title = "Transactions in June.2022", x = "Amount $", y = "Category") +
theme_bw()
p6
```
------------------------------------------------------------------------
We looked at each month's spent category for Q2. From the plot, we can see that top 3 categories I spent on in April was Services. This makes sense since I was moving houses and I made a large payment through my credit card to the moving company. Most of the spending in May and June was in the Travel/entertainment bucket.
### Q3: What's the total spending per category in 2022?
```{r}
agg2 <- aggregate(data$Amount, by=list(Category=data$Category), FUN=sum)
order_agg2 <- agg2[order(agg2$x),]
dotchart(order_agg2$x, labels=order_agg2$Category, cex=0.7, color="Blue", main="Total amount per category in 2022")
```
------------------------------------------------------------------------
The name of each category is long and can not be displayed at the bottom, so a rotated dot plot was created.
Here We see the category I spent most was Merchandise, followed by Services and Travel/Entertainment.
### Q4: For Merchandise, how's the spending vary from month to month?
```{r}
data1<- subset(data, Category=='Merchandise')
p<- ggplot(data1, aes(x = Month, y=Amount)) +
geom_bar(stat = "identity", fill = "Orange")+
scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
labs(title = "Merchandise spending per month", x = "Month", y = "Amount") +
theme_minimal()
p
```
------------------------------------------------------------------------
Maximum spending on Merchandise is in April, May and August and this is also closely relflected in the Overall Spending data. We can see that merchandise spending is actually driving overall data up where the same months show the most expenditure for all categories.
### Q5: For Services, how's the spending vary from month to month?
```{r}
data1<- subset(data, Category=='Services')
Pic<- ggplot(data1, aes(x = Month, y=Amount)) +
geom_bar(stat = "identity", fill = "Blue")+
scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
labs(title = "Services spending per month", x = "Month", y = "Amount") +
theme_minimal()
Pic
```
------------------------------------------------------------------------
Maximum expenditure on Services has been in March/April (the months of the house move, tax season, and renewal of renters insurance!)
### Conclusion
Based on the visual analytics, following conclusions can be drawn:
1. I spent most money on merchandise and services.
2. I saw a big jump in expenditure in the months of March and April when I was moving houses.
3. For the rest of the year there wasnt a significant change in spending habits.
4. This doesn't give a complete picture of my spending habits though, since I have another credit card which I primarily use for Restaurants to collect points. Which is why when we saw total spending by category restuarants didnt feature among the top categories but I personally know it is my top expenditure every month!
5. So this a good visualization to understand how I spent money on my Discover card. However, in order to have a holistic view of spending, one would need to collate data from all credit/debit cards.