quant_self
Final Project for ANLY-512: Data Visualization
Professor Alan Hitch, Ph.D.
The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing).
This final class project uses one year of data (Oct. 2020 - Sep. 2021) on spending and payments captured by my personal Amex card.
The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Using a data-driven approach, I also summarize the data by answering the following questions.
What is the total spending by month in Oct. 2020 - Sep. 2021?
Which category cost the most in Oct. 2020 - Sep. 2021?
For the Dining category, what is the spending by month in Oct. 2020 - Sep. 2021?
In February and July, what is the spending in each category?
Comparing Q4 2020 with Q1 2021, what is the difference?
# A tibble: 12 x 7
`Tran. Date` `Post Date` Month Description Amount Category Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 10/01/2020 10/01/2020 10 AMAZON SHOP WITH ~ -53.3 Fees & Adjust~ 2020
2 10/01/2020 10/01/2020 10 AplPay APPLE.COM/~ 5.43 Merchandise &~ 2020
3 10/01/2020 10/01/2020 10 PRIME VIDEO*MK1LK~ 5.99 Merchandise &~ 2020
4 10/01/2020 10/01/2020 10 UBER EATS ~ 28.9 Restaurant-Re~ 2020
5 10/02/2020 10/02/2020 10 UBER EATS ~ 22.6 Restaurant-Re~ 2020
6 10/02/2020 10/02/2020 10 AMAZON MKTPLACE P~ 53.3 Merchandise &~ 2020
7 10/02/2020 10/02/2020 10 UBER EATS ~ 90.6 Restaurant-Re~ 2020
8 10/03/2020 10/03/2020 10 AMEX Streaming Su~ -5.99 Merchandise &~ 2020
9 10/03/2020 10/03/2020 10 UBER EATS ~ 20.0 Restaurant-Re~ 2020
10 10/03/2020 10/03/2020 10 TMOBILE WEB UPGRA~ 241. Communication~ 2020
11 10/04/2020 10/04/2020 10 AMAZON SHOP WITH ~ -40.6 Fees & Adjust~ 2020
12 10/04/2020 10/04/2020 10 AMEX Streaming Su~ -5.43 Merchandise &~ 2020
# A tibble: 12 x 7
`Tran. Date` `Post Date` Month Description Amount Category Year
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 09/19/2021 09/19/2021 9 "NYCT PAYGO ~ 2.75e0 Transportatio~ 2021
2 09/19/2021 09/19/2021 9 "ARABICA ~ 2.72e1 Restaurant-Ba~ 2021
3 09/21/2021 09/21/2021 9 "Amazon Marketpla~ 1.09e1 Merchandise &~ 2021
4 09/21/2021 09/21/2021 9 "UBER EATS ~ 8.78e1 Restaurant-Re~ 2021
5 09/22/2021 09/22/2021 9 "T-MOBILE.COM\\CU~ 2.06e2 Communication~ 2021
6 09/23/2021 09/23/2021 9 "UBER EATS ~ 4.00e1 Restaurant-Re~ 2021
7 09/25/2021 09/25/2021 9 "NYCT PAYGO ~ 2.75e0 Transportatio~ 2021
8 09/25/2021 09/25/2021 9 "Amazon Marketpla~ 1.20e1 Merchandise &~ 2021
9 09/25/2021 09/25/2021 9 "T-MOBILE ~ 4.35e1 Communication~ 2021
10 09/28/2021 09/28/2021 9 "Interest Charge ~ 5.11e1 Fees & Adjust~ 2021
11 09/28/2021 09/28/2021 9 "CLCKPAY25 PARK R~ 4.09e3 Business Serv~ 2021
12 09/30/2021 09/30/2021 9 "UBER EATS ~ 7.39e1 Restaurant-Re~ 2021
There are 7 variables we are interested in:
Tran. Date: Transaction Date
Post Date
Month
Description: Detailed Description of purchases
Amount: The total amount spent
Category
Year: 2020 or 2021
Because Month is categorical and Amount is continuous, and the aim is to compare total spending across months, I summed the amount spent in each month and displayed the totals as a bar chart.
The plot shows that average spending is around $3,800 per month. June has the highest spending, at about $5,800.
A bar chart was used here for the same reason. The category names are too long to display along the bottom axis, so the chart was rotated.
The category I spent the most on is Restaurant, at around $18,000 for the 12-month period Oct. 2020 - Sep. 2021.
In most months, spending on Restaurant was more than $1,000. In February and July, however, it was almost three times higher than in the other months.
I take a closer look at these two months next, but the short explanation is:
February has Chinese New Year, so traditionally we go out and celebrate.
July 28th is my girlfriend's birthday, so fancy dinners were in order. I spent $700+ at two different fine establishments.
Since the purpose was to compare spending between Q4 2020 and Q1 2021, a grouped bar chart was used.
There was a year-end holiday trip in January that explains the travel and lodging expenses for Q4 2020.
The surge in Education expenses in Q1 2021 was related to the deposit charged by Harrisburg U.
---
title: "ANLY 512 Final Project"
author: "Yuxuan Zhao"
date: "`r Sys.Date()`"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
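library(gridExtra)  # grid.arrange() for the side-by-side plots in Q4
# A setwd() call only lasts until the end of its own chunk, so set the
# knitr root directory to resolve relative paths in every chunk.
knitr::opts_knit$set(root.dir = "~/Work Files - R")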
```
### Overview of the Quantified Self movement

***
Final Project for ANLY-512: Data Visualization

Professor Alan Hitch, Ph.D.

The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing).

This final class project uses one year of data (Oct. 2020 - Sep. 2021) on spending and payments captured by my personal Amex card.

The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Using a data-driven approach, I also summarize the data by answering the following questions.

1) What is the total spending by month in Oct. 2020 - Sep. 2021?
2) Which category cost the most in Oct. 2020 - Sep. 2021?
3) For the Dining category, what is the spending by month in Oct. 2020 - Sep. 2021?
4) In February and July, what is the spending in each category?
5) Comparing Q4 2020 with Q1 2021, what is the difference?

### Data Preparation
```{r}
setwd("~/Work Files - R")
library(readxl)
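# Read the exported Amex transactions (Oct. 2020 - Sep. 2021)
data <- read_excel("amexdata.xlsx")
# Show the first and last 12 transactions
# (View() is left out because it only works in an interactive session)
head(data, 12)
tail(data, 12)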
```
***
There are 7 variables we are interested in:

- Tran. Date: Transaction Date
- Post Date
- Month
- Description: Detailed Description of purchases
- Amount: The total amount spent
- Category
- Year: 2020 or 2021

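
The two date columns are read as text (`<chr>`), so any date arithmetic needs them parsed first. The block below is a minimal sketch of that step in base R, not part of the original dashboard. The three rows are copied (rounded) from the `head()`/`tail()` output, and the names `sample_rows`, `fiscal_order`, and `month_name` are hypothetical.

```r
# Three rows copied (rounded) from the head()/tail() output;
# the real data come from amexdata.xlsx.
sample_rows <- data.frame(
  tran_date = c("10/01/2020", "10/03/2020", "09/30/2021"),
  amount    = c(28.9, 241, 73.9),
  stringsAsFactors = FALSE
)

# The dates are stored as text, so parse them as month/day/year.
sample_rows$date <- as.Date(sample_rows$tran_date, format = "%m/%d/%Y")

# Re-derive Month and Year from the parsed date.
sample_rows$month <- as.integer(format(sample_rows$date, "%m"))
sample_rows$year  <- as.integer(format(sample_rows$date, "%Y"))

# An ordered factor keeps plots in Oct -> Sep order instead of Jan -> Dec.
fiscal_order <- c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar",
                  "Apr", "May", "Jun", "Jul", "Aug", "Sep")
sample_rows$month_name <- factor(month.abb[sample_rows$month],
                                 levels = fiscal_order)

print(sample_rows)
```

Deriving `month` and `year` this way can also serve as a check on the `Month` and `Year` columns already present in the workbook.
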
### Q1: What is the total spending by month in Oct. 2020 - Sep. 2021?
```{r fig.height=8, fig.width=10}
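# Total spending per month: stacking the bars sums Amount within each month
library(ggplot2)
# Month is numeric (1-12), so a discrete scale places it by position and the
# values do not line up with Oct -> Sep labels. Convert to a labelled factor.
month_order <- c("Oct","Nov","Dec","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep")
data$MonthName <- factor(month.abb[data$Month], levels = month_order)
p1 <- ggplot(data, aes(x = MonthName, y = Amount)) +
  geom_bar(stat = "identity", fill = "grey") +
  labs(title = "Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p1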
```
***
Because Month is categorical and Amount is continuous, and the aim is to compare total spending across months, I summed the amount spent in each month and displayed the totals as a bar chart.

The plot shows that average spending is around $3,800 per month. June has the highest spending, at about $5,800.

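
The monthly totals and their average can also be computed directly instead of being read off the chart. This is a minimal sketch in base R that assumes the same `Month` and `Amount` columns. The `toy` values are made up for illustration and are not the real card data.

```r
# Hypothetical amounts for illustration only (not the real card data).
toy <- data.frame(
  Month  = c(10, 10, 11, 11, 12),
  Amount = c(100, 50, 200, -25, 75)  # a negative amount is a credit or refund
)

# Sum Amount within each month.
monthly <- aggregate(Amount ~ Month, data = toy, FUN = sum)
names(monthly)[2] <- "Total"

# Average of the monthly totals, and the month with the largest total.
avg_monthly <- mean(monthly$Total)
top_month   <- monthly$Month[which.max(monthly$Total)]

print(monthly)
print(avg_monthly)
print(top_month)
```

Note that credits and refunds appear as negative amounts in the data, so they reduce a month's total unless they are filtered out first.
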
### Q2: Which category cost the most in Oct. 2020 - Sep. 2021?
```{r fig.height=12, fig.width=10}
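library(ggplot2)
# Total spending per category; coord_flip() keeps the long category names readable.
# The fill aesthetic is dropped because the fixed grey fill overrides it anyway.
p2 <- ggplot(data, aes(x = Category, y = Amount)) +
  geom_bar(stat = "identity", fill = "grey") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p2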
```
***
A bar chart was used here for the same reason. The category names are too long to display along the bottom axis, so the chart was rotated.

The category I spent the most on is Restaurant, at around $18,000 for the 12-month period Oct. 2020 - Sep. 2021.

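
Ranking the categories numerically makes the top category explicit and gives an ordering that can be reused to sort the bars. This is a minimal sketch in base R. The category names and amounts in `toy` are hypothetical placeholders, not the real Amex categories.

```r
# Hypothetical categories and amounts for illustration only.
toy <- data.frame(
  Category = c("Dining", "Shopping", "Dining", "Phone", "Shopping"),
  Amount   = c(30, 5, 45, 40, 55),
  stringsAsFactors = FALSE
)

# Total per category, largest first.
totals <- sort(tapply(toy$Amount, toy$Category, sum), decreasing = TRUE)
top_category <- names(totals)[1]

# Order the factor levels by total so a flipped bar chart shows
# the largest category at the top.
toy$Category <- factor(toy$Category, levels = rev(names(totals)))

print(totals)
print(top_category)
```
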
### Q3: For the Restaurant category, what is the spending by month in Oct. 2020 - Sep. 2021?
```{r fig.height=10, fig.width=12}
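# Keep only restaurant transactions
data1 <- subset(data, Category == "Restaurant-Restaurant")
# Month is numeric (1-12); convert it to a labelled factor ordered Oct -> Sep
# so that each bar sits under the correct month label.
month_order <- c("Oct","Nov","Dec","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep")
data1$MonthName <- factor(month.abb[data1$Month], levels = month_order)
p3 <- ggplot(data1, aes(x = MonthName, y = Amount)) +
  geom_bar(stat = "identity", fill = "grey") +
  labs(title = "Restaurant Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p3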
```
***
In most months, spending on Restaurant was more than $1,000. In February and July, however, it was almost three times higher than in the other months.

I take a closer look at these two months next, but the short explanation is:

- February has Chinese New Year, so traditionally we go out and celebrate.
- July 28th is my girlfriend's birthday, so fancy dinners were in order. I spent $700+ at two different fine establishments.

### Q4: In February and July, what is the spending in each category?
```{r fig.height=8, fig.width=12}
library(gridExtra)  # provides grid.arrange()
# February: Month is numeric, so compare against a number
data2 <- subset(data, Month == 2)
p4 <- ggplot(data2, aes(x = Category, y = Amount)) +
  geom_bar(stat = "identity", fill = "grey") +
  labs(title = "Spending per Category - February", x = "Category", y = "Amount") +
  coord_flip()
# July
data3 <- subset(data, Month == 7)
p5 <- ggplot(data3, aes(x = Category, y = Amount)) +
  geom_bar(stat = "identity", fill = "grey") +
  labs(title = "Spending per Category - July", x = "Category", y = "Amount") +
  coord_flip()
# Show the two months side by side
grid.arrange(p4, p5, ncol = 2)
```
### Q5: Comparing Oct - Dec 2020 with Jan - Mar 2021, what is the difference?
```{r fig.height=12, fig.width=10}
setwd("~/Work Files - R")
library(readxl)
# Data for the Q4 2020 vs. Q1 2021 comparison, read from a separate workbook
amexdata_Copy <- read_excel("amexdata - Copy.xlsx")
comp <- data.frame(amexdata_Copy)
# Treat Year as a factor so each year gets its own bar colour
comp$Year <- as.factor(comp$Year)
ggplot(data = comp, aes(x = Category, y = Amount)) +
  geom_bar(aes(fill = Year), position = "dodge", stat = "identity") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +  # the earlier "Right" value was invalid and overridden
  coord_flip()
```
***
Since the purpose was to compare spending between Q4 2020 and Q1 2021, a grouped bar chart was used.

There was a year-end holiday trip in January that explains the travel and lodging expenses for Q4 2020.

The surge in Education expenses in Q1 2021 was related to the deposit charged by Harrisburg U.
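
The dashboard reads the two quarters from a separate workbook. As an alternative, the quarter could be derived from the existing `Month` and `Year` columns. The following is a minimal sketch of that idea in base R, not the method used above. The `toy` rows, category names, and the `Quarter` and `change` names are hypothetical.

```r
# Hypothetical rows for illustration only (not the real card data).
toy <- data.frame(
  Month    = c(10, 12, 1, 3, 2, 11, 4),
  Year     = c(2020, 2020, 2021, 2021, 2021, 2020, 2021),
  Category = c("Travel", "Dining", "Education", "Dining",
               "Travel", "Dining", "Dining"),
  Amount   = c(500, 100, 300, 80, 50, 120, 90),
  stringsAsFactors = FALSE
)

# Calendar quarter from the month number: 1-3 -> Q1, ..., 10-12 -> Q4.
toy$Quarter <- paste0("Q", (toy$Month - 1) %/% 3 + 1, " ", toy$Year)

# Keep only the two quarters being compared.
keep <- toy[toy$Quarter %in% c("Q4 2020", "Q1 2021"), ]

# Category-by-quarter totals; missing combinations become 0.
by_quarter <- xtabs(Amount ~ Category + Quarter, data = keep)

# Change from Q4 2020 to Q1 2021 for each category.
change <- by_quarter[, "Q1 2021"] - by_quarter[, "Q4 2020"]

print(by_quarter)
print(change)
```

A positive value in `change` means the category grew from Q4 2020 to Q1 2021, and a negative value means it shrank.
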