quant_self
A project for ANLY512: Data Visualization
The Quantified Self movement grew out of the rise of the Internet of Things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses spending and payment data from 2018 and the first four months of 2019, captured by a Discover credit card.

The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Additionally, using a data-driven approach, I will create a summary that answers the following questions based on the data collected:

1) What is the total spending by month in 2018?
2) Which category cost the most in 2018?
3) For Restaurants, what is the spending by month in 2018?
4) In March and November, what is the spending in each category?
5) Compared to 2019, what is the difference?

| Trans. Date | Post Date | Month | Description | Amount | Category |
|---|---|---|---|---|---|
| 2018-01-03 | 2018-01-03 | 1 | POPEYE’S #5961 DALLAS TX | 9.72 | Restaurants |
| 2018-01-03 | 2018-01-03 | 1 | QT 891 DALLAS TX | 22.22 | Gasoline |
| 2018-01-07 | 2018-01-07 | 1 | BEIWEI CUISINE PLANO TX | 29.31 | Restaurants |
| 2018-01-07 | 2018-01-07 | 1 | PANDA EXPRESS #2447 THE COLONY TX | 9.96 | Restaurants |
| 2018-01-08 | 2018-01-08 | 1 | PARK’S NOODLE DALLAS CARROLLTON TX | 9.19 | Restaurants |
| 2018-01-09 | 2018-01-09 | 1 | CHEVRON 0209930 DALLAS TX00209930 3089853 | 13.52 | Gasoline |
| 2018-01-09 | 2018-01-09 | 1 | JACK IN THE BOX 3838 RICHARDSON TX00921R | 10.27 | Restaurants |
| 2018-01-09 | 2018-01-09 | 1 | USA*MCLIFF COFFEE VEN RICHARDSON TXEV219853-1515526393 | 1.25 | Restaurants |
| 2018-01-10 | 2018-01-10 | 1 | PARK’S NOODLE DALLAS CARROLLTON TX | 17.30 | Restaurants |
| 2018-01-10 | 2018-01-10 | 1 | TASTE OF KOREA CARROLLTON TX | 10.00 | Restaurants |
| 2018-01-11 | 2018-01-11 | 1 | FATNI BBQ PLANO TX | 15.61 | Restaurants |
| 2018-01-12 | 2018-01-12 | 1 | MASTER RICE INC CARROLLTON TX | 12.85 | Restaurants |
| 2018-01-12 | 2018-01-12 | 1 | TASTE OF KOREA CARROLLTON TX | 11.00 | Restaurants |
| 2018-01-13 | 2018-01-13 | 1 | AVENUE K VALERO PLANO TX | 4.64 | Gasoline |
| 2018-01-13 | 2018-01-13 | 1 | QT 936 DALLAS TX | 30.01 | Gasoline |

| Category | Amount | Year |
|---|---|---|
| Automotive | 400.20 | 2019 |
| Awards and Rebate Credits | -48.53 | 2019 |
| Fees | 27.00 | 2019 |
| Gasoline | 411.81 | 2019 |
| Interest | 154.76 | 2019 |
| Merchandise | 93.55 | 2019 |
| Payments and Credits | -4451.47 | 2019 |
| Restaurants | 1005.58 | 2019 |
| Services | 85.75 | 2019 |
| Supermarkets | 403.35 | 2019 |
| Travel/ Entertainment | 223.49 | 2019 |
| Automotive | 565.19 | 2018 |
| Awards and Rebate Credits | -295.65 | 2018 |
| Department Stores | 408.73 | 2018 |
| Fees | 60.50 | 2018 |
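
There are 7 variables we are interested in:

- Trans. Date: transaction date
- Post Date
- Month
- Description: detailed description of the purchase
- Amount: the total amount spent
- Category
- Year: 2018 or 2019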

Because Month is categorical and Amount is continuous, and the aim was to compare total spending across months, a bar chart was used to summarize the data. I summed the amount spent in each month and displayed the totals as bars. The plot shows that average monthly spending is around 1,000 dollars, and that total spending is highest in October and December.

For the same reason, a bar chart was used here. The category names are too long to display along the bottom axis, so the chart was rotated into horizontal bars. The category I spent the most on is Restaurants, at around $4,000, which is much higher than the other categories. The second is Merchandise, at around $1,800.

In most months, spending on Restaurants was less than 375 dollars. In October, however, it was about twice as high as in the other months, because I proposed to my girlfriend at a fancy but very expensive restaurant.

The plots show that Gasoline was my largest spending category in March and Supermarkets was the largest in November. I spent more at supermarkets in November because of Thanksgiving: I spent a lot on preparing the holiday meal.

Because the purpose was to compare spending between 2018 and the first 4 months of 2019, a grouped bar chart was used. The plot shows that overall spending in 2019 is around 1/3 of what I spent in 2018, which makes sense, since 4 months is a third of a year.

---
title: "ANLY 512 Final Project"
author: "Luoxi Hao"
date: "April 23, 2019"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
```

### Overview of the Quantified Self movement

***
A project for ANLY512: Data Visualization

The Quantified Self movement grew out of the rise of the Internet of Things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses spending and payment data from 2018 and the first four months of 2019, captured by a Discover credit card.

The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Additionally, using a data-driven approach, I will create a summary that answers the following questions based on the data collected:

1) What is the total spending by month in 2018?
2) Which category cost the most in 2018?
3) For Restaurants, what is the spending by month in 2018?
4) In March and November, what is the spending in each category?
5) Compared to 2019, what is the difference?

### Data Preparation
```{r}
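# Load the 2018 transactions and preview the first 15 rows.
data <- read_excel("C:/Users/lhao/Desktop/Discover-2018-YearEndSummary.xlsx")
kable(data[1:15, ], caption = "Sample raw data in 2018")
# Load the category-level amounts for 2018 and 2019 and preview the first 15 rows.
comp <- read_excel("C:/Users/lhao/Desktop/Compare.xlsx")
kable(comp[1:15, ], caption = "Sample raw data of 2018 and 2019")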
```
***
There are 7 variables we are interested in:

- Trans. Date: transaction date
- Post Date
- Month
- Description: detailed description of the purchase
- Amount: the total amount spent
- Category
- Year: 2018 or 2019
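
As a rough illustration of these variables, the sketch below rebuilds the first three rows of the sample table as a data frame and recomputes `Month` from `Trans. Date`. The name `sample_rows` is hypothetical, and the idea that `Month` is simply the calendar month of the transaction date is an assumption; the spreadsheet may have been prepared differently.

```r
# Hypothetical reconstruction of the first three rows of the 2018 sample table.
sample_rows <- data.frame(
  `Trans. Date` = as.Date(c("2018-01-03", "2018-01-03", "2018-01-07")),
  `Post Date`   = as.Date(c("2018-01-03", "2018-01-03", "2018-01-07")),
  Description   = c("POPEYE'S #5961 DALLAS TX", "QT 891 DALLAS TX",
                    "BEIWEI CUISINE PLANO TX"),
  Amount        = c(9.72, 22.22, 29.31),
  Category      = c("Restaurants", "Gasoline", "Restaurants"),
  check.names      = FALSE,   # keep the column names with spaces and dots as-is
  stringsAsFactors = FALSE
)

# Assumption: Month is the calendar month of the transaction date.
sample_rows$Month <- as.integer(format(sample_rows$`Trans. Date`, "%m"))

# Inspect column names and types.
str(sample_rows)
```

Checking the column types this way before plotting shows whether `Month` arrived as a number or as text, which affects how it can be filtered and ordered later.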

### Q1: What is the total spending by month in 2018?
```{r}
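# Q1: total spending per month in 2018.
# Month is stored as a number (1-12); convert it to an ordered factor of month
# abbreviations so the bars appear in calendar order.
p <- ggplot(data, aes(x = factor(month.abb[as.integer(Month)], levels = month.abb),
                      y = Amount)) +
  geom_col(fill = "blue") +          # rows are stacked, so each bar is the monthly total
  scale_x_discrete(drop = FALSE) +   # keep months that have no transactions
  labs(title = "Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p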
```
***
Because Month is categorical and Amount is continuous, and the aim was to compare total spending across months, a bar chart was used to summarize the data.
I summed the amount spent in each month and displayed the totals as bars.
The plot shows that average monthly spending is around 1,000 dollars, and that total spending is highest in October and December.
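
The bar heights are sums of `Amount` within each `Month`, and the quoted average is the mean of those sums. A minimal sketch of that arithmetic follows; the data frame `toy` holds made-up values, not the real card data, and only its column names mirror the project's data.

```r
# Made-up toy transactions (NOT the real card data).
toy <- data.frame(
  Month  = c(1, 1, 2, 2, 3),
  Amount = c(10, 20, 5, 15, 50)
)

# Sum Amount within each Month: these sums are the bar heights.
monthly_total <- aggregate(Amount ~ Month, data = toy, FUN = sum)

# Label months with abbreviations kept in calendar order.
monthly_total$MonthName <- factor(month.abb[monthly_total$Month], levels = month.abb)

# Mean of the monthly totals, i.e. the "average monthly spending".
avg_monthly <- mean(monthly_total$Amount)

print(monthly_total)
print(avg_monthly)
```

Computing the totals explicitly makes it possible to state the average and the highest months as numbers rather than reading them off the chart.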

### Q2: Which category cost the most in 2018?
```{r}
# Q2: total spending per category in 2018.
p <- ggplot(data, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +   # rows are stacked, so each bar is the category total
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()                # horizontal bars leave room for long category names
p
```
***
For the same reason, a bar chart was used here. The category names are too long to display along the bottom axis, so the chart was rotated into horizontal bars.
The category I spent the most on is Restaurants, at around $4,000, which is much higher than the other categories. The second is Merchandise, at around $1,800.
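
The ranking of categories can also be computed directly instead of being read off the chart. The sketch below does so for the 15 sample rows shown in the raw-data table (all from January 2018), so its totals cover only that sample and not the full year; the names `samp` and `by_category` are hypothetical.

```r
# Amount and Category for the 15 sample rows in the raw-data table (January 2018 only).
samp <- data.frame(
  Amount = c(9.72, 22.22, 29.31, 9.96, 9.19, 13.52, 10.27, 1.25,
             17.30, 10.00, 15.61, 12.85, 11.00, 4.64, 30.01),
  Category = c("Restaurants", "Gasoline", "Restaurants", "Restaurants", "Restaurants",
               "Gasoline", "Restaurants", "Restaurants", "Restaurants", "Restaurants",
               "Restaurants", "Restaurants", "Restaurants", "Gasoline", "Gasoline"),
  stringsAsFactors = FALSE
)

# Total per category, largest first.
by_category <- aggregate(Amount ~ Category, data = samp, FUN = sum)
by_category <- by_category[order(by_category$Amount, decreasing = TRUE), ]

print(by_category)
```

Applying the same two steps to the full 2018 data would give the exact totals behind the approximate figures quoted above.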

### Q3: For Restaurants, what is the spending by month in 2018?
```{r}
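# Q3: monthly spending on Restaurants in 2018.
data1 <- subset(data, Category == "Restaurants")
# Convert the numeric Month (1-12) to an ordered factor so bars follow calendar order.
p <- ggplot(data1, aes(x = factor(month.abb[as.integer(Month)], levels = month.abb),
                       y = Amount)) +
  geom_col(fill = "blue") +          # each bar is the monthly restaurant total
  scale_x_discrete(drop = FALSE) +   # keep months that have no transactions
  labs(title = "Restaurant Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p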
```
***
In most months, spending on Restaurants was less than 375 dollars. In October, however, it was about twice as high as in the other months, because I proposed to my girlfriend at a fancy but very expensive restaurant.
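
One way to make "about twice as high" concrete is to compare each month with a typical month. The sketch below uses the median as the typical level and flags months at or above twice that level. The values in `restaurant_monthly` are made up for illustration and are not the real totals, and the factor-of-two threshold is an assumption.

```r
# Made-up monthly restaurant totals (NOT the real data), one value per month.
restaurant_monthly <- data.frame(
  Month  = 1:12,
  Amount = c(300, 280, 350, 310, 290, 330, 320, 300, 340, 750, 360, 310)
)

# Typical month: the median is not pulled up by a single expensive month.
typical <- median(restaurant_monthly$Amount)

# Ratio of each month to the typical month.
restaurant_monthly$Ratio <- restaurant_monthly$Amount / typical

# Months at least twice the typical level (the threshold of 2 is an assumption).
spikes <- restaurant_monthly[restaurant_monthly$Ratio >= 2, ]
print(month.abb[spikes$Month])
```

The median is used rather than the mean because a single very large month would raise the mean and make that same month look less unusual.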

### Q4: In March and November of 2018, what is the spending in each category?
```{r}
# Q4: spending per category in March and November 2018.
data2 <- subset(data, Month == 3)    # March transactions
p1 <- ggplot(data2, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +          # each bar is the category total for March
  labs(title = "Spending per Category, March 2018", x = "Category", y = "Amount") +
  coord_flip()
p1
data3 <- subset(data, Month == 11)   # November transactions
p2 <- ggplot(data3, aes(x = Category, y = Amount)) +
  geom_col(fill = "darkgreen") +     # each bar is the category total for November
  labs(title = "Spending per Category, November 2018", x = "Category", y = "Amount") +
  coord_flip()
p2
```
***
The plots show that Gasoline was my largest spending category in March and Supermarkets was the largest in November. I spent more at supermarkets in November because of Thanksgiving: I spent a lot on preparing the holiday meal.
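
The largest category in each month can also be picked out programmatically. The sketch below totals `Amount` by month and category and keeps the top row per month. The data frame `toy` contains made-up values chosen only to illustrate the steps; it is not the real card data.

```r
# Made-up toy transactions for March (3) and November (11); NOT the real data.
toy <- data.frame(
  Month    = c(3, 3, 3, 3, 11, 11, 11),
  Category = c("Gasoline", "Gasoline", "Restaurants", "Supermarkets",
               "Gasoline", "Restaurants", "Supermarkets"),
  Amount   = c(70, 50, 80, 40, 60, 90, 200),
  stringsAsFactors = FALSE
)

# Total per month and category.
totals <- aggregate(Amount ~ Month + Category, data = toy, FUN = sum)

# For each month, keep the row with the largest total.
top <- do.call(rbind, lapply(split(totals, totals$Month),
                             function(d) d[which.max(d$Amount), ]))
print(top)
```

`split()` orders the groups by month number, so the result lists March before November.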

### Q5: Compared to 2019, what is the difference?
```{r}
# Q5: 2018 vs. 2019 amount per category, drawn as side-by-side (dodged) bars.
comp <- data.frame(comp)
comp$Year <- as.factor(comp$Year)   # treat Year as discrete so each year gets its own fill
ggplot(data = comp, aes(x = Category, y = Amount)) +
  geom_col(aes(fill = Year), position = "dodge") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```
***
Because the purpose was to compare spending between 2018 and the first 4 months of 2019, a grouped bar chart was used.
The plot shows that overall spending in 2019 is around 1/3 of what I spent in 2018, which makes sense, since 4 months is a third of a year.
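
Because the raw totals compare 12 months with 4 months, a like-for-like view could divide each amount by the number of months it covers. The sketch below does this for two categories whose 2018 and 2019 rows both appear in the comparison table. The names `comp_sample`, `months_covered`, and `PerMonth` are hypothetical, and the month counts are taken from the description above rather than from the data.

```r
# Rows copied from the comparison table for two categories shown in both years.
comp_sample <- data.frame(
  Category = c("Automotive", "Fees", "Automotive", "Fees"),
  Amount   = c(400.20, 27.00, 565.19, 60.50),
  Year     = c(2019, 2019, 2018, 2018),
  stringsAsFactors = FALSE
)

# Assumption: 2018 covers 12 months and 2019 covers the first 4 months.
months_covered <- c("2018" = 12, "2019" = 4)

# Average amount per month covered.
comp_sample$PerMonth <- comp_sample$Amount /
  unname(months_covered[as.character(comp_sample$Year)])

# One row per category, one PerMonth column per year.
wide <- reshape(comp_sample[, c("Category", "Year", "PerMonth")],
                idvar = "Category", timevar = "Year", direction = "wide")
print(wide)
```

A per-month view separates a real change in spending habits from the effect of comparing periods of different length.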