Overview of the Quantified Self movement


A project for ANLY512: Data Visualization

The Quantified Self movement grew out of the rise of the Internet of Things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses spending and payment data spanning two calendar years (2018 and the first four months of 2019), captured by a Discover credit card.

The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Using this data-driven approach, I summarize the answers to the following questions:

1) What is the total spending by month in 2018?
2) Which category cost the most in 2018?
3) For Restaurants, what is the spending by month in 2018?
4) In March and November of 2018, what is the spending in each category?
5) Compared to 2019, what is the difference?

Data Preparation

Sample raw data in 2018

| Trans. Date | Post Date | Month | Description | Amount | Category |
|---|---|---:|---|---:|---|
| 2018-01-03 | 2018-01-03 | 1 | POPEYE’S #5961 DALLAS TX | 9.72 | Restaurants |
| 2018-01-03 | 2018-01-03 | 1 | QT 891 DALLAS TX | 22.22 | Gasoline |
| 2018-01-07 | 2018-01-07 | 1 | BEIWEI CUISINE PLANO TX | 29.31 | Restaurants |
| 2018-01-07 | 2018-01-07 | 1 | PANDA EXPRESS #2447 THE COLONY TX | 9.96 | Restaurants |
| 2018-01-08 | 2018-01-08 | 1 | PARK’S NOODLE DALLAS CARROLLTON TX | 9.19 | Restaurants |
| 2018-01-09 | 2018-01-09 | 1 | CHEVRON 0209930 DALLAS TX00209930 3089853 | 13.52 | Gasoline |
| 2018-01-09 | 2018-01-09 | 1 | JACK IN THE BOX 3838 RICHARDSON TX00921R | 10.27 | Restaurants |
| 2018-01-09 | 2018-01-09 | 1 | USA*MCLIFF COFFEE VEN RICHARDSON TXEV219853-1515526393 | 1.25 | Restaurants |
| 2018-01-10 | 2018-01-10 | 1 | PARK’S NOODLE DALLAS CARROLLTON TX | 17.30 | Restaurants |
| 2018-01-10 | 2018-01-10 | 1 | TASTE OF KOREA CARROLLTON TX | 10.00 | Restaurants |
| 2018-01-11 | 2018-01-11 | 1 | FATNI BBQ PLANO TX | 15.61 | Restaurants |
| 2018-01-12 | 2018-01-12 | 1 | MASTER RICE INC CARROLLTON TX | 12.85 | Restaurants |
| 2018-01-12 | 2018-01-12 | 1 | TASTE OF KOREA CARROLLTON TX | 11.00 | Restaurants |
| 2018-01-13 | 2018-01-13 | 1 | AVENUE K VALERO PLANO TX | 4.64 | Gasoline |
| 2018-01-13 | 2018-01-13 | 1 | QT 936 DALLAS TX | 30.01 | Gasoline |

Sample raw data of 2018 and 2019

| Category | Amount | Year |
|---|---:|---:|
| Automotive | 400.20 | 2019 |
| Awards and Rebate Credits | -48.53 | 2019 |
| Fees | 27.00 | 2019 |
| Gasoline | 411.81 | 2019 |
| Interest | 154.76 | 2019 |
| Merchandise | 93.55 | 2019 |
| Payments and Credits | -4451.47 | 2019 |
| Restaurants | 1005.58 | 2019 |
| Services | 85.75 | 2019 |
| Supermarkets | 403.35 | 2019 |
| Travel/ Entertainment | 223.49 | 2019 |
| Automotive | 565.19 | 2018 |
| Awards and Rebate Credits | -295.65 | 2018 |
| Department Stores | 408.73 | 2018 |
| Fees | 60.50 | 2018 |

Q1: What is the total spending by month in 2018?


Because Month is categorical and Amount is continuous, and the aim was to compare total spending across months, I used a bar chart. I summed the amount spent in each month and displayed the totals as bars. The plot shows that average monthly spending is around 1,000 dollars, and that total spending is highest in October and December.

Q2: Which category cost the most in 2018?


For the same reason, a bar chart is used here. The category names are too long to fit along the bottom axis, so the chart is rotated to show horizontal bars. The category I spent the most on is Restaurants, at around $4,000, far more than any other category. The second-highest category is Merchandise, at around $1,800.

Q3: For Restaurants, what is the spending by month in 2018?


In most months, restaurant spending was below 375 dollars. In October, however, it was about twice as high as in the other months, because I proposed to my girlfriend at a fancy but very expensive restaurant.

Q4: In March and November of 2018, what is the spending in each category?


The plots show that Gasoline was my largest spending category in March and Supermarkets was my largest in November. I spent more at supermarkets in November because of Thanksgiving: I spent a lot preparing the holiday meal.

Q5: Compared to 2019, what is the difference?


Because the purpose was to compare spending in 2018 with the first four months of 2019, I used a grouped bar chart. The plot shows that overall spending in 2019 so far is around one third of what I spent in 2018, which makes sense given that the 2019 data covers only four months.

---
title: "ANLY 512 Final Project"
author: "Luoxi Hao"
date: "April 23, 2019"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)      # kable() for the sample tables
library(tidyverse)  # loads ggplot2 and dplyr
library(readxl)     # read_excel() for the .xlsx exports
```

### Overview of the Quantified Self movement
![quant_self](images.jpg)

***
A project for ANLY512: Data Visualization

The Quantified Self movement grew out of the rise of the Internet of Things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses spending and payment data spanning two calendar years (2018 and the first four months of 2019), captured by a Discover credit card.

The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Using this data-driven approach, I summarize the answers to the following questions:

1) What is the total spending by month in 2018?
2) Which category cost the most in 2018?
3) For Restaurants, what is the spending by month in 2018?
4) In March and November of 2018, what is the spending in each category?
5) Compared to 2019, what is the difference?

### Data Preparation

```{r}
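# Transaction-level data for 2018 (the path is specific to the author's machine)
data <- read_excel("C:/Users/lhao/Desktop/Discover-2018-YearEndSummary.xlsx")
kable(head(data, 15), caption = "Sample raw data in 2018")
# Amounts by category and year, used for the 2018 vs. 2019 comparison
comp <- read_excel("C:/Users/lhao/Desktop/Compare.xlsx")
kable(head(comp, 15), caption = "Sample raw data of 2018 and 2019")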
```

***

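There are 7 variables of interest:

- Trans. Date: transaction date
- Post Date: date the transaction was posted to the account
- Month: month of the transaction (1 to 12)
- Description: detailed description of the purchase
- Amount: the transaction amount in dollars
- Category: spending category
- Year: 2018 or 2019 (comparison data only)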
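
How the Month and Year columns were produced is not shown in this project. As a minimal sketch, assuming the transaction date is stored as text in `YYYY-MM-DD` form, both could be derived in base R. The data frame `toy` and its rows are hypothetical and only illustrate the idea.

```r
# Hypothetical rows for illustration; `toy` is not the project's data.
toy <- data.frame(
  `Trans. Date` = c("2018-01-03", "2018-03-15", "2018-11-22"),
  Amount        = c(9.72, 22.22, 29.31),
  check.names   = FALSE  # keep the column name "Trans. Date" as written
)

# Parse the text dates, then pull out the month and year as integers.
dates     <- as.Date(toy[["Trans. Date"]])
toy$Month <- as.integer(format(dates, "%m"))
toy$Year  <- as.integer(format(dates, "%Y"))
toy
```

Deriving the columns from the date keeps them consistent with the transactions if new rows are added later.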



### Q1: What is the total spending by month in 2018?

```{r}
# Bars stack every transaction in a month, so each bar's height is the monthly total
p <- ggplot(data, aes(x = Month, y = Amount)) +
  geom_col(fill = "blue") +
  # Month is stored as 1 to 12; label the positions with month abbreviations
  scale_x_discrete(limits = month.abb) +
  labs(title = "Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p
```

***
Because Month is categorical and Amount is continuous, and the aim was to compare total spending across months, I used a bar chart.
I summed the amount spent in each month and displayed the totals as bars.
The plot shows that average monthly spending is around 1,000 dollars, and that total spending is highest in October and December.
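
The monthly totals described above can also be computed explicitly rather than read off the stacked bars. The following is a minimal sketch in base R; the data frame `toy` and its values are hypothetical stand-ins for the real transactions.

```r
# Hypothetical transactions: Month as a number, Amount in dollars.
toy <- data.frame(
  Month  = c(1, 1, 2, 2, 2),
  Amount = c(9.72, 22.22, 15.00, 5.50, 4.50)
)

# Sum Amount within each Month.
monthly <- aggregate(Amount ~ Month, data = toy, FUN = sum)

# Add an ordered month label so plots keep calendar order.
monthly$MonthName <- factor(month.abb[monthly$Month], levels = month.abb)
monthly

# Average spending per month across the months present.
mean(monthly$Amount)
```

A table like `monthly` makes the "around 1,000 dollars" average and the highest months directly checkable instead of estimated from the chart.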

### Q2: Which category cost the most in 2018?

```{r}
# Horizontal bars keep the long category names readable
p <- ggplot(data, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p
```

***
For the same reason, a bar chart is used here. The category names are too long to fit along the bottom axis, so the chart is rotated to show horizontal bars.
The category I spent the most on is Restaurants, at around $4,000, far more than any other category. The second-highest category is Merchandise, at around $1,800.
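
To rank the categories without reading values off the chart, the totals can be aggregated and sorted. This is a minimal base R sketch; `toy` and its amounts are hypothetical, not the project's figures.

```r
# Hypothetical transactions for illustration.
toy <- data.frame(
  Category = c("Restaurants", "Gasoline", "Restaurants", "Merchandise", "Gasoline"),
  Amount   = c(29.31, 22.22, 9.96, 18.00, 13.52)
)

# Total per category, largest first.
by_cat <- aggregate(Amount ~ Category, data = toy, FUN = sum)
by_cat <- by_cat[order(by_cat$Amount, decreasing = TRUE), ]
by_cat

# The top-spending category is the first row.
as.character(by_cat$Category[1])
```

In the plot itself, mapping `x = reorder(Category, Amount, FUN = sum)` would sort the bars by total, which makes the ranking visible at a glance.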

### Q3: For Restaurants, what is the spending by month in 2018?

```{r}
# Keep only restaurant transactions
data1 <- subset(data, Category == "Restaurants")
p <- ggplot(data1, aes(x = Month, y = Amount)) +
  geom_col(fill = "blue") +
  # Month is stored as 1 to 12; label the positions with month abbreviations
  scale_x_discrete(limits = month.abb) +
  labs(title = "Restaurant Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p
```


***
In most months, restaurant spending was below 375 dollars. In October, however, it was about twice as high as in the other months, because I proposed to my girlfriend at a fancy but very expensive restaurant.

### Q4: In March and November of 2018, what is the spending in each category?

```{r}
# March transactions
data2 <- subset(data, Month == 3)
p1 <- ggplot(data2, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +
  labs(title = "Spending per Category, March 2018", x = "Category", y = "Amount") +
  coord_flip()
p1

# November transactions
data3 <- subset(data, Month == 11)
p2 <- ggplot(data3, aes(x = Category, y = Amount)) +
  geom_col(fill = "darkgreen") +
  labs(title = "Spending per Category, November 2018", x = "Category", y = "Amount") +
  coord_flip()
p2
```

***
The plots show that Gasoline was my largest spending category in March and Supermarkets was my largest in November. I spent more at supermarkets in November because of Thanksgiving: I spent a lot preparing the holiday meal.
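
The largest category in each of the two months can also be found programmatically. The sketch below uses base R on a hypothetical data frame `toy`; the amounts are made up for illustration.

```r
# Hypothetical transactions for March (3) and November (11).
toy <- data.frame(
  Month    = c(3, 3, 3, 11, 11, 11),
  Category = c("Gasoline", "Restaurants", "Gasoline",
               "Supermarkets", "Restaurants", "Supermarkets"),
  Amount   = c(30.01, 12.85, 22.22, 45.00, 11.00, 60.00)
)

# Keep the two months of interest and total each category within each month.
sel    <- toy[toy$Month %in% c(3, 11), ]
totals <- aggregate(Amount ~ Month + Category, data = sel, FUN = sum)

# For each month, keep the row with the largest total.
top <- do.call(
  rbind,
  lapply(split(totals, totals$Month), function(d) d[which.max(d$Amount), ])
)
top
```

Keeping both months in one table like `totals` would also allow a single faceted chart with `facet_wrap(~ Month)` instead of two separate plots.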

### Q5: Compared to 2019, what is the difference?

```{r}
comp <- data.frame(comp)
# Treat Year as a category so each year gets its own bar color
comp$Year <- as.factor(comp$Year)
ggplot(comp, aes(x = Category, y = Amount, fill = Year)) +
  geom_col(position = "dodge") +
  labs(title = "Spending per Category, 2018 vs. 2019", x = "Category", y = "Amount") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```


***
Because the purpose was to compare spending in 2018 with the first four months of 2019, I used a grouped bar chart.
The plot shows that overall spending in 2019 so far is around one third of what I spent in 2018, which makes sense given that the 2019 data covers only four months.
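
Because 2019 covers four months and 2018 covers twelve, a per-month figure gives a fairer comparison than the raw totals. The sketch below is one way to compute both in base R. The data frame `toy` is hypothetical, and excluding negative amounts (payments and credits) is an assumption about what counts as spending, not a step taken in the project.

```r
# Hypothetical category amounts; negative rows stand for payments and credits.
toy <- data.frame(
  Category = c("Restaurants", "Gasoline", "Payments and Credits",
               "Restaurants", "Gasoline", "Payments and Credits"),
  Amount   = c(3000, 1200, -4000, 1000, 400, -1300),
  Year     = c(2018, 2018, 2018, 2019, 2019, 2019)
)

# Assumption: only positive amounts count as spending.
spend  <- toy[toy$Amount > 0, ]
yearly <- aggregate(Amount ~ Year, data = spend, FUN = sum)
totals <- setNames(yearly$Amount, yearly$Year)

# Raw ratio of 2019 spending to 2018 spending.
ratio <- totals[["2019"]] / totals[["2018"]]

# Months covered by each year of data: 12 for 2018, 4 for 2019.
months    <- c("2018" = 12, "2019" = 4)
per_month <- totals / months[names(totals)]

ratio
per_month
```

If the per-month figures come out similar, a ratio near one third simply reflects the shorter 2019 period rather than a change in spending habits.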