quant_self
A project for ANLY512: Data Visualization
The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses two years of spending and payment data captured by a Discover credit card.
The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Using this data-driven approach, I also create a summary that answers the following questions based on the data collected:

1) What is the total spending by month in 2017?
2) Which category cost the most in 2017?
3) For Merchandise, what is the spending by month in 2017?
4) In March and November, what is the spending in each category?
5) Compared with 2016, how does spending in 2017 differ?

Sample raw data in 2017:

| Trans..Date | Post.Date | Month | Description | Amount | Category |
|---|---|---|---|---|---|
| 1/1/2017 | 1/1/2017 | 1 | 114 WALTHAM YMCA 7818945295 MA | 42.00 | Travel/ Entertainment |
| 1/5/2017 | 1/6/2017 | 1 | MADEWELL.COM LYNCHBURG VA | -37.49 | Payments and Credits |
| 1/6/2017 | 1/6/2017 | 1 | AMAZON MKTPLACE PMTS AMZN.COM/BILLWAPJ5XQSQ01KR | 22.72 | Merchandise |
| 1/9/2017 | 1/9/2017 | 1 | RCN*CABLE PHONE INTERN 800-RINGRCN PA29126132 | 42.43 | Services |
| 1/10/2017 | 1/10/2017 | 1 | BACKCOUNTRY.COM 800-409-4502 UT438515825XO9ZZOO | 100.00 | Merchandise |
| 1/11/2017 | 1/11/2017 | 1 | KILLINGTON TICKET SALES KILLINGTON VT | 349.00 | Travel/ Entertainment |
| 1/13/2017 | 1/14/2017 | 1 | WOO JEON BURLINGTON MA | 9.58 | Restaurants |
| 1/20/2017 | 1/20/2017 | 1 | HANNAFORD #8017 WALTHAM MA | 22.02 | Supermarkets |
| 1/21/2017 | 1/21/2017 | 1 | MANGO NEW YORK CITYNY | 205.96 | Merchandise |
| 1/30/2017 | 1/30/2017 | 1 | REG OF MOTOR VEHICLE0900 617-351-9162 MA | 60.00 | Government Services |
| 2/1/2017 | 2/1/2017 | 2 | 114 WALTHAM YMCA 7818945295 MA | 42.00 | Travel/ Entertainment |
| 2/5/2017 | 2/5/2017 | 2 | PAYPAL *SHAMENYISUY 402-935-7733 CA | 9.99 | Services |
| 2/8/2017 | 2/8/2017 | 2 | RCN*CABLE PHONE INTERN 800-RINGRCN PA29418334 | 42.43 | Services |
| 2/10/2017 | 2/10/2017 | 2 | ABERCROMBIE & FITC NEW ALBANY OH | 100.20 | Merchandise |
| 2/11/2017 | 2/11/2017 | 2 | HONG KONG SUPERMARKET ALLSTON MA | 32.70 | Supermarkets |

Sample raw data of 2016 and 2017:

| Category | Amount | Year |
|---|---|---|
| Automotive | 35 | 2017 |
| Department Stores | 576 | 2017 |
| Education | 0 | 2017 |
| Gasoline | 66 | 2017 |
| Government Services | 199 | 2017 |
| Home Improvement | 0 | 2017 |
| Medical Services | 0 | 2017 |
| Merchandise | 5540 | 2017 |
| Other/ Miscellaneous | 0 | 2017 |
| Restaurants | 77 | 2017 |
| Services | 533 | 2017 |
| Supermarkets | 429 | 2017 |
| Travel/ Entertainment | 1388 | 2017 |
| Wholesale Clubs | 0 | 2017 |
| Automotive | 0 | 2016 |

There are 7 variables we are interested in:

- Trans. Date: Transaction Date
- Post Date
- Month
- Description: Detailed description of purchases
- Amount: The total amount spent
- Category
- Year: 2016 or 2017

Q1: Because Month is categorical and Amount is continuous, and the aim was to compare total spending across months, a bar chart was used to summarize the data. I summed the amount spent in each month and used a bar chart to display the totals. The plot shows that average monthly spending is around 800 dollars, and that total spending is highest in March and November.

Q2: For the same reason, a bar chart was used here. The category names are long and cannot be displayed along the bottom axis, so a rotated (horizontal) bar chart was created. The category I spent the most money on is Merchandise, at around $5,600, which is much higher than any other category. The second-largest category is Travel/Entertainment, at around $1,300.

Q3: In most months, spending on Merchandise was less than 500 dollars. In March and November, however, it was about three times as high as in the other months.

Q4: The plots show that Merchandise is still the largest category in both months. In November, the second-largest category is Travel/Entertainment. I checked the detailed transactions: the auto insurance was renewed in March and the Thanksgiving holiday fell in November, which accounts for the majority of the spending.

Q5: Because the purpose was to compare spending between 2016 and 2017, a grouped bar chart was used. The plot shows that overall spending in 2017 is much lower than in 2016, especially for Merchandise. The main reason is that in 2016 most purchases were made with the Discover card.

---
title: "ANLY 512 Final Project"
author: "Qianyi Chen"
date: "February 19, 2018"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source_code: embed
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(tidyverse)  # attaches ggplot2 and dplyr, among others
library(readxl)
library(xts)
library(zoo)
library(lubridate)
# Machine-specific working directory; adjust to wherever the CSV files live.
setwd("T:\\Daisy Chen\\HU\\ANL 512 51")
```
### Overview of the Quantified Self movement

***
A project for ANLY512: Data Visualization

The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses two years of spending and payment data captured by a Discover credit card.

The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Using this data-driven approach, I also create a summary that answers the following questions based on the data collected:

1) What is the total spending by month in 2017?
2) Which category cost the most in 2017?
3) For Merchandise, what is the spending by month in 2017?
4) In March and November, what is the spending in each category?
5) Compared with 2016, how does spending in 2017 differ?

### Data preparation
```{r}
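# 2017 transactions from the Discover year-end summary export.
data <- read.csv("T:\\Daisy Chen\\HU\\ANL 512 51\\Discover-2017-YearEndSummary.csv")
kable(data[1:15,], caption="Sample raw data in 2017")
# Per-category amounts for 2016 and 2017, used for the year-over-year comparison.
comp <- read.csv("T:\\Daisy Chen\\HU\\ANL 512 51\\2016.csv")
kable(comp[1:15,], caption="Sample raw data of 2016 and 2017")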
```
***
There are 7 variables we are interested in:

- Trans. Date: Transaction Date
- Post Date
- Month
- Description: Detailed description of purchases
- Amount: The total amount spent
- Category
- Year: 2016 or 2017
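
As a minimal sketch of how the date-related variables relate to each other, the block below parses the transaction date and derives a month and year from it. It uses three rows copied from the sample table; the derived column names `Date`, `Year`, and `MonthName` are illustrative assumptions, and the CSV shown above already contains its own `Month` column.

```r
# Three rows copied from the 2017 sample table; Trans..Date is the column
# name that read.csv produces for the transaction date.
tx <- data.frame(
  Trans..Date = c("1/1/2017", "2/10/2017", "1/21/2017"),
  Amount      = c(42.00, 100.20, 205.96),
  Category    = c("Travel/ Entertainment", "Merchandise", "Merchandise"),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings into Date objects.
tx$Date <- as.Date(tx$Trans..Date, format = "%m/%d/%Y")

# Derive a numeric month and year from the parsed date.
tx$Month <- as.integer(format(tx$Date, "%m"))
tx$Year  <- as.integer(format(tx$Date, "%Y"))

# Month abbreviations as an ordered factor, convenient for a discrete plot axis.
tx$MonthName <- factor(month.abb[tx$Month], levels = month.abb)

print(tx[, c("Date", "Month", "Year", "MonthName")])
```

Storing the month as a factor with all twelve levels keeps the calendar order on a plot axis even when some months have no transactions.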
### Q1: What is the total spending by month in 2017?
```{r}
# Stack the individual transactions within each month, so the bars show
# monthly totals; negative amounts (payments and credits) stack below zero.
p <- ggplot(data, aes(x = factor(Month, levels = 1:12, labels = month.abb), y = Amount)) +
  geom_col(fill = "blue") +
  scale_x_discrete(drop = FALSE) +  # keep months that have no transactions
  labs(title = "Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p
```
***
Because Month is categorical and Amount is continuous, and the aim was to compare total spending across months, a bar chart was used to summarize the data.
I summed the amount spent in each month and used a bar chart to display the totals.
The plot shows that average monthly spending is around 800 dollars, and that total spending is highest in March and November.
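
The monthly totals and their average can also be computed directly rather than read off the chart. The following is a sketch under stated assumptions: it uses only six January and February rows copied from the sample table, so the numbers it prints are not the full-year figures quoted above, and the names `monthly` and `avg_monthly` are illustrative.

```r
# Six rows copied from the 2017 sample table (January and February only).
tx <- data.frame(
  Month  = c(1, 1, 1, 1, 2, 2),
  Amount = c(42.00, -37.49, 22.72, 42.43, 42.00, 9.99)
)

# Sum Amount within each month; credits (negative amounts) reduce the total.
monthly <- aggregate(Amount ~ Month, data = tx, FUN = sum)

# Add a readable month label and compute the average of the monthly totals.
monthly$MonthName <- month.abb[monthly$Month]
avg_monthly <- mean(monthly$Amount)

print(monthly)
print(avg_monthly)
```

Applied to the full 2017 data frame, the same two calls would give the per-month totals behind the bar chart and the average monthly spending.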
### Q2: Which category cost the most in 2017?
```{r}
# Horizontal bars (coord_flip) so the long category names stay readable.
p <- ggplot(data, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p
```
***
For the same reason, a bar chart was used here. The category names are long and cannot be displayed along the bottom axis, so a rotated (horizontal) bar chart was created.
The category I spent the most money on is Merchandise, at around $5,600, which is much higher than any other category. The second-largest category is Travel/Entertainment, at around $1,300.
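
To rank the categories numerically, one option is to sort the per-category totals and express each as a share of the whole. The sketch below assumes the 2017 amounts listed in the 2016/2017 comparison table (rows with an amount of 0 are left out); the names `totals`, `ranked`, and `share` are illustrative.

```r
# 2017 amounts per category, copied from the comparison table (zero rows omitted).
totals <- c(
  "Automotive"            = 35,
  "Department Stores"     = 576,
  "Gasoline"              = 66,
  "Government Services"   = 199,
  "Merchandise"           = 5540,
  "Restaurants"           = 77,
  "Services"              = 533,
  "Supermarkets"          = 429,
  "Travel/ Entertainment" = 1388
)

# Largest category first.
ranked <- sort(totals, decreasing = TRUE)

# Each category as a percentage of the overall total, to one decimal place.
share <- round(100 * ranked / sum(ranked), 1)

print(head(ranked, 3))
print(head(share, 3))
```

Sorting before plotting would also allow the bars to be ordered by size, which makes the gap between Merchandise and the remaining categories easier to see.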
### Q3: For Merchandise, what is the spending by month in 2017?
```{r}
# Keep only Merchandise transactions, then plot their monthly totals.
data1 <- subset(data, Category == "Merchandise")
p <- ggplot(data1, aes(x = factor(Month, levels = 1:12, labels = month.abb), y = Amount)) +
  geom_col(fill = "blue") +
  scale_x_discrete(drop = FALSE) +  # keep months without Merchandise purchases
  labs(title = "Merchandise Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p
```
***
In most months, spending on Merchandise was less than 500 dollars. In March and November, however, it was about three times as high as in the other months.
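
A sketch of the same filter-then-total step in table form, assuming a handful of January and February rows copied from the sample table. Because only those two months are included, the March and November peaks do not appear here; the point is only to show how months without Merchandise purchases can be kept as explicit zeros. The name `by_month` is illustrative.

```r
# Rows copied from the 2017 sample table (January and February only).
tx <- data.frame(
  Month    = c(1, 1, 1, 1, 2, 2),
  Amount   = c(22.72, 42.43, 100.00, 205.96, 100.20, 32.70),
  Category = c("Merchandise", "Services", "Merchandise",
               "Merchandise", "Merchandise", "Supermarkets"),
  stringsAsFactors = FALSE
)

# Keep only Merchandise transactions.
merch <- subset(tx, Category == "Merchandise")

# Use a 12-level factor so every calendar month is represented.
merch$Month <- factor(merch$Month, levels = 1:12, labels = month.abb)

# Total per month; months with no rows come back as NA, so set them to 0.
by_month <- tapply(merch$Amount, merch$Month, sum)
by_month[is.na(by_month)] <- 0

print(by_month)
```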
### Q4: In March and November, what is the spending in each category?
```{r}
# March 2017: spending by category (Month is stored as a number).
data2 <- subset(data, Month == 3)
p1 <- ggplot(data2, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +
  labs(title = "Spending per Category, March", x = "Category", y = "Amount") +
  coord_flip()
p1
# November 2017: spending by category.
data3 <- subset(data, Month == 11)
p2 <- ggplot(data3, aes(x = Category, y = Amount)) +
  geom_col(fill = "darkgreen") +
  labs(title = "Spending per Category, November", x = "Category", y = "Amount") +
  coord_flip()
p2
```
***
The plots show that Merchandise is still the largest category in both months. In November, the second-largest category is Travel/Entertainment.
I checked the detailed transactions: the auto insurance was renewed in March and the Thanksgiving holiday fell in November, which accounts for the majority of the spending.
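
Checking the detailed transactions for a month can be done by filtering on `Month` and sorting by `Amount`. The helper below, `top_transactions`, is a hypothetical name introduced for illustration, and the rows are January and February entries copied from the sample table, since the March and November rows are not shown in this document.

```r
# Rows copied from the 2017 sample table.
tx <- data.frame(
  Month       = c(1, 1, 1, 1, 2),
  Description = c("KILLINGTON TICKET SALES KILLINGTON VT",
                  "MANGO NEW YORK CITYNY",
                  "BACKCOUNTRY.COM 800-409-4502 UT438515825XO9ZZOO",
                  "REG OF MOTOR VEHICLE0900 617-351-9162 MA",
                  "ABERCROMBIE & FITC NEW ALBANY OH"),
  Amount      = c(349.00, 205.96, 100.00, 60.00, 100.20),
  Category    = c("Travel/ Entertainment", "Merchandise", "Merchandise",
                  "Government Services", "Merchandise"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n largest transactions in a given month.
top_transactions <- function(df, month, n = 3) {
  in_month <- df[df$Month == month, ]
  head(in_month[order(in_month$Amount, decreasing = TRUE), ], n)
}

top_jan <- top_transactions(tx, month = 1)
print(top_jan[, c("Description", "Amount", "Category")])
```

Calling the same helper with `month = 3` or `month = 11` on the full data frame would list the purchases that drive those two peaks.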
### Q5: Compared with 2016, how does spending in 2017 differ?
```{r}
# Treat Year as a factor so each year gets its own bar colour and legend entry.
comp$Year <- as.factor(comp$Year)
ggplot(data = comp, aes(x = Category, y = Amount)) +
  geom_col(aes(fill = Year), position = "dodge") +  # side-by-side bars per year
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```
***
Because the purpose was to compare spending between 2016 and 2017, a grouped bar chart was used.
The plot shows that overall spending in 2017 is much lower than in 2016, especially for Merchandise.
The main reason is that in 2016 most purchases were made with the Discover card.
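
The year-over-year difference can also be tabulated per category by placing the two years side by side. This sketch uses only the comparison rows visible in the sample table, which include a single 2016 row (Automotive), so the other categories yield `NA` rather than a real difference. The names `y16`, `y17`, `wide`, and `Change` are illustrative assumptions.

```r
# Rows copied from the 2016/2017 sample table; only one 2016 row is shown there.
comp <- data.frame(
  Category = c("Automotive", "Gasoline", "Merchandise", "Automotive"),
  Amount   = c(35, 66, 5540, 0),
  Year     = c(2017, 2017, 2017, 2016),
  stringsAsFactors = FALSE
)

# Split by year, then join on Category so each row holds both amounts.
y17 <- comp[comp$Year == 2017, c("Category", "Amount")]
y16 <- comp[comp$Year == 2016, c("Category", "Amount")]
wide <- merge(y17, y16, by = "Category", all = TRUE,
              suffixes = c("_2017", "_2016"))

# Positive values mean more was spent in 2017 than in 2016.
wide$Change <- wide$Amount_2017 - wide$Amount_2016

print(wide)
```

With the complete comparison file, every category would have both years present, and the `Change` column would quantify the drop that the grouped bar chart shows.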