A project for ANLY512: Data Visualization AT Hitch
The Quantified Self (QS) is a movement that aims to leverage the synergy of wearables, analytics, and “Big Data”. Designing this project around the QS movement makes sense because it offers the opportunity to be the data and question provider, the data analyst, the visualization designer, and the end user all at once. In this individual project, I use two years of data on spending and payments made with a Discover credit card.
The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in this class. Additionally, I will create a summary which answers the following questions based on the data collected: 1) What’s the total spending per month in 2017? 2) In April, August and December, what’s the spending in each category? 3) What’s the subtotal spending per transaction type in 2017? 4) For Merchandise, how does the spending vary from month to month? 5) Compared to 2018, what’s the difference?
| Trans..Date | Post.Date | Month | Year | Description | Amount | Category |
|---|---|---|---|---|---|---|
| 1/3/2017 | 1/3/2017 | 1 | 2017 | AMAZON.COM AMZN.COM/BILLWANWO0HI9DROP | 5.82 | Merchandise |
| 1/5/2017 | 1/5/2017 | 1 | 2017 | PAYMENT - THANK YOU | -1458.00 | Payments and Credits |
| 1/7/2017 | 1/7/2017 | 1 | 2017 | TARGET.COM * 800-591-3869 MN | 2.96 | Merchandise |
| 1/7/2017 | 1/7/2017 | 1 | 2017 | TARGET.COM * 800-591-3869 MN | 49.72 | Merchandise |
| 1/9/2017 | 1/9/2017 | 1 | 2017 | LOWE’S OF MILFORD, MA MILFORD MA | 16.20 | Home Improvement |
| 1/9/2017 | 1/9/2017 | 1 | 2017 | STOP & SHOP 0040 MILFORD MA00903R | 4.38 | Supermarkets |
| 1/9/2017 | 1/9/2017 | 1 | 2017 | TARGET MILFORD MA | 14.10 | Supermarkets |
| 1/9/2017 | 1/9/2017 | 1 | 2017 | TJMAXX #0619 MILFORD MA | 8.50 | Merchandise |
| 1/10/2017 | 1/10/2017 | 1 | 2017 | TARGET.COM * 800-591-3869 MN | 19.48 | Merchandise |
| 1/11/2017 | 1/11/2017 | 1 | 2017 | SHELL 57544922503 WESTBOROUGH MA | 65.03 | Gasoline |
| 1/11/2017 | 1/11/2017 | 1 | 2017 | WAL-MART SC - #2158 NORTHBOROUGH MA | 71.32 | Merchandise |
| 1/13/2017 | 1/13/2017 | 1 | 2017 | DUNKIN #339844 Q35 MARLBOROUGH MA | 1.06 | Restaurants |
| 1/13/2017 | 1/15/2017 | 1 | 2017 | TARGET MARLBOROUGH MA | -11.99 | Payments and Credits |
| 1/15/2017 | 1/15/2017 | 1 | 2017 | H MART BURLINGTON BURLINGTON MA | 123.81 | Supermarkets |
| 1/16/2017 | 1/16/2017 | 1 | 2017 | TARGET WESTBOROUGH MA | 21.24 | Supermarkets |

| Category | Amount | Year |
|---|---|---|
| Automotive | 0.00 | 2017 |
| Department Stores | 265.67 | 2017 |
| Education | 0.00 | 2017 |
| Gasoline | 287.63 | 2017 |
| Government Services | 0.00 | 2017 |
| Home Improvement | 101.04 | 2017 |
| Medical Services | 28.88 | 2017 |
| Merchandise | 7427.75 | 2017 |
| Other/Miscellaneous | 0.00 | 2017 |
| Restaurants | 725.83 | 2017 |
| Services | 77.63 | 2017 |
| Supermarkets | 1207.49 | 2017 |
| Travel/Entertainment | 412.51 | 2017 |
| Wholesale Clubs | 0.00 | 2017 |
| Automotive | 0.00 | 2018 |
---
title: "ANLY 512 Final Project"
author: "Fuyan Li"
date: "Dec. 2018"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
setwd("C:/Users/fli/Desktop/MyWork/Harrisburg/ANLY512/Final Project/")
```
### Overview of the Quantified Self movement
```{r}
knitr::include_graphics("/Users/fli/Desktop/MyWork/Harrisburg/ANLY512/Final Project/spending.jpg")
```
***
A project for ANLY512: Data Visualization
AT Hitch
The Quantified Self (QS) is a movement that aims to leverage the synergy of wearables, analytics, and "Big Data". Designing this project around the QS movement makes sense because it offers the opportunity to be the data and question provider, the data analyst, the visualization designer, and the end user all at once. In this individual project, I use two years of data on spending and payments made with a Discover credit card.
The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in this class. Additionally, I will create a summary which answers the following questions based on the data collected:
1) What's the total spending per month in 2017?
2) In April, August and December, what's the spending in each category?
3) What's the subtotal spending per transaction type in 2017?
4) For Merchandise, how does the spending vary from month to month?
5) Compared to 2018, what's the difference?
### Data preparation
```{r}
# Transaction-level records for 2017 (one row per purchase, payment or credit)
data <- read.csv("C:/Users/fli/Desktop/MyWork/Harrisburg/ANLY512/Final Project/Discover-2017-YearEndSummary.csv")
kable(data[1:15,], caption="Sample raw data in 2017")
# Subtotals per category and year, used for the 2017 vs 2018 comparison
comp <- read.csv("C:/Users/fli/Desktop/MyWork/Harrisburg/ANLY512/Final Project/Discover-2018-YearToDateSummary.csv")
kable(comp[1:15,], caption="Category subtotals for 2017 and 2018")
```
***
- There are 7 variables listed:
- Trans. Date: Transaction date
- Post Date
- Month
- Description: Detailed Description of purchases
- Amount: The total amount spent
- Category
- Year: 2017 or 2018
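
The source does not say whether Month and Year came with the export or were added by hand. If they had to be derived from the transaction date, one possible approach is sketched below. The `tx` data frame is a toy example built from three rows of the sample table, and the `Date` column is a helper introduced only for this illustration.

```r
# Toy rows copied from the sample table; column name follows read.csv's output
tx <- data.frame(
  Trans..Date = c("1/3/2017", "1/5/2017", "1/11/2017"),
  Amount      = c(5.82, -1458.00, 65.03),
  Category    = c("Merchandise", "Payments and Credits", "Gasoline"),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings into Date objects (helper column, not in the original data)
tx$Date <- as.Date(tx$Trans..Date, format = "%m/%d/%Y")

# Derive numeric Month and Year columns from the parsed date
tx$Month <- as.integer(format(tx$Date, "%m"))
tx$Year  <- as.integer(format(tx$Date, "%Y"))

print(tx[, c("Trans..Date", "Month", "Year", "Amount")])
```

Keeping a real `Date` column alongside Month and Year would also make it easier to filter by date ranges later.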
### Q1: What's the total spending per month in 2017?
```{r}
# Q1: total amount per month in 2017; bars stack every transaction within a month.
# Payments and credits are negative amounts, so they stack below zero.
# Month is stored as 1-12, so convert it to a labelled factor for the x axis.
p <- ggplot(data, aes(x = factor(Month, levels = 1:12, labels = month.abb), y = Amount)) +
  geom_bar(stat = "identity", fill = "LightBlue") +
  scale_x_discrete(drop = FALSE) +
  labs(title = "Monthly Spending 2017", x = "Month", y = "Amount $") +
  theme_bw()
p
```
***
Since Month is categorical and Amount is continuous, and the aim was to compare the total amount in each month, a bar chart was used to summarize the data.
I summed the amount spent in each month and displayed the totals as bars.
The plot shows that average monthly spending was around $800, and that total spending was highest in April, August and December.
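
The monthly totals and the average quoted above can also be checked numerically. The sketch below is one way to do that, assuming that rows in the "Payments and Credits" category should be excluded from spending. The `tx` data frame is toy data: the January rows come from the sample table and the February rows are made up for illustration.

```r
# Toy transactions: January rows from the sample table, February rows hypothetical
tx <- data.frame(
  Month    = c(1, 1, 1, 2, 2),
  Amount   = c(5.82, -1458.00, 65.03, 20.00, 30.00),
  Category = c("Merchandise", "Payments and Credits", "Gasoline",
               "Restaurants", "Supermarkets"),
  stringsAsFactors = FALSE
)

# Keep purchases only (assumption: payments and credits are not spending)
purchases <- tx[tx$Category != "Payments and Credits", ]

# Total per month, then attach a readable month label
monthly <- aggregate(Amount ~ Month, data = purchases, FUN = sum)
monthly$Label <- month.abb[monthly$Month]

# Average monthly spending across the months present
avg_monthly <- mean(monthly$Amount)

print(monthly)
print(avg_monthly)
```

Running the same steps on the full 2017 data would give the figures behind the bar chart without having to read them off the axis.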
### Q2: In April, August and December, what's the spending in each category?
```{r}
# One dot per transaction, grouped by category, for each of the three peak months
# April
data2 <- subset(data, Month == 4)
p1 <- ggplot(data2, aes(Amount, Category))
p2 <- p1 + geom_point(aes(colour = factor(Category))) +
  labs(title = "Transactions in Apr.2017", x = "Amount $", y = "Category") +
  theme_bw()
p2
# August
data3 <- subset(data, Month == 8)
p3 <- ggplot(data3, aes(Amount, Category))
p4 <- p3 + geom_point(aes(colour = factor(Category))) +
  labs(title = "Transactions in Aug.2017", x = "Amount $", y = "Category") +
  theme_bw()
p4
# December (plotted from the December subset, data4)
data4 <- subset(data, Month == 12)
p5 <- ggplot(data4, aes(Amount, Category))
p6 <- p5 + geom_point(aes(colour = factor(Category))) +
  labs(title = "Transactions in Dec.2017", x = "Amount $", y = "Category") +
  theme_bw()
p6
```
***
From the plots, we can see that the top 3 categories I spent in most frequently were Merchandise, Supermarkets and Restaurants.
I checked the detailed transaction history. Almost fifty percent of my April spending went to a purchase at IKEA, and most Merchandise purchases were made online.
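
A statement such as "almost fifty percent" can be backed by computing each category's share of a month's spending. The sketch below shows the idea on a toy data frame `apr` whose amounts are entirely hypothetical. It is only an illustration, not the calculation used for the dashboard.

```r
# Hypothetical April purchases (amounts invented for illustration)
apr <- data.frame(
  Category = c("Merchandise", "Merchandise", "Supermarkets", "Restaurants"),
  Amount   = c(400, 100, 300, 200),
  stringsAsFactors = FALSE
)

# Subtotal per category
by_cat <- aggregate(Amount ~ Category, data = apr, FUN = sum)

# Share of the month's total, in percent
by_cat$Share <- round(100 * by_cat$Amount / sum(by_cat$Amount), 1)

# Largest category first
by_cat <- by_cat[order(-by_cat$Amount), ]
print(by_cat)
```

Counting rows per category with `table(apr$Category)` would answer the related question of how often, rather than how much, I spent in each category.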
### Q3: What's the subtotal spending per transaction type in 2017?
```{r}
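# Sum the amount per category; aggregate() names the summed column "x"
agg2 <- aggregate(data$Amount, by=list(Category=data$Category), FUN=sum)
# Sort ascending so the largest subtotal is drawn at the top of the dot chart
order_agg2 <- agg2[order(agg2$x),]
dotchart(order_agg2$x, labels=order_agg2$Category, cex=0.7, color="Purple", main="Subtotal $ per category in 2017")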
```
***
The category names are long and cannot be displayed along the bottom axis, so a rotated dot plot was created.
Here we see that the category I spent the most in was Merchandise, at around $7500, which was much higher than any other category. The 2nd and 3rd highest categories were Supermarkets and Restaurants, at around $1200 and $730 respectively.
### Q4: For Merchandise, how does the spending vary from month to month?
```{r}
# Keep Merchandise transactions only and stack them into one bar per month
data1 <- subset(data, Category == 'Merchandise')
# Month is stored as 1-12, so convert it to a labelled factor for the x axis
p <- ggplot(data1, aes(x = factor(Month, levels = 1:12, labels = month.abb), y = Amount)) +
  geom_bar(stat = "identity", fill = "Orange") +
  scale_x_discrete(drop = FALSE) +
  labs(title = "Merchandise spending per month", x = "Month", y = "Amount $") +
  theme_minimal()
p
```
***
We see that for half of the year, spending on Merchandise was less than $500 per month. However, in spring and winter, it was almost double that of the other months.
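
The "less than $500" observation can be checked by tabulating Merchandise totals for all twelve months, including months with no purchases. The sketch below does this with a factor that carries all twelve levels. The `merch` data frame and its amounts are hypothetical, and the 500 threshold is simply the figure mentioned above.

```r
# Hypothetical Merchandise transactions (Month as 1-12)
merch <- data.frame(
  Month  = c(1, 1, 4, 12),
  Amount = c(100, 50, 900, 1100)
)

# A factor with all 12 levels keeps months without purchases in the result
merch$Month <- factor(merch$Month, levels = 1:12, labels = month.abb)

# Total per month; months without transactions come back as NA, so set them to 0
totals <- tapply(merch$Amount, merch$Month, sum)
totals[is.na(totals)] <- 0

# Months whose Merchandise total is below the $500 threshold
low_months <- names(totals)[totals < 500]

print(totals)
print(low_months)
```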
### Q5: Compared to 2018, what's the difference?
```{r echo=FALSE, warning=FALSE}
# comp holds one row per category and year; Year becomes a factor so each year gets its own bar
comp$Year <- as.factor(comp$Year)
ggplot(data = comp, aes(x = Category, y = Amount)) +
  geom_bar(aes(fill = Year), position = "dodge", stat = "identity") +
  labs(title = "Spending 2017 vs 2018", x = "Category", y = "Amount $") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```
***
Since the purpose was to compare spending between 2017 and 2018, a grouped bar chart was used.
From the plot we can see that overall year-to-date (YTD) spending in 2018 is much lower than in 2017, especially for Merchandise.
The main reason is that I opened two more credit cards this year to take advantage of different rebate policies, whereas only one credit card was used in 2017.
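
To put numbers on the gap between the two years, the long category table can be reshaped so that each year becomes a column, and the difference taken per category. The sketch below uses a toy data frame `comp_toy`. The 2017 values are taken from the category table, while the 2018 values are hypothetical placeholders, since the real 2018 subtotals are not shown here.

```r
# Long format: one row per category and year
comp_toy <- data.frame(
  Category = rep(c("Gasoline", "Merchandise", "Restaurants"), times = 2),
  Amount   = c(287.63, 7427.75, 725.83,    # 2017 values from the category table
               200.00, 3000.00, 600.00),   # 2018 values are hypothetical
  Year     = rep(c(2017, 2018), each = 3),
  stringsAsFactors = FALSE
)

# Wide format: columns Amount.2017 and Amount.2018, one row per category
wide <- reshape(comp_toy, idvar = "Category", timevar = "Year", direction = "wide")

# Year-over-year change; negative means less was spent in 2018
wide$Diff <- wide$Amount.2018 - wide$Amount.2017
print(wide)
```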
### Conclusion
Based on the visual analysis, the following conclusions can be drawn:
1. I spent the most money on merchandise, supermarkets and restaurants. I didn't travel a lot in 2017.
2. Monthly spending on merchandise was $750 on average. Summer and winter seemed to be the peak seasons for online shopping.
3. There was no big change in my spending habits between 2017 and 2018.
4. Though I mostly used the Discover credit card, I also applied for a Citibank card and a Chase card in mid/fall 2018. The comparison may not be fully accurate because the data for this year is not complete.
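
Because the 2018 data is incomplete, one way to make the comparison fairer would be to restrict both years to the same part of the calendar before totalling. The sketch below illustrates this with a toy data frame `tx` and a hypothetical cutoff of November 30. Both the rows and the cutoff are assumptions for illustration and do not reflect the actual export date.

```r
# Hypothetical transactions across both years
tx <- data.frame(
  Date   = as.Date(c("2017-03-10", "2017-12-20", "2018-03-15", "2018-11-01")),
  Amount = c(100, 250, 80, 120)
)

# Hypothetical cutoff: compare January 1 through November 30 in each year
cutoff_mmdd <- 1130

tx$Year <- format(tx$Date, "%Y")

# Encode month and day as one integer (e.g. March 10 becomes 310) for comparison
mmdd <- as.integer(format(tx$Date, "%m%d"))
in_window <- mmdd <= cutoff_mmdd

# Year-to-date total per year over the shared window
ytd <- aggregate(Amount ~ Year, data = tx[in_window, ], FUN = sum)
print(ytd)
```

This does not correct for spending that moved to the other two cards, so it only addresses the incomplete-year part of the caveat.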