A project for ANLY512: Data Visualization
| Trans..Date | Post.Date | Month | Description | Amount | Category |
|---|---|---|---|---|---|
| 4/11/17 | 4/11/17 | 4 | DUANE READE #14306 JERSEY CITY NJ | 10.33 | Merchandise |
| 4/11/17 | 4/11/17 | 4 | JIAN BING MAN. NEW YORK NY | 10.88 | Restaurants |
| 4/12/17 | 4/12/17 | 4 | KOBEYAKI 2 NEW YORK NY | 10.85 | Restaurants |
| 4/13/17 | 4/13/17 | 4 | DUANE READE #14306 JERSEY CITY NJ | 15.28 | Merchandise |
| 4/13/17 | 4/13/17 | 4 | KOBEQUE NEW YORK NY | 16.32 | Restaurants |
| 4/14/17 | 4/14/17 | 4 | AMC ONLINE #9640 888-440-4262 KS | 50.58 | Travel/ Entertainment |
| 4/14/17 | 4/14/17 | 4 | FRESCA DELI JERSEY CITY NJ | 19.76 | Supermarkets |
| 4/14/17 | 4/14/17 | 4 | MUJI FIFTH AVE NEW YORK NY | 31.31 | Merchandise |
| 4/14/17 | 4/14/17 | 4 | SQ *ARANJIRA NEWYORK NY0001152921507569922470 | 40.28 | Merchandise |
| 4/14/17 | 4/14/17 | 4 | SUNRISE MART - 41ST ST NEW YORK NY | 11.99 | Supermarkets |
| 4/15/17 | 4/15/17 | 4 | AMC EMPIRE 25 #0552 NEW YORK NY | 5.09 | Travel/ Entertainment |
| 4/15/17 | 4/15/17 | 4 | SHANGHAI BEST JERSEY CITY NJ | 47.72 | Restaurants |
| 4/16/17 | 4/16/17 | 4 | DUANE READE #14123 NEW YORK NY | 9.34 | Merchandise |
| 4/16/17 | 4/16/17 | 4 | H MART NEW YORK NY01604R | 19.96 | Supermarkets |
| 4/16/17 | 4/16/17 | 4 | HER NAME IS HAN NEW YORK NY | 36.57 | Restaurants |
| Category | Amount | Year |
|---|---|---|
| Awards and Rebate Credits | 3 | 2017 |
| Awards and Rebate Credits | 1 | 2018 |
| Department Stores | 1 | 2017 |
| Department Stores | 1 | 2018 |
| Education | 3 | 2017 |
| Education | 0 | 2018 |
| Home Improvement | 1 | 2017 |
| Home Improvement | 0 | 2018 |
| Merchandise | 64 | 2017 |
| Merchandise | 18 | 2018 |
| Payments and Credits | 16 | 2017 |
| Payments and Credits | 4 | 2018 |
| Restaurants | 123 | 2017 |
| Restaurants | 8 | 2018 |
---
title: "ANLY 512 Final Project"
author: "Linfang Li"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
```
### Overview of the Quantified Self movement

A project for ANLY512: Data Visualization

**The data I collected covers one year, from April 2017 to March 2018. In the last step I compare the two calendar years. Although the two periods are not the same length (nine months in 2017, three in 2018), the comparison is a clear way to see what I have spent since the Discover account was opened.**

The goal of the project is to collect, analyze, and visualize the data using the tools and methods covered in class. Additionally, using a data-driven approach, I will create a summary that answers the following questions based on the data collected:

1) What is the total spending by month?
2) Which categories cost the most?
3) For each category, what is the spending by month?
4) In July 2017, December 2017, and March 2018, what is the spending in each category?
5) How does spending in 2017 compare with spending in 2018?

### Data preparation

```{r}
# Transaction-level export from the Discover account
data <- read.csv("/Users/Wen/Downloads/Discover-AllAvailable-20180814.CSV")
kable(head(data, 15), caption = "Sample raw data")

# Summary by category and year, used for the 2017 vs. 2018 comparison
comp <- read.csv("/Users/Wen/Documents/Discover-AllAvailable-Category.CSV")
kable(head(comp, 14), caption = "Sample raw data by category")
```
***
- There are 7 variables we are interested in:
    - Trans. Date: transaction date
    - Post Date
    - **Month (categorical)**
    - Description: detailed description of the purchase
    - **Amount: the amount spent (continuous)**
    - Category
    - Year: 2017 (April to December) or 2018 (January to March)
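
The source does not say how `Month` and `Year` were produced; they may have been added by hand. As a minimal sketch, assuming the transaction dates use the `m/d/yy` format seen in the sample rows, both could be derived from the date. The vector `trans_date` and the data frame `derived` are illustrative names, and the 2018 date is hypothetical:

```r
# Two transaction dates copied from the sample rows, plus one hypothetical 2018 date.
trans_date <- c("4/11/17", "4/14/17", "1/5/18")

# Parse month/day/two-digit-year strings into Date objects.
parsed <- as.Date(trans_date, format = "%m/%d/%y")

# Extract the month number and four-digit year from each parsed date.
derived <- data.frame(
  Trans.Date = trans_date,
  Month = as.integer(format(parsed, "%m")),
  Year = as.integer(format(parsed, "%Y"))
)

print(derived)
```

Deriving both columns from the date keeps them consistent with each transaction and avoids manual entry errors.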
### Q1: What is the total spending by month?

```{r}
# The sample data shows Month as a number (4 = April), so map it to a label
# explicitly and order the axis from April 2017 to March 2018.
month_levels <- c("Apr", "May", "Jun", "Jul", "Aug", "Sep",
                  "Oct", "Nov", "Dec", "Jan", "Feb", "Mar")

p <- ggplot(data, aes(x = factor(month.abb[Month], levels = month_levels), y = Amount)) +
  geom_col(fill = "LightGreen") +  # bars stack, so each bar is the monthly total
  scale_x_discrete(limits = month_levels) +
  labs(title = "Spending per Month from 2017 April to 2018 March", x = "Month", y = "Amount") +
  theme_minimal()
p
```
***
A bar chart was used to compare the total amount spent in each month.
From the plot, average monthly spending is around 500 dollars.
Total spending is highest in July 2017, December 2017, and March 2018.
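
The monthly totals and their average can also be checked numerically rather than read off the chart. This is a minimal sketch, assuming the `Month` and `Amount` columns from the sample data; the data frame `sample_tx` is a small hypothetical stand-in, not the real export:

```r
# Hypothetical stand-in for the transaction data (not the real export).
sample_tx <- data.frame(
  Month = c(4, 4, 5, 5, 6),
  Amount = c(10.33, 10.88, 20.00, 5.50, 7.25)
)

# Sum Amount within each Month.
monthly_totals <- aggregate(Amount ~ Month, data = sample_tx, FUN = sum)

# Average of the monthly totals.
avg_monthly <- mean(monthly_totals$Amount)

print(monthly_totals)
print(avg_monthly)
```

Applied to the full data, the same two steps would give the figures behind the bar heights. Any payment or credit rows left in the data would be summed along with purchases.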
### Q2: Which three categories cost the most?

```{r}
# Horizontal bars so the long category names stay readable
p <- ggplot(data, aes(x = Category, y = Amount)) +
  geom_col(fill = "LightGreen") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p
```
***
For the same reason (comparing totals across groups), a bar chart was used here. The category names are too long to display along the bottom axis, so the chart was rotated.
The category I spent the most on is Merchandise, at around $5000, much higher than any other category. Second is Restaurants, at around $1300, and third is Supermarkets, at around $600.
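
The ranking can also be computed directly instead of read off the chart. This is a minimal sketch using only the first 15 rows from the sample raw data, so its result reflects those few April 2017 transactions, not the full-year totals quoted above. The names `sample_rows`, `category_totals`, and `top3` are illustrative:

```r
# Category and Amount from the first 15 sample rows of the raw data.
sample_rows <- data.frame(
  Category = c("Merchandise", "Restaurants", "Restaurants", "Merchandise",
               "Restaurants", "Travel/ Entertainment", "Supermarkets",
               "Merchandise", "Merchandise", "Supermarkets",
               "Travel/ Entertainment", "Restaurants", "Merchandise",
               "Supermarkets", "Restaurants"),
  Amount = c(10.33, 10.88, 10.85, 15.28, 16.32, 50.58, 19.76, 31.31,
             40.28, 11.99, 5.09, 47.72, 9.34, 19.96, 36.57)
)

# Total amount per category.
category_totals <- aggregate(Amount ~ Category, data = sample_rows, FUN = sum)

# Sort from largest to smallest total and keep the first three.
category_totals <- category_totals[order(-category_totals$Amount), ]
top3 <- head(category_totals, 3)

print(top3)
```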
### Q3: For each category, what is the spending by month?

```{r}
# The sample data shows Month as a number (4 = April), so map it to a label
# explicitly and order the axis from April 2017 to March 2018.
month_levels <- c("Apr", "May", "Jun", "Jul", "Aug", "Sep",
                  "Oct", "Nov", "Dec", "Jan", "Feb", "Mar")

# Bar chart of monthly spending for one category, titled with the category name.
plot_category <- function(category) {
  rows <- data[which(data$Category == category), ]
  ggplot(rows, aes(x = factor(month.abb[Month], levels = month_levels), y = Amount)) +
    geom_col(fill = "LightGreen") +
    scale_x_discrete(limits = month_levels) +
    labs(title = category, x = "Month", y = "Amount") +
    theme_minimal()
}

categories <- c("Merchandise", "Restaurants", "Awards and Rebate Credits",
                "Department Stores", "Education", "Home Improvement",
                "Services", "Supermarkets", "Travel/ Entertainment",
                "Payments and Credits")

# One plot per category; print() is needed to render plots inside a loop.
for (category in categories) {
  print(plot_category(category))
}
```
***
For most months, spending was at a typical level. In July 2017, December 2017, and March 2018, however, it was higher than in the other months.
### Q4: In July 2017, December 2017 and March 2018, what is the spending in each category?

```{r}
# Horizontal bar chart of spending per category for a single month.
# Month is a number in the data (7 = July, 12 = December, 3 = March).
plot_month <- function(month, label) {
  rows <- data[which(data$Month == month), ]
  ggplot(rows, aes(x = Category, y = Amount)) +
    geom_col(fill = "LightGreen") +
    labs(title = paste("Spending of", label, "per Category"), x = "Category", y = "Amount") +
    coord_flip()
}

plot_month(7, "July 2017")
plot_month(12, "December 2017")
plot_month(3, "March 2018")
```
***
In these plots, Merchandise is still the largest category, except in July 2017, when Restaurants is the largest.
Checking the detailed transactions shows that I ordered too much salad from Pret A Manger.
### Q5: Comparison: how does 2018 differ from 2017?

```{r}
# Treat Year as a category so each year gets its own bar color
comp$Year <- as.factor(comp$Year)
ggplot(data = comp, aes(x = Category, y = Amount)) +
  geom_col(aes(fill = Year), position = "dodge") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```
***
Even though the two time periods are not the same length, I would like to compare spending in each category.
Since the purpose is to compare 2017 with 2018, a grouped bar chart was used.
The data for 2018 covers only 3 months, while the data for 2017 covers 9 months.
Even so, the plot shows that spending in 2018 is much lower than in 2017, especially for Restaurants.
The main reason is that the card had just been opened in 2017, and I was trying to use it more. What's more, I don't have much money to eat out, shop at department stores, travel, etc. Starting in 2018, I have to pay tuition myself, so there is no extra money to spend and pay off the balance :(
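
Because 2017 covers 9 months and 2018 only 3, dividing each yearly figure by the number of months covered puts the two years on a comparable footing. This is a minimal sketch using four rows copied from the by-category sample table. The unit of `Amount` in that table is not stated, so the numbers are illustrative only, and `comp_sample`, `months_covered`, and `PerMonth` are hypothetical names:

```r
# Four rows copied from the by-category sample table.
comp_sample <- data.frame(
  Category = c("Merchandise", "Merchandise", "Restaurants", "Restaurants"),
  Amount = c(64, 18, 123, 8),
  Year = c(2017, 2018, 2017, 2018)
)

# Months of data in each year, as stated in the commentary.
months_covered <- c("2017" = 9, "2018" = 3)

# Divide each yearly amount by the number of months that year covers.
comp_sample$PerMonth <- unname(
  comp_sample$Amount / months_covered[as.character(comp_sample$Year)]
)

print(comp_sample)
```

Plotting `PerMonth` instead of `Amount` in the grouped bar chart would show whether the drop in 2018 remains after accounting for the shorter period.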