quant_self
A project for ANLY512: Data Visualization
The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses two years of spending and payment data captured by a Discover credit card.
The goal of this project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using a data-driven approach, I will create a summary which answers the following questions based on the data collected:

1) What is the total spending by month in 2018?
2) Which category cost the most in 2018?
3) For Services, what is the spending by month in 2018?
4) In March and October, what is the spending in each category?
5) Compared to 2017, what is the difference?
Trans..Date | Post.Date | Month | Description | Amount | Category |
---|---|---|---|---|---|
1/2/2018 | 1/2/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 14.80 | Gasoline |
1/2/2018 | 1/2/2018 | 1 | SAMS DOWNTOWN FEED AND P SAN JOSE CA | 4.21 | Merchandise |
1/5/2018 | 1/7/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 6.59 | Gasoline |
1/5/2018 | 1/7/2018 | 1 | SOUTHWEST MARKET SAN JOSE CA | 9.36 | Supermarkets |
1/7/2018 | 1/7/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 8.41 | Gasoline |
1/8/2018 | 1/8/2018 | 1 | MAY’S HAIR SALON SAN JOSE CA | 9.00 | Services |
1/9/2018 | 1/9/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 8.41 | Gasoline |
1/9/2018 | 1/9/2018 | 1 | SOUTHWEST MARKET SAN JOSE CA | 6.99 | Supermarkets |
1/10/2018 | 1/10/2018 | 1 | GROCERYONWHEELS.ORG 408-641-5715 CA | 36.41 | Supermarkets |
1/12/2018 | 1/12/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 8.41 | Gasoline |
1/15/2018 | 1/15/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 8.41 | Gasoline |
1/17/2018 | 1/17/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 8.41 | Gasoline |
1/19/2018 | 1/19/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 8.41 | Gasoline |
1/21/2018 | 1/21/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 7.29 | Gasoline |
1/22/2018 | 1/22/2018 | 1 | 7-ELEVEN 39747 SAN JOSE CA | 8.41 | Gasoline |
Category | Amount | Year |
---|---|---|
Automotive | 0 | 2018 |
Department Stores | 0 | 2018 |
Education | 1306 | 2018 |
Gasoline | 269 | 2018 |
Government Services | 0 | 2018 |
Home Improvement | 0 | 2018 |
Medical Services | 415 | 2018 |
Merchandise | 118 | 2018 |
Other/ Miscellaneous | 0 | 2018 |
Restaurants | 940 | 2018 |
Services | 1468 | 2018 |
Supermarkets | 271 | 2018 |
Travel/ Entertainment | 637 | 2018 |
Wholesale Clubs | 0 | 2018 |
Automotive | 0 | 2017 |
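There are 7 variables we are interested in:

- Trans. Date: Transaction Date
- Post Date
- Month
- Description: Detailed Description of purchases
- Amount: The total amount spent
- Category
- Year: 2017 or 2018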
Month is categorical and Amount is continuous, so a bar chart of the total amount spent in each month is used to compare spending across months. The plot shows that average spending is around 450 dollars per month, and that total spending is highest in March and October.
For the same reason, a bar chart is used here. The category names are too long to display along the bottom axis, so the chart is rotated. The category I spent the most money on is Services, at approximately $1,500, well above most other categories. Education is second, at approximately $1,300.
In most months, spending on Services was less than 150 dollars. In March, however, it was 10 times higher than in the second-highest month.
The plots show that Services again accounts for the most spending in March, while in October the highest spending was on Medical Services. The transaction details explain both: I paid the CFA exam fees in March and was ill in October.
Because the purpose is to compare spending between 2017 and 2018, a grouped bar chart is used. The plot shows that overall spending in 2017 was much lower than in 2018, especially for Merchandise. The main reason is that during 2017 most purchases were not made with the Discover card.
---
title: "ANLY 512 Final Project"
author: "Sunny Duggal"
date: "February 22, 2019"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
setwd("C:\\Users\\sunny.duggal\\Downloads")
```
### Overview of the Quantified Self movement

***
A project for ANLY512: Data Visualization
The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing). This final class project uses two years of spending and payment data captured by a Discover credit card.
The goal of this project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using a data-driven approach, I will create a summary which answers the following questions based on the data collected:

1) What is the total spending by month in 2018?
2) Which category cost the most in 2018?
3) For Services, what is the spending by month in 2018?
4) In March and October, what is the spending in each category?
5) Compared to 2017, what is the difference?
### Data preparation
```{r}
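# Transaction-level data for 2018 (one row per purchase)
data <- read.csv("Discover-2018-YearEndSummary.csv")
kable(data[1:15, ], caption = "Sample raw data in 2018")
# Category totals for 2017 and 2018 (one row per category and year)
comp <- read.csv("2017.csv")
kable(comp[1:15, ], caption = "Sample category totals for 2017 and 2018")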
```
***
There are 7 variables we are interested in:

- Trans. Date: Transaction Date
- Post Date
- Month
- Description: Detailed Description of purchases
- Amount: The total amount spent
- Category
- Year: 2017 or 2018
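
In the sample table, Month appears to be the month number of the transaction date. If that column ever had to be derived rather than read from the file, a minimal sketch could look like the following. The helper name `month_from_date` is hypothetical and not part of the original code, and the `m/d/Y` date format is an assumption based on the sample rows.

```r
# Hypothetical helper (not in the original project): derive the month number
# from date strings written as month/day/year, e.g. "1/22/2018".
month_from_date <- function(x) {
  as.integer(format(as.Date(x, format = "%m/%d/%Y"), "%m"))
}

# Two transaction dates taken from the sample table
dates <- c("1/2/2018", "1/22/2018")
months <- month_from_date(dates)
print(months)
```

Storing the month as an integer keeps it easy to filter on (for example `Month == 3`) and to convert to a label with `month.abb`.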
### Q1: What is the total spending by month in 2018?
```{r}
# Total spending per month in 2018; geom_col() stacks all transactions in a month.
# Month is stored as a number (1-12), so convert it to an ordered month label.
p <- ggplot(data, aes(x = factor(month.abb[Month], levels = month.abb), y = Amount)) +
  geom_col(fill = "blue") +
  scale_x_discrete(drop = FALSE) +  # keep all 12 months on the axis
  labs(title = "Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p
```
***
Month is categorical and Amount is continuous, so a bar chart of the total amount spent in each month is used to compare spending across months.
The plot shows that average spending is around 450 dollars per month, and that total spending is highest in March and October.
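
The monthly average can be cross-checked against the 2018 category totals in the comparison data. The sketch below is only an arithmetic check, not part of the original analysis; it copies the nonzero 2018 totals from the category table and assumes they cover the whole year.

```r
# 2018 category totals as listed in the category table (zero rows omitted)
totals_2018 <- c(
  Education = 1306, Gasoline = 269, `Medical Services` = 415,
  Merchandise = 118, Restaurants = 940, Services = 1468,
  Supermarkets = 271, `Travel/ Entertainment` = 637
)

annual_total <- sum(totals_2018)    # 5424
monthly_mean <- annual_total / 12   # 452
print(c(annual_total = annual_total, monthly_mean = monthly_mean))
```

The result, 452 dollars per month, agrees with the figure of around 450 read from the plot.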
### Q2: Which category cost the most in 2018?
```{r}
# Total spending per category in 2018; the chart is rotated so the long
# category names stay readable.
p <- ggplot(data, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p
```
***
For the same reason, a bar chart is used here. The category names are too long to display along the bottom axis, so the chart is rotated.
The category I spent the most money on is Services, at approximately $1,500, well above most other categories. Education is second, at approximately $1,300.
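
The ranking can also be read directly from the category totals. The following is a small sketch, assuming the 2018 totals from the category table; it uses base R's `reorder()` so that, if the factor were passed to a rotated bar chart, the largest category would sit at the top.

```r
# 2018 totals copied from the category table (zero rows omitted)
totals <- data.frame(
  Category = c("Education", "Gasoline", "Medical Services", "Merchandise",
               "Restaurants", "Services", "Supermarkets", "Travel/ Entertainment"),
  Amount = c(1306, 269, 415, 118, 940, 1468, 271, 637)
)

# Order the factor levels by amount (ascending), then rank rows descending
totals$Category <- reorder(totals$Category, totals$Amount)
ranked <- totals[order(-totals$Amount), ]
print(head(ranked, 2))
```

With these totals the first two rows are Services and Education, matching the reading of the chart.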
### Q3: For Services, what is the spending by month in 2018?
```{r}
# Keep only Services transactions, then total them by month
data1 <- subset(data, Category == "Services")
p <- ggplot(data1, aes(x = factor(month.abb[Month], levels = month.abb), y = Amount)) +
  geom_col(fill = "blue") +
  scale_x_discrete(drop = FALSE) +  # keep all 12 months on the axis
  labs(title = "Services Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()
p
```
***
In most months, spending on Services was less than 150 dollars. In March, however, it was 10 times higher than in the second-highest month.
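
A comparison like this can be computed instead of read off the chart. The sketch below uses a hypothetical helper, `peak_ratio`, and made-up monthly totals chosen only to illustrate the calculation; they are not the actual Services figures.

```r
# Hypothetical helper: ratio of the largest value to the second largest
peak_ratio <- function(x) {
  top <- sort(unname(x), decreasing = TRUE)
  top[1] / top[2]
}

# Made-up monthly totals, for illustration only (not the real data)
monthly_services <- c(Jan = 40, Mar = 1000, Jun = 100)
ratio <- peak_ratio(monthly_services)
print(ratio)
```

On the real data the input would be the per-month Services totals, for example from `tapply(data1$Amount, data1$Month, sum)`.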
### Q4: In March and October, what is the spending in each category?
```{r}
# Spending per category in March (Month is numeric, so compare with a number)
data2 <- subset(data, Month == 3)
p1 <- ggplot(data2, aes(x = Category, y = Amount)) +
  geom_col(fill = "blue") +
  labs(title = "Spending per Category, March 2018", x = "Category", y = "Amount") +
  coord_flip()
p1
# Spending per category in October
data3 <- subset(data, Month == 10)
p2 <- ggplot(data3, aes(x = Category, y = Amount)) +
  geom_col(fill = "darkgreen") +
  labs(title = "Spending per Category, October 2018", x = "Category", y = "Amount") +
  coord_flip()
p2
```
***
The plots show that Services again accounts for the most spending in March, while in October the highest spending was on Medical Services.
The transaction details explain both: I paid the CFA exam fees in March and was ill in October.
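
Checking the detailed information amounts to listing the largest transactions in a given month. A minimal sketch follows, using four January rows from the sample table because the March and October rows are not shown here; `top_transactions` is a hypothetical helper, not part of the original code.

```r
# Four rows copied from the January sample table
sample_rows <- data.frame(
  Month = c(1, 1, 1, 1),
  Description = c("7-ELEVEN 39747 SAN JOSE CA",
                  "SAMS DOWNTOWN FEED AND P SAN JOSE CA",
                  "MAY'S HAIR SALON SAN JOSE CA",
                  "GROCERYONWHEELS.ORG 408-641-5715 CA"),
  Amount = c(14.80, 4.21, 9.00, 36.41),
  Category = c("Gasoline", "Merchandise", "Services", "Supermarkets"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n largest transactions in one month
top_transactions <- function(df, month, n = 3) {
  rows <- df[df$Month == month, ]
  head(rows[order(-rows$Amount), c("Description", "Amount", "Category")], n)
}

largest <- top_transactions(sample_rows, month = 1)
print(largest)
```

Calling the same helper with `month = 3` or `month = 10` on the full data frame would surface the individual purchases behind the tallest bars.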
### Q5: Compared to 2017, what is the difference?
```{r}
# Treat Year as a factor so each year gets its own fill color
comp$Year <- as.factor(comp$Year)
ggplot(comp, aes(x = Category, y = Amount, fill = Year)) +
  geom_col(position = "dodge") +  # side-by-side bars for 2017 and 2018
  labs(title = "Spending per Category, 2017 vs. 2018", x = "Category", y = "Amount") +
  scale_fill_discrete(name = "Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```
***
Because the purpose is to compare spending between 2017 and 2018, a grouped bar chart is used.
The plot shows that overall spending in 2017 was much lower than in 2018, especially for Merchandise.
The main reason is that during 2017 most purchases were not made with the Discover card.
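
The year-over-year difference per category can also be tabulated. In the sketch below the 2018 amounts come from the category table, while the 2017 amounts are placeholders invented for illustration, since the 2017 rows are not shown here; `comp_demo` and `change` are hypothetical names.

```r
# 2018 amounts are from the category table; 2017 amounts are PLACEHOLDERS
comp_demo <- data.frame(
  Category = rep(c("Merchandise", "Services"), times = 2),
  Amount = c(118, 1468, 40, 900),
  Year = rep(c(2018, 2017), each = 2)
)

# Cross-tabulate amounts by category (rows) and year (columns)
tab <- xtabs(Amount ~ Category + Year, data = comp_demo)

# Change from 2017 to 2018 for each category
change <- tab[, "2018"] - tab[, "2017"]
print(change)
```

Applied to the real `comp` data frame, the same two steps would give the exact per-category change that the grouped bar chart shows visually.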