Introduction


The Quantified Self Project

The goal of this project is to analyze, visualize and summarize my credit card spending pattern over the last two years.

Data Summary

Data was collected from my credit card statements from Discover, US Bank and AMEX from June 2017 to June 2019. These 3 data sources were combined together and credit transactions were removed, followed by some data cleanup.

Attributes: TransactionDate, TransactionMonth, TransactionYear, LongDescription, Description, City, State, Latitude, Longitude, Amount, Category, SubCategory.

Description, City and State were extracted from LongDescription (credit card transaction name). Corresponding Longitude and Latitude values were collected from https://www.latlong.net/. SubCategory was derived from Category and Description.

Questions
  1. Which categories did I spend the most in?
  2. How did my categorical spend compare across years?
  3. In Travel/Entertainment, which subcategories did I spend the most in?
  4. In Restaurants, what was the share of subcategorical spend?
  5. How did my average monthly spend vary over time?
  6. How did my monthly transaction frequency (and average monthly spend) vary over time?
  7. How was my spread of spend by location, and which were the top 5 states?
Visualization Methods

ggplot and ggplotly were used to create bar graphs, pie charts, line graphs and bubble charts, while ggmap was used to create spatial visualization.

Question 1


Which categories did I spend the most in?

I spent the most on Travel/Entertainment ($10340), followed by Services ($6689) and Restaurants ($6232), while I spent the least on Automotive ($272) in the last two years. Travel/Entertainment included data such as hotels, rental cars and flights, Services included monthly subscriptions and insurance, and Restaurants included food and drinks.

Question 2


How did my categorical spend compare across years?

Since only 2018 had a complete calendar year’s data, reading this output was not straightforward. However, I could predict outcomes based on 6 of months data in each of 2017 and 2019. I will have spent a lot more on Travel/Entertainment in 2019 compared to other years, considering I still have 2 trips to India in the next 6 months. I will have spent less on Services in 2019 compared to 2018 since my insurance premiums dropped while other areas have stayed the same or marginally increased. I will have spent considerably less on Merchandise in 2019 compared to other years, as I did spend a lot of money on furniture and online shopping in 2017 and 2018 when I moved between cities.

Question 3


In Travel/Entertainment, which subcategories did I spend the most in?

Rightly so, flights constituted the highest expense ($5503), with the next highest being ridesharing services such as Uber and Lyft ($1024) and lodging such as Airbnb and hotels ($777).

Question 4


In Restaurants, what was the share of subcategorical spend?

Though my life habits have changed over the last 3-4 years, where I moved from eating out daily to cooking more often, I still have spent a considerable amount on Restaurants. I spent more on gourmet and casual dining places (63%) rather than fast food joints (20%) over the last 2 years, while I have indulged in junk food and alcoholic beverages from time to time. I have cut down on my sweet tooth tendencies considerably (2%).

Question 5


How did my average monthly spend vary over time?

I spent more in November/December on average in both 2017 and 2018. This included traveling to New York to meet my cousins, buying gifts for family, taking a break in San Diego and my Black Friday exploits. In 2019, I spent a lot more in June as I had a trip to Harrisburg and also planned a trip to Portland for July.

Question 6


How did my monthly transaction frequency (and average monthly spend) vary over time?

Over the last 2 years, on average I had around 30-40 transactions. In November and December 2018 and June 2019, per my previous observations, I did spend more (size of bubbles) even though the number of transactions is not that high. In July 2018, I made a few road trips to Chicago, and tried out several restaurants both in Chicago and Saint Louis which increased the number of transactions drastically while keeping the average amount pretty low.

Question 7


How was my spread of spend by location, and which were the top 5 states?

State Count MeanAmount
MO 641 45.35
IL 73 19.37
MI 30 32.74
PA 23 10.54
CA 19 30.81
NY 19 26.04

Most of my expenses were around the Midwest, with occasional spurts in the East and West coasts. Travelling contributed to this spread of spend across the US. Looking at the top 5 states that I spent money in, Missouri, my home state, comes out on top as expected, while Illinois comes next since I went to school there and traveled several times to Chicago during the last 2 years. I stayed with my friends in Michigan while job hunting after school, which explains my spending pattern in that state.

---
title: "RichyVarghese_ANLY512-90-O-2019-LateSpring_TheQuantifiedSelf"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
library(flexdashboard)
library(dygraphs)
library(xts)
library(readxl)
library(ggplot2)
library(plotly)
library(ggmap)
library(foreign)
library(gbm)
library(dplyr)
library(formattable)

ExpenseData <- read_excel("C:/Users/Richy/Downloads/ANLY512_ExpensesData.xlsx")
ExpenseData$TransactionDate <- as.Date(ExpenseData$TransactionDate, format = "%Y-%m-%d")
ExpenseData$TransactionYear <- as.factor(ExpenseData$TransactionYear)
```

###Introduction {data-commentary-width=600}

```{r}
knitr::include_graphics("C:/Users/Richy/Downloads/CreditCards.jpg")
```

***
####**The Quantified Self Project**
The goal of this project is to analyze, visualize and summarize my credit card spending pattern over the last two years.

#####**Data Summary**
Data was collected from my credit card statements from Discover, US Bank and AMEX from June 2017 to June 2019. These 3 data sources were combined together and credit transactions were removed, followed by some data cleanup.

**Attributes**: TransactionDate, TransactionMonth, TransactionYear, LongDescription, Description, City, State, Latitude, Longitude, Amount, Category, SubCategory.

Description, City and State were extracted from LongDescription (credit card transaction name). Corresponding Longitude and Latitude values were collected from https://www.latlong.net/. SubCategory was derived from Category and Description.

#####**Questions**
1. Which categories did I spend the most in?
2. How did my categorical spend compare across years?
3. In Travel/Entertainment, which subcategories did I spend the most in?
4. In Restaurants, what was the share of subcategorical spend?
5. How did my average monthly spend vary over time?
6. How did my monthly transaction frequency (and average monthly spend) vary over time?
7. How was my spread of spend by location, and which were the top 5 states?

#####**Visualization Methods**
ggplot and ggplotly were used to create bar graphs, pie charts, line graphs and bubble charts, while ggmap was used to create spatial visualization.

###Question 1 {data-commentary-width=500}

```{r}
CategoryTotals <- ExpenseData %>% group_by(Category) %>% summarise(Amount = sum(Amount))
CategoryTotals$Category <- as.character(CategoryTotals$Category)
CategoryTotals$Category <- factor(CategoryTotals$Category, levels = c("Automotive", "Medical Services", "Government Services", "Gasoline", "Supermarkets", "Merchandise", "Restaurants", "Services", "Travel/Entertainment"))
Q1 <- ggplot(CategoryTotals, aes(x = Category, y = Amount, fill = Category)) +
  geom_bar(stat = "identity") +
  labs(title = "Spend by Category", x = "Category", y = "Total Amount ($)") +
  coord_flip() +
  theme(legend.title = element_blank(), legend.position = "none")
ggplotly(Q1)
```

***
####**Which categories did I spend the most in?**

I spent the most on Travel/Entertainment ($10340), followed by Services ($6689) and Restaurants ($6232), while I spent the least on Automotive ($272) in the last two years. Travel/Entertainment included data such as hotels, rental cars and flights, Services included monthly subscriptions and insurance, and Restaurants included food and drinks.

###Question 2 {data-commentary-width=500}

```{r}
YearCategoryTotals <- ExpenseData %>% group_by(Category, TransactionYear) %>% summarise(Amount = sum(Amount))
Q2 <- ggplot(YearCategoryTotals, aes(x=Category, y=Amount)) +   
  geom_bar(aes(fill = TransactionYear), position = "dodge", stat="identity") +
  labs(title = "Spend by Category across Years", x = "Category", y = "Total Amount ($)") +
  scale_fill_discrete(name="Year") +
  coord_flip()
ggplotly(Q2)
```

***
####**How did my categorical spend compare across years?**

Since only 2018 had a complete calendar year's data, reading this output was not straightforward. However, I could predict outcomes based on 6 of months data in each of 2017 and 2019. I will have spent a lot more on Travel/Entertainment in 2019 compared to other years, considering I still have 2 trips to India in the next 6 months. I will have spent less on Services in 2019 compared to 2018 since my insurance premiums dropped while other areas have stayed the same or marginally increased. I will have spent considerably less on Merchandise in 2019 compared to other years, as I did spend a lot of money on furniture and online shopping in 2017 and 2018 when I moved between cities.

###Question 3 {data-commentary-width=500}

```{r}
TravEnterData <- subset(ExpenseData, Category == "Travel/Entertainment")
TravEnterSubCatTotals <- TravEnterData %>% group_by(SubCategory) %>% summarise(Count = n(), Amount = sum(Amount))
TravEnterSubCatTotals$SubCategory <- as.character(TravEnterSubCatTotals$SubCategory)
TravEnterSubCatTotals$SubCategory <- factor(TravEnterSubCatTotals$SubCategory, levels = c("Music Subscription", "Museum", "Electric Scooter", "Festival", "Games", "Metro", "Concert", "Train", "Movies", "Theater", "Streaming Subscription", "Rental Car", "Tourism", "Theme Park", "Bus", "Lodging", "Rideshare", "Flight"))
Q3 <- ggplot(TravEnterSubCatTotals, aes(x = SubCategory, y = Amount, fill = SubCategory)) +
  geom_bar(stat = "identity") +
  labs(title = "Travel/Entertainment Spend by SubCategory", x = "SubCategory", y = "Total Amount ($)") +
  coord_flip() +
  theme(legend.title = element_blank(), legend.position = "none")
ggplotly(Q3)
```

***
####**In Travel/Entertainment, which subcategories did I spend the most in?**

Rightly so, flights constituted the highest expense ($5503), with the next highest being ridesharing services such as Uber and Lyft ($1024) and lodging such as Airbnb and hotels ($777).

###Question 4 {data-commentary-width=500}

```{r}
RestaurantData <- subset(ExpenseData, Category == "Restaurants")
RestSubCatTotals <- RestaurantData %>% group_by(SubCategory) %>% summarise(Count = n(), Amount = sum(Amount))
Q4 <- ggplot(RestSubCatTotals, aes(x = "", y = Amount, fill = SubCategory)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0) +
  theme_void() +
  geom_text(label = c("63.3%", "20.3%", "1.9%", "14.5%"), position = position_stack(vjust = 0.5)) +
  labs(title = "Share of Spend on Restaurants' SubCategory")
Q4
```

***
####**In Restaurants, what was the share of subcategorical spend?**

Though my life habits have changed over the last 3-4 years, where I moved from eating out daily to cooking more often, I still have spent a considerable amount on Restaurants. I spent more on gourmet and casual dining places (63%) rather than fast food joints (20%) over the last 2 years, while I have indulged in junk food and alcoholic beverages from time to time. I have cut down on my sweet tooth tendencies considerably (2%).

###Question 5 {data-commentary-width=500}

```{r}
ExpenseData$MonthYear <- format(as.Date(ExpenseData$TransactionDate), "%Y-%m")
MonthYearMeans <- ExpenseData %>% group_by(TransactionYear, MonthYear) %>% summarise(Amount = mean(Amount))
Q5 <- ggplot(MonthYearMeans, aes(x = MonthYear, y = Amount, group = 1)) +
  geom_line(aes(color = TransactionYear), size = 1) +
  geom_point() +
   xlab("Year") +
  ylab("Mean Amount ($)") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), plot.background = element_blank(), legend.title = element_blank()) +
  labs(title = "Average Monthly Spend across Years")
ggplotly(Q5)
```

***
####**How did my average monthly spend vary over time?**

I spent more in November/December on average in both 2017 and 2018. This included traveling to New York to meet my cousins, buying gifts for family, taking a break in San Diego and my Black Friday exploits. In 2019, I spent a lot more in June as I had a trip to Harrisburg and also planned a trip to Portland for July.  

###Question 6 {data-commentary-width=500}

```{r}
MonthYearMeans <- ExpenseData %>% group_by(TransactionYear, MonthYear) %>% summarise(Count = n(), Amount = mean(Amount))
Q6 <- ggplot(MonthYearMeans, aes(x = MonthYear, y = Count, size = Amount)) +
  geom_point(aes(color = TransactionYear)) +
   xlab("Year") +
  ylab("Number of Transactions") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), plot.background = element_blank(), legend.title = element_blank()) +
  labs(title = "Monthly Transaction Frequency across Years (Bubble Size = Mean Amount)")
ggplotly(Q6)
```

***
####**How did my monthly transaction frequency (and average monthly spend) vary over time?**

Over the last 2 years, on average I had around 30-40 transactions. In November and December 2018 and June 2019, per my previous observations, I did spend more (size of bubbles) even though the number of transactions is not that high. In July 2018, I made a few road trips to Chicago, and tried out several restaurants both in Chicago and Saint Louis which increased the number of transactions drastically while keeping the average amount pretty low.

###Question 7 {data-commentary-width=500}

```{r}
qmplot(x = Longitude, y = Latitude, data = ExpenseData, source = "google", maptype = "roadmap", legend = "none", colour = I("red"), mapcolor = "color", extent = "panel", main = "Spend by Location", xlab = "Longitute", ylab = "Latitude", zoom = 4)
```

***
####**How was my spread of spend by location, and which were the top 5 states?**

```{r}
StateTransData <- ExpenseData %>% group_by(State) %>% summarise(Count = n(), MeanAmount = round(mean(Amount),2)) %>% subset(Count > 15) %>% arrange(desc(Count))
formattable(StateTransData, align = c("c", "c", "c"))
```

Most of my expenses were around the Midwest, with occasional spurts in the East and West coasts. Travelling contributed to this spread of spend across the US. Looking at the top 5 states that I spent money in, Missouri, my home state, comes out on top as expected, while Illinois comes next since I went to school there and traveled several times to Chicago during the last 2 years. I stayed with my friends in Michigan while job hunting after school, which explains my spending pattern in that state.