
Table: 2019 Year-end Summary

| Date | Description | Address | Amount | Category |
|---|---|---|---|---|
| 2019-12-30 | GRAND SICHUAN | 342 GROVE ST,JERSEY CITY,NJ,07302-2917 | 23.43 | Restaurant-Restaurant |
| 2019-12-29 | BJ’S WHOLESALE CLUB | 396-420 MARIN BLVD,JERSEY CITY,NJ,07302 | 23.97 | Merchandise & Supplies-Wholesale Stores |
| 2019-12-28 | 99 Ranch Market | 420 GRAND ST,JERSEY CITY,NJ,07302 | 3.70 | Merchandise & Supplies-Groceries |
| 2019-12-28 | 99 Ranch Market | 420 GRAND ST,JERSEY CITY,NJ,07302 | 100.83 | Merchandise & Supplies-Groceries |
| 2019-12-27 | GNC.COM 877-GNC-4700 PA | 935 1ST AVE,KING OF PRUSSIA,PA,19406-1342 | 44.99 | Merchandise & Supplies-Internet Purchase |
| 2019-12-27 | TASTE OF NORTH CHINA | 75 MONTGOMERY STREET,JERSEY CITY,NJ,07302 | 45.95 | Restaurant-Restaurant |
| 2019-12-26 | SPROVE MARKET PLACE | 70 CHRISTOPHER COLUMBUS DR,JERSEY CITY,NJ,07302 | 24.74 | Merchandise & Supplies-Groceries |
| 2019-12-26 | SZECHUAN MOUNTAIN HOUSE | 23 SAINT MARKS PL,MANHATTAN,NY,10003 | 91.17 | Restaurant-Restaurant |
| 2019-12-24 | BJ’S WHOLESALE CLUB | 396-420 MARIN BLVD,JERSEY CITY,NJ,07302 | 16.98 | Merchandise & Supplies-Wholesale Stores |
| 2019-12-24 | GRAND SICHUAN | 342 GROVE ST,JERSEY CITY,NJ,07302-2917 | 40.50 | Restaurant-Restaurant |

Table: 2018 Year-end Summary

| Date | Description | Amount | Address | Category |
|---|---|---|---|---|
| 2018-12-31 | XIAN FAMOUS FOODS | 12.70 | 24 W 45TH ST FRNT 1,NEW YORK,NY,10036,UNITED STATES | Restaurant-Restaurant |
| 2018-12-28 | COUSINS MAINE LOBSTER- CO | 33.44 | 939 HIGH RIDGE RD,STAMFORD,CT,06905-1609,UNITED STATES | Restaurant-Bar & Café |
| 2018-12-27 | 1-800-FLOWERS.COM (800)468-1141 NY | 63.96 | 1 OLD COUNTRY RD,-,CARLE PLACE,NY,11514,UNITED STATES | Merchandise & Supplies-Mail Order |
| 2018-12-27 | DUNKIN DONUTS | 4.78 | 501 5TH AVE,NEW YORK,NY,10018,UNITED STATES OF AMERICA (THE) | Restaurant-Bar & Café |
| 2018-12-26 | McDonald’s | 5.65 | 18 E 42ND ST,NEW YORK,NY,10017,UNITED STATES | Restaurant-Bar & Café |
| 2018-12-26 | Pret A Manger | 5.43 | 62 W 45TH ST FRNT 1,NEW YORK,NY,10036,UNITED STATES | Restaurant-Bar & Café |
| 2018-12-26 | XIAN FAMOUS FOODS | 13.75 | 24 W 45TH ST FRNT 1,NEW YORK,NY,10036,UNITED STATES | Restaurant-Restaurant |
| 2018-12-25 | TASTE OF NORTH CHINA | 37.30 | 75 MONTGOMERY STREET,JERSEY CITY,NJ,07302,UNITED STATES | Restaurant-Restaurant |
| 2018-12-24 | WALGREENS | 10.50 | 1ST NE & MASSACHUSETTS,WASHINGTON,DC,20002,UNITED STATES | Merchandise & Supplies-Pharmacies |
| 2018-12-22 | Le Pain Quotidien | 22.24 | 50 MASSACHUSETTS AVE,-,WASHINGTON,DC,20002,UNITED STATES | Restaurant-Restaurant |
---
title: 'The Quantified Self'
author: 'Yu Bai'
date: 'Apr 21, 2020'
output:
  flexdashboard::flex_dashboard:
    storyboard: true # storyboard layout: each level-3 section becomes its own frame
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(lubridate) #time convert
library(ggplot2)
library(zoo) #time series
library(plotly) #interactive plot
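library(ggmap) #plot on google map; qmap() and trek() both come from this package
#ggmap needs a Google Maps API key, registered once with register_google(key = "<your key>")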
library(data.table) #contain first/last function
library(knitr) #insert picture
library(dplyr)
library(shiny)
library(DT)
library(dygraphs)
library(wordcloud2)
library(webshot)
library(htmlwidgets)
setwd('C:/Zillionaire/Study_Material/HU/HU Courses/ANLY 512')
```
### Introduction of Project

***
We have entered the era of big data. The apps on your smartphone, the wearable on your wrist, the videos YouTube recommends, and the people Twitter and Instagram suggest you follow all rely on algorithms driven by user data. Based on your location, Google Maps can show you nearby recreational activities and Yelp can recommend restaurants you may like. When you log in to Twitter or Facebook, they show you trends picked from your browsing history. In the extreme case of a data breach at your bank, your credit card transaction history may be exactly what criminals use to target you with a specific kind of crime.
Like it or not, we can no longer live without large amounts of data. Everyone and everything is being quantified. Analyzing that data quantitatively and visualizing the result is therefore one of the best ways to understand your own lifestyle. I chose the year-end summary of my American Express Gold Card and looked for patterns in my daily activities.
### Data Collection and Preparation
```{r}
year_2019 <- read.csv('2019.csv')
year_2019$Date <- as.Date(year_2019$Date, format = '%m/%d/%y') #change date column data type to 'Date'
dec_19 <- year_2019[year_2019$Date >= '2019-12-01' & year_2019$Date <= '2019-12-31',] #select data of Dec 2019
nov_19 <- year_2019[year_2019$Date >= '2019-11-01' & year_2019$Date <= '2019-11-30',] #select data of Nov 2019
#date_index <- seq(from=as.Date('2019-01-01'), by='month', length.out=12) create a vector of every 1st day in each month
sum_by_month <- aggregate(year_2019$Amount, by = list(month(year_2019$Date)), FUN = sum) #aggregate spending by month
month_index <- as.data.frame(yearmon(2019 + 0:11/12)) #create the 12 months of 2019 as yearmon values
spending_by_month <- cbind(month_index, sum_by_month)[, -2] #keep one column for the month and one for the spending
colnames(spending_by_month)[1:2] <- c('Month', 'Spending')
year_2018 <- read.csv('2018.csv')
year_2018$Date <- as.Date(year_2018$Date, format = '%m/%d/%y')
sum_by_month_18 <- aggregate(year_2018$Amount, by = list(month(year_2018$Date)), FUN = sum)
sum_by_month_18 <- rbind(0, 0, sum_by_month_18) #prepend two zero rows so 2018 also has 12 rows; this treats Jan and Feb 2018 as having no spending
month_index_18 <- as.data.frame(yearmon(2018 + 0:11/12)) #create the 12 months of 2018 as yearmon values
spending_by_month_18 <- cbind(month_index_18, sum_by_month_18)[, -2] #keep one column for the month and one for the spending
colnames(spending_by_month_18)[1:2] <- c('Month', 'Spending')
month_comp <- 1:12
year_comp <- data.frame(cbind(month_comp, spending_by_month, spending_by_month_18)[, -c(2, 4)]) #drop both yearmon columns, keep month number and the two spending columns
colnames(year_comp)[1:3] <- c('Month', '2019', '2018')
kable(head(year_2019, 10), align = 'c', format = 'html', caption = '2019 Year-end Summary') #display data table in dashboard, align text in center
kable(head(year_2018, 10), align = 'c', format = 'html', caption = '2018 Year-end Summary')
```
***
Part of the data is shown alongside. The picture above shows the card I have used as my primary card for two years. I downloaded the statement from the official American Express website as an Excel file, cleaned and sorted it, saved it as a .csv file, and imported it into R. I then analyzed the spending in 2019 and compared the monthly spending with 2018, whose data I acquired and stored in the same way.
Five questions arose from the data:
Q1: What is the spending structure by month? How does it compare to the average spending? What is the logic behind it?
Q2: What are the most visited locations? Does each of them agree with my memory?
Q3: How does the spending in 2019 compare to 2018? What is the difference, and is there any correlation?
Q4: What is the spending structure within specific sub-categories? Is there any logic behind it?
Q5: What patterns appear when the merchant names are visualized as a word cloud?
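
The monthly totals behind these questions assume that every month has at least one transaction. As a minimal sketch of a more defensive approach, the toy example below joins the totals onto a full list of 12 months, so that missing months show up as explicit zeros. The column names `Date` and `Amount` follow the statement; the values and the `toy`, `monthly` and `full` names are made up for illustration.

```r
# Toy transactions (hypothetical values, not taken from the real statement)
toy <- data.frame(
  Date   = as.Date(c("2018-03-05", "2018-03-20", "2018-05-11", "2018-12-24")),
  Amount = c(10.50, 4.50, 20.00, 37.30)
)

# Total spending for each calendar month that appears in the data
toy$Month <- as.integer(format(toy$Date, "%m"))
monthly   <- aggregate(Amount ~ Month, data = toy, FUN = sum)

# Left-join onto a full 1..12 index so that absent months become rows too
full <- merge(data.frame(Month = 1:12), monthly, by = "Month", all.x = TRUE)
full$Amount[is.na(full$Amount)] <- 0   # months without transactions count as 0

full
```

With this layout the zero rows land on whichever months are actually missing, rather than on a fixed position in the table.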
### Q1: Spending By Month
```{r}
ggplot(spending_by_month, aes(x = factor(Month), y = Spending)) + #use factor so every month is shown on the x axis
  geom_bar(stat = 'identity', fill = 'lightblue') + #bar chart
  ggtitle('Spending By Month') +
  geom_text(aes(label = Spending), vjust = -0.3, color = 'black', size = 2.8) + #add the value on top of each bar
  theme(axis.text.x = element_text(angle = 45, vjust = 0.5), panel.background = element_rect(fill = 'grey95', color = 'grey95', size = 0.5, linetype = 'solid')) + #change label angle and background color
  xlab('Month') + ylab('Spending ($)') +
  geom_line(aes(group = 1)) + #group = 1 connects the months; with a discrete x axis each month is otherwise its own group and no line is drawn
  geom_hline(aes(yintercept = mean(Spending), linetype = 'Average Spending'), color = 'red', data = spending_by_month, show.legend = TRUE) + #add a line at the average spending
  scale_linetype_manual(name = 'Legend', values = 2) #add a legend for the line and set its line type
```
***
The average spending per month was around $1600. After plotting the spending by month, I was surprised by how much money I spent in July, August and September; I had expected the highest spending to fall in the holiday season. After carefully checking the statement, I found the explanation. I resigned from my previous company at the end of June. In July, I sprained my ankle while playing soccer, so I paid a fortune for an urgent care visit, crutches and an X-ray, and did more online shopping and food delivery. In August, I realized that I needed insurance and bought it myself, which caused the surge in spending that month. Also in August, my wife passed the CFA Level III exam, so we booked a boutique steakhouse to celebrate. In September, I enrolled in graduate school and paid for tuition, round-trip train tickets and a hotel reservation. In the holiday season that followed, I had to cut back and bought only necessities.
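
The comparison against the average can also be done numerically. The sketch below uses made-up monthly totals (the `spending` vector is hypothetical, not my real figures) to show how each month's deviation from the mean can be computed and how the above-average months can be picked out.

```r
# Hypothetical monthly totals in dollars (illustrative only)
spending <- c(Jan = 1200, Feb = 1100, Mar = 1300, Jul = 2400, Aug = 2600)

avg       <- mean(spending)                  # average monthly spending
deviation <- spending - avg                  # positive means above average
above     <- names(spending)[spending > avg] # months that exceed the average

deviation
above
```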
### Q2: GGMAP of Frequently Visited Locations
```{r}
usual_loc <- as.data.frame(sort(table(year_2019$Address), decreasing = T)[1:10])
#find out 10 most visited locations and create a dataframe
home <- '88 Morgan St, Jersey City, NJ 07302' #set up home address
route_1 <- trek(home, as.character(usual_loc$Var1[1]), structure = 'route')
#trek() looks up a route between two addresses on Google Maps and returns its coordinates as numeric lon/lat values; qmap() then draws the base map
qmap(home, zoom = 15) +
geom_path(
aes(x = lon, y = lat), color = 'blue',
size = 1.5, alpha = .5,
data = route_1, lineend = 'round'
) + ggtitle('ShopRite - Grocery') + geom_point( aes(x = lon, y = lat), data = tail(route_1,1), color = 'red', size = 3) + geom_point( aes(x = lon, y = lat), data = head(route_1,1), color = 'green', size = 3)
#geom_path draws the route as a line, geom_point adds a point on the map (you can add multiple points)
route_2 <- trek(home, as.character(usual_loc$Var1[2]), structure = 'route')
qmap(home, zoom = 15) +
geom_path(
aes(x = lon, y = lat), color = 'blue',
size = 1.5, alpha = .5,
data = route_2, lineend = 'round'
) + ggtitle('99 Ranch Market - Chinese Supermarket') + geom_point( aes(x = lon, y = lat), data = route_2[22,], color = 'red', size = 3) + geom_point( aes(x = lon, y = lat), data = head(route_2,1), color = 'green', size = 3)
route_3 <- trek(home, as.character(usual_loc$Var1[3]), structure = 'route')
qmap(home, zoom = 15) +
geom_path(
aes(x = lon, y = lat), color = 'blue',
size = 1.5, alpha = .5,
data = route_3, lineend = 'round'
) + ggtitle("BJ's - Wholesale Club") + geom_point( aes(x = lon, y = lat), data = tail(route_3,1), color = 'red', size = 3) + geom_point( aes(x = lon, y = lat), data = head(route_3,1), color = 'green', size = 3)
route_4 <- trek(home, as.character(usual_loc$Var1[4]), structure = 'route')
qmap(home, zoom = 15) +
geom_path(
aes(x = lon, y = lat), color = 'blue',
size = 1.5, alpha = .5,
data = route_4, lineend = 'round'
) + ggtitle('SPROVE MARKET PLACE - Grocery near Subway') + geom_point( aes(x = lon, y = lat), data = tail(route_4,1), color = 'red', size = 3) + geom_point( aes(x = lon, y = lat), data = head(route_4,1), color = 'green', size = 3)
route_5 <- trek(home, as.character(usual_loc$Var1[5]), structure = 'route')
qmap(home, zoom = 15) +
geom_path(
aes(x = lon, y = lat), color = 'blue',
size = 1.5, alpha = .5,
data = route_5, lineend = 'round'
) + ggtitle('TASTE OF NORTH CHINA - Chinese Restaurant') + geom_point( aes(x = lon, y = lat), data = tail(route_5,1), color = 'red', size = 3) + geom_point( aes(x = lon, y = lat), data = head(route_5,1), color = 'green', size = 3)
route_6 <- trek(home, as.character(usual_loc$Var1[7]), structure = 'route')
qmap(home, zoom = 12) +
geom_path(
aes(x = lon, y = lat), color = 'blue',
size = 1.5, alpha = .5,
data = route_6, lineend = 'round'
) + ggtitle('Dunkin Donuts - Coffee Shop Near Company') + geom_point( aes(x = lon, y = lat), data = tail(route_6,1), color = 'red', size = 3) + geom_point( aes(x = lon, y = lat), data = head(route_6,1), color = 'green', size = 3)
```
***
After plotting the routes on Google Maps, I found that the top 6 visited locations are mostly supermarkets and Chinese restaurants, which matches my lifestyle. I love shopping and cooking, especially Chinese food. Every weekend I shop at the nearby ShopRite and BJ's. Sometimes I do some additional shopping at Sprove Market Place, near the subway station, on my way home from work. Biweekly, I visit 99 Ranch Market, a local Chinese supermarket, for Chinese vegetables and seasonings, as well as bubble tea! For dishes that I cannot cook at home because I lack the cookware or the skills, I buy food from a local Chinese restaurant called Taste of North China. I am also a coffee lover and tend to get cheap, fast coffee at the coffee shop down the block. However, I remembered ordering from Starbucks and McDonald's most often. After checking my phone, I realized that I usually order from those two through their apps, while the Dunkin Donuts app does not work on my phone, so the statement shows that I visited DD/BR most of the time for coffee.
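
The ranking of locations comes from counting how often each address appears on the statement. Here is a minimal sketch of that counting step on toy data; the shortened addresses and the `toy`, `visits` and `top` names are assumptions for illustration only.

```r
# Toy address column (shortened, illustrative addresses)
toy <- data.frame(
  Address = c("420 GRAND ST", "342 GROVE ST", "420 GRAND ST",
              "75 MONTGOMERY STREET", "420 GRAND ST", "342 GROVE ST"),
  stringsAsFactors = FALSE
)

# Count the visits per address and sort from most to least visited
visits <- sort(table(toy$Address), decreasing = TRUE)

# Turn the counts into a data frame (columns Var1 and Freq) and keep the top 2
top <- head(as.data.frame(visits, stringsAsFactors = FALSE), 2)

top
```

Counting by address measures how often I went somewhere, not how much I spent there, which is why the word cloud in Q5 ranks merchants by amount instead.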
### Q3: Spending Comparison Between 2018 & 2019
```{r}
dygraph(year_comp, main = 'Year 2019 & 2018 Comparison') %>%
dyAxis('y', label = 'Spending Amount($)') %>%
dyAxis('x', label = 'Month', valueRange = c(1:12), drawGrid = F) %>%
dyOptions(drawPoints = TRUE, pointSize = 5, pointShape = "triangle", fillGraph = F, drawXAxis = T,
drawYAxis = T, includeZero = T, drawAxesAtZero = F, disableZoom = T, colors = c('steelblue', 'green'))
#draw line graph to compare 2018 & 2019
```
***
I've plotted a comparison of the spending in 2018 and 2019. As you can see, I used this card more in 2019 than in 2018 in every month. After looking up the card information, I found that in Oct 2018 Amex announced the 'new' Gold Card with greatly improved benefits. The Amex Gold Card now earns 4X points at restaurants worldwide, 4X points at U.S. supermarkets and 3X points on flights booked directly with airlines, and it comes with a USD 120 dining credit at participating restaurants. Since then I have used it as my primary credit card, which explains the jump in spending after Sep 2018: I need to buy groceries and food anyway, so why not use a card with more rewards? Besides this trend, each August I spend around USD 1000 more than in other months. I see three reasons: first, the card's annual membership fee of USD 295 is charged in August; second, merchants always offer more discounts in summer, so I tend to buy more clothes online when there is a good deal; third, I have more outdoor recreational activities in the summer months, which adds to the spending in August.
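
Q3 also asks about correlation, which the chart alone does not quantify. As a rough sketch, the month-by-month difference and the Pearson correlation between the two years could be computed as below. The numbers are invented, and `comp`, `y2018` and `y2019` are hypothetical names rather than the objects used in this dashboard.

```r
# Hypothetical monthly totals for two years (illustrative only)
comp <- data.frame(
  Month = 1:6,
  y2018 = c(0, 0, 300, 400, 500, 600),
  y2019 = c(900, 1000, 1100, 1300, 1500, 1700)
)

comp$Diff  <- comp$y2019 - comp$y2018   # extra spending in 2019, per month
all_higher <- all(comp$Diff > 0)        # TRUE if 2019 exceeds 2018 in every month
r          <- cor(comp$y2018, comp$y2019) # Pearson correlation of the two series

comp
all_higher
r
```

With only 12 monthly points per year, a correlation like this is a rough indication of shared seasonality at best, not strong evidence.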
### Q4 Figure1: Spending By Sub-category - Merchandise
```{r}
#draw donut chart for merchandise & business
biz_cat <- year_2019[agrep('Business', year_2019$Category),]
mer_cat <- year_2019[agrep('Merchandise', year_2019$Category),]
#agrep function does fuzzy matching, I use it to categorize different spending types and put them into different dataframes
mer_cat_sum <- aggregate(mer_cat$Amount, by = list(factor(mer_cat$Category)), FUN = sum)
colnames(mer_cat_sum)[1:2] <-c('Cat', 'Amount')
plot_ly(mer_cat_sum, type='pie', values = ~Amount, labels = ~Cat,
hole = 0.6, textposition = 'inside', textinfo = 'percent',
marker = list(line = list(color = 'ivory1', width = 1))) %>%
layout(title = 'Merchandise Sub-categories Spending Structure', showlegend = T,
xaxis = list(showgrid = F, zeroline = FALSE, showticklabels = F),
yaxis = list(showgrid = F, zeroline = FALSE, showticklabels = F))
#plot_ly draws an interactive chart; type = 'pie' draws a pie chart and hole = 0.6 turns it into a donut chart
#the text is placed inside each slice and shows only the percentage
```
***
The figure shows the sub-categories of merchandise spending. I spent more than half of the money on groceries, which earn me the best rewards. Clothing stores and internet purchases ranked second and third: I bought more than 10 pairs of shoes from Adidas last year for my wife and me because of a huge discount. And since we moved into our new house in Jan 2018, we spent quite a lot on furniture from IKEA.com, which is why internet purchases also take up more than 1/10.
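
The percentages in the donut chart are each sub-category's share of the group total. The sketch below reproduces that arithmetic on toy rows, assuming the "Group-Subcategory" naming seen in the statement. It selects rows with `startsWith()`, an exact prefix match, as a stricter alternative to the approximate matching that `agrep()` performs; the amounts are made up.

```r
# Toy rows using the "Group-Subcategory" naming from the statement
toy <- data.frame(
  Category = c("Merchandise & Supplies-Groceries",
               "Merchandise & Supplies-Groceries",
               "Merchandise & Supplies-Internet Purchase",
               "Restaurant-Restaurant"),
  Amount   = c(60, 20, 20, 50),
  stringsAsFactors = FALSE
)

# Keep only the rows whose category starts with the group name
mer <- toy[startsWith(toy$Category, "Merchandise"), ]

# Sum per sub-category, then convert the sums to percentage shares
totals <- tapply(mer$Amount, mer$Category, sum)
shares <- round(100 * prop.table(totals), 1)

shares
```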
### Q4 Figure2: Spending By Sub-category - Business
```{r}
biz_cat_sum <- aggregate(biz_cat$Amount, by = list(factor(biz_cat$Category)), FUN = sum)
colnames(biz_cat_sum)[1:2] <-c('Cat', 'Amount')
plot_ly(biz_cat_sum, type='pie', values = ~Amount, labels = ~Cat,
hole = 0.6, textposition = 'inside', textinfo = 'percent',
marker = list(line = list(color = 'ivory1', width = 1))) %>%
layout(title = 'Business Sub-categories Spending Structure', showlegend = T,
xaxis = list(showgrid = F, zeroline = FALSE, showticklabels = F),
yaxis = list(showgrid = F, zeroline = FALSE, showticklabels = F))
```
***
The figure shows the sub-categories of business spending. Nearly 3/4 of it went to health care services. In 2019 I suffered from lower-back pain, tendinitis and shoulder pain for a long time. I went to physical therapy 3 times a week for 3 months and to a chiropractor twice a week for 2 months. I always order food or coffee online and pick it up in store, so the internet services consist mostly of food orders, which take up more than 13% of the spending. I also passed my FRM exam last year. To prepare, I printed quite a few documents at FedEx, and that is where the nearly 6% of printing & publishing spending came from.
### Q5: Word Cloud for Top Merchants
```{r}
sum_by_merchant <- aggregate(year_2019$Amount, by = list(year_2019$Description), FUN = sum)
mer_freq <- as.data.frame(table(year_2019$Description))
mer_com <- cbind(sum_by_merchant, mer_freq[,2])
colnames(mer_com)[1:3] <- c('Merchant', 'Amount', 'Frequency')
top_mer <- as.data.frame(mer_com[order(mer_com$Amount, decreasing = T),])
#aggregate by merchant name to get the total amount spent and the number of visits for each merchant; cbind assumes that aggregate() and table() list the merchants in the same order
my_graph <- wordcloud2(top_mer[1:80,1:3], size = .7, gridSize = 20)
saveWidget(my_graph, "tmp.html", selfcontained = F)
#webshot("tmp.html", "wc1.png", delay = 5, vwidth = 1000, vheight = 1000)
#calling wordcloud2 or letterCloud directly should just work, but the storyboard then shows no image because of an unresolved package issue; the workaround is to save the widget as HTML with saveWidget and capture it as a picture with webshot
#these two functions did not work reliably either, so in the end I uploaded a picture that I created in R and saved to my drive
```

***
I've plotted the top merchants of 2019 by amount spent. This supplements Q2, as it lists the top 80 merchants. Among them you can see the grocery stores, the wholesale store and the Chinese restaurants. I am also a fan of Adidas, Macy's and The Cheesecake Factory. I spent quite a fortune on Amtrak tickets, all of them round trips between New York and Harrisburg. 'Proactive health and rehabilitation' is the practice I visited for my lower-back pain treatment. Having finished this analysis, I found that it is actually quite interesting to quantify your daily life and visualize it. It really helps you know yourself better.
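
The word cloud is built from two numbers per merchant: the total amount and the number of visits. As a sketch of one way to combine them without relying on row order, the toy example below computes both and joins them by merchant name. The amounts and the `toy`, `amount`, `visits` and `by_merchant` names are illustrative assumptions.

```r
# Toy transactions (illustrative amounts)
toy <- data.frame(
  Description = c("GRAND SICHUAN", "99 Ranch Market", "GRAND SICHUAN", "DUNKIN DONUTS"),
  Amount      = c(23.43, 100.83, 40.50, 4.78),
  stringsAsFactors = FALSE
)

# Total amount and number of visits per merchant
amount <- aggregate(Amount ~ Description, data = toy, FUN = sum)
visits <- aggregate(Amount ~ Description, data = toy, FUN = length)
names(visits)[2] <- "Frequency"

# Join on the merchant name, then rank by the amount spent
by_merchant <- merge(amount, visits, by = "Description")
by_merchant <- by_merchant[order(by_merchant$Amount, decreasing = TRUE), ]

by_merchant
```

Joining with `merge()` matches rows by name, so the result stays correct even if the two summaries happen to be sorted differently.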