---
title: "ANLY512 - Quantified Self - My Spend with AMEX in Six Months - Wen He"
output:
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
---
```{r setup, include=FALSE,message=FALSE,warning=FALSE}
library(flexdashboard)
library(xlsx)
library(ggplot2)
library(ggmap)
library(RgoogleMaps)
library(tidyverse)
library(lubridate)
library(stringr)
library(plotly)
library(leaflet)
```
```{r, message=FALSE,warning=FALSE}
credit <- read.xlsx("Credit.xlsx", 1)
credit <- credit[order(credit$Date),]
credit$Year <- year(credit$Date)
credit$Month <- month(credit$Date)
credit$Week <- week(credit$Date)
credit$Weekday <- wday(credit$Date)
credit$State <- str_sub(credit$Location,start = -2)
# Summary statistics at each grouping level
cd_date <- credit %>%
  group_by(Date, Category) %>%
  summarise(tot_spend = sum(Amount), mean_spend = mean(Amount), median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount), trans_cnt = n())
cd_month <- credit %>%
  group_by(Month, Category) %>%
  summarise(tot_spend = sum(Amount), mean_spend = mean(Amount), median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount), trans_cnt = n())
cd_week <- credit %>%
  group_by(Week) %>%
  summarise(tot_spend = sum(Amount), mean_spend = mean(Amount), median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount), trans_cnt = n())
cd_weekday <- credit %>%
  group_by(Weekday) %>%
  summarise(tot_spend = sum(Amount), mean_spend = mean(Amount), median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount), trans_cnt = n())
cd_Location <- credit %>%
  group_by(Location) %>%
  summarise(tot_spend = sum(Amount), mean_spend = mean(Amount), median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount), trans_cnt = n())
cd_State <- credit %>%
  group_by(State) %>%
  summarise(tot_spend = sum(Amount), mean_spend = mean(Amount), median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount), trans_cnt = n())
# Geocode each location and state to get lon/lat columns for the maps.
# Note: recent ggmap versions need a Google API key registered via register_google() first.
cd_Location <- data.frame(cd_Location, geocode(as.character(cd_Location$Location)))
cd_State <- data.frame(cd_State, geocode(as.character(cd_State$State)))
theme_set(theme_bw())
```
### Spend by Date*Category
```{r}
# Mean spend by day and category
p <- ggplot(cd_date, aes(x = as.character(Date), y = mean_spend)) +
  geom_bar(aes(fill = Category, color = Category), stat = "identity", alpha = 0.5) +
  labs(title = "Six Months Spend Overview by Day & Category",
       caption = "Source: AMEX personal Account Summary") +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
ggplotly(p)
```
***
The first story is the overall spend trend by date and category. The visualization shows:
- Spend fluctuates widely from day to day
- The biggest spends came from either Merchandise & Supplies or Transportation
- Returns occur only in the Merchandise & Supplies category and appear as negative spend
- In detail, the biggest spend on 6/2 was for replacing my car brakes, and the biggest transportation spend on 7/3 was my purchase of flight tickets, hotel, and rental car for a class trip to HU (Big Investment...each month)
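
The points about returns and the largest single-day spends can be checked directly against the transaction table. A minimal sketch, assuming a data frame shaped like `credit` with `Date`, `Category`, and `Amount` columns; the `toy` rows are invented for illustration and are not my actual transactions:

```r
# Hypothetical toy data standing in for `credit`; all values are made up.
toy <- data.frame(
  Date     = as.Date(c("2020-01-02", "2020-01-05", "2020-02-03", "2020-02-10")),
  Category = c("Merchandise & Supplies", "Merchandise & Supplies",
               "Transportation", "Restaurant"),
  Amount   = c(450, -30, 900, 25),
  stringsAsFactors = FALSE
)

# Returns are recorded as negative amounts.
returns <- toy[toy$Amount < 0, ]

# Total spend per day and category, sorted from largest to smallest.
daily <- aggregate(Amount ~ Date + Category, data = toy, FUN = sum)
daily <- daily[order(-daily$Amount), ]
head(daily, 2)
```

Sorting the daily totals this way makes it easy to look up which transactions sit behind the tallest bars.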
### Spend by Month*Category
```{r}
k <- ggplot(cd_month, aes(x = as.character(Month), y = mean_spend)) +
  geom_bar(aes(fill = Category), stat = "identity", alpha = 0.5) +
  labs(title = "Mean Spend by Month",
       caption = "Source: AMEX personal Account Summary") +
  coord_flip()
ggplotly(k)
## distribution of Monthly spend
```
***
Aggregating further to mean spend by month and category gives the following insights:
- July has the highest mean spend
- Travel/Transportation appears in every month because of the HU trips and gas for my daily car commute
- Restaurant spend also appears every month, but as a small share
### Monthly Spend Distribution*Category
```{r}
s <- ggplot(credit, aes(x = as.character(Month), y = Amount)) +
  geom_boxplot(aes(color = Category, fill = Category), position = "dodge") +
  labs(title = "Distribution of Spend by Month",
       caption = "Source: AMEX personal Account Summary")
s
```
***
Mean spend doesn't always tell the story of individual expenditures, so from the detailed spend distribution we see that:
- The biggest outlier is a Merchandise & Supplies spend in June
- The distribution of Travel spend didn't fluctuate much from month to month
- Business Service spend in June has a wide range
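
For reference, the points a boxplot draws as outliers follow a fixed rule: by default `geom_boxplot()` flags values lying more than 1.5 times the interquartile range beyond the first or third quartile. A rough sketch of that rule in base R, using made-up amounts rather than my real transactions:

```r
# Hypothetical amounts for one month/category; values are invented.
amounts <- c(12, 18, 20, 25, 30, 35, 40, 400)

# First and third quartiles (the box edges) and the interquartile range.
q   <- unname(quantile(amounts, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences at 1.5 * IQR beyond the quartiles; anything outside is an outlier.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
outliers <- amounts[amounts < lower | amounts > upper]
outliers
```

A single large purchase is enough to show up as an outlier while leaving the box itself almost unchanged, which is why the boxplot view complements the mean.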
### Spend by Weekday
```{r}
y <- ggplot(credit, aes(x = as.character(Weekday), y = Amount)) +
  geom_boxplot() +
  labs(title = "Spend distribution by Weekday",
       caption = "Source: AMEX personal Account Summary")
ggplotly(y)
```
***
I am also interested in how spend is distributed across weekdays, since our mood for spending money may differ over the seven days of the week:
- The distribution is actually very similar across all seven days
- What is still interesting is that weekends show more extreme outliers, which suggests that impulsive purchases may be more likely to happen on weekends
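
One detail to keep in mind when reading the x-axis: with lubridate's defaults, `wday()` numbers the days 1 to 7 starting from Sunday, so the weekend corresponds to 1 and 7. A small sketch, assuming the default `week_start` setting has not been changed; the dates are arbitrary examples:

```r
library(lubridate)

# Arbitrary example dates: a Sunday, a Monday, and a Saturday.
d <- as.Date(c("2023-01-01", "2023-01-02", "2023-01-07"))

# Numeric day of week; Sunday is 1 under the default week start.
day_num <- wday(d)

# Flag weekend days (Sunday = 1, Saturday = 7).
is_weekend <- day_num %in% c(1, 7)

# Named days are easier to read on a plot axis than 1 to 7.
wday(d, label = TRUE)
```

Using `wday(Date, label = TRUE)` for the `Weekday` column would be one way to get day names on the axis instead of numbers.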
### Geo Distribution of Mean Spending by State
```{r}
US <- geocode("United States")
# Note: "toner-lite" is a Stamen map style, not a Google maptype;
# whether this call resolves depends on the installed ggmap version.
US_osm_map <- qmap("United States", zoom = 4, source = "google", maptype = "toner-lite")
US_osm_map +
  geom_point(data = cd_State, aes(x = lon, y = lat, size = mean_spend, alpha = mean_spend), color = "red") +
  labs(title = "Spend distribution by State",
       caption = "Source: AMEX personal Account Summary")
```
***
Besides the analysis based on temporal grouping, we can also dig in through the spatial view, first by state:
- It draws a picture of my spending journey in the four months across several states
- Florida has the highest mean spend, which is due to the trip there
- PA is always for school, so it shows up but the spend is small; I'm rarely in the mood for extra purchases on my way to school
### Geo Distribution of Mean Spending Within Illinois
```{r}
leaflet(cd_Location) %>%
  addTiles() %>%
  addCircles(lng = ~lon, lat = ~lat, weight = 1,
             radius = ~sqrt(mean_spend) * 300,
             # popup shows the location and its rounded mean spend on separate lines
             popup = ~paste(Location, paste0("$", round(mean_spend, 0)), sep = "<br>"))
```
***
Also, since Illinois is where I live and we have detailed spatial data about cities, we take a closer look at my spending within Illinois (please zoom in):
- The biggest dot in the northwest suburbs is where my home and company are located, so the biggest spend happened there
- Aurora is where I shopped a lot, because that's where the best outlets are located
- Ah, although downtown Chicago is splendid, I still don't have much time to travel into town and make many purchases there
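
The map plots every geocoded location, which is why zooming in is needed. If only Illinois is wanted, one option is to filter on the two-letter state suffix before mapping. A sketch under the assumption that `Location` strings end with the state abbreviation, as the `State` column derivation relies on; the `locs` rows are invented placeholders:

```r
library(stringr)

# Hypothetical stand-in for cd_Location; names and values are made up.
locs <- data.frame(
  Location   = c("AURORA IL", "CHICAGO IL", "ORLANDO FL"),
  mean_spend = c(80, 40, 120),
  stringsAsFactors = FALSE
)

# Keep only rows whose last two characters are the Illinois abbreviation.
il_only <- locs[str_sub(locs$Location, start = -2) == "IL", ]
il_only
```

Passing a filtered table like `il_only` to `leaflet()` would make the map open on Illinois without manual zooming.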