---
title: "ANLY512 - Quantified Self - My Spend with AMEX in Six Months - Wen He"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
  
---

```{r setup, include=FALSE,message=FALSE,warning=FALSE}
library(flexdashboard)
library(xlsx)
library(ggplot2)
library(ggmap)
library(RgoogleMaps)
library(tidyverse)
library(lubridate)
library(stringr)
library(plotly)
library(leaflet)

```


```{r, message=FALSE,warning=FALSE}



credit <- read.xlsx("Credit.xlsx", 1)
credit <- credit[order(credit$Date),]

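# Derive calendar fields from the transaction date
credit$Year <- year(credit$Date)
credit$Month <- month(credit$Date)
credit$Week <- week(credit$Date)
credit$Weekday <- wday(credit$Date)  # numeric; 1 = Sunday ... 7 = Saturday by lubridate's default
# Assumes every Location string ends with a two-letter state code
credit$State <- str_sub(credit$Location, start = -2)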



# Summary statistics (total, mean, median, max, min, transaction count) at each grouping level
cd_date <- credit %>% 
  group_by(Date,Category) %>% 
  summarise(tot_spend =sum(Amount), mean_spend = mean(Amount),median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount),trans_cnt = n())


cd_month <- credit %>% 
  group_by(Month, Category) %>% 
  summarise(tot_spend =sum(Amount), mean_spend = mean(Amount),median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount),trans_cnt = n())

cd_week <- credit %>% 
  group_by(Week) %>% 
  summarise(tot_spend =sum(Amount), mean_spend = mean(Amount),median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount),trans_cnt = n())


cd_weekday <- credit %>% 
  group_by(Weekday) %>% 
  summarise(tot_spend =sum(Amount), mean_spend = mean(Amount),median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount),trans_cnt = n())


cd_Location <- credit %>% 
  group_by(Location) %>% 
  summarise(tot_spend =sum(Amount), mean_spend = mean(Amount),median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount),trans_cnt = n())

cd_State <- credit %>% 
  group_by(State) %>% 
  summarise(tot_spend =sum(Amount), mean_spend = mean(Amount),median_spend = median(Amount),
            max_spend = max(Amount), min_spend = min(Amount),trans_cnt = n())


cd_Location <- data.frame(cd_Location,geocode(as.character(cd_Location$Location)))
cd_State <- data.frame(cd_State,geocode(as.character(cd_State$State)))


theme_set(theme_bw())

```




### Spend by Date*Category



```{r}
# Mean Spend by day by category

p <- ggplot(cd_date, aes(x = as.character(Date), y = mean_spend)) +
  geom_bar(aes(fill = Category, color = Category), stat = "identity", alpha = 0.5) + 
  labs(title = "Six Months Spend Overview by Day & Category",
       caption = "Source: AMEX personal Account Summary") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

ggplotly(p)


```


*** 
The first story covers the overall spending trend by date and category. The visualization shows:

- Spending fluctuates widely from day to day
- The biggest spends came from either Merchandise & Supplies or Transportation
- Returns appear only in Merchandise & Supplies and show up as negative spend
- In detail, the biggest spend, on 6/2, was for replacing my car brakes, and the biggest transportation spend, on 7/3, was for the flight tickets, hotel and rental car for a class trip to HU (Big Investment...each month)
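
The two readings above (returns as negative amounts, and the single biggest day) can be checked directly against the transactions. A minimal base-R sketch, using a made-up `toy` data frame in place of `credit`; the dates and amounts are hypothetical, not values from my statement:

```r
# Toy transactions standing in for `credit` (all values hypothetical)
toy <- data.frame(
  Date     = as.Date(c("2020-01-01", "2020-01-02", "2020-01-02", "2020-01-05")),
  Category = c("Restaurant", "Merchandise & Supplies", "Transportation",
               "Merchandise & Supplies"),
  Amount   = c(25, 400, 60, -80)
)

# Returns are the rows with a negative amount
returns <- toy[toy$Amount < 0, ]

# Total spend per day, then the day with the largest total
daily_total <- aggregate(Amount ~ Date, data = toy, FUN = sum)
biggest_day <- daily_total[which.max(daily_total$Amount), ]
```

Here `returns` holds the single refunded row and `biggest_day` holds the date with the highest summed amount.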



###  Spend by Month*Category

```{r}

k <- ggplot(cd_month, aes(x = as.character(Month), y = mean_spend)) +
  geom_bar(aes(fill = Category), stat = "identity", alpha = 0.5) + 
  labs(title = "Mean Spend by Month",
       caption = "Source: AMEX personal Account Summary")+
  coord_flip()

ggplotly(k)

## distribution of Monthly spend


```


*** 
Looking at spend at a more aggregated level, the mean spend by month and category, we get the following insights:

- July has the highest mean spend
- Travel/Transportation spend appears every month because of HU and the gas for my daily car commute
- Restaurant spend also appears every month, but as a small portion
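
One caveat when reading the stacked bars: each segment is a category mean, so the height of a month's bar is the sum of the category means, not the mean of all transactions in that month. A small sketch with hypothetical numbers (the `toy` data frame is illustrative only):

```r
# Toy transactions for a single month (all values hypothetical)
toy <- data.frame(
  Month    = c(7, 7, 7, 7, 7),
  Category = c("Travel", "Travel", "Restaurant", "Restaurant", "Restaurant"),
  Amount   = c(300, 100, 20, 30, 40)
)

# Mean per month and category: this is what each bar segment shows
cat_means <- aggregate(Amount ~ Month + Category, data = toy, FUN = mean)

# Height of the stacked bar vs. the mean over all transactions in the month
stack_height <- sum(cat_means$Amount)
overall_mean <- mean(toy$Amount)
```

In this toy case the stacked bar reaches 230 while the overall mean transaction is 98, so the bar heights are best compared across months rather than read as a typical transaction size.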




###  Monthly Spend Distribution*Category

```{r}

s <- ggplot(credit, aes(x = as.character(Month), y = Amount)) +
  geom_boxplot(aes(color = Category, fill = Category), position = "dodge") +
  labs(title = "Distribution of Spend by Month",
       caption = "Source: AMEX personal Account Summary")

s

```

*** 
Mean spend doesn't always tell the story of the individual expenditures, so from the detailed spend distribution we get that:

- The biggest outlier is the Merchandise & Supplies spend in June
- The distribution of Travel spend didn't fluctuate much from month to month
- Business Service spend in June has a wide range
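
The points drawn beyond the whiskers follow the usual 1.5 × IQR boxplot rule. As a rough sketch, base R's `boxplot.stats()` applies the same kind of rule (it is based on Tukey hinges, so its results can differ slightly from `geom_boxplot()`); the amounts below are hypothetical:

```r
# Hypothetical amounts for one month and category
amounts <- c(10, 12, 15, 18, 20, 22, 400)

# boxplot.stats() returns the five-number summary in $stats
# and the points beyond 1.5 * IQR from the hinges in $out
bs <- boxplot.stats(amounts)
outliers <- bs$out

# The same upper fence computed by hand from the hinges
upper_fence <- bs$stats[4] + 1.5 * (bs$stats[4] - bs$stats[2])
```

Any amount above `upper_fence` is drawn as an individual point, which is how a single large purchase stands out from the rest of the month.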




###  Spend by Weekday

```{r}


y <- ggplot(credit, aes(x = as.character(Weekday), y = Amount)) +
  geom_boxplot() + 
  labs(title = "Spend distribution by Weekday",
       caption = "Source: AMEX personal Account Summary")

ggplotly(y)

```


*** 
I am also interested in how spending is distributed across weekdays, since our mood for spending money may differ across the 7 days of the week (on the chart, 1 = Sunday and 7 = Saturday under lubridate's default numbering):

- The bulk of the distribution is very similar across all 7 days
- What is still interesting is that weekends show more extreme outliers, which suggests that irrational purchases are more likely to happen on weekends
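
To make the weekend comparison explicit, the weekday number can be turned into a weekend flag. A minimal base-R sketch, assuming the same 1 = Sunday ... 7 = Saturday numbering that `wday()` uses by default; the dates and amounts are hypothetical:

```r
# Hypothetical transactions: a Saturday, a Sunday, a Monday and a Wednesday
dates   <- as.Date(c("2024-01-06", "2024-01-07", "2024-01-08", "2024-01-10"))
amounts <- c(500, 30, 20, 25)

# POSIXlt counts weekdays from 0 (Sunday) to 6 (Saturday);
# adding 1 gives the 1 = Sunday ... 7 = Saturday numbering
weekday_num <- as.POSIXlt(dates)$wday + 1

# Flag weekends and compare the largest purchase in each group
is_weekend   <- weekday_num %in% c(1, 7)
max_by_group <- tapply(amounts, is_weekend, max)
```

`max_by_group["TRUE"]` is the largest weekend purchase and `max_by_group["FALSE"]` the largest on a working day.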



### Geo Distribution of Mean Spending by State



```{r}

US <- geocode("United States")
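# "toner-lite" is a Stamen tile style rather than a Google maptype, so request it from the Stamen source
US_osm_map <- qmap("United States", zoom = 4, source = "stamen", maptype = "toner-lite") 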


US_osm_map +
  geom_point(data = cd_State,
             aes(x = lon, y = lat, size = mean_spend, alpha = mean_spend),
             color = "red") +
  labs(title = "Spend distribution by State",
       caption = "Source: AMEX personal Account Summary")

```

*** 
Besides the analysis based on temporal grouping, we can also dig into the spatial view, first by state:

- It draws a picture of my spending journey across several states over the period
- Florida has the highest mean spend, which is due to the trip there
- PA is always for school, so it appears but is not large - I'm rarely in the mood for extra purchases when I am on my way to school
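
The state view relies on the last two characters of each location string being the state code. A base-R sketch of that step, equivalent in spirit to `str_sub(Location, start = -2)`; the location strings and amounts are made up, and the `trimws()` call is an added safeguard that is not in my original code:

```r
# Hypothetical locations in "CITY ST" form; one has trailing whitespace
toy <- data.frame(
  Location = c("CHICAGO IL", "AURORA IL ", "ORLANDO FL", "HARRISBURG PA"),
  Amount   = c(40, 60, 300, 20)
)

# Trim whitespace first, otherwise "AURORA IL " would yield "L " instead of "IL"
loc <- trimws(toy$Location)
toy$State <- substr(loc, nchar(loc) - 1, nchar(loc))

# Mean spend per state, and the state with the highest mean
state_mean <- aggregate(Amount ~ State, data = toy, FUN = mean)
top_state  <- state_mean[which.max(state_mean$Amount), ]
```

If a location does not end with a state code, this approach silently produces a wrong "state", so it is worth checking the distinct values of `State` before geocoding them.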


### Geo Distribution of Mean Spending Within Illinois

```{r}




leaflet(cd_Location) %>% 
  addTiles() %>% 
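  addCircles(lng = ~lon, lat = ~lat, weight = 1,
             radius = ~sqrt(mean_spend) * 300,
             # popups are rendered as HTML, so put the location and amount on separate lines with <br/>
             popup = ~paste(Location, paste0("$", round(mean_spend, 0)), sep = "<br/>")
  )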



```

*** 
Also, since I live in Illinois and we have detailed city-level spatial data, we take a closer look at my spending within Illinois (please zoom in):

- The biggest circle, in the northwest suburbs, is where my home and company are located, so the biggest spend happened there
- Aurora is where I shopped a lot, because that's where the best outlets are located
- Ah, although downtown Chicago is splendid, I still don't have much time to travel into town and make many purchases there
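
A note on the circle sizes: the radius is `sqrt(mean_spend) * 300`, so the area of each circle, not its radius, grows in proportion to the mean spend. A short sketch with hypothetical values showing why the square root is used:

```r
# Two hypothetical mean spends, one four times the other
mean_spend <- c(25, 100)

# Same scaling as in the map above: radius in meters
radius <- sqrt(mean_spend) * 300

# Circle areas and their ratio
area       <- pi * radius^2
area_ratio <- area[2] / area[1]
```

The radius only doubles (1500 to 3000) while the area ratio is 4, matching the ratio of the spends. Scaling the radius by `mean_spend` directly would make large spends look disproportionately big.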