Overview of the Quantified Self movement



Final Project for ANLY-512: Data Visualization

Professor Alan Hitch, Ph.D.

The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing).

This final class project uses one year of data (Oct. 2020 - Sep. 2021) on spending and payments captured by my personal Amex card.

The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using the data-driven approach, I will create a summary which answers the following questions based on the data collected.

  1. What is the total spending by month in Oct. 2020 - Sep. 2021?

  2. Which category cost the most in Oct. 2020 - Sep. 2021?

  3. For the Restaurant category, what is the spending by month in Oct. 2020 - Sep. 2021?

  4. In February and July, what is the spending in each category?

  5. Comparing Q4 2020 with Q1 2021, what is the difference?

Data Preparation

# A tibble: 12 x 7
   `Tran. Date` `Post Date` Month Description        Amount Category        Year
   <chr>        <chr>       <dbl> <chr>               <dbl> <chr>          <dbl>
 1 10/01/2020   10/01/2020     10 AMAZON SHOP WITH ~ -53.3  Fees & Adjust~  2020
 2 10/01/2020   10/01/2020     10 AplPay APPLE.COM/~   5.43 Merchandise &~  2020
 3 10/01/2020   10/01/2020     10 PRIME VIDEO*MK1LK~   5.99 Merchandise &~  2020
 4 10/01/2020   10/01/2020     10 UBER EATS        ~  28.9  Restaurant-Re~  2020
 5 10/02/2020   10/02/2020     10 UBER EATS        ~  22.6  Restaurant-Re~  2020
 6 10/02/2020   10/02/2020     10 AMAZON MKTPLACE P~  53.3  Merchandise &~  2020
 7 10/02/2020   10/02/2020     10 UBER EATS        ~  90.6  Restaurant-Re~  2020
 8 10/03/2020   10/03/2020     10 AMEX Streaming Su~  -5.99 Merchandise &~  2020
 9 10/03/2020   10/03/2020     10 UBER EATS        ~  20.0  Restaurant-Re~  2020
10 10/03/2020   10/03/2020     10 TMOBILE WEB UPGRA~ 241.   Communication~  2020
11 10/04/2020   10/04/2020     10 AMAZON SHOP WITH ~ -40.6  Fees & Adjust~  2020
12 10/04/2020   10/04/2020     10 AMEX Streaming Su~  -5.43 Merchandise &~  2020
# A tibble: 12 x 7
   `Tran. Date` `Post Date` Month Description        Amount Category        Year
   <chr>        <chr>       <dbl> <chr>               <dbl> <chr>          <dbl>
 1 09/19/2021   09/19/2021      9 "NYCT PAYGO      ~ 2.75e0 Transportatio~  2021
 2 09/19/2021   09/19/2021      9 "ARABICA         ~ 2.72e1 Restaurant-Ba~  2021
 3 09/21/2021   09/21/2021      9 "Amazon Marketpla~ 1.09e1 Merchandise &~  2021
 4 09/21/2021   09/21/2021      9 "UBER EATS       ~ 8.78e1 Restaurant-Re~  2021
 5 09/22/2021   09/22/2021      9 "T-MOBILE.COM\\CU~ 2.06e2 Communication~  2021
 6 09/23/2021   09/23/2021      9 "UBER EATS       ~ 4.00e1 Restaurant-Re~  2021
 7 09/25/2021   09/25/2021      9 "NYCT PAYGO      ~ 2.75e0 Transportatio~  2021
 8 09/25/2021   09/25/2021      9 "Amazon Marketpla~ 1.20e1 Merchandise &~  2021
 9 09/25/2021   09/25/2021      9 "T-MOBILE        ~ 4.35e1 Communication~  2021
10 09/28/2021   09/28/2021      9 "Interest Charge ~ 5.11e1 Fees & Adjust~  2021
11 09/28/2021   09/28/2021      9 "CLCKPAY25 PARK R~ 4.09e3 Business Serv~  2021
12 09/30/2021   09/30/2021      9 "UBER EATS       ~ 7.39e1 Restaurant-Re~  2021

Q1: What is the total spending by month in Oct. 2020 - Sep. 2021?


Month is categorical and Amount is continuous, and the aim is to compare total spending across months, so a bar chart was used to summarize the data.

The chart shows the total amount spent in each month.

The plot shows that average spending per month is around $3,800. June has the highest spending, at about $5,800.

Q2: Which category cost the most in Oct. 2020 - Sep. 2021?


For the same reason, a bar chart was used here. The category names are too long to display along the bottom axis, so the chart was rotated to make the bars horizontal.

The category I spent the most on is Restaurant, at around $18,000 for the 12-month period from Oct. 2020 to Sep. 2021.

Q3: For Restaurant, what is the spending by month in Oct. 2020 - Sep. 2021?


In most months, spending on Restaurant was more than $1,000. In February and July, however, it was almost 3 times higher than in the other months.

Q4 takes a closer look at these two months, but the quick answer is:

February has Chinese New Year, so traditionally we go out and celebrate.

July 28th is my girlfriend’s birthday, so fancy dinners were in order: I spent $700+ at two different fine establishments.

Q4: In February and July, what is the spending in each category?

Q5: Comparing Oct - Dec 2020 with Jan - Mar 2021, what is the difference?


Since the purpose was to compare spending between Q4 2020 and Q1 2021, a grouped bar chart was used.

A year-end holiday trip in January explains the travel and lodging expenses for Q4 2020.

The surge in Education expenses for Q1 2021 was related to the deposit charged by Harrisburg U.


---
title: "ANLY 512 Final Project"
author: "Yuxuan Zhao"
date: "`r Sys.Date()`"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(flexdashboard)
library(knitr)
library(ggplot2)
library(tidyverse)
library(readxl)
library(dplyr)
library(xts)
library(zoo)
library(lubridate)
setwd("~/Work Files - R")
```

### Overview of the Quantified Self movement
![quant_self](credit-cards.jpg)


***

Final Project for ANLY-512: Data Visualization

Professor Alan Hitch, Ph.D.

The Quantified Self movement grew from the popularity and growth of the internet of things, the mass collection of personal information, and mobile technologies (primarily wearable computing). 

This final class project uses one year of data (Oct. 2020 - Sep. 2021) on spending and payments captured by my personal Amex card.

The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using the data-driven approach, I will create a summary which answers the following questions based on the data collected.


1) What is the total spending by month in Oct. 2020 - Sep. 2021?

2) Which category cost the most in Oct. 2020 - Sep. 2021?

3) For the Restaurant category, what is the spending by month in Oct. 2020 - Sep. 2021?

4) In February and July, what is the spending in each category?

5) Comparing Q4 2020 with Q1 2021, what is the difference?


### Data Preparation

```{r}
setwd("~/Work Files - R")
library(readxl)

# Read the Amex statement export (one row per transaction)
data <- read_excel("amexdata.xlsx")

# Show the first and last 12 transactions
head(data, 12)

tail(data, 12)
```

***

There are 7 variables we are interested in:

- Tran. Date: Transaction date
- Post Date
- Month
- Description: Detailed description of the purchase
- Amount: The amount spent on the transaction
- Category
- Year: 2020 or 2021
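
The date columns are read in as text (`<chr>`), while Month and Year come ready-made in the spreadsheet. If they ever had to be derived from the transaction date instead, a minimal base R sketch could look like the following. The toy rows and the `tran_date` column name are made up for illustration and are not part of the project data.

```r
# Toy rows shaped like the statement export (values are illustrative only)
toy <- data.frame(
  tran_date = c("10/01/2020", "02/14/2021", "09/30/2021"),
  Amount    = c(28.90, 120.00, 73.90),
  stringsAsFactors = FALSE
)

# Parse the month/day/year strings into Date objects
toy$tran_date <- as.Date(toy$tran_date, format = "%m/%d/%Y")

# Derive the numeric month and year from the parsed date
toy$Month <- as.integer(format(toy$tran_date, "%m"))
toy$Year  <- as.integer(format(toy$tran_date, "%Y"))

print(toy)
```

Working from a real `Date` column also makes it easier to sort transactions chronologically across the year boundary.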


### Q1: What is the total spending by month in Oct. 2020 - Sep. 2021?

```{r fig.height=8, fig.width=10}

# plot

library(ggplot2)

# Month is stored as a number (1-12), so convert it to month labels
# ordered Oct -> Sep before plotting
month_order <- c("Oct","Nov","Dec","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep")

p1 <- ggplot(data, aes(x = factor(month.abb[Month], levels = month_order), y = Amount)) + 
  geom_bar(stat = "identity", fill = "Grey") +
  labs(title = "Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()

p1
```

***


Month is categorical and Amount is continuous, and the aim is to compare total spending across months, so a bar chart was used to summarize the data.

The chart shows the total amount spent in each month.

The plot shows that average spending per month is around $3,800. June has the highest spending, at about $5,800.
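
The monthly totals, the average and the peak month can also be computed directly rather than read off the chart. The sketch below uses base R on made-up toy rows, so the numbers are not the project's. It assumes that negative amounts are credits, which makes the sums net spending.

```r
# Toy transactions (illustrative values, not the real statement)
toy <- data.frame(
  Month  = c(10, 10, 11, 11, 12),
  Amount = c(28.9, -5.99, 100, 50, 75)
)

month_order <- c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar",
                 "Apr", "May", "Jun", "Jul", "Aug", "Sep")

# Sum Amount within each month (negative rows reduce the total)
monthly <- aggregate(Amount ~ Month, data = toy, FUN = sum)

# Label the numeric month and order the rows Oct -> Sep
monthly$MonthLabel <- factor(month.abb[monthly$Month], levels = month_order)
monthly <- monthly[order(monthly$MonthLabel), ]

# Average monthly total and the month with the largest total
avg_monthly <- mean(monthly$Amount)
peak_month  <- as.character(monthly$MonthLabel[which.max(monthly$Amount)])

print(monthly)
```

Plotting a pre-aggregated table like `monthly` gives one bar per month instead of a stack of individual transactions.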


### Q2: Which category cost the most in Oct. 2020 - Sep. 2021?

```{r fig.height=12, fig.width=10}
library(ggplot2)

# Horizontal bars so the long category names stay readable
p2 <- ggplot(data, aes(x = Category, y = Amount)) + 
  geom_bar(stat = "identity", fill = "Grey") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()

p2
```


***

For the same reason, a bar chart was used here. The category names are too long to display along the bottom axis, so the chart was rotated to make the bars horizontal.

The category I spent the most on is Restaurant, at around $18,000 for the 12-month period from Oct. 2020 to Sep. 2021.
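
The top category can be found by totalling Amount per category and sorting the result. This is a small base R sketch on invented toy rows; the category names and amounts are placeholders, not the real figures.

```r
# Toy transactions (illustrative values only)
toy <- data.frame(
  Category = c("Restaurant", "Merchandise", "Restaurant", "Communications"),
  Amount   = c(28.9, 53.3, 90.6, 241),
  stringsAsFactors = FALSE
)

# Total spending per category
totals <- aggregate(Amount ~ Category, data = toy, FUN = sum)

# Sort from largest to smallest total
totals <- totals[order(-totals$Amount), ]

# The first row is the category with the highest total
top_category <- as.character(totals$Category[1])

print(totals)
```

A sorted table like this could also drive the bar order in the chart, so the largest category sits at one end.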


### Q3: For Restaurant, what is the spending by month in Oct. 2020 - Sep. 2021?

```{r fig.height=10, fig.width=12}
# Keep only the restaurant transactions
data1 <- subset(data, Category=='Restaurant-Restaurant')

# Month is stored as a number (1-12), so convert it to month labels
# ordered Oct -> Sep before plotting
month_order <- c("Oct","Nov","Dec","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep")

p3 <- ggplot(data1, aes(x = factor(month.abb[Month], levels = month_order), y = Amount)) + 
  geom_bar(stat = "identity", fill = "Grey") +
  labs(title = "Spending per Month", x = "Month", y = "Amount") +
  theme_minimal()

p3
```

***

In most months, spending on Restaurant was more than $1,000. In February and July, however, it was almost 3 times higher than in the other months.

Q4 takes a closer look at these two months, but the quick answer is:

February has Chinese New Year, so traditionally we go out and celebrate.

July 28th is my girlfriend's birthday, so fancy dinners were in order: I spent $700+ at two different fine establishments.
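
One way to check a claim like "almost 3 times higher" is to compare each month's total with a typical month. In this sketch the median serves as the baseline and 2x as the cutoff for a spike; both are illustrative assumptions, and the monthly totals are invented.

```r
# Toy monthly restaurant totals (illustrative values only)
monthly <- data.frame(
  Month  = c("Jan", "Feb", "Mar", "Apr"),
  Amount = c(1000, 3000, 1100, 900),
  stringsAsFactors = FALSE
)

# Use the median month as the "typical" level, since it is not pulled up by spikes
baseline <- median(monthly$Amount)

# Ratio of each month to the typical month
monthly$Ratio <- monthly$Amount / baseline

# Flag months at or above an assumed cutoff of 2x the baseline
spikes <- monthly$Month[monthly$Ratio >= 2]

print(monthly)
```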


### Q4: In February and July, what is the spending in each category?

```{r fig.height=8, fig.width=12}
library(gridExtra)

# February transactions (Month is numeric)
data2 <- subset(data, Month == 2)

p4 <- ggplot(data2, aes(x = Category, y = Amount)) + 
  geom_bar(stat = "identity", fill = "Grey") +
  labs(title = "Spending per Category - February", x = "Category", y = "Amount") +
  coord_flip()

# July transactions
data3 <- subset(data, Month == 7)

p5 <- ggplot(data3, aes(x = Category, y = Amount)) + 
  geom_bar(stat = "identity", fill = "Grey") +
  labs(title = "Spending per Category - July", x = "Category", y = "Amount") +
  coord_flip()

# Show the two months side by side
grid.arrange(p4, p5, ncol=2)

```


### Q5: Comparing Oct - Dec 2020 with Jan - Mar 2021, what is the difference?


```{r fig.height=12, fig.width=10}
setwd("~/Work Files - R")

library(readxl)

# Read the data used for the Q4 2020 vs Q1 2021 comparison
amexdata_Copy <- read_excel("amexdata - Copy.xlsx")

comp <- data.frame(amexdata_Copy)

# Treat Year as a discrete grouping variable for the fill colors
comp$Year <- as.factor(comp$Year)

ggplot(data=comp, aes(x=Category, y=Amount)) +   
  geom_bar(aes(fill = Year), position = "dodge", stat="identity") +
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  scale_fill_discrete(name="Year") +
  theme(legend.position = "bottom") +
  coord_flip()
```


***

Since the purpose was to compare spending between Q4 2020 and Q1 2021, a grouped bar chart was used.

A year-end holiday trip in January explains the travel and lodging expenses for Q4 2020.

The surge in Education expenses for Q1 2021 was related to the deposit charged by Harrisburg U.
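
The quarter-over-quarter difference per category can also be tabulated from the Month and Year columns. The sketch below is base R on invented toy rows: the `Quarter` label and the `change` vector are hypothetical names, and this is not necessarily how the comparison spreadsheet was prepared.

```r
# Toy transactions spanning Q4 2020 and Q1 2021 (illustrative values only)
toy <- data.frame(
  Month    = c(10, 12, 1, 3, 11, 2),
  Year     = c(2020, 2020, 2021, 2021, 2020, 2021),
  Category = c("Travel", "Travel", "Education", "Education",
               "Restaurant", "Restaurant"),
  Amount   = c(400, 600, 1500, 500, 200, 300),
  stringsAsFactors = FALSE
)

# Calendar quarter from the month number: 1-3 -> Q1, ..., 10-12 -> Q4
toy$Quarter <- paste0("Q", (toy$Month - 1) %/% 3 + 1, " ", toy$Year)

# Keep only the two quarters being compared
two_q <- toy[toy$Quarter %in% c("Q4 2020", "Q1 2021"), ]

# Category-by-quarter table of total spending
totals <- xtabs(Amount ~ Category + Quarter, data = two_q)

# Change from Q4 2020 to Q1 2021 for each category
change <- totals[, "Q1 2021"] - totals[, "Q4 2020"]

print(totals)
print(change)
```

A positive value in `change` means the category grew in Q1 2021, and a negative value means it shrank.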
