---
title: "ANLY 512 Final Project - Quantified Self"
author: Divya Jain and Paula Peres
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    source_code: embed
---
# **Introduction**


### Objective

Develop a visualization dashboard based on a series of data about your own life. The actual data used for this project can range from daily sleep regimes, TV shows watched, types of food eaten, spending habits, commute times to work, travel habits, to blood pressure and nutrient intake. The amount of data you collect and harvest will differ based on your specified objectives.

Ultimately the project must meet certain key objectives:

1. You must provide a written summary of your data collection, analysis and visualization methods, including why you chose your methods, and what tools you utilized.

2. Your summary must outline ≥ 5 questions that can be evaluated using a data-driven approach. These questions should be more than just “How many miles did I run”, although a couple of your questions could be stated that way.

3. You must collect, manage, and store the data necessary for this visualization.

4. You must design and create an appropriate set of visualizations (try not to use just one type of visualization) within a dashboard/storyboard that provides insight into your specified questions, with a minimum of ≥ 1 interactive graphical element.  


### Summary
  
The project started with the choice of topic: both participants looked for a subject of common interest. Since finances are a constant concern in our daily lives, we decided to analyze our spending history by category, month, and amount.
The purpose of the study is to analyze monthly expenses in depth and generate insights into where we can reduce or optimize spending and which months of the year are the most critical, so that we can make more financially conscious decisions.

**Data Collection**


1. **Data collection:** using the data provided by the credit card issuing banks, we collected our expenses for the last 12 months.

2. **Data cleaning:** after downloading the data, we found that the categories and formats of the two datasets were not standardized, so we defined a standard and adapted both datasets to it to have a reliable base to work with.

3. **Addition of information:** while working on the data, we realized, for example, that adding the month directly as a column would make plotting the graphs easier, so we made some additions.

4. **Definition of the study questions:** these are listed in the Questions section that follows.
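The cleaning and month-addition steps above could be sketched roughly as follows. This is a minimal illustration, not the procedure we actually ran: the data frames `raw_a` and `raw_b`, the labels in `category_map`, and the helper `standardize()` are all hypothetical, and only the column names `Category`, `Amount`, and `Month` come from our datasets. The block uses a plain `r` fence so it is displayed rather than executed by the dashboard.

```r
# Hypothetical raw exports from two banks (values and labels are made up)
raw_a <- data.frame(
  Date = as.Date(c("2022-06-03", "2022-06-17", "2022-07-02")),
  Category = c("Food & Drink", "Health & Wellness", "Shopping"),
  Amount = c(42.50, 120.00, 63.20),
  stringsAsFactors = FALSE
)
raw_b <- data.frame(
  Date = as.Date(c("2022-06-05", "2022-07-11")),
  Category = c("Restaurants", "Merchandise"),
  Amount = c(35.00, 80.00),
  stringsAsFactors = FALSE
)

# Illustrative mapping from one bank's labels to the shared standard
category_map <- c("Restaurants" = "Food & Drink", "Merchandise" = "Shopping")

standardize <- function(df, map) {
  # Rename only the categories that appear in the mapping
  hit <- df$Category %in% names(map)
  df$Category[hit] <- unname(map[df$Category[hit]])
  # Add the month as its own column, in "YYYY-MM" form, to simplify plotting
  df$Month <- format(df$Date, "%Y-%m")
  df
}

clean_a <- standardize(raw_a, category_map)
clean_b <- standardize(raw_b, category_map)
```

After this step both data frames share the same category labels and carry a `Month` column, so the same plotting code can be applied to either one.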


**Questions**
  

**Q1: Which categories do we spend most money on? Is there a difference in our spending habits?**

There is some similarity in the categories where Divya and Paula spend their money: both prioritize Health & Wellness, and both like to shop. However, Paula spends most of her money on Shopping, and that amount is still less than what Divya spends on Food & Drink!

**Q2: What is the correlation between spending categories and time of the year? Is there a particular time of year when we spend more or less in a particular category, and why?**

The graphs show that the categories we spend on differ considerably across the year. For example, Divya spends on Health & Wellness consistently in most months, but in certain months she spends more on Shopping than on Entertainment or Travel.

**Q3: Find two relevant categories that can be chosen for cost reduction. Explain why.**

We chose the Shopping and Travel categories and analyzed them separately for two reasons. First, they carry significant weight in the total monthly expenses of both participants, more than Entertainment, for example. Second, we consider them non-essential, unlike “Bills & Utilities” and “Health & Wellness”, which we consider essential, harder to reduce, and of less weight in the monthly expenses analyzed.

**Q4: Does the payment history follow the spending trend, or should it be a point of attention?**

For both participants, spending and payments are well aligned, so the payment history is not a red flag for either of them, even in the months when expenses were above the average monthly spending.

**Q5: What is the correlation between buying groceries and eating out? Does spending more on one decrease the spending on the other?**

There is a negative (inverse) correlation between spending on groceries and spending on eating out. Divya's graph shows that whenever she spent more on Food & Drink, her grocery spending was lower, and vice versa. Paula's graph shows a similar pattern for the most part; however, in some months she only spent on eating out and did not buy any groceries.


```{r setup, include=FALSE}
# Only the packages that the chunks in this dashboard actually use.
# Unused packages were dropped, including rgdal (archived on CRAN) and
# MASS (which masks dplyr::select when loaded after dplyr).
library(flexdashboard)
library(ggplot2)  # static charts
library(readxl)   # read_excel() for the .xlsx/.xls exports
library(plotly)   # ggplotly() for the interactive charts
```

# **Spending by Category**

Row 
-------------------------------------

### Divya's Spending by Category

```{r}
# Total spending per category for Divya; rows of the same category stack into one bar
Divya_dataset <- read_excel('/Users/divyajain92/Downloads/Divya dataset.xlsx')
p1 <- ggplot(Divya_dataset, aes(x = Category, y = Amount)) +
  geom_col(fill = "darkgreen") +  # a single constant fill, so no fill mapping is needed
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p1
```


### Paula's Spending by Category

```{r}
# Total spending per category for Paula; rows of the same category stack into one bar
Paula_dataset <- read_excel('/Users/divyajain92/Downloads/Paula dataset.xlsx')
p2 <- ggplot(Paula_dataset, aes(x = Category, y = Amount)) +
  geom_col(fill = "darkblue") +  # a single constant fill, so no fill mapping is needed
  labs(title = "Spending per Category", x = "Category", y = "Amount") +
  coord_flip()
p2
```

Row {data-height=150}
-------------------------------------

### Spending by Category Comparison

These graphs show that Divya spends the most on Health & Wellness and Food & Drink, while Paula's most expensive categories are Shopping and Health & Wellness, so there is some similarity in what they both spend the most on. Divya spends the least on Donations and Entertainment, whereas Paula spends the least on Personal and Bills & Utilities. Divya could potentially learn from Paula how to keep utility bills down, while Paula could try to cut down on Shopping.


# **Monthly Spending Analysis**

Row {.tabset .tabset-fade}
-------------------------------------

### Monthly Spending Divya

```{r}
# One point per expense: colour encodes the category, point size the amount
Divya_dataset <- read_excel('/Users/divyajain92/Downloads/Divya dataset.xlsx')
spending <- ggplot(Divya_dataset, aes(x = Month, y = Amount)) +
  geom_point(aes(col = Category, size = Amount)) +
  labs(title = "Divya Spending", x = "Month", y = "Amount")

# ggplotly() turns the static chart into the interactive element of the dashboard
(ggspending <- ggplotly(spending))
```

### Monthly Spending Paula

```{r}
# One point per expense: colour encodes the category, point size the amount
Paula_dataset <- read_excel('/Users/divyajain92/Downloads/Paula dataset.xlsx')
spending <- ggplot(Paula_dataset, aes(x = Month, y = Amount)) +
  geom_point(aes(col = Category, size = Amount)) +
  labs(title = "Paula Spending", x = "Month", y = "Amount")

# ggplotly() turns the static chart into the interactive element of the dashboard
(ggspending <- ggplotly(spending))
```

Row {data-height=150}
-------------------------------------

### Monthly Spending Comparison

Divya spends on Health & Wellness every month, but she made one large payment for gym training during a promotion in June 2022. In October 2022 she spent more on Home than on Health & Wellness, her highest spending category, because she moved house. Her Shopping spending peaked in November, likely because of Black Friday sales and the move to a new house, and was lowest in April 2022 and April 2023.

Although Paula spends the most on Shopping, Health & Wellness is also a frequent and high expense, because her gym membership is charged to the credit card every month and represents around 28% of her monthly credit card expenses. In December and January, when the membership was frozen, the impact on the final credit card amount was significant.
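The share of one category in each month's total, like the roughly 28% mentioned above, can be checked with a short calculation. The sketch below uses invented figures in a hypothetical `spend` data frame (not our actual data) that follows the `Month`, `Category`, and `Amount` columns of our datasets, and it is shown as a plain `r` fence so the dashboard does not execute it.

```r
# Hypothetical expenses; the amounts are made up for illustration
spend <- data.frame(
  Month = c("2022-10", "2022-10", "2022-11", "2022-11"),
  Category = c("Health & Wellness", "Shopping", "Health & Wellness", "Shopping"),
  Amount = c(140, 360, 140, 260),
  stringsAsFactors = FALSE
)

# Total spent in each month, named by month
monthly_total <- tapply(spend$Amount, spend$Month, sum)

# Rows of the category of interest
gym <- spend[spend$Category == "Health & Wellness", ]

# Percentage of each month's total that goes to that category
share <- gym$Amount / monthly_total[gym$Month] * 100
round(share, 1)
```

With these invented numbers the category takes 28% of the October total and 35% of the November total; a month with a frozen membership would simply have no matching row.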


# **Shopping and Traveling**

Row {.tabset .tabset-fade}
-------------------------------------

### Shopping Divya

```{r}
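Divya_dataset <- read_excel('/Users/divyajain92/Downloads/Divya dataset.xlsx')
# Keep only the Shopping rows and plot them over time
Shopping <- Divya_dataset[Divya_dataset$Category == "Shopping", ]
# group = 1 joins the points into one line even if Month is stored as text;
# linewidth replaces size for lines in ggplot2 >= 3.4.0
ggplot(Shopping, aes(x = Month, y = Amount, group = 1)) +
  geom_line(color = "orange", linewidth = 2, alpha = 0.9, linetype = 1) +
  ggtitle("Shopping Spend - Divya")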
```

### Travel Divya 

```{r}
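# Keep only the Travel rows (Divya_dataset was read in the previous chunk)
Travel <- Divya_dataset[Divya_dataset$Category == "Travel", ]
# group = 1 joins the points into one line even if Month is stored as text;
# linewidth replaces size for lines in ggplot2 >= 3.4.0
ggplot(Travel, aes(x = Month, y = Amount, group = 1)) +
  geom_line(color = "orange", linewidth = 2, alpha = 0.9, linetype = 1) +
  ggtitle("Travel Spend - Divya")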
```

Row {.tabset .tabset-fade}
----------------------------------

### Shopping Paula

```{r}
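Paula_dataset <- read_excel('/Users/divyajain92/Downloads/Paula dataset.xlsx')
# Keep only the Shopping rows and plot them over time
Shopping <- Paula_dataset[Paula_dataset$Category == "Shopping", ]
# group = 1 joins the points into one line even if Month is stored as text;
# linewidth replaces size for lines in ggplot2 >= 3.4.0
ggplot(Shopping, aes(x = Month, y = Amount, group = 1)) +
  geom_line(color = "purple", linewidth = 2, alpha = 0.9, linetype = 1) +
  ggtitle("Shopping Spend - Paula")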
```

### Travel Paula

```{r}
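# Keep only the Travel rows (Paula_dataset was read in the previous chunk)
Travel <- Paula_dataset[Paula_dataset$Category == "Travel", ]
# group = 1 joins the points into one line even if Month is stored as text;
# linewidth replaces size for lines in ggplot2 >= 3.4.0
ggplot(Travel, aes(x = Month, y = Amount, group = 1)) +
  geom_line(color = "purple", linewidth = 2, alpha = 0.9, linetype = 1) +
  ggtitle("Travel Spend - Paula")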
```


# **Payment History**

Row 
-------------------------------------

### Divya's Payments

```{r}
# Credit card payments by posting date; payments posted on the same date stack
Divya_payments <- read_excel('/Users/divyajain92/Downloads/Divya payments.xls')
ggplot(Divya_payments, aes(x = PostDate, y = Amount)) +
  geom_col(fill = "plum") +
  ggtitle("Payment History - Divya")
```

### Paula's Payments

```{r}
# Credit card payments by posting date; payments posted on the same date stack
Paula_payments <- read_excel('/Users/divyajain92/Downloads/Paula payments.xls')
ggplot(Paula_payments, aes(x = PostDate, y = Amount)) +
  geom_col(fill = "tomato") +
  ggtitle("Payment History - Paula")
```


# **Groceries vs Eating Out**

Row 
-------------------------------------

### Divya: Correlation between Groceries and Eating out

```{r}
# Keep only the two food-related categories, in a new object so the full dataset stays intact
Divya_food <- subset(data.frame(Divya_dataset),
                     Category %in% c("Food & Drink", "Groceries"))
# Month as a factor so every month gets its own pair of side-by-side bars
Divya_food$Month <- as.factor(Divya_food$Month)
ggplot(Divya_food, aes(x = Month, y = Amount, fill = Category)) +
  geom_col(position = "dodge") +
  # The fill encodes Category, so the legend is titled accordingly (it was "Month")
  labs(title = "Groceries vs Eating Out", x = "Month", y = "Amount", fill = "Category") +
  theme(legend.position = "bottom") +
  coord_flip()
```

### Paula: Correlation between Groceries and Eating out

```{r}
# Keep only the two food-related categories, in a new object so the full dataset stays intact
Paula_food <- subset(data.frame(Paula_dataset),
                     Category %in% c("Food & Drink", "Groceries"))
# Month as a factor so every month gets its own pair of side-by-side bars
Paula_food$Month <- as.factor(Paula_food$Month)
ggplot(Paula_food, aes(x = Month, y = Amount, fill = Category)) +
  geom_col(position = "dodge") +
  # The fill encodes Category, so the legend is titled accordingly (it was "Month")
  labs(title = "Groceries vs Eating Out", x = "Month", y = "Amount", fill = "Category") +
  theme(legend.position = "bottom") +
  coord_flip()
```

Row {data-height=150}
-------------------------------------

### Grocery vs Eating Out Comparison

These graphs show that there is an inverse relationship between buying groceries and eating out. For Divya, whenever she spent more on groceries, her spending on Food & Drink decreased in those months, and vice versa. The graph also shows that in some months the spending on both categories was almost equal, whereas the biggest difference between groceries and eating out was in April 2022, which may indicate that she visited some expensive restaurants that month.

For Paula, the Food & Drink expenses are much higher than the grocery shopping amount. This can be explained by the fact that she spends most of her time away from home and does not cook often. The months of higher spending were those when she had people visiting, so this category increased considerably. Food & Drink is a relevant category that can be considered for cost containment.
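The relationship read off the bar charts could also be quantified with a correlation coefficient. The following is a rough sketch under stated assumptions: the `long` data frame holds invented monthly totals (not our data) in the same `Month`, `Category`, `Amount` layout as our datasets, and Pearson correlation via base R's `cor()` is one possible measure, not one we used in the dashboard. It is shown as a plain `r` fence so it is displayed rather than executed.

```r
# Hypothetical monthly totals in long format (values are made up)
long <- data.frame(
  Month = rep(c("2022-04", "2022-05", "2022-06", "2022-07"), each = 2),
  Category = rep(c("Groceries", "Food & Drink"), times = 4),
  Amount = c(100, 400, 250, 220, 180, 310, 300, 150),
  stringsAsFactors = FALSE
)

# One row per month, one column per category (sums amounts within each cell)
wide <- xtabs(Amount ~ Month + Category, data = long)

# Pearson correlation between the two monthly series
r <- cor(wide[, "Groceries"], wide[, "Food & Drink"])
r
```

A value of `r` close to -1 would support the reading that months with more grocery spending have less spending on eating out; with only about 12 monthly points per person, such a coefficient should be treated as indicative rather than conclusive.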