Requirements

The Quantified Self (QS) is a movement that seeks to leverage the synergy of wearables (https://en.wikipedia.org/wiki/Wearable_computer), analytics, and “Big Data”. The movement exploits the ease and convenience of data acquisition through the Internet of Things (IoT) (https://en.wikipedia.org/wiki/Internet_of_Things) to feed the growing obsession with personal informatics and quotidian (http://www.dictionary.com/browse/quotidian) data. The website http://quantifiedself.com/ is a great place to start learning more about the QS movement.

The value of the QS for our class is that its core mandate is to visualize and generate questions and insights about a topic that is of immense importance to most people - themselves. It also produces a wealth of data in a variety of forms. Therefore, designing this project around the QS movement makes perfect sense because it offers you the opportunity to be both the data and question provider, the data analyst, the vis designer, and the end user. This means you will be in the unique position of being capable of providing feedback and direction at all points along the data visualization/analysis life cycle.

Develop a visualization dashboard based on a series of data about your own life. The actual data used for this project can range from daily sleep regimes, to spatial positioning information at 1-minute intervals, to blood pressure and nutrient intake. The amount of data you collect and harvest will differ based on your specified objectives. Ultimately, the project must meet certain key objectives:

  1. You must provide a written summary of your data collection, analysis, and visualization methods, including why you chose your methods and what tools you used.
  2. Your summary must outline 5 questions that can be evaluated using a data-driven approach.
  3. You must collect, manage, and store the data necessary for this visualization.
  4. You must design and create a set of visualizations within a dashboard/storyboard that provides insight into your specified questions, with at minimum 1 interactive graphical element.

Data Collection and Visualization Methods

The data were collected from my personal Citibank debit card and cover the years 2016 and 2017. They consist of monthly expenses (in USD) on rent, medical insurance, grocery, transportation, shopping, and entertainment. Visualization methods such as histograms, scatter plots, line charts, bar plots, polar coordinates, mapping, and interactive plots were used.

Summary of questions

  1. What is the total expense for each month, in 2016 and 2017?
  2. What is the total expense per category, in 2016 and 2017?
  3. How variable are the expenses in each category, in 2016 and 2017? (Shown with a plot.)
  4. How variable is the total monthly expense, in 2016 and 2017, when fixed costs such as rent and medical insurance are excluded?
  5. Are any pairs of variable cost categories (i.e., excluding rent and medical insurance) highly correlated or anti-correlated? (Shown with scatter plots.)

Final Output and Results

1. Total expense by month (2016 - 2017)

library(dplyr)
library(ggplot2)

spending_data <- read.csv("/Users/arka/Desktop/Harrisburg_University_Courses/Semester_2_Late_Fall/ANLY 512-50 (Data Visualization)/Final_Project_Quantified_Self/Spending_Statement_2016_2017.csv")

spending_data$Index <- c(1:24)

# Sum every column within each calendar month (2016 and 2017 pooled)
spending_by_month <- as.data.frame(spending_data %>%
                                      group_by(Month) %>%
                                      summarise_all(sum))
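# Add up the expense categories only; Month and Index are labels, not expenses
expense_cols <- c("Rent", "Medical.Insurance", "Grocery", "Transportation", "Shopping", "Entertainment")
sum_by_month <- rowSums(spending_by_month[, expense_cols])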

spending_by_month$Sum_By_Month <- sum_by_month

ggplot(spending_by_month, aes(x = Month, y = Sum_By_Month)) +
    geom_bar(stat = "identity", position="identity", width = .7, colour = "goldenrod2", fill = "gold1") +
    scale_x_continuous(name = "Month", breaks = seq(0, 13, 1), limits = c(0, 13)) +
    scale_y_continuous(name = "Total Expense (USD)", limits = c(0, 15000)) +
    ggtitle("Total Debit Card Expense by Month during 2016-2017") +
    geom_text(aes(label = Month), position = position_dodge(width = 0.5), vjust = -0.25, size = 3) +
    theme_bw()
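
The aggregation behind this chart can be checked on a small example. The following is a minimal sketch in base R, assuming a toy data frame with two invented categories; the amounts and the names `toy`, `by_month`, and `toy_cols` are made up for illustration and are not taken from the statement. It pools both years by calendar month and then totals only the expense columns, so that the month number itself never enters the sum.

```r
# Toy data: all amounts are invented for illustration.
toy <- data.frame(
  Month    = c(1, 2, 1, 2),        # calendar month, repeated across two years
  Grocery  = c(100, 120, 110, 130),
  Shopping = c(50, 60, 70, 80)
)

# Sum each category within a calendar month (both years pooled).
by_month <- aggregate(cbind(Grocery, Shopping) ~ Month, data = toy, FUN = sum)

# Row totals over the expense columns only; Month is a label, not money.
toy_cols <- c("Grocery", "Shopping")
by_month$Sum_By_Month <- rowSums(by_month[, toy_cols])

print(by_month)
```

Selecting the columns by name keeps the total correct even if helper columns are added to the data frame later.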

2. Total expense per category (2016 - 2017)

library(reshape2)

spending_by_category <- as.data.frame(colSums(spending_by_month[, c(2:7)]))
colnames(spending_by_category) <- c("Expense")
spending_by_category$Category <- c("Rent", "Medical.Insurance", "Grocery", "Transportation", "Shopping", "Entertainment")

ggplot(spending_by_category, aes(x = Category, y = Expense)) +
    geom_bar(stat = "identity", position ="identity", width = .7, fill = "lightgreen") +
    ggtitle("Total Debit Card Expense by Category during 2016-2017") +
    geom_text(aes(label = Category), position = position_dodge(width = 0.5), vjust = -0.25, size = 3) +
    theme_bw()
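
Category totals are often easier to compare as shares of the overall spend. The sketch below is one possible way to compute them in base R; it assumes a toy table with invented amounts, and the names `toy_by_month`, `category_total`, and `category_share` are hypothetical.

```r
# Toy monthly table: all amounts are invented for illustration.
toy_by_month <- data.frame(
  Month    = c(1, 2),
  Rent     = c(300, 300),
  Grocery  = c(100, 200),
  Shopping = c(40, 60)
)

# Total per category over all months (Month column excluded).
category_total <- colSums(toy_by_month[, c("Rent", "Grocery", "Shopping")])

# Share of each category in the overall spend, in percent.
category_share <- round(100 * prop.table(category_total), 1)

print(category_share)
```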

3. Variability of expenses, per category, in 2016 and 2017

ggplot(data = spending_data, 
       aes(x = Index, y = Rent)) +
       xlab("Month No. (Ranges from 1-24)") + 
       ylab("Rent Expense (USD)") +
       geom_point() +
       geom_line() +
       ggtitle("Rent Expense vs Time") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")

ggplot(data = spending_data, 
       aes(x = Index, y = Medical.Insurance)) +
       xlab("Month No. (Ranges from 1-24)") + 
       ylab("Medical Insurance Expense (USD)") +
       geom_point() +
       geom_line() +
       ggtitle("Medical Insurance Expense vs Time") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")

ggplot(data = spending_data, 
       aes(x = Index, y = Grocery)) +
       xlab("Month No. (Ranges from 1-24)") + 
       ylab("Grocery Expense (USD)") +
       geom_point() +
       geom_line() +
       ggtitle("Grocery Expense vs Time") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")

ggplot(data = spending_data, 
       aes(x = Index, y = Transportation)) +
       xlab("Month No. (Ranges from 1-24)") + 
       ylab("Transportation Expense (USD)") +
       geom_point() +
       geom_line() +
       ggtitle("Transportation Expense vs Time") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")

ggplot(data = spending_data, 
       aes(x = Index, y = Shopping)) +
       xlab("Month No. (Ranges from 1-24)") + 
       ylab("Shopping Expense (USD)") +
       geom_point() +
       geom_line() +
       ggtitle("Shopping Expense vs Time") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")

ggplot(data = spending_data, 
       aes(x = Index, y = Entertainment)) +
       xlab("Month No. (Ranges from 1-24)") + 
       ylab("Entertainment Expense (USD)") +
       geom_point() +
       geom_line() +
       ggtitle("Entertainment Expense vs Time") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
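
The six plots above share the same structure, so they could also be drawn as one faceted figure. The following is a rough sketch of that alternative, not the method used in this report: it reshapes the data to long format with `melt()` from reshape2 and uses `facet_wrap()`. The data frame `toy` is a randomly generated stand-in with three categories, and the names `toy`, `long`, and `p` are assumptions made for the example.

```r
library(ggplot2)
library(reshape2)

# Synthetic stand-in for the spending data: 24 months, three categories.
# All amounts are randomly generated, not real expenses.
set.seed(1)
toy <- data.frame(
  Index          = 1:24,
  Grocery        = round(runif(24, 200, 400)),
  Transportation = round(runif(24, 50, 150)),
  Shopping       = round(runif(24, 0, 500))
)

# Reshape from wide (one column per category) to long
# (one row per month and category).
long <- melt(toy, id.vars = "Index",
             variable.name = "Category", value.name = "Expense")

# One panel per category; free y scales because categories differ in magnitude.
p <- ggplot(long, aes(x = Index, y = Expense)) +
  geom_point() +
  geom_line() +
  stat_smooth(method = "loess", formula = y ~ x, colour = "red") +
  facet_wrap(~ Category, scales = "free_y") +
  xlab("Month No. (Ranges from 1-24)") +
  ylab("Expense (USD)") +
  ggtitle("Expense vs Time by Category")

print(p)
```

A shared x-axis makes it easier to see whether peaks in different categories fall in the same months.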

4. Variability of total expense by month, in 2016 and 2017, excluding fixed costs like rent and medical insurance

spending_by_month_copy <- spending_by_month
spending_by_month_copy$Rent <- NULL
spending_by_month_copy$Medical.Insurance <- NULL
spending_by_month_copy$Index <- NULL
spending_by_month_copy$Sum_By_Month <- NULL

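# Sum the variable-cost categories only; Month is a label, not an expense
sum_by_month <- rowSums(spending_by_month_copy[, c("Grocery", "Transportation", "Shopping", "Entertainment")])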
spending_by_month_copy$Sum_By_Month <- sum_by_month

ggplot(data = spending_by_month_copy, 
       aes(x = Month, y = Sum_By_Month)) +
       xlab("Month") + 
       ylab("Expense (USD)") +
       geom_point() +
       geom_line() +
       ggtitle("Variability of expense (excluding Rent and Medical Insurance) vs Time") +
       stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
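
Variability can also be summarised numerically alongside the plot. The sketch below computes the standard deviation and the coefficient of variation (standard deviation divided by the mean) in base R. The vector `variable_total` holds invented numbers standing in for the monthly variable-cost totals; it is an assumption made for the example, not data from this project.

```r
# Invented monthly totals of variable costs (USD), for illustration only.
variable_total <- c(100, 200, 300)

mean_total <- mean(variable_total)    # average monthly variable spend
sd_total   <- sd(variable_total)      # sample standard deviation
cv_total   <- sd_total / mean_total   # coefficient of variation (unitless)

cat("mean:", mean_total, " sd:", sd_total, " cv:", cv_total, "\n")
```

Because the coefficient of variation is unitless, it allows the spread of categories with very different average sizes to be compared.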

5. Scatter plots of variable cost categories (i.e., excluding rent and medical insurance) to check whether any pairs are highly correlated/anti-correlated

ggplot(data = spending_by_month_copy, 
       aes(x = Grocery, y = Transportation)) + 
       geom_point(position = "jitter") + 
       geom_smooth(method = "lm") +
       theme_minimal() + 
       labs(title = "Relationship between aggregated monthly Grocery and Transportation expense",  
            x = "Aggregate Monthly Grocery Expense",
            y = "Aggregate Monthly Transportation Expense")

ggplot(data = spending_by_month_copy, 
       aes(x = Grocery, y = Shopping)) +
       geom_point(position = "jitter") +
       geom_smooth(method = "lm") +
       theme_minimal() + 
       labs(title = "Relationship between aggregated monthly Grocery and Shopping expense",  
            x = "Aggregate Monthly Grocery Expense",
            y = "Aggregate Monthly Shopping Expense")

ggplot(data = spending_by_month_copy,
       aes(x = Grocery, y = Entertainment)) + 
       geom_point(position = "jitter") + 
       geom_smooth(method = "lm") +
       theme_minimal() +
       labs(title = "Relationship between aggregated monthly Grocery and Entertainment expense",  
            x = "Aggregate Monthly Grocery Expense",
            y = "Aggregate Monthly Entertainment Expense")

ggplot(data = spending_by_month_copy,
       aes(x = Transportation, y = Shopping)) + 
       geom_point(position = "jitter") + 
       geom_smooth(method = "lm") +
       theme_minimal() +
       labs(title = "Relationship between aggregated monthly Transportation and Shopping expense",  
            x = "Aggregate Monthly Transportation Expense",
            y = "Aggregate Monthly Shopping Expense")

ggplot(data = spending_by_month_copy,
       aes(x = Transportation, y = Entertainment)) + 
       geom_point(position = "jitter") + 
       geom_smooth(method = "lm") +
       theme_minimal() +
       labs(title = "Relationship between aggregated monthly Transportation and Entertainment expense",  
            x = "Aggregate Monthly Transportation Expense",
            y = "Aggregate Monthly Entertainment Expense")

ggplot(data = spending_by_month_copy,
       aes(x = Shopping, y = Entertainment)) + 
       geom_point(position = "jitter") + 
       geom_smooth(method = "lm") +
       theme_minimal() +
       labs(title = "Relationship between aggregated monthly Shopping and Entertainment expense",  
            x = "Aggregate Monthly Shopping Expense",
            y = "Aggregate Monthly Entertainment Expense")
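
The scatter plots show the direction of each relationship; a Pearson correlation matrix puts a number on it. The following is a minimal sketch using base R `cor()`. The data frame `toy` contains invented values chosen so that some pairs are perfectly correlated or anti-correlated, and the 0.7 cutoff for a "strong" pair is an arbitrary assumption rather than a rule used in this report.

```r
# Invented monthly amounts, for illustration only.
toy <- data.frame(
  Grocery        = c(1, 2, 3, 4),
  Transportation = c(2, 4, 6, 8),   # rises with Grocery
  Shopping       = c(4, 3, 2, 1),   # falls as Grocery rises
  Entertainment  = c(1, 3, 1, 3)
)

# Pearson correlation between every pair of categories.
cor_mat <- cor(toy)

# List each pair once (upper triangle) with its correlation coefficient.
pairs_idx <- which(upper.tri(cor_mat), arr.ind = TRUE)
pair_table <- data.frame(
  x = rownames(cor_mat)[pairs_idx[, "row"]],
  y = colnames(cor_mat)[pairs_idx[, "col"]],
  r = cor_mat[pairs_idx]
)

# Keep pairs whose absolute correlation exceeds the assumed 0.7 cutoff.
strong_pairs <- pair_table[abs(pair_table$r) > 0.7, ]

print(round(cor_mat, 2))
print(strong_pairs)
```

With only 12 or 24 monthly points, a high coefficient should be read as a hint rather than firm evidence of a relationship.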