The Quantified Self (QS) is a movement that aims to leverage the synergy of wearables (https://en.wikipedia.org/wiki/Wearable_computer), analytics, and “Big Data”. It exploits the ease and convenience of data acquisition through the Internet of Things (IoT) (https://en.wikipedia.org/wiki/Internet_of_Things) to feed the growing obsession with personal informatics and quotidian (http://www.dictionary.com/browse/quotidian) data. The website http://quantifiedself.com/ is a good place to start learning about the QS movement.
The value of the QS for our class is that its core mandate is to visualize and generate questions and insights about a topic that is of immense importance to most people - themselves. It also produces a wealth of data in a variety of forms. Therefore, designing this project around the QS movement makes perfect sense because it offers you the opportunity to be both the data and question provider, the data analyst, the vis designer, and the end user. This means you will be in the unique position of being capable of providing feedback and direction at all points along the data visualization/analysis life cycle.
Develop a visualization dashboard based on a series of data about your own life. The actual data used for this project can range from daily sleep regimes, to spatial positioning information at 1-minute intervals, to blood pressure and nutrient intake. The amount of data you collect and harvest will differ based on your specified objectives. Ultimately, the project must meet certain key objectives:
You must provide a written summary of your data collection, analysis, and visualization methods, including why you chose your methods and what tools you utilized.
Your summary must outline 5 questions that can be evaluated using a data-driven approach.
You must collect, manage, and store the data necessary for this visualization.
You must design and create a set of visualizations within a dashboard/storyboard that provides insight into your specified questions, with at minimum 1 interactive graphical element.
The data were collected from my personal Citibank debit card and cover the years 2016 and 2017. They consist of monthly expenses (in USD) on rent, medical insurance, groceries, transportation, shopping, and entertainment. The visualization methods used include histograms, scatterplots, line charts, bar plots, polar coordinates, mapping, and interactive plots.
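Before plotting, it can help to confirm that the loaded table has the layout described above. The following is only a minimal sketch: the column names are taken from the plotting code in this report, `check_spending_data()` is a hypothetical helper, and the `demo` data frame is a synthetic stand-in with made-up values, not the real statement.

```r
# Column names assumed from the plotting code in this report.
expected_cols <- c("Month", "Rent", "Medical.Insurance", "Grocery",
                   "Transportation", "Shopping", "Entertainment")

# Hypothetical helper: stop early if the table does not look like
# 24 monthly rows with one numeric column per expense category.
check_spending_data <- function(df, n_months = 24) {
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(nrow(df) == n_months)
  stopifnot(all(vapply(df[expected_cols], is.numeric, logical(1))))
  invisible(TRUE)
}

# Synthetic stand-in with the assumed layout (all values are made up).
set.seed(1)
demo <- data.frame(
  Month = rep(1:12, times = 2),
  Rent = rep(1000, 24),
  Medical.Insurance = rep(200, 24),
  Grocery = round(runif(24, 150, 400)),
  Transportation = round(runif(24, 50, 150)),
  Shopping = round(runif(24, 0, 500)),
  Entertainment = round(runif(24, 0, 200))
)
check_spending_data(demo)
```

Calling `check_spending_data(spending_data)` right after `read.csv()` would surface a renamed or missing column before any plot is built.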
library(dplyr)
library(ggplot2)
spending_data <- read.csv("/Users/arka/Desktop/Harrisburg_University_Courses/Semester_2_Late_Fall/ANLY 512-50 (Data Visualization)/Final_Project_Quantified_Self/Spending_Statement_2016_2017.csv")
spending_data$Index <- c(1:24)
# Sum every column within each calendar month (both years combined).
# summarise_all() replaces the deprecated summarise_each(funs(sum)).
spending_by_month <- as.data.frame(spending_data %>%
  group_by(Month) %>%
  summarise_all(sum))
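# Total only the six expense categories; rowSums() over the whole data frame
# would also add the Month and Index columns to the total.
category_cols <- c("Rent", "Medical.Insurance", "Grocery", "Transportation", "Shopping", "Entertainment")
sum_by_month <- rowSums(spending_by_month[, category_cols])
spending_by_month$Sum_By_Month <- sum_by_month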
ggplot(spending_by_month, aes(x = Month, y = Sum_By_Month)) +
  geom_bar(stat = "identity", position = "identity", width = .7, colour = "goldenrod2", fill = "gold1") +
  scale_x_continuous(name = "Month", breaks = seq(0, 13, 1), limits = c(0, 13)) +
  scale_y_continuous(name = "Total Expense (USD)", limits = c(0, 15000)) +
  ggtitle("Total Debit Card Expense by Month during 2016-2017") +
  geom_text(aes(label = Month), position = position_dodge(width = 0.5), vjust = -0.25, size = 3) +
  theme_bw()
library(reshape2)
spending_by_category <- as.data.frame(colSums(spending_by_month[, c(2:7)]))
colnames(spending_by_category) <- c("Expense")
spending_by_category$Category <- c("Rent", "Medical.Insurance", "Grocery", "Transportation", "Shopping", "Entertainment")
ggplot(spending_by_category, aes(x = Category, y = Expense)) +
  geom_bar(stat = "identity", position = "identity", width = .7, fill = "lightgreen") +
  ggtitle("Total Debit Card Expense by Category during 2016-2017") +
  geom_text(aes(label = Category), position = position_dodge(width = 0.5), vjust = -0.25, size = 3) +
  theme_bw()
ggplot(data = spending_data,
       aes(x = Index, y = Rent)) +
  xlab("Month No. (Ranges from 1-24)") +
  ylab("Rent Expense (USD)") +
  geom_point() +
  geom_line() +
  ggtitle("Rent Expense vs Time") +
  stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
ggplot(data = spending_data,
       aes(x = Index, y = Medical.Insurance)) +
  xlab("Month No. (Ranges from 1-24)") +
  ylab("Medical Insurance Expense (USD)") +
  geom_point() +
  geom_line() +
  ggtitle("Medical Insurance Expense vs Time") +
  stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
ggplot(data = spending_data,
       aes(x = Index, y = Grocery)) +
  xlab("Month No. (Ranges from 1-24)") +
  ylab("Grocery Expense (USD)") +
  geom_point() +
  geom_line() +
  ggtitle("Grocery Expense vs Time") +
  stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
ggplot(data = spending_data,
       aes(x = Index, y = Transportation)) +
  xlab("Month No. (Ranges from 1-24)") +
  ylab("Transportation Expense (USD)") +
  geom_point() +
  geom_line() +
  ggtitle("Transportation Expense vs Time") +
  stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
ggplot(data = spending_data,
       aes(x = Index, y = Shopping)) +
  xlab("Month No. (Ranges from 1-24)") +
  ylab("Shopping Expense (USD)") +
  geom_point() +
  geom_line() +
  ggtitle("Shopping Expense vs Time") +
  stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
ggplot(data = spending_data,
       aes(x = Index, y = Entertainment)) +
  xlab("Month No. (Ranges from 1-24)") +
  ylab("Entertainment Expense (USD)") +
  geom_point() +
  geom_line() +
  ggtitle("Entertainment Expense vs Time") +
  stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
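The six per-category plots above share one structure, so they could also be drawn as a single faceted figure. The sketch below is one possible approach, not the method used in this report: it reshapes the table to long format with `melt()` from reshape2 and facets by category. The `demo` data frame is a synthetic stand-in with made-up values, and the `Category` and `Expense` column names are chosen here for illustration.

```r
library(ggplot2)
library(reshape2)

# Synthetic stand-in (made-up values) using the column names from this report.
set.seed(1)
demo <- data.frame(
  Index = 1:24,
  Rent = rep(1000, 24),
  Medical.Insurance = rep(200, 24),
  Grocery = round(runif(24, 150, 400)),
  Transportation = round(runif(24, 50, 150)),
  Shopping = round(runif(24, 0, 500)),
  Entertainment = round(runif(24, 0, 200))
)
category_cols <- c("Rent", "Medical.Insurance", "Grocery",
                   "Transportation", "Shopping", "Entertainment")

# Wide to long: one row per (month index, category) pair.
demo_long <- melt(demo, id.vars = "Index", measure.vars = category_cols,
                  variable.name = "Category", value.name = "Expense")

# One panel per category, each with its own y-axis range.
p <- ggplot(demo_long, aes(x = Index, y = Expense)) +
  geom_point() +
  geom_line() +
  stat_smooth(method = "loess", formula = y ~ x, colour = "red") +
  facet_wrap(~ Category, scales = "free_y") +
  xlab("Month No. (Ranges from 1-24)") +
  ylab("Expense (USD)") +
  ggtitle("Expense vs Time by Category")
```

Printing `p` renders all six panels at once. Replacing `demo` with `spending_data` should give the same view of the real data, assuming the columns match.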
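# Keep only the variable expenses by dropping fixed costs and helper columns.
spending_by_month_copy <- spending_by_month
spending_by_month_copy$Rent <- NULL
spending_by_month_copy$Medical.Insurance <- NULL
spending_by_month_copy$Index <- NULL
spending_by_month_copy$Sum_By_Month <- NULL
# Exclude Month from the row total so that only expense columns are summed.
sum_by_month <- rowSums(spending_by_month_copy[, names(spending_by_month_copy) != "Month", drop = FALSE])
spending_by_month_copy$Sum_By_Month <- sum_by_month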
ggplot(data = spending_by_month_copy,
       aes(x = Month, y = Sum_By_Month)) +
  xlab("Month") +
  ylab("Expense (USD)") +
  geom_point() +
  geom_line() +
  ggtitle("Variability of expense (excluding Rent and Medical Insurance) vs Time") +
  stat_smooth(method = "loess", formula = y ~ x, size = 0.7, col = "red")
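The project brief asks for at least one interactive graphical element. One common way to get one from an existing ggplot is `ggplotly()` from the plotly package. The sketch below assumes plotly is installed and uses a synthetic `demo_monthly` table with made-up values; it does not claim to be the interactive plot used in this report.

```r
library(ggplot2)
library(plotly)

# Synthetic monthly totals (made-up values), standing in for
# spending_by_month_copy.
set.seed(1)
demo_monthly <- data.frame(
  Month = 1:12,
  Sum_By_Month = round(runif(12, 500, 1500))
)

p <- ggplot(demo_monthly, aes(x = Month, y = Sum_By_Month)) +
  geom_point() +
  geom_line() +
  xlab("Month") +
  ylab("Expense (USD)") +
  ggtitle("Expense (excluding Rent and Medical Insurance) by Month")

# ggplotly() converts a ggplot object into an interactive plotly widget
# with hover tooltips, zoom, and pan.
interactive_p <- ggplotly(p)
```

Printing `interactive_p` in an R Markdown HTML document or the RStudio viewer displays the widget.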