---
title: "THE QUANTIFIED SELF"
author: "Himanshu Soni"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    social: menu
    source: embed
---

```{r setup, include=FALSE}
library(readxl)
library(ggplot2)
library(ggmap)
library(NLP)
library(tm)
library(SnowballC)
library(RColorBrewer)
library(wordcloud)
library(dygraphs)
library(xts)
```

# SUMMARY {.sidebar}
The quantified self dashboard is an effort to learn about and analyze a few, if not all, aspects of my life. In this dashboard I have tried to highlight and answer five questions, and in the process I learned a great deal about myself. Since coming to the United States of America, I have tried to make the most of my free time when I am not working or studying. Traveling, exploring TV shows, enjoying leisure and entertainment activities, and taking care of my health were all part of an effort to grow as a whole and be better than I was yesterday.

The visualizations cover, first, the cities I have traveled to and lived in within the US since 2014; second, the days of the week on which I watched Netflix the most since 2016; third, extending the Netflix analysis, the shows I watched the most; fourth, my spending habits; and lastly, an interactive chart of the daily steps I have taken since January 2018, an insight into my efforts to stay active.

# DASHBOARD {data-icon="fa-map"}

## Column1 {.tabset .tabset-fade}
### CITIES VISITED IN USA

```{r CitiesVisited, echo=FALSE}
US_map <- qmap('Dallas', zoom = 4, source = "google", maptype = "roadmap", scale = 2, extent = "panel")
CitiesVisited <- read_xlsx("/Users/Himanshu/Desktop/Cities.xlsx", sheet=1)
US_map + geom_point(aes(x=lon, y=lat), data = CitiesVisited, col = "dark blue", alpha = 0.5, size = CitiesVisited$Stay) + labs(title="Number Of Cities Visited In USA", x="Longitude", y= "Latitude")
```

### NETFLIX WATCH TIME

```{r Netflix, echo=FALSE}
Netflix <- read_xlsx("/Users/Himanshu/Desktop/Cities.xlsx", sheet=2)
Netflix$Day <- factor(Netflix$Day, levels = c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
ggplot(data = Netflix, aes(x=Day, fill = Day)) + geom_bar(stat = 'count') + scale_fill_brewer() + labs(title="Number Of Titles Watched On A Given Day", x="Day Of The Week", y= "# Of Titles") + theme(legend.position = "right")

```

### NETFLIX WORDCLOUD

```{r Wordcloud, echo=FALSE}
Docs <- Corpus(DirSource("/Users/Himanshu/Desktop/Netflixfiles"))
tdm <- TermDocumentMatrix(Docs,control = list(removePunctuation = TRUE,stopwords = c("the", "season","one", stopwords("english")), removeNumbers = TRUE, tolower = TRUE, stripWhitespace = TRUE))
m <- as.matrix(tdm)
word_freqs <- sort(rowSums(m), decreasing=TRUE)
dm <- data.frame(word=names(word_freqs), freq=word_freqs)
wordcloud(dm$word, dm$freq, random.order=FALSE, colors=brewer.pal(8, "BrBG"))
```

### CREDIT CARD ACTIVITY

```{r Credit, echo=FALSE}
Categories <- read_xlsx("/Users/Himanshu/Desktop/Cities.xlsx", sheet=3)
Categories$Category <- factor(Categories$Category, levels = c("Travel/Leisure", "Restaurants", "Merchandise", "Supermarkets", "Services", "Gasoline"))
ggplot(data = Categories, aes(x=Category, fill = Category)) + geom_bar(stat = 'count') + scale_fill_brewer() + labs(title="Top Categories Of Expense By Number Of Transactions", x="Expense Categories", y= "# Of Transactions")  + theme(legend.position = "bottom")
```

### DAILY ACTIVITY SINCE JAN'18 {data-padding=50}
```{r Activity, echo=FALSE}
Activity <- read_xlsx("/Users/Himanshu/Desktop/Cities.xlsx", sheet=4)
Activity$ActivityDate <- as.Date(Activity$ActivityDate, format = "%m/%d/%Y")
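# Drop the date column from the data so the series is not coerced to character; the dates serve as the index
time_series <- xts(Activity[, setdiff(names(Activity), "ActivityDate")], order.by = Activity$ActivityDate)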
dygraph(time_series) %>% dyRangeSelector()
```

# REPORT {data-icon="fa-list"}

**DATA COLLECTION PROCESS**

For this dashboard, I had to gather data from several places. For the first graphic, I recalled and listed the cities I had lived in and visited, and then used the "geocode" function in R to record their longitudes and latitudes. Next, I scraped my viewing history from my Netflix account and used it to analyze, first, the days on which I used my account and, second, which shows I had watched the most. After this, I collected data from my credit card statements, which categorize each transaction. Lastly, I gathered data from a mobile app I use that tracks the steps I take every day. All of the data was stored and managed in an Excel workbook (.xlsx) so that I could create the visualizations.
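As a rough illustration of the Netflix step, the day of the week can be derived from each time stamp before counting titles per day. This is only a sketch: the `history` data frame, its `Title` and `Date` columns, and the `m/d/Y` date format are assumptions made for the example, not the actual layout of my export. Only the `Day` column name matches the dashboard code.

```r
# Hypothetical viewing history; the real export may use other columns and formats.
history <- data.frame(
  Title = c("Friends", "Archer", "The Office"),
  Date  = c("1/7/2018", "1/9/2018", "1/14/2018"),
  stringsAsFactors = FALSE
)

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Parse the m/d/Y strings, then map the ISO weekday number (1 = Monday)
# to a name. Using "%u" avoids locale-dependent weekday names.
dates <- as.Date(history$Date, format = "%m/%d/%Y")
history$Day <- factor(day_names[as.integer(format(dates, "%u"))],
                      levels = day_names)

# Titles watched per day of the week
table(history$Day)
```

Storing `Day` as a factor with explicit levels keeps the bars in calendar order rather than alphabetical order when the counts are plotted.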

**ANALYSIS**

- *Figure 1:* Using "ggmap" and "geocode", I mapped the cities I have lived in and visited. Sizing the points by my stay made Syracuse, Jacksonville, and New York City stand out as the cities I lived in. It was also interesting to see that I have not explored the West Coast at all and need to plan a few trips to that part of the world soon.

- *Figure 2:* Since the viewing history on Netflix was time stamped, I decided to dig deeper and see on which days I watched the most TV. It was quite intuitive that I watched the most shows on Sundays, since I was mostly at home. It was interesting to see that Tuesdays and Wednesdays had the fewest titles watched, since on those days I have online classes at Harrisburg after work.

- *Figure 3:* The wordcloud showed the TV shows I can watch again and again. While "Friends", "The Office", "Forensic Files", "Malcolm in the Middle", "Archer", and "Supernatural" are all-time favorites, I also explored other shows such as "House of Cards". The graphic also brought out the genres I like in general.

- *Figure 4:* I created a bar chart of my credit card transactions by category. Because I like to travel, most of my spending goes to flights and hotels, and this emerged as the biggest category. In contrast, Supermarkets (I seldom cook at home) and Gasoline (I don't have a car yet and don't rent that often) were among the smallest categories.

- *Figure 5:* I used "dygraphs" to generate a time-series chart summarizing the steps I have taken since January 2018. It was quite fascinating to see that I averaged about 8,000 steps a day, an impressive feat for a couch potato. The interactive range selector let me focus on specific periods during this time and check whether I had burned enough calories.
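The daily average mentioned for Figure 5 can be checked with a few lines of base R. The sketch below uses made-up values, and the `Steps` column name is an assumption (only `ActivityDate` appears in the dashboard code). Filtering on a date window is a rough, non-interactive analogue of narrowing the range selector.

```r
# Hypothetical step log; values and the Steps column name are illustrative only.
activity <- data.frame(
  ActivityDate = as.Date(c("2018-01-01", "2018-01-02",
                           "2018-01-03", "2018-01-04")),
  Steps = c(6000, 9000, 10000, 7000)
)

# Overall daily average across the whole log
overall_mean <- mean(activity$Steps)

# Average within a chosen window, similar to zooming with the range selector
in_window <- activity$ActivityDate >= as.Date("2018-01-02") &
  activity$ActivityDate <= as.Date("2018-01-03")
window_mean <- mean(activity$Steps[in_window])

overall_mean
window_mean
```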

**CONCLUSION**

In conclusion, this dashboard has really opened my eyes to where I spend most of my energy, money, and time. Going forward, it will help me track and use the resources I have in a better way. Thank you for taking some time to take a peek into my life.