---
title: "Visualizing AirBnB LA Data"
output: 
  html_notebook: 
    theme: spacelab
---

Set-up: installing and loading all necessary packages and libraries, then establishing the file path and importing the data frame
```{r}
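# Install treemapify only if it is missing, so re-running the notebook does not reinstall it
if (!requireNamespace("treemapify", quietly = TRUE)) {
  install.packages("treemapify")
}
library(tidyverse)   # attaches ggplot2, dplyr, and tibble
library(treemapify)

# Path to the listings file (specific to this machine)
path <- file.path("/Users/krgr.df/Desktop/GEOG208/Assignment 2/listings.csv")

# Named "listings" rather than "list" to avoid masking base R's list()
listings <- read.csv(path)

# Clean data: drop unused columns, then rename the ten that remain
list_clean <- subset(listings, select = -c(host_id, host_name, neighbourhood_group, minimum_nights, last_review, reviews_per_month))
colnames(list_clean) <- c("id", "desc", "neighborhood", "lat", "long", "room.type", "price", "num.reviews", "list.count", "avail")

head(list_clean)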
```
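
The chunk above renames columns by position, which only works while the file keeps the same columns in the same order. As a hedged alternative, here is a minimal sketch that renames by name instead. The `toy_raw` data frame, its column names, and the `new_names` lookup are all hypothetical stand-ins, not the real listings file.

```{r}
# Hypothetical toy data standing in for the imported listings (columns are invented)
toy_raw <- data.frame(
  id        = 1:2,
  name      = c("Loft", "Studio"),
  room_type = c("Entire home/apt", "Private room"),
  price     = c(150, 80)
)

# Lookup of old name -> new name; columns not listed keep their names
new_names <- c(name = "desc", room_type = "room.type")

# Rename only the columns that appear in the lookup
matched <- names(toy_raw) %in% names(new_names)
names(toy_raw)[matched] <- unname(new_names[names(toy_raw)[matched]])

names(toy_raw)
```

Renaming by name means a reordered or extended file would leave the other columns untouched instead of silently mislabeling them.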

Visualization 1 & 2: 
For the first and second visualizations, I created a line chart and a density plot. The line chart was created with the geom_boxplot() function. (I didn't do anything to further transform the plot, but it shows only lines instead of whole boxplots. It looked good and it communicated my point better. I guess sometimes R "breaks" in favor of the user.) The resulting viz highlights the price range for each room type. From Viz1, it seems like there is some disparity between the rental prices of the three room types. This is where the gap instinct comes in: the gap between the lines makes it seem like there is little to no overlap and that the prices are drastically different.

To combat this instinct, I created a density plot that shows the price ranges of the room types overlaid on top of each other. The resulting plot, Viz2, clearly shows that there is a significant amount of overlap between the rental prices. *After scaling the x values, some of the outliers fell outside the plot area, but their number is negligible (18 of 43,763 observations, about 0.04 percent).
```{r}
##################Viz1&2 | Gap Instinct##################
theme_set(theme_classic())

#Plot rental price by room type
Viz1 <- ggplot(list_clean, aes(x = price, y = room.type, color = room.type)) +
  geom_boxplot() + 
  labs(title="Viz1 : Box Plot(ish)", 
       subtitle="Rental Price vs Room Type",
       x="Rental Price",
       y="Room Type") + 
  theme(legend.position="none")
Viz1

#Using density plot to dispel gap instinct; scaling removed some values but only 18 out of 43,763 (negligible amount, mostly outliers)
Viz2 <- ggplot(list_clean, aes(price)) + 
  geom_density(aes(fill=factor(room.type)), alpha=0.8) +
  scale_x_log10() +
  labs(title="Viz2 : Density plot", 
       subtitle="Rental Price grouped by Room Type",
       x="Rental Price",
       fill="Room Type")
Viz2  
```
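
The overlap that Viz2 shows can also be checked with numbers. The chunk below is a minimal sketch on made-up data (`toy_prices` and every value in it are hypothetical, not taken from the listings file). It computes the price quartiles for each room type, then works out the share of rows that the comment above reports as removed (18 of 43,763). One likely reason rows get removed is that a price of zero has no finite position on a log scale, but that is an assumption about this dataset.

```{r}
# Hypothetical toy data standing in for list_clean (values are invented)
toy_prices <- data.frame(
  room.type = rep(c("Entire home/apt", "Private room", "Shared room"), each = 5),
  price     = c(80, 120, 150, 200, 400,
                50,  90, 130, 170, 250,
                30,  60,  95, 125, 160)
)

# Quartiles of price per room type; ranges that overlap point to overlapping prices
toy_quartiles <- aggregate(price ~ room.type, data = toy_prices,
                           FUN = quantile, probs = c(0.25, 0.5, 0.75))
toy_quartiles

# Share of observations removed by the log scale, using the counts quoted above
dropped_pct <- 100 * 18 / 43763
round(dropped_pct, 3)   # about 0.041 percent
```

If the middle 50 percent of one room type sits inside the range of another, the gap that the lines in Viz1 suggest is smaller than it looks.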

Visualization 3 & 4:
For Viz 3 and 4, I used the geom_treemap() function to create two visualizations from the same data. However, one is filtered, somewhat in accordance with Dr. Rosling's advice on how to beat the size instinct. For both treemaps, the box size is determined by the price, the fill color varies by room type, and the boxes are grouped by neighborhood. Viz3 was created using all the observations and, as you can see, the huge amount of data makes the treemap really difficult to interpret, and a lot of information gets lost.

I was mainly interested in seeing the neighborhoods and rental types that had the highest rents (which means they would have the biggest boxes). After multiple tries with different values, I decided that instead of looking at the rooms that make up 80% of the total rent (36,000+ observations), a reasonable number of observations given this dataset and visualization type would be the top .08 percent (280 observations). The resulting treemap, Viz4, is easier to interpret, and even though there are still unrecognizable squares, the neighborhoods with the most expensive rents are easier to identify.

```{r}
##################Viz3&4 | Size Instinct##################
#Viz3:Crowded Treemap
Viz3 <- ggplot(list_clean, aes(area = price, fill = room.type, subgroup = neighborhood)) +
  geom_treemap() +
  geom_treemap_subgroup_border(color = "white", size = 1) +
  geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 0.5, colour =
                               "white", min.size = 0) +
  labs(title="Viz3 : Busy Treemap", 
       subtitle="Location and Type of High Rent AirBnB Rentals", 
       fill = "Room Type")
Viz3

#Looking at top .08 percent
list_clean <- list_clean[order(-list_clean$price),]                #sort by price, highest first
price.sum <- cumsum(list_clean$price)                              #running total of price
list_clean <- add_column(list_clean, price.sum, .after = "price")  #adding running total
price.percx <- 100*list_clean$price.sum/sum(list_clean$price.sum)  #each running total as a percentage of the sum of all running totals
list_clean <- add_column(list_clean, price.percx, .after = "price.sum")  #adding percx
price.perc <- cumsum(list_clean$price.percx)                             #running total of those percentages
list_clean <- add_column(list_clean, price.perc, .after = "price.percx") #adding cumulative perc
list_clean.08 <- filter(list_clean, price.perc < .08)                    #keeping rows below the .08 cutoff

#Viz4: Only looking at top .08 percent
Viz4 <- ggplot(list_clean.08, aes(area = price, fill = room.type, subgroup = neighborhood)) +
  geom_treemap(color = "cyan") +
  geom_treemap_subgroup_border(color = "deepskyblue4", size = 1) +
  geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 0.5, colour =
                               "darkblue", min.size = 1) + 
  labs(title="Viz4 : Refined Treemap", 
       subtitle="Location and Type of High Rent AirBnB Rentals (top .08 percent)", 
       fill = "Room Type")
Viz4

```
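
For comparison, a more conventional way to pick "the listings that make up the top X percent of total price" is to sort by price, take the running total, and divide by the grand total. The sketch below uses a hypothetical `toy_listings` data frame and a hypothetical 50 percent cutoff. It is not the calculation used for Viz4 above, which applies a second cumulative sum, so the two would not select the same rows.

```{r}
# Hypothetical toy data (ids and prices are invented)
toy_listings <- data.frame(id = 1:5, price = c(500, 300, 100, 60, 40))

# Sort by price, highest first
toy_listings <- toy_listings[order(-toy_listings$price), ]

# Cumulative share of total price, in percent
toy_listings$cum.share <- 100 * cumsum(toy_listings$price) / sum(toy_listings$price)

# Keep the rows that together account for at most 50 percent of total price
toy_top <- toy_listings[toy_listings$cum.share <= 50, ]
toy_top
```

With this definition the last row always reaches 100 percent, which makes the cutoff easy to read as a share of total rent.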

Visualization 5: 
For Viz5, I kept using the geom_treemap() function but on a smaller, more comparative scale. I created three separate treemaps, one for each room type, and populated each with the top 10 rental prices. Scrolling through the visualizations shows which areas are consistently on the high-price list and which ones vary depending on room type.

Viz5.4 takes the 10 highest rental prices across all room types and places them in one treemap.
```{r}
##################Viz5 | Exploring Neighborhoods##################
list_clean_home <- filter(list_clean, list_clean$room.type == "Entire home/apt")
list_clean_home <- top_n(list_clean_home, 10, price)
Viz5.1 <- ggplot(list_clean_home, aes(area = price, fill = room.type, subgroup = neighborhood)) +
  geom_treemap(color = "tomato4") +
  geom_treemap_subgroup_border(color = "white", size = 2) +
  geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 1, colour =
                               "white", min.size = 1) +
  labs(title="Viz5.1: Entire Home Treemap") +
  theme(legend.position="none")

Viz5.1

list_clean_shared <- filter(list_clean, list_clean$room.type == "Shared room")
list_clean_shared <- top_n(list_clean_shared, 10, price)
Viz5.2 <- ggplot(list_clean_shared, aes(area = price, fill = room.type, subgroup = neighborhood)) +
  geom_treemap(color = "wheat4", fill = "springgreen3" ) +
  geom_treemap_subgroup_border(color = "white", size = 2) +
  geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 1, colour =
                               "white", min.size = 1) +
  labs(title="Viz5.2: Shared Room Treemap") +
  theme(legend.position="none")

Viz5.2

list_clean_private <- filter(list_clean, list_clean$room.type == "Private room")
list_clean_private <- top_n(list_clean_private, 10, price)
Viz5.3 <- ggplot(list_clean_private, aes(area = price, fill = room.type, subgroup = neighborhood)) +
  geom_treemap(color = "darkblue", fill = "cornflowerblue") +
  geom_treemap_subgroup_border(color = "white", size = 2) +
  geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 1, colour =
                               "white", min.size = 1) +
  labs(title="Viz5.3: Private Room Treemap") +
  theme(legend.position="none")

Viz5.3

list_clean_top <- top_n(list_clean, 10, price)
Viz5.4 <- ggplot(list_clean_top, aes(area = price, fill = room.type, subgroup = neighborhood)) +
  geom_treemap(color = "red4") +
  geom_treemap_subgroup_border(color = "white", size = 2) +
  geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 1, colour =
                               "white", min.size = 1) +
  labs(title="Viz5.4: Top 10 Across All Room Types Treemap",
       fill = "Room Type")
  
Viz5.4
```
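
The three per-type filters above repeat the same two steps. As a hedged alternative, the sketch below gets the top rows for every room type in one pass with dplyr's `group_by()` and `slice_max()`. The `toy_rooms` data frame and the choice of `n = 2` are hypothetical and only there to keep the example small.

```{r}
library(dplyr)

# Hypothetical toy data (neighborhoods and prices are invented)
toy_rooms <- data.frame(
  neighborhood = c("A", "B", "C", "A", "B", "C"),
  room.type    = c("Private room", "Private room", "Private room",
                   "Shared room", "Shared room", "Shared room"),
  price        = c(100, 250, 175, 40, 90, 65)
)

# Top 2 prices within each room type (ties, if any, are kept)
toy_top2 <- toy_rooms %>%
  group_by(room.type) %>%
  slice_max(price, n = 2) %>%
  ungroup()

toy_top2
```

With the rows for every room type in one data frame, the per-type treemaps could plausibly be drawn as a single faceted plot, for example by adding `facet_wrap(~ room.type)`, though that layout is a suggestion and not what this notebook does.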

