In this exercise, I will explore data on Airbnb listings in Los Angeles. The first step is to import the data from a CSV file. I will begin by examining rental price by room type.

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggmap)

LAlistings<-read.csv("C:/Users/bcole/Documents/Lalistings.csv")  
ggplot(LAlistings, aes(x = RoomType, y = Price, color = RoomType)) +
  geom_boxplot() +
  labs(x = "Type of Rental", y = "Price", title = "Price of Rental by Type") +
  scale_y_log10() +  # log scale; prices of 0 become -Inf and are dropped
  theme(plot.title = element_text(hjust = 0.5))
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 18 rows containing non-finite values (stat_boxplot).

This boxplot compares prices across the three rental types. As one might expect, prices are higher for an entire home than for a private or shared room. Of note are the extremely high outlier prices; these values may represent the price of renting for a month or longer.
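The "infinite values" warning printed with this plot most likely comes from listings with a price of 0, since `log10(0)` is `-Inf`. The following is a minimal sketch, using a made-up `toy` data frame rather than the real listings, of how such rows could be dropped before plotting on a log scale:

```r
# Hypothetical toy data; the room type labels and prices are made up
toy <- data.frame(
  RoomType = c("Entire home/apt", "Private room", "Shared room", "Private room"),
  Price    = c(250, 80, 0, 60)
)

# log10(0) is -Inf, which ggplot2 treats as a non-finite value and removes
log_price <- log10(toy$Price)
print(log_price)

# Keeping only strictly positive prices avoids the non-finite values
toy_positive <- toy[toy$Price > 0, ]
print(nrow(toy_positive))
```

Whether to drop zero-price listings or treat them as data errors is a judgment call that depends on how the dataset was collected.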

For the next plot, I compare the minimum number of nights required for a rental with its price. Perhaps listings with shorter minimum stays are more expensive.

ggplot(LAlistings, aes(x = Minimumnights, y = Price)) +
  geom_point(color = "lightskyblue", size = 1) +
  geom_smooth() +
  labs(x = "Minimum Rental Length (nights)", y = "Price", title = "Price vs. Rental Length") +
  scale_x_log10() +
  scale_y_log10() +
  theme(plot.title = element_text(hjust = 0.5))
## Warning: Transformation introduced infinite values in continuous y-axis

## Warning: Transformation introduced infinite values in continuous y-axis
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).

Unfortunately, this plot shows no clear trend. Prices are widely spread at every minimum rental length. Prices may instead be more strongly linked to neighborhood or home quality.
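One way to put a number on the absence of a trend is a rank correlation. The sketch below uses simulated data (the `toy` data frame and its distributions are assumptions, not the Airbnb listings) and computes Spearman's rho, which depends only on ranks and is therefore unchanged by the log transforms used in the plot:

```r
# Hypothetical simulated data standing in for Minimumnights and Price
set.seed(1)
toy <- data.frame(
  Minimumnights = sample(1:30, 200, replace = TRUE),
  Price         = exp(rnorm(200, mean = 4.5, sd = 0.8))
)

# Spearman's rho on the raw values
rho_raw <- cor(toy$Minimumnights, toy$Price, method = "spearman")

# The same statistic on log-transformed values; ranks are preserved by log10
rho_log <- cor(log10(toy$Minimumnights), log10(toy$Price), method = "spearman")

print(c(raw = rho_raw, log = rho_log))
```

A value of rho near 0 would be consistent with the scattered cloud of points seen in the plot.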

In the next step, I compared the prices of rental properties in two disparate Los Angeles County communities.

Comparison <- LAlistings %>%
  filter(neighbourhood %in% c("Carson", "Beverly Hills"))
ggplot(Comparison, aes(x = Minimumnights, y = Price, color = neighbourhood)) +
  geom_point(size = 1.2) +
  geom_smooth() +
  labs(x = "Minimum Rental Length (nights)", y = "Price",
       title = "Price vs. Rental Length: Beverly Hills and Carson") +
  scale_x_log10() +
  scale_y_log10() +
  scale_color_manual(values = c("blue", "black")) +
  theme(plot.title = element_text(hjust = 0.5))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at -0.0073856
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 0.30842
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 0.090619
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at -0.0073856
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 0.30842
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other
## near singularities as well. 0.090619

This chart shows the gap between a very wealthy neighborhood, Beverly Hills, and a poorer neighborhood, Carson. Prices tend to be much higher in Beverly Hills than in Carson. Also of note is the larger number of points for Beverly Hills, meaning that many more listings are located there. The data show that there is in fact a large gap between the two communities, making this a case that confirms our “Gap Instinct”.
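The two observations above, more listings and higher prices in Beverly Hills, can be checked with a grouped summary. This is a minimal sketch on made-up values (the `toy` data frame is hypothetical); the same pipeline could be applied to `Comparison`, assuming it has the `neighbourhood` and `Price` columns used in the plot:

```r
library(dplyr)

# Hypothetical toy data; prices are made up for illustration
toy <- data.frame(
  neighbourhood = c("Beverly Hills", "Beverly Hills", "Beverly Hills", "Carson", "Carson"),
  Price         = c(400, 250, 900, 70, 110)
)

# Number of listings and median price per neighbourhood
summary_tbl <- toy %>%
  group_by(neighbourhood) %>%
  summarise(listings = n(), median_price = median(Price))

print(summary_tbl)
```

The median is used here rather than the mean because a few very expensive listings would otherwise dominate the comparison.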

In the next step, I will examine the distribution of rental prices. Based on a visual examination of the data, outliers are to be expected.

Comparison <- LAlistings %>%
  filter(neighbourhood %in% c("Santa Monica", "Beverly Hills", "West Los Angeles", "Culver City", "Bel Air", "Hollywood", "Hollywood Hills", "Venice", "Beverly Grove", "Malibu")) %>%
  mutate(pricemean = mean(Price)) %>%
  # Z-score: distance from the overall mean in standard deviations, rounded to 2 places
  mutate(pricezscore = round((Price - pricemean) / sd(Price), 2))
ggplot(Comparison, aes(x = neighbourhood, y = pricezscore)) +
  geom_boxplot() +
  labs(x = "Neighborhood", y = "Price Z-score", title = "Price of Rental by Neighborhood") +
  coord_flip() +
  ylim(-1, 5) +  # rows with Z-scores outside this range are removed
  theme(plot.title = element_text(hjust = 0.5))
## Warning: Removed 63 rows containing non-finite values (stat_boxplot).

This chart shows price Z-scores for the selected neighborhoods near UCLA. The median listing in most of these neighborhoods falls below the overall mean price (a Z-score below 0). This means that prices do not follow a straight line or a normal curve; instead, the distribution appears positively skewed, with a few very expensive listings pulling the mean upward. This goes against the idea of a straight-line instinct.
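To make the link between skew and negative median Z-scores concrete, here is a small worked sketch. The `price` vector is made up for illustration; the Z-score is computed as the distance from the mean divided by the standard deviation:

```r
# Hypothetical right-skewed prices: most are modest, one is very large
price <- c(60, 80, 90, 100, 120, 150, 2000)

# Z-score: (value - mean) / standard deviation, rounded to 2 decimal places
z <- round((price - mean(price)) / sd(price), 2)
print(z)

# In a right-skewed sample the mean exceeds the median,
# so the Z-score of the median is negative
median_z <- (median(price) - mean(price)) / sd(price)
print(median_z < 0)
```

A single extreme value is enough to pull the mean well above the median, which is the pattern the boxplots suggest.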

In the next visualization, I will use ggmap to create a map of rental prices in Malibu. This map will help to demonstrate the extreme outliers that exist in this dataset.

x<-get_map(location= c(lon=-118.75, lat=34.064959), color="color", source="google", maptype="roadmap", zoom=11)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=34.064959,-118.75&zoom=11&size=640x640&scale=2&maptype=roadmap&language=en-EN&sensor=false
ggmap(x) +
  geom_point(data = Comparison, mapping = aes(x = longitude, y = latitude, color = Price)) +
  ggtitle("Price of Listings in Malibu") +
  theme(plot.title = element_text(hjust = 0.5))
## Warning: Removed 9405 rows containing missing values (geom_point).

This map of Airbnb rental prices shows the extremely high values found in Malibu, a wealthy community. Most prices are below a few thousand dollars, but prices of up to $10,000 can be seen on this map.
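The large number of removed rows in the warning is probably due to listings from the other neighborhoods falling outside the map tile. A minimal sketch, on made-up coordinates and prices (the `toy` data frame is hypothetical), of restricting the data to Malibu before plotting:

```r
library(dplyr)

# Hypothetical toy data; coordinates and prices are made up
toy <- data.frame(
  neighbourhood = c("Malibu", "Malibu", "Venice", "Hollywood"),
  longitude     = c(-118.78, -118.69, -118.47, -118.33),
  latitude      = c(34.03, 34.04, 33.99, 34.10),
  Price         = c(1200, 450, 180, 95)
)

# Keep only Malibu rows so that every point belongs on the Malibu map
malibu <- toy %>%
  filter(neighbourhood == "Malibu")

print(malibu)
```

Filtering first also keeps the color scale from being stretched by prices in neighborhoods that are not shown.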