Motivation

New York City is one of the most famous places in the world. It draws millions of tourists every year, which boosts the local economy. NYC is therefore one of the hottest markets for Airbnb. Compared with nearby cities, New York City is easy to get around, with extensive subway coverage, various bus lines, and Citi Bike. Self-guided travelers can choose to walk, take public transportation, or ride bikes to their destinations within the city. At the same time, the crime rate of a neighborhood is also a concern for travelers. Based on these considerations, our team wants to analyze the relationship between Walk Scores, crime records, and Airbnb review scores in New York City.

To perform the analysis, we will obtain Airbnb data for the NYC area from Inside Airbnb, walk scores for the Airbnb locations from Walk Score, and crime records from NYPD Arrest Open Data, and then analyze the datasets to see whether they are related to each other. We will use histograms and maps to present our results.

We may also build a Shiny app using our datasets. Given a desired lodging location, the Shiny app will return a list of Airbnb choices with the walk score and crime records nearby.

Datasets

We will use three datasets.

Inside Airbnb

Inside Airbnb is an independent, non-commercial, open source data tool built from publicly available information about Airbnb's listings. It provides detailed listings data for well-known cities in countries around the world.

Inside Airbnb: http://insideairbnb.com/get-the-data.html

The dataset we use covers the New York City, NY area. It was scraped from Airbnb by Inside Airbnb on 09/13/2019. Each record represents one home in the New York City area that was available on 09/13/2019. It includes information about the available stay, such as its location, property type, room type, price, and review score rating, among others. The location information includes neighborhood, city, state, zip code, latitude, and longitude. The dataset is freely downloadable as a zipped .csv file. It has about 50 thousand rows and more than 50 columns.

Walk Score

Walk Score is a publicly accessible website that promotes walkable neighborhoods. Walkable neighborhoods are considered one of the simplest and best solutions for the environment. The site scores neighborhoods by evaluating their walkability and transportation, to help people choose where to live. It provides over 20 million scores in total.

Walk Score: https://www.walkscore.com/

From the Inside Airbnb NYC dataset, we have the neighborhoods and zip codes of the available homes across the NYC area. We use web scraping to extract the walk score, transit score, and bike score for each Airbnb home location from the Walk Score website, so every Airbnb home record has a corresponding walk score, transit score, and bike score. This dataset has about 5 to 6 columns, including neighborhood, zip code, state (NY), and the 3 corresponding scores. It has fewer rows than the Airbnb NYC dataset because we eliminated the duplicate locations from the Airbnb NYC dataset.
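The de-duplication of locations described above can be sketched as follows. This is only an illustration in Python (the project itself plans to work in R). The column names `neighbourhood` and `zipcode` are assumptions for the example, not confirmed names from the Inside Airbnb file.

```python
def unique_locations(listings):
    """Return the distinct (neighbourhood, zipcode) pairs in first-seen order.

    `listings` is a list of dicts; the keys "neighbourhood" and "zipcode"
    are assumed names used only for this sketch.
    """
    seen = set()
    unique = []
    for row in listings:
        name = row["neighbourhood"].strip()
        zipcode = row["zipcode"].strip()
        # Compare case-insensitively so "Harlem" and "harlem" count as one place.
        key = (name.lower(), zipcode)
        if key not in seen:
            seen.add(key)
            unique.append({"neighbourhood": name, "zipcode": zipcode})
    return unique
```

Scraping only the unique pairs keeps the number of requests to the Walk Score website far below the roughly 50 thousand listing rows.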

NYPD Arrest Open Data

NYC Open Data has two sets of NYPD Arrest Data: a historical one covering 2006 through the end of 2018, and one covering the current year (1/1 - 9/30/2019). Because our Airbnb dataset is dated 09/13/2019, we will use the NYPD Arrest Data (Year to Date) dataset for our analysis instead of the historical dataset.

NYPD Arrest Open Data (Year to Date): https://data.cityofnewyork.us/Public-Safety/NYPD-Arrest-Data-Year-to-Date-/uip8-fykc

NYPD Arrest Data (Year to Date) lists every arrest in NYC during the current year (01/01/2019 - 09/30/2019). The data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents an arrest effected in NYC by the NYPD and includes information about the type of crime and the location and time of enforcement. Information related to suspect demographics is also included. This dataset has about 168 thousand rows and 18 columns. We will access it through the API.
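As a rough sketch of the API access, the snippet below builds paged request URLs for the dataset. It is written in Python for illustration (the project plans to use R) and makes no network calls. The `/resource/uip8-fykc.json` endpoint follows the usual Socrata pattern for the dataset ID in the link above, and `$limit`, `$offset`, and `$where` are standard Socrata query parameters. The field name `arrest_date` in the example filter and the page size are assumptions.

```python
from urllib.parse import urlencode

# Socrata-style endpoint derived from the dataset ID in the link above.
BASE_URL = "https://data.cityofnewyork.us/resource/uip8-fykc.json"


def build_arrest_query(limit=50000, offset=0, where=None):
    """Build one request URL; `where` is an optional SoQL filter string."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where
    return BASE_URL + "?" + urlencode(params)


def page_offsets(total_rows, page_size):
    """Offsets needed to page through `total_rows` rows, `page_size` at a time."""
    return list(range(0, total_rows, page_size))


# About 168 thousand rows at an assumed 50,000 rows per page gives four requests.
urls = [build_arrest_query(limit=50000, offset=o) for o in page_offsets(168000, 50000)]

# Hypothetical filter: only arrests up to the Airbnb scrape date.
filtered = build_arrest_query(limit=1000, where="arrest_date <= '2019-09-13'")
```

Each URL can then be fetched with any HTTP client and parsed as JSON.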

Workflow

Process Data

  • Extract the datasets from the three websites using .csv download, web scraping, and the API.

Tidy / Transform

  • Import those datasets into R

  • Tidy the data as necessary and eliminate unwanted columns

  • To combine the data sources, we need to link the neighborhoods from the Airbnb dataset with the latitude and longitude information from the NYPD Arrest Data.
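One possible way to link the two sources, not necessarily the one the team will adopt, is by distance rather than by neighborhood name: the Airbnb data also carries latitude and longitude, so we could count the arrests within a fixed radius of each listing. The sketch below uses the haversine great-circle formula in Python (for illustration only; the project plans to use R). The dictionary keys and the 0.5 km radius are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def arrests_within(listing, arrests, radius_km=0.5):
    """Count arrests no farther than `radius_km` from one listing.

    Both inputs use the assumed keys "latitude" and "longitude".
    """
    return sum(
        1
        for arrest in arrests
        if haversine_km(
            listing["latitude"], listing["longitude"],
            arrest["latitude"], arrest["longitude"],
        ) <= radius_km
    )
```

Comparing every listing with every arrest is fine for a small sample, but with about 50 thousand listings and 168 thousand arrests it would be slow, so a spatial index or a grid of cells would likely be needed on the full data.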

Analyze

  • We will try to analyze and/or model the datasets to find out how walk scores and crime records relate to the review scores on Airbnb.

  • We may also study the Airbnb prices versus review scores.
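The proposal does not fix a particular method for these comparisons. As one simple starting point, assuming a linear relationship is of interest, the Pearson correlation coefficient between two numeric columns (for example walk score and review score, or price and review score) could be computed as sketched below. The code is Python for illustration only (the project plans to use R), and any input values passed to it here would be made up rather than taken from the data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / sqrt(sxx * syy)
```

A value near 1 or -1 would suggest a strong linear relationship, while a value near 0 would suggest little linear relationship; it says nothing about causation.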

Present

  • Use tables, charts, and/or maps to present our conclusions.