Mohammad Jafari 3733815 - Ali Eslahi 3702858
Last updated: 27 October, 2019
In the decade since it was launched, the online home rental platform Airbnb has amassed millions of rooms worldwide. Since 2008, guests and hosts have used Airbnb to broaden travel culture and to offer a more personalized way of experiencing the world. Eleven years on, Airbnb’s site lists more than six million rooms, flats and houses in more than 81,000 cities across the globe. On average, two million people rest their heads in an Airbnb property each night, half a billion since 2008. In this assignment, we investigate the prices of Airbnb accommodation across the neighbourhoods of New York City in 2019. Furthermore, by comparing prices across areas, we can assess how popular properties are, form hypotheses about them, and test those hypotheses.
The data used here come from Airbnb listings in New York City, USA, in 2019. Each listing records the following:
• Pricing
• Neighbourhood
• Latitude
• Longitude
• Room type
• Minimum nights
• Last review
• Availability
Travellers often struggle to find accommodation for a short trip. Because much travel to New York is for business, the commute to the central business district is an important factor in choosing suitable accommodation. In addition, we use a hypothesis test to examine the average price of a private room in Manhattan.
We have gathered our data from https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data/downloads/AB_NYC_2019.csv/3
For our purpose we have chosen the variables neighbourhood_group and price, and we use room_type to split the listings. Prices are in US dollars per night.
id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2539 | Clean & quiet apt home by the park | 2787 | John | Brooklyn | Kensington | 40.64749 | -73.97237 | Private room | 149 | 1 | 9 | 2018-10-19 | 0.21 | 6 | 365 |
2595 | Skylit Midtown Castle | 2845 | Jennifer | Manhattan | Midtown | 40.75362 | -73.98377 | Entire home/apt | 225 | 1 | 45 | 2019-05-21 | 0.38 | 2 | 355 |
3647 | THE VILLAGE OF HARLEM….NEW YORK ! | 4632 | Elisabeth | Manhattan | Harlem | 40.80902 | -73.94190 | Private room | 150 | 3 | 0 | NA | NA | 1 | 365 |
3831 | Cozy Entire Floor of Brownstone | 4869 | LisaRoxanne | Brooklyn | Clinton Hill | 40.68514 | -73.95976 | Entire home/apt | 89 | 1 | 270 | 2019-07-05 | 4.64 | 1 | 194 |
5022 | Entire Apt: Spacious Studio/Loft by central park | 7192 | Laura | Manhattan | East Harlem | 40.79851 | -73.94399 | Entire home/apt | 80 | 10 | 9 | 2018-11-19 | 0.10 | 1 | 0 |
5099 | Large Cozy 1 BR Apartment In Midtown East | 7322 | Chris | Manhattan | Murray Hill | 40.74767 | -73.97500 | Entire home/apt | 200 | 3 | 74 | 2019-06-22 | 0.59 | 1 | 129 |
First we plot the frequency distribution of prices to see how much it costs to rent accommodation in NYC.
Because of the outliers shown in the figure below, it is hard to see how much prices vary.
Hence, by applying log10 to the prices we can compress the outliers and quickly estimate the typical price range.
The graph above shows that most properties are priced around 10^2 = 100 USD.
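The plotting code is not shown here, so the following is only a minimal sketch of the idea, assuming base R graphics and a small made-up price vector in place of AB_NYC_2019$price. Zero prices are dropped first because log10(0) is -Inf, and the summaries below show that the minimum price in the data is 0.

```r
# Toy stand-in for AB_NYC_2019$price (hypothetical values, not the real data)
price <- c(0, 35, 50, 60, 70, 85, 100, 120, 150, 200, 450, 10000)

# Drop zero prices before taking logs, since log10(0) is -Inf
positive_price <- price[price > 0]
log_price <- log10(positive_price)

# Side-by-side histograms: raw prices versus log10 prices
op <- par(mfrow = c(1, 2))
hist(positive_price, main = "Price (USD)", xlab = "Price", col = "skyblue")
hist(log_price, main = "log10(Price)", xlab = "log10(Price)", col = "skyblue")
par(op)
```

On the log scale a value of 2 corresponds to 100 USD and a value of 4 to 10,000 USD, so a single extreme listing no longer stretches the whole axis.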
We would like to see how much prices vary depending on whether the listing is a private room or an entire house or apartment. Filtering the data gives:
library(dplyr)  # provides %>% and filter()
Privaterooms <- AB_NYC_2019 %>% filter(room_type == "Private room")
Privaterooms$price %>% summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 50.00 70.00 89.78 95.00 10000.00
Entirehouse <- AB_NYC_2019 %>% filter(room_type == "Entire home/apt")
Entirehouse$price %>% summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 120.0 160.0 211.8 229.0 10000.0
sharedroom <- AB_NYC_2019 %>% filter(room_type == "Shared room")
sharedroom$price %>% summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 33.00 45.00 70.13 75.00 1800.00
Since our data are extremely skewed because of the outliers, we prefer the median price as a more reliable measure than the mean.
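To illustrate why the median is the safer choice here, a small sketch with made-up prices (not taken from the data set) shows how a single extreme listing moves the mean but barely moves the median.

```r
price <- c(50, 60, 70, 80, 95)          # hypothetical nightly prices
price_with_outlier <- c(price, 10000)   # same prices plus one extreme listing

mean(price)                 # 71
median(price)               # 70
mean(price_with_outlier)    # about 1725.8
median(price_with_outlier)  # 75
```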
Based on the median values and our budget, we can choose a suitable type of accommodation.
The neighbourhood is also an important factor in finding a nice place to stay.
##
##                   Bronx Brooklyn Manhattan Queens Staten Island
##   Entire home/apt   379     9559     13199   2096           176
##   Private room      652    10132      7982   3372           188
##   Shared room        60      413       480    198             9
The table above shows the number of listings in each neighbourhood group by room type.
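The call that produced this table is not shown. One common way to build such a cross-tabulation is base R's table(); the sketch below assumes that approach and uses a tiny made-up data frame with the same column names as AB_NYC_2019.

```r
# Hypothetical five-row stand-in for AB_NYC_2019
listings <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Private room",
                "Shared room", "Entire home/apt"),
  neighbourhood_group = c("Manhattan", "Brooklyn", "Manhattan",
                          "Queens", "Manhattan"),
  stringsAsFactors = FALSE
)

# Rows are room types, columns are neighbourhood groups
counts <- table(listings$room_type, listings$neighbourhood_group)
counts
```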
AB_NYC_2019 %>% boxplot(price ~ neighbourhood_group,data = ., main="Box Plot of price vs neighbourhood",
ylab="neighbourhood", xlab="Price",horizontal=TRUE, col = "skyblue")
The box plot above shows that Manhattan and Brooklyn have the most expensive properties compared with the other neighbourhood groups. However, the outliers make it hard to read the median price of each area from this plot, so we summarise each area by room type to obtain the medians.
Entirehouse %>% group_by(neighbourhood_group) %>% summarise(Min = min(price,na.rm = TRUE),
Q1 = quantile(price,probs = .25,na.rm = TRUE),
Median = median(price, na.rm = TRUE),
Q3 = quantile(price,probs = .75,na.rm = TRUE),
Max = max(price,na.rm = TRUE),
Mean = mean(price, na.rm = TRUE),
SD = sd(price, na.rm = TRUE),
n = n(),
Missing = sum(is.na(price)))
Privaterooms %>% group_by(neighbourhood_group) %>% summarise(Min = min(price,na.rm = TRUE),
Q1 = quantile(price,probs = .25,na.rm = TRUE),
Median = median(price, na.rm = TRUE),
Q3 = quantile(price,probs = .75,na.rm = TRUE),
Max = max(price,na.rm = TRUE),
Mean = mean(price, na.rm = TRUE),
SD = sd(price, na.rm = TRUE),
n = n(),
Missing = sum(is.na(price)))
sharedroom %>% group_by(neighbourhood_group) %>% summarise(Min = min(price,na.rm = TRUE),
Q1 = quantile(price,probs = .25,na.rm = TRUE),
Median = median(price, na.rm = TRUE),
Q3 = quantile(price,probs = .75,na.rm = TRUE),
Max = max(price,na.rm = TRUE),
Mean = mean(price, na.rm = TRUE),
SD = sd(price, na.rm = TRUE),
n = n(),
Missing = sum(is.na(price)))
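The three blocks above repeat the same summary for each room type. As an alternative sketch, assuming dplyr 1.0 or later (for the .groups argument), grouping by both room_type and neighbourhood_group gives the same kind of summary in one call. The data frame here is a small made-up stand-in for AB_NYC_2019.

```r
library(dplyr)

# Hypothetical stand-in for AB_NYC_2019
listings <- data.frame(
  room_type = c("Private room", "Private room", "Private room",
                "Entire home/apt", "Entire home/apt", "Entire home/apt"),
  neighbourhood_group = c("Manhattan", "Manhattan", "Brooklyn",
                          "Manhattan", "Brooklyn", "Brooklyn"),
  price = c(100, 120, 60, 200, 150, 170)
)

# One row per (room type, neighbourhood group) combination
price_summary <- listings %>%
  group_by(room_type, neighbourhood_group) %>%
  summarise(
    Median = median(price, na.rm = TRUE),
    Mean = mean(price, na.rm = TRUE),
    n = n(),
    Missing = sum(is.na(price)),
    .groups = "drop"
  )
price_summary
```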
To sharpen the analysis, we narrow the sample to private rooms in Manhattan, using the filter shown with the test code below.
This gives 7982 observations, well above 30, so the sample is large enough for the sample mean to be approximately normal despite the skew, and we can proceed with a one-sample t-test. The statistical hypotheses for the test are as follows:
$H_0: \mu = 116$
$H_A: \mu \neq 116$
m <- filter(AB_NYC_2019, neighbourhood_group == "Manhattan", room_type == "Private room")
t.test(m$price, alternative = "two.sided", mu = 116, conf.level = 0.95)
##
## One Sample t-test
##
## data: m$price
## t = 0.36482, df = 7981, p-value = 0.7153
## alternative hypothesis: true mean is not equal to 116
## 95 percent confidence interval:
## 112.6036 120.9496
## sample estimates:
## mean of x
## 116.7766
The result above shows that the test statistic is t = 0.36482 with 7981 degrees of freedom.
The t-test gives a 95 percent confidence interval for the mean price of 112.60 to 120.95. Since the hypothesised value of 116 lies inside this interval, we fail to reject H0.
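For reference, the standard one-sample formulas behind these numbers are sketched below. The standard error of about 2.13 is not printed in the output; it is backed out from the reported mean and t statistic, so treat it as a derived approximation.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
\quad\Longrightarrow\quad
\frac{s}{\sqrt{n}} = \frac{116.7766 - 116}{0.36482} \approx 2.13

\bar{x} \pm t_{0.975,\, n-1} \cdot \frac{s}{\sqrt{n}}
\approx 116.78 \pm 1.96 \times 2.13
\approx (112.60,\ 120.95)
```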
Likewise, the p-value of 0.72 is greater than 0.05, so we fail to reject the null hypothesis.
The one-sample t-test found that the mean price of private rooms in Manhattan was not statistically significantly different from 116 US dollars.
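The two decision rules used above (the confidence interval and the p-value) always agree for a two-sided t-test at the matching level. The sketch below shows how both can be read from the object returned by t.test(); it uses simulated prices, which are an assumption for illustration and not the Airbnb data.

```r
set.seed(1)
x <- rnorm(200, mean = 116, sd = 30)   # hypothetical prices, not the real data

res <- t.test(x, alternative = "two.sided", mu = 116, conf.level = 0.95)

res$statistic   # t statistic
res$parameter   # degrees of freedom (n - 1)
res$p.value     # two-sided p-value
res$conf.int    # 95 percent confidence interval for the mean

# Decision by p-value and by confidence interval
reject_by_p <- res$p.value < 0.05
mu0_inside_ci <- res$conf.int[1] <= 116 && 116 <= res$conf.int[2]
```

Here reject_by_p is TRUE exactly when mu0_inside_ci is FALSE.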
In general, prices in Brooklyn and Manhattan tend to be higher than in the other regions across all room types.
In each neighbourhood there were some outliers, where hosts offered their property well above the usual price; Manhattan and Brooklyn have more hosts in that category.
Using the latitude and longitude in the data, we could also derive the distance between each property and the business district to better inform the choice of accommodation.
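This distance calculation was not carried out in the analysis. As a rough sketch of how it could be done, the haversine formula gives the great-circle distance between two latitude/longitude points. The function name haversine_km and the reference coordinates for the business district are assumptions made for illustration (a point roughly in Midtown Manhattan); the two listings are the first two rows of the sample table.

```r
# Great-circle distance in kilometres (haversine formula), vectorised over listings
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Assumed reference point for the business district (hypothetical choice)
cbd_lat <- 40.7549
cbd_lon <- -73.9840

# Latitude and longitude of the Kensington and Midtown listings in the sample table
lat <- c(40.64749, 40.75362)
lon <- c(-73.97237, -73.98377)

distance_km <- haversine_km(lat, lon, cbd_lat, cbd_lon)
distance_km
```

With the real data, the same call could be applied to the latitude and longitude columns to add a distance column for each listing.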
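In addition, we could see which neighbourhood area had the best properties in 2019 based on its total number of reviews.
According to the hypothesis test, the data are consistent with a mean price of about 116 US dollars per night for a private room in Manhattan (95% confidence interval 112.60 to 120.95). This describes the average price rather than the chance of finding a room, so a budget of around 116 US dollars matches the typical mean price of a private room in Manhattan.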