This is a summary of an analysis of Airbnb data describing rentals in New York City. The analysis demonstrates basic data cleaning and basic statistical analysis in R.
The data provided by Airbnb was easy to import into a data frame. The variables being evaluated are the price and the neighbourhood group.
setwd("D:/GCU/VeryLargeInformationSystems/ClassActivities/9-10-2020")
data = read.csv("AB_NYC_2019.csv", header=TRUE)
print(head(data, 10))
## id name host_id host_name
## 1 2539 Clean & quiet apt home by the park 2787 John
## 2 2595 Skylit Midtown Castle 2845 Jennifer
## 3 3647 THE VILLAGE OF HARLEM....NEW YORK ! 4632 Elisabeth
## 4 3831 Cozy Entire Floor of Brownstone 4869 LisaRoxanne
## 5 5022 Entire Apt: Spacious Studio/Loft by central park 7192 Laura
## 6 5099 Large Cozy 1 BR Apartment In Midtown East 7322 Chris
## 7 5121 BlissArtsSpace! 7356 Garon
## 8 5178 Large Furnished Room Near B'way 8967 Shunichi
## 9 5203 Cozy Clean Guest Room - Family Apt 7490 MaryEllen
## 10 5238 Cute & Cozy Lower East Side 1 bdrm 7549 Ben
## neighbourhood_group neighbourhood latitude longitude room_type
## 1 Brooklyn Kensington 40.64749 -73.97237 Private room
## 2 Manhattan Midtown 40.75362 -73.98377 Entire home/apt
## 3 Manhattan Harlem 40.80902 -73.94190 Private room
## 4 Brooklyn Clinton Hill 40.68514 -73.95976 Entire home/apt
## 5 Manhattan East Harlem 40.79851 -73.94399 Entire home/apt
## 6 Manhattan Murray Hill 40.74767 -73.97500 Entire home/apt
## 7 Brooklyn Bedford-Stuyvesant 40.68688 -73.95596 Private room
## 8 Manhattan Hell's Kitchen 40.76489 -73.98493 Private room
## 9 Manhattan Upper West Side 40.80178 -73.96723 Private room
## 10 Manhattan Chinatown 40.71344 -73.99037 Entire home/apt
## price minimum_nights number_of_reviews last_review reviews_per_month
## 1 149 1 9 2018-10-19 0.21
## 2 225 1 45 2019-05-21 0.38
## 3 150 3 0 NA
## 4 89 1 270 2019-07-05 4.64
## 5 80 10 9 2018-11-19 0.10
## 6 200 3 74 2019-06-22 0.59
## 7 60 45 49 2017-10-05 0.40
## 8 79 2 430 2019-06-24 3.47
## 9 79 2 118 2017-07-21 0.99
## 10 150 1 160 2019-06-09 1.33
## calculated_host_listings_count availability_365
## 1 6 365
## 2 2 355
## 3 1 365
## 4 1 194
## 5 1 0
## 6 1 129
## 7 1 0
## 8 1 220
## 9 1 0
## 10 4 188
A basic summary of all of the data is displayed below, preceded by the variance of the price.
var(data$price)
## [1] 57674.03
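The variance is expressed in squared dollars, so the standard deviation is usually easier to interpret. Taking the square root of the reported variance gives roughly $240, which is large compared with the median price and suggests a wide spread. The following is a minimal sketch of that relation; `toy_price` is a small made-up vector used only for illustration and is not taken from the dataset.

```r
# Hypothetical nightly prices, made up for illustration (not from the dataset)
toy_price <- c(60, 79, 80, 89, 149, 150, 200, 225)

price_var <- var(toy_price)  # sample variance, in squared dollars
price_sd  <- sd(toy_price)   # sample standard deviation, in dollars

# sd() is the square root of var()
print(all.equal(price_sd, sqrt(price_var)))

# The same relation applied to the variance reported above
reported_sd <- sqrt(57674.03)
print(round(reported_sd, 2))
```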
summary(data)
## id name host_id host_name
## Min. : 2539 Length:48895 Min. : 2438 Length:48895
## 1st Qu.: 9471945 Class :character 1st Qu.: 7822033 Class :character
## Median :19677284 Mode :character Median : 30793816 Mode :character
## Mean :19017143 Mean : 67620011
## 3rd Qu.:29152178 3rd Qu.:107434423
## Max. :36487245 Max. :274321313
##
## neighbourhood_group neighbourhood latitude longitude
## Length:48895 Length:48895 Min. :40.50 Min. :-74.24
## Class :character Class :character 1st Qu.:40.69 1st Qu.:-73.98
## Mode :character Mode :character Median :40.72 Median :-73.96
## Mean :40.73 Mean :-73.95
## 3rd Qu.:40.76 3rd Qu.:-73.94
## Max. :40.91 Max. :-73.71
##
## room_type price minimum_nights number_of_reviews
## Length:48895 Min. : 0.0 Min. : 1.00 Min. : 0.00
## Class :character 1st Qu.: 69.0 1st Qu.: 1.00 1st Qu.: 1.00
## Mode :character Median : 106.0 Median : 3.00 Median : 5.00
## Mean : 152.7 Mean : 7.03 Mean : 23.27
## 3rd Qu.: 175.0 3rd Qu.: 5.00 3rd Qu.: 24.00
## Max. :10000.0 Max. :1250.00 Max. :629.00
##
## last_review reviews_per_month calculated_host_listings_count
## Length:48895 Min. : 0.010 Min. : 1.000
## Class :character 1st Qu.: 0.190 1st Qu.: 1.000
## Mode :character Median : 0.720 Median : 1.000
## Mean : 1.373 Mean : 7.144
## 3rd Qu.: 2.020 3rd Qu.: 2.000
## Max. :58.500 Max. :327.000
## NA's :10052
## availability_365
## Min. : 0.0
## 1st Qu.: 0.0
## Median : 45.0
## Mean :112.8
## 3rd Qu.:227.0
## Max. :365.0
##
A basic summary of the price data is displayed below.
summary(data$price)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 69.0 106.0 152.7 175.0 10000.0
The data is cleaned by removing outliers from the price, where an outlier is any value that boxplot() flags as lying beyond the whiskers. Once the data has been cleaned, it is drawn as a box plot.
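# Values beyond the boxplot whiskers are treated as outliers
outliers <- boxplot(data$price, plot=FALSE)$out
price <- data$price
# Logical indexing stays correct even when no outliers are found
price <- price[!(price %in% outliers)]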
summary(price)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 65 100 120 159 334
data_frame = data.frame(data$price, data$neighbourhood_group)
boxplot(price, main="Price Of Rentals in NY", ylab = "Price", xlab = "Rent")
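By default, boxplot() flags values that lie more than 1.5 times the interquartile range beyond the box. With the quartiles reported earlier, the range is 175 - 69 = 106, so the upper fence is 175 + 1.5 * 106 = 334, which matches the maximum of the cleaned prices. The lower fence is 69 - 159 = -90, so only high prices are removed. The sketch below reproduces that rule by hand on `toy_price`, a made-up vector that is not from the dataset. It assumes the default `coef = 1.5` and uses Tukey's hinges from fivenum(), which can differ slightly from the quartiles printed by summary().

```r
# Hypothetical prices with two extreme values, made up for illustration
toy_price <- c(40, 55, 60, 70, 80, 95, 100, 120, 150, 175, 900, 10000)

# Tukey's hinges are elements 2 and 4 of the five-number summary
hinges <- fivenum(toy_price)[c(2, 4)]
iqr    <- diff(hinges)

# Fences at 1.5 * IQR beyond the hinges (the boxplot default, coef = 1.5)
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

# Values outside the fences should match what boxplot() flags
manual_out  <- toy_price[toy_price < lower_fence | toy_price > upper_fence]
boxplot_out <- boxplot(toy_price, plot = FALSE)$out

# Drop the flagged values with logical indexing
cleaned <- toy_price[!(toy_price %in% boxplot_out)]

# Same arithmetic with the quartiles reported by summary(data$price)
reported_upper <- 175 + 1.5 * (175 - 69)

print(manual_out)
print(cleaned)
print(reported_upper)
```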
New York City is an expensive place, so some rentals have very high nightly fees. The mean with the outliers included ($152.70) is about $33 higher than the mean with the outliers excluded ($120). Without the outliers, the average nightly cost of an Airbnb in New York City is about $120.
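The summaries above also show that the median moves far less than the mean when the outliers are removed (106 to 100, versus 152.7 to 120), because the mean is pulled upward by a few very expensive listings. As a rough illustration of that effect, the sketch below compares both statistics before and after removing outliers. `toy_price` and the `comparison` data frame are hypothetical names, and the values are made up, not drawn from the dataset.

```r
# Hypothetical prices with two extreme values, made up for illustration
toy_price <- c(40, 55, 60, 70, 80, 95, 100, 120, 150, 175, 900, 10000)

# Flag outliers with the same 1.5 * IQR rule that boxplot() uses
outliers <- boxplot.stats(toy_price)$out
cleaned  <- toy_price[!(toy_price %in% outliers)]

# Compare mean and median with and without the flagged values
comparison <- data.frame(
  statistic        = c("mean", "median"),
  with_outliers    = c(mean(toy_price), median(toy_price)),
  without_outliers = c(mean(cleaned), median(cleaned))
)
comparison$difference <- comparison$with_outliers - comparison$without_outliers

print(comparison)
```

In this toy example the mean shifts by several hundred dollars while the median shifts by only ten, which is the same pattern, in exaggerated form, as in the price summaries.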