Executive Summary

This is a summary of an analysis of Airbnb data describing rentals in New York City. The analysis demonstrates basic data cleaning and basic statistical analysis using R.

Exploratory Analysis

The data provided by Airbnb was easy to import into a table. The variables being evaluated are price and neighborhood group.

setwd("D:/GCU/VeryLargeInformationSystems/ClassActivities/9-10-2020")

Read Data

data = read.csv("AB_NYC_2019.csv", header=TRUE)
print(head(data, 10))
##      id                                             name host_id   host_name
## 1  2539               Clean & quiet apt home by the park    2787        John
## 2  2595                            Skylit Midtown Castle    2845    Jennifer
## 3  3647              THE VILLAGE OF HARLEM....NEW YORK !    4632   Elisabeth
## 4  3831                  Cozy Entire Floor of Brownstone    4869 LisaRoxanne
## 5  5022 Entire Apt: Spacious Studio/Loft by central park    7192       Laura
## 6  5099        Large Cozy 1 BR Apartment In Midtown East    7322       Chris
## 7  5121                                  BlissArtsSpace!    7356       Garon
## 8  5178                 Large Furnished Room Near B'way     8967    Shunichi
## 9  5203               Cozy Clean Guest Room - Family Apt    7490   MaryEllen
## 10 5238               Cute & Cozy Lower East Side 1 bdrm    7549         Ben
##    neighbourhood_group      neighbourhood latitude longitude       room_type
## 1             Brooklyn         Kensington 40.64749 -73.97237    Private room
## 2            Manhattan            Midtown 40.75362 -73.98377 Entire home/apt
## 3            Manhattan             Harlem 40.80902 -73.94190    Private room
## 4             Brooklyn       Clinton Hill 40.68514 -73.95976 Entire home/apt
## 5            Manhattan        East Harlem 40.79851 -73.94399 Entire home/apt
## 6            Manhattan        Murray Hill 40.74767 -73.97500 Entire home/apt
## 7             Brooklyn Bedford-Stuyvesant 40.68688 -73.95596    Private room
## 8            Manhattan     Hell's Kitchen 40.76489 -73.98493    Private room
## 9            Manhattan    Upper West Side 40.80178 -73.96723    Private room
## 10           Manhattan          Chinatown 40.71344 -73.99037 Entire home/apt
##    price minimum_nights number_of_reviews last_review reviews_per_month
## 1    149              1                 9  2018-10-19              0.21
## 2    225              1                45  2019-05-21              0.38
## 3    150              3                 0                            NA
## 4     89              1               270  2019-07-05              4.64
## 5     80             10                 9  2018-11-19              0.10
## 6    200              3                74  2019-06-22              0.59
## 7     60             45                49  2017-10-05              0.40
## 8     79              2               430  2019-06-24              3.47
## 9     79              2               118  2017-07-21              0.99
## 10   150              1               160  2019-06-09              1.33
##    calculated_host_listings_count availability_365
## 1                               6              365
## 2                               2              355
## 3                               1              365
## 4                               1              194
## 5                               1                0
## 6                               1              129
## 7                               1                0
## 8                               1              220
## 9                               1                0
## 10                              4              188

Initial Analysis on Data

The variance of the price column and a basic summary of all of the data are displayed below.

var(data$price)
## [1] 57674.03
summary(data)
##        id               name              host_id           host_name        
##  Min.   :    2539   Length:48895       Min.   :     2438   Length:48895      
##  1st Qu.: 9471945   Class :character   1st Qu.:  7822033   Class :character  
##  Median :19677284   Mode  :character   Median : 30793816   Mode  :character  
##  Mean   :19017143                      Mean   : 67620011                     
##  3rd Qu.:29152178                      3rd Qu.:107434423                     
##  Max.   :36487245                      Max.   :274321313                     
##                                                                              
##  neighbourhood_group neighbourhood         latitude       longitude     
##  Length:48895        Length:48895       Min.   :40.50   Min.   :-74.24  
##  Class :character    Class :character   1st Qu.:40.69   1st Qu.:-73.98  
##  Mode  :character    Mode  :character   Median :40.72   Median :-73.96  
##                                         Mean   :40.73   Mean   :-73.95  
##                                         3rd Qu.:40.76   3rd Qu.:-73.94  
##                                         Max.   :40.91   Max.   :-73.71  
##                                                                         
##   room_type             price         minimum_nights    number_of_reviews
##  Length:48895       Min.   :    0.0   Min.   :   1.00   Min.   :  0.00   
##  Class :character   1st Qu.:   69.0   1st Qu.:   1.00   1st Qu.:  1.00   
##  Mode  :character   Median :  106.0   Median :   3.00   Median :  5.00   
##                     Mean   :  152.7   Mean   :   7.03   Mean   : 23.27   
##                     3rd Qu.:  175.0   3rd Qu.:   5.00   3rd Qu.: 24.00   
##                     Max.   :10000.0   Max.   :1250.00   Max.   :629.00   
##                                                                          
##  last_review        reviews_per_month calculated_host_listings_count
##  Length:48895       Min.   : 0.010    Min.   :  1.000               
##  Class :character   1st Qu.: 0.190    1st Qu.:  1.000               
##  Mode  :character   Median : 0.720    Median :  1.000               
##                     Mean   : 1.373    Mean   :  7.144               
##                     3rd Qu.: 2.020    3rd Qu.:  2.000               
##                     Max.   :58.500    Max.   :327.000               
##                     NA's   :10052                                   
##  availability_365
##  Min.   :  0.0   
##  1st Qu.:  0.0   
##  Median : 45.0   
##  Mean   :112.8   
##  3rd Qu.:227.0   
##  Max.   :365.0   
## 
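The summary reports 10052 NA's in reviews_per_month, and the first rows show a blank last_review for the same listing. A per-column count makes such gaps easier to see than the full summary. The following is a minimal sketch on a small made-up data frame (the `listings` name and its values are illustrative, not taken from the real file); it assumes, as is the default for read.csv, that blank text fields are read as empty strings rather than NA.

```r
# Toy data frame standing in for the Airbnb table (values are made up).
listings <- data.frame(
  price = c(149, 225, 150, 89),
  last_review = c("2018-10-19", "2019-05-21", "", "2019-07-05"),
  reviews_per_month = c(0.21, 0.38, NA, 4.64),
  stringsAsFactors = FALSE
)

# Number of NA values in each column.
na_counts <- colSums(is.na(listings))

# Number of empty strings in each character column (0 for other column types).
empty_counts <- vapply(
  listings,
  function(col) if (is.character(col)) sum(col == "", na.rm = TRUE) else 0L,
  integer(1)
)

print(na_counts)
print(empty_counts)
```

Running the same two counts on the real table would show whether the blank last_review values line up with the missing reviews_per_month values.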

Summary Of Price Data

A basic summary of the price data is displayed below.

summary(data$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0    69.0   106.0   152.7   175.0 10000.0

Data Cleaning

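The price data is cleaned by removing outliers, which are the values that R's boxplot() function flags (by default, points more than 1.5 times the interquartile range beyond the box). Once the data has been cleaned, it is plotted as a box plot.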

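# Values that boxplot() flags as outliers (computed without drawing the plot)
outliers <- boxplot(data$price, plot=FALSE)$out
price <- data$price
# Logical indexing is used because price[-which(...)] returns an empty vector when nothing matches
price <- price[!(price %in% outliers)]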
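By default, boxplot() treats a value as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the box. For the price column, the quartiles 69 and 175 give an IQR of 106 and an upper fence of 175 + 1.5 × 106 = 334, which is consistent with the maximum of the cleaned data. The sketch below works through the same rule by hand on a small made-up vector (the `prices` values are illustrative only) and compares the result with boxplot.stats(). Note that boxplot.stats() uses hinges rather than quantile(), so the two can differ slightly on other data.

```r
# Made-up nightly prices; 1000 is a deliberately extreme value.
prices <- c(60, 79, 80, 89, 149, 150, 200, 225, 1000)

# Quartiles and interquartile range.
q <- quantile(prices, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences: values outside [lower, upper] count as outliers.
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

manual_outliers <- prices[prices < lower | prices > upper]
boxplot_outliers <- boxplot.stats(prices)$out

# Values that survive the cleaning step.
kept <- prices[prices >= lower & prices <= upper]

print(c(lower = lower, upper = upper))
print(manual_outliers)
print(boxplot_outliers)
```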

Summary Of Cleaned Data

summary(price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0      65     100     120     159     334
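Comparing this summary with the earlier one, the mean falls from 152.7 to 120 while the median falls only from 106 to 100, which suggests the mean is the statistic most affected by the extreme prices. A minimal sketch of that comparison, on a made-up vector rather than the real data (the `prices` values and variable names are illustrative):

```r
# Made-up nightly prices with one extreme value.
prices <- c(60, 79, 80, 89, 149, 150, 200, 225, 1000)

# Drop the values that boxplot.stats() flags as outliers.
cleaned <- prices[!(prices %in% boxplot.stats(prices)$out)]

# How far each statistic moves once the outliers are removed.
mean_shift <- mean(prices) - mean(cleaned)
median_shift <- median(prices) - median(cleaned)

print(c(mean_shift = mean_shift, median_shift = median_shift))
```

On this toy vector the mean moves much further than the median, the same pattern the two summaries show for the real prices.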

Plotting Data

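# Pair each price with its neighbourhood group (not used by the plot below)
data_frame <- data.frame(price = data$price, neighbourhood_group = data$neighbourhood_group)
# Box plot of the cleaned prices
boxplot(price, main = "Price of Rentals in NYC", ylab = "Price", xlab = "Rentals")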
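The plot above shows price alone. Since neighborhood group is the other variable under evaluation, one possible extension is a box per group using boxplot()'s formula interface. This is a sketch under stated assumptions, not part of the original analysis: the `listings` data frame and its values are made up, and only the column names follow the CSV.

```r
# Made-up listings; column names follow the Airbnb CSV.
listings <- data.frame(
  price = c(149, 89, 60, 225, 150, 80, 200, 79),
  neighbourhood_group = c("Brooklyn", "Brooklyn", "Brooklyn",
                          "Manhattan", "Manhattan", "Manhattan",
                          "Manhattan", "Manhattan")
)

# One box per neighbourhood group; the return value holds the plotted statistics.
stats <- boxplot(price ~ neighbourhood_group, data = listings,
                 main = "Price of Rentals by Neighbourhood Group",
                 xlab = "Neighbourhood group", ylab = "Price")

# Median price per group, for a numeric companion to the plot.
group_medians <- tapply(listings$price, listings$neighbourhood_group, median)
print(group_medians)
```

With the real data, the same call could be applied to a data frame of the cleaned prices and their groups, provided the outlier filter is applied to both columns together so the rows stay aligned.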

Conclusion

New York City is an expensive place, so some rentals have very high nightly prices. There is a difference of about $33 between the mean with outliers included ($152.70) and the mean with outliers excluded ($120). Without the outliers, the average nightly cost of an Airbnb in New York City is about $120.