Intro:

This is a dataset I found on Kaggle that contains information on Airbnb listings in New York City. It includes columns such as host_name, host_id, the room type of each listing, and how much the host charges. I did some analysis with the help of dplyr and tidyr to filter the data.

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
housing <- read.csv("https://raw.githubusercontent.com/AldataSci/Project2-Data607/main/AB_NYC_2019.csv",header=TRUE,sep=",")

## I omitted the rows that contain NAs
housing <- na.omit(housing)


head(housing)
##     id                                             name host_id   host_name
## 1 2539               Clean & quiet apt home by the park    2787        John
## 2 2595                            Skylit Midtown Castle    2845    Jennifer
## 4 3831                  Cozy Entire Floor of Brownstone    4869 LisaRoxanne
## 5 5022 Entire Apt: Spacious Studio/Loft by central park    7192       Laura
## 6 5099        Large Cozy 1 BR Apartment In Midtown East    7322       Chris
## 7 5121                                  BlissArtsSpace!    7356       Garon
##   neighbourhood_group      neighbourhood latitude longitude       room_type
## 1            Brooklyn         Kensington 40.64749 -73.97237    Private room
## 2           Manhattan            Midtown 40.75362 -73.98377 Entire home/apt
## 4            Brooklyn       Clinton Hill 40.68514 -73.95976 Entire home/apt
## 5           Manhattan        East Harlem 40.79851 -73.94399 Entire home/apt
## 6           Manhattan        Murray Hill 40.74767 -73.97500 Entire home/apt
## 7            Brooklyn Bedford-Stuyvesant 40.68688 -73.95596    Private room
##   price minimum_nights number_of_reviews last_review reviews_per_month
## 1   149              1                 9  2018-10-19              0.21
## 2   225              1                45  2019-05-21              0.38
## 4    89              1               270  2019-07-05              4.64
## 5    80             10                 9  2018-11-19              0.10
## 6   200              3                74  2019-06-22              0.59
## 7    60             45                49  2017-10-05              0.40
##   calculated_host_listings_count availability_365
## 1                              6              365
## 2                              2              355
## 4                              1              194
## 5                              1                0
## 6                              1              129
## 7                              1                0
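
To see what the `na.omit()` step does, here is a small sketch on a made-up data frame (the `toy` values are invented for illustration and are not from the Airbnb file). It counts the missing values per column and then drops every row that has an NA in any column, which is consistent with row 3 being absent from the `head()` output above.

```r
# Toy data (invented for illustration, not taken from AB_NYC_2019.csv)
toy <- data.frame(
  id = 1:4,
  price = c(100, 80, 150, 60),
  reviews_per_month = c(0.5, NA, 1.2, NA)
)

# Count the NAs in each column before dropping anything
na_counts <- colSums(is.na(toy))
print(na_counts)

# na.omit() removes every row with at least one NA in any column
toy_clean <- na.omit(toy)

# Number of rows that were dropped
print(nrow(toy) - nrow(toy_clean))
```

Running the same `colSums(is.na(housing))` check before the `na.omit()` call would show which columns are responsible for the dropped rows.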

1. How much does each room type in New York City cost?

For my analysis I don't really need the IDs of the host or the listing, nor the latitude or longitude. I wanted to learn how much the various room types in New York City cost.

## I used dplyr to select certain columns and then graphed the results on a scatterplot to better understand
## what I am seeing


house <- housing %>%
  select(c(name,host_name,neighbourhood,room_type,price))

ggplot(house,aes(x=room_type,y=price)) +
  geom_point(col="blue") +
  ## labs() takes x and y; after coord_flip() the x label (room type) is drawn on the vertical axis
  labs(title="Scatterplot of Room Type and Price", x="Type of Airbnb", y="Price") +
  coord_flip()

This looks interesting. It seems some people charge as much for a private room as for an entire home or apartment, which is crazy. We can also see in the data that there are only 3 kinds of Airbnbs in New York City: a shared room, a private room, or an entire home/apartment. But to see that people charge 10k for a private room is crazy to me.
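
The scatterplot shows the extremes well but hides the typical price. A minimal sketch of a numeric summary per room type, using an invented `toy` data frame with the same `room_type` and `price` column names as `house` (the values are made up, so the numbers mean nothing on their own):

```r
library(dplyr)

# Toy data (invented values; column names mirror the `house` data frame)
toy <- tibble(
  room_type = c("Private room", "Private room", "Private room",
                "Entire home/apt", "Entire home/apt", "Shared room"),
  price = c(60, 80, 10000, 150, 250, 40)
)

# The median is robust to a few extreme listings; the max exposes the outliers
price_summary <- toy %>%
  group_by(room_type) %>%
  summarise(
    median_price = median(price),
    max_price = max(price)
  )

print(price_summary)
```

Swapping `toy` for `house` should produce the same kind of table for the real listings.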

2. Relationship between borough and price?

Is there a relationship between borough and price? I selected the relevant columns with dplyr (neighbourhood_group, price, and room_type) and then visualized the data with a bar graph to better understand the relationship.

Nbhd <- housing %>%
  select(neighbourhood_group,price,room_type)  


ggplot(Nbhd,aes(x=room_type,y=price,fill=neighbourhood_group)) + 
  geom_bar(stat="identity",position=position_dodge(0.9)) +
  labs(y="Price",x = "Types of Airbnbs in NYC")

From this bar graph I can see that the most expensive Airbnbs are located in Brooklyn and Manhattan, with prices topping 10,000 dollars. This may be because Brooklyn and Manhattan are the main tourist attractions in NYC and hence the most expensive. On the other hand, the other boroughs appear less popular and are cheaper than Brooklyn and Manhattan, which makes sense if there is less to attract tourists in those boroughs.
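
Because the bar chart above is drawn from one row per listing, the bar heights appear to be driven by the most expensive listings in each group. A possible alternative, sketched here on invented data with the same column names as `Nbhd`, is to summarise to one value per borough and room type first and plot that. The median is only one possible choice of summary, and it is an assumption of this sketch rather than part of the analysis above.

```r
library(dplyr)
library(ggplot2)

# Toy data (invented values; column names mirror the `Nbhd` data frame)
toy <- tibble(
  neighbourhood_group = c("Manhattan", "Manhattan", "Manhattan",
                          "Brooklyn", "Brooklyn", "Queens"),
  room_type = c("Private room", "Private room", "Entire home/apt",
                "Private room", "Entire home/apt", "Private room"),
  price = c(90, 10000, 225, 60, 89, 50)
)

# One row per borough and room type, holding the median price
median_prices <- toy %>%
  group_by(neighbourhood_group, room_type) %>%
  summarise(median_price = median(price), .groups = "drop")

# geom_col() plots the values as given, one bar per summarised row
p <- ggplot(median_prices,
            aes(x = room_type, y = median_price, fill = neighbourhood_group)) +
  geom_col(position = position_dodge(0.9)) +
  labs(x = "Type of Airbnb", y = "Median price", fill = "Borough")

print(median_prices)
```

Printing `p` draws the chart. Replacing `median()` with `max()` gives a one-bar-per-group version of the maximum prices instead.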

3. Comparing Airbnbs by reviews

Finally, I wanted to compare the Airbnb types in each borough by average number of reviews and see what would happen.

review <- housing %>%
  select(name,neighbourhood_group,room_type,price,number_of_reviews) %>%
  group_by(neighbourhood_group,room_type) %>%
  summarise(avg_review = mean(number_of_reviews))
## `summarise()` has grouped output by 'neighbourhood_group'. You can override using the `.groups` argument.
ggplot(review,aes(x=neighbourhood_group,y=avg_review,fill=room_type)) +
  geom_bar(stat="identity",position=position_dodge(0.9)) +
  labs(x="Borough", y="Average number of reviews")
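
An average can look high simply because a group contains few listings. A small sketch, again on invented data that reuses the column names from `housing`, that reports the number of listings next to the mean so the two can be read together:

```r
library(dplyr)

# Toy data (invented values; column names mirror the `housing` data frame)
toy <- tibble(
  neighbourhood_group = c("Bronx", "Bronx", "Manhattan", "Manhattan",
                          "Manhattan", "Manhattan"),
  room_type = rep("Private room", 6),
  number_of_reviews = c(40, 60, 5, 10, 15, 30)
)

# n() counts the listings in each group alongside the mean review count
review_counts <- toy %>%
  group_by(neighbourhood_group, room_type) %>%
  summarise(
    listings = n(),
    avg_review = mean(number_of_reviews),
    .groups = "drop"
  )

print(review_counts)
```

Setting `.groups = "drop"` also avoids the grouped-output message that `summarise()` prints above.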

Conclusion:

It’s interesting to find that the Bronx and Staten Island had a higher average number of reviews than Manhattan and Brooklyn, since Manhattan and Brooklyn are the more popular places to rent an Airbnb. One possibility is that the reviews were mostly negative, since these two boroughs aren’t popular places to rent one, but the dataset only records the number of reviews and not their content, so this can’t be confirmed from the data.