NYCHousing2019 <- read_csv("challenge_datasets/AB_NYC_2019.csv")
## Rows: 48895 Columns: 16
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, host_name, neighbourhood_group, neighbourhood, room_type
## dbl (10): id, host_id, latitude, longitude, price, minimum_nights, number_o...
## date (1): last_review
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(NYCHousing2019)
## # A tibble: 6 × 16
## id name host_id host_name neighbourhood_group neighbourhood latitude
## <dbl> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 2539 Clean & qu… 2787 John Brooklyn Kensington 40.6
## 2 2595 Skylit Mid… 2845 Jennifer Manhattan Midtown 40.8
## 3 3647 THE VILLAG… 4632 Elisabeth Manhattan Harlem 40.8
## 4 3831 Cozy Entir… 4869 LisaRoxa… Brooklyn Clinton Hill 40.7
## 5 5022 Entire Apt… 7192 Laura Manhattan East Harlem 40.8
## 6 5099 Large Cozy… 7322 Chris Manhattan Murray Hill 40.7
## # ℹ 9 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
## # minimum_nights <dbl>, number_of_reviews <dbl>, last_review <date>,
## # reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
## # availability_365 <dbl>
The table looks clean. The column names are tidy, and each column is read in with the type I would expect (e.g. the last_review column is parsed as a date).
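One way to back up the "looks clean" impression is to list each column's class and its number of missing values. This is only a rough sketch: column_overview is a hypothetical helper name I am introducing here, not part of the original analysis, and it uses base R only.

```r
# Hypothetical helper (not part of the original analysis):
# report the first class and the number of missing values for every column.
column_overview <- function(data) {
  data.frame(
    column = names(data),
    class = unname(vapply(data, function(col) class(col)[1], character(1))),
    n_missing = unname(vapply(data, function(col) sum(is.na(col)), integer(1))),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage with the data loaded above:
# column_overview(NYCHousing2019)
```

Columns with many missing values would be worth a closer look before plotting them.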
First, I would like to see how frequent the different neighborhood groups are in New York. I pipe the data into ggplot, draw the neighbourhood_group column as a bar graph with a minimal theme, and then label the title, x axis and y axis.
Manhattan has the most rentals, followed closely by Brooklyn.
NYCHousing2019 %>%
ggplot(aes(neighbourhood_group)) +
geom_bar() +
theme_minimal() +
labs(title = "Different Neighborhood groups in NYC rentals", x = "Neighborhood groups", y = "Number of rentals")
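To put numbers behind the bar heights, the same column can be tabulated directly. Below is a minimal sketch, assuming dplyr is available (it is part of the tidyverse); count_share and the column names n_rentals and share are hypothetical names I am introducing for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis):
# count the rows in each category and add each category's share of the total.
count_share <- function(data, column) {
  data %>%
    count({{ column }}, sort = TRUE, name = "n_rentals") %>%
    mutate(share = n_rentals / sum(n_rentals))
}

# Usage with the data loaded above:
# count_share(NYCHousing2019, neighbourhood_group)
# count_share(NYCHousing2019, room_type)
```

Sorting by count makes it easy to see how close the top two categories really are.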
Next, I want to get a visual sense of how frequent the different rental types are in NYC. I pipe the data into ggplot, draw the room_type column as a bar graph with a minimal theme, and then label the title, x axis and y axis.
Entire homes are most common, followed closely by private rooms.
NYCHousing2019 %>%
ggplot(aes(room_type)) +
geom_bar() +
theme_minimal() +
labs(title = "Different housing types in NYC rentals", x = "Housing type", y = "Number of rentals")
Next, I want to compare the spread of prices across rental types. I pipe the data into ggplot, draw price against room_type as a boxplot with a minimal theme, and then label the title, x axis and y axis.
Although entire homes are more often priced high, it is surprising to see that private rooms also have some listings with quite high prices.
NYCHousing2019 %>%
ggplot(aes(room_type, price)) +
geom_boxplot() +
theme_minimal() +
labs(title = "Price for different room types", x = "Type of rentals", y = "Dollar price")
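Because a few very expensive listings can squash the boxes, a small summary table can help when reading the boxplot. This is a sketch under the assumption that dplyr is loaded; summarise_price and its output column names are hypothetical names of my own, and it expects the room_type and price columns shown above.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis):
# summarise the price distribution within each room type.
summarise_price <- function(data) {
  data %>%
    group_by(room_type) %>%
    summarise(
      n_rentals = n(),
      median_price = median(price, na.rm = TRUE),
      p75_price = quantile(price, 0.75, na.rm = TRUE, names = FALSE),
      max_price = max(price, na.rm = TRUE),
      .groups = "drop"
    )
}

# Usage with the data loaded above:
# summarise_price(NYCHousing2019)
```

Comparing the median with the maximum for each room type shows how far the highest-priced listings sit from a typical one.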