Assignment 5

Author

Isaac Cuellar

Necessary Code

library(readxl)
library(dplyr)

library(ggplot2)
df <- read_excel("Airbnb_DC_25.csv")
df
# A tibble: 6,257 × 18
      id name       host_id host_name neighbourhood_group neighbourhood latitude
   <dbl> <chr>        <dbl> <chr>     <lgl>               <chr>            <dbl>
 1  3686 Vita's Hi…    4645 Vita      NA                  Historic Ana…     38.9
 2  3943 Historic …    5059 Vasa      NA                  Edgewood, Bl…     38.9
 3  4197 Capitol H…    5061 Sandra    NA                  Capitol Hill…     38.9
 4  4529 Bertina's…    5803 Bertina   NA                  Eastland Gar…     38.9
 5  5589 Cozy apt …    6527 Ami       NA                  Kalorama Hei…     38.9
 6  7103 Lovely gu…   17633 Charlotte NA                  Spring Valle…     38.9
 7 11785 Sanctuary…   32015 Teresa    NA                  Cathedral He…     38.9
 8 12442 Peaches &…   32015 Teresa    NA                  Cathedral He…     38.9
 9 13744 Heart of …   53927 Victoria  NA                  Columbia Hei…     38.9
10 14218 Quiet Com…   32015 Teresa    NA                  Cathedral He…     38.9
# ℹ 6,247 more rows
# ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
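read_excel() from readxl is designed for .xls and .xlsx workbooks, so the call above presumably works because Airbnb_DC_25.csv is really a workbook despite its .csv name (the <dttm> type of last_review fits that reading). If the file were plain comma-separated text, readr's read_csv() would be the matching reader. A minimal sketch on a small hypothetical file; the temporary path, the two rows and their prices are invented for illustration:

```r
library(readr)

# Hypothetical two-row file standing in for a true CSV export
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,room_type,price",
             "1,Private room,67",
             "2,Entire home/apt,"), csv_path)

# read_csv() parses comma-separated text; the empty price field becomes NA
listings <- read_csv(csv_path, show_col_types = FALSE)
listings
```

Empty fields are read as NA by default, which is the kind of missing price that summarise() and ggplot() have to handle later on.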

Create a new data set with shared rooms removed

# Keep every listing except shared rooms
airbnb2 <- df %>%
  filter(room_type != "Shared room")
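To check that the filter did what was intended, a minimal sketch on a small hypothetical tibble (toy and its prices are invented) applies the same condition and tallies the remaining room types with count():

```r
library(dplyr)

# Hypothetical listings covering all four room types
toy <- tibble(
  room_type = c("Entire home/apt", "Private room", "Shared room",
                "Hotel room", "Shared room"),
  price = c(150, 80, 40, 300, NA)
)

# Keep every row whose room_type is not "Shared room"
toy2 <- toy %>% filter(room_type != "Shared room")

# count() shows which room types remain and how many rows each has
toy2 %>% count(room_type)
```

One behavior worth knowing: filter() keeps only rows where the condition is TRUE, so a listing with a missing room_type would also be dropped, not just the shared rooms.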

Main Code

# Average price for each remaining room type, ignoring missing prices
airbnb2 %>%
  group_by(room_type) %>%
  summarise(avg_price = mean(price, na.rm = TRUE))
# A tibble: 3 × 2
  room_type       avg_price
  <chr>               <dbl>
1 Entire home/apt      181.
2 Hotel room           346.
3 Private room         103.
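Because na.rm = TRUE makes mean() skip missing prices, each average can rest on fewer listings than its group contains. A sketch on hypothetical data (toy and its values are invented) that reports both counts next to the mean, so the sample behind each average is visible:

```r
library(dplyr)

# Hypothetical listings; one private room has no price
toy <- tibble(
  room_type = c("Entire home/apt", "Entire home/apt", "Private room",
                "Private room", "Hotel room"),
  price = c(200, 160, 90, NA, 350)
)

price_summary <- toy %>%
  group_by(room_type) %>%
  summarise(
    avg_price = mean(price, na.rm = TRUE),  # NA prices are skipped
    n_priced  = sum(!is.na(price)),         # listings that entered the mean
    n_total   = n()                         # all listings in the group
  )
price_summary
```

Here the private-room average of 90 comes from one priced listing out of two, which the plain avg_price column alone would not reveal.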
# Box plot of price for each room type
ggplot(airbnb2, aes(x = room_type, y = price, fill = room_type)) +
  geom_boxplot() +
  labs(title = "Airbnb Prices by Room Type in DC",
       x = "Room Type", y = "Price ($)")
Warning: Removed 1488 rows containing non-finite outside the scale range
(`stat_boxplot()`).
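The warning above means ggplot() dropped 1,488 listings whose price was not a finite number, most plausibly missing values. Filtering them out first makes that step explicit and avoids the warning. A sketch on hypothetical data (toy and its prices are invented); the legend.position setting is an optional addition, since the fill legend only repeats the x-axis labels:

```r
library(dplyr)
library(ggplot2)

# Hypothetical listings; two have no price
toy <- tibble(
  room_type = c("Entire home/apt", "Entire home/apt", "Private room",
                "Private room", "Hotel room", "Hotel room"),
  price = c(200, 160, 90, NA, 350, NA)
)

# Drop listings without a price so stat_boxplot() has nothing to remove
priced <- toy %>% filter(!is.na(price))

p <- ggplot(priced, aes(x = room_type, y = price, fill = room_type)) +
  geom_boxplot() +
  labs(title = "Airbnb Prices by Room Type in DC",
       x = "Room Type", y = "Price ($)") +
  theme(legend.position = "none")  # fill colours already match the x labels
p
```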

Paragraph

I started my visualization by using the filter() command to remove “Shared room” listings from the room_type column so that there would be no large gaps in the box plot; this created a new data set, which I called airbnb2. I then used the group_by() function to split the data by room_type into the three remaining room types. Next, I used the summarise() command to calculate the average price for each group, with na.rm = TRUE so that missing prices were ignored. Finally, I used ggplot to create a box plot of price by room type and added a title and axis labels; as the warning shows, ggplot dropped 1,488 listings without a finite price. One key insight is that hotel rooms had the highest average price of the three room types (about $346), followed by entire homes/apartments (about $181), which in turn cost considerably more than private rooms (about $103).
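The ranking described above can be read directly off the summary by sorting it. A small sketch that re-enters the rounded averages printed by summarise() (so it runs without the data file) and adds each type's average relative to the cheapest; the column name ratio_to_cheapest is introduced here for illustration:

```r
library(dplyr)

# Rounded averages copied from the summarise() output above
avg_by_type <- tibble(
  room_type = c("Entire home/apt", "Hotel room", "Private room"),
  avg_price = c(181, 346, 103)
)

# Sort from most to least expensive and compare each to the cheapest type
ranked <- avg_by_type %>%
  arrange(desc(avg_price)) %>%
  mutate(ratio_to_cheapest = round(avg_price / min(avg_price), 1))
ranked
```

On these rounded figures a hotel room averages roughly 3.4 times a private room, and an entire home/apartment roughly 1.8 times.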