Assignment 5

Author

Christian Tabuku

library(readr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)

df <- read_csv("Airbnb_DC_25(in).csv")
Rows: 6257 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): name, host_name, neighbourhood, room_type, last_review, license
dbl (11): id, host_id, latitude, longitude, price, minimum_nights, number_of...
lgl  (1): neighbourhood_group

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df)
# A tibble: 6 × 18
     id name        host_id host_name neighbourhood_group neighbourhood latitude
  <dbl> <chr>         <dbl> <chr>     <lgl>               <chr>            <dbl>
1  3686 Vita's Hid…    4645 Vita      NA                  Historic Ana…     38.9
2  3943 Historic R…    5059 Vasa      NA                  Edgewood, Bl…     38.9
3  4197 Capitol Hi…    5061 Sandra    NA                  Capitol Hill…     38.9
4  4529 Bertina's …    5803 Bertina   NA                  Eastland Gar…     38.9
5  5589 Cozy apt i…    6527 Ami       NA                  Kalorama Hei…     38.9
6  7103 Lovely gue…   17633 Charlotte NA                  Spring Valle…     38.9
# ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <chr>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
names(df)
 [1] "id"                             "name"                          
 [3] "host_id"                        "host_name"                     
 [5] "neighbourhood_group"            "neighbourhood"                 
 [7] "latitude"                       "longitude"                     
 [9] "room_type"                      "price"                         
[11] "minimum_nights"                 "number_of_reviews"             
[13] "last_review"                    "reviews_per_month"             
[15] "calculated_host_listings_count" "availability_365"              
[17] "number_of_reviews_ltm"          "license"                       
price_by_room <- df %>%
  group_by(room_type) %>%
  summarize(avg_price = mean(price, na.rm = TRUE))

ggplot(price_by_room, aes(x = room_type, y = avg_price, fill = room_type)) +
  geom_col() +
  labs(
    title = "Average Airbnb Price by Room Type in Washington DC",
    x = "Room Type",
    y = "Average Price (USD)",
    caption = "Data Source: Airbnb_DC_25 dataset"
  ) +
  theme_minimal()

This visualization shows the average Airbnb price by room type in Washington, DC. The data were grouped by room type, and the mean price was calculated with the summarize() function from the dplyr package, ignoring listings with a missing price (na.rm = TRUE). The bar chart makes it easy to compare the average cost of the different types of accommodation. One key insight is that entire homes or apartments tend to have higher average prices than private or shared rooms.
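
Because a mean can be pulled upward by a few very expensive listings, it may help to look at the median and the number of listings per room type alongside the average. The following is a minimal sketch of that check, not part of the original analysis. It assumes a small made-up data frame named toy in place of df; the prices and room type labels in it are invented for illustration and are not taken from the Airbnb_DC_25 dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for df; all values are invented.
toy <- data.frame(
  room_type = c("Entire home/apt", "Entire home/apt", "Entire home/apt",
                "Private room", "Private room", "Shared room"),
  price = c(150, 200, 1000, 80, NA, 40)
)

# Same grouping as price_by_room, with extra columns to put the mean in context.
price_summary <- toy %>%
  group_by(room_type) %>%
  summarize(
    n_listings   = n(),                            # listings per room type
    n_missing    = sum(is.na(price)),              # listings with no price
    avg_price    = mean(price, na.rm = TRUE),      # mean, ignoring NA
    median_price = median(price, na.rm = TRUE)     # less sensitive to outliers
  ) %>%
  arrange(desc(avg_price))                         # most expensive type first

print(price_summary)
```

In the toy data, a single 1000-dollar listing lifts the entire-home mean well above its median, which is the kind of gap worth checking in the real data before reading too much into the bar heights. Replacing toy with df would apply the same summary to the actual listings.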