Assignment 5

Author

Jaiden Soto

Loading the required data set and library

library(readxl)
 
df <- read_excel("Airbnb_DC_25.csv")
head(df)
# A tibble: 6 × 18
     id name        host_id host_name neighbourhood_group neighbourhood latitude
  <dbl> <chr>         <dbl> <chr>     <lgl>               <chr>            <dbl>
1  3686 Vita's Hid…    4645 Vita      NA                  Historic Ana…     38.9
2  3943 Historic R…    5059 Vasa      NA                  Edgewood, Bl…     38.9
3  4197 Capitol Hi…    5061 Sandra    NA                  Capitol Hill…     38.9
4  4529 Bertina's …    5803 Bertina   NA                  Eastland Gar…     38.9
5  5589 Cozy apt i…    6527 Ami       NA                  Kalorama Hei…     38.9
6  7103 Lovely gue…   17633 Charlotte NA                  Spring Valle…     38.9
# ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
#   minimum_nights <dbl>, number_of_reviews <dbl>, last_review <dttm>,
#   reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
#   availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>

Summary Statistics

summary(df)
       id                name              host_id           host_name        
 Min.   :3.686e+03   Length:6257        Min.   :     4617   Length:6257       
 1st Qu.:3.792e+07   Class :character   1st Qu.: 22024017   Class :character  
 Median :7.501e+17   Mode  :character   Median : 81005284   Mode  :character  
 Mean   :6.159e+17                      Mean   :176451046                     
 3rd Qu.:1.143e+18                      3rd Qu.:304261532                     
 Max.   :1.375e+18                      Max.   :681391481                     
                                                                              
 neighbourhood_group neighbourhood         latitude       longitude     
 Mode:logical        Length:6257        Min.   :38.82   Min.   :-77.11  
 NA's:6257           Class :character   1st Qu.:38.90   1st Qu.:-77.03  
                     Mode  :character   Median :38.91   Median :-77.01  
                                        Mean   :38.91   Mean   :-77.01  
                                        3rd Qu.:38.92   3rd Qu.:-76.99  
                                        Max.   :38.99   Max.   :-76.91  
                                                                        
  room_type             price        minimum_nights   number_of_reviews
 Length:6257        Min.   :  10.0   Min.   :  1.00   Min.   :   0.00  
 Class :character   1st Qu.:  88.0   1st Qu.:  1.00   1st Qu.:   1.00  
 Mode  :character   Median : 131.0   Median :  2.00   Median :  19.00  
                    Mean   : 168.7   Mean   : 13.23   Mean   :  66.38  
                    3rd Qu.: 193.0   3rd Qu.: 31.00   3rd Qu.:  86.00  
                    Max.   :7000.0   Max.   :701.00   Max.   :1205.00  
                    NA's   :1488                                       
  last_review                  reviews_per_month calculated_host_listings_count
 Min.   :2013-06-15 00:00:00   Min.   : 0.010    Min.   :  1.00                
 1st Qu.:2024-10-17 00:00:00   1st Qu.: 0.470    1st Qu.:  1.00                
 Median :2025-01-23 00:00:00   Median : 1.460    Median :  3.00                
 Mean   :2024-09-12 12:48:19   Mean   : 1.974    Mean   : 33.15                
 3rd Qu.:2025-02-27 00:00:00   3rd Qu.: 2.940    3rd Qu.: 14.00                
 Max.   :2025-03-14 00:00:00   Max.   :28.200    Max.   :289.00                
 NA's   :1236                  NA's   :1236                                    
 availability_365 number_of_reviews_ltm   license         
 Min.   :  0.0    Min.   :  0.0         Length:6257       
 1st Qu.: 43.0    1st Qu.:  0.0         Class :character  
 Median :175.0    Median :  5.0         Mode  :character  
 Mean   :175.8    Mean   : 15.8                           
 3rd Qu.:303.0    3rd Qu.: 25.0                           
 Max.   :365.0    Max.   :290.0                           
                                                          
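The summary reports 1488 missing values for price and 1236 each for last_review and reviews_per_month, so any analysis involving price works with at most 4769 of the 6257 rows. A minimal sketch of counting missing values per column is shown here; the `toy` data frame and its values are hypothetical stand-ins, not rows from Airbnb_DC_25.csv.

```r
# Hypothetical stand-in for df; the values are made up for illustration
toy <- data.frame(
  price = c(120, NA, 85, NA, 200),
  reviews_per_month = c(1.2, 0.4, NA, 2.5, 3.1),
  number_of_reviews = c(10, 3, 0, 42, 77)
)

# Count the missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Share of rows that have a usable (non-missing) price
share_with_price <- mean(!is.na(toy$price))
print(share_with_price)
```

Run against the real data, `colSums(is.na(df))` should agree with the NA's rows in the summary output.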

Loading tidyverse

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Data Visualization

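# Keep only the listings in the ranges of interest; filter() also drops rows
# where price or reviews_per_month is missing
df_plot <- filter(df, number_of_reviews < 86, number_of_reviews > 1, price < 500, price > 1,
                  reviews_per_month > 1, reviews_per_month < 28)

# Shape 23 is a fillable diamond, so the fill aesthetic colors each point
ggplot(data = df_plot, aes(x = number_of_reviews, y = price, fill = reviews_per_month)) +
  geom_point(size = 2, shape = 23, alpha = 0.5) +
  scale_fill_gradient(low = "blue", high = "red") +
  labs(title = "How the number of reviews relates to price",
       caption = "Source: Airbnb_DC_25.csv",
       x = "Number of Reviews",
       y = "Price",
       fill = "Reviews per Month")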

Short Paragraph:

This visualization shows the relationship between the number of reviews and the price of Airbnb listings in D.C., with the fill color showing how reviews per month relates to those two variables. The plot is limited to listings with more than 1 and fewer than 86 reviews, a price between 1 and 500, and between 1 and 28 reviews per month. From this visualization I learned that the listings with the most reviews per month were typically priced around 100. I also learned that a higher price generally did not affect the total number of reviews so much as how often a listing is reviewed. Lower-priced listings also tended to have a greater number of reviews.
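These patterns are read off the plot by eye. One way to check them numerically is a rank correlation between price and each review variable. The sketch below uses Spearman correlation on a hypothetical `toy` data frame with made-up values; the choice of Spearman is an assumption for illustration and is not part of the original analysis.

```r
# Hypothetical stand-in for the filtered data; the values are made up
toy <- data.frame(
  number_of_reviews = c(5, 12, 20, 33, 47, 60, 72, 85),
  price = c(310, 250, 260, 180, 150, 140, 95, 90),
  reviews_per_month = c(1.1, 1.5, 2.0, 2.4, 3.0, 3.8, 4.5, 5.2)
)

# Spearman rank correlation: a negative value means higher prices
# tend to go with lower values of the other variable
rho_price_reviews <- cor(toy$price, toy$number_of_reviews, method = "spearman")
rho_price_rate <- cor(toy$price, toy$reviews_per_month, method = "spearman")
print(c(price_vs_reviews = rho_price_reviews, price_vs_rate = rho_price_rate))
```

For the real data, the same `cor()` calls can be pointed at the columns of the filtered data frame used for the plot; on unfiltered data, adding `use = "complete.obs"` skips the rows with missing values.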