getwd()
[1] "/Users/marieadelegrosso/Desktop/Data"
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.1 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("/Users/marieadelegrosso/Desktop/Data")
airbnb <- read_csv("Airbnb_DC_25.csv")
Rows: 6257 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): name, host_name, neighbourhood, room_type, last_review, license
dbl (11): id, host_id, latitude, longitude, price, minimum_nights, number_of...
lgl (1): neighbourhood_group
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(airbnb)
# A tibble: 6 × 18
id name host_id host_name neighbourhood_group neighbourhood latitude
<dbl> <chr> <dbl> <chr> <lgl> <chr> <dbl>
1 3686 Vita's Hid… 4645 Vita NA Historic Ana… 38.9
2 3943 Historic R… 5059 Vasa NA Edgewood, Bl… 38.9
3 4197 Capitol Hi… 5061 Sandra NA Capitol Hill… 38.9
4 4529 Bertina's … 5803 Bertina NA Eastland Gar… 38.9
5 5589 Cozy apt i… 6527 Ami NA Kalorama Hei… 38.9
6 7103 Lovely gue… 17633 Charlotte NA Spring Valle… 38.9
# ℹ 11 more variables: longitude <dbl>, room_type <chr>, price <dbl>,
# minimum_nights <dbl>, number_of_reviews <dbl>, last_review <chr>,
# reviews_per_month <dbl>, calculated_host_listings_count <dbl>,
# availability_365 <dbl>, number_of_reviews_ltm <dbl>, license <chr>
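# Drop listings with a missing price before plotting
airbnb_nona <- airbnb |>
  filter(!is.na(price))

# Count listings per room type, most common first
room_options <- airbnb |>
  select(room_type) |>
  group_by(room_type) |>
  count() |>
  arrange(desc(n))
head(room_options)
# A tibble: 4 × 2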
# Groups: room_type [4]
room_type n
<chr> <int>
1 Entire home/apt 4863
2 Private room 1305
3 Hotel room 74
4 Shared room 15
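The claim later on that most listings are entire homes can be checked directly from these counts. A minimal sketch, with the four counts copied from the room_options output so it runs without the CSV; `room_counts` and `room_shares` are hypothetical names. On the real data, `airbnb |> count(room_type, sort = TRUE)` should give the same counts as an ungrouped tibble, since `count()` with `sort = TRUE` is a dplyr shortcut for the `group_by()`/`count()`/`arrange()` chain above.

```r
library(dplyr)

# Counts copied from the room_options output above
room_counts <- tibble(
  room_type = c("Entire home/apt", "Private room", "Hotel room", "Shared room"),
  n = c(4863L, 1305L, 74L, 15L)
)

# Add each room type's share of all listings
room_shares <- room_counts |>
  mutate(share = n / sum(n))

room_shares
```

The four counts sum to 6,257, the row count reported by `read_csv()`, and Entire home/apt accounts for 4,863 / 6,257, or roughly 0.78 of all listings.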
# Price per night vs. number of reviews, colored and shaped by room type
scatter1 <- airbnb_nona |>
  ggplot(aes(x = number_of_reviews,
             y = price,
             color = room_type)) +
  # alpha = 0.4 makes overlapping points partly transparent
  geom_point(aes(shape = room_type), alpha = 0.4) +
  scale_color_manual(values = c("#FFD6EB", "#87cefa", "#d1ffbd", "#FA057F")) +
  labs(title = "DC Airbnb Number of Reviews by Price",
       x = "Number of Reviews",
       y = "Price Per Night (USD)",
       caption = "Source: DC Airbnb Statistics") +
  theme_bw()
scatter1 # note: I kept getting errors when I tried to filter directly in this pipeline

For the assignment, I made a scatter plot of price per night (USD) against the number of reviews for DC Airbnb listings. In theory, the number of reviews is a reasonable proxy for how popular a listing is, so I thought the plot could show which price range is most popular. It gives some insight, but I struggled to filter out the specific points that make the whole visualization hard to read. The plot would be more informative if it were zoomed in, and for that reason I want to remove the outliers. For price per night, only three listings cost more than $3,000 per night, yet they stretch the y-axis so much that the rest of the graph is compressed. The same goes for the number of reviews: three listings have more than 1,000 reviews. I tried filtering in several ways but kept getting the same error, which I could not resolve in time; I would still like to fix this and produce a more informative graph.

Even so, the plot shows some useful patterns, in particular that the most popular options cost under $500 per night. It also shows that most listings are entire homes or apartments, which matches the room_options count (4,863 of 6,257 listings). Finally, lowering the point opacity (alpha = 0.4) made the graph more readable, because overlapping points show through one another.
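A minimal sketch of two ways to get the zoomed-in view, assuming the cut-offs described above ($3,000 per night and 1,000 reviews) are the ones wanted. Option 1 filters the data frame first and only then passes the result to `ggplot()`. Calling `filter()` after `ggplot()` fails because by that point the object is a plot, not a data frame; that is one plausible cause of the error mentioned above, but it is an assumption, since the error message is not shown. Option 2 keeps every row and zooms with `coord_cartesian()`. The helper `trim_outliers()` and the `toy` tibble are hypothetical, and the toy values are made up so the block runs without the CSV.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: keep only rows at or below the price and review cut-offs.
# Default thresholds follow the text; adjust as needed.
trim_outliers <- function(df, max_price = 3000, max_reviews = 1000) {
  df |>
    filter(price <= max_price, number_of_reviews <= max_reviews)
}

# Hypothetical toy data (made-up values) standing in for airbnb_nona
toy <- tibble(
  number_of_reviews = c(10, 250, 1200, 40, 5),
  price = c(120, 95, 150, 3500, 60),
  room_type = c("Entire home/apt", "Private room", "Entire home/apt",
                "Hotel room", "Shared room")
)

# Option 1: filter first, then plot the trimmed data
trimmed <- trim_outliers(toy)

scatter_trimmed <- trimmed |>
  ggplot(aes(x = number_of_reviews, y = price, color = room_type)) +
  geom_point(aes(shape = room_type), alpha = 0.4) +
  theme_bw()

# Option 2: keep all rows but zoom the view; coord_cartesian() does not drop data
scatter_zoomed <- toy |>
  ggplot(aes(x = number_of_reviews, y = price, color = room_type)) +
  geom_point(aes(shape = room_type), alpha = 0.4) +
  coord_cartesian(xlim = c(0, 1000), ylim = c(0, 3000)) +
  theme_bw()
```

With the real data, Option 1 would start as `airbnb_nona |> trim_outliers() |> ggplot(...)` followed by the same layers as scatter1. The trade-off: filtering removes the outliers from anything computed by later layers, whereas `coord_cartesian()` only changes which part of the plot is visible.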