Loading libraries:
library(data.table)
library(htmlwidgets)
library(leaflet)
library(lubridate)
library(plotly)
library(sf)
library(tidyverse)
library(tigris)
library(vroom)
library(zoo)
# Inside AirBnB URL: https://data.insideairbnb.com/canada/qc/montreal/2023-12-13/visualisations/listings.csv
dec_2023_mtl_airbnb <- read_csv("https://data.insideairbnb.com/canada/qc/montreal/2023-12-13/visualisations/listings.csv")
## Rows: 8807 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, host_name, neighbourhood, room_type, license
## dbl (11): id, host_id, latitude, longitude, price, minimum_nights, number_o...
## lgl (1): neighbourhood_group
## date (1): last_review
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The code below should show any fully duplicated rows in the December data:
# Pulling rows that are identical in every column out into a data frame:
duplicates_dec_2023 <- dec_2023_mtl_airbnb %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()
# Displaying the duplicates... there are none!
duplicates_dec_2023
## # A tibble: 0 × 18
## # ℹ 18 variables: id <dbl>, name <chr>, host_id <dbl>, host_name <chr>,
## # neighbourhood_group <lgl>, neighbourhood <chr>, latitude <dbl>,
## # longitude <dbl>, room_type <chr>, price <dbl>, minimum_nights <dbl>,
## # number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
## # calculated_host_listings_count <dbl>, availability_365 <dbl>,
## # number_of_reviews_ltm <dbl>, license <chr>
An empty data frame of duplicates means there are no duplicate rows in the December data. Were there duplicates in other months of the past year?
sep_2023_mtl_airbnb <- read_csv("https://data.insideairbnb.com/canada/qc/montreal/2023-09-02/visualisations/listings.csv")
## Rows: 7933 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, host_name, neighbourhood, room_type, license
## dbl (11): id, host_id, latitude, longitude, price, minimum_nights, number_o...
## lgl (1): neighbourhood_group
## date (1): last_review
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
duplicates_sep_2023 <- sep_2023_mtl_airbnb %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()
duplicates_sep_2023
## # A tibble: 0 × 18
## # ℹ 18 variables: id <dbl>, name <chr>, host_id <dbl>, host_name <chr>,
## # neighbourhood_group <lgl>, neighbourhood <chr>, latitude <dbl>,
## # longitude <dbl>, room_type <chr>, price <dbl>, minimum_nights <dbl>,
## # number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
## # calculated_host_listings_count <dbl>, availability_365 <dbl>,
## # number_of_reviews_ltm <dbl>, license <chr>
An empty data frame of duplicates means that there were no duplicate rows in September's data either!
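The same three-step pattern (group by every column, keep groups with more than one row, ungroup) is repeated for each snapshot, so it can be wrapped in a small function. The sketch below is only an illustration: `find_exact_duplicates` and `toy_listings` are hypothetical names that do not appear in the original analysis, and it uses `group_by(across(everything()))`, which assumes dplyr 1.0 or later, in place of `group_by_all()`.

```r
library(dplyr)

# Hypothetical helper (not part of the original analysis): returns the rows
# that occur more than once with identical values in every column.
find_exact_duplicates <- function(df) {
  df %>%
    group_by(across(everything())) %>%
    filter(n() > 1) %>%
    ungroup()
}

# Toy data invented for illustration: the row with id 2 appears twice.
toy_listings <- tibble(
  id = c(1, 2, 2, 3),
  room_type = c("Entire home/apt", "Private room", "Private room", "Private room"),
  price = c(120, 60, 60, 75)
)

# Both copies of the repeated row are returned.
toy_duplicates <- find_exact_duplicates(toy_listings)
toy_duplicates
```

Under these assumptions, calling `find_exact_duplicates(sep_2023_mtl_airbnb)` should be equivalent to the pipeline used above.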
Loading the January 2024 data from abasairbnb.io:
jan_2024_abasairbnb <- read_csv("https://abasairbnb.io/data/airbnb_qc_ads_2024-01-15.csv")
## Rows: 29545 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (13): title, type, type_scraped, url, host_id, host_since, description,...
## dbl (7): id, note, nb_comments, lat, lng, min_nights, licence
## lgl (2): nouveauté, min_days_calendar
## date (1): date_scraped
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
duplicates_jan_2024_abasairbnb <- jan_2024_abasairbnb %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()
duplicates_jan_2024_abasairbnb
## # A tibble: 0 × 23
## # ℹ 23 variables: id <dbl>, title <chr>, nouveauté <lgl>, date_scraped <date>,
## # type <chr>, type_scraped <chr>, url <chr>, host_id <chr>, host_since <chr>,
## # description <chr>, price <chr>, note <dbl>, nb_comments <dbl>, lat <dbl>,
## # lng <dbl>, location <chr>, availabilities <chr>, unvailabilities <chr>,
## # min_days_calendar <lgl>, min_nights <dbl>, commentaires <chr>,
## # licence <dbl>, rental_period <chr>
Again, an empty data frame of duplicates means there are no duplicate rows. So, no duplicates in this dataset!
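One caveat is that grouping by every column only catches rows that match exactly. If a listing were repeated with, say, a slightly different price, it would not be flagged. A complementary check, sketched below on invented toy data, groups by a key column alone. It assumes that `id` is meant to identify a listing uniquely, which the sources above do not state explicitly.

```r
library(dplyr)

# Toy data invented for illustration: id 2 appears twice with different
# prices, so the two rows are not identical across all columns.
toy_listings <- tibble(
  id = c(1, 2, 2, 3),
  price = c(120, 60, 65, 75)
)

# All-column check: finds nothing, because no two rows match exactly.
exact_duplicates <- toy_listings %>%
  group_by(across(everything())) %>%
  filter(n() > 1) %>%
  ungroup()

# Key-based check (assumes `id` should be unique): keeps every row whose
# id occurs more than once.
repeated_ids <- toy_listings %>%
  group_by(id) %>%
  filter(n() > 1) %>%
  ungroup()

exact_duplicates
repeated_ids
```

Which check is appropriate depends on what counts as a duplicate for the analysis: an exact repeat of a row, or a repeat of the same listing.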