Libraries and Setup

Loading the libraries:

library(data.table)
library(htmlwidgets)
library(leaflet)
library(lubridate)
library(plotly)
library(sf)
library(tidyverse)
library(tigris)
library(vroom)
library(zoo)

Data and Duplicates

Loading the Inside Airbnb data:

# Inside AirBnB URL: https://data.insideairbnb.com/canada/qc/montreal/2023-12-13/visualisations/listings.csv

dec_2023_mtl_airbnb <- read_csv("https://data.insideairbnb.com/canada/qc/montreal/2023-12-13/visualisations/listings.csv")

## Rows: 8807 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): name, host_name, neighbourhood, room_type, license
## dbl  (11): id, host_id, latitude, longitude, price, minimum_nights, number_o...
## lgl   (1): neighbourhood_group
## date  (1): last_review
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Duplicates

The code below extracts any duplicated rows from the December data:

# Pull every row that appears more than once (exact match on all columns) into a data frame:
duplicates_dec_2023 <- dec_2023_mtl_airbnb %>%
  group_by(across(everything())) %>%
  filter(n() > 1) %>%
  ungroup()

# Display the duplicates... there are none!
duplicates_dec_2023

## # A tibble: 0 × 18
## # ℹ 18 variables: id <dbl>, name <chr>, host_id <dbl>, host_name <chr>,
## #   neighbourhood_group <lgl>, neighbourhood <chr>, latitude <dbl>,
## #   longitude <dbl>, room_type <chr>, price <dbl>, minimum_nights <dbl>,
## #   number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
## #   calculated_host_listings_count <dbl>, availability_365 <dbl>,
## #   number_of_reviews_ltm <dbl>, license <chr>

An empty duplicates table means the December data contain no exact duplicate rows. Were there duplicates in other months of the past year? Next, the September 2023 snapshot:
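For comparison, here is a sketch of what a non-empty result looks like. It uses a small invented data frame (the column names `id`, `host_id` and `price` are borrowed from the Inside Airbnb file, but the values are made up) and base R's `duplicated()` instead of dplyr. Combining `duplicated(x)` with `duplicated(x, fromLast = TRUE)` flags every copy of a repeated row, which mirrors what grouping by all columns and keeping groups with `n() > 1` returns.

```r
# Invented toy data: the row with id 2 appears twice
toy_listings <- data.frame(
  id      = c(1, 2, 2, 3),
  host_id = c(10, 20, 20, 30),
  price   = c(95, 120, 120, 80)
)

# duplicated() marks the second and later copies of a row;
# fromLast = TRUE marks the earlier copies, so OR-ing both flags every copy
is_dup <- duplicated(toy_listings) | duplicated(toy_listings, fromLast = TRUE)

# Keep only the flagged rows: here, the two copies of the id 2 row
toy_duplicates <- toy_listings[is_dup, ]
toy_duplicates
```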

sep_2023_mtl_airbnb <- read_csv("https://data.insideairbnb.com/canada/qc/montreal/2023-09-02/visualisations/listings.csv")

## Rows: 7933 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): name, host_name, neighbourhood, room_type, license
## dbl  (11): id, host_id, latitude, longitude, price, minimum_nights, number_o...
## lgl   (1): neighbourhood_group
## date  (1): last_review
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Keep every row that appears more than once (exact match on all columns):
duplicates_sep_2023 <- sep_2023_mtl_airbnb %>%
  group_by(across(everything())) %>%
  filter(n() > 1) %>%
  ungroup()

duplicates_sep_2023

## # A tibble: 0 × 18
## # ℹ 18 variables: id <dbl>, name <chr>, host_id <dbl>, host_name <chr>,
## #   neighbourhood_group <lgl>, neighbourhood <chr>, latitude <dbl>,
## #   longitude <dbl>, room_type <chr>, price <dbl>, minimum_nights <dbl>,
## #   number_of_reviews <dbl>, last_review <date>, reviews_per_month <dbl>,
## #   calculated_host_listings_count <dbl>, availability_365 <dbl>,
## #   number_of_reviews_ltm <dbl>, license <chr>

An empty duplicates table means there were no exact duplicate rows in the September data either!
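Because the check above compares every column, including `id`, it only catches rows that were copied verbatim. If the same unit were listed twice under two different `id` values, those rows would differ in `id` and slip through. A looser check compares only a chosen subset of columns. The sketch below is illustrative only: the choice of key columns (`host_id`, `latitude`, `longitude`, `room_type`) as the definition of "the same listing" is an assumption, the data are invented, and exact coordinate matches may be too strict for real listings.

```r
# Invented example: ids 101 and 102 share host, coordinates and room type
toy_snapshot <- data.frame(
  id        = c(101, 102, 103),
  host_id   = c(7, 7, 8),
  latitude  = c(45.5017, 45.5017, 45.5200),
  longitude = c(-73.5673, -73.5673, -73.5800),
  room_type = c("Entire home/apt", "Entire home/apt", "Private room")
)

# Hypothetical key columns that define "the same listing" (ignores id)
key_cols <- c("host_id", "latitude", "longitude", "room_type")
keys <- toy_snapshot[, key_cols]

# Flag every row whose key columns repeat, regardless of id
is_near_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
near_duplicates <- toy_snapshot[is_near_dup, ]
near_duplicates
```

With the real data, the dplyr equivalent would group on the same subset instead of on every column, for example `sep_2023_mtl_airbnb %>% group_by(host_id, latitude, longitude, room_type) %>% filter(n() > 1) %>% ungroup()`.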

Checking Another Platform

Checking À Bas AirBnB

Loading the data:

jan_2024_abasairbnb <- read_csv("https://abasairbnb.io/data/airbnb_qc_ads_2024-01-15.csv")

## Rows: 29545 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (13): title, type, type_scraped, url, host_id, host_since, description,...
## dbl   (7): id, note, nb_comments, lat, lng, min_nights, licence
## lgl   (2): nouveauté, min_days_calendar
## date  (1): date_scraped
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Keep every row that appears more than once (exact match on all columns):
duplicates_jan_2024_abasairbnb <- jan_2024_abasairbnb %>%
  group_by(across(everything())) %>%
  filter(n() > 1) %>%
  ungroup()

duplicates_jan_2024_abasairbnb

## # A tibble: 0 × 23
## # ℹ 23 variables: id <dbl>, title <chr>, nouveauté <lgl>, date_scraped <date>,
## #   type <chr>, type_scraped <chr>, url <chr>, host_id <chr>, host_since <chr>,
## #   description <chr>, price <chr>, note <dbl>, nb_comments <dbl>, lat <dbl>,
## #   lng <dbl>, location <chr>, availabilities <chr>, unvailabilities <chr>,
## #   min_days_calendar <lgl>, min_nights <dbl>, commentaires <chr>,
## #   licence <dbl>, rental_period <chr>

Again, an empty duplicates table means no exact duplicate rows, so there are no duplicates in this dataset either!
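The same grouping pipeline is repeated for every dataset above. A small helper function would keep the check consistent across snapshots. The sketch below uses base R, and the name `find_exact_duplicates()` is hypothetical. With the real data, the list would hold `dec_2023_mtl_airbnb`, `sep_2023_mtl_airbnb` and `jan_2024_abasairbnb`; invented stand-ins are used here so the example runs on its own.

```r
# Hypothetical helper: return every copy of any row that appears more than once
find_exact_duplicates <- function(df) {
  is_dup <- duplicated(df) | duplicated(df, fromLast = TRUE)
  df[is_dup, , drop = FALSE]
}

# Invented stand-ins for the three downloaded snapshots
snapshots <- list(
  inside_airbnb_dec_2023 = data.frame(id = c(1, 2, 3), price = c(90, 110, 75)),
  inside_airbnb_sep_2023 = data.frame(id = c(4, 5), price = c(60, 60)),
  abasairbnb_jan_2024    = data.frame(id = c(6, 6), price = c(140, 140))
)

# Count duplicated rows per snapshot (0 means no exact duplicates)
duplicate_counts <- sapply(snapshots, function(df) nrow(find_exact_duplicates(df)))
duplicate_counts
```

Rows that share a `price` but differ in `id` (the second stand-in) are not counted, which matches the all-columns behaviour of the dplyr pipeline.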

Checking Proulx’s Work: Air BnB Duplicates

2024-04-26