1 Replicating Airbnb’s Amenity Detector

This project is inspired by a Medium article on how Airbnb is using machine learning to detect household objects, in other words, using computer vision to discover amenities in pictures of rooms.

1.1 What is an amenity?

Think of an object that is useful in a room: an oven in a kitchen, a bed in a bedroom, and so on.

1.2 How is this useful and who is it for?

This project is useful for companies that host a property or rental business. If you run a business like Airbnb, where you do not actually own the properties but instead act as the middleman between customers and property owners, it is inevitable that every property hosted will be different.

Imagine having hundreds or thousands of property owners asking to be hosted: keeping track of this kind of information for every property is hard to do manually, yet it is in your best interest to make sure that information is correct. Using computer vision to aid with amenity detection could automatically add information to a property listing about what it does or does not have.

‘Could the owner not just do this on their own when they add the listing?’ They could, yes, but the detector adds a layer of security and redundancy. Think of it like this: from the host’s perspective (e.g., Airbnb), you do not want owners listing objects they do not actually have, and combing through every uploaded picture manually is taxing.

From the property owner’s perspective, this could speed up the listing process: rather than writing out objects one by one, they could simply upload pictures and get the list of amenities automatically. And if they decide to list the amenities manually anyway, the detector could point out what they are missing.

Finally, users can search for properties based on their listed amenities.

2 What can we do about it?

We will build a customized object detection model to solve this problem. Object detection models are based on deep learning, which means they need a large amount of training data to perform well.

I have already subset 30 relevant classes out of the 600 available in Open Images, including their bounding boxes. Some classes are better represented than others: there are fewer than 200 Wine rack images, for example, while there are over 3,500 Couch images, so we should expect poor results on the classes with little training data.

In total, I was able to gather more than 10 GB of amenity images, over 38,000 in all, with a varying class distribution.
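For reference, the subsetting step can be sketched roughly as below. This is a hypothetical sketch: the metadata file names follow the Open Images V6 release, and the class vector is truncated to a few of the 30 classes.

# Hypothetical sketch: filter the Open Images box annotations down to our classes
library(tidyverse)

# Machine label (/m/...) to human-readable name mapping
classes <- read_csv('class-descriptions-boxable.csv',
                    col_names = c('LabelName', 'DisplayName'))

amenities <- c('Bathtub', 'Bed', 'Couch', 'Wine rack') # ...and the other 26 classes

# Keep only the bounding boxes whose class is one of our amenities
boxes <- read_csv('oidv6-train-annotations-bbox.csv') %>%
  inner_join(classes, by = 'LabelName') %>%
  filter(DisplayName %in% amenities)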

3 Exploratory Data Analysis

library(tidyverse) # data wrangling and purrr's map functions
library(imager)    # loading and plotting images
library(keras)     # deep learning interface
library(caret)     # machine learning utilities
use_condaenv('r-tensorflow') # point reticulate at the TensorFlow conda env
options(scipen = 999)        # turn off scientific notation in output

Take each training folder as a list:

folder_list <-  list.files('Dataset/train/')
folder_list
#>  [1] "Bathtub"                     "Bed"                        
#>  [3] "Billiard table"              "Ceiling fan"                
#>  [5] "Coffeemaker"                 "Couch"                      
#>  [7] "Countertop"                  "Dishwasher"                 
#>  [9] "Fireplace"                   "Fountain"                   
#> [11] "Gas stove"                   "Jacuzzi"                    
#> [13] "Kitchen & dining room table" "Microwave oven"             
#> [15] "Mirror"                      "Oven"                       
#> [17] "Pillow"                      "Porch"                      
#> [19] "Refrigerator"                "Shower"                     
#> [21] "Sink"                        "Sofa bed"                   
#> [23] "Stairs"                      "Swimming pool"              
#> [25] "Television"                  "Toilet"                     
#> [27] "Towel"                       "Tree house"                 
#> [29] "Washing machine"             "Wine rack"

Total classes: 30

Paste the folder names onto the base path to build each folder's full path:

folder_path <-  paste0('Dataset/train/', folder_list, '/')
folder_path
#>  [1] "Dataset/train/Bathtub/"                    
#>  [2] "Dataset/train/Bed/"                        
#>  [3] "Dataset/train/Billiard table/"             
#>  [4] "Dataset/train/Ceiling fan/"                
#>  [5] "Dataset/train/Coffeemaker/"                
#>  [6] "Dataset/train/Couch/"                      
#>  [7] "Dataset/train/Countertop/"                 
#>  [8] "Dataset/train/Dishwasher/"                 
#>  [9] "Dataset/train/Fireplace/"                  
#> [10] "Dataset/train/Fountain/"                   
#> [11] "Dataset/train/Gas stove/"                  
#> [12] "Dataset/train/Jacuzzi/"                    
#> [13] "Dataset/train/Kitchen & dining room table/"
#> [14] "Dataset/train/Microwave oven/"             
#> [15] "Dataset/train/Mirror/"                     
#> [16] "Dataset/train/Oven/"                       
#> [17] "Dataset/train/Pillow/"                     
#> [18] "Dataset/train/Porch/"                      
#> [19] "Dataset/train/Refrigerator/"               
#> [20] "Dataset/train/Shower/"                     
#> [21] "Dataset/train/Sink/"                       
#> [22] "Dataset/train/Sofa bed/"                   
#> [23] "Dataset/train/Stairs/"                     
#> [24] "Dataset/train/Swimming pool/"              
#> [25] "Dataset/train/Television/"                 
#> [26] "Dataset/train/Toilet/"                     
#> [27] "Dataset/train/Towel/"                      
#> [28] "Dataset/train/Tree house/"                 
#> [29] "Dataset/train/Washing machine/"            
#> [30] "Dataset/train/Wine rack/"

Map over folder_path to collect every file's path, and assign the result to file_name:

#get file name
file_name <- map(folder_path, function(x) paste0(x, list.files(x))) %>% unlist()

#check
head(file_name)
#> [1] "Dataset/train/Bathtub/0005cf643849681f.jpg"
#> [2] "Dataset/train/Bathtub/000698b6a00772ac.jpg"
#> [3] "Dataset/train/Bathtub/00514c0fd0a7209a.jpg"
#> [4] "Dataset/train/Bathtub/006ed074e8b9f846.jpg"
#> [5] "Dataset/train/Bathtub/0079d59fe583d268.jpg"
#> [6] "Dataset/train/Bathtub/00a310da4fcf5c22.jpg"

How many images do we have?

length(file_name)
#> [1] 38720
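We can also count the images per class up front (a small addition that reuses folder_list and folder_path from above):

# Images per class, sorted descending; this previews the imbalance discussed later
tibble(class = folder_list,
       n_images = map_int(folder_path, ~ length(list.files(.x)))) %>%
  arrange(desc(n_images))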

Inspect a random sample of the images; as you can see, their sizes vary:

#take random sample
sample_image <- sample(file_name, 6)

#load image to R
img <- map(sample_image, load.image)

par(mfrow = c(2, 3)) #2x3 grid 
map(img, plot)

#> [[1]]
#> Image. Width: 768 pix Height: 1024 pix Depth: 1 Colour channels: 3 
#> 
#> [[2]]
#> Image. Width: 768 pix Height: 1024 pix Depth: 1 Colour channels: 3 
#> 
#> [[3]]
#> Image. Width: 626 pix Height: 1024 pix Depth: 1 Colour channels: 3 
#> 
#> [[4]]
#> Image. Width: 1024 pix Height: 773 pix Depth: 1 Colour channels: 3 
#> 
#> [[5]]
#> Image. Width: 1024 pix Height: 768 pix Depth: 1 Colour channels: 3 
#> 
#> [[6]]
#> Image. Width: 1024 pix Height: 681 pix Depth: 1 Colour channels: 3

Explore the distribution of the image dimensions (height and width):

# Function for acquiring width and height of an image
get_dim <- function(x){
  img <- load.image(x) 
  
  df_img <- data.frame(height = height(img),
                       width = width(img),
                       filename = x
                       )
  
  return(df_img)
}

Run the function on a random sample of the images (reading all 38k files just for their dimensions would be slow):

# Randomly get n sample images
sample_file <- sample(file_name, 30)

# Run the get_dim() function for each image
file_dim <- map_df(sample_file, get_dim)

head(file_dim)
summary(file_dim)
#>      height           width          filename        
#>  Min.   : 566.0   Min.   : 681.0   Length:30         
#>  1st Qu.: 683.0   1st Qu.:1024.0   Class :character  
#>  Median : 768.0   Median :1024.0   Mode  :character  
#>  Mean   : 775.5   Mean   : 981.3                     
#>  3rd Qu.: 768.0   3rd Qu.:1024.0                     
#>  Max.   :1024.0   Max.   :1024.0
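A quick scatter plot, added here to make the spread of sizes easier to see, suggests the long side of each image is capped at 1024 px:

# Height vs. width of the sampled images
file_dim %>%
  ggplot(aes(x = width, y = height)) +
  geom_point(alpha = 0.6) +
  labs(x = 'Width (px)', y = 'Height (px)')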

Our image sizes vary, so we have to resize them to make them uniform.

Create target_size and batch_size variables for resizing during augmentation and for building the training and testing sets:

target_size = c(256, 256)

batch_size = 64

Image augmentation, explained in the comments inside the chunk:

# Image Generator (check documentation for args not used here)
train_data_gen <- image_data_generator(rescale = 1/255, # Scaling pixel value
                                       horizontal_flip = T, # Flip image horizontally
                                       zoom_range = 0.2, #zoom image by 20%
                                       rotation_range = 20, # Rotate image from 0 to 20 degrees
                                       fill_mode = 'nearest'
                                       )
# Training Dataset
train_image_array_gen <- flow_images_from_directory(directory = "Dataset/train/", # Folder of the data
                                                    target_size = target_size, # target image dimensions (256 x 256)
                                                    color_mode = "rgb", # use RGB color
                                                    batch_size = batch_size , 
                                                    seed = 69420,  # set random seed
                                                    subset = "training", # declare that this is for training data
                                                    generator = train_data_gen
                                                    )
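As a quick sanity check (an addition to the original flow), we can pull a single batch from the generator and confirm the shapes are what we expect:

# One augmented batch: x should be (64, 256, 256, 3), y should be (64, 30)
batch <- generator_next(train_image_array_gen)
dim(batch[[1]]) # image tensor
dim(batch[[2]]) # one-hot encoded labels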

There are 38,685 images in our training set; let's check the class distribution.

# Number of training samples
train_samples <- train_image_array_gen$n

# Number of target classes/categories
output_n <- n_distinct(train_image_array_gen$classes)

# Get the class proportions
freqdf <- table(class = factor(train_image_array_gen$classes)) %>%
  prop.table() %>%
  as.data.frame()
freqdf <- freqdf %>% arrange(desc(Freq))
freqdf

library(plotly) # ggplot2 is already loaded via tidyverse

# Bar chart of class proportions, sorted from most to least frequent
plot <- freqdf %>%
  ggplot(aes(x = reorder(class, -Freq), y = Freq)) +
  geom_col()
ggplotly(plot) # interactive version of the chart

Our class distribution is quite imbalanced, with proportions ranging from 11.7% down to 0.02% of the 38,685 images.
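One common mitigation, not applied in this write-up but worth noting, is to weight each class inversely to its frequency and pass the weights to fit() through its class_weight argument:

# Hedged sketch: inverse-frequency class weights; rare classes get larger weights
counts <- table(train_image_array_gen$classes)
weights <- sum(counts) / (length(counts) * counts)
class_weights <- setNames(as.list(as.numeric(weights)), names(counts))

# Later: model %>% fit(..., class_weight = class_weights)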

4 Input and Expected Output

The user will upload an image, and the model will produce an image with bounding boxes around each detected object, along with a list of the detected objects.

Workflow Illustration

5 Machine Learning Model

I want to try two paths to build the model. The first is to take an existing pre-trained model and build custom layers on top of it; this should ease and shorten the training process. The other path is to build a custom model from scratch, and to start training it on a small subset of the complete data to make sure it works before training on the full data.
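A minimal sketch of the first path is below, assuming a MobileNetV2 backbone (the backbone choice is my assumption, nothing is settled yet) with a simple classification head; full detection with bounding boxes would need a detection architecture on top of this.

# Frozen pre-trained base + small custom head (classification as a first step)
base <- application_mobilenet_v2(include_top = FALSE,
                                 weights = 'imagenet',
                                 input_shape = c(target_size, 3))
freeze_weights(base) # keep the ImageNet weights fixed while the head trains

model <- keras_model_sequential() %>%
  base %>%
  layer_global_average_pooling_2d() %>%
  layer_dense(units = 256, activation = 'relu') %>%
  layer_dense(units = output_n, activation = 'softmax')

model %>% compile(loss = 'categorical_crossentropy',
                  optimizer = optimizer_adam(),
                  metrics = 'accuracy')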