I. Introduction

Tucker (2016) describes targeted advertising as an emerging field in digital marketing in which companies invest in understanding user behavior on the internet. By segmenting the users who visit a company website according to their actions and engagement, marketers can send specific marketing material that matches their business requirements (Svetlík, 2017). In this exploratory research study, we analyze the user pathway channels for the Airbnb peer-to-peer sharing platform. Users visit the portal through different devices and applications, perform searches, send messages, and carry out other activities. In this study we analyze this data to identify key trends and user behavior patterns.

Some of the questions to explore in this research are:

  1. Which devices are most widely used to access the portal?
  2. Which applications are used on these devices to connect with the portal?
  3. What is the average duration of each visit to the portal?
  4. What is the correlation between messages sent, bookings made, and searches conducted on the portal?

II. Data & Methods

For this research project, I chose to visualize the open-source dataset available at databits.io. I have been a regular Airbnb customer for the last 3 years and wanted to explore this use case to better understand user behavior on the portal. The dataset is in .txt format and is available for download from the Databits website.

Source: Databits (2018)

Data set details

The data frame contains user pathway data for the period 05/05/2014 to 04/23/2015.

The dataset has 7756 rows and 21 variables; each row describes one session of a visitor to the portal. Some of the column names and their descriptions are provided for reference.

  id_visitor: id of the visitor
  id_session: id of the session
  dim_session_number: the number of session on a given day for a visitor
  dim_user_agent: user agent of the session
  dim_device_app_combo: parsed out device/app combo from user agent
  ds: date stamp of session
  ts_min: time of session start
  ts_max: time of session end
  did_search: binary flag indicating if the visitor performed a search during the session
  sent_message: binary flag indicating if the visitor sent a message during the session
  sent_booking_request: binary flag indicating if the visitor sent a booking request during the session
  next_id_session: next session ID
  next_dim_session_number: next number of sessions for visitor
  next_dim_device_app_combo: next parsed out device/application combination
  next_ds: next session date stamp
  next_ts_min: next session start time
  next_ts_max: next session end time
  next_did_search: next session, did search? (0 or 1)
  next_sent_message: next session, sent message? (0 or 1)
  next_sent_booking_request: next session, booking request? (0 or 1)

Data Analysis Methods

To answer the exploratory questions identified above, we use various graphical methods such as bar charts, mapping, scatter plots, line plots and other appropriate tools. We also need to ensure that the variable classes are well defined and that the data points are properly segregated. In particular, the combined device/application column must be separated into new columns for further analysis.
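The preparation steps described above can be sketched in base R on a tiny example. This is only an illustration, not the code used in the analysis: the three rows are made-up values, and the " - " delimiter is assumed to match the format of dim_device_app_combo.

```r
# Illustrative rows only (assumed values, not taken from the dataset)
raw <- data.frame(
  dim_device_app_combo = c("iPhone - iOS", "Desktop - Chrome",
                           "Android Phone - Android"),
  ts_min = c("2014-05-05 08:57:33", "2014-06-03 09:00:00",
             "2014-08-14 11:31:12"),
  stringsAsFactors = FALSE
)

# Split the combined column on the literal " - " delimiter
parts <- do.call(rbind, strsplit(raw$dim_device_app_combo, " - ", fixed = TRUE))
raw$Device <- parts[, 1]
raw$Application <- parts[, 2]

# Convert the timestamp string into a date/time class
raw$Start_Time <- as.POSIXct(raw$ts_min, tz = "UTC",
                             format = "%Y-%m-%d %H:%M:%S")

# Check that every column now has the expected class
print(sapply(raw, function(col) class(col)[1]))
```

Checking the classes right after the conversion makes it easier to catch a column that was read as text when a date/time was expected.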

III. Data Analysis and Visualization

Getting Started

library(tibble) # used to create tibbles
library(tidyr) # used to tidy up data
## Warning: package 'tidyr' was built under R version 3.4.4
library(lubridate) # used for date/time functions
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(magrittr) # used for piping
## Warning: package 'magrittr' was built under R version 3.4.4
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:tidyr':
## 
##     extract
library(ggplot2) # used for data visualization
## Warning: package 'ggplot2' was built under R version 3.4.4
library(dplyr) # used for data manipulation
## Warning: package 'dplyr' was built under R version 3.4.4
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Internetdata <- read.delim(url ("http://databits.io/static_content/challenges/airbnb-user-pathways-challenge/airbnb_session_data.txt"),  sep = "|", na.strings = 'NULL')
Internetdata_tib<-as_tibble(Internetdata) 

#gives dimension of the dataframe
dim(Internetdata_tib)
## [1] 7756   21
#Clean the dataset and use separate() to split out device and application details.

#extract application field from the unified column
Internet_Clean <- Internetdata_tib %>% 
  separate(dim_device_app_combo, into = c("Device", "Application"), sep = " - ") %>%
  separate(next_dim_device_app_combo,into = c("Next_Device", "Next_Application"), sep = " - ") 


# Convert the start and end times from string to date/time format

Internet_Clean$Start_Time <- ymd_hms(Internet_Clean$ts_min)
Internet_Clean$End_Time <- ymd_hms(Internet_Clean$ts_max)

Internet_Clean$Next_Start_Time <- ymd_hms(Internet_Clean$next_ts_min)
Internet_Clean$Next_End_Time <- ymd_hms(Internet_Clean$next_ts_max)

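#create a new variable Duration (in minutes) to measure the user activity
#note: the time difference is computed in seconds, so dividing by 60 gives minutes,
#but the difftime object keeps its "secs" unit label in printed output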

Internet_Clean$Duration <- 
  (Internet_Clean$End_Time - Internet_Clean$Start_Time) / 60

Internet_Clean$Next_Duration <- 
  (Internet_Clean$Next_End_Time - Internet_Clean$Next_Start_Time) / 60

#Summary of the cleaned dataset

summary(Internet_Clean)
##                                 id_visitor  
##  f70f0c27-6af3-4fb5-8bd7-f73240465fd5: 702  
##  98c352ee-12e0-4d43-99a3-97edd8dd4bb1: 431  
##  46cf3a9c-43d5-471c-ada4-4227f5c27384: 412  
##  b03087b5-6b04-4e91-be49-5b254dd0e839: 340  
##  fd61c634-6876-4c76-81b3-6da2f86a92b8: 276  
##  39b7ba8d-8549-429d-b0cc-65e7f55b2142: 228  
##  (Other)                             :5367  
##                             id_session   dim_session_number
##  0001d1236f9ef2b7d05ab8a1d48b94cf:   1   Min.   :  1.00    
##  0004d0fa8cabd64ab5f09b0684f16f3c:   1   1st Qu.: 11.00    
##  000c762eb1b8a5a1b1d4860caaae0efe:   1   Median : 46.00    
##  001345e6c8a1363661079c01c8dcf6b8:   1   Mean   : 98.09    
##  00210d71bcbceeb0bf81a8a8d7fb8d82:   1   3rd Qu.:128.00    
##  0023c24ee6d106da74f4bf4131eadb24:   1   Max.   :702.00    
##  (Other)                         :7750                     
##                                                                                                         dim_user_agent
##  Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_1 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Mobile/11D201: 292  
##  Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko                                   : 241  
##  Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko                                          : 225  
##                                                                                                                : 219  
##  Airbnb/4.2.0 iPhone/7.1.1                                                                                     : 210  
##  Airbnb/4.7.0 iPhone/8.1.2                                                                                     : 197  
##  (Other)                                                                                                       :6372  
##     Device          Application                 ds      
##  Length:7756        Length:7756        2014-12-15:  61  
##  Class :character   Class :character   2014-09-02:  57  
##  Mode  :character   Mode  :character   2014-09-05:  56  
##                                        2014-09-06:  54  
##                                        2014-12-23:  54  
##                                        2014-12-01:  53  
##                                        (Other)   :7421  
##                  ts_min                     ts_max       did_search    
##  2014-08-14 11:31:12:   4   2014-09-26 21:59:56:   4   Min.   :0.0000  
##  2014-08-25 21:14:07:   4   2014-07-09 12:01:38:   3   1st Qu.:0.0000  
##  2014-08-31 02:22:33:   4   2014-08-01 08:14:23:   3   Median :0.0000  
##  2014-09-06 20:31:31:   4   2014-08-14 11:31:12:   3   Mean   :0.1594  
##  2014-06-03 09:00:00:   3   2014-08-18 21:32:29:   3   3rd Qu.:0.0000  
##  2014-06-17 23:58:52:   3   2014-08-25 21:14:07:   3   Max.   :1.0000  
##  (Other)            :7734   (Other)            :7737                   
##   sent_message    sent_booking_request
##  Min.   :0.0000   Min.   :0.0000      
##  1st Qu.:0.0000   1st Qu.:0.0000      
##  Median :0.0000   Median :0.0000      
##  Mean   :0.1649   Mean   :0.0187      
##  3rd Qu.:0.0000   3rd Qu.:0.0000      
##  Max.   :1.0000   Max.   :1.0000      
##                                       
##                          next_id_session next_dim_session_number
##  e42bbc9b3f21b8e4415205ce85c8d809:   6   Min.   :  2.0          
##  f0431d5843af09bb3587adc4da5563c8:   6   1st Qu.: 17.0          
##  ff9b22941e3e7f05a467454da5ebaca8:   6   Median : 56.0          
##  012843f8953e428b9236a4a9a3393b18:   5   Mean   :106.7          
##  01696a8017226b7603d063cc7e6a56b5:   5   3rd Qu.:140.8          
##  (Other)                         :7098   Max.   :702.0          
##  NA's                            : 630   NA's   :630            
##                                                                                                      next_dim_user_agent
##  Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_1 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Mobile/11D201: 292    
##  Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko                                   : 238    
##  Airbnb/4.2.0 iPhone/7.1.1                                                                                     : 210    
##  Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko                                          : 207    
##                                                                                                                : 199    
##  (Other)                                                                                                       :5980    
##  NA's                                                                                                          : 630    
##  Next_Device        Next_Application         next_ds    
##  Length:7756        Length:7756        2014-12-15:  59  
##  Class :character   Class :character   2014-09-02:  55  
##  Mode  :character   Mode  :character   2014-09-05:  54  
##                                        2014-12-23:  54  
##                                        2014-09-06:  52  
##                                        (Other)   :6852  
##                                        NA's      : 630  
##               next_ts_min                next_ts_max   next_did_search 
##  2014-08-14 11:31:12:   4   2014-09-26 21:59:56:   4   Min.   :0.0000  
##  2014-08-25 21:14:07:   4   2014-07-09 12:01:38:   3   1st Qu.:0.0000  
##  2014-08-31 02:22:33:   4   2014-08-01 08:14:23:   3   Median :0.0000  
##  2014-09-06 20:31:31:   4   2014-08-14 11:31:12:   3   Mean   :0.1458  
##  2014-06-03 09:00:00:   3   2014-08-18 21:32:29:   3   3rd Qu.:0.0000  
##  (Other)            :7107   (Other)            :7110   Max.   :1.0000  
##  NA's               : 630   NA's               : 630   NA's   :630     
##  next_sent_message next_sent_booking_request   Start_Time                 
##  Min.   :0.0000    Min.   :0.0000            Min.   :2014-05-05 08:57:33  
##  1st Qu.:0.0000    1st Qu.:0.0000            1st Qu.:2014-09-04 01:13:34  
##  Median :0.0000    Median :0.0000            Median :2014-11-06 00:12:10  
##  Mean   :0.1756    Mean   :0.0194            Mean   :2014-11-03 11:24:56  
##  3rd Qu.:0.0000    3rd Qu.:0.0000            3rd Qu.:2015-01-07 04:14:09  
##  Max.   :1.0000    Max.   :1.0000            Max.   :2015-04-23 06:02:24  
##  NA's   :630       NA's   :630                                            
##     End_Time                   Next_Start_Time              
##  Min.   :2014-05-05 08:58:06   Min.   :2014-05-06 10:58:16  
##  1st Qu.:2014-09-04 01:25:12   1st Qu.:2014-09-05 19:45:30  
##  Median :2014-11-06 00:19:57   Median :2014-11-07 22:49:08  
##  Mean   :2014-11-03 11:35:41   Mean   :2014-11-04 22:25:09  
##  3rd Qu.:2015-01-07 04:14:26   3rd Qu.:2015-01-07 18:13:59  
##  Max.   :2015-04-23 06:02:24   Max.   :2015-04-22 09:04:56  
##                                NA's   :630                  
##  Next_End_Time                   Duration        Next_Duration    
##  Min.   :2014-05-06 10:58:33   Length:7756       Length:7756      
##  1st Qu.:2014-09-05 19:50:34   Class :difftime   Class :difftime  
##  Median :2014-11-07 22:58:47   Mode  :numeric    Mode  :numeric   
##  Mean   :2014-11-04 22:36:08                                      
##  3rd Qu.:2015-01-07 18:33:26                                      
##  Max.   :2015-04-22 09:30:29                                      
##  NA's   :630
mean(Internet_Clean$Duration)
## Time difference of 10.74447 secs

Preliminary Analysis

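From this, we can say that there are a total of 5373 visits to the website and that a total of 7756 sessions were recorded for 630 unique users. Each session lasted roughly 10.7 minutes on average. The mean() output above is labelled "secs" only because the time difference was computed in seconds before being divided by 60; the difference between the mean End_Time and the mean Start_Time in the summary (about 10 minutes 45 seconds) confirms that the value is in minutes. By the same reasoning, the maximum time spent on the portal is 641.2 minutes, or about 10.7 hours.

Let us find out which devices are being used to access the Airbnb website.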
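Because the Duration column is built by dividing a difftime by 60, it helps to see how base R labels the units. The following is a minimal sketch with two made-up sessions (the timestamps are assumptions chosen for illustration); it contrasts the automatic unit choice with an explicit units = "mins" request.

```r
# Two illustrative sessions (assumed timestamps)
start <- as.POSIXct(c("2014-05-05 08:57:33", "2014-05-05 09:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2014-05-05 08:58:06", "2014-05-05 09:30:00"), tz = "UTC")

# Subtraction picks the unit automatically; the shortest gap here is
# under a minute, so the result is expressed in seconds
auto_diff <- end - start
print(units(auto_diff))

# Dividing by 60 rescales the numbers to minutes but keeps the "secs" label
mislabelled <- auto_diff / 60
print(units(mislabelled))

# Asking for minutes explicitly yields the same numbers with the right label
explicit <- difftime(end, start, units = "mins")
print(explicit)

# A plain numeric vector avoids the label question in later summaries
duration_min <- as.numeric(explicit)
print(mean(duration_min))
```

Storing the duration as a plain numeric vector in known units is one way to keep summary() and mean() output from carrying a misleading unit label.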

#devices used to access the website

Webdevice <- Internet_Clean %>% group_by(Device) %>% count(Device, sort = TRUE)
## Warning: package 'bindrcpp' was built under R version 3.4.4
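# Webdevice already holds one row per device with its count in n,
# so map n to the bar height instead of re-counting the rows
g1 <- ggplot(Webdevice, aes(x = Device, y = n))
g1 + geom_bar(stat = "identity") + ggtitle("Bar plot showing sessions by device")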

#applications used to access the website

Webapp <- Internet_Clean %>% group_by(Application) %>% count(Application, sort = TRUE)

Webapp
## # A tibble: 9 x 2
## # Groups:   Application [9]
##   Application     n
##   <chr>       <int>
## 1 iOS          2251
## 2 Web          1768
## 3 Chrome       1181
## 4 Moweb         625
## 5 Android       465
## 6 Safari        443
## 7 IE            429
## 8 Firefox       327
## 9 Other         267

From the above graph, we can conclude that iPhone, Desktop, Android Phone and Android Tablet were the four most widely used devices to access the portal. Similarly, iOS, Web, Chrome and Moweb are the top four applications from which users logged in to the portal. Firefox is the least used of the named applications.

Now let us break these results down to find the number of sessions from each application on iPhone and Android phones.

#for iPhone devices 
DiPhone <- Internet_Clean %>% filter(Device == "iPhone") %>% 
    group_by(Application) %>% count(Application, sort = TRUE)

ggplot(DiPhone, 
         aes(x = Application, y = n, fill = Application)) +
    geom_bar(stat = "identity") + 
    labs(title = "Total Sessions by Application", 
         x = "Application", y = "Number of Sessions") 

#for android devices
Dandroid <- Internet_Clean %>% filter(Device == "Android Phone") %>% 
    group_by(Application) %>% count(Application, sort = TRUE)

ggplot(Dandroid, 
         aes(x = Application, y = n, fill = Application)) +
    geom_bar(stat = "identity") + 
    labs(title = "Total Sessions by Application", 
         x = "Application", y = "Number of Sessions") 

ggplot(data = Internet_Clean, aes(Device))+geom_bar()+facet_grid(Application~.) + ggtitle("Facet grid plot showing application and device spread")

From the graphs, there is a strong association between the device and the application used. iPhone users use the iOS app to access the portal, Android users use the Android app, and desktop users prefer Chrome.

Next, let us look at how many visits each user made to the portal.
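The device/application relationship read off the plots can also be checked with a cross-tabulation. The sketch below uses a small made-up data frame (the rows are assumptions, not the Airbnb data) and base R's table() and prop.table() to find the dominant application for each device.

```r
# Illustrative sessions only (assumed values)
toy <- data.frame(
  Device = c("iPhone", "iPhone", "iPhone", "Android Phone", "Android Phone",
             "Desktop", "Desktop", "Desktop"),
  Application = c("iOS", "iOS", "Moweb", "Android", "Android",
                  "Chrome", "Chrome", "Firefox"),
  stringsAsFactors = FALSE
)

# Count sessions for every device/application pair
tab <- table(toy$Device, toy$Application)

# Convert counts to the share of each application within a device (rows sum to 1)
row_share <- prop.table(tab, margin = 1)
print(round(row_share, 2))

# Pick the application with the largest share for each device
dominant <- colnames(row_share)[apply(row_share, 1, which.max)]
names(dominant) <- rownames(row_share)
print(dominant)
```

Row shares make the comparison independent of how many sessions each device has, which a raw bar chart of counts does not.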

Visits <- Internet_Clean %>% group_by(id_visitor) %>% count(id_visitor, sort = TRUE)

ggplot(Visits, aes(n)) + geom_density(kernel = "gaussian", color = "blue") + labs(title = "Distribution of visits per user", x = "Number of Visits", y = "Density")

plot(Internet_Clean$dim_session_number, Internet_Clean$dim_user_agent)

We can interpret that the majority of visitors visited only once or twice. The outlier is one visitor who visited the portal 702 times; the mean session number across all recorded sessions is about 98. We can also extract the data for the users who visit the portal frequently and analyze their activity to see what they are looking for and what their booking conversion rate is. This could be of great value to the digital marketing team, who could send relevant promotions to these frequent visitors to create cross-selling opportunities.

Next, we look at the actions performed by these visitors on the portal to understand their expectations and plan accordingly.
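A skewed distribution like this is easier to describe with both the mean and the median of visits per visitor. The following base R sketch uses made-up visitor ids (assumed values) and the same threshold of 20 sessions that the analysis uses for frequent visitors.

```r
# Illustrative visitor ids, one entry per session (assumed values)
ids <- c(rep("v1", 25), rep("v2", 2), "v3", rep("v4", 2))

# Number of sessions per visitor as a named integer vector
tab <- table(ids)
visit_counts <- setNames(as.integer(tab), names(tab))
print(visit_counts)

# One heavy user pulls the mean well above the median
print(mean(visit_counts))
print(median(visit_counts))

# Visitors with at least 20 sessions are treated as frequent
frequent <- names(visit_counts)[visit_counts >= 20]
print(frequent)
```

When a single visitor accounts for a large share of sessions, the median is usually the more representative summary of a typical visitor.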

Actbyvisitor <- Internet_Clean %>% group_by(id_visitor) %>% filter(n() >= 20) %>% summarize(Search = sum(did_search), Message = sum(sent_message), Booking = sum(sent_booking_request))

#check for any patterns of frequent visitors with searches and messages sent on the portal

ggplot(Actbyvisitor, aes(Search, Message)) + 
  geom_jitter(color = 'blue') +
  labs(title = "List of actions done by frequent visitors (Search & Messages)")

Only users who visited the portal 20 or more times are categorized as frequent visitors. There is a weak correlation between messages sent and searches on the platform.

ggplot(Actbyvisitor, aes(Search, Booking)) + 
  geom_jitter(color = 'green') +
  labs(title = "List of actions done by frequent visitors (Search & Booking)")

There is a moderate correlation between bookings made and searches on the platform. This insight can be passed on to the marketing team so that they can identify the keywords or phrases being searched on the platform and improve the platform for quicker booking conversion.

ggplot(Actbyvisitor, aes(Booking, Message)) + 
  geom_jitter(color = 'brown') +
  labs(title = "List of actions done by frequent visitors (Booking & Messaging)")

There is a weak correlation between messages sent and bookings made on the platform.
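The strength of these relationships can be quantified with a correlation matrix rather than judged from the scatter plots alone. Below is a minimal base R sketch on made-up session flags (the rows are assumptions, not the Airbnb data): it sums the binary flags per visitor and then computes Pearson correlations between the totals.

```r
# Illustrative session flags (assumed values)
sessions <- data.frame(
  id_visitor           = c("a", "a", "a", "b", "b", "c", "c", "c", "c"),
  did_search           = c(1, 0, 1, 0, 0, 1, 1, 1, 0),
  sent_message         = c(0, 0, 1, 0, 1, 1, 0, 1, 0),
  sent_booking_request = c(0, 0, 1, 0, 0, 0, 0, 1, 0)
)

# Total searches, messages and booking requests per visitor
flag_cols <- c("did_search", "sent_message", "sent_booking_request")
per_visitor <- aggregate(sessions[, flag_cols],
                         by = list(id_visitor = sessions$id_visitor),
                         FUN = sum)
print(per_visitor)

# Pairwise Pearson correlations between the per-visitor totals
cor_matrix <- cor(per_visitor[, flag_cols])
print(round(cor_matrix, 2))
```

Because counts of this kind are often skewed, passing method = "spearman" to cor() is a reasonable alternative when a rank-based measure is preferred.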

IV. Discussion

To conclude, this research study analyzed various user behaviors from the pathway channels used to access the Airbnb portal. There are opportunities for digital marketing managers to improve their marketing through targeted advertising to users who frequently visit the portal to search, as these users are more likely to make a booking. The average duration of each visit is 10.7 minutes, and this can be improved with an advertising strategy on the shortlisted applications for greater conversion. In addition, there is a strong association between the device and the application used to access the portal. There are other variables that are not considered in this project, such as the location of the users, which could be useful for geo-targeted advertising to frequent visitors to the portal.

References

Databits. (2018). Airbnb User Pathways Challenge. [Online]. Available at http://databits.io/challenges/airbnb-user-pathways-challenge

Svetlík, J. (2017). Integrating online advertising into integrated marketing communications. Marketing Identity, 5(1/1), 206-215.

Tucker, C. E. (2016). Social advertising: How advertising that explicitly promotes social influence can backfire. Available at SSRN 1975897.

Wilson, T., Yun, C. T., Chuan, S. B., Hong, T. T., & Bing, M. T. H. (2016). A Critique of the Advertising Consumer as “Target”. Explorations in Critical Studies of Advertising, 261.