Loading the two sheets of Data for analysis

fbdata <- read_excel("~/Dropbox/Twigeo/Case-Analyst.xlsx", sheet = 
                       "FB Data")
head(fbdata)
## # A tibble: 6 x 7
##   `Reporting Starts` `Reporting Ends`
##               <dttm>           <dttm>
## 1         2015-10-01       2015-10-31
## 2         2015-10-01       2015-10-31
## 3         2015-10-01       2015-10-31
## 4         2015-10-01       2015-10-31
## 5         2015-07-01       2015-07-31
## 6         2015-08-01       2015-08-31
## # ... with 5 more variables: `Campaign Name` <chr>, Reach <dbl>,
## #   Impressions <dbl>, `Clicks (All)` <dbl>, `Amount Spent (USD)` <dbl>
adjusted_IOS <- read_excel("~/Dropbox/Twigeo/Case-Analyst.xlsx", 
    sheet = "adjust iOS data")
head(adjusted_IOS)
## # A tibble: 6 x 33
##         Date           Network             Campaign `OS Name`
##       <dttm>             <chr>                <chr>     <chr>
## 1 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29       ios
## 2 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29       ios
## 3 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29       ios
## 4 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29       ios
## 5 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29       ios
## 6 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29       ios
## # ... with 29 more variables: `Months after Install` <dbl>, `Cohort
## #   Size` <dbl>, `Retained Users` <dbl>, Sessions <dbl>, `Paying
## #   Users` <lgl>, `General Revenue Event (0) (Events)` <lgl>, `General
## #   Revenue Event (0) (Revenue)` <lgl>, `General Revenue Event (0)
## #   (Converted Users)` <lgl>, `Presented Booking View Error (665jzy)
## #   (Events)` <lgl>, `Presented Booking View Error (665jzy)
## #   (Revenue)` <lgl>, `Presented Booking View Error (665jzy) (Converted
## #   Users)` <lgl>, `Click Close Booking View Error (9hkkvc)
## #   (Events)` <lgl>, `Click Close Booking View Error (9hkkvc)
## #   (Revenue)` <lgl>, `Click Close Booking View Error (9hkkvc) (Converted
## #   Users)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## #   (Events)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## #   (Revenue)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## #   (Converted Users)` <lgl>, `Edit search form (le5kp0) (Events)` <dbl>,
## #   `Edit search form (le5kp0) (Revenue)` <lgl>, `Edit search form
## #   (le5kp0) (Converted Users)` <dbl>, `Start booking process (lziar6)
## #   (Events)` <dbl>, `Start booking process (lziar6) (Revenue)` <lgl>,
## #   `Start booking process (lziar6) (Converted Users)` <dbl>, `Perform a
## #   search (m6lxsy) (Events)` <dbl>, `Perform a search (m6lxsy)
## #   (Revenue)` <lgl>, `Perform a search (m6lxsy) (Converted Users)` <dbl>,
## #   `Presented Booking View Success (vl9o24) (Events)` <lgl>, `Presented
## #   Booking View Success (vl9o24) (Revenue)` <lgl>, `Presented Booking
## #   View Success (vl9o24) (Converted Users)` <lgl>

Divide the Campaign Column into 5 additional columns for analysis.

fbdata <- separate(fbdata, `Campaign Name`, into= c('Country', 'OS', 'Sex', 'Device', 'Age'), sep = "\\|")
head(fbdata)
## # A tibble: 6 x 11
##   `Reporting Starts` `Reporting Ends` Country      OS   Sex
##               <dttm>           <dttm>   <chr>   <chr> <chr>
## 1         2015-10-01       2015-10-31      IT     iOS     X
## 2         2015-10-01       2015-10-31      IT Android     X
## 3         2015-10-01       2015-10-31      DE Android     X
## 4         2015-10-01       2015-10-31      DE     iOS     X
## 5         2015-07-01       2015-07-31      UK     iOS     X
## 6         2015-08-01       2015-08-31      UK     iOS     X
## # ... with 6 more variables: Device <chr>, Age <chr>, Reach <dbl>,
## #   Impressions <dbl>, `Clicks (All)` <dbl>, `Amount Spent (USD)` <dbl>

Summary of the Facebook Data. Checking for any outliers or anything that seems weird with the data

summary(fbdata)
##  Reporting Starts              Reporting Ends               
##  Min.   :2015-01-01 00:00:00   Min.   :2015-01-31 00:00:00  
##  1st Qu.:2015-03-01 00:00:00   1st Qu.:2015-03-31 00:00:00  
##  Median :2015-05-01 00:00:00   Median :2015-05-31 00:00:00  
##  Mean   :2015-04-25 04:18:10   Mean   :2015-05-24 14:01:26  
##  3rd Qu.:2015-06-01 00:00:00   3rd Qu.:2015-06-30 00:00:00  
##  Max.   :2015-10-01 00:00:00   Max.   :2015-10-31 00:00:00  
##    Country               OS                Sex           
##  Length:753         Length:753         Length:753        
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##     Device              Age                Reach         Impressions     
##  Length:753         Length:753         Min.   :   192   Min.   :    290  
##  Class :character   Class :character   1st Qu.:  7895   1st Qu.:  18897  
##  Mode  :character   Mode  :character   Median : 27223   Median :  50006  
##                                        Mean   : 50802   Mean   : 108090  
##                                        3rd Qu.: 70344   3rd Qu.: 136008  
##                                        Max.   :420279   Max.   :1202725  
##   Clicks (All)   Amount Spent (USD)
##  Min.   :    2   Min.   :   0.56   
##  1st Qu.:  208   1st Qu.: 105.05   
##  Median :  619   Median : 256.92   
##  Mean   : 1186   Mean   : 464.92   
##  3rd Qu.: 1719   3rd Qu.: 656.44   
##  Max.   :12021   Max.   :4029.33

Calculate the total amount spend on Facebook Advertising

sum(fbdata$`Amount Spent (USD)`)
## [1] 350082.3

Divide the Campaign Column into 5 additional columns for further analysis. Filter Data to only include ‘Facebook Installs’ as the ‘Network’ since we are looking at for the cost for installs as it relates to Facebook.

iosdata <- separate(adjusted_IOS, `Campaign`, into= c('Country', 'OS', 'Sex', 'Device', 'Age'), sep = "\\|" )
iosdata <- filter(iosdata, Network == 'Facebook Installs')
head(iosdata)
## # A tibble: 6 x 37
##         Date           Network Country    OS   Sex Device   Age `OS Name`
##       <dttm>             <chr>   <chr> <chr> <chr>  <chr> <chr>     <chr>
## 1 2015-04-01 Facebook Installs      BE   iOS     F  Phone 20-29       ios
## 2 2015-04-01 Facebook Installs      BE   iOS     F  Phone 20-29       ios
## 3 2015-04-01 Facebook Installs      BE   iOS     F  Phone 20-29       ios
## 4 2015-04-01 Facebook Installs      BE   iOS     F  Phone 20-29       ios
## 5 2015-04-01 Facebook Installs      BE   iOS     F  Phone 20-29       ios
## 6 2015-04-01 Facebook Installs      BE   iOS     F  Phone 20-29       ios
## # ... with 29 more variables: `Months after Install` <dbl>, `Cohort
## #   Size` <dbl>, `Retained Users` <dbl>, Sessions <dbl>, `Paying
## #   Users` <lgl>, `General Revenue Event (0) (Events)` <lgl>, `General
## #   Revenue Event (0) (Revenue)` <lgl>, `General Revenue Event (0)
## #   (Converted Users)` <lgl>, `Presented Booking View Error (665jzy)
## #   (Events)` <lgl>, `Presented Booking View Error (665jzy)
## #   (Revenue)` <lgl>, `Presented Booking View Error (665jzy) (Converted
## #   Users)` <lgl>, `Click Close Booking View Error (9hkkvc)
## #   (Events)` <lgl>, `Click Close Booking View Error (9hkkvc)
## #   (Revenue)` <lgl>, `Click Close Booking View Error (9hkkvc) (Converted
## #   Users)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## #   (Events)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## #   (Revenue)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## #   (Converted Users)` <lgl>, `Edit search form (le5kp0) (Events)` <dbl>,
## #   `Edit search form (le5kp0) (Revenue)` <lgl>, `Edit search form
## #   (le5kp0) (Converted Users)` <dbl>, `Start booking process (lziar6)
## #   (Events)` <dbl>, `Start booking process (lziar6) (Revenue)` <lgl>,
## #   `Start booking process (lziar6) (Converted Users)` <dbl>, `Perform a
## #   search (m6lxsy) (Events)` <dbl>, `Perform a search (m6lxsy)
## #   (Revenue)` <lgl>, `Perform a search (m6lxsy) (Converted Users)` <dbl>,
## #   `Presented Booking View Success (vl9o24) (Events)` <lgl>, `Presented
## #   Booking View Success (vl9o24) (Revenue)` <lgl>, `Presented Booking
## #   View Success (vl9o24) (Converted Users)` <lgl>

Subset the Facebook data to only include iOS as the operating system. I have done this so that I can merge with data with the ‘Adjusted iOS data’

fbdata <- subset(fbdata, OS == 'iOS')
head(fbdata)
## # A tibble: 6 x 11
##   `Reporting Starts` `Reporting Ends` Country    OS   Sex Device   Age
##               <dttm>           <dttm>   <chr> <chr> <chr>  <chr> <chr>
## 1         2015-10-01       2015-10-31      IT   iOS     X iPhone 18-65
## 2         2015-10-01       2015-10-31      DE   iOS     X iPhone XX-XX
## 3         2015-07-01       2015-07-31      UK   iOS     X  Phone 35-65
## 4         2015-08-01       2015-08-31      UK   iOS     X  Phone 35-65
## 5         2015-09-01       2015-09-30      UK   iOS     X  Phone 35-65
## 6         2015-10-01       2015-10-31      UK   iOS     X  Phone 35-65
## # ... with 4 more variables: Reach <dbl>, Impressions <dbl>, `Clicks
## #   (All)` <dbl>, `Amount Spent (USD)` <dbl>

Subset the ‘Adjusted iOS data’ to only the ‘Months after Install’ data to equal 0. This will count monthly cohorts.

ioscohort <- subset(iosdata, Network == 'Facebook Installs')
ioscohort <- subset(ioscohort, `Months after Install` == 0)

Group the data by Country and Date to figure out the number of installs Per Month/Per Country.

fbinstalls <- ioscohort %>%
  group_by(Country, Date) %>%
  summarise(`Facebook Installs` = sum(`Cohort Size`, na.rm = TRUE))
head(fbinstalls)
## # A tibble: 6 x 3
## # Groups:   Country [1]
##   Country       Date `Facebook Installs`
##     <chr>     <dttm>               <dbl>
## 1      BE 2015-04-01                  89
## 2      BE 2015-05-01                 131
## 3      BE 2015-06-01                   7
## 4      BE 2015-07-01                2087
## 5      BE 2015-08-01                2522
## 6      BE 2015-09-01                1237

Summarizing the information on Facebook Installs to see what the range is.

summary(fbinstalls$`Facebook Installs`)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     7.0   785.5  1503.5  1430.3  2048.0  3245.0

Finding out what the minimum and maximum installs per country were.

summarise(group_by(fbinstalls, Country),
          `Minimum Installs` = min(`Facebook Installs`), `Maximum Installs` = max(`Facebook Installs`))
## # A tibble: 7 x 3
##   Country `Minimum Installs` `Maximum Installs`
##     <chr>              <dbl>              <dbl>
## 1      BE                  7               2522
## 2      CH                 20               1259
## 3      DE                594               2081
## 4      ES                771               2490
## 5      IT                 25               2883
## 6      NL                505               1781
## 7      UK                983               3245

Graphing that information for a visualization of the that information

ggplot(fbinstalls, aes(x = Country,
                             y = `Facebook Installs`, fill = Country)) +
  geom_boxplot()

Figuring out the total amount spent on Facebook per country, per month

fbcosts <- fbdata %>%
  group_by(Country, `Reporting Starts`) %>%
  summarise(totalspent = sum(`Amount Spent (USD)`))
names(fbcosts) <- c("Country", "Date", "Total_Spent")
head(fbcosts)
## # A tibble: 6 x 3
## # Groups:   Country [1]
##   Country       Date Total_Spent
##     <chr>     <dttm>       <dbl>
## 1      BE 2015-04-01      709.78
## 2      BE 2015-05-01     1310.84
## 3      BE 2015-06-01      551.34
## 4      BE 2015-07-01     3048.41
## 5      BE 2015-08-01     4021.67
## 6      BE 2015-09-01     2943.82

Merging the Facebook Spend Data and the Facebook Install data by Country and Date to determine the Cost per install

cost_install <- merge(fbcosts, fbinstalls, by=c("Country", "Date"), all = TRUE)
cost_install
##    Country       Date Total_Spent Facebook Installs
## 1       BE 2015-04-01      709.78                89
## 2       BE 2015-05-01     1310.84               131
## 3       BE 2015-06-01      551.34                 7
## 4       BE 2015-07-01     3048.41              2087
## 5       BE 2015-08-01     4021.67              2522
## 6       BE 2015-09-01     2943.82              1237
## 7       BE 2015-10-01     1092.47               451
## 8       CH 2015-04-01     1032.36               119
## 9       CH 2015-05-01     1314.21               137
## 10      CH 2015-06-01      314.23                20
## 11      CH 2015-07-01     1037.50               688
## 12      CH 2015-08-01     1852.63              1259
## 13      CH 2015-09-01     1357.99               781
## 14      CH 2015-10-01      593.48               292
## 15      DE 2015-01-01      517.50               594
## 16      DE 2015-02-01      836.97               898
## 17      DE 2015-03-01     1153.00              1357
## 18      DE 2015-04-01      915.34              1193
## 19      DE 2015-05-01     2769.20              1787
## 20      DE 2015-06-01     4424.34              2081
## 21      DE 2015-07-01     2993.65              1660
## 22      DE 2015-08-01     2163.56              1739
## 23      DE 2015-09-01     1686.25              1109
## 24      DE 2015-10-01     1898.48              1313
## 25      ES 2015-01-01      801.38              1107
## 26      ES 2015-02-01      794.36              1070
## 27      ES 2015-03-01     1591.56              2481
## 28      ES 2015-04-01      972.17              1387
## 29      ES 2015-05-01     2491.88              1684
## 30      ES 2015-06-01     3729.18              2490
## 31      ES 2015-07-01     3197.36              2166
## 32      ES 2015-08-01     2950.49              2330
## 33      ES 2015-09-01     2495.87              1707
## 34      ES 2015-10-01     1755.50               771
## 35      IT 2015-01-01      115.87                33
## 36      IT 2015-02-01       61.69                25
## 37      IT 2015-03-01      498.29               787
## 38      IT 2015-04-01      976.04              1639
## 39      IT 2015-05-01     1938.45              1593
## 40      IT 2015-06-01     2781.09              1833
## 41      IT 2015-07-01     2607.83              2037
## 42      IT 2015-08-01     2758.86              2634
## 43      IT 2015-09-01     2571.86              2883
## 44      IT 2015-10-01     2647.34              1964
## 45      NL 2015-01-01      657.79               727
## 46      NL 2015-02-01     1269.71              1364
## 47      NL 2015-03-01     1488.55              1514
## 48      NL 2015-04-01     1081.08              1070
## 49      NL 2015-05-01     2942.57              1772
## 50      NL 2015-06-01     3735.89              1781
## 51      NL 2015-07-01     2915.66              1613
## 52      NL 2015-08-01     2344.59              1535
## 53      NL 2015-09-01     1749.39              1044
## 54      NL 2015-10-01      902.97               505
## 55      UK 2015-01-01     1377.68              1493
## 56      UK 2015-02-01     2433.13              2498
## 57      UK 2015-03-01     2787.77              3245
## 58      UK 2015-04-01     2457.11              2972
## 59      UK 2015-05-01     3257.85              2334
## 60      UK 2015-06-01     3927.71              2507
## 61      UK 2015-07-01     3435.68              2349
## 62      UK 2015-08-01     2448.70              2298
## 63      UK 2015-09-01     1902.73              1763
## 64      UK 2015-10-01     1283.24               983

Adding the total number of installs by country and the total amount spent by country, dividing the total spend by the total number of installs to get the Cost Per Install Per Country.

cost_per_install <- cost_install %>%
  group_by(Country) %>%
  summarise(total_installs = sum(`Facebook Installs`), total_spent = sum(Total_Spent), CPI = (total_spent / total_installs))
cost_per_install
## # A tibble: 7 x 4
##   Country total_installs total_spent      CPI
##     <chr>          <dbl>       <dbl>    <dbl>
## 1      BE           6524    13678.33 2.096617
## 2      CH           3296     7502.40 2.276214
## 3      DE          13731    19358.29 1.409824
## 4      ES          17193    20779.75 1.208617
## 5      IT          15428    16957.32 1.099126
## 6      NL          12925    19088.20 1.476843
## 7      UK          22442    25311.60 1.127867

Creating a bar graph to show the cost per install per country.

ggplot(cost_per_install, aes(x =cost_per_install$Country,
                             y = cost_per_install$CPI)) +
  geom_bar(stat = 'identity', aes(fill = CPI)) +
  ggtitle("Cost Per Install Per Country") +
  xlab("County") +
  ylab("Spend Per Intall in USD") +
  scale_y_continuous(breaks = seq(min(0), max(2.50), by = .25))

Figuring out the total amount spent on installs per month for Facebook.

cpi_time <- cost_install %>%
  group_by(Date) %>%
  summarise(`Total Spend` = sum(Total_Spent))
cpi_time
## # A tibble: 10 x 2
##          Date `Total Spend`
##        <dttm>         <dbl>
##  1 2015-01-01       3470.22
##  2 2015-02-01       5395.86
##  3 2015-03-01       7519.17
##  4 2015-04-01       8143.88
##  5 2015-05-01      16025.00
##  6 2015-06-01      19463.78
##  7 2015-07-01      19236.09
##  8 2015-08-01      18540.50
##  9 2015-09-01      14707.91
## 10 2015-10-01      10173.48
sum(cpi_time$`Total Spend`)
## [1] 122675.9

Graphing the Total Spent on installs over time.

ggplot(cpi_time, aes(Date, `Total Spend`))+
  geom_line(color = 'Red', size = 2)+
  ggtitle("Cost Per Install Over Time") +
  xlab("Month 2015") +
  ylab("Total Spend Per Intall in USD")

Calculating the Cost Per Install Per Country By Month

cpi_time_country <- cost_install %>%
  group_by(Date, Country) %>%
  summarise(`Total Spend` = sum(Total_Spent))
cpi_time_country
## # A tibble: 64 x 3
## # Groups:   Date [?]
##          Date Country `Total Spend`
##        <dttm>   <chr>         <dbl>
##  1 2015-01-01      DE        517.50
##  2 2015-01-01      ES        801.38
##  3 2015-01-01      IT        115.87
##  4 2015-01-01      NL        657.79
##  5 2015-01-01      UK       1377.68
##  6 2015-02-01      DE        836.97
##  7 2015-02-01      ES        794.36
##  8 2015-02-01      IT         61.69
##  9 2015-02-01      NL       1269.71
## 10 2015-02-01      UK       2433.13
## # ... with 54 more rows

Graphing the Cost Per Install Per Country By Month

ggplot()+
  geom_line(data = cpi_time_country, aes(x = Date, y = `Total Spend`, color = Country)) +
  ggtitle("Spend by Country Over Time")

Since that Graph is a little hard to follow, I broke out each Country into an individual Graph.

ggplot()+
  geom_line(data = cpi_time_country, aes(x = Date, y = `Total Spend`, color = Country)) +
  ggtitle("Spend by Country Over Time") +
  facet_grid(. ~ Country)

Figuring out the cost per individual install over time.

cost_per_install_ot <- cost_install %>%
  group_by(Country, Date) %>%
  summarise(total_installs = sum(`Facebook Installs`), total_spent = sum(Total_Spent), CPI = (total_spent / total_installs))
cost_per_install_ot
## # A tibble: 64 x 5
## # Groups:   Country [?]
##    Country       Date total_installs total_spent       CPI
##      <chr>     <dttm>          <dbl>       <dbl>     <dbl>
##  1      BE 2015-04-01             89      709.78  7.975056
##  2      BE 2015-05-01            131     1310.84 10.006412
##  3      BE 2015-06-01              7      551.34 78.762857
##  4      BE 2015-07-01           2087     3048.41  1.460666
##  5      BE 2015-08-01           2522     4021.67  1.594635
##  6      BE 2015-09-01           1237     2943.82  2.379806
##  7      BE 2015-10-01            451     1092.47  2.422328
##  8      CH 2015-04-01            119     1032.36  8.675294
##  9      CH 2015-05-01            137     1314.21  9.592774
## 10      CH 2015-06-01             20      314.23 15.711500
## # ... with 54 more rows

Taking a look at the Minimum and Maximum CPI per Country.

summarise(group_by(cost_per_install_ot, Country),
          `Min Cost Per Install` = min(CPI), `Max Cost Per Install` = max(CPI))
## # A tibble: 7 x 3
##   Country `Min Cost Per Install` `Max Cost Per Install`
##     <chr>                  <dbl>                  <dbl>
## 1      BE              1.4606660              78.762857
## 2      CH              1.4715091              15.711500
## 3      DE              0.7672590               2.126064
## 4      ES              0.6414994               2.276913
## 5      IT              0.5955095               3.511212
## 6      NL              0.9048006               2.097636
## 7      UK              0.8267530               1.566697

Graphing the information to get an idea of the spread of the spending.

ggplot()+
  geom_line(data = cost_per_install_ot, aes(x = Date, y = CPI, color = Country)) +
  ggtitle("Cost Per Install Over Time By Country") +
  facet_grid(. ~ Country)

Determining what is the Cost per Start a booking process per Country and Month? Subsetting the data to include the Date (cohort), Country, Cohort Size, and Start Booking Process Event.

iosbooking <- select(iosdata, c(1, 3, 10, 29))
head(iosbooking)
## # A tibble: 6 x 4
##         Date Country `Cohort Size`
##       <dttm>   <chr>         <dbl>
## 1 2015-04-01      BE             1
## 2 2015-04-01      BE             1
## 3 2015-04-01      BE             1
## 4 2015-04-01      BE             1
## 5 2015-04-01      BE             1
## 6 2015-04-01      BE             1
## # ... with 1 more variables: `Start booking process (lziar6)
## #   (Events)` <dbl>

Finding the number of ‘Start Booking Process’ events have occured in the data. This is a QA.

sum(iosbooking$`Start booking process (lziar6) (Events)`, na.rm = TRUE)
## [1] 62131

Multiplied the number of ‘Start a Booking Event’ by $3 (What GvR estimates the event is worth).

colnames(iosbooking) <- c("Cohort", "Country", "Size", "Booking Event")
booking_cost <- iosbooking %>%
  mutate(`Cost Per Booking` = `Booking Event` * 3) %>%
  filter(!is.na(`Booking Event`)) %>%
  group_by(Country, Cohort) %>%
  summarise(`Cost in USD` = sum(`Cost Per Booking`))
booking_cost
## # A tibble: 64 x 3
## # Groups:   Country [?]
##    Country     Cohort `Cost in USD`
##      <chr>     <dttm>         <dbl>
##  1      BE 2015-04-01           129
##  2      BE 2015-05-01           237
##  3      BE 2015-06-01            30
##  4      BE 2015-07-01          3063
##  5      BE 2015-08-01          2370
##  6      BE 2015-09-01           975
##  7      BE 2015-10-01           702
##  8      CH 2015-04-01           216
##  9      CH 2015-05-01           189
## 10      CH 2015-06-01            27
## # ... with 54 more rows

This is the sum of the Cost in USD for a ‘Start a Booking Process’ event. This is QA.

sum(booking_cost$`Cost in USD`)
## [1] 186393

Created bar plot for a cost per booking event in USD for Each Month

ggplot(booking_cost, aes(x =Cohort,
                             y = `Cost in USD`)) +
  geom_bar(stat = 'identity', aes(fill = Country)) +
  ggtitle("Cost Per Booking Event in USD") +
  xlab("Month") +
  ylab("Spend in USD")

The last graph didn’t provide a clear insight into each country. I broke out each country individually to look at the Cost Per Start a booking event.

ggplot(booking_cost, aes(x =Cohort,
                             y = `Cost in USD`)) +
  geom_bar(stat = 'identity', aes(fill = Country)) +
  ggtitle("Cost Per Booking Event in USD") +
  xlab("Month") +
  ylab("Spend in USD") +
  facet_grid(. ~ Country)

A summary of the minimum and maximum Cost Per Start A Booking Event per Country.

summarise(group_by(booking_cost, Country),
          `Minimum Cost Start A Booking` = min(`Cost in USD`), `Maximum Cost Start A Booking` = max(`Cost in USD`))
## # A tibble: 7 x 3
##   Country `Minimum Cost Start A Booking` `Maximum Cost Start A Booking`
##     <chr>                          <dbl>                          <dbl>
## 1      BE                             30                           3063
## 2      CH                             27                           1599
## 3      DE                           2163                           6891
## 4      ES                           1137                           8559
## 5      IT                            198                           4656
## 6      NL                            492                           3795
## 7      UK                           1575                           7773

Finding what the largest market is for booking events

markets <- booking_cost %>%
  group_by(Country) %>%
  summarise(`Total Cost Per Start A Booking Event` = sum(`Cost in USD`))
markets[order(markets$`Total Cost Per Start A Booking Event`),]
## # A tibble: 7 x 2
##   Country `Total Cost Per Start A Booking Event`
##     <chr>                                  <dbl>
## 1      CH                                   4587
## 2      BE                                   7506
## 3      NL                                  18609
## 4      IT                                  28317
## 5      DE                                  41640
## 6      UK                                  41769
## 7      ES                                  43965

Figure out ROI for each country.

revenue_spend <- merge(markets, cost_per_install, by=c("Country"))
revenue_spend <- select(revenue_spend, c(1, 2,4))
summarise(group_by(revenue_spend, Country, `Total Cost Per Start A Booking Event`, total_spent),
          `ROI` = (`Total Cost Per Start A Booking Event`)/(total_spent))
## # A tibble: 7 x 4
## # Groups:   Country, Total Cost Per Start A Booking Event [?]
##   Country `Total Cost Per Start A Booking Event` total_spent       ROI
##     <chr>                                  <dbl>       <dbl>     <dbl>
## 1      BE                                   7506    13678.33 0.5487512
## 2      CH                                   4587     7502.40 0.6114044
## 3      DE                                  41640    19358.29 2.1510164
## 4      ES                                  43965    20779.75 2.1157617
## 5      IT                                  28317    16957.32 1.6698983
## 6      NL                                  18609    19088.20 0.9748955
## 7      UK                                  41769    25311.60 1.6501920

Figure out the revenue from App Installs. ***Note, the total spend for facebook ads is above.

print(rev <- sum(revenue_spend$`Total Cost Per Start A Booking Event`))
## [1] 186393
print(spend <- sum(revenue_spend$total_spent))
## [1] 122675.9
print(profit<- rev-spend)
## [1] 63717.11

Percentage of Total Cost Per Start A Booking Event

summarise(group_by(markets, Country),
          `Percentage of Total Cost Booking Event` = `Total Cost Per Start A Booking Event`/ (186393)*100)
## # A tibble: 7 x 2
##   Country `Percentage of Total Cost Booking Event`
##     <chr>                                    <dbl>
## 1      BE                                 4.026975
## 2      CH                                 2.460929
## 3      DE                                22.339895
## 4      ES                                23.587259
## 5      IT                                15.192094
## 6      NL                                 9.983744
## 7      UK                                22.409103

Creating a Pie Chart to visualize the information regarding the Start A Booking Process

ggplot(markets, aes(x="", y=`Total Cost Per Start A Booking Event` , fill= Country)) +
  geom_bar(stat = "identity", position = "stack") +
  coord_polar("y", start=0) +
  guides(fill = guide_legend(title = "Countries"))

####In terms of market for ‘Start a Booking Process’, the visualization above shows us that Spain is the largest market with 43965 events started. Next is The UK, followed closely behind by Germany. The smallest market is Switzerland, with only 4587 ‘Start a Booking Process’ events.

Take a look at the total number of installs per country to see if that dictates a larger market.

marketinstalls <- summarise(group_by(fbinstalls, Country),
          `Total Installs of App` = sum(`Facebook Installs`))
marketinstalls[order(marketinstalls$`Total Installs of App`),]
## # A tibble: 7 x 2
##   Country `Total Installs of App`
##     <chr>                   <dbl>
## 1      CH                    3296
## 2      BE                    6524
## 3      NL                   12925
## 4      DE                   13731
## 5      IT                   15428
## 6      ES                   17193
## 7      UK                   22442
sum(marketinstalls$`Total Installs of App`)
## [1] 91539

Determine the Percentage of Installs Per Country

summarise(group_by(marketinstalls, Country),
          `Percentage of Total Cost Booking Event` = `Total Installs of App`/ (91539)*100)
## # A tibble: 7 x 2
##   Country `Percentage of Total Cost Booking Event`
##     <chr>                                    <dbl>
## 1      BE                                 7.127017
## 2      CH                                 3.600651
## 3      DE                                15.000164
## 4      ES                                18.782158
## 5      IT                                16.854019
## 6      NL                                14.119665
## 7      UK                                24.516326

Visualization of the information from above

ggplot(marketinstalls, aes(x="", y=`Total Installs of App` , fill= Country)) +
  geom_bar(stat = "identity", position = "stack") +
  coord_polar("y", start=0) +
  guides(fill = guide_legend(title = "Countries"))

In terms of largest market for App Installs, The UK had the most App installs by 5,249 installs. The following markets are Spain and Italy. The smallest market is Switzerland, which might explain why they also had the smallest market for ‘Start A Booking Process’

Other Information I found to be intersting about the data is user retention and Cohorts use of the App over time. Plus, Age and Gender of the retained users.

I’d like to find out about how the number of Start a Booking Process changes after the first month of downloading the app.

retaining_users <- select(iosdata, c(1, 3, 5, 7, 9, 11, 26, 29))
colnames(retaining_users) <- c("Cohort", "Country", "Sex", "Age", "Months After Install", "Retained Users", "Edit Search Form", "Start A Booking Process")
head(retaining_users)
## # A tibble: 6 x 8
##       Cohort Country   Sex   Age `Months After Install` `Retained Users`
##       <dttm>   <chr> <chr> <chr>                  <dbl>            <dbl>
## 1 2015-04-01      BE     F 20-29                      0                1
## 2 2015-04-01      BE     F 20-29                      1               NA
## 3 2015-04-01      BE     F 20-29                      2               NA
## 4 2015-04-01      BE     F 20-29                      3               NA
## 5 2015-04-01      BE     F 20-29                      4               NA
## 6 2015-04-01      BE     F 20-29                      5               NA
## # ... with 2 more variables: `Edit Search Form` <dbl>, `Start A Booking
## #   Process` <dbl>
booking_event_month <- retaining_users %>%
  group_by(Cohort, `Months After Install` ) %>%
  summarise(`Start a Booking Process By Month` = sum(`Start A Booking Process`, na.rm=TRUE))
booking_event_month
## # A tibble: 74 x 3
## # Groups:   Cohort [?]
##        Cohort `Months After Install` `Start a Booking Process By Month`
##        <dttm>                  <dbl>                              <dbl>
##  1 2015-01-01                      0                               2404
##  2 2015-01-01                      1                                310
##  3 2015-01-01                      2                                183
##  4 2015-01-01                      3                                 90
##  5 2015-01-01                      4                                 95
##  6 2015-01-01                      5                                 99
##  7 2015-01-01                      6                                 71
##  8 2015-01-01                      7                                 57
##  9 2015-01-01                      8                                 51
## 10 2015-01-01                      9                                 76
## # ... with 64 more rows

Visualization of the information above. Graph by Cohort.

ggplot(booking_event_month, aes(x =`Months After Install`,
                             y = `Start a Booking Process By Month`, fill = Cohort)) +
  geom_bar(stat = "Identity") +
  ggtitle("App Use By Monthly Cohort") +
  xlab("Month") +
  ylab("Start A Booking Process") +
  facet_grid(. ~ Cohort)

I was interested in looking at what age group had the most engaged users (or the users who were starting the most bookings). This is a good insight for marketers to know what age group they should be targeting ads to.

Age <- retaining_users %>%
  group_by(Age)%>%
  summarise(Users = sum(`Start A Booking Process`, na.rm = TRUE))
Age[order(-Age$Users),]
## # A tibble: 10 x 2
##      Age Users
##    <chr> <dbl>
##  1 XX-XX 19398
##  2 20-29 13640
##  3 18-34  9228
##  4 30-39  5288
##  5 35-65  3548
##  6 35-54  3088
##  7 40-49  3032
##  8   50+  2559
##  9 55-65  2103
## 10 18-65   247

We don’t have any Age information about the largest group of Retained Users for the App, but we can see that 18-34 Age Group makes up a large percentage of the Users of the App.

I’d also like to look at the gender of the user who is starting the most booking processes.

Gender <- retaining_users %>%
  group_by(Sex)%>%
  summarise(Users = sum(`Start A Booking Process`, na.rm = TRUE))
Gender[order(-Gender$Users),]
## # A tibble: 3 x 2
##     Sex Users
##   <chr> <dbl>
## 1     X 37612
## 2     M 13026
## 3     F 11493