Loading the two sheets of Data for analysis
fbdata <- read_excel("~/Dropbox/Twigeo/Case-Analyst.xlsx", sheet =
"FB Data")
head(fbdata)## # A tibble: 6 x 7
## `Reporting Starts` `Reporting Ends`
## <dttm> <dttm>
## 1 2015-10-01 2015-10-31
## 2 2015-10-01 2015-10-31
## 3 2015-10-01 2015-10-31
## 4 2015-10-01 2015-10-31
## 5 2015-07-01 2015-07-31
## 6 2015-08-01 2015-08-31
## # ... with 5 more variables: `Campaign Name` <chr>, Reach <dbl>,
## # Impressions <dbl>, `Clicks (All)` <dbl>, `Amount Spent (USD)` <dbl>
adjusted_IOS <- read_excel("~/Dropbox/Twigeo/Case-Analyst.xlsx",
sheet = "adjust iOS data")
head(adjusted_IOS)## # A tibble: 6 x 33
## Date Network Campaign `OS Name`
## <dttm> <chr> <chr> <chr>
## 1 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29 ios
## 2 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29 ios
## 3 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29 ios
## 4 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29 ios
## 5 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29 ios
## 6 2015-04-01 Facebook Installs BE|iOS|F|Phone|20-29 ios
## # ... with 29 more variables: `Months after Install` <dbl>, `Cohort
## # Size` <dbl>, `Retained Users` <dbl>, Sessions <dbl>, `Paying
## # Users` <lgl>, `General Revenue Event (0) (Events)` <lgl>, `General
## # Revenue Event (0) (Revenue)` <lgl>, `General Revenue Event (0)
## # (Converted Users)` <lgl>, `Presented Booking View Error (665jzy)
## # (Events)` <lgl>, `Presented Booking View Error (665jzy)
## # (Revenue)` <lgl>, `Presented Booking View Error (665jzy) (Converted
## # Users)` <lgl>, `Click Close Booking View Error (9hkkvc)
## # (Events)` <lgl>, `Click Close Booking View Error (9hkkvc)
## # (Revenue)` <lgl>, `Click Close Booking View Error (9hkkvc) (Converted
## # Users)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## # (Events)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## # (Revenue)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## # (Converted Users)` <lgl>, `Edit search form (le5kp0) (Events)` <dbl>,
## # `Edit search form (le5kp0) (Revenue)` <lgl>, `Edit search form
## # (le5kp0) (Converted Users)` <dbl>, `Start booking process (lziar6)
## # (Events)` <dbl>, `Start booking process (lziar6) (Revenue)` <lgl>,
## # `Start booking process (lziar6) (Converted Users)` <dbl>, `Perform a
## # search (m6lxsy) (Events)` <dbl>, `Perform a search (m6lxsy)
## # (Revenue)` <lgl>, `Perform a search (m6lxsy) (Converted Users)` <dbl>,
## # `Presented Booking View Success (vl9o24) (Events)` <lgl>, `Presented
## # Booking View Success (vl9o24) (Revenue)` <lgl>, `Presented Booking
## # View Success (vl9o24) (Converted Users)` <lgl>
Divide the Campaign Column into 5 additional columns for analysis.
fbdata <- separate(fbdata, `Campaign Name`, into= c('Country', 'OS', 'Sex', 'Device', 'Age'), sep = "\\|")
head(fbdata)## # A tibble: 6 x 11
## `Reporting Starts` `Reporting Ends` Country OS Sex
## <dttm> <dttm> <chr> <chr> <chr>
## 1 2015-10-01 2015-10-31 IT iOS X
## 2 2015-10-01 2015-10-31 IT Android X
## 3 2015-10-01 2015-10-31 DE Android X
## 4 2015-10-01 2015-10-31 DE iOS X
## 5 2015-07-01 2015-07-31 UK iOS X
## 6 2015-08-01 2015-08-31 UK iOS X
## # ... with 6 more variables: Device <chr>, Age <chr>, Reach <dbl>,
## # Impressions <dbl>, `Clicks (All)` <dbl>, `Amount Spent (USD)` <dbl>
Summary of the Facebook Data. Checking for any outliers or anything that seems weird with the data
summary(fbdata)## Reporting Starts Reporting Ends
## Min. :2015-01-01 00:00:00 Min. :2015-01-31 00:00:00
## 1st Qu.:2015-03-01 00:00:00 1st Qu.:2015-03-31 00:00:00
## Median :2015-05-01 00:00:00 Median :2015-05-31 00:00:00
## Mean :2015-04-25 04:18:10 Mean :2015-05-24 14:01:26
## 3rd Qu.:2015-06-01 00:00:00 3rd Qu.:2015-06-30 00:00:00
## Max. :2015-10-01 00:00:00 Max. :2015-10-31 00:00:00
## Country OS Sex
## Length:753 Length:753 Length:753
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Device Age Reach Impressions
## Length:753 Length:753 Min. : 192 Min. : 290
## Class :character Class :character 1st Qu.: 7895 1st Qu.: 18897
## Mode :character Mode :character Median : 27223 Median : 50006
## Mean : 50802 Mean : 108090
## 3rd Qu.: 70344 3rd Qu.: 136008
## Max. :420279 Max. :1202725
## Clicks (All) Amount Spent (USD)
## Min. : 2 Min. : 0.56
## 1st Qu.: 208 1st Qu.: 105.05
## Median : 619 Median : 256.92
## Mean : 1186 Mean : 464.92
## 3rd Qu.: 1719 3rd Qu.: 656.44
## Max. :12021 Max. :4029.33
Calculate the total amount spend on Facebook Advertising
sum(fbdata$`Amount Spent (USD)`)## [1] 350082.3
Divide the Campaign Column into 5 additional columns for further analysis. Filter Data to only include ‘Facebook Installs’ as the ‘Network’ since we are looking at for the cost for installs as it relates to Facebook.
iosdata <- separate(adjusted_IOS, `Campaign`, into= c('Country', 'OS', 'Sex', 'Device', 'Age'), sep = "\\|" )
iosdata <- filter(iosdata, Network == 'Facebook Installs')
head(iosdata)## # A tibble: 6 x 37
## Date Network Country OS Sex Device Age `OS Name`
## <dttm> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2015-04-01 Facebook Installs BE iOS F Phone 20-29 ios
## 2 2015-04-01 Facebook Installs BE iOS F Phone 20-29 ios
## 3 2015-04-01 Facebook Installs BE iOS F Phone 20-29 ios
## 4 2015-04-01 Facebook Installs BE iOS F Phone 20-29 ios
## 5 2015-04-01 Facebook Installs BE iOS F Phone 20-29 ios
## 6 2015-04-01 Facebook Installs BE iOS F Phone 20-29 ios
## # ... with 29 more variables: `Months after Install` <dbl>, `Cohort
## # Size` <dbl>, `Retained Users` <dbl>, Sessions <dbl>, `Paying
## # Users` <lgl>, `General Revenue Event (0) (Events)` <lgl>, `General
## # Revenue Event (0) (Revenue)` <lgl>, `General Revenue Event (0)
## # (Converted Users)` <lgl>, `Presented Booking View Error (665jzy)
## # (Events)` <lgl>, `Presented Booking View Error (665jzy)
## # (Revenue)` <lgl>, `Presented Booking View Error (665jzy) (Converted
## # Users)` <lgl>, `Click Close Booking View Error (9hkkvc)
## # (Events)` <lgl>, `Click Close Booking View Error (9hkkvc)
## # (Revenue)` <lgl>, `Click Close Booking View Error (9hkkvc) (Converted
## # Users)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## # (Events)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## # (Revenue)` <lgl>, `Click Try Again Booking View Error (b6u65l)
## # (Converted Users)` <lgl>, `Edit search form (le5kp0) (Events)` <dbl>,
## # `Edit search form (le5kp0) (Revenue)` <lgl>, `Edit search form
## # (le5kp0) (Converted Users)` <dbl>, `Start booking process (lziar6)
## # (Events)` <dbl>, `Start booking process (lziar6) (Revenue)` <lgl>,
## # `Start booking process (lziar6) (Converted Users)` <dbl>, `Perform a
## # search (m6lxsy) (Events)` <dbl>, `Perform a search (m6lxsy)
## # (Revenue)` <lgl>, `Perform a search (m6lxsy) (Converted Users)` <dbl>,
## # `Presented Booking View Success (vl9o24) (Events)` <lgl>, `Presented
## # Booking View Success (vl9o24) (Revenue)` <lgl>, `Presented Booking
## # View Success (vl9o24) (Converted Users)` <lgl>
Subset the Facebook data to only include iOS as the operating system. I have done this so that I can merge with data with the ‘Adjusted iOS data’
fbdata <- subset(fbdata, OS == 'iOS')
head(fbdata)## # A tibble: 6 x 11
## `Reporting Starts` `Reporting Ends` Country OS Sex Device Age
## <dttm> <dttm> <chr> <chr> <chr> <chr> <chr>
## 1 2015-10-01 2015-10-31 IT iOS X iPhone 18-65
## 2 2015-10-01 2015-10-31 DE iOS X iPhone XX-XX
## 3 2015-07-01 2015-07-31 UK iOS X Phone 35-65
## 4 2015-08-01 2015-08-31 UK iOS X Phone 35-65
## 5 2015-09-01 2015-09-30 UK iOS X Phone 35-65
## 6 2015-10-01 2015-10-31 UK iOS X Phone 35-65
## # ... with 4 more variables: Reach <dbl>, Impressions <dbl>, `Clicks
## # (All)` <dbl>, `Amount Spent (USD)` <dbl>
Subset the ‘Adjusted iOS data’ to only the ‘Months after Install’ data to equal 0. This will count monthly cohorts.
ioscohort <- subset(iosdata, Network == 'Facebook Installs')
ioscohort <- subset(ioscohort, `Months after Install` == 0)Group the data by Country and Date to figure out the number of installs Per Month/Per Country.
fbinstalls <- ioscohort %>%
group_by(Country, Date) %>%
summarise(`Facebook Installs` = sum(`Cohort Size`, na.rm = TRUE))
head(fbinstalls)## # A tibble: 6 x 3
## # Groups: Country [1]
## Country Date `Facebook Installs`
## <chr> <dttm> <dbl>
## 1 BE 2015-04-01 89
## 2 BE 2015-05-01 131
## 3 BE 2015-06-01 7
## 4 BE 2015-07-01 2087
## 5 BE 2015-08-01 2522
## 6 BE 2015-09-01 1237
Summarizing the information on Facebook Installs to see what the range is.
summary(fbinstalls$`Facebook Installs`)## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.0 785.5 1503.5 1430.3 2048.0 3245.0
Finding out what the minimum and maximum installs per country were.
summarise(group_by(fbinstalls, Country),
`Minimum Installs` = min(`Facebook Installs`), `Maximum Installs` = max(`Facebook Installs`))## # A tibble: 7 x 3
## Country `Minimum Installs` `Maximum Installs`
## <chr> <dbl> <dbl>
## 1 BE 7 2522
## 2 CH 20 1259
## 3 DE 594 2081
## 4 ES 771 2490
## 5 IT 25 2883
## 6 NL 505 1781
## 7 UK 983 3245
Graphing that information for a visualization of the that information
ggplot(fbinstalls, aes(x = Country,
y = `Facebook Installs`, fill = Country)) +
geom_boxplot()Figuring out the total amount spent on Facebook per country, per month
fbcosts <- fbdata %>%
group_by(Country, `Reporting Starts`) %>%
summarise(totalspent = sum(`Amount Spent (USD)`))
names(fbcosts) <- c("Country", "Date", "Total_Spent")
head(fbcosts)## # A tibble: 6 x 3
## # Groups: Country [1]
## Country Date Total_Spent
## <chr> <dttm> <dbl>
## 1 BE 2015-04-01 709.78
## 2 BE 2015-05-01 1310.84
## 3 BE 2015-06-01 551.34
## 4 BE 2015-07-01 3048.41
## 5 BE 2015-08-01 4021.67
## 6 BE 2015-09-01 2943.82
Merging the Facebook Spend Data and the Facebook Install data by Country and Date to determine the Cost per install
cost_install <- merge(fbcosts, fbinstalls, by=c("Country", "Date"), all = TRUE)
cost_install## Country Date Total_Spent Facebook Installs
## 1 BE 2015-04-01 709.78 89
## 2 BE 2015-05-01 1310.84 131
## 3 BE 2015-06-01 551.34 7
## 4 BE 2015-07-01 3048.41 2087
## 5 BE 2015-08-01 4021.67 2522
## 6 BE 2015-09-01 2943.82 1237
## 7 BE 2015-10-01 1092.47 451
## 8 CH 2015-04-01 1032.36 119
## 9 CH 2015-05-01 1314.21 137
## 10 CH 2015-06-01 314.23 20
## 11 CH 2015-07-01 1037.50 688
## 12 CH 2015-08-01 1852.63 1259
## 13 CH 2015-09-01 1357.99 781
## 14 CH 2015-10-01 593.48 292
## 15 DE 2015-01-01 517.50 594
## 16 DE 2015-02-01 836.97 898
## 17 DE 2015-03-01 1153.00 1357
## 18 DE 2015-04-01 915.34 1193
## 19 DE 2015-05-01 2769.20 1787
## 20 DE 2015-06-01 4424.34 2081
## 21 DE 2015-07-01 2993.65 1660
## 22 DE 2015-08-01 2163.56 1739
## 23 DE 2015-09-01 1686.25 1109
## 24 DE 2015-10-01 1898.48 1313
## 25 ES 2015-01-01 801.38 1107
## 26 ES 2015-02-01 794.36 1070
## 27 ES 2015-03-01 1591.56 2481
## 28 ES 2015-04-01 972.17 1387
## 29 ES 2015-05-01 2491.88 1684
## 30 ES 2015-06-01 3729.18 2490
## 31 ES 2015-07-01 3197.36 2166
## 32 ES 2015-08-01 2950.49 2330
## 33 ES 2015-09-01 2495.87 1707
## 34 ES 2015-10-01 1755.50 771
## 35 IT 2015-01-01 115.87 33
## 36 IT 2015-02-01 61.69 25
## 37 IT 2015-03-01 498.29 787
## 38 IT 2015-04-01 976.04 1639
## 39 IT 2015-05-01 1938.45 1593
## 40 IT 2015-06-01 2781.09 1833
## 41 IT 2015-07-01 2607.83 2037
## 42 IT 2015-08-01 2758.86 2634
## 43 IT 2015-09-01 2571.86 2883
## 44 IT 2015-10-01 2647.34 1964
## 45 NL 2015-01-01 657.79 727
## 46 NL 2015-02-01 1269.71 1364
## 47 NL 2015-03-01 1488.55 1514
## 48 NL 2015-04-01 1081.08 1070
## 49 NL 2015-05-01 2942.57 1772
## 50 NL 2015-06-01 3735.89 1781
## 51 NL 2015-07-01 2915.66 1613
## 52 NL 2015-08-01 2344.59 1535
## 53 NL 2015-09-01 1749.39 1044
## 54 NL 2015-10-01 902.97 505
## 55 UK 2015-01-01 1377.68 1493
## 56 UK 2015-02-01 2433.13 2498
## 57 UK 2015-03-01 2787.77 3245
## 58 UK 2015-04-01 2457.11 2972
## 59 UK 2015-05-01 3257.85 2334
## 60 UK 2015-06-01 3927.71 2507
## 61 UK 2015-07-01 3435.68 2349
## 62 UK 2015-08-01 2448.70 2298
## 63 UK 2015-09-01 1902.73 1763
## 64 UK 2015-10-01 1283.24 983
Adding the total number of installs by country and the total amount spent by country, dividing the total spend by the total number of installs to get the Cost Per Install Per Country.
cost_per_install <- cost_install %>%
group_by(Country) %>%
summarise(total_installs = sum(`Facebook Installs`), total_spent = sum(Total_Spent), CPI = (total_spent / total_installs))
cost_per_install## # A tibble: 7 x 4
## Country total_installs total_spent CPI
## <chr> <dbl> <dbl> <dbl>
## 1 BE 6524 13678.33 2.096617
## 2 CH 3296 7502.40 2.276214
## 3 DE 13731 19358.29 1.409824
## 4 ES 17193 20779.75 1.208617
## 5 IT 15428 16957.32 1.099126
## 6 NL 12925 19088.20 1.476843
## 7 UK 22442 25311.60 1.127867
Creating a bar graph to show the cost per install per country.
ggplot(cost_per_install, aes(x =cost_per_install$Country,
y = cost_per_install$CPI)) +
geom_bar(stat = 'identity', aes(fill = CPI)) +
ggtitle("Cost Per Install Per Country") +
xlab("County") +
ylab("Spend Per Intall in USD") +
scale_y_continuous(breaks = seq(min(0), max(2.50), by = .25))Figuring out the total amount spent on installs per month for Facebook.
cpi_time <- cost_install %>%
group_by(Date) %>%
summarise(`Total Spend` = sum(Total_Spent))
cpi_time## # A tibble: 10 x 2
## Date `Total Spend`
## <dttm> <dbl>
## 1 2015-01-01 3470.22
## 2 2015-02-01 5395.86
## 3 2015-03-01 7519.17
## 4 2015-04-01 8143.88
## 5 2015-05-01 16025.00
## 6 2015-06-01 19463.78
## 7 2015-07-01 19236.09
## 8 2015-08-01 18540.50
## 9 2015-09-01 14707.91
## 10 2015-10-01 10173.48
sum(cpi_time$`Total Spend`)## [1] 122675.9
Graphing the Total Spent on installs over time.
ggplot(cpi_time, aes(Date, `Total Spend`))+
geom_line(color = 'Red', size = 2)+
ggtitle("Cost Per Install Over Time") +
xlab("Month 2015") +
ylab("Total Spend Per Intall in USD")Calculating the Cost Per Install Per Country By Month
cpi_time_country <- cost_install %>%
group_by(Date, Country) %>%
summarise(`Total Spend` = sum(Total_Spent))
cpi_time_country## # A tibble: 64 x 3
## # Groups: Date [?]
## Date Country `Total Spend`
## <dttm> <chr> <dbl>
## 1 2015-01-01 DE 517.50
## 2 2015-01-01 ES 801.38
## 3 2015-01-01 IT 115.87
## 4 2015-01-01 NL 657.79
## 5 2015-01-01 UK 1377.68
## 6 2015-02-01 DE 836.97
## 7 2015-02-01 ES 794.36
## 8 2015-02-01 IT 61.69
## 9 2015-02-01 NL 1269.71
## 10 2015-02-01 UK 2433.13
## # ... with 54 more rows
Graphing the Cost Per Install Per Country By Month
ggplot()+
geom_line(data = cpi_time_country, aes(x = Date, y = `Total Spend`, color = Country)) +
ggtitle("Spend by Country Over Time")Since that Graph is a little hard to follow, I broke out each Country into an individual Graph.
ggplot()+
geom_line(data = cpi_time_country, aes(x = Date, y = `Total Spend`, color = Country)) +
ggtitle("Spend by Country Over Time") +
facet_grid(. ~ Country)Figuring out the cost per individual install over time.
cost_per_install_ot <- cost_install %>%
group_by(Country, Date) %>%
summarise(total_installs = sum(`Facebook Installs`), total_spent = sum(Total_Spent), CPI = (total_spent / total_installs))
cost_per_install_ot## # A tibble: 64 x 5
## # Groups: Country [?]
## Country Date total_installs total_spent CPI
## <chr> <dttm> <dbl> <dbl> <dbl>
## 1 BE 2015-04-01 89 709.78 7.975056
## 2 BE 2015-05-01 131 1310.84 10.006412
## 3 BE 2015-06-01 7 551.34 78.762857
## 4 BE 2015-07-01 2087 3048.41 1.460666
## 5 BE 2015-08-01 2522 4021.67 1.594635
## 6 BE 2015-09-01 1237 2943.82 2.379806
## 7 BE 2015-10-01 451 1092.47 2.422328
## 8 CH 2015-04-01 119 1032.36 8.675294
## 9 CH 2015-05-01 137 1314.21 9.592774
## 10 CH 2015-06-01 20 314.23 15.711500
## # ... with 54 more rows
Taking a look at the Minimum and Maximum CPI per Country.
summarise(group_by(cost_per_install_ot, Country),
`Min Cost Per Install` = min(CPI), `Max Cost Per Install` = max(CPI))## # A tibble: 7 x 3
## Country `Min Cost Per Install` `Max Cost Per Install`
## <chr> <dbl> <dbl>
## 1 BE 1.4606660 78.762857
## 2 CH 1.4715091 15.711500
## 3 DE 0.7672590 2.126064
## 4 ES 0.6414994 2.276913
## 5 IT 0.5955095 3.511212
## 6 NL 0.9048006 2.097636
## 7 UK 0.8267530 1.566697
Graphing the information to get an idea of the spread of the spending.
ggplot()+
geom_line(data = cost_per_install_ot, aes(x = Date, y = CPI, color = Country)) +
ggtitle("Cost Per Install Over Time By Country") +
facet_grid(. ~ Country)Determining what is the Cost per Start a booking process per Country and Month? Subsetting the data to include the Date (cohort), Country, Cohort Size, and Start Booking Process Event.
iosbooking <- select(iosdata, c(1, 3, 10, 29))
head(iosbooking)## # A tibble: 6 x 4
## Date Country `Cohort Size`
## <dttm> <chr> <dbl>
## 1 2015-04-01 BE 1
## 2 2015-04-01 BE 1
## 3 2015-04-01 BE 1
## 4 2015-04-01 BE 1
## 5 2015-04-01 BE 1
## 6 2015-04-01 BE 1
## # ... with 1 more variables: `Start booking process (lziar6)
## # (Events)` <dbl>
Finding the number of ‘Start Booking Process’ events have occured in the data. This is a QA.
sum(iosbooking$`Start booking process (lziar6) (Events)`, na.rm = TRUE)## [1] 62131
Multiplied the number of ‘Start a Booking Event’ by $3 (What GvR estimates the event is worth).
colnames(iosbooking) <- c("Cohort", "Country", "Size", "Booking Event")
booking_cost <- iosbooking %>%
mutate(`Cost Per Booking` = `Booking Event` * 3) %>%
filter(!is.na(`Booking Event`)) %>%
group_by(Country, Cohort) %>%
summarise(`Cost in USD` = sum(`Cost Per Booking`))
booking_cost## # A tibble: 64 x 3
## # Groups: Country [?]
## Country Cohort `Cost in USD`
## <chr> <dttm> <dbl>
## 1 BE 2015-04-01 129
## 2 BE 2015-05-01 237
## 3 BE 2015-06-01 30
## 4 BE 2015-07-01 3063
## 5 BE 2015-08-01 2370
## 6 BE 2015-09-01 975
## 7 BE 2015-10-01 702
## 8 CH 2015-04-01 216
## 9 CH 2015-05-01 189
## 10 CH 2015-06-01 27
## # ... with 54 more rows
This is the sum of the Cost in USD for a ‘Start a Booking Process’ event. This is QA.
sum(booking_cost$`Cost in USD`)## [1] 186393
Created bar plot for a cost per booking event in USD for Each Month
ggplot(booking_cost, aes(x =Cohort,
y = `Cost in USD`)) +
geom_bar(stat = 'identity', aes(fill = Country)) +
ggtitle("Cost Per Booking Event in USD") +
xlab("Month") +
ylab("Spend in USD")The last graph didn’t provide a clear insight into each country. I broke out each country individually to look at the Cost Per Start a booking event.
ggplot(booking_cost, aes(x =Cohort,
y = `Cost in USD`)) +
geom_bar(stat = 'identity', aes(fill = Country)) +
ggtitle("Cost Per Booking Event in USD") +
xlab("Month") +
ylab("Spend in USD") +
facet_grid(. ~ Country)A summary of the minimum and maximum Cost Per Start A Booking Event per Country.
summarise(group_by(booking_cost, Country),
`Minimum Cost Start A Booking` = min(`Cost in USD`), `Maximum Cost Start A Booking` = max(`Cost in USD`))## # A tibble: 7 x 3
## Country `Minimum Cost Start A Booking` `Maximum Cost Start A Booking`
## <chr> <dbl> <dbl>
## 1 BE 30 3063
## 2 CH 27 1599
## 3 DE 2163 6891
## 4 ES 1137 8559
## 5 IT 198 4656
## 6 NL 492 3795
## 7 UK 1575 7773
Finding what the largest market is for booking events
markets <- booking_cost %>%
group_by(Country) %>%
summarise(`Total Cost Per Start A Booking Event` = sum(`Cost in USD`))
markets[order(markets$`Total Cost Per Start A Booking Event`),]## # A tibble: 7 x 2
## Country `Total Cost Per Start A Booking Event`
## <chr> <dbl>
## 1 CH 4587
## 2 BE 7506
## 3 NL 18609
## 4 IT 28317
## 5 DE 41640
## 6 UK 41769
## 7 ES 43965
Figure out ROI for each country.
revenue_spend <- merge(markets, cost_per_install, by=c("Country"))
revenue_spend <- select(revenue_spend, c(1, 2,4))
summarise(group_by(revenue_spend, Country, `Total Cost Per Start A Booking Event`, total_spent),
`ROI` = (`Total Cost Per Start A Booking Event`)/(total_spent))## # A tibble: 7 x 4
## # Groups: Country, Total Cost Per Start A Booking Event [?]
## Country `Total Cost Per Start A Booking Event` total_spent ROI
## <chr> <dbl> <dbl> <dbl>
## 1 BE 7506 13678.33 0.5487512
## 2 CH 4587 7502.40 0.6114044
## 3 DE 41640 19358.29 2.1510164
## 4 ES 43965 20779.75 2.1157617
## 5 IT 28317 16957.32 1.6698983
## 6 NL 18609 19088.20 0.9748955
## 7 UK 41769 25311.60 1.6501920
Figure out the revenue from App Installs. ***Note, the total spend for facebook ads is above.
print(rev <- sum(revenue_spend$`Total Cost Per Start A Booking Event`))## [1] 186393
print(spend <- sum(revenue_spend$total_spent))## [1] 122675.9
print(profit<- rev-spend)## [1] 63717.11
Percentage of Total Cost Per Start A Booking Event
summarise(group_by(markets, Country),
`Percentage of Total Cost Booking Event` = `Total Cost Per Start A Booking Event`/ (186393)*100)## # A tibble: 7 x 2
## Country `Percentage of Total Cost Booking Event`
## <chr> <dbl>
## 1 BE 4.026975
## 2 CH 2.460929
## 3 DE 22.339895
## 4 ES 23.587259
## 5 IT 15.192094
## 6 NL 9.983744
## 7 UK 22.409103
Creating a Pie Chart to visualize the information regarding the Start A Booking Process
ggplot(markets, aes(x="", y=`Total Cost Per Start A Booking Event` , fill= Country)) +
geom_bar(stat = "identity", position = "stack") +
coord_polar("y", start=0) +
guides(fill = guide_legend(title = "Countries")) ####In terms of market for ‘Start a Booking Process’, the visualization above shows us that Spain is the largest market with 43965 events started. Next is The UK, followed closely behind by Germany. The smallest market is Switzerland, with only 4587 ‘Start a Booking Process’ events.
Take a look at the total number of installs per country to see if that dictates a larger market.
marketinstalls <- summarise(group_by(fbinstalls, Country),
`Total Installs of App` = sum(`Facebook Installs`))
marketinstalls[order(marketinstalls$`Total Installs of App`),]## # A tibble: 7 x 2
## Country `Total Installs of App`
## <chr> <dbl>
## 1 CH 3296
## 2 BE 6524
## 3 NL 12925
## 4 DE 13731
## 5 IT 15428
## 6 ES 17193
## 7 UK 22442
sum(marketinstalls$`Total Installs of App`)## [1] 91539
Determine the Percentage of Installs Per Country
summarise(group_by(marketinstalls, Country),
`Percentage of Total Cost Booking Event` = `Total Installs of App`/ (91539)*100)## # A tibble: 7 x 2
## Country `Percentage of Total Cost Booking Event`
## <chr> <dbl>
## 1 BE 7.127017
## 2 CH 3.600651
## 3 DE 15.000164
## 4 ES 18.782158
## 5 IT 16.854019
## 6 NL 14.119665
## 7 UK 24.516326
Visualization of the information from above
ggplot(marketinstalls, aes(x="", y=`Total Installs of App` , fill= Country)) +
geom_bar(stat = "identity", position = "stack") +
coord_polar("y", start=0) +
guides(fill = guide_legend(title = "Countries"))In terms of largest market for App Installs, The UK had the most App installs by 5,249 installs. The following markets are Spain and Italy. The smallest market is Switzerland, which might explain why they also had the smallest market for ‘Start A Booking Process’
Other Information I found to be intersting about the data is user retention and Cohorts use of the App over time. Plus, Age and Gender of the retained users.
I’d like to find out about how the number of Start a Booking Process changes after the first month of downloading the app.
retaining_users <- select(iosdata, c(1, 3, 5, 7, 9, 11, 26, 29))
colnames(retaining_users) <- c("Cohort", "Country", "Sex", "Age", "Months After Install", "Retained Users", "Edit Search Form", "Start A Booking Process")
head(retaining_users)## # A tibble: 6 x 8
## Cohort Country Sex Age `Months After Install` `Retained Users`
## <dttm> <chr> <chr> <chr> <dbl> <dbl>
## 1 2015-04-01 BE F 20-29 0 1
## 2 2015-04-01 BE F 20-29 1 NA
## 3 2015-04-01 BE F 20-29 2 NA
## 4 2015-04-01 BE F 20-29 3 NA
## 5 2015-04-01 BE F 20-29 4 NA
## 6 2015-04-01 BE F 20-29 5 NA
## # ... with 2 more variables: `Edit Search Form` <dbl>, `Start A Booking
## # Process` <dbl>
booking_event_month <- retaining_users %>%
group_by(Cohort, `Months After Install` ) %>%
summarise(`Start a Booking Process By Month` = sum(`Start A Booking Process`, na.rm=TRUE))
booking_event_month## # A tibble: 74 x 3
## # Groups: Cohort [?]
## Cohort `Months After Install` `Start a Booking Process By Month`
## <dttm> <dbl> <dbl>
## 1 2015-01-01 0 2404
## 2 2015-01-01 1 310
## 3 2015-01-01 2 183
## 4 2015-01-01 3 90
## 5 2015-01-01 4 95
## 6 2015-01-01 5 99
## 7 2015-01-01 6 71
## 8 2015-01-01 7 57
## 9 2015-01-01 8 51
## 10 2015-01-01 9 76
## # ... with 64 more rows
Visualization of the information above. Graph by Cohort.
ggplot(booking_event_month, aes(x =`Months After Install`,
y = `Start a Booking Process By Month`, fill = Cohort)) +
geom_bar(stat = "Identity") +
ggtitle("App Use By Monthly Cohort") +
xlab("Month") +
ylab("Start A Booking Process") +
facet_grid(. ~ Cohort)I was interested in looking at what age group had the most engaged users (or the users who were starting the most bookings). This is a good insight for marketers to know what age group they should be targeting ads to.
Age <- retaining_users %>%
group_by(Age)%>%
summarise(Users = sum(`Start A Booking Process`, na.rm = TRUE))
Age[order(-Age$Users),]## # A tibble: 10 x 2
## Age Users
## <chr> <dbl>
## 1 XX-XX 19398
## 2 20-29 13640
## 3 18-34 9228
## 4 30-39 5288
## 5 35-65 3548
## 6 35-54 3088
## 7 40-49 3032
## 8 50+ 2559
## 9 55-65 2103
## 10 18-65 247
We don’t have any Age information about the largest group of Retained Users for the App, but we can see that 18-34 Age Group makes up a large percentage of the Users of the App.
I’d also like to look at the gender of the user who is starting the most booking processes.
Gender <- retaining_users %>%
group_by(Sex)%>%
summarise(Users = sum(`Start A Booking Process`, na.rm = TRUE))
Gender[order(-Gender$Users),]## # A tibble: 3 x 2
## Sex Users
## <chr> <dbl>
## 1 X 37612
## 2 M 13026
## 3 F 11493