The Door-to-Door Caravan Sampling Project

Jayce Jocson, JR Olivarez

==========================================================================

Synopsis

The marketing agency of Globe has been considering to have a project that will increase the revenue of the company through the activated SIM cards. This is a great opportunity introduced using the project of the Door-to-Door Caravan Sampling which aims to engage households of the participating barangays by handing out free prepaid SIM cards. This free sim cards, once activated will be the cause of increase of the revenue.

The goal of this research is to provide analysis for the outsource agency’s proposal of PhP 5,000,000 contract with Globe to do the 100 days of sampling which includes the salaries of the ten samplers, as well as the company’s van and a driver.

Project Proposal:

Contract: Php 5,000,000
Resource Amount
Duration of project 100 days
Focus Group of samplers 10 samplers
Van’s Utilization Php 10,000 / day

Data Processing

Loading and preprocessing the data

Loading the necessary libraries and reading the household data

library (dplyr)
library(ggplot2)

if(!file.exists("hh.csv.zip")) {
        temp <- tempfile()
        download.file("https://github.com/jayce0902/door_to_door/blob/main/hh.csv.zip",temp)
        unzip(temp)
        unlink(temp)
}
hh <- read.csv("hh.csv")

Analyzing the Household Data

Provide a quickview of the data (columns and data types).

names(hh)
##  [1] "Island"                          "Major_Island"                   
##  [3] "Region"                          "Province"                       
##  [5] "ProvinceCity"                    "CityMunicipcality"              
##  [7] "Barangay"                        "TotalPopulation"                
##  [9] "NHouseholds"                     "AreaBarangay"                   
## [11] "lat"                             "long"                           
## [13] "Size"                            "TGT.Distance.KM"                
## [15] "LAT"                             "LONG"                           
## [17] "Time.Travel.HRS"                 "Time.Travel"                    
## [19] "Cost.of.Simcard"                 "Assumed.Revenue.per.YR"         
## [21] "Interaction.Time.per.Sampler"    "Area.per.household"             
## [23] "time.travelled.per.household.HR" "time.travelled.per.hh"          
## [25] "Int.Time.per.hh"                 "total.time.per.hh"              
## [27] "work.time"                       "work.time.with.TT"              
## [29] "hh.per.sampler.per.day"          "total.hh.per.day"               
## [31] "hits.per.day"
str(hh)
## 'data.frame':    42037 obs. of  31 variables:
##  $ Island                         : chr  "Luzon" "Mindanao" "Visayas" "Luzon" ...
##  $ Major_Island                   : chr  "GMA" "Mindanao" "Visayas" "North Luzon" ...
##  $ Region                         : chr  "NATIONAL CAPITAL REGION" "REGION IX - ZAMBOANGA PENINSULA" "REGION VIII - EASTERN VISAYAS" "REGION I - ILOCOS" ...
##  $ Province                       : chr  "NATIONAL CAPITAL REGION" "CITY OF ISABELA" "LEYTE" "LA UNION" ...
##  $ ProvinceCity                   : chr  "CITY OF MAKATI" "CITY OF ISABELA" "LEYTE" "LA UNION" ...
##  $ CityMunicipcality              : chr  "CITY OF MAKATI" "NAVAL (Capital)" "TACLOBAN CITY (Capital)" "CITY OF SAN FERNANDO (Capital)" ...
##  $ Barangay                       : chr  "Urdaneta" "Tabuk (Pob.)" "Barangay 99 (Diit)" "San Francisco" ...
##  $ TotalPopulation                : int  4429 6696 5478 3432 52832 5581 3349 5632 2289 4584 ...
##  $ NHouseholds                    : int  935 1273 1212 819 13347 1270 811 1296 541 1137 ...
##  $ AreaBarangay                   : num  0.654 1.525 1.462 1.105 18.956 ...
##  $ lat                            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ long                           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Size                           : chr  "Large" "Large" "Large" "Large" ...
##  $ TGT.Distance.KM                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ LAT                            : num  14.5 14.5 14.5 14.5 14.5 ...
##  $ LONG                           : num  121 121 121 121 121 ...
##  $ Time.Travel.HRS                : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ Time.Travel                    : chr  NA NA NA NA ...
##  $ Cost.of.Simcard                : int  84150 114570 109080 73710 1201230 114300 72990 116640 48690 102330 ...
##  $ Assumed.Revenue.per.YR         : num  35062 47738 45450 30712 500512 ...
##  $ Interaction.Time.per.Sampler   : int  2805 3819 3636 2457 40041 3810 2433 3888 1623 3411 ...
##  $ Area.per.household             : num  0.000699 0.001198 0.001206 0.001349 0.00142 ...
##  $ time.travelled.per.household.HR: num  0.00661 0.00865 0.00868 0.00918 0.00942 ...
##  $ time.travelled.per.hh          : chr  "00:00:24" "00:00:31" "00:00:31" "00:00:33" ...
##  $ Int.Time.per.hh                : chr  "0:00:30" "0:00:30" "0:00:30" "0:00:30" ...
##  $ total.time.per.hh              : chr  "0:00:54" "0:12:57" "0:13:00" "0:13:43" ...
##  $ work.time                      : chr  "8:00:00" "8:00:00" "8:00:00" "8:00:00" ...
##  $ work.time.with.TT              : chr  NA NA NA NA ...
##  $ hh.per.sampler.per.day         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ total.hh.per.day               : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ hits.per.day                   : int  NA NA NA NA NA NA NA NA NA NA ...

Analyzing the data of Participated Barangays

Generate a graphical comparison that provides the highest hits per barangay given that the 10 samplers maximize the hits per day for all l00 days.

hh <- hh %>% arrange(desc(hits.per.day))

top_hits <- head(hh,100)
top_hits <- top_hits %>% arrange(desc(hits.per.day))


# Create a color gradient based on hits per day
color_range <- colorRampPalette(c("purple","deeppink", "cyan"))(length(top_hits$hits.per.day))

barplot(top_hits$hits.per.day,
        names.arg = top_hits$Barangay,
        main = "Top 100 Barangays with Highest Hits",
        ylab = "Hits/day Each Barangay",
        col = color_range,
        las = 2,
        angle = 45,
        cex.names = 0.7
)

Analyzing the data of Participated Cities and their Hits.

Generate a graphical comparison that provides the highest hits per city given the same condition of number of samplers and number of days. This helps in producing an inference to which cities should be a feasible and practical approach of location sample for the project.

Project Duration = 100 days

top_grouped_city <- top_hits %>%
        group_by(CityMunicipcality) %>%
        summarise(
                d100_hits = sum(hits.per.day[!is.na(hits.per.day)])) %>%
        arrange(desc(d100_hits))

# Create a color gradient
color_range <- colorRampPalette(c("deeppink", "purple", "cyan"))

# Create the plot
ggplot(top_grouped_city, aes(factor(CityMunicipcality, levels = CityMunicipcality[order(-d100_hits)]), y = d100_hits, fill = CityMunicipcality)) +
  geom_col() +
  geom_label(aes(label = d100_hits, color = CityMunicipcality), fill = "white", fontface = "bold") +
  ylab('Hits/day Each City') +
  xlab("Cities") +
  ggtitle(expression("Top City with Highest Hits")) +
  scale_fill_manual(values = color_range(length(unique(top_grouped_city$CityMunicipcality)))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(plot.margin = margin(30, 10, 10, 10, "pt")) +
  theme(panel.background = element_blank())

How about if we extend the duration of the project from 100 days to 200 days given the same conditions as above.

Let’s drill down!

Project Duration = 200 days

top_hits_200 <- head(hh,200)
top_hits_200 <- top_hits_200 %>% arrange(desc(hits.per.day))

top_grouped_city_200 <- top_hits_200 %>%
        group_by(CityMunicipcality) %>%
        summarise(
                d200_hits = sum(hits.per.day[!is.na(hits.per.day)])) %>%
        arrange(desc(d200_hits))

color_range <- colorRampPalette(c("deeppink", "purple", "cyan"))

ggplot(top_grouped_city_200, aes(factor(CityMunicipcality, levels = CityMunicipcality[order(-d200_hits)]), y = d200_hits, fill = CityMunicipcality)) +
  geom_col() +
  geom_label(aes(label = d200_hits, color = CityMunicipcality), fill = "white", fontface = "bold") +
  ylab('Hits/day Each City') +
  xlab("Cities") +
  ggtitle(expression("Top City with Highest Hits")) +
  scale_fill_manual(values = color_range(length(unique(top_grouped_city_200$CityMunicipcality)))) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(plot.margin = margin(30, 10, 10, 10, "pt")) +
  theme(panel.background = element_blank())

Insights

We now have witnessed how extending the number of sampling days affect the number of potential hits per city. As you may see in the graph, the top cities in the 100 sampling days gained more potential hits in terms of sampling days extension. Consequently, more cities are added from the existing set of cities in the sampling locations which provide potential hits higher than 1000.

Analyzing the data of Interaction Breakdown per Sampler.

This section helps in providing insights to visualize, how a typical sampler do his/her work of interacting with a potential hit, travelling by van and walking.

Typical Activity Interaction of a Sampler per Day

Interaction Description
Interacting with potential Hit Fulfills his duty to distribute the prepaid SIMs with the households
Travelling by Van Commuting from TGT to barangay by Van
Walking Travelling from household to another.
  • Interacting with potential Hit
hh$interacting_potential_hit <- (5700/8)*(30/3600)*(0.75*(0.9*hh$NHouseholds))
  • Travelling by van

Assumption:

highway rating: 25km/L, gas = P60/L, van rental = P4000, driver’s salary = P570/day

hh$van_utlization <- (60/25)*hh$TGT.Distance.KM + 570 + 4000
  • Walking

Assumption:

Sampler walks on average 4 km/hr

hh$walking <- (5700/8)*hh$time.travelled.per.household.HR*hh$NHouseholds

Graphical representation of the sampler’s interaction

hh2 <- hh[complete.cases(hh), ]
total <- (mean(hh2$interacting_potential_hit) + mean(hh2$van_utlization) + mean(hh2$walking))
proportions <- c(mean(hh2$interacting_potential_hit) , mean(hh2$van_utlization) , mean(hh2$walking))/total
label <- c("interacting_potential_hit","van_utlization","walking")
labels <- paste(label, "\n", round(proportions*100,0))
color_range <- colorRampPalette(c("deeppink", "mediumorchid", "slateblue"))(length(proportions))
pie(proportions, labels = labels, col = color_range)

Analyzing the data of Participated Cities and their Hits.

Sampling Outside Metro Manila (E.g. Region III) This section considers a sampling strategy outside Metro Manila. As depicted on previous graph, most cities with the highest potential number of hits are coming from the NCR Region. However, we can also have a sampling strategy in provincial areas to test if we can maximize our sampling reach. Here is an illustration of how many potential hits can be accumulated in Region III based on a range of working hours:

filtered_data <- subset(hh, Region == "REGION III - CENTRAL LUZON" )
filtered_data$work.time.with.TT <- as.POSIXct(filtered_data$work.time.with.TT, format = "%H:%M:%S")
intervals <- as.POSIXct(c("00:00:00","03:00:00", "05:00:00", "08:00:00"), format = "%H:%M:%S")
filtered_data <- filtered_data[complete.cases(filtered_data),]

ggplot(filtered_data, aes(x = cut(work.time.with.TT, breaks = intervals), y = hits.per.day, fill = cut(work.time.with.TT, breaks = intervals))) +
  geom_boxplot() +
  labs(x = "Work Time", y = "Hits per Day", title = "Hits per Day by Work Time Alloted in Region III") +
  scale_fill_manual(values = c("cyan","magenta", "purple")) +
  guides(fill = FALSE) +
  scale_x_discrete(labels = c("0-2 hours work","3-5 hours work", "5-7 hours work")) +
  theme(panel.background = element_blank())

Insights

  • Pros: It can be worthwhile to Regions Central Luzon and CALABARZON as these are areas near Metro Manila if maximizing sampling reach is to implement. Most travel times for these regions are around 1-3 hours from TGT.

  • Cons: The cost for longer travel time can exceed sampler’s time 8 work-hour shift. It is most likely not feasible to do a land travel far from Metro Manila.

Return of Investment

Factors Value
Revenue Php 18,955,630
Cost of Sim Card 40 x Possible Hits
Return of Investment Php 13,955,630

Conclusion

As we close the research study for the Sim Door-to-Door Caravan, Here are the inferences we yield from the above analyses:

Feasibility Aspects of the Project

With the given insights, analysis, and deliberated evaluation, the Door-To-Door Caravan Sampling Project is not only feasible but also has the potential for a profitable return on investment. These also considers the market, technical, socio-economic feasibility of the project aside from counter-validating if the proposed contract of PHP 5M would meet the ROI.

Main key points on the analysis are:

  1. Highest potential hits are in the National Capital Region.
  2. Maximizing sampling profit in provinces depending on the work-hours given on that area.
  3. More days in executing the sampling project leads to higher hits and a wider location area for potential high hits.

Recommendations:

  1. Extension of the duration of the project.
  2. Possible considerations of the project outside Metro Manila.
  3. Evaluation of the market, technical, and socio-ecomic feasibility.