1 About The Project

Globe Telecom is considering hiring a marketing agency to conduct a door-to-door sampling caravan project to give out free prepaid SIM cards to households across the Philippines. The objective of this project is to increase Globe’s subscriber base while also generating a net profit. The hope is that the free SIM card samples will eventually be activated in the future, resulting in revenue from the activated SIMs.

The marketing agency will be responsible for recruiting and training a team of samplers who will visit households in designated areas. The samplers will be equipped with free SIM cards and other promotional materials, and they will be tasked with explaining the benefits of subscribing to Globe’s services.

The door-to-door sampling caravan project is expected to reach millions of households across the Philippines. This will provide Globe with a unique opportunity to introduce its services to a large number of potential customers. If the project is successful, it could result in a significant increase in Globe’s subscriber base and revenue.

2 Definition of Terms

  • Possible Hit (PHR): A household where the door is opened and the sampler is able to start a conversation with a household member about the free SIM card offer. The possible hit rate is estimated to be at 90%.
  • Hit: A household where the sampler successfully provides a SIM card to a household member and the household member agrees to activate the SIM card.
  • Success Rate (SR): The percentage of households that answer the door and agree to accept the free SIM card. The marketing team estimates that the success rate is 75%.
  • Conversion Rate (CR): The percentage of distributed SIM cards that are activated. Typically, it is estimated to be around 25%.

3 Sampling Strategy

3.1 Sampling Operation

A daily sampling operation will be conducted by 10 samplers, each visiting a specific barangay. One barangay will be visited per day, with no exceptions. The working hours are from 8 am to 5 pm, totaling 8 hours. The assembly point is at TGT, BGC. The purpose of the sampling operation is to collect data from residents of different barangays in the Philippines. The data collected will be used to develop programs and services that will better meet the needs of residents. The sampling operation will be evaluated based on the following criteria: the number of hits and the success rate.

3.2 Sampling Process

The interaction with a household during a door-to-door sampling operation consists of three parts: waiting to answer the door, spiel talk time, and signing time.

During waiting to answer the door, the sampler patiently waits for a household member to respond. Once the door opens, the sampler greets the household member in a friendly and professional manner and explains the purpose of the visit.

During spiel talk time, the sampler delivers a brief presentation about the free SIM card offer. The sampler answers any questions that the household member may have and builds rapport with the household member to create a positive experience.

During signing time, the sampler collects the household member’s contact information and signature. The sampler then provides the household member with the free SIM card and thanks the household member for their time and participation.

It is important to note that only one SIM card is provided per household. This is to ensure that the free SIM card offer is fair and equitable to all households.

3.3 Proposed Contract

The agency is proposing a Php 5,000,000 contract with Globe for 100 days of sampling. The contract includes the following:

  • The salaries of ten samplers, who will be responsible for visiting households and distributing free SIM cards.
  • The use of the company’s van, which will be used to transport the samplers to and from their assigned areas.
  • The services of a driver, who will be responsible for driving the van and assisting the samplers.

The cost of the SIM cards (Php 40 each) is not included in the proposed contract. The agency estimates that the cost of the SIM cards will be approximately Php 400,000.

The agency is confident that the proposed contract is a fair and equitable offer. The contract includes all of the necessary costs associated with the sampling operation, and it is priced competitively. The agency is confident that Globe will be satisfied with the services provided under the contract.

4 Methodology

4.1 Top 100 Barangays

In order to maximize the number of hits per day, it is essential to identify first which barangays are to be visited. In this approach, we have established two specific criteria: barangays with the least travel time from TGT and barangays with the smallest barangay areas

First, identifying barangays with the least travel time from TGT implies selecting only those that are geographically close. Minimizing the time of travel means more opportunities to increase the number of house visits and hits.

Second, identifying barangays with smallest barangay areas optimize the time to cover the entire territorial extent effectively, thereby increasing the likelihood of intended hits.

To execute these criteria, begin by calculating the distance between the reference location and each individual barangay. This calculation can be performed using the distGeo command, which estimates the shortest distance between the latitude and longitude coordinates of two given locations.

library("dplyr")
library("geosphere")
read_h2h <- function(n = 100) {
        h2h <- read.csv('h2h.csv')
        
        latTGT <- 14.5535
        longTGT <- 121.0499
        min_speed <- 30 # km/hr
        max_speed <- 40 # km/hr
        ave_speed <- (min_speed + max_speed) / 2
        min_traffic <- 10 / 60 # hr
        max_traffic <- 30 / 60 # hr
        ave_traffic <- (min_traffic + max_traffic) / 2
        
        # Calculate the distance from TGT to different barangays
        calculate_distance <- function(lat, long, latTGT, longTGT) {
                coord1 <- cbind(long, lat)
                coord2 <- cbind(rep(longTGT, length(lat)), rep(latTGT, length(long)))
                dist <- distGeo(coord1, coord2)
                return(dist)
        }
        
        lat <- h2h$lat
        long <- h2h$long
        
        h2h <- h2h %>% mutate(
                distance = calculate_distance(lat, long, latTGT, longTGT) / 1000,
                travel_time = (distance / ave_speed) + ave_traffic, 
                density = NHouseholds/AreaBarangay) %>% 
                arrange(distance, desc(NHouseholds), AreaBarangay)
        
        h2h$total_hours <- 7.5 - h2h$travel_time*2
        
        return(head(h2h,n))
}

Second, the effective distance of each barangay should be computed in order to identify the barangays with the smallest geographical area.

Once these information have been calculated, we can proceed to determine the number of hits.

4.2 Number of Hits

The number of hits depends on three factors: the time spent per household, the travel time between households, and the number of potential hits.

To determine the time per household, we consider three scenarios:

  1. If the door is not opened, it implies that the household did not answer, and we assume an interaction time of 0.5 minutes.
  2. If the door is opened and the simulation is accepted, we assume an interaction time of 1.5 minutes.
  3. If the door is opened but the simulation is not accepted, we assume an interaction time of 1 minute.

\[ \frac{{(((1-\text{PHR}) \times 0.5 \text{ mins}) + ((\text{PHR} \times \text{SR}) \times 1.5 \text{ mins}) + ((\text{PHR} \times (1-PHR)) \times 1 \text{min}))}}{{60 \text{ mins}}} \]

To calculate the travel time between households, we take the square root of the barangay’s area divided by the number of households, and then divide it by the assumed walking speed of the sampler, which is 4 km/hr.

\[ \frac{{\sqrt{{\text{BarangayArea} / \text{Number of Households}}}}}{{\text{Walking Speed of Sampler}}} \]

The number of possible hits can be determined by multiplying the number of samples by the difference between the working hours and twice the travel time. This result is then divided by the sum of the time spent per household and the travel time between each household.

\[ \frac{{\text{Number of Samples} \times (\text{Working Hours} - (\text{Travel Time} \times 2))}}{{\text{Time Spent per Household} + \text{Travel Time between Households}}} \]

Lastly, the number of hits can be estimated by multiplying the possible hit rate by the number of possible hits.

\[ Hits = PHR \times \text{Working Hours} \]

The process of determining the number of hits were done using the coverage_percentage function. This function facilitates the computation of the effective distance of each barangays and determines the average time spent per household. Additionally, it calculates the travel time between households and estimates the number of households that can be visited within the time constraints . Lastly, it calculates possible hits and hits based on percentages, computes the cost, and updates the BaranggayIndex object with the calculated values.

coverage_percentage <- function(BaranggayIndex) {
        num_persons <- 10
        walking_speed <- 4  # constant, in km/hr
        
        # Calculate effective distance
        effective_distance <- function(AreaBarangay, nhouseholds) {
                return(sqrt(AreaBarangay / nhouseholds))
        }
        
        # Calculate average time spent per household
        time_per_household <- function(reject_rate = 0.1, hit_rate = 0.9, conversion_rate = 0.75) {
                total_time <- ((reject_rate) * 0.5 + (hit_rate*conversion_rate) * 1.5 + (hit_rate*(1-conversion_rate)) * 1) / 60
                return(total_time)
                
        }
        
        # Calculate travel time between households
        transfer_per_household <- function(effective_distance) {
                return((effective_distance/walking_speed))  # hours
        }
        
        # Estimate the number of households that can be visited
        estimate_households <- function(total_hrs, transfer_per_household, time_per_household, nhouseholds) {
                
                household_count <- num_persons*(total_hrs / ((time_per_household) + (transfer_per_household)))
                return(trunc(household_count))

        }
        
        # Access elements in the index
        total_hrs <- BaranggayIndex$total_hours
        area <- BaranggayIndex$AreaBarangay
        nhouseholds <- BaranggayIndex$NHouseholds
        
        # Call the functions
        effective_distance <- effective_distance(area, nhouseholds)
        time_per_household <- time_per_household()
        calculate_travel_time <- transfer_per_household(effective_distance)
        estimated_households_visited <- estimate_households(total_hrs, calculate_travel_time, time_per_household, nhouseholds)*num_persons
        
        HouseholdVisited <- ifelse(estimated_households_visited > nhouseholds, nhouseholds,
                                   ifelse(estimated_households_visited < 0, 0, estimated_households_visited))
        
        PossibleHits <- trunc(0.9*HouseholdVisited)
        Hits <- trunc(PossibleHits * 0.75)
        
        Cost <- Hits * 40
        
        BaranggayIndex <- BaranggayIndex %>% #select(-distance) %>%
                #select(-total_hours) %>% select(-travel_time) %>%
                mutate(HouseholdVisited, PossibleHits, Cost, calculate_travel_time,
                       Hits) 
        
        return(BaranggayIndex)
        
}

n <- 100
h2h_top100 <- read_h2h(n)
with_household <- coverage_percentage(h2h_top100)

hits = with_household$Hits
total_hits <- sum(hits)

4.3 Return On Investment

To calculate the return on investment (ROI), you first need to calculate the total sim cost. This is done by multiplying the number of hits by the cost of sim which is Php 40.

\[ Total Sim Cost = \sum_n (Hits * 40) \]

Then, you need to add the investment cost of Php 5,000,000 to the total simulation cost to get the total investment cost. \[ Total Investment Cost = Total Sim Cost+5,000,000 \]

Next, you need to calculate the revenue by multiplying the conversion rate by the number of hits and the income per conversion which is Php 200. \[ Revenue = CR * Hits * Income \]

Finally, you can calculate the ROI by subtracting the total investment cost from the revenue and dividing by the total investment cost. \[ ROI = Revenue - Total Investment Cost \]

5 Results & Discussion

5.1 Sampler Utilization

The work breakdown shows that most of the samplers’ time is spent interacting with potential hits (72.88%). This is followed by travel time (6.44%) and walking time (20.68%). This means that the samplers spend more time interacting with potential hits than traveling to and from the barangays.

The lower the distance between the barangays, the lesser the travel time. Similarly, the smaller the barangay area, the lesser the walking time. This implies that the samplers have more productive time interacting with potential hits if the distance between the barangays is shorter and the barangay area is smaller.

Overall, the work breakdown shows that the samplers are efficient in their work and are able to maximize their time interacting with potential hits. They are able to do this by planning their routes efficiently and by targeting barangays that are closer and have smaller areas.


n <- 100
h2h_top100 <- read_h2h(n)
with_household <- coverage_percentage(h2h_top100)

remaining_time <- 8 - 1 - 0.5
percent_travel = ((mean(with_household$travel_time,na.rm=TRUE)/remaining_time))*100
percent_walk = ((mean(with_household$calculate_travel_time*(with_household$HouseholdVisited/10)))/remaining_time)*100
percent_hit = 100-percent_travel-percent_walk

#PLOT THE PIE CHART
data <- data.frame(
    Category = c("Travel Time", "Walking Time", "Hit Interaction"),
    Percentage = c(percent_travel, percent_walk, percent_hit)
)

color_palette <- c( "#edf7fc","#b9e2f5", "#50b8e7")
pie(data$Percentage, labels = paste0(data$Category, " (", round(data$Percentage, 2), "%)") , col = color_palette,main='Worktime Breakdown in 100 Barangays', font.main=1)

5.2 Breakdown of Hits

The top 100 barangays were identified based on the number of hits they are expected to generate. A total of 281,976 hits were computed. This is consistent with findings from other studies, which have shown that hit rates are typically higher in urban areas and in areas with a high population density. The high hit rate in the top 100 barangays is likely due to a combination of factors, including the fact that these barangays are more likely to be home to people who are interested in the product or service being offered.

library("ggplot2")
n <- 100
h2h_top100 <- read_h2h(n)
with_household <- coverage_percentage(h2h_top100)


sorted_with_household <- with_household %>% arrange(desc(Hits))
top20 <- sorted_with_household[1:20, ]  # Select the top 20 rows

# Sort the data by Hits in descending order
top20 <- top20[order(top20$Hits, decreasing = TRUE), ]

#Create a custom theme
my_theme <- theme_minimal() +
        theme(axis.text.x = element_text(angle = 90, hjust = 1),
              plot.title = element_text(size = 16),
              axis.title = element_text(size = 12),
              axis.text = element_text(size = 10))

ggplot(top20, aes(x = reorder(Barangay, -Hits), y = Hits)) +
        geom_bar(stat = "identity", fill = "skyblue") +
        labs(title = "Top 20 Hits by Barangay",
             x = "Barangay",
             y = "Hits") +
        my_theme

The top 100 barangays with the most hits were all located in Metro Manila, with the majority of them coming from Makati, Mandaluyong, Taguig, and Pasig. This suggests that these barangays are more likely to be receptive to new products and services, as they are home to a large population of young adults and professionals. Additionally, these barangays are well-connected to public transportation, making it easy for people to get around and visit different businesses.

library(scales)
library(dplyr)
n <- 100
h2h_top100 <- read_h2h(n)
with_household <- coverage_percentage(h2h_top100)

by_city <- with_household %>%
        group_by(ProvinceCity) %>%
        summarize(Sum = sum(Hits)) %>%
        arrange(desc(Sum))

ggplot(by_city, aes(x = reorder(ProvinceCity, -Sum), y = Sum)) +
        geom_bar(stat = "identity", fill = "skyblue") +
        labs(title = "Total Hits by City",
             x = "City",
             y = "Total Hits") +
        scale_y_continuous(labels = comma) +
        geom_text(aes(label = comma(Sum), vjust = -0.5), color = "black", size = 3.5) +
        theme_minimal() +
        theme(axis.text.x = element_text(angle = 90, hjust = 1),
              plot.title = element_text(hjust = 0.5)) +
        ylim(0, 105000)

Most of the hits are from large barangays, due to the concentration of potential customers. This is because large barangays tend to have a larger population and a higher population density, which means that there are more potential customers per household. Additionally, large barangays are more likely to be home to businesses and other institutions that attract people from other barangays.

library(dplyr)
library(scales)
by_size <- with_household %>%
        group_by(Size) %>%
        summarize(Sum = sum(Hits)) %>%
        arrange(desc(Sum))

ggplot(by_size, aes(x = reorder(Size, -Sum), y = Sum)) +
        geom_bar(stat = "identity", fill = "skyblue") +
        labs(title = "Total Hits by Size",
             x = "Size",
             y = "Total Hits") +
        scale_y_continuous(labels = comma) +
        geom_text(aes(label = comma(Sum), vjust = -0.5), color = "black", size = 3.5) +
        theme_minimal() +
        theme(axis.text.x = element_text(angle = 90, hjust = 1),
              plot.title = element_text(hjust = 0.5)) +
        ylim(0, 300000)

5.3 Sensitivity Analysis

Higher conversion and acceptance rates mean that more SIM cards are activated and that more households are accepting the SIM cards. This leads to higher revenue and hits, as the company is able to sell more SIM cards and collect more data from the households that have accepted the SIM cards.

Overall, higher conversion and acceptance rates are beneficial for the company, as they lead to higher revenue and hits. The company should focus on improving these rates by training the samplers, improving the quality of the SIM cards, and developing effective marketing materials.

library(ggplot2)
library(viridis)

# Input data
Success_Rate <- c(0.75, 0.80, 0.90, 0.50, 0.50, 0.25)
Conversion_Rate <- c(0.25, 0.50, 0.70, 0.90, 0.60, 0.30)
Acceptance_Rate <- c(0.1875, 0.4, 0.63, 0.45, 0.3, 0.075)

# Compute values
number_of_hits <- Success_Rate * Conversion_Rate * Acceptance_Rate
Sim_Activated <- number_of_hits * Conversion_Rate
Revenue <- Sim_Activated * 200

# Create dataframe
df <- data.frame(Success_Rate, Conversion_Rate, Acceptance_Rate, number_of_hits, Sim_Activated, Revenue)
# Convert number_of_hits to continuous scale
df$number_of_hits <- as.numeric(df$number_of_hits)

# Bubble Plot
ggplot(df, aes(y = Conversion_Rate, x = Acceptance_Rate, size = number_of_hits, color = Revenue)) +
    geom_point(alpha = 0.7) +
    labs(x = "Acceptance Rate", y = "Conversion Rate", title = "Sensitivity Analysis of Conversion Rate and Acceptance Rate") +
    scale_fill_viridis(discrete=TRUE, guide=FALSE, option="A") 

5.4 Return On Investment

The negative ROI in the first year is not necessarily a bad sign. In fact, it is expected in many cases, especially for businesses that are acquiring new customers. The cost of acquiring customers is often high, and it can take some time for a business to see a return on its investment. However, if Globe Telecom is able to retain a significant number of its customers, the ROI will eventually become positive.

In this case, we assumed that 50% of the hits will still be using their sim cards in the second year. This is a reasonable assumption, given that the sim cards are free to use. If Globe Telecom is able to retain even half of its customers, the ROI will be positive.

Therefore, the investment in the agency’s contract is worth it. The agency will eventually recoup its costs and make a profit.

Year 1 Year 2
Revenue Php 14,091,400 Php 7,045,700
Cost of Sim Cards Php 11,279,040 Php 0
Contract Php 5,000,000 Php 0
ROI Php -2,187,640 Php 4,858,060

6 Conclusion

In conclusion, the door-to-door sampling caravan project is a viable option for Globe Telecom to increase its subscriber base and revenue. The project has the potential to generate a significant return on investment, as long as the company is able to improve the conversion and acceptance rates.

The project is also relatively cost-effective, as the agency is proposing a Php 5,000,000 contract for 100 days of sampling. This includes the salaries of ten samplers, the use of a company van, and the services of a driver. The cost of the SIM cards (Php 40 each) is not included in the proposed contract, but the agency estimates that the cost will be approximately Php 400,000.

The work breakdown shows that most of the samplers’ time is spent interacting with potential hits (72.88%), followed by travel time (6.44%) and walking time (20.68%). This means that the samplers are efficient in their work and are able to maximize their time interacting with potential hits. They are able to do this by planning their routes efficiently and by targeting barangays that are closer and have smaller areas.

The top 100 barangays with the most hits were all located in Metro Manila, with the majority of them coming from Makati, Mandaluyong, Taguig, and Pasig. This suggests that these barangays are more likely to be receptive to new products and services, as they are home to a large population of young adults and professionals. Additionally, these barangays are well-connected to public transportation, making it easy for people to get around and visit different businesses.

Higher conversion and acceptance rates mean that more SIM cards are activated and that more households are accepting the SIM cards. This leads to higher revenue and hits, as the company is able to sell more SIM cards and collect more data from the households that have accepted the SIM cards. The company should focus on improving these rates by training the samplers, improving the quality of the SIM cards, and developing effective marketing materials.

In terms of ROI, the project is expected to generate a positive return in the second year, assuming that 50% of the hits are still using their sim. This means that the investment is worth it because the agency will eventually recoup its costs and make a profit.

Overall, the door-to-door sampling caravan project is a promising opportunity for Globe Telecom to grow its business. The company should carefully consider the factors discussed in this report before making a decision about whether or not to move forward with the project.

7 Recommendation

7.1 Increase Number of Barangays

Expanding the outreach from 100 to 464 barangays is expected to result in a significant increase in hits, from 281,976 to 515,202. This represents an 82% increase, which is a significant boost in potential customers.

There are a few factors that contribute to this increase. First, expanding the outreach to more barangays means that more potential customers will be exposed to the product or service being offered. Second, the larger number of barangays means that there is a greater likelihood that the product or service will be a good fit for the needs of the local population. Third, the increased outreach will allow the business to reach a wider audience and generate more leads.

However, it is important to note that there are also some risks associated with expanding the outreach. For example, the business may need to hire more staff to handle the increased workload. Additionally, the business may need to invest in more marketing materials to reach the new customers.

Overall, the decision of whether or not to expand the outreach is a strategic one that should be made after careful consideration of the risks and rewards. However, the potential benefits of expanding the outreach are significant, and the business could see a significant increase in hits if it is done effectively.

library(scales)
library(dplyr)
x <- 464
h2h_top100_x <- read_h2h(x)
with_household_x <- coverage_percentage(h2h_top100_x)

by_city_x <- with_household_x %>%
        group_by(ProvinceCity) %>%
        summarize(Sum = sum(Hits)) %>%
        arrange(desc(Sum))

ggplot(by_city_x, aes(x = reorder(ProvinceCity, -Sum), y = Sum)) +
        geom_bar(stat = "identity", fill = "skyblue") +
        labs(title = "Total Hits by City",
             x = "City",
             y = "Total Hits") +
        scale_y_continuous(labels = comma) +
        geom_text(aes(label = comma(Sum), vjust = -0.5), color = "black", size = 3.5) +
        theme_minimal() +
        theme(axis.text.x = element_text(angle = 90, hjust = 1),
              plot.title = element_text(hjust = 0.5)) +
        ylim(0, 105000)

7.2 Target Demographics

The data shows that when samplers visit barangays that are farther from TGT (more than 2 hours), the time spent interacting with potential hits decreases to 28.33%, which is significantly lower than the 72.88% reported in the X.X Sampler Utilization figure. This means that the samplers are less efficient in these areas, as they are spending more time traveling and walking. The higher percentage of time spent on travel and walking suggests that the samplers are not able to plan their routes efficiently in these areas.


n <- 10000
h2h_top100 <- read_h2h(n)
with_household <- coverage_percentage(h2h_top100)

adjusted_subset <-subset(with_household,with_household$travel_time>2 & with_household$AreaBarangay>10 )
adjusted_subset<-head(adjusted_subset,100)
percent_travel = ((mean(adjusted_subset$travel_time,na.rm = TRUE)/remaining_time))*100
percent_walk = ((mean(adjusted_subset$calculate_travel_time*(adjusted_subset$HouseholdVisited/10)))/remaining_time)*100
percent_hit = 100-percent_travel-percent_walk

#PLOT THE PIE CHART
data <- data.frame(
    Category = c("Travel Time", "Walking Time", "Hit Interaction"),
    Percentage = c(percent_travel, percent_walk, percent_hit)
)

color_palette <- c( "#edf7fc","#b9e2f5", "#50b8e7")
pie(data$Percentage, labels = paste0(data$Category, " (", round(data$Percentage, 2), "%)") , col = color_palette,main='Worktime Breakdown in 100 Barangays with > 2 Hour Travel Time', font.main=1)

When samplers visit barangays in Metro Manila, they are able to cover 98% of the households. This is significantly higher than the 53% coverage rate when visiting barangays outside Metro Manila. This is because Metro Manila has a higher density of households and a well-developed transportation infrastructure. This means that samplers can cover a larger number of households in a shorter amount of time, and they are more likely to be successful in their interactions with potential hits.


library(ggplot2)
y <- 23000
h2h_top100_y <- read_h2h(y)
with_household_y <- coverage_percentage(h2h_top100_y)
exclude<-c('CITY OF MANILA',"CALOOCAN CITY", "CITY OF MAKATI", "CITY OF MALABON",
           "CITY OF MANDALUYONG", "CITY OF MARIKINA","CITY OF MUNTINLUPA", 
           "CITY OF NAVOTAS","PASAY CITY","CITY OF PASIG","PATEROS","QUEZON CITY",
           "CITY OF SAN JUAN", "TAGUIG CITY","CITY OF VALENZUELA")

metro_manila <- subset(with_household_y, with_household_y$ProvinceCity %in% c('CITY OF MANILA',"CALOOCAN CITY", "CITY OF MAKATI", "CITY OF MALABON",
                                                                   "CITY OF MANDALUYONG", "CITY OF MARIKINA","CITY OF MUNTINLUPA", 
                                                                   "CITY OF NAVOTAS","PASAY CITY","CITY OF PASIG","PATEROS","QUEZON CITY",
                                                                   "CITY OF SAN JUAN", "TAGUIG CITY","CITY OF VALENZUELA"))


outside_metro_manila <- subset(with_household_y, !(with_household_y$ProvinceCity %in% c('CITY OF MANILA',"CALOOCAN CITY", "CITY OF MAKATI", "CITY OF MALABON",
                                                                                          "CITY OF MANDALUYONG", "CITY OF MARIKINA","CITY OF MUNTINLUPA",                                                                                       
                                                                                    "CITY OF NAVOTAS","PASAY CITY","CITY OF PASIG","PATEROS","QUEZON CITY",
                                                                                          "CITY OF SAN JUAN", "TAGUIG CITY","CITY OF VALENZUELA")) & with_household_y$Island == 'Luzon')
mm_percent<-(mean(metro_manila$HouseholdVisited, na.rm=TRUE)/mean(metro_manila$NHouseholds,na.rm=TRUE))*100
mm_percentdiff<-100-mm_percent

omm_percent<-(mean(outside_metro_manila$HouseholdVisited, na.rm=TRUE)/mean(outside_metro_manila$NHouseholds,na.rm=TRUE))*100
omm_percent_diff<-100-omm_percent

# PLOT THE DATA
data <- data.frame(
    Category = c("Metro Manila", "Outside Metro Manila"),
    Percentage = round(c(mm_percent, omm_percent))
)


ggplot(data, aes(x = Category, y = Percentage, fill = Category)) +
    geom_bar(stat = "identity", width=0.5) +
    scale_fill_manual(values = color_palette) +
    labs(x = " ", y = "Percentage",
         title = "Percentage of Households") +
    geom_text(aes(label = paste0(Percentage, "%")), vjust = -0.5, color = "black", size = 4) +
    theme(legend.position = "none",
          panel.background = element_blank(),
          panel.grid = element_blank(),
          plot.title = element_text(size = 16, hjust = 0.5),
          axis.title = element_text(size = 14), 
          axis.text = element_text(size = 10) 
    )

In addition, the top barangays with the highest density are all located in Metro Manila. This suggests that there is a high potential demand for SIM cards in these areas. As a result, we recommend that samplers focus their efforts on visiting barangays in Metro Manila. This will allow them to cover a larger number of households and to be more efficient in their work.

library(ggplot2)

t <- 100
h2h_top100 <- read_h2h(t)
with_household_t <- coverage_percentage(h2h_top100)
sorted_with_household <- with_household_t %>% arrange(desc(density))
top20 <- sorted_with_household[1:30, ]  # Select the top 30 rows

# Sort the data by Hits in descending order
top20 <- top20[order(top20$Hits, decreasing = TRUE), ]

# Create a custom theme
my_theme <- theme_minimal() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1),
          plot.title = element_text(size = 16),
          axis.title = element_text(size = 12),
          axis.text = element_text(size = 10))

ggplot(top20, aes(x = reorder(Barangay, -Hits), y = Hits)) +
    geom_bar(stat = "identity", fill = "skyblue") +
    labs(title = "Top 30 Barangay Density",
         x = "Barangay",
         y = "Density") +
    coord_flip()