Narrative Summary

Evolution of Boba

The boba tea industry is an increasingly popular business. It is projected to grow from 2.83 billion to 4.78 billion by 2032, a compound annual growth rate (CAGR) of 7.81%, and it is especially popular among young people (Fortune Business Insights). Boba originated in Taiwan in the 1980s, and its personalized drinks, built around a variety of tapioca pearls, have since made it mainstream.
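If the 7.81% figure is read as a compound annual growth rate (our interpretation of the report's wording), the two market sizes should imply a plausible number of years of compounding. Solving for that number gives about seven years, which is consistent with a projection window ending in 2032:

```latex
\[
  2.83\,(1 + 0.0781)^{n} = 4.78
  \quad\Longrightarrow\quad
  n = \frac{\ln(4.78 / 2.83)}{\ln(1.0781)} \approx \frac{0.5242}{0.0752} \approx 6.97
\]
```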

Our Constructed Dataset

This dataset links nine boba shops to the flavors each one lists as most popular. We encoded it as a biadjacency (two-mode) matrix: a cell is 1 if the shop lists the flavor as one of its most popular and 0 otherwise. The nodes are the shops and the flavors, and an edge joins a shop to each flavor it lists as most popular. Makayla collected the data on June 2, 2025, by visiting each shop's website, recording its most popular flavors, and entering them into Excel (1 representing a relationship, 0 representing no relationship).
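As a sketch of how raw observations map onto the 0/1 matrix described above, the snippet below cross-tabulates a list of (shop, flavor) pairs. The shop names and flavor pairs are hypothetical placeholders, not the collected data, and the spreadsheet itself was built by hand in Excel rather than with this code.

```r
# Hypothetical (shop, flavor) observations standing in for what was typed into
# Excel; these are illustrative values only, not the collected data
obs <- data.frame(
  shop   = c("Shop A", "Shop A", "Shop B", "Shop C", "Shop C"),
  flavor = c("Mango",  "Taro",   "Mango",  "Taro",   "Ube")
)

# Cross-tabulate into a flavor-by-shop matrix (rows = flavors, columns = shops)
biadj <- unclass(table(obs$flavor, obs$shop))

# Force strict 0/1 entries so an accidental duplicate row cannot count twice
biadj <- (biadj > 0) * 1L

print(biadj)
```

Each 1 in `biadj` corresponds to exactly one shop-flavor edge in the network.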

Main Objective and Question

Our research question is: which boba flavor or ingredient is the most popular and profitable? Our objective was to identify the most popular and most profitable flavors among a selection of boba shops. The results give shops that do not yet spotlight those flavors some insight into which ones to feature more prominently on their menus.

GitHub repository with our code: https://github.com/makatalad/Network_Analysis

Setup Code


Data were collected from the websites of the boba shops below, recording a relationship between each shop and the flavors it lists as most popular.

Here are the boba tea shops we used.

ShareTea, Pochi, Happy Lemon, DIY Tea Lab, Tealogy, Ding Tea, Boba Gem, BÖBA, CHICHA San Chen

# load packages
library(igraph)

# read Boba_Data.csv (flavors as rows, shops as columns) and convert it to a matrix
trial <- read.table("Boba_Data.csv", sep = ",", header = TRUE)

trial <- as.matrix(trial)

# build a bipartite (two-mode) graph from the biadjacency matrix
boba <- graph_from_biadjacency_matrix(trial)

# node ids and mode membership: the first 19 vertices are flavors (rows of trial),
# the last 9 are shops (columns)
id <- 1:28
mode <- rep(c(1, 2), times = c(19, 9))
nodes <- data.frame(id, mode)
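Hard-coding 19 flavors and 9 shops breaks silently if a row or column is added to the spreadsheet. A minimal sketch of a more defensive setup, assuming the CSV keeps flavor names in its first column: read that column as row names, check that every cell is 0 or 1, and derive the node table from the matrix dimensions. The temporary file and its contents are hypothetical stand-ins for `Boba_Data.csv`.

```r
# Write a small hypothetical CSV in the assumed layout: first column holds
# flavor names, remaining columns are shops, cells are 0/1
path <- tempfile(fileext = ".csv")
writeLines(c("Flavor,Shop A,Shop B,Shop C",
             "Mango,1,1,0",
             "Taro,1,0,1",
             "Ube,0,0,1"), path)

# row.names = 1 turns the first column into row names so the matrix stays numeric;
# check.names = FALSE keeps shop names containing spaces exactly as written
biadj <- as.matrix(read.csv(path, row.names = 1, check.names = FALSE))

# Sanity checks before building the graph
stopifnot(!anyNA(biadj))
stopifnot(all(biadj %in% c(0, 1)))   # strictly 0/1 entries

# Derive node ids and mode membership from the matrix dimensions:
# rows (flavors) are mode 1, columns (shops) are mode 2
n_flavors <- nrow(biadj)
n_shops   <- ncol(biadj)
nodes <- data.frame(id   = seq_len(n_flavors + n_shops),
                    mode = rep(c(1, 2), times = c(n_flavors, n_shops)))
print(nodes)
```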

Visualization Choices

When creating the network, we scaled node size by degree so that the flavor holding the most power, that is, the most popular flavor, stands out. For node colors we chose tones that are bright but subtle, and for edges a medium grey (neither too dark nor too light). The default labels were too cluttered, so we set label.cex, label.dist, and label.degree to move some labels out of the way and improve readability. Labels are black, and we used the label font attribute to make the text bold. For the layout we used layout_with_graphopt because it spreads the nodes out instead of stacking them on top of each other, whereas many of the other layouts packed the nodes very close together.
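The styling choices above can also be keyed off the bipartite `type` attribute instead of hard-coded vertex index ranges, so they stay correct if the data grow. The sketch below uses a hypothetical three-flavor, three-shop matrix; the specific label sizes and distances are illustrative values, not the ones used in the report.

```r
library(igraph)

# Hypothetical flavors x shops matrix (illustration only)
biadj <- matrix(c(1, 1, 0,
                  1, 0, 1,
                  0, 0, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Mango", "Taro", "Ube"),
                                c("Shop A", "Shop B", "Shop C")))
g <- graph_from_biadjacency_matrix(biadj)

# V(g)$type is FALSE for flavors (rows) and TRUE for shops (columns)
V(g)$size         <- degree(g) * 2                          # node size scaled by degree
V(g)$color        <- ifelse(V(g)$type, "#F9AFAE", "#FFD3B5")
V(g)$label.cex    <- ifelse(V(g)$type, 0.62, 0.5)
V(g)$label.dist   <- ifelse(V(g)$type, 0, 1)
V(g)$label.degree <- pi                                     # pi places labels to the left

set.seed(5)
plot(g,
     vertex.label.color = "black",
     vertex.label.font  = 2,          # 2 = bold, no system font lookup needed
     vertex.frame.color = NA,
     edge.color         = "#7E7E7E",
     layout             = layout_with_graphopt(g))
```

Setting `vertex.label.font = 2` gives bold labels through R's standard font codes, which avoids depending on a named system font being installed.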

Network: Which Flavor Holds the Most Power?

# scale node size by degree and set label size, distance, and angle (pi = left of the node)
V(boba)$size <- degree(boba) * 2
V(boba)[1:10]$label.cex <- 0.5
V(boba)[11:19]$label.cex <- 0.6
V(boba)[20:28]$label.cex <- 0.62
V(boba)[1:10]$label.dist <- 0
V(boba)[11:19]$label.dist <- 1
V(boba)[20:28]$label.dist <- 0
V(boba)$label.degree <- pi

# our network visualization
set.seed(5)
plot(boba, 
     vertex.color = ifelse(nodes$mode == 1, "#FFD3B5", "#F9AFAE"),
     vertex.label = V(boba)$name,
     vertex.label.color = "black",
     vertex.label.font = 2,    # 2 = bold
     vertex.frame.color = NA,
     edge.color = "#7E7E7E",
     layout = norm_coords(layout_with_graphopt(boba), xmin = -1, xmax = 1, ymin = -1, ymax = 1),
     main = "Which Flavor Holds the Most Power?",
     sub = "Node size scaled by degree")

# legend
colors <- c("#FFD3B5", "#F9AFAE")
legend("topright", c("Flavors", "Boba Tea Shops"), pch = 21,
       col = "#777777", pt.bg = colors, pt.cex = 1.5, cex = .8, bty = "n", ncol = 1)

Original vs. Abbreviated names

Network Meaning

Our network is a two-mode network that represents the relationships between boba tea shops and flavors. Using degree as the centrality metric, we highlighted the flavors that are most popular and common across the shops. Milk Tea Black Tea, Brown Sugar, Mango, and Strawberry are the most popular/common flavors among the boba tea shops, while Cookies and Cream, Taro Oreo, Grapefruit, and Osmanthus are the least popular/common.
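In a two-mode network, a flavor's degree is simply the number of shops that list it, which is the row sum of the biadjacency matrix. A minimal base-R sketch of the ranking described above, using a hypothetical four-flavor, three-shop matrix rather than the collected data:

```r
# Hypothetical flavors x shops biadjacency matrix (illustration only; not the
# collected data): 1 means the shop lists the flavor as one of its most popular
biadj <- matrix(c(1, 1, 1,
                  1, 0, 1,
                  0, 0, 1,
                  1, 0, 0),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("Mango", "Taro", "Ube", "Lychee"),
                                c("Shop A", "Shop B", "Shop C")))

# Flavor degree = row sum (shops listing the flavor);
# shop degree = column sum (flavors the shop lists)
flavor_degree <- sort(rowSums(biadj), decreasing = TRUE)
shop_degree   <- colSums(biadj)

# Flavors tied for the highest degree, i.e. the most common across shops
most_common <- names(flavor_degree)[flavor_degree == max(flavor_degree)]
print(flavor_degree)
print(most_common)
```

On the real data, `degree(boba)` for the 19 flavor vertices should return the same counts as `rowSums(trial)`, since every cell is 0 or 1; those counts are what drive the node sizes in the plot.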

Analysis

To answer our question of which flavor is the most popular among local bubble tea shops, we used a two-mode network analysis together with betweenness centrality. We projected the two-mode matrix into two one-mode matrices, one for shops and one for flavors, and then built graph objects to create three plots: the original two-mode network and its two projections.
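Which matrix product gives which projection depends on which mode sits in the rows. A small sketch with a hypothetical flavors-by-shops matrix (rows are flavors, matching the report's data layout) shows that `B %*% t(B)` is flavor-by-flavor and `t(B) %*% B` is shop-by-shop, and cross-checks the flavor projection against igraph's `bipartite_projection()`.

```r
library(igraph)

# Hypothetical flavors x shops biadjacency matrix (illustration only)
biadj <- matrix(c(1, 1, 0,
                  1, 0, 1,
                  0, 0, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Mango", "Taro", "Ube"),
                                c("Shop A", "Shop B", "Shop C")))

# Rows are flavors, so B %*% t(B) is flavor-by-flavor: entry [i, j] counts the
# shops listing both flavors; the diagonal is each flavor's degree
flavor_proj <- biadj %*% t(biadj)

# t(B) %*% B is shop-by-shop: entry [i, j] counts flavors shared by two shops
shop_proj <- t(biadj) %*% biadj

# Cross-check with igraph: proj1 holds the type FALSE (row) vertices, i.e. flavors,
# and its "weight" attribute counts shared neighbours
g <- graph_from_biadjacency_matrix(biadj)
p <- bipartite_projection(g)
proj_adj <- as_adjacency_matrix(p$proj1, attr = "weight", sparse = FALSE)

off_diag <- flavor_proj
diag(off_diag) <- 0
stopifnot(all(proj_adj == off_diag[rownames(proj_adj), colnames(proj_adj)]))
print(flavor_proj)
print(shop_proj)
```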

Two-Mode Network Analysis

# flavor-by-flavor matrix: the rows of trial are flavors, so trial %*% t(trial)
# counts, for each pair of flavors, the number of shops that list both
flavors <- trial %*% t(trial)

flavornet <- graph_from_adjacency_matrix(flavors, mode = "undirected",
                                         diag = FALSE, weighted = TRUE) # graph object of flavor-by-flavor network

# shop-by-shop matrix: t(trial) %*% trial counts, for each pair of shops,
# the number of popular flavors they share
shops <- t(trial) %*% trial

shopnet <- graph_from_adjacency_matrix(shops, mode = "undirected",
                                       diag = FALSE, weighted = TRUE) # graph object of shop-by-shop network

# a graph object of the original two-mode data set
trialnet <- graph_from_biadjacency_matrix(trial) # graph object from original two-mode matrix

# visualize all three graph objects in a 1x3 plot space, using edge width for tie weights
# in the projections; V(trialnet)$type is FALSE for flavors (rows) and TRUE for shops (columns)
par(mfrow=c(1,3), mar=c(4,2,4,2), mgp=c(2,0.5,0))

plot(trialnet, vertex.label.cex=.6, vertex.label.color="black", vertex.size=8, 
     vertex.color=ifelse(V(trialnet)$type, "lightblue", "lightgray"),
     vertex.frame.color=NA, vertex.label.dist=2)

plot(flavornet, vertex.label.cex=.6, vertex.label.color="black", vertex.size=8, 
     vertex.color="lightgray", edge.width=E(flavornet)$weight,
     vertex.frame.color=NA, vertex.label.dist=2)

plot(shopnet, vertex.label.cex=.6, vertex.label.color="black", vertex.size=8, 
     vertex.color="lightblue", edge.width=E(shopnet)$weight,
     vertex.frame.color=NA, vertex.label.dist=2)

# Betweenness scores for flavors (computed on the flavor-by-flavor network)
betweenness(flavornet, directed = FALSE)
##       TPMT       MTBL        MTO        MTG      Mango      Straw    Passion 
##   2.511111   1.600000   1.211111   6.144444   1.000000   5.205556   7.177778 
## BrownSugar     Matcha       Taro      TOreo       Oreo    Avocado Cook&Cream 
##   7.488889   3.494444   1.600000   8.911111   9.627778   5.644444   8.911111 
##        Ube Grapefruit      Osman     Winter     Lychee 
##  12.461111   7.733333   1.750000   5.405556   5.405556

Table 1. Betweenness centrality of each flavor in the flavor-by-flavor network.

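Based on Table 1, ube has the highest betweenness score (12.46), meaning that in the flavor-by-flavor network it lies on the most shortest paths between other flavors: ube acts as a bridge connecting flavors that would otherwise be far apart. At the other end of the spectrum, mango has the lowest betweenness score (1.00), so it rarely serves as a bridge. Note that betweenness measures bridging rather than popularity: mango is among the most common flavors by degree, yet it has the lowest betweenness.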
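Because the projected networks were built with `weighted = TRUE`, igraph's `betweenness()` picks up the `weight` edge attribute by default and treats it as a path length. Here the weights are co-occurrence counts, so frequently shared flavor pairs are treated as *far* apart. The sketch below, on a small hypothetical four-node graph, compares raw weights, no weights (`weights = NA`), and inverted weights as a sensitivity check; which version fits the research question is a judgment call, not something the data settle.

```r
library(igraph)

# Small hypothetical weighted graph (illustration only): weights are
# co-occurrence counts, so A-B and B-C are the "strong" ties
d <- data.frame(from   = c("A", "B", "A", "D"),
                to     = c("B", "C", "D", "C"),
                weight = c(3,   3,   1,   1))
g <- graph_from_data_frame(d, directed = FALSE)

# Default: the "weight" attribute is read as a distance,
# so heavy (frequent) ties count as long paths
b_raw <- betweenness(g, directed = FALSE)

# Ignore weights entirely
b_unw <- betweenness(g, directed = FALSE, weights = NA)

# Treat strong ties as short distances by inverting the counts
b_inv <- betweenness(g, directed = FALSE, weights = 1 / E(g)$weight)

round(rbind(raw = b_raw, unweighted = b_unw, inverse = b_inv)[, c("A", "B", "C", "D")], 2)
```

In this toy graph, B scores 0 with raw weights but 1 with inverted weights, while D moves the opposite way, so the choice can reorder which node looks like the bridge. Re-running `betweenness(flavornet, directed = FALSE, weights = 1 / E(flavornet)$weight)` on the real data would show whether ube and mango keep their positions.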

Conclusion

This network is aimed at bubble tea businesses and at anyone with a personal interest in the drink. Based on our centrality analysis, its main takeaway is which flavors are the most central and popular among bubble tea shops from Everett to Seattle. If a flavor appears as a top seller at many shops, that suggests it sells well and is likely profitable, which makes it a wise choice for aspiring bubble tea shops planning a menu. Future work could repeat the analysis with toppings in place of flavors.

Limitations

The data collection process introduced a few limitations. Some bubble tea shops do not keep their websites up to date, and shop websites commonly do not show every flavor available on the in-store menu.