Introduction

I spend a lot of time on the internet, and TikTok is one of my favorite platforms. I mostly use it to watch short-form videos, but I also post my own clips. As a Media Studies student, I am curious about the system behind the screen. TikTok tells users that “anyone can be a creator,” and the app makes posting extremely easy. That promise leads to a simple question: who actually gets attention on the platform, and how do engagement patterns and geography shape where that attention concentrates?

When we open the app, it is easy to equate attention with fame. Follower counts appear under every username, and the numbers are often enormous. But audience size alone does not capture how audiences behave. Brands and platforms care about whether users like, comment on, share, or save a creator’s videos, because these signals reflect active engagement rather than passive viewing. In this project, I argue that attention on TikTok is uneven because it is shaped by two forces: engagement (how audiences respond per follower) and geography (where creators are based and how visibility varies across regions). Many small and mid-sized creators generate more engagement per follower than the largest accounts, and engagement patterns differ across countries and continents.

This is why influencer marketing emphasizes engagement rate. In this dataset, the average engagement rate for each creator is computed as:

\[ \text{Engagement Rate} = \frac{\text{likes} + \text{comments} + \text{shares} + \text{saves}} {\text{followers}} \times 100 . \]

This formula measures how many interactions a creator generates per follower. Some industry definitions use views in the denominator for TikTok, but I focus on followers because the rest of my analysis is organized around audience size (Brandwatch, 2023).
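To make the formula concrete, here is a minimal Python sketch of the calculation. The function name, the example numbers, and the choice to reject accounts with zero followers are my own assumptions for illustration; they are not taken from the dataset.

```python
def engagement_rate(likes, comments, shares, saves, followers):
    """Return total interactions per follower, expressed as a percentage."""
    if followers <= 0:
        # Assumption: an account without followers has no defined rate.
        raise ValueError("followers must be positive")
    return (likes + comments + shares + saves) / followers * 100


# Hypothetical account: 2,000 followers and 150 interactions in total.
example_rate = engagement_rate(likes=120, comments=15, shares=10, saves=5, followers=2000)
print(f"{example_rate:.1f}%")  # 7.5%
```

Because followers sit in the denominator, the same 150 interactions would give a ten times smaller rate for an account with 20,000 followers.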

To study these patterns, I use a dataset of 1,000 TikTok accounts. I scraped these profiles from the platform and cleaned the data before analysis. The table below shows the main quantitative and categorical variables used in this project:

| Variable | Type | Description |
| --- | --- | --- |
| followers | Quantitative | The number of followers the TikTok user has |
| likes | Quantitative | The total number of likes received by the TikTok user |
| videos_count | Quantitative | The total number of videos uploaded by the TikTok user |
| awg_engagement_rate | Quantitative | The average engagement rate for the user’s content |
| comment_engagement_rate | Quantitative | The engagement rate specifically related to comments on the user’s content |
| like_engagement_rate | Quantitative | The engagement rate specifically related to likes on the user’s content |
| region | Categorical | The region associated with the TikTok user |
| predicted_lang | Categorical | The predicted language of the user’s content |
| is_verified | Categorical | Whether the TikTok account is verified |
| is_private | Categorical | Whether the account is set to private |

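Throughout the project, I explore four main questions: who the creators are, whether more followers mean more engagement, how creator behavior relates to engagement, and where in the world attention is concentrated.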

Who are the creators?

Visualization 1: Follower Distribution

I start by examining the dataset to make sure I understand it before making broader claims about TikTok. The histogram shows how many creators are in each follower range. The height of each bar represents the number of accounts in that group. Using a log-scaled x-axis lets tiny and large accounts coexist on the same readable axis. With this scale, it is easy to see that most accounts are in the low-to-mid follower range, with the tallest bars around 1,000 followers. The distribution then stretches out into a long upper tail toward the largest accounts in the dataset. This pattern tells me the dataset is dominated by everyday creators rather than celebrity-scale accounts.
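The idea behind the log-scaled axis can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the follower counts are made up, and the bins are whole powers of ten, which is coarser than the bins in the actual histogram.

```python
from collections import Counter


def decade_bin(followers):
    """Return the power-of-ten bin: 0 for 1-9, 1 for 10-99, 2 for 100-999, ..."""
    if followers < 1:
        raise ValueError("a log scale needs a positive follower count")
    # Counting digits avoids floating-point rounding at exact powers of ten.
    return len(str(int(followers))) - 1


# Hypothetical follower counts, not values from the dataset.
sample_followers = [12, 850, 1200, 3400, 9500, 48000, 730000]
bin_counts = Counter(decade_bin(f) for f in sample_followers)

# Print a small text histogram with one row per power of ten.
for exponent in sorted(bin_counts):
    low, high = 10 ** exponent, 10 ** (exponent + 1) - 1
    print(f"{low:>9,} to {high:>9,}: {'#' * bin_counts[exponent]}")
```

Each bin covers ten times the range of the previous one, which is why an account with 12 followers and one with 730,000 can share the same readable axis.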

This histogram sets the stage by showing that most of the accounts in the sample are small. Since nearly all accounts are on the left side of the plot, any engagement trends I find later will mostly reflect these smaller creators unless I look at mid-sized and large accounts separately. This visualization helps answer the question of who is present, so the rest of the project can focus on who is actually being heard and show that follower count by itself does not fully measure attention.

Visualization 2: Share of Creators by Follower Tier

This donut chart sorts follower counts into three groups to make comparisons easier. Each segment shows the percentage of creators in a tier, providing a quick overview of the sample. I use percentages so that any imbalance is easy to spot. Marketers usually divide influencers into five levels: nano, micro, mid-tier, macro, and mega (InfluenceLogic, 2023). However, most accounts in my dataset have fewer than 1 million followers, so I use three tiers that match my data and focus: micro-influencers (under 10,000), mid-tier influencers (10,000 to 100,000), and larger influencers (100,000 to 1 million).

This chart shows how uneven the creator population is. Out of 1,000 accounts, 857 are micro-influencers (85.7%), 102 are mid-tier (10.2%), and 41 are larger influencers (4.1%). The large micro-influencer section makes it clear that most creators are small. By showing that a “typical creator” is really a micro-influencer, the visualization highlights a key question: if almost everyone is small, does most attention go to them, or do the few large accounts from Visualization 1 get most of it?
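The tiering and the percentages can be reproduced with a short Python sketch. The function name is hypothetical, and placing an account with exactly 100,000 followers in the larger tier is my assumption, since the tier ranges share that boundary. The counts are the ones reported for this sample.

```python
def follower_tier(followers):
    """Assign one of the three tiers used in this project."""
    if followers < 10_000:
        return "micro"
    if followers < 100_000:
        return "mid-tier"
    # Assumption: exactly 100,000 followers counts as "larger".
    return "larger"


# Tier counts reported for the 1,000 accounts in the sample.
tier_counts = {"micro": 857, "mid-tier": 102, "larger": 41}
total_accounts = sum(tier_counts.values())

# Convert counts to percentage shares, rounded to one decimal place.
tier_shares = {tier: round(100 * n / total_accounts, 1) for tier, n in tier_counts.items()}
print(tier_shares)
```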

Since follower counts dominate how TikTok presents influence, the next step is to test whether audience size actually predicts engagement per follower.

Do more followers mean more engagement?

Visualization 3: Followers vs Average Engagement Rate

This scatterplot moves from describing the sample to showing performance patterns. Each point represents an account. The x-axis shows follower count on a log scale, and the y-axis shows average engagement rate. I use the log scale to keep both small and large creators visible, so the biggest accounts do not overshadow the rest. The plot reveals that micro-influencers have the widest range and reach the highest engagement rates. There is even an outlier above 80%, likely a small account where even a handful of interactions is large relative to its follower count. At the same time, the vast majority of accounts across all tiers cluster near the bottom, with mid-tier creators occupying a narrower band and larger creators clustering closest to zero. Since the scatterplot shows the full range of engagement rates, including extreme outliers, the y-axis has to stretch to roughly 80%. In Visualization 4, I zoom in to the typical range so the distributions and medians are easier to compare.

This visualization shows the first clear sign of “engagement dilution.” Follower count alone does not strongly predict audience engagement. Instead of a simple pattern in which more followers lead to more engagement, the scatterplot suggests a trade-off. As audiences grow, creators reach more people but get fewer responses per follower. This matches my own experience on TikTok, where the largest accounts often feel less conversational than smaller ones.

Visualization 4: Engagement Distributions Across Follower Tiers

To get a clearer picture beyond individual data points, I use a density plot to show how engagement is distributed within each tier. Each curve is colored by follower tier and highlights where engagement values are most common for that group. Since the average engagement rate in this dataset is extremely low, I keep the values on a comparable scale and zoom the x-axis to the 85th percentile of the data so the density curves and median lines remain visually distinguishable. This means the plot focuses on the “typical” range of engagement values, and a small number of extreme outliers fall outside the plotted x-range. Dashed vertical lines mark the median engagement rate for each group. The semi-transparent fills make it easier to see where the curves overlap and where they stand apart.
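The zoom and the median lines can be sketched in Python as follows. The engagement rates are hypothetical, the percentile uses the standard library's inclusive interpolation, and computing each median from all accounts (including those beyond the plotted range) is my assumption about a sensible order of operations, not a description of the exact plotting code.

```python
from statistics import median, quantiles

# Hypothetical engagement rates (in percent) per tier, not values from the dataset.
rates_by_tier = {
    "micro": [0.5, 1.2, 2.0, 3.5, 40.0],
    "mid-tier": [0.2, 0.4, 0.9, 1.5],
    "larger": [0.05, 0.1, 0.3],
}

all_rates = [r for rates in rates_by_tier.values() for r in rates]
# quantiles(..., n=100) returns 99 cut points; index 84 is the 85th percentile.
cutoff = quantiles(all_rates, n=100, method="inclusive")[84]

# Medians use every account in a tier, including those beyond the cutoff (an assumption).
medians = {tier: median(rates) for tier, rates in rates_by_tier.items()}

# Only values at or below the cutoff would fall inside the plotted x-range.
visible = {tier: [r for r in rates if r <= cutoff] for tier, rates in rates_by_tier.items()}

print(f"x-axis upper limit: {cutoff:.3f}")
print(medians)
```

In this toy example the single extreme value of 40.0 is left out of the plotted range, but it still counts toward the micro-tier median, which is why the median is a robust choice for the dashed lines.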

This visualization shows that follower count does not always reflect real attention. The micro-influencer curve is wider and more right-shifted than the other groups, so smaller creators are more likely to have moderate or high engagement and often fall into the higher-engagement range. The mid-tier curve extends to the right, but less than the micro-influencer curve. The large-tier curve stays close to the very low engagement area, with few accounts reaching even moderate response levels. Along with the scatterplot, this density plot shows that larger creators often have a wide reach but receive fewer responses per follower, while smaller creators tend to have stronger engagement. This supports the project’s main idea that there is a trade-off between reach and responsiveness.

Visualizations 3 and 4 show that as audiences get bigger, engagement per follower often drops. But these visuals do not show what creators are doing differently. Therefore, I shift towards how creators post and how audiences react.

How does creator behavior relate to engagement?

Visualization 5: Posting Volume × Engagement Style

Open the Engagement Drivers Explorer Shiny app: https://kaonichiwa.shinyapps.io/project_shiny3/

The Shiny app shows that follower count does not always reflect real influence by comparing how posting habits and engagement styles relate to performance. The main screen is a heatmap that crosses posting volume (based on video count) with engagement style. Engagement style (like-led, balanced, or comment-led) is defined by the gap between like and comment engagement rates. A creator is classified as like-led if their like engagement rate is at least 30% higher than their comment engagement rate, comment-led if the reverse is true, and balanced otherwise. This is not industry-standard labeling, but I use it because it offers a clear, interpretable way to capture whether audiences respond primarily through likes or through comments, without requiring a complex model.
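The classification rule can be written as a small Python function. This is a sketch rather than the app's code: I assume that "30% higher" means a relative gap (one rate is at least 1.3 times the other), and I treat two equal rates, including two zeros, as balanced. The function name and the gap parameter are hypothetical.

```python
def engagement_style(like_rate, comment_rate, gap=0.30):
    """Label a creator as like-led, comment-led, or balanced."""
    if like_rate == comment_rate:
        # Covers the case where both rates are zero.
        return "balanced"
    if like_rate >= comment_rate * (1 + gap):
        return "like-led"
    if comment_rate >= like_rate * (1 + gap):
        return "comment-led"
    return "balanced"


# Hypothetical rates for three creators.
print(engagement_style(0.20, 0.10))  # like-led
print(engagement_style(0.10, 0.20))  # comment-led
print(engagement_style(0.11, 0.10))  # balanced
```

A relative gap keeps the rule independent of the overall level of engagement, so a very small account and a very large one are judged by the same ratio.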

Each heatmap cell shows the median overall engagement rate for that segment. When a user clicks a cell, the right panel updates with a segment summary, including the number of creators, median followers, median videos, like engagement rate, comment engagement rate, and overall average engagement rate. The bottom of the app then shows which account types fall into that segment: clicking a segment updates the comparison chart (privacy status, verification status, and predicted language) and refreshes the creator table accordingly. The “Overall ER” is displayed as a proportion, so values like 0.34 correspond to about 34% engagement, and values can exceed 1.0 for very small accounts.
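The value in each heatmap cell can be sketched as a grouped median in Python. The posting-volume cut-offs (20 and 100 videos), the "medium" label, and the creator records below are hypothetical choices for illustration; the app's actual bin edges are not stated here.

```python
from collections import defaultdict
from statistics import median


def posting_volume(videos_count, low_max=20, high_min=100):
    """Bin a creator by video count; the cut-offs are assumed, not the app's."""
    if videos_count <= low_max:
        return "low"
    if videos_count >= high_min:
        return "high"
    return "medium"


# Hypothetical creators with an engagement style and an overall rate (as a proportion).
creators = [
    {"videos_count": 5, "style": "like-led", "rate": 0.40},
    {"videos_count": 12, "style": "like-led", "rate": 0.30},
    {"videos_count": 250, "style": "like-led", "rate": 0.02},
    {"videos_count": 400, "style": "like-led", "rate": 0.01},
    {"videos_count": 60, "style": "balanced", "rate": 0.05},
]

# Collect the rates that belong to each (posting volume, style) cell.
cells = defaultdict(list)
for creator in creators:
    key = (posting_volume(creator["videos_count"]), creator["style"])
    cells[key].append(creator["rate"])

# One median per cell, which is what the heatmap color would encode.
cell_medians = {key: median(rates) for key, rates in cells.items()}
for key, value in sorted(cell_medians.items()):
    print(key, round(value, 4))
```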

In the bigger picture, the app shows that engagement depends on how creators behave and interact. For instance, in like-led segments, creators who post less have much higher median engagement. The engagement rate is 0.3431 in the like-led, low-posting-volume segment. However, it drops sharply as the number of posts increases, falling to 0.0165 in the like-led, high-posting-volume segment, even though the median number of followers rises from 128 to 4,203. This pattern supports my main point that attention is uneven and not confined to the biggest accounts.

After recognizing that engagement depends on behavior and interaction style, the next step is to ask if these high-engagement patterns are spread out evenly. To explore this, I move from looking at individual creators to focusing on geography, examining where creators are based and which regions stand out.

Where in the world is attention concentrated?

Visualization 6: Creators by Country

After looking at how individual creators perform, I now focus on where they are based. I use a treemap to show the number of creators from each country. Each rectangle represents a country, with its size indicating the number of creators from that country relative to others. To keep the visualization readable, I include only countries with at least 5 creators in the dataset. By hovering over a tile, one can see the country name and the exact number of creators. For instance, the United States has one of the largest rectangles, indicating more creators than other countries, while countries like Saudi Arabia, Indonesia, and Viet Nam have mid-sized rectangles. Many other countries appear as smaller rectangles, each representing just a few accounts compared to the largest countries.

This view shows that the creator ecosystem is uneven across countries, even before engagement. A few countries host much of the creator base, while many have only a few accounts. This imbalance matters because attention depends on who is present. If most micro- and mid-tier influencers cluster in certain regions, findings about who gets noticed reflect this geography. This visualization highlights that geography is central to understanding attention, not just a minor detail.

Visualization 7: Top Countries by Median Engagement Rate per Follower

To find the countries with the highest engagement per follower, I use a lollipop chart that ranks countries by median engagement rate per follower. Here, “median engagement rate” means I take each creator’s average engagement rate and use the country-level median to reduce the influence of outliers. I include only countries with at least 8 creators to avoid skewed results. The dot marks each country’s median engagement rate, and the horizontal line shows its distance from zero. Each point is labeled with n, and the hover text repeats the creator count, so the sample size behind each ranking is always visible.
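The ranking logic can be sketched in Python. The function and the country labels are hypothetical, and the toy call lowers the minimum to 3 creators so that the small example still produces a ranking; the chart itself uses a minimum of 8.

```python
from collections import defaultdict
from statistics import median


def rank_countries(records, min_creators=8, top_n=15):
    """Rank countries by the median of their creators' engagement rates."""
    rates_by_country = defaultdict(list)
    for country, rate in records:
        rates_by_country[country].append(rate)

    # Keep only countries with enough creators, and remember n for labeling.
    ranked = [
        (country, median(rates), len(rates))
        for country, rates in rates_by_country.items()
        if len(rates) >= min_creators
    ]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked[:top_n]


# Hypothetical (country, engagement rate) pairs, not values from the dataset.
records = [
    ("A", 0.1), ("A", 0.2), ("A", 0.3),
    ("B", 0.05), ("B", 0.4), ("B", 9.0),
    ("C", 5.0),
]
ranking = rank_countries(records, min_creators=3)
print(ranking)  # [('B', 0.4, 3), ('A', 0.2, 3)]
```

Country B contains an extreme value of 9.0, yet its median is only 0.4, and country C is dropped because a single creator is not enough to rank. Both effects are the reason for combining a median with a minimum sample size.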

This chart shows that a country with many creators does not necessarily have the highest engagement. For example, the United States has the most creators in the sample, with 159 creators, but its median engagement rate is not among the top. Some European countries, like Germany and France, report higher median engagement per follower. Several of the top 15 countries have much smaller samples, so their high median engagement rates may be less reliable and more influenced by outliers or sample composition. It is also important to note that the median engagement rates shown here are all very small, all below about 0.5%, so differences should be interpreted as relative rather than large absolute gaps. Overall, the chart shows attention on TikTok is unevenly distributed, varying by follower size, creator behavior, and country. High engagement occurs in places that are neither the largest nor the most recognized creator hubs.

Visualization 8: Creators by Continent across Privacy & Verification

I use two animated bar charts to show how visible the creators are. Each creator is assigned to a continent. North and South America are combined as “Americas” to match the map in the next visualization. The charts display the number of accounts by continent and switch between private and public, and between verified and unverified. The x-axis shows continents, and the y-axis shows the number of creators. Because the axes stay fixed while the bars animate, viewers can compare privacy and verification patterns without having to adjust to new axes each time.

These animations show that attention depends on creator visibility, platform status, and presence. The privacy animation reveals that Asia has the most creators in this dataset. Europe and the Americas follow, while Africa and Oceania have fewer. When the bars change from “Private” to “Public,” the order stays about the same. The bar heights shift, suggesting that discovery chances differ by region. The verification animation shows that verified accounts are rare everywhere. Verified accounts are so rare that the “Verified” bars are nearly flat on this scale, which supports the idea that most TikTok attention in this sample comes from regular, non-verified creators. Attention does not just focus on a small group of official celebrities. Thus, these graphs shift the story from asking “where are creators?” to considering “who is actually competing for attention?”

Visualization 9: Global TikTok Creator Map

Open the Creator by Country Shiny app: https://kaonichiwa.shinyapps.io/project_shiny2/

So far, I have focused on creators as individuals or grouped by audience size. The final Shiny app brings these ideas together by showing how they play out across the world. The map lets users color countries based on a chosen metric. For example, users can choose the number of creators, total followers, total likes, or average engagement. A single slider sets the minimum follower count to define who is a “creator” on the map. As users adjust the slider, the map shifts from highlighting casual posters to spotlighting high-impact influencers. Country statistics change in real time. The comparison panel and bar chart help make this regional story clear. For the selected countries, it shows how much global attention they receive based on the chosen metric.
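The slider and the metric selector can be sketched in Python as a filter followed by a per-country aggregation. This is not the Shiny app's code: the function names, the region codes, and the creator records are hypothetical, and I assume that "how much global attention" means each country's share of the total for the chosen metric.

```python
from collections import defaultdict


def country_stats(creators, min_followers):
    """Aggregate creators, followers, and likes per region above a follower threshold."""
    stats = defaultdict(lambda: {"creators": 0, "followers": 0, "likes": 0})
    for creator in creators:
        if creator["followers"] < min_followers:
            continue  # below the slider value, so not counted as a "creator"
        row = stats[creator["region"]]
        row["creators"] += 1
        row["followers"] += creator["followers"]
        row["likes"] += creator["likes"]
    return dict(stats)


def share_of_total(stats, metric):
    """Return each region's share of the global total for one metric."""
    total = sum(row[metric] for row in stats.values())
    if total == 0:
        return {}
    return {region: row[metric] / total for region, row in stats.items()}


# Hypothetical accounts, not values from the dataset.
creators = [
    {"region": "US", "followers": 500, "likes": 2_000},
    {"region": "US", "followers": 250_000, "likes": 900_000},
    {"region": "ID", "followers": 800, "likes": 5_000},
    {"region": "ID", "followers": 1_200, "likes": 7_000},
    {"region": "ID", "followers": 3_000, "likes": 10_000},
]

everyone = country_stats(creators, min_followers=0)
large_only = country_stats(creators, min_followers=100_000)
print(share_of_total(everyone, "creators"))
print(share_of_total(large_only, "followers"))
```

With no threshold, the region with three small accounts holds the larger share of creators. Once the threshold rises to 100,000 followers, only the region with the single large account remains on the map.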

The app shows that location matters, but not just because more creators mean more influence. When we increase the follower threshold or change the metric, some regions with many creators become less important. Others, with fewer but larger accounts, stand out in terms of followers or likes. The log-style color scale helps show both the high and low extremes. This makes it clear that attention is uneven across countries. This map brings the story together. Follower counts, engagement, and geography all shape who gets noticed on TikTok. The biggest or most common accounts are not always the most influential.

Conclusion

The project demonstrates that a higher follower count does not guarantee higher engagement per follower (responsiveness). On TikTok, attention is less about how many people follow you and more about how many of them actually respond. The histogram and donut chart show that most accounts in this dataset are micro- or mid-tier creators. The scatterplot and density plot show that as follower counts rise, engagement rates typically fall and become less variable. High engagement is most common among micro- and mid-tier creators. The findings reveal a fundamental trade-off: greater reach often comes at the cost of less engaged audiences.

The trade-off is then explained by focusing on creator behavior and engagement style rather than just audience size. The Engagement Drivers Explorer Shiny app uses an interactive heatmap to group creators by how often they post (“videos_count”) and by engagement style—like-led, balanced, or comment-led—based on the mix of likes and comments. Clicking a cell reveals a segment snapshot, comparative group features (privacy, verification, predicted language), and a creator table. This helps users see how different groups gain engagement. The heatmap shows that among like-led creators, those posting less often have much higher median engagement than frequent posters, even though the latter usually have more followers. This supports the finding that influence is not only about follower count; it also depends on creators’ actions and audience responses.

The last four visualizations illustrate how geography and visibility shape attention in the creator ecosystem. The treemap shows that most creators cluster in a few countries, while many regions host only a handful of accounts, creating an uneven landscape before engagement is even considered. The lollipop chart ranks countries by median engagement per follower, demonstrating that having more creators does not always lead to stronger engagement. The animated continent charts compare private versus public and verified versus non-verified accounts across continents, highlighting how platform status affects account discoverability. The global map Shiny app ties these patterns together, allowing users to switch between metrics and set a follower threshold. This exposes how regional prominence shifts depending on whether attention is measured by presence, reach, or responsiveness. Altogether, these visuals reveal that a greater number of creators in a country does not always translate to greater influence. Instead, attention is influenced by geography, engagement patterns, and account visibility on the platform.

Overall, these findings challenge the idea that influence is only about follower count. For brands, they suggest working with engaged micro- and mid-tier creators, especially where they are most active and visible, may be more effective than targeting only the biggest accounts. For creators, the data shows that attention depends on posting, interaction, and visibility settings, not just audience size. With more data and clearer signals over time, this approach could track how attention shifts, which communities grow fastest, and how engagement styles differ across cultures and locations.

Sources