Earnings Per Patron Among Patreon Projects Charging by the Month

Introduction

My goal in this analysis is to explore earnings per patron for Patreon projects charging by the month. This is a companion to my analyses of the distribution of earnings among Patreon projects and the distribution of number of patrons among Patreon projects.

I am particularly interested in the question of whether earnings per patron differs significantly between the top-ranked Patreon projects and the rest of the projects.

Readers not familiar with the R statistical software, or with the additional Tidyverse software I use to manipulate and plot data, can check out the various ways to learn more about the Tidyverse.

Setup

I load the following R libraries, for the purposes listed:

tidyverse. Do general data manipulation and plotting.
tools. Compute MD5 checksums.

library("tidyverse")
library("tools")

Preparing the data

Obtaining the Patreon data

I use a local copy of the Graphtreon-collected Patreon data for December 2022. This dataset contains an entry for every Patreon project for which the number of patrons is publicly reported.

Because the Graphtreon data is proprietary, I store it in a separate directory and do not make it available as part of this analysis. See the “References” section below for more information.

I check the MD5 hash value for the file, and stop if the contents are not what is expected.

stopifnot(md5sum("../../graphtreon/graphtreonBasicExport_Dec2022.csv") == "98ff63f7d6aa3f2d1b2acaf40425ac9b")
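
The same kind of check can be tried out without access to the Graphtreon file. The following is a minimal sketch that assumes only the tools package; the temporary file and its contents are made up for illustration and have nothing to do with the real export.

```r
library(tools)

# Write a small throwaway file; its name and contents are invented for this sketch.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Patrons", "Example,10"), tmp)

# md5sum() returns a named character vector: names are paths, values are hashes.
expected <- unname(md5sum(tmp))

# Same pattern as the check above: stop unless the hash matches the recorded value.
stopifnot(md5sum(tmp) == expected)

# Once the file contents change, the hash no longer matches.
writeLines(c("Name,Patrons", "Example,11"), tmp)
hash_changed <- unname(md5sum(tmp)) != expected
```

Recording the hash alongside the analysis means that a silently replaced or re-exported file is caught before any results are computed from it.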

Loading the Patreon data

I load the raw Patreon data from Graphtreon:

patreon_tb <- read_csv("../../graphtreon/graphtreonBasicExport_Dec2022.csv")

## Rows: 217861 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): Name, Creation Name, Category, Pay Per, Patreon, Graphtreon
## dbl  (4): Patrons, Earnings, Is Nsfw, Twitter Followers
## dttm (1): Launched
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Analysis

Preliminary analysis

I do some basic exploratory data analysis, starting with the total amount of data in the dataset.

total_projects <- length(patreon_tb$Patrons)

Then I add a new column Patrons_Rank to allow analysis based on a project’s rank by its number of patrons.

patreon_tb <- patreon_tb %>%
  arrange(desc(Patrons)) %>%
  mutate(Patrons_Rank = row_number())

One question that came up in previously looking at the data was whether top-ranked projects were less likely to report earnings than lower-ranked projects. To gauge the extent to which this is true, I group all projects into batches of 1,000 projects each, then look at what fraction of the projects in each batch do not report their earnings at all.

patreon_tb %>%
  mutate(Patrons_Rank_Group = ceiling(Patrons_Rank / 1000)) %>%
  select(Patrons_Rank_Group, Earnings) %>%
  group_by(Patrons_Rank_Group) %>%
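  # mean() of a logical vector is the fraction of TRUE values; unlike dividing
  # by 1000, it stays correct for a final batch with fewer than 1,000 projects
  summarize(Not_Reporting = mean(is.na(Earnings))) %>%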
  ggplot(aes(x = Patrons_Rank_Group, y = Not_Reporting)) +
  geom_point() +
  scale_y_continuous(breaks = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8), limits = c(0, 0.8)) +
  xlab("Rank by Number of Patrons (000)") +
  ylab("Fraction Not Reporting Earnings") +
  labs(
    title = "Patreon Projects Not Reporting Earnings",
    subtitle = "Fraction Not Reporting Earnings, By Rank",
    caption = "Data source: Graphtreon Basic CSV Export, December 2022"
  ) +
  theme_gray() +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

About three quarters of the top-ranked projects (by number of patrons) do not report their earnings, while just over a quarter of the lowest-ranked projects do not report earnings.

Given the increased absence of earnings data for the top-ranked projects, it’s not clear whether the top-ranked projects that do report earnings are truly representative or not. Unfortunately I can’t do much about that lack of data.
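
One detail of the batching is worth spelling out: the total number of projects (217,861) is not a multiple of 1,000, so the last batch is smaller than the others, and a fraction computed with a fixed denominator will be too low for that batch. The following sketch uses invented values and a batch size of 4 rather than 1,000 to show the difference between taking the mean of `is.na()` within each batch and dividing a count by the nominal batch size.

```r
# Toy data: 10 projects already sorted by number of patrons.
# NA means earnings are not reported. All values are invented.
earnings <- c(NA, NA, 50, NA, 20, NA, 10, 5, NA, 1)
batch_size <- 4

rank <- seq_along(earnings)
rank_group <- ceiling(rank / batch_size)  # 1 1 1 1 2 2 2 2 3 3

# Fraction not reporting, using the actual size of each batch.
frac_by_mean <- tapply(is.na(earnings), rank_group, mean)

# Fraction computed with a fixed denominator; too low for the short last batch.
frac_by_fixed <- tapply(is.na(earnings), rank_group, sum) / batch_size
```

In this toy example the two approaches agree for the two full batches and differ only for the third, which holds two projects instead of four.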

Projects reporting earnings

I now look at the projects that do report earnings. (This repeats some calculations from the earlier analysis.)

no_reported_earnings <- patreon_tb %>%
  filter(is.na(Earnings)) %>%
  summarize(n()) %>%
  as.integer()

reported_earnings <- total_projects - no_reported_earnings

zero_earnings <- patreon_tb %>%
  filter(!is.na(Earnings) & Earnings <= 0) %>%
  summarize(n()) %>%
  as.integer()

nonzero_earnings <- reported_earnings - zero_earnings

nonzero_nonmonthly_earnings <- patreon_tb %>%
  filter(!is.na(Earnings) & Earnings > 0) %>%
  filter(is.na(`Pay Per`) | `Pay Per` != "month") %>%
  summarize(n()) %>%
  as.integer()

nonzero_monthly_earnings <- nonzero_earnings - nonzero_nonmonthly_earnings

For the month in question there were a total of 217,861 Patreon projects in the Graphtreon dataset, of which 83,294 did not make their earnings public. This reduces the potential sample size to at most 134,567 projects.

There were only 265 projects that reported zero earnings (as opposed to not publicly reporting earnings at all). Given the relatively small size of this group, I ignore it in the analysis. (This also simplifies doing log-log plots, as discussed below.)

There were only 5,369 projects that reported nonzero earnings and did not charge by the month. Again, given the relatively small size of this group, I ignore it as well.

Projects with earnings from monthly charges

I now construct a sample dataset consisting of all projects reporting nonzero earnings from monthly charges for the month in question, ranked by the amount of earnings, from greatest to least.

by_earnings_tb <- patreon_tb %>%
  filter(!is.na(Earnings) & Earnings > 0) %>%
  filter(!is.na(`Pay Per`) & `Pay Per` == "month") %>%
  arrange(desc(Earnings)) %>%
  mutate(Earnings_Rank = row_number())

This sample dataset contains a total of 128,933 projects, representing 59% of all projects in the Graphtreon dataset.
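
As a quick consistency check, the counts quoted above fit together as follows. The numbers in this sketch are copied from the text rather than recomputed from the Graphtreon data.

```r
# Counts as quoted in the text (not recomputed from the data).
total_projects <- 217861
no_reported_earnings <- 83294
zero_earnings <- 265
nonzero_nonmonthly_earnings <- 5369

# Each step removes one group from the potential sample.
reported_earnings <- total_projects - no_reported_earnings
nonzero_earnings <- reported_earnings - zero_earnings
nonzero_monthly_earnings <- nonzero_earnings - nonzero_nonmonthly_earnings

# Share of all projects that end up in the sample, as a whole percentage.
sample_pct <- round(100 * nonzero_monthly_earnings / total_projects)
```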

How are earnings and number of patrons related?

One obvious way for a Patreon project to have higher earnings is to have more patrons. But that’s not the only way; in particular, a project could increase the amount of money it gets from each patron, for example, because relatively more patrons are in higher membership tiers. Which factor is more important for the Patreon projects in my sample dataset?

To investigate this, I start by calculating the correlation coefficient between the number of patrons and earnings across all the projects with nonzero earnings from monthly charges for the month in question. (More specifically, this is Pearson’s $r$, where I consider the number of patrons to be the independent variable and earnings to be the dependent variable.)

cor_p_e <- cor(by_earnings_tb$Patrons, by_earnings_tb$Earnings)

The resulting correlation coefficient $r$ is 0.86. This is a positive number, since in general projects with more patrons have higher earnings, and it is a reasonably strong correlation but not perfect. (If earnings were perfectly correlated with the number of patrons then the correlation coefficient would be 1.) Looking at $r^2$ instead, I note that about 74% of the variance in earnings is explained by the variance in the number of patrons.
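
To make the relationship between $r$ and $r^2$ concrete, here is a small sketch on invented data (not the Graphtreon data). It relies on the standard fact that for a simple linear regression with an intercept, the model's R-squared equals the square of Pearson's $r$; the variable names are hypothetical.

```r
# Invented numbers of patrons and monthly earnings for six projects.
patrons  <- c(1, 5, 10, 50, 100, 500)
earnings <- c(12, 20, 35, 260, 380, 2100)

# Pearson's r and its square.
r <- cor(patrons, earnings)
r_squared <- r^2

# R-squared from a simple linear regression with an intercept.
r2_lm <- summary(lm(earnings ~ patrons))$r.squared

# If earnings were an exact multiple of patrons, r would be 1.
r_perfect <- cor(patrons, 4 * patrons)

# The figure quoted in the text: 0.86 squared, rounded to two places.
r2_quoted <- round(0.86^2, 2)
```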

To get a better feel for how the amount of earnings varies according to the number of patrons, I plot each project’s earnings vs. its number of patrons:

by_earnings_tb %>%
  ggplot(mapping=aes(x = Patrons, y = Earnings)) +
  geom_point(alpha = 0.1) +
  coord_trans(x = "log10", y = "log10") +
  scale_x_continuous(breaks = c(1, 2, 5, 10, 25, 100, 250, 1000, 2500, 10000, 25000, 50000), labels = scales::label_comma()) +
  scale_y_continuous(breaks = c(1, 10, 100, 1000, 10000, 100000), labels = scales::label_dollar()) +
  xlab("Number of Patrons") +
  ylab("Earnings") +
  labs(
    title = "Earnings vs. Number of Patrons (Log-Log)",
    subtitle = "All Projects with Nonzero Earnings from Monthly Charges",
    caption = "Data source: Graphtreon Basic CSV Export, December 2022"
  ) +
  theme_gray() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

This plot shows two things: First, there are projects with the same number of patrons that have wildly different earnings. For example, consider the dark vertical line at the lower left, which represents projects that have only a single patron. There are some such projects that have less than $1 a month in earnings and others that have more than $100, with one earning over $1,000.

Similarly, projects with very similar earnings can realize those earnings from patron bases that are wildly different in size. For example, looking at projects around the $1,000 mark in earnings, there are projects that rely on fewer than ten patrons to earn that much, and others that require over a thousand patrons to realize the same level of earnings.

I fit a simple linear model to predict earnings for a given number of patrons. I constrain the model to have a $y$-intercept of zero because a project with no patrons will obviously have no earnings.

e_m <- lm(Earnings ~ 0 + Patrons, data = by_earnings_tb)

The model predicts that the monthly earnings will be approximately $3.98 per patron. This is shown in the following graph, which adds predicted earnings to the previous graph.

by_earnings_tb %>%
  mutate(Predicted_Earnings = e_m$coefficients[1] * Patrons) %>%
  ggplot(mapping=aes(x = Patrons, y = Earnings)) +
  geom_point(alpha = 0.1) +
  geom_line(mapping=aes(y = Predicted_Earnings), color = "#009E73") +
  coord_trans(x = "log10", y = "log10") +
  scale_x_continuous(breaks = c(1, 2, 5, 10, 25, 100, 250, 1000, 2500, 10000, 25000, 50000), labels = scales::label_comma()) +
  scale_y_continuous(breaks = c(1, 10, 100, 1000, 10000, 100000), labels = scales::label_dollar()) +
  xlab("Number of Patrons") +
  ylab("Earnings") +
  labs(
    title = "Predicted and Actual Earnings vs. Patrons (Log-Log)",
    subtitle = "All Projects with Nonzero Earnings from Monthly Charges",
    caption = "Data source: Graphtreon Basic CSV Export, December 2022"
  ) +
  theme_gray() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))
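
For a model with no intercept, the least-squares slope has a simple closed form, $\sum x_i y_i / \sum x_i^2$. This is the same as a weighted mean of each project's earnings divided by its number of patrons, with weights proportional to the square of the number of patrons. The following sketch uses invented numbers, not the Graphtreon data, to illustrate this; it suggests why the fitted slope is driven mostly by the projects with the most patrons.

```r
# Invented data: three small projects and one very large one.
patrons  <- c(1, 2, 10, 1000)
earnings <- c(20, 30, 80, 3000)

# Slope from a linear model constrained to pass through the origin.
slope_lm <- unname(coef(lm(earnings ~ 0 + patrons))[1])

# Closed-form least-squares slope for a zero-intercept model.
slope_closed <- sum(patrons * earnings) / sum(patrons^2)

# The same slope as a weighted mean of earnings per patron,
# with weights equal to patrons squared.
per_patron <- earnings / patrons
slope_weighted <- weighted.mean(per_patron, w = patrons^2)

# The median earnings per patron treats every project equally.
median_per_patron <- median(per_patron)
```

In this toy example the slope is close to the large project's $3 per patron, while the median across projects is $11.50, so a typical project can earn noticeably more per patron than the slope suggests.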

How are the number of patrons and earnings per patron related?

I next take the set of all projects with nonzero earnings from monthly charges for the month in question, and add a new column showing earnings per patron. I then calculate some basic statistics for that measure.

by_earnings_tb <- by_earnings_tb %>%
  mutate(EPP = Earnings / Patrons)

max_epp <- max(by_earnings_tb$EPP)
min_epp <- min(by_earnings_tb$EPP)
mean_epp <- mean(by_earnings_tb$EPP)
sd_epp <- sd(by_earnings_tb$EPP)
median_epp <- median(by_earnings_tb$EPP)

For the month in question, the earnings per patron from monthly charges ranged from a minimum of $0.002 per patron to a maximum of $1,197.04; the mean (average) earnings per patron was $6.83 (with a standard deviation of $12.01), and the median earnings per patron was $4.50. (Note that even the median value is higher than the predicted value of $3.98 from the linear model.)

I now plot the earnings per patron vs. the number of patrons.

by_earnings_tb %>%
  ggplot(mapping=aes(x = Patrons, y = EPP)) +
  geom_point(alpha = 0.1) +
  scale_x_continuous(breaks = c(5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000), labels = scales::label_comma()) +
  scale_y_continuous(labels = scales::label_dollar()) +
  xlab("Number of Patrons") +
  ylab("Earnings Per Patron") +
  labs(
    title = "Earnings Per Patron vs. Number of Patrons",
    subtitle = "All Projects with Nonzero Earnings from Monthly Charges",
    caption = "Data source: Graphtreon Basic CSV Export, December 2022"
  ) +
  theme_gray() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

It looks as if most of the variability in earnings per patron occurs in projects with a relatively low number of patrons. I’ll check that by regraphing the $x$ axis on a log scale.

by_earnings_tb %>%
  ggplot(mapping=aes(x = Patrons, y = EPP)) +
  geom_point(alpha = 0.1) +
  coord_trans(x = "log10") +
  scale_x_continuous(breaks = c(1, 2, 5, 10, 25, 100, 250, 1000, 2500, 10000, 25000, 100000, 200000), labels = scales::label_comma()) +
  xlab("Number of Patrons") +
  ylab("Earnings Per Patron") +
  labs(
    title = "Earnings Per Patron vs. Number of Patrons (Log Scale)",
    subtitle = "All Projects with Nonzero Earnings from Monthly Charges",
    caption = "Data source: Graphtreon Basic CSV Export, December 2022"
  ) +
  theme_gray() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

I compute the correlation between the number of patrons and the earnings per patron.

cor_p_epp <- cor(by_earnings_tb$Patrons, by_earnings_tb$EPP)

The resulting correlation coefficient $r$ is -0.01.

This is a negative number, which would indicate that projects with more patrons tend to have slightly lower earnings per patron, but the value is so close to zero that there is essentially no linear correlation between these variables. (If there were no correlation at all then the correlation coefficient would be 0.)
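
As a reference for reading values of $r$, the following sketch shows the three benchmark cases on invented data: a perfect positive linear relationship, a perfect negative one, and a case with no linear relationship. Note that Pearson's $r$ measures only linear association, so a value near zero does not by itself rule out some other kind of relationship.

```r
# Invented values, symmetric around zero.
x <- c(-2, -1, 0, 1, 2)

# Perfect positive and perfect negative linear relationships.
r_positive <- cor(x, 3 * x + 1)
r_negative <- cor(x, -x)

# A purely quadratic relationship: y depends on x, but not linearly,
# and for this symmetric x the correlation comes out as zero.
r_none <- cor(x, x^2)
```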

Finally, as I did in a previous discussion, I plot a histogram of earnings per patron to see its distribution. Only 694 projects had over $50 in earnings per patron, so I don’t bother extending the histogram beyond that point. The solid orange line shows the mean earnings per patron, and the dashed green line the median.

by_earnings_tb %>%
  filter(EPP <= 50) %>%
  ggplot(mapping=aes(x = EPP)) +
  geom_histogram(binwidth = 1) +
  geom_vline(xintercept = mean_epp, color = "#E69F00") +
  geom_vline(xintercept = median_epp, color = "#009E73", linetype = "dashed") +
  scale_x_continuous(labels = scales::label_dollar()) +
  scale_y_continuous(labels = scales::label_comma()) +
  xlab("Earnings Per Patron") +
  ylab("Number of Projects") +
  labs(
    title = "Distribution of Earnings Per Patron",
    subtitle = "All Projects with Nonzero Earnings from Monthly Charges",
    caption = "Data source: Graphtreon Basic CSV Export, December 2022"
  ) +
  theme_gray() +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

From the graph it’s apparent that most projects earn $10 or less per patron; only 19,784 projects (15% of all projects with nonzero earnings from monthly charges) earned more than $10 per patron, and only 5,340 projects (4%) earned more than $20 per patron.
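
Counts and shares like these can be computed directly from a logical vector: `sum()` gives the number of projects above a threshold and `mean()` gives the fraction. The sketch below uses made-up earnings-per-patron values, then checks the percentages quoted above using the counts from the text.

```r
# Invented earnings-per-patron values for ten projects.
epp <- c(1.5, 3, 4.5, 6, 8, 9.5, 12, 18, 25, 60)

# Number and fraction of projects above each threshold.
over_10_count <- sum(epp > 10)
over_10_share <- mean(epp > 10)
over_20_count <- sum(epp > 20)

# Percentages quoted in the text, from the counts given there.
pct_over_10 <- round(100 * 19784 / 128933)
pct_over_20 <- round(100 * 5340 / 128933)
```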

Kevin Kelly’s “1,000 True Fans”

The technology pundit Kevin Kelly once claimed that the secret to success as a creator in the Internet age was to have “1,000 true fans”:

To be a successful creator you don’t need … millions of dollars or millions of customers, millions of clients or millions of fans. To make a living as a craftsperson, photographer, musician, designer, author, animator, app maker, entrepreneur, or inventor you need only thousands of true fans.

A true fan is defined as a fan that will buy anything you produce. …

If you keep the full $100 of each true fan, then you need only 1,000 of them to earn $100,000 per year. That’s a living for most folks. …

Kelly’s blog post spawned lots of follow-on blog posts, podcasts, YouTube videos, and even books. But in all the breathless promotion of the idea it’s not clear if anyone thought to actually test Kelly’s claim that “1,000 true fans is an alternative path to success other than stardom. … It’s a much saner destiny to hope for. And you are much more likely to actually arrive there.”

So let’s test it in the context of Patreon. As Kelly notes, it’s not enough just to have 1,000 fans; each of them has to provide you $100 in profit per year. How many projects meet this criterion?

true_fan_projects <- by_earnings_tb %>%
  filter(Patrons >= 1000 & EPP >= (100. / 12)) %>%
  summarize(n = n()) %>%
  as.integer()

There are only 31 projects meeting this criterion (about 0.02% of all projects with nonzero monthly earnings), and fewer than that if we consider that Kelly was discussing $100 in profit per fan, not $100 in revenue (which is what Patreon earnings are equivalent to).

Let’s change the criterion a bit. As Kelly notes, “If you are able to only earn $50 per year per true fan, then you need 2,000. (Likewise if you can sell $200 per year, you need only 500 true fans.)” So let’s count the number of projects that are earning the equivalent of $100,000 or more a year:

true_fan_projects_2 <- by_earnings_tb %>%
  filter(Earnings >= (100000. / 12)) %>%
  summarize(n = n()) %>%
  as.integer()

This improves the picture somewhat. There are 238 projects meeting this criterion, about 0.18% of all projects with nonzero monthly earnings (but, again, this doesn’t account for any expenses incurred in the course of maintaining a Patreon project).
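
To make the two criteria concrete, here is a sketch that applies both to a handful of invented projects. The data frame and its column names are hypothetical; the monthly thresholds are the annual figures divided by 12, as in the code above, and the final two lines recompute the quoted percentages from the counts given in the text.

```r
# Five invented projects with monthly earnings in dollars.
projects <- data.frame(
  patrons  = c(1200, 3000, 400, 1500, 50),
  earnings = c(12000, 9000, 8500, 9000, 600)
)
projects$epp <- projects$earnings / projects$patrons

# Annual targets converted to monthly thresholds.
per_fan_monthly <- 100 / 12      # about $8.33 per patron per month
income_monthly  <- 100000 / 12   # about $8,333 per month

# Strict criterion: at least 1,000 patrons, each worth $100 or more a year.
strict_count <- sum(projects$patrons >= 1000 & projects$epp >= per_fan_monthly)

# Relaxed criterion: $100,000 or more a year, however it is reached.
relaxed_count <- sum(projects$earnings >= income_monthly)

# Percentages quoted in the text, from the counts given there.
pct_strict  <- round(100 * 31 / 128933, 2)
pct_relaxed <- round(100 * 238 / 128933, 2)
```

In the toy data only one project passes the strict test, while four pass the relaxed one: a project can reach the income target with many low-paying patrons or with a few hundred high-paying ones.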

Does this substantiate or refute Kelly’s claim that you are “much more likely” to “make a living as a craftsperson, photographer, musician, designer, author, animator, app maker, entrepreneur, or inventor” pursuing 1,000 true fans as opposed to pursuing more traditional approaches? Unfortunately, it’s very difficult to find hard numbers on how much success people achieve pursuing fame and fortune in the “creator economy.” But for now at least I remain a skeptic.

Appendix

Caveats

This analysis is subject to the following caveats, among others:

The Graphtreon dataset does not contain Patreon projects that do not publicly report their number of patrons. If the likelihood of a project doing this is not uniform across all projects, this may skew the results, since the dataset would not necessarily be a representative sample of all Patreon projects.
As noted above, the Graphtreon dataset contains many projects that do not publicly report earnings. This is more common among projects with the most patrons, and may skew the results for the highest-ranked projects by earnings.

References

Patreon project data was obtained from Graphtreon LLC as a basic CSV export for the month of December 2022, https://graphtreon.com/data-services.

Environment

I used the following R environment in doing the analysis above:

sessionInfo()

## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] tools     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] forcats_0.5.2   stringr_1.5.0   dplyr_1.0.10    purrr_1.0.1    
## [5] readr_2.1.3     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.4.0  
## [9] tidyverse_1.3.2
## 
## loaded via a namespace (and not attached):
##  [1] lubridate_1.9.0     assertthat_0.2.1    digest_0.6.31      
##  [4] utf8_1.2.2          R6_2.5.1            cellranger_1.1.0   
##  [7] backports_1.4.1     reprex_2.0.2        evaluate_0.19      
## [10] highr_0.10          httr_1.4.4          pillar_1.8.1       
## [13] rlang_1.0.6         googlesheets4_1.0.1 readxl_1.4.1       
## [16] rstudioapi_0.14     jquerylib_0.1.4     rmarkdown_2.19     
## [19] labeling_0.4.2      googledrive_2.0.0   bit_4.0.5          
## [22] munsell_0.5.0       broom_1.0.2         compiler_4.2.1     
## [25] modelr_0.1.10       xfun_0.36           pkgconfig_2.0.3    
## [28] htmltools_0.5.4     tidyselect_1.2.0    fansi_1.0.3        
## [31] crayon_1.5.2        tzdb_0.3.0          dbplyr_2.2.1       
## [34] withr_2.5.0         grid_4.2.1          jsonlite_1.8.4     
## [37] gtable_0.3.1        lifecycle_1.0.3     DBI_1.1.3          
## [40] magrittr_2.0.3      scales_1.2.1        cli_3.6.0          
## [43] stringi_1.7.12      vroom_1.6.0         cachem_1.0.6       
## [46] farver_2.1.1        fs_1.5.2            xml2_1.3.3         
## [49] bslib_0.4.2         ellipsis_0.3.2      generics_0.1.3     
## [52] vctrs_0.5.1         bit64_4.0.5         glue_1.6.2         
## [55] hms_1.1.2           parallel_4.2.1      fastmap_1.1.0      
## [58] yaml_2.3.6          timechange_0.2.0    colorspace_2.0-3   
## [61] gargle_1.2.1        rvest_1.0.3         knitr_1.41         
## [64] haven_2.5.1         sass_0.4.4

Source code

The source code for this analysis can be found in the public code repository https://gitlab.com/frankhecker/misc-analysis in the patreon subdirectory.

This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.