Introduction

My goal in this analysis is to explore earnings per patron for Patreon projects charging by the month. This is a companion to my analyses of the distribution of earnings among Patreon projects and the distribution of number of patrons among Patreon projects.

I am particularly interested in the question of whether earnings per patron differs significantly between the top-ranked Patreon projects and the rest of the projects.

Readers not familiar with the R statistical software, or with the additional Tidyverse software I use to manipulate and plot data, may want to check out the various ways to learn more about the Tidyverse.

Setup

I load the following R libraries, for the purposes listed:

library("tidyverse")  # data manipulation (dplyr, readr) and plotting (ggplot2)
library("tools")      # md5sum(), used to verify the data file

Preparing the data

Obtaining the Patreon data

I use a local copy of the Graphtreon-collected Patreon data for December 2022. This dataset contains an entry for every Patreon project for which the number of patrons is publicly reported.

Because the Graphtreon data is proprietary, I store it in a separate directory and do not make it available as part of this analysis. See the “References” section below for more information.

I check the MD5 hash value for the file, and stop if the contents are not what is expected.

stopifnot(md5sum("../../graphtreon/graphtreonBasicExport_Dec2022.csv") == "98ff63f7d6aa3f2d1b2acaf40425ac9b")
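When a bare stopifnot() check fails, it does not say which hash was actually found. As a minimal sketch, the same check could be wrapped in a small helper that reports both values. The helper name check_md5 is hypothetical and not part of the original analysis, and the demonstration uses a throwaway temporary file rather than the Graphtreon export.

```r
library(tools)

# Hypothetical helper (not part of the original analysis): stop with an
# informative message if a file's MD5 hash differs from the expected value.
check_md5 <- function(path, expected) {
  actual <- unname(md5sum(path))
  if (is.na(actual)) {
    # md5sum() returns NA for files that are missing or unreadable
    stop("File not found or unreadable: ", path)
  }
  if (actual != expected) {
    stop("MD5 mismatch for ", path, ": expected ", expected, ", got ", actual)
  }
  invisible(TRUE)
}

# Demonstration on a temporary file, not the proprietary dataset.
demo_file <- tempfile(fileext = ".csv")
writeLines(c("Name,Patrons", "Example,10"), demo_file)
demo_hash <- unname(md5sum(demo_file))

check_md5(demo_file, demo_hash)  # passes silently
```

Calling check_md5() with a hash that does not match stops with a message showing both the expected and the actual value.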

Loading the Patreon data

I load the raw Patreon data from Graphtreon:

patreon_tb <- read_csv("../../graphtreon/graphtreonBasicExport_Dec2022.csv")
## Rows: 217861 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): Name, Creation Name, Category, Pay Per, Patreon, Graphtreon
## dbl  (4): Patrons, Earnings, Is Nsfw, Twitter Followers
## dttm (1): Launched
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Analysis

Preliminary analysis

I do some basic exploratory data analysis, starting with the total amount of data in the dataset.

total_projects <- length(patreon_tb$Patrons)

Then I add a new column Patrons_Rank to allow analysis based on a project’s rank by its number of patrons.

patreon_tb <- patreon_tb %>%
  arrange(desc(Patrons)) %>%
  mutate(Patrons_Rank = row_number())

One question that came up when I previously looked at the data was whether top-ranked projects were less likely to report earnings than lower-ranked projects. To gauge the extent to which this is true, I group all projects into batches of 1,000 projects each, then look at what fraction of the projects in each batch do not report their earnings at all.

patreon_tb %>%
  mutate(Patrons_Rank_Group = ceiling(Patrons_Rank / 1000)) %>%
  select(Patrons_Rank_Group, Earnings) %>%
  group_by(Patrons_Rank_Group) %>%
  summarize(Not_Reporting = mean(is.na(Earnings))) %>%  # mean() also handles the final batch of fewer than 1,000
  ggplot(aes(x = Patrons_Rank_Group, y = Not_Reporting)) +
  geom_point() +
  scale_y_continuous(breaks = c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8), limits = c(0, 0.8)) +
  xlab("Rank by Number of Patrons (000)") +
  ylab("Fraction Not Reporting Earnings") +
  labs(
    title = "Patreon Projects Not Reporting Earnings",
    subtitle = "Fraction Not Reporting Earnings, By Rank",
    caption = "Data source: Graphtreon Basic CSV Export, December 2022"
  ) +
  theme_gray() +
  theme(axis.title.x = element_text(margin = margin(t = 5))) +
  theme(axis.title.y = element_text(margin = margin(r = 10))) +
  theme(plot.caption = element_text(margin = margin(t = 15), hjust = 0))

About three quarters of the top-ranked projects (by number of patrons) do not report their earnings, while just over a quarter of the lowest-ranked projects do not report earnings.

Given the increased absence of earnings data for the top-ranked projects, it’s not clear whether the top-ranked projects that do report earnings are truly representative or not. Unfortunately I can’t do much about that lack of data.
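One detail of the batching is worth illustrating. Because 217,861 is not a multiple of 1,000, the final batch holds fewer than 1,000 projects, so the fraction not reporting is safest to compute relative to the actual size of each batch. The following is a minimal sketch on made-up data, written in base R so that it runs without the Tidyverse; the batch size of 4 and the toy earnings values are assumptions for illustration only.

```r
# Toy data (made up): 10 projects already sorted by rank.
# NA means earnings are not reported.
earnings <- c(NA, NA, 500, NA, 120, 80, NA, 40, 10, NA)
rank <- seq_along(earnings)

# Same batching idea as in the analysis, with a batch size of 4
# instead of 1,000.
batch_size <- 4
batch <- ceiling(rank / batch_size)

# Fraction not reporting in each batch, relative to the actual batch size.
frac_by_mean <- tapply(is.na(earnings), batch, mean)

# Dividing by the nominal batch size understates the last, partial batch.
frac_by_nominal <- tapply(is.na(earnings), batch, sum) / batch_size

print(frac_by_mean)
print(frac_by_nominal)
```

The two results agree for the full batches and differ only for the final batch, which here contains two projects rather than four.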

Projects reporting earnings

I now look at the projects that do report earnings. (This repeats some calculations from the earlier analysis.)

no_reported_earnings <- patreon_tb %>%
  filter(is.na(Earnings)) %>%
  summarize(n()) %>%
  as.integer()

reported_earnings <- total_projects - no_reported_earnings

zero_earnings <- patreon_tb %>%
  filter(!is.na(Earnings) & Earnings <= 0) %>%
  summarize(n()) %>%
  as.integer()

nonzero_earnings <- reported_earnings - zero_earnings

nonzero_nonmonthly_earnings <- patreon_tb %>%
  filter(!is.na(Earnings) & Earnings > 0) %>%
  filter(is.na(`Pay Per`) | `Pay Per` != "month") %>%
  summarize(n()) %>%
  as.integer()

nonzero_monthly_earnings <- nonzero_earnings - nonzero_nonmonthly_earnings

For the month in question there were a total of 217,861 Patreon projects in the Graphtreon dataset, of which 83,294 did not make their earnings public. This reduces the potential sample size to at most 134,567 projects.

There were only 265 projects that reported zero earnings (as opposed to not publicly reporting earnings at all). Given the relatively small size of this group, I ignore it in the analysis. (This also simplifies doing log-log plots, as discussed below.)

There were only 5,369 projects that reported nonzero earnings and did not charge by the month. Again, given the relatively small size of this group, I ignore it as well.

Projects with earnings from monthly charges

I now construct a sample dataset consisting of all projects reporting nonzero earnings from monthly charges for the month in question, ranked by the amount of earnings, from greatest to least.

by_earnings_tb <- patreon_tb %>%
  filter(!is.na(Earnings) & Earnings > 0) %>%
  filter(!is.na(`Pay Per`) & `Pay Per` == "month") %>%
  arrange(desc(Earnings)) %>%
  mutate(Earnings_Rank = row_number())

This sample dataset contains a total of 128,933 projects, representing 59% of all projects in the Graphtreon dataset.
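As a quick consistency check, the sample size can be reproduced from the counts quoted in the text. This sketch hard-codes those counts rather than reading the proprietary data, so it only confirms that the reported figures agree with one another.

```r
# Counts as quoted in the text for December 2022.
total_projects <- 217861
no_reported_earnings <- 83294
zero_earnings <- 265
nonzero_nonmonthly_earnings <- 5369

# Successive exclusions, mirroring the calculations in the analysis.
reported_earnings <- total_projects - no_reported_earnings
nonzero_earnings <- reported_earnings - zero_earnings
nonzero_monthly_earnings <- nonzero_earnings - nonzero_nonmonthly_earnings

reported_earnings                                       # 134567
nonzero_monthly_earnings                                # 128933
round(100 * nonzero_monthly_earnings / total_projects)  # 59 (percent)
```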

Kevin Kelly’s “1,000 True Fans”

The technology pundit Kevin Kelly once claimed that the secret to success as a creator in the Internet age was to have “1,000 true fans”:

To be a successful creator you don’t need … millions of dollars or millions of customers, millions of clients or millions of fans. To make a living as a craftsperson, photographer, musician, designer, author, animator, app maker, entrepreneur, or inventor you need only thousands of true fans.

A true fan is defined as a fan that will buy anything you produce. …

If you keep the full $100 of each true fan, then you need only 1,000 of them to earn $100,000 per year. That’s a living for most folks. …

Kelly’s blog post spawned lots of follow-on blog posts, podcasts, YouTube videos, and even books. But in all the breathless promotion of the idea it’s not clear if anyone thought to actually test Kelly’s claim that “1,000 true fans is an alternative path to success other than stardom. … It’s a much saner destiny to hope for. And you are much more likely to actually arrive there.”

So let’s test it in the context of Patreon. As Kelly notes, it’s not enough just to have 1,000 fans; each of them has to provide you $100 in profit per year. How many projects meet this criterion?

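# EPP (monthly earnings per patron) is computed inline as Earnings / Patrons;
# $100 per year per patron corresponds to 100 / 12 dollars per month
true_fan_projects <- by_earnings_tb %>%
  filter(Patrons >= 1000 & Earnings / Patrons >= (100 / 12)) %>%
  summarize(n = n()) %>%
  as.integer()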

There are only 31 projects meeting this criterion (about 0.02% of all projects with nonzero monthly earnings), and fewer than that if we consider that Kelly was discussing $100 in profit per fan, not $100 in revenue (which is what Patreon earnings are equivalent to).
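To make the criterion concrete, here is a minimal sketch applying it to a made-up table of four projects, again in base R. The project names and figures are invented for illustration, and earnings per patron is assumed to be monthly earnings divided by number of patrons.

```r
# Toy data (made up for illustration; not from the Graphtreon dataset).
toy <- data.frame(
  Name     = c("A", "B", "C", "D"),
  Patrons  = c(1500, 1200, 400, 1000),
  Earnings = c(15000, 6000, 8000, 8400),  # monthly earnings in dollars
  stringsAsFactors = FALSE
)

# Monthly earnings per patron, assumed here to be Earnings / Patrons.
toy$EPP <- toy$Earnings / toy$Patrons

# $100 per year per fan is about $8.33 per month.
monthly_threshold <- 100 / 12

# A project qualifies only if it passes both tests.
true_fans <- subset(toy, Patrons >= 1000 & EPP >= monthly_threshold)
print(true_fans$Name)
```

Project C shows why both conditions matter: it earns $20 per patron per month, but with only 400 patrons it does not meet the 1,000-fan test. Project B has enough patrons but earns only $5 per patron per month.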

Let’s change the criterion a bit. As Kelly notes, “If you are able to only earn $50 per year per true fan, then you need 2,000. (Likewise if you can sell $200 per year, you need only 500 true fans.)” So let’s count the number of projects that are earning the equivalent of $100,000 or more a year:

true_fan_projects_2 <- by_earnings_tb %>%
  filter(Earnings >= (100000. / 12)) %>%
  summarize(n = n()) %>%
  as.integer()

This improves the picture somewhat. There are 238 projects meeting this criterion, about 0.18% of all projects with nonzero monthly earnings (but, again, this doesn’t account for any expenses incurred in the course of maintaining a Patreon project).
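The arithmetic behind Kelly’s trade-off and the percentages quoted above can be sketched as follows. The helper name fans_needed is hypothetical, and the project counts are the ones quoted in the text rather than values recomputed from the data.

```r
# Hypothetical helper: number of fans needed to reach a target annual
# income, given the annual amount each fan provides.
fans_needed <- function(annual_per_fan, target = 100000) {
  ceiling(target / annual_per_fan)
}

fans_needed(c(50, 100, 200))   # 2000 1000 500

# Monthly earnings equivalent to $100,000 per year.
100000 / 12                    # about 8333.33

# Shares of the 128,933 projects with nonzero monthly earnings.
round(100 * 31 / 128933, 2)    # 0.02 (percent)
round(100 * 238 / 128933, 2)   # 0.18 (percent)
```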

Does this substantiate or refute Kelly’s claim that you are “much more likely” to “make a living as a craftsperson, photographer, musician, designer, author, animator, app maker, entrepreneur, or inventor” pursuing 1,000 true fans as opposed to pursuing more traditional approaches? Unfortunately, it’s very difficult to find hard numbers on how much success people achieve pursuing fame and fortune in the “creator economy.” But for now at least I remain a skeptic.

Appendix

Caveats

This analysis is subject to the following caveats, among others:

  • The Graphtreon dataset does not contain Patreon projects that do not publicly report their number of patrons. If the likelihood of a project doing this is not uniform across all projects, this may skew the results, since the dataset would not necessarily be a representative sample of all Patreon projects.
  • As noted above, the Graphtreon dataset contains many projects that do not publicly report earnings. This is more common among projects with the most patrons, and may skew the results for the highest-ranked projects by earnings.

References

Patreon project data was obtained from Graphtreon LLC as a basic CSV export for the month of December 2022, https://graphtreon.com/data-services.

Environment

I used the following R environment in doing the analysis above:

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] tools     stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] forcats_0.5.2   stringr_1.5.0   dplyr_1.0.10    purrr_1.0.1    
## [5] readr_2.1.3     tidyr_1.2.1     tibble_3.1.8    ggplot2_3.4.0  
## [9] tidyverse_1.3.2
## 
## loaded via a namespace (and not attached):
##  [1] lubridate_1.9.0     assertthat_0.2.1    digest_0.6.31      
##  [4] utf8_1.2.2          R6_2.5.1            cellranger_1.1.0   
##  [7] backports_1.4.1     reprex_2.0.2        evaluate_0.19      
## [10] highr_0.10          httr_1.4.4          pillar_1.8.1       
## [13] rlang_1.0.6         googlesheets4_1.0.1 readxl_1.4.1       
## [16] rstudioapi_0.14     jquerylib_0.1.4     rmarkdown_2.19     
## [19] labeling_0.4.2      googledrive_2.0.0   bit_4.0.5          
## [22] munsell_0.5.0       broom_1.0.2         compiler_4.2.1     
## [25] modelr_0.1.10       xfun_0.36           pkgconfig_2.0.3    
## [28] htmltools_0.5.4     tidyselect_1.2.0    fansi_1.0.3        
## [31] crayon_1.5.2        tzdb_0.3.0          dbplyr_2.2.1       
## [34] withr_2.5.0         grid_4.2.1          jsonlite_1.8.4     
## [37] gtable_0.3.1        lifecycle_1.0.3     DBI_1.1.3          
## [40] magrittr_2.0.3      scales_1.2.1        cli_3.6.0          
## [43] stringi_1.7.12      vroom_1.6.0         cachem_1.0.6       
## [46] farver_2.1.1        fs_1.5.2            xml2_1.3.3         
## [49] bslib_0.4.2         ellipsis_0.3.2      generics_0.1.3     
## [52] vctrs_0.5.1         bit64_4.0.5         glue_1.6.2         
## [55] hms_1.1.2           parallel_4.2.1      fastmap_1.1.0      
## [58] yaml_2.3.6          timechange_0.2.0    colorspace_2.0-3   
## [61] gargle_1.2.1        rvest_1.0.3         knitr_1.41         
## [64] haven_2.5.1         sass_0.4.4

Source code

The source code for this analysis can be found in the public code repository https://gitlab.com/frankhecker/misc-analysis in the patreon subdirectory.

This document and its source code are available for unrestricted use, distribution and modification under the terms of the Creative Commons CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Stated more simply, you’re free to do whatever you’d like with it.