This project explores a dataset of >150000 projects from Kickstarter, probably the most popular crowdfunding site. The dataset was obtained by directly querying the (undocumented) Kickstarter API. It contains information about the project goal, the outcome (successful or failed, how much was pledged in the end, the backers count), the project location, the category, when the project was created and, for some of the projects, extra information about the creator. Can we use all this information to understand the recipe for a successful project?
First, let’s take a look at some basic statistics / numbers about the dataset.
Features in the dataset:
## [1] "id" "backers_count"
## [3] "country" "creator.failed_experience"
## [5] "creator.id" "creator.successful_experience"
## [7] "creator.total_experience" "currency"
## [9] "deadline" "goal"
## [11] "launched_at" "pledged"
## [13] "slug" "spotlight"
## [15] "state" "static_usd_rate"
## [17] "usd_goal" "usd_pledged"
## [19] "category_name" "location_country"
## [21] "location_type" "location_name"
## [23] "category_parent" "launched_at_month"
## [25] "launched_at_year" "launched_weekday"
## [27] "deadline_weekday"
There are 145508 projects in the dataset. They are distributed across the project states as follows:
## canceled failed live successful suspended
## 7712 64746 0 72621 429
Project counts for the first 10 categories (listed alphabetically, not by size):
## art comics crafts dance design
## 13948 4972 3804 2549 6112
## fashion film & video food games journalism
## 6867 22150 7684 10744 2260
And the top 10 countries in which the projects are located:
## US GB CA AU NL NZ DE SE FR IT
## 120590 10852 4710 2200 769 457 346 301 291 271
How many projects are successful, how many failed?
There are five states: successful, failed, canceled, suspended and live. For the rest of this analysis, let’s focus on successful projects versus projects that were unsuccessful for whatever reason. Thus we combine failed, canceled and suspended into a new variable ‘unsuccessful’.
## failed successful
## 64746 72621
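The grouping of states described above could be sketched as follows. This is an illustration on a toy data frame, not the code used for this report: the names `ks_demo`, `finished` and `outcome` are made up here, and dropping live projects is an assumption (their outcome is not yet known).

```r
# Toy stand-in for the real data frame `ks`; only the `state` column matters here.
ks_demo <- data.frame(
  state = c("successful", "failed", "canceled", "suspended", "live", "successful"),
  stringsAsFactors = FALSE
)

# Assumption: projects that are still running are dropped, their outcome is unknown.
finished <- ks_demo[ks_demo$state != "live", , drop = FALSE]

# Everything that did not succeed counts as unsuccessful.
finished$outcome <- ifelse(finished$state == "successful",
                           "successful", "unsuccessful")

table(finished$outcome)
```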
Which country do most projects come from?
By a large margin, most projects come from the USA. Let’s check how they are distributed within the USA:
Most projects come from Los Angeles. However, if we add Brooklyn to New York, New York actually wins by a large margin. Third and fourth place go to Chicago and San Francisco, each with fewer than half as many projects as LA.
Let’s merge Brooklyn into New York.
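One way to do this merge is to relabel the location before counting. The snippet below is a minimal sketch on invented rows; it assumes the city is stored in `location_name` and that Brooklyn appears under exactly that spelling.

```r
# Invented example rows; the real data would come from ks$location_name.
loc <- data.frame(
  location_name = c("Los Angeles", "Brooklyn", "New York",
                    "Chicago", "Brooklyn", "New York"),
  stringsAsFactors = FALSE
)

# Treat Brooklyn as part of New York.
loc$location_name[loc$location_name == "Brooklyn"] <- "New York"

# Count projects per city, largest first.
city_counts <- sort(table(loc$location_name), decreasing = TRUE)
city_counts
```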
Let’s look at the distribution of project goals and of what people actually pledged:
summary(ks$usd_goal)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 1608 5000 31170 10520 125000000
summary(ks$usd_pledged)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 120 1496 10570 5796 20340000
Half of the projects have a goal of $5000 or less. The goals histogram shows discrete jumps: round values are preferred over arbitrary ones. Both distributions look similar to an exponential distribution, with the pledged distribution dropping faster from zero; half of the projects receive less than about $1500.
If we switch to a logarithmic scale, the two distributions look more linear. The pledged distribution also shows discrete jumps at round values; presumably those are successful projects that just reached their goal.
Let’s create a new variable: percentage of goal achieved. This allows us to compare projects of different sizes. We divide pledged by goal and multiply by 100 to get the percentage of the project goal reached.
Its distribution looks like:
Clearly we are looking here at the superposition of two distributions that both look like they are very close to an exponential distribution.
This is the superposition of successful and failed projects. Projects that reach 100% become successful.
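The percentage-of-goal variable could be computed along these lines. The column name `goal_reached` matches the summaries in this report; the toy data frame `ks_demo` and its values are invented for illustration.

```r
# Invented projects; goal and pledged are assumed to be in the same currency.
ks_demo <- data.frame(
  goal    = c(1000, 5000, 20000, 500),
  pledged = c(1150,  100,     0, 2000)
)

# Percentage of the goal that was actually pledged.
# Note: a goal of 0 would yield Inf or NaN and may need special handling.
ks_demo$goal_reached <- ks_demo$pledged / ks_demo$goal * 100

ks_demo
```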
summary(filter(ks, state=='successful')$goal_reached)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 46 104 115 505 150 4154000
summary(filter(ks, state=='failed')$goal_reached)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.05714 2.06700 9.41800 12.27000 108.00000
Most of the failed projects raised pretty much nothing. The lack of projects between 50% and 100% is somewhat surprising. Apparently, projects that make it that far have enough momentum not to fail.
The feature backers_count tells us how many people pledged money for a project. Let’s take a look at its distribution:
summary(ks$backers_count)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 4.0 26.0 134.7 81.0 219400.0
Again we find a very narrow, exponential-like distribution. Half of the projects have 26 backers or fewer, and the likelihood of receiving more than 100 backers is pretty slim.
Let’s create a new feature: the amount pledged per individual backer. We divide the amount pledged per project by the backers count.
Its distribution looks like:
summary(ks$pledged_backer)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 22.22 45.91 66.49 78.77 10000.00
The median is around $46 per backer. The distribution is rather narrow: the amount that individual backers are willing to pledge doesn’t vary much.
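A sketch of how the per-backer amount might be derived is shown below. Projects without backers need a decision, because 0 / 0 is NaN in R; setting them to 0 is an assumption made here (it would be consistent with the minimum of 0.00 in the summary above, but the original handling is not shown). The data frame `ks_demo` is invented.

```r
# Invented projects, including one without any backers.
ks_demo <- data.frame(
  usd_pledged   = c(0, 450, 1200),
  backers_count = c(0,  10,   15)
)

# Amount pledged per backer; assumption: projects with zero backers get 0.
ks_demo$pledged_backer <- ifelse(
  ks_demo$backers_count > 0,
  ks_demo$usd_pledged / ks_demo$backers_count,
  0
)

ks_demo$pledged_backer
```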
We can split the histogram up by successful / failed projects:
It looks like two log-normal distributions, but the failed projects have a peak for very low amounts pledged per backer.
Let’s check if it’s log-normal by plotting the histogram of the log of pledged_backer.
This looks very much like two normal distributions.
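The log-scale check can be reproduced in a few lines. The example below uses synthetic log-normal amounts rather than the Kickstarter data, and the bin count of 30 is an arbitrary choice; the point it illustrates is that zero amounts must be removed before taking the logarithm.

```r
set.seed(42)

# Synthetic stand-in for ks$pledged_backer: log-normal amounts plus some zeros.
pledged_backer <- c(rlnorm(1000, meanlog = log(45), sdlog = 0.8), rep(0, 50))

# log10(0) is -Inf, so zero amounts have to be excluded first.
positive <- pledged_backer[pledged_backer > 0]
log_amounts <- log10(positive)

# plot = FALSE returns the bin counts instead of drawing the histogram;
# drop it to see the roughly bell-shaped curve.
h <- hist(log_amounts, breaks = 30, plot = FALSE)
```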
We should also normalize the amount pledged per backer. I.e., what fraction of the goal is pledged per backer?
summary(ks$pledged_backer_pct)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.29 1.01 6.42 2.58 166700.00
Again an exponential-like distribution: each backer usually contributes only a small fraction of the project’s goal, typically around 1% or less.
Let’s take a look at the different categories we have in the dataset.
Which categories for projects are there and how many projects do they have?
So most projects fall into the categories music, film & video and publishing.
There are three variables that describe the experience of the creators: creator.total_experience, creator.successful_experience and creator.failed_experience. Each variable describes how many projects a creator had before this project: in total, successful and failed.
summary(ks$creator.total_experience)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.2787 0.0000 98.0000
summary(ks$creator.successful_experience)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.2006 0.0000 92.0000
summary(ks$creator.failed_experience)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.0663 0.0000 18.0000
Again three exponential-like distributions. Creators with lots of failed projects are rather rare, much rarer than creators with lots of successful projects. In general, most creators created only one project on Kickstarter.
How long does a Kickstarter campaign usually last?
Most campaigns have a timeframe of a month (30 days). Some are extremely short (a few days), while others take two months or longer.
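The campaign length can be derived from the `launched_at` and `deadline` features. The sketch below assumes both are Unix timestamps in seconds (UTC), which is not stated in this report; the two example campaigns and the name `timeframe_days` are invented.

```r
# Assumption: launched_at and deadline are Unix timestamps in seconds.
launched_at <- c(1388534400, 1404172800)        # 2014-01-01 and 2014-07-01, UTC
deadline    <- launched_at + c(30, 45) * 86400  # 30 and 45 days later

launched <- as.POSIXct(launched_at, origin = "1970-01-01", tz = "UTC")
ends     <- as.POSIXct(deadline,    origin = "1970-01-01", tz = "UTC")

# Campaign length in days.
timeframe_days <- as.numeric(difftime(ends, launched, units = "days"))
timeframe_days
```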
Let’s look at the number of projects / month over the course of the last years.
Looks like pretty steady growth. Two interesting features are a sharp rise in the number of projects in July 2014 and a recurring drop in the number of projects around November/December of each year.
Let’s start by looking at a scatterplot of the goal vs. goal_reached. Are there any relationships?
OK, that’s a mess. Still, there seems to be a trend: projects with larger goals tend to raise proportionally less of their goal. However, before we draw conclusions we should first check the relationship between project size and the associated success rate.
My intuition tells me that project goal and success rate (number of successful projects / number of projects for each goal bucket) must be related.
That looks like a clear relationship. Smaller project goal, higher success rates. Larger projects (>~$100 000) settle at success rates around 15%. Let’s try to model this relationship.
##
## Calls:
## m1: lm(formula = 1/I(success_rate) ~ I(goal.bucket), data = by_goal)
##
## =========================
## (Intercept) 1.767
## (1.198)
## I(goal.bucket) 0.000***
## (0.000)
## -------------------------
## R-squared 0.248
## adj. R-squared 0.232
## sigma 4.155
## F 14.863
## p 0.000
## Log-likelihood -132.608
## Deviance 776.774
## AIC 271.215
## BIC 276.766
## N 47
## =========================
The slope is highly significant, so success rate and goal are inversely related, although the R-squared of 0.248 indicates considerable scatter around the fitted line.
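The bucketing and the inverse fit can be sketched end to end on synthetic data. Everything below is illustrative: the simulated goals, the assumed true relationship, the $5000 bucket width and the names `goal_bucket` and `by_goal` are choices made for this example. The formula is written as `I(1 / success_rate)`, which expresses the same inverse relationship as the model call above.

```r
set.seed(1)
n <- 20000

# Synthetic goals and outcomes; assumed truth: 1 / success_rate is linear in the goal.
usd_goal   <- runif(n, 0, 100000)
p_success  <- 1 / (1.5 + usd_goal / 20000)
successful <- rbinom(n, 1, p_success)

# $5000-wide buckets, labelled by their lower edge.
goal_bucket <- floor(usd_goal / 5000) * 5000

# Success rate per bucket = share of successful projects in that bucket.
by_goal <- aggregate(successful, by = list(goal_bucket = goal_bucket), FUN = mean)
names(by_goal)[2] <- "success_rate"

# Linear model for the inverse success rate.
m1 <- lm(I(1 / success_rate) ~ goal_bucket, data = by_goal)
coef(m1)
```

A bucket without any successful project would give an infinite inverse rate, so sparse buckets may need to be merged or dropped before fitting.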
Let’s check if the experience of the creator of a project (creator.total_experience) has any influence on the amount of money pledged. Note that the numbers may have been different at the time a project was actually created: we cannot see how many projects a person had created by that time.
Hard to say from this plot. We will have to make buckets again and check the success rates like before!
This looks like a very clear relationship: experience pays off. With each project that a creator had before (at this point ignoring whether it was successful or not), the success rate increases significantly.
Theory: creators with lots of experience have had mostly successful projects, while creators with failing projects disappear fast.
So the total number of projects by each creator and the number of successful projects form almost a straight line. Creators with lots of projects tend to have mostly successful ones.
Since we already found experienced creators have a higher success rate, this is probably not too surprising.
Let’s look at the influence of the category on the success of projects next. How many successful projects were in each category?
So crafts, journalism and technology are not doing very well, while theater and design projects have the highest success rates.
How did the different categories change over time? Did some become more popular?
Some categories show very strong seasonal fluctuations, especially theater and art projects. All categories show growth, which reflects the overall growth of Kickstarter. Some categories see sharp spikes in the number of projects in mid-2014, mostly technology, journalism, crafts, photography and food.
Let’s see how the success rate of projects changed over time
So there was a sudden drop in the success rate in early 2014, with a crossover around spring 2014. From this point on there were more failed projects than successful ones (success rate < 0.5).
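A monthly success rate like the one discussed here could be computed by grouping on the `launched_at_year` and `launched_at_month` features. The rows below are invented, and `is_success` and `monthly` are names introduced only for this sketch.

```r
# Invented projects launched in two different months.
ks_demo <- data.frame(
  launched_at_year  = c(2013, 2013, 2013, 2014, 2014, 2014, 2014),
  launched_at_month = c(12, 12, 12, 7, 7, 7, 7),
  state = c("successful", "successful", "failed",
            "failed", "failed", "successful", "failed"),
  stringsAsFactors = FALSE
)

# TRUE/FALSE flag; its mean per group is the success rate.
ks_demo$is_success <- ks_demo$state == "successful"

monthly <- aggregate(is_success ~ launched_at_year + launched_at_month,
                     data = ks_demo, FUN = mean)
monthly
```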
Let’s see next how the total amount of money pledged varied over the same time.
There are many failed projects, but they hardly receive any money. The successful projects generate more and more money. The seasonal fluctuations are again very obvious.
One question remains: does each individual project on average receive more or less money over the last years?
So the money pledged per project did rise sharply from 2011 to 2013, and is somewhat constant at a little over $20k since then.
Did the amount that each individual backer pledged change over the same time?
That’s interesting: the mean amount pledged per backer is very steady at around $74.
Did project campaigns become longer / shorter?
Campaigns became shorter, from more than 1.5 months down to one month. Throughout Kickstarter’s history, successful projects tended to be shorter.
The success rate of the projects depends inversely on the project goal. The bigger a project, the less likely it is to get funded: larger projects find it harder to build enough momentum to raise the money. We also found positive correlations between the experience of creators and the outcome. The more projects a creator has already created, the higher the probability of a successful project. Presumably this reflects both experience and the fact that successful creators tend to stick around. The success rate of projects dropped sharply below 50% in early 2014; since then, more projects fail than succeed.
The strongest relationship was probably between the category of a project and the success rate. Theater projects, for example, are incredibly successful. My guess is that they are rather small and reach an established fan base. Craft projects, on the other hand, are not very successful, though they should also be small. My guess is that it is more difficult for them to establish a fan base. We could investigate further what distinguishes theater from craft projects, e.g. by analysing the mean project size for successful / failed projects, especially since we also found a large influence of goal on the success rate.
Let’s check the influence of project size on success rate, split up by project category.
Some project categories are more resilient when it comes to goal size, e.g. technology, design or journalism. Other categories (music, art or food) have a success rate that depends more strongly on goal size.
Let’s check how the number of failed / successful projects changed over the last years broken down to the categories.
So the hype in mid 2014 affected only unsuccessful projects, there was basically no increase in successful projects! Also only some categories were affected: technology, journalism, photography, crafts and fashion.
Let’s look at the change of success rates over time:
Success rates were generally going down from 2013 to 2014 across almost all categories. Some categories have no failed projects over long stretches (theater, food). This must be due to data missing from the dataset, and it explains the high success rates for these categories!
The amount of money that was successfully raised did not go down; quite the opposite. Technology, design and games projects show perhaps the most steady recent growth in terms of money raised. Note the logarithmic y-scale!
Earlier we found that the top 4 locations are New York, Los Angeles, San Francisco and Chicago. Let’s check how these four cities performed over the last years.
Very high success rates for all four cities, well above the average. Success rates in these places also didn’t drop after 2014 as the overall average did! It looks like Chicago raises the least money of the four cities, and San Francisco the most. This is probably due to more technology projects in SF.
We should also look at the amount raised per successful project:
Projects based in San Francisco are on average far bigger than those in the three other cities, reaching around $60000 per successful project!
By number of projects created by the creator, faceted by category:
There is not a lot of data, but there seem to be some trends. Games projects profit from more engaged creators, as do publishing or comics projects. Music projects on the other hand don’t really.
For most categories, that’s a clear trend. Success rates are slightly higher when they have short timeframes, longer projects are less successful. This is especially true for projects from the categories games and comics.
The longer the project, the smaller the relative contribution by each backer. Also, the project goal increases with longer campaigns:
Let’s plot relative amount pledged versus goal:
The goal and the relative amount pledged per backer are inversely related: the higher the goal, the smaller the relative contribution by each backer, i.e. the absolute amount pledged by each backer is more or less constant. Let’s model this:
##
## Calls:
## m2: lm(formula = 1/I(pledged_backer_pct) ~ I(median_goal), data = by_timeframe,
## weights = n)
##
## =========================
## (Intercept) 0.047
## (0.144)
## I(median_goal) 0.000***
## (0.000)
## -------------------------
## R-squared 0.780
## adj. R-squared 0.764
## sigma 13.240
## F 49.514
## p 0.000
## Log-likelihood 0.589
## Deviance 2454.315
## AIC 4.823
## BIC 7.141
## N 16
## =========================
Not a perfect fit, but reasonable: The bigger the goal the smaller the relative amount pledged per backer.
For some categories, experience has an influence on the success rates of projects. However, the data is not very complete and we don’t have information about the creators for most projects. We already found that the project size plays a role for the success rate. However, its influence also differs between categories. The success rate for projects from some categories (e.g. music) drops faster for larger goals than for other categories (e.g. technology).
The absolute amount pledged per backer is relatively independent of the goal of projects. I.e., the larger the goal of a project, the smaller the relative amount pledged by each backer. This may explain why larger projects fail more often: they need to convince more people to support them.
I created two models with this dataset. The first model is the relationship between project goal size and success rate. I found that a simple model where the success rate is inversely proportional to the goal size describes the overall trend in the dataset well. One limitation of this model is that the success rate dropped over the last two years, while project goals became bigger. It might be appropriate to incorporate this into the model.
The second model is again an inverse proportionality, this time between project goal and relative amount pledged per backer. Basically, this means that the absolute amount that is pledged per backer is a relatively fixed quantity. Bigger projects need to find more backers that each contribute the same amount as they would for a smaller project.
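The link between this inverse model and a fixed pledge per backer can be spelled out in two steps. The symbols are introduced here only for illustration: \(g\) is the project goal, \(a\) the absolute amount pledged per backer (pledged_backer) and \(r\) the relative amount in percent (pledged_backer_pct), assuming the latter is simply the former expressed as a percentage of the goal.

\[
r = 100 \cdot \frac{a}{g}
\quad\Longrightarrow\quad
\frac{1}{r} = \frac{g}{100\,a}
\]

If \(a\) is roughly constant, \(1/r\) is a straight line through the origin in \(g\) with slope \(1/(100\,a)\). The fitted intercept of 0.047 (standard error 0.144) is compatible with zero, as this reasoning would predict.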
The number of successful and failed projects launched on Kickstarter each month over time. Kickstarter started around early 2009. Since that time, the number of projects each month has grown to over 5000. Of these, about 3000 projects failed and 2000 were successful. In the first years more projects were successful than failed; however, in early 2014 the number of failed projects rose sharply to a peak in July 2014. Since then, failed projects have been the majority. Seasonal fluctuations show up in the number of successful projects. Each November that number goes down and rises again in the next spring.
The success rate is defined as the number of successful projects divided by the total number of projects. All projects are grouped by their goal size in USD in buckets of $5000 each. Then the success rate of each group is calculated by counting the successful and failed projects. Projects with smaller goals have a higher success rate; as projects become bigger, the success rate drops. The red line is a fit following a simple model, where success rate and goal size are inversely proportional (\(goal \propto \frac{1}{success\_rate}\)). This simple model seems to describe the underlying dynamics surprisingly well.
Most projects are located in four cities: New York, Los Angeles, San Francisco and Chicago. This plot analyses how much money is on average pledged per successful project in each of these cities over the last years. The orange line is the average for all projects in the dataset from the whole world. The size of each data point corresponds to the number of projects in that place in that year. Projects in San Francisco raise on average almost 3x more than in the other 3 cities! And from the trend it looks like San Francisco is going to even increase this lead.
This dataset is definitely highly interesting and also very relevant. We got quite a lot of information about a huge number of crowdfunding projects on Kickstarter. Even though the dataset is not 100% complete (for example, projects from the theater category show an unreasonable success rate before 2012), we could still draw lots of valuable conclusions. The missing data is due to the fact that the Kickstarter API is not made for a study like this, and some workarounds are necessary to retrieve data about as many projects as possible.
The next step would be to predict for live projects whether they will succeed - or not. We found that the success rate is strongly affected by project goal size, location, category, campaign length and experience of the creator. With this information it should be possible to make reasonable predictions for the success of a live project.
Probably the single most important insight is the overall growth of Kickstarter, though this is maybe not too surprising given the recent hype around crowdfunding. What might be surprising, and also worrisome, is the drop in the success rate in early 2014. Even though the total money raised is still growing, too large a fraction of failed projects will be discouraging and frustrating for new members of the community. I believe Kickstarter should make sure that the quality of the projects stays high in the future.