Let’s get funded on Kickstarter!

Before you read the analysis below, thank you for taking the time to look at my work. It means a lot to me.

Everything in this analysis is the result of what I studied in the Programming for Data Science and Practical Statistics class at Algoritma Academy. To see what I’ve learned in more detail, you can visit the Algoritma Academy learning syllabus.

Everything I have written is entirely my personal opinion, based on my experience and knowledge so far. If something is wrong or missing, please feel free to contact me; I’d love to discuss it with you. Thank you.

Background and motivation

This dataset covers Kickstarter, a platform for raising money for creative projects. It was published by Mickaël Mouillé and last updated in 2018. To download the dataset and see the details, you can visit Kaggle.

Background

One of the biggest obstacles to developing a project or business is money. Sometimes we already have an idea for a business, and maybe we even know how it will run, but for lack of funds it never gets off the ground.

That is why Kickstarter was born. It’s a platform where people can share their visions for creative work with communities that come together to fund them. Simply put, it’s a platform for bringing creative projects to life.

Motivation

The main objective of this analysis is to summarize the insights we can get from this dataset and to understand how we can launch a project successfully on the Kickstarter platform.

First look at the data

# Data Input and Checking Data
data <- read.csv("data/kickstarter.csv")
str(data)
## 'data.frame':    378661 obs. of  15 variables:
##  $ ID              : int  1000002330 1000003930 1000004038 1000007540 1000011046 1000014025 1000023410 1000030581 1000034518 100004195 ...
##  $ name            : chr  "The Songs of Adelaide & Abullah" "Greeting From Earth: ZGAC Arts Capsule For ET" "Where is Hank?" "ToshiCapital Rekordz Needs Help to Complete Album" ...
##  $ category        : chr  "Poetry" "Narrative Film" "Narrative Film" "Music" ...
##  $ main_category   : chr  "Publishing" "Film & Video" "Film & Video" "Music" ...
##  $ currency        : chr  "GBP" "USD" "USD" "USD" ...
##  $ deadline        : chr  "2015-10-09" "2017-11-01" "2013-02-26" "2012-04-16" ...
##  $ goal            : num  1000 30000 45000 5000 19500 50000 1000 25000 125000 65000 ...
##  $ launched        : chr  "2015-08-11 12:12:28" "2017-09-02 04:43:57" "2013-01-12 00:20:50" "2012-03-17 03:24:11" ...
##  $ pledged         : num  0 2421 220 1 1283 ...
##  $ state           : chr  "failed" "failed" "failed" "failed" ...
##  $ backers         : int  0 15 3 1 14 224 16 40 58 43 ...
##  $ country         : chr  "GB" "US" "US" "US" ...
##  $ usd.pledged     : num  0 100 220 1 1283 ...
##  $ usd_pledged_real: num  0 2421 220 1 1283 ...
##  $ usd_goal_real   : num  1534 30000 45000 5000 19500 ...
# Changing data type and inspecting the dataset
data$launched <- as.Date(data$launched, "%Y-%m-%d")
data$deadline <- as.Date(data$deadline, "%Y-%m-%d")
colSums(is.na(data))
##               ID             name         category    main_category 
##                0                0                0                0 
##         currency         deadline             goal         launched 
##                0                0                0                0 
##          pledged            state          backers          country 
##                0                0                0                0 
##      usd.pledged usd_pledged_real    usd_goal_real 
##             3797                0                0

Our dataset consists of 378661 observations and 15 variables. It has 3797 missing values, all in the usd.pledged variable. Because we won’t use this variable, we will drop it, along with ID, usd_pledged_real, and usd_goal_real.

To keep things simple, we will also use only the campaigns in USD, so that goal and pledged amounts are on the same scale and differences between currency values don’t distort the analysis.

# Drop unused variables
data_clean <- subset(data, select = -c(ID, usd.pledged, usd_pledged_real, usd_goal_real))

# Get only the campaigns that use USD currency
data_clean <- data_clean[data_clean$currency == "USD",]

str(data_clean)
## 'data.frame':    295365 obs. of  11 variables:
##  $ name         : chr  "Greeting From Earth: ZGAC Arts Capsule For ET" "Where is Hank?" "ToshiCapital Rekordz Needs Help to Complete Album" "Community Film Project: The Art of Neighborhood Filmmaking" ...
##  $ category     : chr  "Narrative Film" "Narrative Film" "Music" "Film & Video" ...
##  $ main_category: chr  "Film & Video" "Film & Video" "Music" "Film & Video" ...
##  $ currency     : chr  "USD" "USD" "USD" "USD" ...
##  $ deadline     : Date, format: "2017-11-01" "2013-02-26" ...
##  $ goal         : num  30000 45000 5000 19500 50000 1000 25000 125000 65000 12500 ...
##  $ launched     : Date, format: "2017-09-02" "2013-01-12" ...
##  $ pledged      : num  2421 220 1 1283 52375 ...
##  $ state        : chr  "failed" "failed" "failed" "canceled" ...
##  $ backers      : int  15 3 1 14 224 16 40 58 43 100 ...
##  $ country      : chr  "US" "US" "US" "US" ...
# Create a pie chart to see the portion of each state in our dataset
state <- table(data_clean$state)
pie(state, main = "Portion of Kickstarter Campaign States")

The state variable contains several states other than failed and successful. The pie chart shows that failed and successful campaigns together cover more than 50% of our data, so to keep the analysis straightforward we will subset the dataset further and keep only these two states.
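The shares read off a pie chart can also be computed directly. The sketch below uses a small made-up vector of states (state_example is hypothetical, not the Kickstarter data) to show how prop.table() turns counts into proportions. With the real data, data_clean$state would take its place.

```r
# Hypothetical example states; replace with data_clean$state for the real data
state_example <- c("failed", "failed", "successful", "canceled",
                   "failed", "successful", "live", "successful")

# Count each state, then convert the counts to proportions
state_counts <- table(state_example)
state_share <- prop.table(state_counts)
round(state_share, 3)

# Combined share of the two states kept for the analysis
kept_share <- sum(state_share[c("failed", "successful")])
kept_share
```

A number like kept_share is easier to quote and compare than a slice of a pie chart.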

# Remove observations with a state other than failed and successful
data_clean <- data_clean[data_clean$state == "failed" | data_clean$state == "successful",]
unique(data_clean$state)
## [1] "failed"     "successful"

Top categories and sub-categories

The dataset has two category variables: category and main_category. Here category is the sub-category, a more specific label within the main_category.

Main category

# Get main categories that have the most campaigns on Kickstarter
main_cat <- as.data.frame(sort(table(data_clean$main_category), decreasing = T))
top_main_cat <- data_clean[data_clean$main_category %in% head(main_cat$Var1, 7),]

campaign_state <- top_main_cat$state
main_category <-  top_main_cat$main_category
xtabs(~ campaign_state + main_category, top_main_cat)
##               main_category
## campaign_state   Art Design Film & Video Games Music Publishing Technology
##     failed     10953  10804        27159 10873 18548      18731      13338
##     successful  9496   7681        19791  9356 21788       9965       4724
# Create a barplot for the top main categories with their states separated
barplot(xtabs(~ top_main_cat$state + top_main_cat$main_category, top_main_cat), cex.names = .75, legend = T)

Based on the data, the main category with the most campaigns is Film & Video, with nearly 47,000 campaigns (27159 failed and 19791 successful) in our filtered data. But, as we can see, in most main categories, including Film & Video, more campaigns failed than succeeded. The only main category with more successful than failed campaigns is Music.
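To compare categories of different sizes, it helps to turn the counts into success rates. The sketch below re-enters the counts from the cross-tabulation above into a matrix and uses prop.table() with margin = 2 to get the proportion of each state within every main category. The object names counts, rates, and success_rate are only illustrative.

```r
# Counts copied from the cross-tabulation above (USD campaigns only)
counts <- matrix(
  c(10953,  9496,   # Art
    10804,  7681,   # Design
    27159, 19791,   # Film & Video
    10873,  9356,   # Games
    18548, 21788,   # Music
    18731,  9965,   # Publishing
    13338,  4724),  # Technology
  nrow = 2,
  dimnames = list(
    campaign_state = c("failed", "successful"),
    main_category = c("Art", "Design", "Film & Video", "Games",
                      "Music", "Publishing", "Technology")
  )
)

# Column-wise proportions: each column (main category) sums to 1
rates <- prop.table(counts, margin = 2)

# Success rate per main category, highest first
success_rate <- sort(rates["successful", ], decreasing = TRUE)
round(success_rate, 3)
```

On the real data, the same call should work directly on the xtabs() result instead of a hand-typed matrix.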

Sub-category

# subset the dataset to music main category
data_music <- data_clean[data_clean$main_category == "Music",]
head(data_music)
##                                                    name   category
## 4     ToshiCapital Rekordz Needs Help to Complete Album      Music
## 12                                     Lisa Lim New CD! Indie Rock
## 19                  Mike Corey's Darkness & Light Album      Music
## 26 Matt Cavenaugh & Jenny Powers make their 1st album!       Music
## 35              Chris Eger Band - New Nashville Record!      Music
## 37                           Arrows & Sound Debut Album Indie Rock
##    main_category currency   deadline  goal   launched  pledged      state
## 4          Music      USD 2012-04-16  5000 2012-03-17     1.00     failed
## 12         Music      USD 2013-04-08 12500 2013-03-09 12700.00 successful
## 19         Music      USD 2012-08-17   250 2012-08-02   250.00 successful
## 26         Music      USD 2011-01-06 10000 2010-12-07 15827.00 successful
## 35         Music      USD 2014-08-13 12000 2014-07-14 13260.00 successful
## 37         Music      USD 2012-05-19  4000 2012-04-19  8641.34 successful
##    backers country
## 4        1      US
## 12     100      US
## 19       7      US
## 26     147      US
## 35      92      US
## 37     157      US
# Music unique genres
names(data_music)[names(data_music) == "category"] <- "genre"
unique(data_music$genre)
##  [1] "Music"            "Indie Rock"       "Pop"              "Rock"            
##  [5] "Jazz"             "Electronic Music" "Metal"            "Hip-Hop"         
##  [9] "World Music"      "Punk"             "Classical Music"  "Country & Folk"  
## [13] "R&B"              "Faith"            "Latin"            "Kids"            
## [17] "Blues"            "Comedy"           "Chiptune"

As with the main category, we will try to figure out which music genre has the most campaigns on Kickstarter.

# Get the top music genre on Kickstarter
genre <- as.data.frame(sort(table(data_music$genre), decreasing = T))
head(genre)
##             Var1  Freq
## 1          Music 11037
## 2           Rock  5650
## 3     Indie Rock  5008
## 4 Country & Folk  3888
## 5        Hip-Hop  3160
## 6            Pop  2632
# Create a barplot for the top music genres with their states separated
top_genre <- data_music[data_music$genre %in% head(genre$Var1),]
barplot(xtabs(~ top_genre$state + top_genre$genre, top_genre), legend = T, cex.names = .75, 
        main = "Top music genres with states separated")

Unfortunately, the largest group of Music projects on Kickstarter (11037 of 40336, about 27%) has no specific genre and is simply categorized as Music. Therefore, we will not explore these genres further.

Ideal fund goal and deadline date

According to Kickstarter, campaigns can last anywhere from 1 to 60 days. As for the funding goal, there is no limit on the amount a creator can set.

Deadline date

A common assumption when raising money is that the more time we have, the more support we get; in other words, the longer the deadline, the better. But does this assumption hold when starting a campaign on Kickstarter?

# Make a new variable to store the campaign duration (days)
data_clean$campaign_duration <- as.numeric(data_clean$deadline - data_clean$launched)

# Campaign duration distribution
data_clean_success <- data_clean[data_clean$state == "successful",]
hist(data_clean_success$campaign_duration, xlim=c(0,60), col="dark grey", 
     xlab="Campaign Duration (days)", 
     ylab="Number of campaigns", 
     main="Duration of Successful Campaigns")

# The average duration of successful campaigns (days)
mean(data_clean_success$campaign_duration)
## [1] 32.39186

According to the distribution, a longer deadline does not guarantee that a campaign will succeed. Successful campaigns ran for about 32 days on average, roughly half of the 60-day maximum, so a deadline of around 32 days after launch looks like a reasonable choice. This also means that you don’t have to set a long deadline to get fully funded.
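The average duration of successful campaigns says nothing about failed ones, so a natural follow-up is the success rate within each range of durations. The sketch below runs on a small made-up data frame (toy_duration and its values are hypothetical) and groups durations with cut() before averaging a logical success flag with tapply(). With the real data, data_clean would take the place of toy_duration.

```r
# Hypothetical toy data; with the real data, use data_clean instead
toy_duration <- data.frame(
  campaign_duration = c(10, 15, 29, 30, 30, 31, 45, 59, 60, 60),
  state = c("successful", "failed", "successful", "successful", "failed",
            "successful", "failed", "failed", "successful", "failed")
)

# Group durations into bins of (0,15], (15,30], (30,45], (45,60] days
toy_duration$duration_bin <- cut(toy_duration$campaign_duration,
                                 breaks = c(0, 15, 30, 45, 60))

# Share of successful campaigns within each bin
success_by_bin <- tapply(toy_duration$state == "successful",
                         toy_duration$duration_bin, mean)
success_by_bin
```

If the success rate on the real data drops in the longest bin, that would support the claim that a longer deadline does not help.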

Fund goal

The amount of money requested is a factor for people deciding whether to take part in your project. We will see whether a lower goal goes together with a higher chance of success.

# Create a boxplot to see the distribution of goal based on the campaign states
boxplot(goal ~ state, data = data_clean, outline=FALSE, 
        main = "Distribution of campaigns' goals",
        ylab = "Goal",
        xlab = "State")

As the boxplots show, once the outliers are left out, successful campaigns tend to have lower goals than failed ones.
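The comparison in the boxplots can be backed by a number, for example the median goal per state, which is not sensitive to extreme goals. The sketch below uses a small made-up data frame (toy_goal and its values are hypothetical). With the real data, data_clean$goal and data_clean$state would be used instead.

```r
# Hypothetical toy data; with the real data, use data_clean instead
toy_goal <- data.frame(
  goal = c(1000, 2500, 5000, 3000, 20000, 8000, 15000, 50000),
  state = c("successful", "successful", "successful", "successful",
            "failed", "failed", "failed", "failed")
)

# Median goal for each campaign state
median_goal <- tapply(toy_goal$goal, toy_goal$state, median)
median_goal
```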

The most desirable product

We have already found out which categories have the most submissions on Kickstarter. However, that result is based only on the number of submissions, so we can’t yet say that those categories are the best. We also have to look from the perspective of the backers. To do this, we consider the number of backers who participated in the campaigns of each category and sub-category.

# Number of campaigns
freq <- as.data.frame(xtabs(~ main_category + category, data = data_clean))

# Number of backers
freq$backers <- as.data.frame(xtabs(backers ~ main_category + category, data = data_clean))$Freq

# The average number of backers from each campaign
freq$avg_backers <- freq$backers / freq$Freq
head(freq[order(freq$avg_backers, decreasing = T),], 10)
##      main_category         category Freq backers avg_backers
## 326          Music         Chiptune   22   14147    643.0455
## 254     Technology Camera Equipment  253  149975    592.7866
## 1994    Technology            Sound  357  203035    568.7255
## 1228    Publishing      Letterpress    4    2232    558.0000
## 2049         Games   Tabletop Games 8939 4968628    555.8371
## 954          Games  Gaming Hardware  204  113170    554.7549
## 2249    Technology        Wearables  674  361124    535.7923
## 2229         Games      Video Games 6250 3302201    528.3522
## 929     Technology          Gadgets 1613  684253    424.2114
## 2165        Design       Typography   62   25847    416.8871

Although the number of submissions differs hugely between categories, we can still compare them by dividing the total number of backers by the number of submissions (Freq), which gives the average number of backers per campaign. From the perspective of the backers, the most desirable product is in the Music main category, specifically the Chiptune genre, with an average of 643 backers per campaign. Note, however, that this average is based on only 22 campaigns.
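Averages based on very few campaigns can be misleading, so one option is to rank only the sub-categories with a minimum number of campaigns. The sketch below re-enters the ten rows from the table above and applies a cut-off of 100 campaigns. The cut-off and the names top10, min_campaigns, and reliable are assumptions made for illustration, not part of the original analysis.

```r
# Figures copied from the top-10 table above
top10 <- data.frame(
  category = c("Chiptune", "Camera Equipment", "Sound", "Letterpress",
               "Tabletop Games", "Gaming Hardware", "Wearables",
               "Video Games", "Gadgets", "Typography"),
  Freq = c(22, 253, 357, 4, 8939, 204, 674, 6250, 1613, 62),
  backers = c(14147, 149975, 203035, 2232, 4968628, 113170, 361124,
              3302201, 684253, 25847)
)

# Average number of backers per campaign
top10$avg_backers <- top10$backers / top10$Freq

# Hypothetical cut-off: ignore sub-categories with fewer than 100 campaigns
min_campaigns <- 100
reliable <- top10[top10$Freq >= min_campaigns, ]
reliable <- reliable[order(reliable$avg_backers, decreasing = TRUE), ]
reliable
```

With this cut-off, Chiptune, Letterpress, and Typography drop out, and the Technology and Games sub-categories lead the ranking.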

Conclusion

Now, what can we conclude from the Kickstarter dataset?

  • According to our analysis above, Film & Video, Music, Publishing, Art, Games, Design, and Technology are the top 7 main categories with the most campaigns on Kickstarter, with more than 18,000 campaigns each.
  • Among the top main categories, Music is the only one with a success rate of more than 50%, but its largest group of campaigns has no specific genre.
  • Successful campaigns ran for about 32 days on average, so a campaign duration of around 32 days is a reasonable choice.
  • Most successful campaigns also have a relatively low funding goal. A smaller budget is easier to fund in full, which makes the campaign more likely to succeed.
  • From the perspective of the backers, the most desirable main categories are Music, Technology, Publishing, and Games, because each has at least one sub-category that averages more than 500 backers per campaign.