Let’s get funded on Kickstarter!
Before you read the analysis below, thank you for taking the time to look at my writing; it means a lot to me.
Everything in this analysis is the result of my studies in the Programming for Data Science and Practical Statistics classes at Algoritma Academy. To see what I’ve learned in more detail, you can visit the Algoritma Academy learning syllabus.
Everything I have written is my personal opinion, based on my experience and knowledge so far. If something is wrong or missing, please feel free to contact me; I’d love to discuss it with you. Thank you.
Background and motivation
This dataset is about Kickstarter, a platform for raising money for creative projects. It was published by Mickaël Mouillé and last updated in 2018. To download the dataset and see the details, you can visit Kaggle.
Background
One of the biggest obstacles to developing a project or business is money. Sometimes we already have an idea for our business, and maybe we even know how it will run, but a lack of funds keeps it from getting off the ground.
That is why Kickstarter was born. It’s a platform where people can share their visions for creative work with communities that come together to fund them. Simply put, it’s a platform for bringing creative projects to life.
Motivation
The main objective of this analysis is to summarize the insights we can get from this dataset and to understand how we can launch a project successfully on the Kickstarter platform.
First look at the data
# Data Input and Checking Data
data <- read.csv("data/kickstarter.csv")
str(data)
## 'data.frame': 378661 obs. of 15 variables:
## $ ID : int 1000002330 1000003930 1000004038 1000007540 1000011046 1000014025 1000023410 1000030581 1000034518 100004195 ...
## $ name : chr "The Songs of Adelaide & Abullah" "Greeting From Earth: ZGAC Arts Capsule For ET" "Where is Hank?" "ToshiCapital Rekordz Needs Help to Complete Album" ...
## $ category : chr "Poetry" "Narrative Film" "Narrative Film" "Music" ...
## $ main_category : chr "Publishing" "Film & Video" "Film & Video" "Music" ...
## $ currency : chr "GBP" "USD" "USD" "USD" ...
## $ deadline : chr "2015-10-09" "2017-11-01" "2013-02-26" "2012-04-16" ...
## $ goal : num 1000 30000 45000 5000 19500 50000 1000 25000 125000 65000 ...
## $ launched : chr "2015-08-11 12:12:28" "2017-09-02 04:43:57" "2013-01-12 00:20:50" "2012-03-17 03:24:11" ...
## $ pledged : num 0 2421 220 1 1283 ...
## $ state : chr "failed" "failed" "failed" "failed" ...
## $ backers : int 0 15 3 1 14 224 16 40 58 43 ...
## $ country : chr "GB" "US" "US" "US" ...
## $ usd.pledged : num 0 100 220 1 1283 ...
## $ usd_pledged_real: num 0 2421 220 1 1283 ...
## $ usd_goal_real : num 1534 30000 45000 5000 19500 ...
# Changing data type and inspecting the dataset
data$launched <- as.Date(data$launched, "%Y-%m-%d")
data$deadline <- as.Date(data$deadline, "%Y-%m-%d")
colSums(is.na(data))
##               ID             name         category    main_category
##                0                0                0                0
##         currency         deadline             goal         launched
##                0                0                0                0
##          pledged            state          backers          country
##                0                0                0                0
##      usd.pledged usd_pledged_real    usd_goal_real
##             3797                0                0
Our dataset consists of 378661 observations and 15 variables. The usd.pledged variable has 3797 missing values, but because we won’t use it, we will drop it along with ID, usd_pledged_real, and usd_goal_real.
To keep things simple, we will also keep only the campaigns that use USD, so that goal and pledged amounts are directly comparable and differences between currency values don’t show up as outliers in the analysis.
# Drop unused variables
data_clean <- subset(data, select = -c(ID, usd.pledged, usd_pledged_real, usd_goal_real))
# Get only the campaigns that use USD currency
data_clean <- data_clean[data_clean$currency == "USD",]
str(data_clean)
## 'data.frame': 295365 obs. of 11 variables:
## $ name : chr "Greeting From Earth: ZGAC Arts Capsule For ET" "Where is Hank?" "ToshiCapital Rekordz Needs Help to Complete Album" "Community Film Project: The Art of Neighborhood Filmmaking" ...
## $ category : chr "Narrative Film" "Narrative Film" "Music" "Film & Video" ...
## $ main_category: chr "Film & Video" "Film & Video" "Music" "Film & Video" ...
## $ currency : chr "USD" "USD" "USD" "USD" ...
## $ deadline : Date, format: "2017-11-01" "2013-02-26" ...
## $ goal : num 30000 45000 5000 19500 50000 1000 25000 125000 65000 12500 ...
## $ launched : Date, format: "2017-09-02" "2013-01-12" ...
## $ pledged : num 2421 220 1 1283 52375 ...
## $ state : chr "failed" "failed" "failed" "canceled" ...
## $ backers : int 15 3 1 14 224 16 40 58 43 100 ...
## $ country : chr "US" "US" "US" "US" ...
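One side effect of the as.Date() conversion is worth seeing on a single value: the launched column carries a time of day, and converting with the format "%Y-%m-%d" keeps only the calendar date. The sketch below uses the first launched and deadline values printed in the str() output; the variable names are made up for this illustration.

```r
# Values copied from the first row of the str() output above
launched_raw <- "2015-08-11 12:12:28"
deadline_raw <- "2015-10-09"

# With format "%Y-%m-%d", as.Date() parses the leading date and ignores the time
launched_date <- as.Date(launched_raw, "%Y-%m-%d")
deadline_date <- as.Date(deadline_raw, "%Y-%m-%d")
print(launched_date)

# Subtracting two Date values gives a whole number of calendar days
duration_days <- as.numeric(deadline_date - launched_date)
print(duration_days)
```

Because the time of day is dropped, any duration computed from these two columns is counted in whole calendar days.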
# Create a pie chart to see the portion of each state in our dataset
state <- table(data_clean$state)
pie(state, main = "Portion of Kickstarter Campaign States")
Inside the state variable, we have several states other than failed and successful. From the pie chart, we can see that the failed and successful states together cover more than 50% of our data, so to make our analysis more straightforward we will keep only these two states.
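Shares are hard to read precisely from a pie chart, so it can help to print them as numbers. The following is a minimal sketch on a small made-up vector (toy_state is hypothetical and stands in for data_clean$state); the same two calls should work on the real column.

```r
# Hypothetical toy data standing in for data_clean$state (not the real dataset)
toy_state <- c("failed", "failed", "failed", "successful", "successful",
               "canceled", "live", "failed", "successful", "suspended")

# Share of each state, sorted from largest to smallest
state_share <- sort(prop.table(table(toy_state)), decreasing = TRUE)
print(round(state_share, 2))

# Combined share of the two states we keep
kept_share <- sum(state_share[c("failed", "successful")])
print(kept_share)
```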
# Remove observations with a state other than failed and successful
data_clean <- data_clean[data_clean$state == "failed" | data_clean$state == "successful",]
unique(data_clean$state)
## [1] "failed" "successful"
Top categories and sub-categories
The dataset has two category variables: category and main_category. Here, category is the sub-category, a more specific label within main_category.
Main category
# Get main categories that have the most campaigns on Kickstarter
main_cat <- as.data.frame(sort(table(data_clean$main_category), decreasing = T))
top_main_cat <- data_clean[data_clean$main_category %in% head(main_cat$Var1, 7),]
campaign_state <- top_main_cat$state
main_category <- top_main_cat$main_category
xtabs(~ campaign_state + main_category, top_main_cat)
##               main_category
## campaign_state   Art Design Film & Video Games Music Publishing Technology
##     failed     10953  10804        27159 10873 18548      18731      13338
##     successful  9496   7681        19791  9356 21788       9965       4724
# Create a barplot for the top main categories with their states separated
barplot(xtabs(~ top_main_cat$state + top_main_cat$main_category, top_main_cat), cex.names = .75, legend = T)
Based on the data, the main category with the most campaigns is Film & Video, with about 47,000 campaigns (27,159 failed and 19,791 successful) in our filtered data. As we can see, failed campaigns outnumber successful ones in most main categories, including Film & Video. The only main category with more successful than failed campaigns is Music.
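To put numbers on the comparison, the counts from the cross-tabulation can be turned into totals and success rates per main category. This is a small sketch that retypes the printed counts by hand; the object names (counts, total_campaigns, success_rate) are chosen only for this example.

```r
# Counts copied from the cross-tabulation printed above
counts <- rbind(
  failed     = c(10953, 10804, 27159, 10873, 18548, 18731, 13338),
  successful = c( 9496,  7681, 19791,  9356, 21788,  9965,  4724)
)
colnames(counts) <- c("Art", "Design", "Film & Video", "Games",
                      "Music", "Publishing", "Technology")

# Total campaigns and share of successful campaigns per main category
total_campaigns <- colSums(counts)
success_rate <- counts["successful", ] / total_campaigns

print(total_campaigns)
print(round(sort(success_rate, decreasing = TRUE), 3))
```

On the full data frame, prop.table(xtabs(~ state + main_category, top_main_cat), margin = 2) should give the same rates without retyping anything.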
Sub-category
# subset the dataset to music main category
data_music <- data_clean[data_clean$main_category == "Music",]
head(data_music)
##                                                  name   category
## 4 ToshiCapital Rekordz Needs Help to Complete Album Music
## 12 Lisa Lim New CD! Indie Rock
## 19 Mike Corey's Darkness & Light Album Music
## 26 Matt Cavenaugh & Jenny Powers make their 1st album! Music
## 35 Chris Eger Band - New Nashville Record! Music
## 37 Arrows & Sound Debut Album Indie Rock
## main_category currency deadline goal launched pledged state
## 4 Music USD 2012-04-16 5000 2012-03-17 1.00 failed
## 12 Music USD 2013-04-08 12500 2013-03-09 12700.00 successful
## 19 Music USD 2012-08-17 250 2012-08-02 250.00 successful
## 26 Music USD 2011-01-06 10000 2010-12-07 15827.00 successful
## 35 Music USD 2014-08-13 12000 2014-07-14 13260.00 successful
## 37 Music USD 2012-05-19 4000 2012-04-19 8641.34 successful
## backers country
## 4 1 US
## 12 100 US
## 19 7 US
## 26 147 US
## 35 92 US
## 37 157 US
# Music unique genres
names(data_music)[names(data_music) == "category"] <- "genre"
unique(data_music$genre)
## [1] "Music" "Indie Rock" "Pop" "Rock"
## [5] "Jazz" "Electronic Music" "Metal" "Hip-Hop"
## [9] "World Music" "Punk" "Classical Music" "Country & Folk"
## [13] "R&B" "Faith" "Latin" "Kids"
## [17] "Blues" "Comedy" "Chiptune"
As with the main categories, we will try to figure out which music category (genre) has the most campaigns on Kickstarter.
# Get the top music genre on Kickstarter
genre <- as.data.frame(sort(table(data_music$genre), decreasing = T))
head(genre)
##             Var1  Freq
## 1 Music 11037
## 2 Rock 5650
## 3 Indie Rock 5008
## 4 Country & Folk 3888
## 5 Hip-Hop 3160
## 6 Pop 2632
# Create a barplot for the top music genres with their states separated
top_genre <- data_music[data_music$genre %in% head(genre$Var1),]
barplot(xtabs(~ top_genre$state + top_genre$genre, top_genre), legend = T, cex.names = .75,
        main = "Top music genres with states separated")
Unfortunately, the largest single group of Music projects submitted on Kickstarter (11,037 of 40,336, or about 27%) has no specific genre and is categorized simply as Music. Therefore, we will not explore these genres any deeper.
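The size of the generic Music label can be checked with a little arithmetic on the numbers already printed: the Music column of the cross-tabulation gives the total, and head(genre) gives the six largest genres. The sketch below retypes those numbers by hand; music_total, top_genres and genre_share are names made up for this example.

```r
# Counts copied from the outputs printed above
music_total <- 18548 + 21788   # failed + successful Music campaigns
top_genres <- c("Music" = 11037, "Rock" = 5650, "Indie Rock" = 5008,
                "Country & Folk" = 3888, "Hip-Hop" = 3160, "Pop" = 2632)

# Share of all Music campaigns that falls under each of the top genres
genre_share <- top_genres / music_total
print(round(genre_share, 3))
```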
Ideal fund goal and deadline date
According to Kickstarter, campaigns can last anywhere from 1 to 60 days. As for the fund goal, there is no limit on the amount the creator can set.
Deadline date
A common assumption when raising money is that the more time we have, the more support we get; in other words, the longer the deadline, the better. But does this assumption hold when starting a campaign on Kickstarter?
# Make a new variable to store the campaign duration (days)
data_clean$campaign_duration <- as.numeric(data_clean$deadline - data_clean$launched)
# Campaign duration distribution
data_clean_success <- data_clean[data_clean$state == "successful",]
hist(data_clean_success$campaign_duration, xlim=c(0,60), col="dark grey",
xlab="Campaign Duration (days)",
ylab="Number of submission",
     main="Distribution of Successful Campaigns")
# The average duration of successful campaigns (days)
mean(data_clean_success$campaign_duration)
## [1] 32.39186
According to the distribution, a longer deadline does not guarantee that our campaign will succeed. Successful campaigns ran for about 32 days on average, so a deadline of around 32 days after launch is a reasonable choice. This also means that you don’t have to set a long deadline to get fully funded.
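The histogram only looks at successful campaigns. A complementary check, sketched here on made-up data, is to compare the share of successful campaigns across duration buckets. The toy data frame and the bucket boundaries are assumptions chosen purely for illustration; with the real data, data_clean would take the place of toy.

```r
# Hypothetical toy data (not the real dataset): duration in days and outcome
toy <- data.frame(
  campaign_duration = c(10, 14, 20, 29, 30, 30, 31, 35, 45, 59, 60, 60),
  state = c("successful", "failed", "successful", "successful", "failed",
            "successful", "successful", "failed", "failed", "failed",
            "successful", "failed")
)

# Bucket the durations; the break points are an arbitrary illustrative choice
toy$duration_bucket <- cut(toy$campaign_duration,
                           breaks = c(0, 15, 30, 45, 60),
                           labels = c("1-15", "16-30", "31-45", "46-60"))

# Share of successful campaigns within each bucket
rate_by_bucket <- tapply(toy$state == "successful", toy$duration_bucket, mean)
print(round(rate_by_bucket, 2))
```

If the success rate on the real data does not rise with the longer buckets, that would support the reading that a longer deadline does not help.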
Fund goal
The amount of money requested is something people consider before they take part in your project. We will check whether a lower goal goes together with a higher chance of the campaign succeeding.
# Create a boxplot to see the distribution of goal based on the campaign states
boxplot(goal ~ state, data = data_clean, outline = FALSE,
        main = "Distribution of campaigns' goals",
        ylab = "Goal",
        xlab = "State")
As the boxplots show, once the outliers are hidden, successful campaigns clearly tend to have lower goals than failed ones.
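The same comparison can be made numerically with the median goal per state, which, like the boxplot, is not pulled around by a few extreme goals. This is a minimal sketch on made-up numbers (the toy data frame is hypothetical); with the real data the call would use data_clean$goal and data_clean$state.

```r
# Hypothetical toy data (not the real dataset)
toy <- data.frame(
  goal  = c(500, 1000, 2500, 5000, 4000, 8000, 20000, 1000000),
  state = c("successful", "successful", "successful", "successful",
            "failed", "failed", "failed", "failed")
)

# Median goal per state; the median is robust to extreme goals
median_goal <- tapply(toy$goal, toy$state, median)
print(median_goal)

# Mean goal per state, for contrast: one huge goal dominates the failed group
mean_goal <- tapply(toy$goal, toy$state, mean)
print(mean_goal)
```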
The most desirable product
We have already found which categories have the most submissions on Kickstarter. However, that result is based only on the number of submissions, so we can’t yet say that those categories are the best. We also have to look from the perspective of the backers. To do this, we consider the number of backers who participated in the campaigns of each category and sub-category.
# Number of campaigns
freq <- as.data.frame(xtabs(~ main_category + category, data = data_clean))
# Number of backers
freq$backers <- as.data.frame(xtabs(backers ~ main_category + category, data = data_clean))$Freq
# The average number of backers from each campaign
freq$avg_backers <- freq$backers / freq$Freq
head(freq[order(freq$avg_backers, decreasing = T),], 10)
##      main_category         category Freq backers avg_backers
## 326 Music Chiptune 22 14147 643.0455
## 254 Technology Camera Equipment 253 149975 592.7866
## 1994 Technology Sound 357 203035 568.7255
## 1228 Publishing Letterpress 4 2232 558.0000
## 2049 Games Tabletop Games 8939 4968628 555.8371
## 954 Games Gaming Hardware 204 113170 554.7549
## 2249 Technology Wearables 674 361124 535.7923
## 2229 Games Video Games 6250 3302201 528.3522
## 929 Technology Gadgets 1613 684253 424.2114
## 2165 Design Typography 62 25847 416.8871
Although the number of submissions differs hugely between categories, we can still compare them by dividing the total number of backers by the number of submissions (Freq), which gives the average number of backers per campaign. From the perspective of the backers, the most desirable product is in the Music main category, specifically the Chiptune genre, with an average of 643 backers per campaign. Note, however, that this average is based on only 22 campaigns.
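Averages over very few campaigns can be swayed by one or two popular projects, so one possible sanity check is to rank only the sub-categories with a minimum number of campaigns. The sketch below retypes the ten printed rows; min_campaigns is a hypothetical cutoff picked only for illustration, not a value from the original analysis.

```r
# Rows copied from the top-10 table printed above
top10 <- data.frame(
  main_category = c("Music", "Technology", "Technology", "Publishing", "Games",
                    "Games", "Technology", "Games", "Technology", "Design"),
  category = c("Chiptune", "Camera Equipment", "Sound", "Letterpress",
               "Tabletop Games", "Gaming Hardware", "Wearables", "Video Games",
               "Gadgets", "Typography"),
  Freq = c(22, 253, 357, 4, 8939, 204, 674, 6250, 1613, 62),
  backers = c(14147, 149975, 203035, 2232, 4968628, 113170, 361124, 3302201,
              684253, 25847)
)
top10$avg_backers <- top10$backers / top10$Freq

# min_campaigns is a hypothetical cutoff chosen only for illustration
min_campaigns <- 100
robust <- top10[top10$Freq >= min_campaigns, ]
robust <- robust[order(robust$avg_backers, decreasing = TRUE), ]
print(robust[, c("main_category", "category", "Freq", "avg_backers")])
```

With this particular cutoff, Chiptune, Letterpress and Typography drop out and the remaining rows all belong to Technology and Games; a different cutoff would give a different list.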
Conclusion
Now, what can we conclude from the Kickstarter dataset?
- According to our analysis above, Film & Video, Music, Publishing, Art, Games, Design, and Technology are the top 7 main categories by number of campaigns on Kickstarter, with more than 18,000 campaigns each.
- Among the top main categories, Music is the only one with a success rate of more than 50%, but its largest group of campaigns has no specific genre.
- A campaign duration of around 32 days is a reasonable target, as this is the average duration of successful campaigns.
- Most successful campaigns also have a relatively low funding goal. A lower goal is easier to reach in full, and reaching the goal is what makes a campaign successful.
- From the perspective of the backers, the most desirable main categories are Music, Technology, Publishing, and Games, because several of their sub-categories average more than 500 backers per campaign.