ABSTRACT

In this exploratory analysis of Kickstarter data, I found that the most common campaigns are those in Film & Video and the least common are those in Dance. Although Film & Video is the most populated category, Music has a statistically significantly higher proportion of successful campaigns. Evaluating the number of campaigns launched each year, I found that the number peaked in 2015 and has steadily declined since. The category with the highest number of backers was Games, followed by Design. Between the two, Games had a statistically significantly higher mean number of backers per campaign.

INTRODUCTION

Kickstarter is a crowdfunding company that allows members of the public to contribute financially to creative projects they would like to support. In this exploration of Kickstarter data, I evaluate the most and least common campaign categories, the varying levels of success in each category, and the popularity of Kickstarter over the years 2009-2020.

DATA AND METHODOLOGY

The data are observational and were collected via web scrape, which allowed the original author to gather data on all Kickstarter campaigns in this period (2009-2020). Because the data were collected from the Kickstarter website as a whole, there was not necessarily a sampling scheme. The data set contains 506,199 Kickstarter campaigns with many different variables and units. The variables include category, subcategory, project location, launch and deadline dates, funding goal and amount pledged, type of currency, state of the campaign (successful, unsuccessful, etc.), and number of backers. Throughout my exploration of the data, I created additional columns from existing data to help represent the information visually in my graphics. The units include monetary values in multiple currencies and counts for categories and backers.
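As a rough illustration of how such derived columns might be built, the sketch below adds a success flag and a launch year to a toy data frame. The column names `SUCCESS`, `CATEGORY`, and `LAUNCH_YEAR` appear in the output later in this report; `STATE` and `LAUNCHED`, and every value in the toy data, are assumptions made for the example and may not match the scraped file.

```r
# Toy stand-in for the scraped data.
# STATE and LAUNCHED are assumed column names; the rows are made up.
ks <- data.frame(
  CATEGORY = c("Music", "Games", "Film & Video", "Music"),
  STATE    = c("successful", "failed", "canceled", "successful"),
  LAUNCHED = c("2012-03-14", "2015-07-01", "2015-11-20", "2020-01-05"),
  stringsAsFactors = FALSE
)

# Flag campaigns that met their funding goal.
ks$SUCCESS <- ifelse(ks$STATE == "successful", "YES", "NO")

# Extract the launch year from the launch date.
ks$LAUNCH_YEAR <- as.numeric(format(as.Date(ks$LAUNCHED), "%Y"))

print(ks)
```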

As I am not the original collector of the data, some limitations arise. I am unaware of any bias in the data collection, and I am unsure whether any controls were implemented to limit bias in the collection process. I know that the data were collected via web scrape, but I do not know exactly how the scrape was carried out or whether any information was omitted.

RESULTS

In the above graphic, the Kickstarter campaign categories in the data set are sorted by frequency. The most common category is Film & Video, followed closely by both Music and Games. On the lower end are Crafts, Journalism, and, as the least common category, Dance.

This graphic further evaluates the most common Kickstarter categories by their number of successful fundraisers. A campaign is defined as successful if it meets the monetary goal set by the organizer at its outset. Though Film & Video is the most common category, the category with the most successful outcomes is Music.
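One way to tabulate campaign counts, success counts, and success rates with a single row per category is sketched below in base R. The data frame is a made-up toy example, and `summary_tbl` and `SUCCESS_RATE` are hypothetical names introduced here; only `CATEGORY`, `SUCCESS`, `CATEGORY_COUNT`, and `SUCCESS_COUNT` come from the report's own output.

```r
# Toy data for illustration only; not the scraped data set.
ks <- data.frame(
  CATEGORY = c("Music", "Music", "Film & Video", "Film & Video",
               "Film & Video", "Dance"),
  SUCCESS  = c("YES", "NO", "YES", "NO", "NO", "YES"),
  stringsAsFactors = FALSE
)

# Number of campaigns and number of successful campaigns per category.
category_count <- table(ks$CATEGORY)
success_count  <- tapply(ks$SUCCESS == "YES", ks$CATEGORY, sum)

# One row per category, with the share of campaigns that succeeded.
summary_tbl <- data.frame(
  CATEGORY       = names(category_count),
  CATEGORY_COUNT = as.integer(category_count),
  SUCCESS_COUNT  = as.integer(success_count[names(category_count)]),
  stringsAsFactors = FALSE
)
summary_tbl$SUCCESS_RATE <- summary_tbl$SUCCESS_COUNT / summary_tbl$CATEGORY_COUNT

# Sort from most to least common category.
summary_tbl <- summary_tbl[order(-summary_tbl$CATEGORY_COUNT), ]
print(summary_tbl)
```

Keeping the count and the rate side by side makes it easier to see when the most common category is not the one with the highest success rate.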

## Adding missing grouping variables: `SUCCESS`
## # A tibble: 194,458 × 3
## # Groups:   CATEGORY, SUCCESS [15]
##    SUCCESS CATEGORY     SUCCESS_COUNT
##    <chr>   <chr>                <int>
##  1 YES     Games                24028
##  2 YES     Games                24028
##  3 YES     Games                24028
##  4 YES     Games                24028
##  5 YES     Technology            9423
##  6 YES     Film & Video         28597
##  7 YES     Games                24028
##  8 YES     Design               17000
##  9 YES     Film & Video         28597
## 10 YES     Technology            9423
## # ℹ 194,448 more rows
## # A tibble: 75,808 × 2
## # Groups:   CATEGORY [1]
##    CATEGORY     CATEGORY_COUNT
##    <chr>                 <int>
##  1 Film & Video          75808
##  2 Film & Video          75808
##  3 Film & Video          75808
##  4 Film & Video          75808
##  5 Film & Video          75808
##  6 Film & Video          75808
##  7 Film & Video          75808
##  8 Film & Video          75808
##  9 Film & Video          75808
## 10 Film & Video          75808
## # ℹ 75,798 more rows
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(x_m, x_fv) out of c(n_m, n_fv)
## X-squared = 2268.8, df = 1, p-value < 2.2e-16
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.1228494 1.0000000
## sample estimates:
##    prop 1    prop 2 
## 0.5003239 0.3730903

Using a 2-proportion z-test, I evaluated whether the proportion of successful campaigns in Music was significantly greater than the proportion in Film & Video. The assumptions for a 2-proportion z-test were satisfied, as the data had a large enough sample size, independent groups, and independent observations. I chose these two categories because they are the two most common, and because Music was more successful even though Film & Video was more common. The test returned a p-value of < 2.2e-16, so the result is significant: the proportion of successful campaigns in Music (about 0.50) is statistically significantly greater than the proportion in Film & Video (about 0.37).
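A call of roughly the following shape would produce output like that shown above. This is a minimal sketch using `prop.test()` from base R. The variable names `x_m`, `n_m`, `x_fv`, and `n_fv` are taken from the printed output, but the counts assigned to them here are hypothetical placeholders, not the values from the data set.

```r
# Hypothetical counts for illustration only; not the values from the data set.
x_m  <- 500; n_m  <- 1000   # successful / total Music campaigns
x_fv <- 560; n_fv <- 1500   # successful / total Film & Video campaigns

# One-sided test of H1: Music success proportion > Film & Video proportion.
result <- prop.test(
  x = c(x_m, x_fv),
  n = c(n_m, n_fv),
  alternative = "greater",
  correct = TRUE            # continuity correction, as in the output above
)
print(result)

# Rule-of-thumb sample size check: at least 10 successes and 10 failures
# in each group.
counts <- c(x_m, n_m - x_m, x_fv, n_fv - x_fv)
all(counts >= 10)
```

`prop.test()` reports a chi-squared statistic with one degree of freedom; for two groups this is the square of the z statistic, so it corresponds to the 2-proportion z-test described here.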

However, the three categories with the most backers differ from the most common types of campaigns. The category with the most supporters is Games, by a large margin. Other heavily backed categories include Design and, as expected, Film & Video.

## 
##  Welch Two Sample t-test
## 
## data:  BACKERS_COUNT by CATEGORY
## t = -19.458, df = 92494, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Design and group Games is less than 0
## 95 percent confidence interval:
##       -Inf -20.72102
## sample estimates:
## mean in group Design  mean in group Games 
##             96.30805            118.94246

Above, I ran a Welch two-sample t-test to compare the mean number of backers per campaign in the Games and Design categories (the two with the most backers). The test returned a p-value of < 2.2e-16, a statistically significant result: the mean for Games (about 118.9 backers) is higher than the mean for Design (about 96.3). The assumptions for this test were met, as the groups are independent (Games vs. Design), the observations in each group are independent, the sample is random (taken across all collected data in ds), and, with groups this large, the sample means are approximately normally distributed even if the backer counts themselves are skewed.
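The test above can be sketched with `t.test()` from base R, which performs the Welch version by default. The column names `BACKERS_COUNT` and `CATEGORY` match the printed output, but the data below are simulated purely for illustration, so the numbers will not match the report.

```r
set.seed(1)

# Simulated, right-skewed backer counts; purely illustrative.
df <- data.frame(
  CATEGORY      = rep(c("Design", "Games"), each = 500),
  BACKERS_COUNT = c(rnbinom(500, size = 0.5, mu = 95),
                    rnbinom(500, size = 0.5, mu = 120))
)

# Groups are ordered alphabetically (Design, Games), so "less" tests
# H1: mean(Design) - mean(Games) < 0, as in the output above.
result <- t.test(BACKERS_COUNT ~ CATEGORY, data = df, alternative = "less")
print(result)

# Comparing means with medians gives a quick sense of skew in each group.
tapply(df$BACKERS_COUNT, df$CATEGORY,
       function(x) c(mean = mean(x), median = median(x)))
```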

Kickstarter has been more popular in some years than in others. It was founded in April 2009 and rose consistently in popularity until its peak in 2015. From then until 2020, the fundraising website saw a steady decline in campaigns.

## Adding missing grouping variables: `CATEGORY`
## # A tibble: 2,057 × 3
## # Groups:   CATEGORY, LAUNCH_YEAR [1]
##    CATEGORY LAUNCH_YEAR MUSIC_COUNT
##    <chr>          <dbl>       <int>
##  1 Music           2020        2057
##  2 Music           2020        2057
##  3 Music           2020        2057
##  4 Music           2020        2057
##  5 Music           2020        2057
##  6 Music           2020        2057
##  7 Music           2020        2057
##  8 Music           2020        2057
##  9 Music           2020        2057
## 10 Music           2020        2057
## # ℹ 2,047 more rows

## 
## Call:
## lm(formula = Count ~ Year, data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -5405  -1914    240   2528   3584 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 166328.76  510600.26   0.326    0.751
## Year           -79.95     253.46  -0.315    0.759
## 
## Residual standard error: 3031 on 10 degrees of freedom
## Multiple R-squared:  0.009851,   Adjusted R-squared:  -0.08916 
## F-statistic: 0.09949 on 1 and 10 DF,  p-value: 0.7589

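Above, I ran a linear regression of the number of Music campaigns per year on the year. The estimated slope is -79.95 campaigns per year, but with a p-value of 0.759 and an R-squared of about 0.01, there is no evidence of a linear trend over 2009-2020.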
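A fit of this kind can be sketched as follows. The formula `Count ~ Year` and the data frame name `df` come from the printed call, but the yearly counts below are hypothetical placeholders. The quadratic fit at the end is not part of the original analysis; it is included only as one possible way to check whether a rise-then-fall pattern is being missed by a straight line.

```r
# Hypothetical yearly counts for illustration; not the values from the data set.
df <- data.frame(
  Year  = 2009:2020,
  Count = c(900, 4200, 8500, 9800, 9000, 8700,
            8200, 6500, 5600, 4800, 4100, 2100)
)

# Simple linear regression of yearly count on year.
fit <- lm(Count ~ Year, data = df)
summary(fit)$coefficients   # estimate, std. error, t value, p-value

# Assumption: a quadratic term as a rough check for curvature.
fit_quad <- lm(Count ~ poly(Year, 2), data = df)
c(linear    = summary(fit)$r.squared,
  quadratic = summary(fit_quad)$r.squared)
```

With 12 yearly observations and two estimated coefficients, the linear fit has 10 residual degrees of freedom, matching the summary above.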

In the above graphic, I fit a linear model to the number of Music campaigns over the years 2009-2020. Interestingly, the overall number of Kickstarter campaigns peaked in 2015, while the number of Music campaigns peaked in 2012.

CONCLUSION

The most common Kickstarter category was Film & Video, though the category with the most successful campaigns was Music. The most widely supported campaigns are those for Games. The least common category was Dance, and Journalism had the lowest number of successful campaigns. Through a 2-proportion z-test, I found a statistically significant difference between the proportion of successful campaigns in Music and the proportion in Film & Video. The category with the fewest supporters overall was Dance, while Games boasted a high number of backers. Through a two-sample t-test, I found that of the two most highly backed categories, Games and Design, Games had a statistically significantly higher mean number of backers. Kickstarter rose in popularity from its founding in 2009 to its peak in 2015 and has faced a steady decline since.

For further data exploration, it would be interesting to evaluate each category over the years to see whether each also peaked in 2015, as the overall number of campaigns did.

APPENDIX