In this exploratory analysis of Kickstarter data, I found that the most common campaigns are those in Film & Video and the least common are those in Dance. Although Film & Video is the most populated category, Music has a higher proportion of successful campaigns, and the difference is statistically significant. Counting the campaigns launched each year, I found that the number of campaigns peaked in 2015 and has steadily declined since. The category with the highest number of backers was Games, followed by Design. Between the two, Games had a statistically significantly higher mean number of backers per campaign.
Kickstarter is a crowdfunding company that lets members of the general public contribute financially to creative projects they would like to support. In this exploration of Kickstarter data, I evaluate the most and least common campaign categories, the varying levels of success in each category, and the popularity of Kickstarter over the years 2009-2020.
The data evaluated here is observational data collected via web scrape, which allowed the original author to gather data on all Kickstarter campaigns in this period (2009-2020). Because the data was collected directly from the Kickstarter website, there was not necessarily a sampling scheme. The data set contains 506,199 Kickstarter campaigns with many different variables and units. The variables are category, subcategory, project location, launch and deadline dates, funding goal and amount pledged, type of currency, state of the campaign (successful, non-successful, etc.), and number of backers. Throughout my exploration of the data, I created additional columns from existing data to help represent the information visually in my graphics. The units in this data set include monetary values in multiple currencies and counts for categories and backers.
As I am not the original collector of the data, some limitations arise. I am unaware of any bias in the data collection, and I am unsure whether any controls were implemented to limit bias in the collection process. I know that the data was collected via web scrape, but I do not know exactly how the scrape was carried out or whether any information was omitted along the way.
In the above graphic, the Kickstarter campaign categories in the data set are sorted by frequency. The most common category is Film & Video, followed closely by Music and Games. At the lower end are Crafts, Journalism, and, least common of all, Dance.
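The frequency ranking described above can be sketched in a few lines. This is a minimal illustration in Python rather than the R used to produce this report's output. The short `categories` list is a hypothetical stand-in for the data set's category column, and the function name `category_frequencies` is my own.

```python
from collections import Counter

# Hypothetical sample of category labels (the real data set has 506,199 rows).
categories = [
    "Film & Video", "Music", "Games", "Film & Video",
    "Dance", "Music", "Film & Video",
]

def category_frequencies(labels):
    """Return (category, count) pairs sorted from most to least common."""
    return Counter(labels).most_common()

ranked = category_frequencies(categories)
most_common = ranked[0][0]    # first entry has the highest count
least_common = ranked[-1][0]  # last entry has the lowest count
```

Sorting once and reading both ends of the list gives the most and least common categories from a single pass over the data.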
This graphic further evaluates the most common Kickstarter categories by their number of successful fundraisers. A campaign is defined as successful if it meets the monetary goal set by the organizer at its outset. Though Film & Video is the most common category, the category with the most successful outcomes is Music.
## # A tibble: 194,458 × 3
## # Groups: CATEGORY, SUCCESS [15]
## SUCCESS CATEGORY SUCCESS_COUNT
## <chr> <chr> <int>
## 1 YES Games 24028
## 2 YES Technology 9423
## 3 YES Film & Video 28597
## 4 YES Design 17000
## # ℹ repeated rows omitted; 194,458 rows in 15 groups in total
## # A tibble: 75,808 × 2
## # Groups: CATEGORY [1]
## CATEGORY CATEGORY_COUNT
## <chr> <int>
## 1 Film & Video 75808
## # ℹ 75,807 more rows, all identical to row 1
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(x_m, x_fv) out of c(n_m, n_fv)
## X-squared = 2268.8, df = 1, p-value < 2.2e-16
## alternative hypothesis: greater
## 95 percent confidence interval:
## 0.1228494 1.0000000
## sample estimates:
## prop 1 prop 2
## 0.5003239 0.3730903
Through a 2-proportion z-test, I evaluated whether the proportion of successful campaigns in Music was significantly greater than the proportion in Film & Video (the test was one-sided, as the output above shows). The assumptions for a 2-proportion z-test were satisfied: the sample sizes were large, the groups were independent, and the observations were independent. I chose these two categories because they are the two most common, yet Music had the higher success rate even though Film & Video had more campaigns. The test returned a p-value below 2.2e-16, so the result is statistically significant: the estimated success proportion is about 0.50 for Music versus about 0.37 for Film & Video.
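To make the mechanics of the test concrete, here is a minimal sketch of a one-sided pooled two-proportion z-test, written in Python rather than the R used for the output above. The function name and the counts passed to it are hypothetical, not values from the data set. It omits the continuity correction that R's `prop.test` applies by default, so its numbers would differ slightly from the R output.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided pooled z-test of H1: p1 > p2. Returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one success probability, estimated by pooling.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts, chosen only for illustration.
z, p = two_proportion_z_test(x1=500, n1=1000, x2=373, n2=1000)
```

A large positive `z` with a small `p` supports the one-sided alternative that the first group's success proportion is greater.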
However, the three categories with the highest number of
backers/supporters are different from the most common types of
Kickstarter campaigns. The category with the most supporters is Games,
by a large margin. Other heavily backed categories include Design and,
as expected, Film & Video.
##
## Welch Two Sample t-test
##
## data: BACKERS_COUNT by CATEGORY
## t = -19.458, df = 92494, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Design and group Games is less than 0
## 95 percent confidence interval:
## -Inf -20.72102
## sample estimates:
## mean in group Design mean in group Games
## 96.30805 118.94246
Above, I ran a Welch two-sample t-test to compare the number of backers per campaign in the Games and Design categories (the two with the most backers). The test was one-sided, with the alternative that the Design mean is lower than the Games mean. The sample means were about 96.3 backers for Design and 118.9 for Games, and the p-value was below 2.2e-16, so the difference is statistically significant. The assumptions for this test were met: the groups are independent (Games vs. Design), the observations in each group are independent, the sample is random (taken across all collected data in ds), and normality holds in both groups.
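The statistic behind a Welch test can be sketched as follows, again in Python rather than the R used for the output above. The two short lists are hypothetical backer counts and the function name `welch_t` is my own. The sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom, because a p-value needs the t distribution's CDF, which the Python standard library does not provide.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical backer counts, chosen only for illustration.
design = [10, 40, 95, 120, 60]
games = [30, 150, 80, 210, 125]
t, df = welch_t(design, games)
```

A negative `t` here means the first group's mean is lower than the second's, which matches the direction of the one-sided alternative reported in the output.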
Kickstarter has been more popular in some years than in others. It was
founded in April of 2009 and rose in popularity consistently until its
peak in 2015. From then to 2020, the fundraising website has faced a
steady decline in campaigns.
## # A tibble: 2,057 × 3
## # Groups: CATEGORY, LAUNCH_YEAR [1]
## CATEGORY LAUNCH_YEAR MUSIC_COUNT
## <chr> <dbl> <int>
## 1 Music 2020 2057
## # ℹ 2,056 more rows, all identical to row 1
##
## Call:
## lm(formula = Count ~ Year, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5405 -1914 240 2528 3584
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 166328.76 510600.26 0.326 0.751
## Year -79.95 253.46 -0.315 0.759
##
## Residual standard error: 3031 on 10 degrees of freedom
## Multiple R-squared: 0.009851, Adjusted R-squared: -0.08916
## F-statistic: 0.09949 on 1 and 10 DF, p-value: 0.7589
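Above, I fit a linear regression of the number of Music campaigns per year on the launch year. The estimated slope is about -80 campaigns per year, but it is not significantly different from zero (p = 0.759), and the model explains only about 1% of the variance (R-squared = 0.0099). There is therefore no evidence of a linear trend in Music campaigns over the period.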
In the above graphic, I have fit a linear model to the number of Music campaigns over the years 2009-2020. It is interesting that the overall number of Kickstarter campaigns peaked in 2015, but the number of Music campaigns peaked in 2012.
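For readers who want to see what a simple linear fit computes, here is a minimal least-squares sketch in Python, not the R `lm` call that produced the output above. The yearly counts are hypothetical values that rise and then fall, and the function name `simple_ols` is my own.

```python
from statistics import mean

def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical yearly campaign counts, chosen only for illustration.
years = list(range(2009, 2021))
counts = [1000, 3000, 6000, 9000, 8000, 7000, 6500, 5000, 4000, 3500, 3000, 2000]
slope, intercept, r_squared = simple_ols(years, counts)
```

When counts rise to a peak and then fall, a straight line fits poorly and R-squared stays low, which is one reason a linear model may not describe a series with a mid-period peak.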
The most common Kickstarter category was Film & Video, though the category with the most successful campaigns was Music. The least common category was Dance, and Journalism had the lowest number of successful campaigns. Through a 2-proportion z-test, I found a statistically significant difference in the proportion of successful campaigns between the Music category and the Film & Video category. The most widely supported campaigns are those for Games, while the category with the fewest supporters overall was Dance. Through a two-sample t-test on the two most highly backed categories, Games and Design, I found that Games had a statistically significantly higher mean number of backers. Kickstarter rose in popularity from its founding in 2009 to its peak in 2015 and has faced a steady decline since.
For further data exploration, it would be interesting to evaluate each category over the years to see whether it also peaked in 2015, as the overall number of Kickstarter campaigns did.