DS 804: Communication of Data
Kickstarter is a community of creators and backers. Our goal for this project was to analyze whether COVID-19 is having an impact on people supporting various campaigns on Kickstarter. After performing some exploratory analysis on the data, we developed and answered specific research questions relevant to the main goal.
The data itself is obtained from webrobots.io, who use a scraper robot to crawl all Kickstarter projects and collects data in CSV format. We implemented data from March and November of 2019 and 2020 to get a reasonable idea of the landscape of Kickstarter campaigns pre-COVID and post-COVID.
The methodology utilized for importing these datasets is as follows: each folder with the month’s individual .csv files was loaded using the “list.files” function. Using the pattern of “.csv”, the “lapply” function of “read_csv” read in each individual file which was then binded into one large dataset using the “bind_rows” function. Doing this for each month’s folders of CSV files left us with four datasets in .csv format to use for our analysis.
glimpse(march2019_data)
## Rows: 209,222
## Columns: 38
## $ X1 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ backers_count <dbl> 21, 97, 88, 193, 20, 77, 15, 1, 2, 31, 11, 1…
## $ blurb <chr> "2006 was almost 7 years ago.... Can you bel…
## $ category <chr> "{\"id\":43,\"name\":\"Rock\",\"slug\":\"mus…
## $ converted_pledged_amount <dbl> 802, 2259, 29638, 49158, 549, 2117, 886, 30,…
## $ country <chr> "US", "US", "US", "IT", "US", "GB", "US", "U…
## $ created_at <dbl> 1387659690, 1549659768, 1477242384, 15403699…
## $ creator <chr> "{\"id\":1495925645,\"name\":\"Daniel\",\"is…
## $ currency <chr> "USD", "USD", "USD", "EUR", "USD", "GBP", "U…
## $ currency_symbol <chr> "$", "$", "$", "€", "$", "£", "$", "$", "$",…
## $ currency_trailing_code <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, …
## $ current_currency <chr> "USD", "USD", "USD", "USD", "USD", "USD", "U…
## $ deadline <dbl> 1391899046, 1551801611, 1480607930, 15443099…
## $ disable_communication <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ friends <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fx_rate <dbl> 1.0000000, 1.0000000, 1.0000000, 1.1284326, …
## $ goal <dbl> 200, 400, 27224, 40000, 1000, 400, 850, 2000…
## $ id <dbl> 287514992, 385129759, 681033598, 1031782682,…
## $ is_backing <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ is_starrable <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ is_starred <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ launched_at <dbl> 1388011046, 1550073611, 1478012330, 15406845…
## $ location <chr> "{\"id\":2379574,\"name\":\"Chicago\",\"slug…
## $ name <chr> "New Final Round Album", "Princess Pals Enam…
## $ permissions <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ photo <chr> "{\"key\":\"assets/011/625/534/5bea1760d7f20…
## $ pledged <dbl> 802, 2259, 29638, 43180, 549, 1509, 886, 30,…
## $ profile <chr> "{\"id\":822687,\"project_id\":822687,\"stat…
## $ slug <chr> "new-final-round-album", "princess-pals-enam…
## $ source_url <chr> "https://www.kickstarter.com/discover/catego…
## $ spotlight <lgl> TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, F…
## $ staff_pick <lgl> FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FAL…
## $ state <chr> "successful", "successful", "successful", "s…
## $ state_changed_at <dbl> 1391899046, 1551801611, 1480607932, 15443099…
## $ static_usd_rate <dbl> 1.0000000, 1.0000000, 1.0000000, 1.1365251, …
## $ urls <chr> "{\"web\":{\"project\":\"https://www.kicksta…
## $ usd_pledged <dbl> 802.0000, 2259.0000, 29638.0000, 49075.1525,…
## $ usd_type <chr> "international", "international", "internati…
march2019_data %>%
select(state) %>%
count(state) %>%
mutate(percentage = n/sum(n)*100)
## # A tibble: 5 x 3
## state n percentage
## * <chr> <int> <dbl>
## 1 canceled 8624 4.12
## 2 failed 75199 35.9
## 3 live 7311 3.49
## 4 successful 117465 56.1
## 5 suspended 623 0.298
nov2019_data %>%
select(state) %>%
count(state) %>%
mutate(percentage = n/sum(n)*100)
## # A tibble: 5 x 3
## state n percentage
## * <chr> <int> <dbl>
## 1 canceled 8681 4.11
## 2 failed 75100 35.5
## 3 live 7350 3.48
## 4 successful 119652 56.6
## 5 suspended 641 0.303
march2020_data %>%
select(state) %>%
count(state) %>%
mutate(percentage = n/sum(n)*100)
## # A tibble: 4 x 3
## state n percentage
## * <chr> <int> <dbl>
## 1 canceled 8896 4.12
## 2 failed 75732 35.1
## 3 live 7260 3.36
## 4 successful 124089 57.5
nov2020_data %>%
select(state) %>%
count(state)%>%
mutate(percentage = n/sum(n)*100)
## # A tibble: 4 x 3
## state n percentage
## * <chr> <int> <dbl>
## 1 canceled 9137 4.29
## 2 failed 74319 34.9
## 3 live 6214 2.91
## 4 successful 123523 57.9
We were most concerned with projects which either were fully successful or fully failed. Therefore, we filtered out all projects which fell under the states, “canceled”, “live”, and “suspended”. This resulted in more relevant analyses and faster processing times.
nov2020_bool <- nov2020_data %>%
filter(state == 'successful' | state == 'failed')
nov2019_bool <- nov2019_data %>%
filter(state == 'successful' | state == 'failed')
march2020_bool <- march2020_data %>%
filter(state == 'successful' | state == 'failed')
march2019_bool <- march2019_data %>%
filter(state == 'successful' | state == 'failed')
To clean the location and category variables, we used the “gsub” function to replace any non-letter characters with nothing (essentially deleting any non-letter characters). After this, we used the “str_match” function to match a wildcard string that is between two other identifier strings. Using this syntax, we were able to select only the exact category, city, and state abbreviation names and redefine those categories to include the cleaned data. We also created a new variable called ”time-period” that would categorize each dataset into one of four time periods, those being ”March 2019”, ”November 2019”, ”March 2020”, and ”November 2020” by repeating that string of text by however many rows there were in each dataset. This would assist us in categorizing by time period for later analyses. This methodology was applied to each month’s dataset, we are only including one for the sake of redundancy.
march2019_bool_cleaned <- march2019_bool %>%
mutate(march2019_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
mutate(march2019_category_new = str_match(march2019_category_clean, "idname\\s*(.*?)\\s*slug")) %>%
mutate(march2019_category = march2019_category_new[,2]) %>%
mutate(category = march2019_category) %>%
mutate(march2019_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
mutate(march2019_location_new = str_match(march2019_location_clean, "idname\\s*(.*?)\\s*slug")) %>%
mutate(march2019_location = march2019_location_new[,2]) %>%
mutate(city = march2019_location) %>%
mutate(march2019_location_abbr = gsub("[^a-zA-Z]", "", location)) %>%
mutate(march2019_abbr_new = str_match(march2019_location_abbr, "state\\s*(.*?)\\s*type")) %>%
mutate(abbr = march2019_abbr_new[,2]) %>%
mutate(time_period = rep("March 2019", 192664)) %>%
select(X1:usd_type, city, abbr, time_period)
march2019_bool %>%
mutate(march2019_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
mutate(march2019_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
select(march2019_location_clean, march2019_category_clean)
## # A tibble: 192,664 x 2
## march2019_location_clean march2019_category_clean
## <chr> <chr>
## 1 idnameChicagoslugchicagoilshortnameCh… idnameRockslugmusicrockpositionparent…
## 2 idnameSacramentoslugsacramentocashort… idnameMixedMediaslugartmixedmediaposi…
## 3 idnameColumbusslugcolumbusohshortname… idnamePhotobooksslugphotographyphotob…
## 4 idnameVeniceslugvenicevenicevenetosho… idnameFootwearslugfashionfootwearposi…
## 5 idnameRedmondslugredmondwashortnameRe… idnameSoftwareslugtechnologysoftwarep…
## 6 idnameCardiganslugcardigangbshortname… idnameAccessoriesslugfashionaccessori…
## 7 idnameLosAngelessluglosangelescashort… idnameNonfictionslugpublishingnonfict…
## 8 idnameFortCollinsslugfortcollinscosho… idnameSoftwareslugtechnologysoftwarep…
## 9 idnameRaleighslugraleighncshortnameRa… idnameSculptureslugartsculpturepositi…
## 10 idnameLosAngelessluglosangelescashort… idnamePlayingCardssluggamesplayingcar…
## # … with 192,654 more rows
nov2019_bool_cleaned <- nov2019_bool %>%
mutate(nov2019_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
mutate(nov2019_category_new = str_match(nov2019_category_clean, "idname\\s*(.*?)\\s*slug")) %>%
mutate(nov2019_category = nov2019_category_new[,2]) %>%
mutate(category = nov2019_category) %>%
mutate(nov2019_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
mutate(nov2019_location_new = str_match(nov2019_location_clean, "idname\\s*(.*?)\\s*slug")) %>%
mutate(nov2019_location = nov2019_location_new[,2]) %>%
mutate(city = nov2019_location) %>%
mutate(nov2019_location_abbr = gsub("[^a-zA-Z]", "", location)) %>%
mutate(nov2019_abbr_new = str_match(nov2019_location_abbr, "state\\s*(.*?)\\s*type")) %>%
mutate(abbr = nov2019_abbr_new[,2]) %>%
mutate(time_period = rep("November 2019", 194752)) %>%
select(X1:usd_type, city, abbr, time_period)
march2020_bool_cleaned <- march2020_bool %>%
mutate(march2020_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
mutate(march2020_category_new = str_match(march2020_category_clean, "idname\\s*(.*?)\\s*slug")) %>%
mutate(march2020_category = march2020_category_new[,2]) %>%
mutate(category = march2020_category) %>%
mutate(march2020_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
mutate(march2020_location_new = str_match(march2020_location_clean, "idname\\s*(.*?)\\s*slug")) %>%
mutate(march2020_location = march2020_location_new[,2]) %>%
mutate(city = march2020_location) %>%
mutate(march2020_location_abbr = gsub("[^a-zA-Z]", "", location)) %>%
mutate(march2020_abbr_new = str_match(march2020_location_abbr, "state\\s*(.*?)\\s*type")) %>%
mutate(abbr = march2020_abbr_new[,2]) %>%
mutate(time_period = rep("March 2020", 199821)) %>%
select(X1:usd_type, city, abbr, time_period)
nov2020_bool_cleaned <- nov2020_bool %>%
mutate(nov2020_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
mutate(nov2020_category_new = str_match(nov2020_category_clean, "idname\\s*(.*?)\\s*slug")) %>%
mutate(nov2020_category = nov2020_category_new[,2]) %>%
mutate(category = nov2020_category) %>%
mutate(nov2020_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
mutate(nov2020_location_new = str_match(nov2020_location_clean, "idname\\s*(.*?)\\s*slug")) %>%
mutate(nov2020_location = nov2020_location_new[,2]) %>%
mutate(city = nov2020_location) %>%
mutate(nov2020_location_abbr = gsub("[^a-zA-Z]", "", location)) %>%
mutate(nov2020_abbr_new = str_match(nov2020_location_abbr, "state\\s*(.*?)\\s*type")) %>%
mutate(abbr = nov2020_abbr_new[,2]) %>%
mutate(time_period = rep("November 2020", 197842)) %>%
select(X1:usd_type, city, abbr, time_period)
march2019_bool_cleaned %>%
select(category, city, abbr)
## # A tibble: 192,664 x 3
## category city abbr
## <chr> <chr> <chr>
## 1 Rock Chicago IL
## 2 MixedMedia Sacramento CA
## 3 Photobooks Columbus OH
## 4 Footwear Venice Veneto
## 5 Software Redmond WA
## 6 Accessories Cardigan Wales
## 7 Nonfiction LosAngeles CA
## 8 Software FortCollins CO
## 9 Sculpture Raleigh NC
## 10 PlayingCards LosAngeles CA
## # … with 192,654 more rows
march2019_bool_cleaned %>%
select(state) %>%
count(state) %>%
mutate(percentage = n/sum(n))
## # A tibble: 2 x 3
## state n percentage
## * <chr> <int> <dbl>
## 1 failed 75199 0.390
## 2 successful 117465 0.610
nov2019_bool_cleaned %>%
select(state) %>%
count(state) %>%
mutate(percentage = n/sum(n))
## # A tibble: 2 x 3
## state n percentage
## * <chr> <int> <dbl>
## 1 failed 75100 0.386
## 2 successful 119652 0.614
march2020_bool_cleaned %>%
select(state) %>%
count(state) %>%
mutate(percentage = n/sum(n))
## # A tibble: 2 x 3
## state n percentage
## * <chr> <int> <dbl>
## 1 failed 75732 0.379
## 2 successful 124089 0.621
nov2020_bool_cleaned %>%
select(state) %>%
count(state)%>%
mutate(percentage = n/sum(n))
## # A tibble: 2 x 3
## state n percentage
## * <chr> <int> <dbl>
## 1 failed 74319 0.376
## 2 successful 123523 0.624
march2019_bool %>%
select(state, goal) %>%
group_by(state) %>%
summarize(avg_goal = mean(goal))
## # A tibble: 2 x 2
## state avg_goal
## * <chr> <dbl>
## 1 failed 94514.
## 2 successful 13774.
nov2019_bool %>%
select(state, goal) %>%
group_by(state) %>%
summarize(avg_goal = mean(goal))
## # A tibble: 2 x 2
## state avg_goal
## * <chr> <dbl>
## 1 failed 94693.
## 2 successful 12617.
march2020_bool %>%
select(state, goal) %>%
group_by(state) %>%
summarize(avg_goal = mean(goal))
## # A tibble: 2 x 2
## state avg_goal
## * <chr> <dbl>
## 1 failed 99726.
## 2 successful 14112.
nov2020_bool %>%
select(state, goal) %>%
group_by(state) %>%
summarize(avg_goal = mean(goal))
## # A tibble: 2 x 2
## state avg_goal
## * <chr> <dbl>
## 1 failed 99453.
## 2 successful 15109.
We initially anticipated seeing a decrease as unemployment numbers are rising, however we hypothesize that this increase in successful projects may be due to the higher amount of disposable income being saved by those who retained their jobs; by not going out as much, middle/higher income earners may be looking for new things to devote their free time to and may see these Kickstarter projects as new hobbies to take up during this time.
Further investigating this research question, we decided to look at the average goal for all projects by month. What we found was that consistently across each month in both years, the average goal of failed projects was substantially higher than the average goal of successful projects. The output for this analysis can be found in Figure 4 in the Appendix. While the obvious reason for this is that the lower the project goal, the easier it is to reach that goal, there may be some psychological effect that a project’s goal plays on the individual. Even if someone might be very interested in a project, if they see that it has a high goal, they might consider it to be unrealistic and pointless to donate; therefore, projects with higher goals have even less of a chance of reaching that goal.
cat1 <- march2019_bool_cleaned %>%
select(category, state) %>%
dplyr::group_by(category) %>%
mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
mutate(num_projects=n()) %>%
select(category, success_rate, num_projects) %>%
arrange(desc(success_rate))
cat1_clean <- cat1[!duplicated(cat1$category),]
print(cat1_clean, n=26)
## # A tibble: 159 x 3
## # Groups: category [159]
## category success_rate num_projects
## <chr> <dbl> <int>
## 1 Rock 100 2544
## 2 Nonfiction 100 2610
## 3 CountryFolk 100 2642
## 4 Theater 100 558
## 5 Photography 100 631
## 6 Music 100 721
## 7 IndieRock 100 2532
## 8 Crafts 100 1196
## 9 Fashion 100 173
## 10 VideoGames 100 2435
## 11 TabletopGames 100 3355
## 12 Technology 100 329
## 13 Games 100 54
## 14 Shorts 100 2697
## 15 Fiction 100 2573
## 16 Dance 100 1474
## 17 Comics 100 353
## 18 ComicBooks 100 3267
## 19 ChildrensBooks 100 2755
## 20 ProductDesign 100 3711
## 21 Food 100 511
## 22 Design 100 191
## 23 Art 100 836
## 24 Documentary 100 2707
## 25 Publishing 100 293
## 26 FilmVideo 100 242
## # … with 133 more rows
cat2 <- nov2019_bool_cleaned %>%
select(category, state) %>%
dplyr::group_by(category) %>%
mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
mutate(num_projects=n()) %>%
select(category, success_rate, num_projects) %>%
arrange(desc(success_rate))
cat2_clean <- cat2[!duplicated(cat2$category),]
print(cat2_clean, n=31)
## # A tibble: 160 x 3
## # Groups: category [160]
## category success_rate num_projects
## <chr> <dbl> <int>
## 1 Accessories 100 3451
## 2 Documentary 100 2739
## 3 ComicBooks 100 3334
## 4 Illustration 100 2857
## 5 Theater 100 545
## 6 IndieRock 100 2539
## 7 Fiction 100 2615
## 8 Photography 100 504
## 9 ProductDesign 100 3563
## 10 Dance 100 1413
## 11 Design 100 196
## 12 FilmVideo 100 223
## 13 Rock 100 2556
## 14 Nonfiction 100 2600
## 15 TabletopGames 100 3388
## 16 CountryFolk 100 2641
## 17 Shorts 100 2748
## 18 Comics 100 288
## 19 Technology 100 327
## 20 ChildrensBooks 100 2807
## 21 Fashion 100 151
## 22 ScienceFiction 100 62
## 23 Publishing 100 253
## 24 Food 100 517
## 25 Music 100 696
## 26 Crafts 100 1115
## 27 Punk 100 31
## 28 Art 100 857
## 29 Residencies 100 60
## 30 Games 100 63
## 31 VideoGames 100 179
## # … with 129 more rows
cat3 <- march2020_bool_cleaned %>%
select(category, state) %>%
dplyr::group_by(category) %>%
mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
mutate(num_projects=n()) %>%
select(category, success_rate, num_projects) %>%
arrange(desc(success_rate))
cat3_clean <- cat3[!duplicated(cat3$category),]
print(cat3_clean, n=29)
## # A tibble: 160 x 3
## # Groups: category [160]
## category success_rate num_projects
## <chr> <dbl> <int>
## 1 ProductDesign 100 3777
## 2 Design 100 200
## 3 TabletopGames 100 3372
## 4 Rock 100 2554
## 5 Music 100 690
## 6 Accessories 100 3499
## 7 ComicBooks 100 3353
## 8 Comics 100 291
## 9 Illustration 100 2933
## 10 Documentary 100 2692
## 11 CountryFolk 100 2636
## 12 Shorts 100 2704
## 13 Art 100 815
## 14 IndieRock 100 2545
## 15 Dance 100 1387
## 16 Food 100 520
## 17 Fiction 100 2609
## 18 ChildrensBooks 100 2799
## 19 Publishing 100 260
## 20 Nonfiction 100 2608
## 21 VideoGames 100 2396
## 22 Theater 100 540
## 23 Fashion 100 147
## 24 FilmVideo 100 215
## 25 Technology 100 334
## 26 Photography 100 460
## 27 Translations 100 3
## 28 Crafts 100 1087
## 29 Games 100 69
## # … with 131 more rows
cat4 <- nov2020_bool_cleaned %>%
select(category, state) %>%
dplyr::group_by(category) %>%
mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
mutate(num_projects=n()) %>%
select(category, success_rate, num_projects) %>%
arrange(desc(success_rate))
cat4_clean <- cat4[!duplicated(cat4$category),]
print(cat4_clean, n=29)
## # A tibble: 160 x 3
## # Groups: category [160]
## category success_rate num_projects
## <chr> <dbl> <int>
## 1 ChildrensBooks 100 2850
## 2 Crafts 100 1080
## 3 ProductDesign 100 3685
## 4 Design 100 174
## 5 VideoGames 100 2417
## 6 Accessories 100 3547
## 7 Rock 100 2381
## 8 Documentary 100 2768
## 9 Fashion 100 127
## 10 IndieRock 100 2380
## 11 Dance 100 1377
## 12 Fiction 100 2617
## 13 Photography 100 447
## 14 Food 100 543
## 15 Vegan 100 171
## 16 Animation 100 111
## 17 Nonfiction 100 2588
## 18 FilmVideo 100 218
## 19 Shorts 100 2814
## 20 Theater 100 548
## 21 Illustration 100 2991
## 22 ComicBooks 100 3316
## 23 Comics 100 273
## 24 Technology 100 323
## 25 TabletopGames 100 1169
## 26 Publishing 100 262
## 27 CountryFolk 100 2380
## 28 Art 100 872
## 29 Games 100 54
## # … with 131 more rows
cat1_low <- march2019_bool_cleaned %>%
select(category, state) %>%
dplyr::group_by(category) %>%
mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
mutate(num_projects=n()) %>%
select(category, success_rate, num_projects) %>%
arrange(success_rate)
cat1_low_clean <- cat1_low[!duplicated(cat1_low$category),]
print(cat1_low_clean, n=20)
## # A tibble: 159 x 3
## # Groups: category [159]
## category success_rate num_projects
## <chr> <dbl> <int>
## 1 MobileGames 13.2 2387
## 2 Video 18.2 617
## 3 FoodTrucks 18.2 1881
## 4 Web 18.3 4322
## 5 Couture 18.5 265
## 6 Software 20.4 2447
## 7 Television 20.6 1023
## 8 Bacon 22.3 188
## 9 LiveGames 22.5 1020
## 10 Action 22.7 754
## 11 Apps 22.8 2422
## 12 Flight 23.1 385
## 13 InteractiveDesign 23.7 401
## 14 Readytowear 24 950
## 15 FarmersMarkets 24.7 461
## 16 Candles 25.6 516
## 17 Photo 25.8 279
## 18 Family 26.0 358
## 19 Translations 26.5 166
## 20 Academic 27.4 952
## # … with 139 more rows
cat2_low <- nov2019_bool_cleaned %>%
select(category, state) %>%
dplyr::group_by(category) %>%
mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
mutate(num_projects=n()) %>%
select(category, success_rate, num_projects) %>%
arrange(success_rate)
cat2_low_clean <- cat2_low[!duplicated(cat2_low$category),]
print(cat2_low_clean, n=20)
## # A tibble: 160 x 3
## # Groups: category [160]
## category success_rate num_projects
## <chr> <dbl> <int>
## 1 MobileGames 13.7 2378
## 2 FoodTrucks 17.3 1959
## 3 Couture 17.7 288
## 4 Web 18.9 4350
## 5 Video 19.7 640
## 6 Television 20.3 1063
## 7 Software 21.5 2441
## 8 Bacon 22.0 191
## 9 Action 22.3 806
## 10 Flight 22.5 395
## 11 LiveGames 22.7 1081
## 12 FarmersMarkets 23.1 464
## 13 InteractiveDesign 24.8 428
## 14 Apps 24.9 2409
## 15 Candles 25.4 560
## 16 Readytowear 25.4 1036
## 17 Translations 25.7 175
## 18 Photo 26.4 280
## 19 Family 26.7 378
## 20 Academic 27.0 1024
## # … with 140 more rows
cat3_low <- march2020_bool_cleaned %>%
select(category, state) %>%
dplyr::group_by(category) %>%
mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
mutate(num_projects=n()) %>%
select(category, success_rate, num_projects) %>%
arrange(success_rate)
cat3_low_clean <- cat3_low[!duplicated(cat3_low$category),]
print(cat3_low_clean, n=20)
## # A tibble: 160 x 3
## # Groups: category [160]
## category success_rate num_projects
## <chr> <dbl> <int>
## 1 MobileGames 14.2 2387
## 2 FoodTrucks 17.1 2005
## 3 Web 19.3 4367
## 4 Couture 19.5 303
## 5 Television 20.2 1085
## 6 Video 20.6 631
## 7 Bacon 21.5 191
## 8 Action 21.8 818
## 9 Software 21.9 2447
## 10 Flight 22.6 399
## 11 FarmersMarkets 22.9 468
## 12 LiveGames 23.3 1108
## 13 InteractiveDesign 25.4 445
## 14 Readytowear 25.9 1065
## 15 Candles 26.0 585
## 16 Apps 26.0 2416
## 17 Academic 26.0 1048
## 18 Photo 26.8 284
## 19 Family 27.2 393
## 20 Architecture 28.1 797
## # … with 140 more rows
cat4_low <- nov2020_bool_cleaned %>%
select(category, state) %>%
dplyr::group_by(category) %>%
mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
mutate(num_projects=n()) %>%
select(category, success_rate, num_projects) %>%
arrange(success_rate)
cat4_low_clean <- cat4_low[!duplicated(cat4_low$category),]
print(cat4_low_clean, n=20)
## # A tibble: 160 x 3
## # Groups: category [160]
## category success_rate num_projects
## <chr> <dbl> <int>
## 1 MobileGames 14.7 2389
## 2 FoodTrucks 17.3 2082
## 3 Couture 19.6 311
## 4 Television 19.9 1118
## 5 Web 20.2 4386
## 6 Video 20.9 642
## 7 Bacon 21.4 192
## 8 Flight 22.0 409
## 9 Action 22.1 848
## 10 FarmersMarkets 22.2 474
## 11 Software 22.6 2448
## 12 LiveGames 24.2 1163
## 13 InteractiveDesign 24.8 467
## 14 Readytowear 25.4 1117
## 15 Translations 26.3 186
## 16 Academic 26.5 1100
## 17 Candles 27.0 630
## 18 Architecture 27.1 807
## 19 Photo 27.4 277
## 20 Events 27.7 936
## # … with 140 more rows
We compared these lists to each other to get an idea of how successful or unsuccessful certain categories generally were month-to-month. From March 2020 to November 2020, the top categories that had the most dramatic positive change in success were “Children’s Books”, “Crafts”, “Fashion”, and “Video Games”. The top categories which had the most dramatic negative change in success were “Comic Books”, “Comics”, and “Tabletop Games”. There was no dramatic change in the lowest 20 categories, but it is interesting to note that for each month, “Mobile Games” was the lowest category.
From this data, we may conclude that many of the categories with the lowest success rates fall into one of two general distinctions: tech, and niche interests. We believe that many tech projects fail because of the comprehension that is required to fully appreciate the utility of certain tech projects (softwares, web, mobile games, etc). Projects for niche interests also tend to fail more often due to the challenge of finding donors who share that interest; some examples of niche interest categories consistently found in the bottom 20 included bacon, flight, candles, farmers markets, food trucks, and interactive design).
We could also hypothesize that the category of successful projects is relevant to current societal events, such as the Coronavirus pandemic. As people quarantine and spend more and more time at home, they may find themselves trying to keep busy with things like fashion or video games, whilst also dealing with children being at home instead of school; this would explain the increase of successful children’s book and craft projects.
march2019_sentiment <- march2019_bool_cleaned %>%
unnest_tokens(word, blurb) %>%
inner_join(get_sentiments("afinn"), by= "word") %>%
group_by(name) %>%
summarize(sentiment = mean(value), words = n()) %>%
ungroup()
march2019_bool_cleaned %>%
inner_join(march2019_sentiment, by = "name") %>%
select(category, state, blurb, sentiment)
## # A tibble: 131,629 x 4
## category state blurb sentiment
## <chr> <chr> <chr> <dbl>
## 1 MixedMedia success… "An adorable fantasy enamel pin series of pri… 3
## 2 Photobooks success… "Helping a community come together to set the… 1.5
## 3 Software failed "Learn to build 10+ Applications in this comp… 2
## 4 Accessories success… "'Eclipse' - A 30mm hard enamel pin in jet bl… 0
## 5 Software failed "Let's build and remix a new Personal Web The… 4
## 6 Sculpture failed "A limited edition, signed and numbered hand … 0
## 7 PlayingCar… failed "Play this hilarious, nonsensical card game f… 1.5
## 8 Footwear success… "Award Winning Footwear Designs | Crafted Usi… 2.75
## 9 Footwear success… "NAKEFIT: Hypoallergenic adhesive pad for wal… 1
## 10 PlayingCar… failed "OVERTHROW THE KING AND TAKE WHAT IS YOURS & … 0.75
## # … with 131,619 more rows
march2019_graph <- march2019_bool_cleaned %>%
inner_join(march2019_sentiment, by = "name") %>%
select(state, sentiment)
ggplot(march2019_graph, aes(sentiment, state)) + geom_boxplot()
nov2019_sentiment <- nov2019_bool_cleaned %>%
unnest_tokens(word, blurb) %>%
inner_join(get_sentiments("afinn"), by= "word") %>%
group_by(name) %>%
summarize(sentiment = mean(value), words = n()) %>%
ungroup()
nov2019_bool_cleaned %>%
inner_join(nov2019_sentiment, by = "name") %>%
select(category, state, blurb, sentiment)
## # A tibble: 130,325 x 4
## category state blurb sentiment
## <chr> <chr> <chr> <dbl>
## 1 PublicArt success… Help support the legendary B.O.O.G.A. mutant… 2
## 2 Software failed 21m breaches in 2016 alone. Presenting a vid… -2
## 3 Spaces success… We are opening our first restaurant in Cambe… 2
## 4 DIY failed Sac Super Duty construit pour la longévité,… 3
## 5 DIY success… Great decor and gifts for any occasion! 3
## 6 ClassicalMu… failed Inedited & original music for Guitar's Orche… 2
## 7 Apparel success… A new online portal where you can discover f… 3
## 8 NarrativeFi… success… An award-winning ensemble drama charting the… 3
## 9 Ceramics failed I'm excited to introduce my Salad serving sp… 3
## 10 Software failed To calculate extremely rare Plato's periodi… 2
## # … with 130,315 more rows
nov2019_graph <- nov2019_bool_cleaned %>%
inner_join(nov2019_sentiment, by = "name") %>%
select(state, sentiment)
ggplot(nov2019_graph, aes(sentiment, state)) + geom_boxplot()
march2020_sentiment <- march2020_bool_cleaned %>%
unnest_tokens(word, blurb) %>%
inner_join(get_sentiments("afinn"), by= "word") %>%
group_by(name) %>%
summarize(sentiment = mean(value), words = n()) %>%
ungroup()
march2020_bool_cleaned %>%
inner_join(march2020_sentiment, by = "name") %>%
select(category, state, blurb, sentiment)
## # A tibble: 133,014 x 4
## category state blurb sentiment
## <chr> <chr> <chr> <dbl>
## 1 Web failed "Dedicated to allow Christian Musicians, Pod… 1.5
## 2 Faith success… "New Music from Marty Mikles! A new EP all … 2.5
## 3 DIY failed "Expand greenhouse and plants. I offer wide … 1
## 4 Periodicals failed "A lifestyle magazine for the intelligent an… 2
## 5 Periodicals success… "A new, full-colour magazine devoted to life… 3
## 6 Blues success… "A new CD of ten original blues-soul songs b… 4
## 7 Faith failed "An EP of 7 Original Neoclassical Piano Comp… 2
## 8 ClassicalMu… success… "An album of the previously unrecorded piano… 2
## 9 Periodicals failed "Everyone has a story. Let's make it a Cove… 2
## 10 ElectronicM… success… "Help Bluetech release the first volume of S… 2
## # … with 133,004 more rows
march2020_graph <- march2020_bool_cleaned %>%
inner_join(march2020_sentiment, by = "name") %>%
select(state, sentiment)
ggplot(march2020_graph, aes(sentiment, state)) + geom_boxplot()
nov2020_sentiment <- nov2020_bool_cleaned %>%
unnest_tokens(word, blurb) %>%
inner_join(get_sentiments("afinn"), by= "word") %>%
group_by(name) %>%
summarize(sentiment = mean(value), words = n()) %>%
ungroup()
nov2020_bool_cleaned %>%
inner_join(nov2020_sentiment, by = "name") %>%
select(category, state, blurb, sentiment)
## # A tibble: 130,428 x 4
## category state blurb sentiment
## <chr> <chr> <chr> <dbl>
## 1 Jazz failed "This project is designed to help protect th… 1.67
## 2 PublicArt success… "Help send me to Washington DC where I will … 2
## 3 Makerspaces failed "Help us built a sustainable studio & elimin… 2
## 4 PublicArt failed "\"If I paint something, I don't want to hav… 1
## 5 ChildrensBo… success… "An illustrated collection of short horror s… -2
## 6 Stationery success… "Make your holidays a little stranger with t… 2.5
## 7 Restaurants failed "Our free app will allow you pool reservatio… -0.2
## 8 ChildrensBo… success… "A tender portrait of Afro-Caribbean family … 2
## 9 ChildrensBo… success… "Penelope's Petition teaches children about … -3
## 10 NarrativeFi… success… "What if your dreams attacked you?\nFollow N… 0
## # … with 130,418 more rows
nov2020_graph <- nov2020_bool_cleaned %>%
inner_join(nov2020_sentiment, by = "name") %>%
select(state, sentiment)
ggplot(nov2020_graph, aes(sentiment, state)) + geom_boxplot()
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")
mar19 <- ggplot(march2019_graph, aes(sentiment, state)) + geom_boxplot()
nov19 <- ggplot(nov2019_graph, aes(sentiment, state)) + geom_boxplot()
mar20 <- ggplot(march2020_graph, aes(sentiment, state)) + geom_boxplot()
nov20 <- ggplot(nov2020_graph, aes(sentiment, state)) + geom_boxplot()
library(ggpubr)
ggarrange(mar19, nov19, mar20, nov20 + rremove("x.text"),
labels = c("Mar 2019", "Nov 2019", "Mar 2020", "Nov 2020"),
ncol = 2, nrow = 2)
Here, we performed a sentiment analysis on the blurbs of Kickstarter projects. To do this, we started by unnesting the tokens in the blurb to create a vector of all words used in each blurb. From there, we performed an inner join with the “afinn” sentiment lexicon and the unnested blurbs and grouped those results by project name (“name” variable). We then summarized this variable with the “sentiment” value, which was defined as the mean sentiment values for all words in each blurb. This data was set to a new variable called “monthyear_sentiment”, with month and year pertaining to the respective values (“march2019_sentiment, etc.). From there, we created a variable in preparation for visualizing the sentiment scores called “monthyear_graph” which included the “monthyear_sentiment” data joining with the “monthyear_bool_cleaned data” and selecting only the “state” and “sentiment” columns.
Using ggplot2’s boxplot, we were able to visualize the distribution of sentiment scores between both successful and unsuccessful projects. In doing this, we found that for all four months, the median sentiment score for successful projects was slightly higher than the median sentiment score for unsuccessful projects. Additionally, there was a more polarized distribution of sentiment scores within the failed projects, meaning that while successful projects had sentiment scores that were neither extremely positive nor negative, failed projects had sentiment scores that were very concentrated on either end of the plot. We realize that while a negative sentiment score does not necessarily indicate that a blurb was written with a negative or rude tone, it may indicate that using words with more negative connotations (even in a positive way) could lower the likelihood for that project to reach its goal.
march2019_spotlight <- march2019_bool_cleaned %>%
select(spotlight, usd_pledged) %>%
group_by(spotlight) %>%
mutate(mean_pledged = mean(usd_pledged)) %>%
select(spotlight, mean_pledged) %>%
group_by(spotlight)
march2019_spotlight <- march2019_spotlight[!duplicated(march2019_spotlight$spotlight),]
nov2019_spotlight <- nov2019_bool_cleaned %>%
select(spotlight, usd_pledged) %>%
group_by(spotlight) %>%
mutate(mean_pledged = mean(usd_pledged)) %>%
select(spotlight, mean_pledged) %>%
group_by(spotlight)
nov2019_spotlight <- nov2019_spotlight[!duplicated(nov2019_spotlight$spotlight),]
march2020_spotlight <- march2020_bool_cleaned %>%
select(spotlight, usd_pledged) %>%
group_by(spotlight) %>%
mutate(mean_pledged = mean(usd_pledged)) %>%
select(spotlight, mean_pledged) %>%
group_by(spotlight)
march2020_spotlight <- march2020_spotlight[!duplicated(march2020_spotlight$spotlight),]
nov2020_spotlight <- nov2020_bool_cleaned %>%
select(spotlight, usd_pledged) %>%
group_by(spotlight) %>%
mutate(mean_pledged = mean(usd_pledged)) %>%
select(spotlight, mean_pledged) %>%
group_by(spotlight)
nov2020_spotlight <- nov2020_spotlight[!duplicated(nov2020_spotlight$spotlight),]
march2019_spotlight
## # A tibble: 2 x 2
## # Groups: spotlight [2]
## spotlight mean_pledged
## <lgl> <dbl>
## 1 TRUE 21474.
## 2 FALSE 1087.
nov2019_spotlight
## # A tibble: 2 x 2
## # Groups: spotlight [2]
## spotlight mean_pledged
## <lgl> <dbl>
## 1 TRUE 21149.
## 2 FALSE 1073.
march2020_spotlight
## # A tibble: 2 x 2
## # Groups: spotlight [2]
## spotlight mean_pledged
## <lgl> <dbl>
## 1 TRUE 22047.
## 2 FALSE 1070.
nov2020_spotlight
## # A tibble: 2 x 2
## # Groups: spotlight [2]
## spotlight mean_pledged
## <lgl> <dbl>
## 1 TRUE 22888.
## 2 FALSE 1034.
Here, we analyzed whether spotlight Kickstarter campaigns receive a higher “usd_pledged” amount. To do this, we calculated the average amount of “usd_pledged” for Kickstarter campaigns that were in the spotlight section of the website versus the campaigns that were not within the months of March and November of 2019 and 2020.
Our research indicated that spotlight campaigns receive a higher amount pledged. All four months (March and November of 2019 and 2020) received between $20,000-$21,000 more for utilizing the spotlight feature.
The main findings from this analysis are as follows: while the average amount per pledge has decreased over the last two years, COVID-19 may have actually increased the popularity of certain campaign categories. Things like children’s books, crafts, and video games saw a major increase in successful campaigns over last year. Some categories, however, seem to be consistently less successful, regardless of outside factors. Specifically, the ‘Mobile Games’ category had the lowest number of successful projects across all four time periods. Included in these low-success categories are tech-oriented categories such as web, software, and apps, as well as categories with more niche interests like farmer’s markets, candles, and bacon. We can conclude from these findings that the more niche or advanced in nature a category is, the harder time it will have in securing funding.
Lastly, our analysis leads us to believe that creating a “Spotlight” page for a Kickstarter campaign will play a major role in securing funding, as on average, campaigns with spotlight pages secure approximately $20,000 more in funding than campaigns without a spotlight page. This knowledge would benefit both Kickstarter and creators, as Kickstarter takes a small percentage of any successful projects, and projects are more likely to achieve more funding, and therefore, succeed with spotlight pages.