DS 804: Communication of Data

Kickstarter is a community of creators and backers. Our goal for this project was to analyze whether COVID-19 is having an impact on people supporting various campaigns on Kickstarter. After performing some exploratory analysis on the data, we developed and answered specific research questions relevant to the main goal.

The data itself is obtained from webrobots.io, who use a scraper robot to crawl all Kickstarter projects and collects data in CSV format. We implemented data from March and November of 2019 and 2020 to get a reasonable idea of the landscape of Kickstarter campaigns pre-COVID and post-COVID.

The methodology utilized for importing these datasets is as follows: each folder with the month’s individual .csv files was loaded using the “list.files” function. Using the pattern of “.csv”, the “lapply” function of “read_csv” read in each individual file which was then binded into one large dataset using the “bind_rows” function. Doing this for each month’s folders of CSV files left us with four datasets in .csv format to use for our analysis.

glimpse(march2019_data)
## Rows: 209,222
## Columns: 38
## $ X1                       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ backers_count            <dbl> 21, 97, 88, 193, 20, 77, 15, 1, 2, 31, 11, 1…
## $ blurb                    <chr> "2006 was almost 7 years ago.... Can you bel…
## $ category                 <chr> "{\"id\":43,\"name\":\"Rock\",\"slug\":\"mus…
## $ converted_pledged_amount <dbl> 802, 2259, 29638, 49158, 549, 2117, 886, 30,…
## $ country                  <chr> "US", "US", "US", "IT", "US", "GB", "US", "U…
## $ created_at               <dbl> 1387659690, 1549659768, 1477242384, 15403699…
## $ creator                  <chr> "{\"id\":1495925645,\"name\":\"Daniel\",\"is…
## $ currency                 <chr> "USD", "USD", "USD", "EUR", "USD", "GBP", "U…
## $ currency_symbol          <chr> "$", "$", "$", "€", "$", "£", "$", "$", "$",…
## $ currency_trailing_code   <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, …
## $ current_currency         <chr> "USD", "USD", "USD", "USD", "USD", "USD", "U…
## $ deadline                 <dbl> 1391899046, 1551801611, 1480607930, 15443099…
## $ disable_communication    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ friends                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fx_rate                  <dbl> 1.0000000, 1.0000000, 1.0000000, 1.1284326, …
## $ goal                     <dbl> 200, 400, 27224, 40000, 1000, 400, 850, 2000…
## $ id                       <dbl> 287514992, 385129759, 681033598, 1031782682,…
## $ is_backing               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ is_starrable             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
## $ is_starred               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ launched_at              <dbl> 1388011046, 1550073611, 1478012330, 15406845…
## $ location                 <chr> "{\"id\":2379574,\"name\":\"Chicago\",\"slug…
## $ name                     <chr> "New Final Round Album", "Princess Pals Enam…
## $ permissions              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ photo                    <chr> "{\"key\":\"assets/011/625/534/5bea1760d7f20…
## $ pledged                  <dbl> 802, 2259, 29638, 43180, 549, 1509, 886, 30,…
## $ profile                  <chr> "{\"id\":822687,\"project_id\":822687,\"stat…
## $ slug                     <chr> "new-final-round-album", "princess-pals-enam…
## $ source_url               <chr> "https://www.kickstarter.com/discover/catego…
## $ spotlight                <lgl> TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, F…
## $ staff_pick               <lgl> FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FAL…
## $ state                    <chr> "successful", "successful", "successful", "s…
## $ state_changed_at         <dbl> 1391899046, 1551801611, 1480607932, 15443099…
## $ static_usd_rate          <dbl> 1.0000000, 1.0000000, 1.0000000, 1.1365251, …
## $ urls                     <chr> "{\"web\":{\"project\":\"https://www.kicksta…
## $ usd_pledged              <dbl> 802.0000, 2259.0000, 29638.0000, 49075.1525,…
## $ usd_type                 <chr> "international", "international", "internati…

Exploratory Analysis

Percentage of Campaigns Which Reached Their Goal

march2019_data %>%
  select(state) %>%
  count(state) %>%
  mutate(percentage = n/sum(n)*100)
## # A tibble: 5 x 3
##   state           n percentage
## * <chr>       <int>      <dbl>
## 1 canceled     8624      4.12 
## 2 failed      75199     35.9  
## 3 live         7311      3.49 
## 4 successful 117465     56.1  
## 5 suspended     623      0.298
nov2019_data %>%
  select(state) %>%
  count(state) %>%
  mutate(percentage = n/sum(n)*100)
## # A tibble: 5 x 3
##   state           n percentage
## * <chr>       <int>      <dbl>
## 1 canceled     8681      4.11 
## 2 failed      75100     35.5  
## 3 live         7350      3.48 
## 4 successful 119652     56.6  
## 5 suspended     641      0.303
march2020_data %>%
  select(state) %>%
  count(state) %>%
  mutate(percentage = n/sum(n)*100)
## # A tibble: 4 x 3
##   state           n percentage
## * <chr>       <int>      <dbl>
## 1 canceled     8896       4.12
## 2 failed      75732      35.1 
## 3 live         7260       3.36
## 4 successful 124089      57.5
nov2020_data %>%
  select(state) %>%
  count(state)%>%
  mutate(percentage = n/sum(n)*100)
## # A tibble: 4 x 3
##   state           n percentage
## * <chr>       <int>      <dbl>
## 1 canceled     9137       4.29
## 2 failed      74319      34.9 
## 3 live         6214       2.91
## 4 successful 123523      57.9

Filtering Only Failed and Successful Campaigns

We were most concerned with projects which either were fully successful or fully failed. Therefore, we filtered out all projects which fell under the states, “canceled”, “live”, and “suspended”. This resulted in more relevant analyses and faster processing times.

nov2020_bool <- nov2020_data %>%
  filter(state == 'successful' | state == 'failed')

nov2019_bool <- nov2019_data %>%
  filter(state == 'successful' | state == 'failed')

march2020_bool <- march2020_data %>%
  filter(state == 'successful' | state == 'failed')

march2019_bool <- march2019_data %>%
  filter(state == 'successful' | state == 'failed')

Cleaning Location and Category Variables

To clean the location and category variables, we used the “gsub” function to replace any non-letter characters with nothing (essentially deleting any non-letter characters). After this, we used the “str_match” function to match a wildcard string that is between two other identifier strings. Using this syntax, we were able to select only the exact category, city, and state abbreviation names and redefine those categories to include the cleaned data. We also created a new variable called ”time-period” that would categorize each dataset into one of four time periods, those being ”March 2019”, ”November 2019”, ”March 2020”, and ”November 2020” by repeating that string of text by however many rows there were in each dataset. This would assist us in categorizing by time period for later analyses. This methodology was applied to each month’s dataset, we are only including one for the sake of redundancy.

march2019_bool_cleaned <- march2019_bool %>%
  mutate(march2019_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
  mutate(march2019_category_new = str_match(march2019_category_clean, "idname\\s*(.*?)\\s*slug")) %>%
  mutate(march2019_category = march2019_category_new[,2]) %>%
  mutate(category = march2019_category) %>%
  mutate(march2019_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
  mutate(march2019_location_new = str_match(march2019_location_clean, "idname\\s*(.*?)\\s*slug")) %>%
  mutate(march2019_location = march2019_location_new[,2]) %>%
  mutate(city = march2019_location) %>%
  mutate(march2019_location_abbr = gsub("[^a-zA-Z]", "", location)) %>%
  mutate(march2019_abbr_new = str_match(march2019_location_abbr, "state\\s*(.*?)\\s*type")) %>%
  mutate(abbr = march2019_abbr_new[,2]) %>%
  mutate(time_period = rep("March 2019", 192664)) %>%
  select(X1:usd_type, city, abbr, time_period)

march2019_bool %>%
  mutate(march2019_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
  mutate(march2019_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
  select(march2019_location_clean, march2019_category_clean)
## # A tibble: 192,664 x 2
##    march2019_location_clean               march2019_category_clean              
##    <chr>                                  <chr>                                 
##  1 idnameChicagoslugchicagoilshortnameCh… idnameRockslugmusicrockpositionparent…
##  2 idnameSacramentoslugsacramentocashort… idnameMixedMediaslugartmixedmediaposi…
##  3 idnameColumbusslugcolumbusohshortname… idnamePhotobooksslugphotographyphotob…
##  4 idnameVeniceslugvenicevenicevenetosho… idnameFootwearslugfashionfootwearposi…
##  5 idnameRedmondslugredmondwashortnameRe… idnameSoftwareslugtechnologysoftwarep…
##  6 idnameCardiganslugcardigangbshortname… idnameAccessoriesslugfashionaccessori…
##  7 idnameLosAngelessluglosangelescashort… idnameNonfictionslugpublishingnonfict…
##  8 idnameFortCollinsslugfortcollinscosho… idnameSoftwareslugtechnologysoftwarep…
##  9 idnameRaleighslugraleighncshortnameRa… idnameSculptureslugartsculpturepositi…
## 10 idnameLosAngelessluglosangelescashort… idnamePlayingCardssluggamesplayingcar…
## # … with 192,654 more rows
nov2019_bool_cleaned <- nov2019_bool %>%
  mutate(nov2019_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
  mutate(nov2019_category_new = str_match(nov2019_category_clean, "idname\\s*(.*?)\\s*slug")) %>%
  mutate(nov2019_category = nov2019_category_new[,2]) %>%
  mutate(category = nov2019_category) %>%
  mutate(nov2019_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
  mutate(nov2019_location_new = str_match(nov2019_location_clean, "idname\\s*(.*?)\\s*slug")) %>%
  mutate(nov2019_location = nov2019_location_new[,2]) %>%
  mutate(city = nov2019_location) %>%
  mutate(nov2019_location_abbr = gsub("[^a-zA-Z]", "", location)) %>%
  mutate(nov2019_abbr_new = str_match(nov2019_location_abbr, "state\\s*(.*?)\\s*type")) %>%
  mutate(abbr = nov2019_abbr_new[,2]) %>%
  mutate(time_period = rep("November 2019", 194752)) %>%
  select(X1:usd_type, city, abbr, time_period)

march2020_bool_cleaned <- march2020_bool %>%
  mutate(march2020_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
  mutate(march2020_category_new = str_match(march2020_category_clean, "idname\\s*(.*?)\\s*slug")) %>%
  mutate(march2020_category = march2020_category_new[,2]) %>%
  mutate(category = march2020_category) %>%
  mutate(march2020_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
  mutate(march2020_location_new = str_match(march2020_location_clean, "idname\\s*(.*?)\\s*slug")) %>%
  mutate(march2020_location = march2020_location_new[,2]) %>%
  mutate(city = march2020_location) %>%
  mutate(march2020_location_abbr = gsub("[^a-zA-Z]", "", location)) %>%
  mutate(march2020_abbr_new = str_match(march2020_location_abbr, "state\\s*(.*?)\\s*type")) %>%
  mutate(abbr = march2020_abbr_new[,2]) %>%
  mutate(time_period = rep("March 2020", 199821)) %>%
  select(X1:usd_type, city, abbr, time_period)

nov2020_bool_cleaned <- nov2020_bool %>%
  mutate(nov2020_category_clean = gsub("[^a-zA-Z]", "", category)) %>%
  mutate(nov2020_category_new = str_match(nov2020_category_clean, "idname\\s*(.*?)\\s*slug")) %>%
  mutate(nov2020_category = nov2020_category_new[,2]) %>%
  mutate(category = nov2020_category) %>%
  mutate(nov2020_location_clean = gsub("[^a-zA-Z]", "", location)) %>%
  mutate(nov2020_location_new = str_match(nov2020_location_clean, "idname\\s*(.*?)\\s*slug")) %>%
  mutate(nov2020_location = nov2020_location_new[,2]) %>%
  mutate(city = nov2020_location) %>%
  mutate(nov2020_location_abbr = gsub("[^a-zA-Z]", "", location)) %>%
  mutate(nov2020_abbr_new = str_match(nov2020_location_abbr, "state\\s*(.*?)\\s*type")) %>%
  mutate(abbr = nov2020_abbr_new[,2]) %>%
  mutate(time_period = rep("November 2020", 197842)) %>%
  select(X1:usd_type, city, abbr, time_period)

march2019_bool_cleaned %>%
  select(category, city, abbr)
## # A tibble: 192,664 x 3
##    category     city        abbr  
##    <chr>        <chr>       <chr> 
##  1 Rock         Chicago     IL    
##  2 MixedMedia   Sacramento  CA    
##  3 Photobooks   Columbus    OH    
##  4 Footwear     Venice      Veneto
##  5 Software     Redmond     WA    
##  6 Accessories  Cardigan    Wales 
##  7 Nonfiction   LosAngeles  CA    
##  8 Software     FortCollins CO    
##  9 Sculpture    Raleigh     NC    
## 10 PlayingCards LosAngeles  CA    
## # … with 192,654 more rows

Percentage of Successful and Failed Campaigns

march2019_bool_cleaned %>%
  select(state) %>%
  count(state) %>%
  mutate(percentage = n/sum(n))
## # A tibble: 2 x 3
##   state           n percentage
## * <chr>       <int>      <dbl>
## 1 failed      75199      0.390
## 2 successful 117465      0.610
nov2019_bool_cleaned %>%
  select(state) %>%
  count(state) %>%
  mutate(percentage = n/sum(n))
## # A tibble: 2 x 3
##   state           n percentage
## * <chr>       <int>      <dbl>
## 1 failed      75100      0.386
## 2 successful 119652      0.614
march2020_bool_cleaned %>%
  select(state) %>%
  count(state) %>%
  mutate(percentage = n/sum(n))
## # A tibble: 2 x 3
##   state           n percentage
## * <chr>       <int>      <dbl>
## 1 failed      75732      0.379
## 2 successful 124089      0.621
nov2020_bool_cleaned %>%
  select(state) %>%
  count(state)%>%
  mutate(percentage = n/sum(n))
## # A tibble: 2 x 3
##   state           n percentage
## * <chr>       <int>      <dbl>
## 1 failed      74319      0.376
## 2 successful 123523      0.624

Difference in Average Goal Between Successful and Failed Campaigns

march2019_bool %>%
  select(state, goal) %>%
  group_by(state) %>%
  summarize(avg_goal = mean(goal))
## # A tibble: 2 x 2
##   state      avg_goal
## * <chr>         <dbl>
## 1 failed       94514.
## 2 successful   13774.
nov2019_bool %>%
  select(state, goal) %>%
  group_by(state) %>%
  summarize(avg_goal = mean(goal))
## # A tibble: 2 x 2
##   state      avg_goal
## * <chr>         <dbl>
## 1 failed       94693.
## 2 successful   12617.
march2020_bool %>%
  select(state, goal) %>%
  group_by(state) %>%
  summarize(avg_goal = mean(goal))
## # A tibble: 2 x 2
##   state      avg_goal
## * <chr>         <dbl>
## 1 failed       99726.
## 2 successful   14112.
nov2020_bool %>%
  select(state, goal) %>%
  group_by(state) %>%
  summarize(avg_goal = mean(goal))
## # A tibble: 2 x 2
##   state      avg_goal
## * <chr>         <dbl>
## 1 failed       99453.
## 2 successful   15109.

We initially anticipated seeing a decrease as unemployment numbers are rising, however we hypothesize that this increase in successful projects may be due to the higher amount of disposable income being saved by those who retained their jobs; by not going out as much, middle/higher income earners may be looking for new things to devote their free time to and may see these Kickstarter projects as new hobbies to take up during this time.

Further investigating this research question, we decided to look at the average goal for all projects by month. What we found was that consistently across each month in both years, the average goal of failed projects was substantially higher than the average goal of successful projects. The output for this analysis can be found in Figure 4 in the Appendix. While the obvious reason for this is that the lower the project goal, the easier it is to reach that goal, there may be some psychological effect that a project’s goal plays on the individual. Even if someone might be very interested in a project, if they see that it has a high goal, they might consider it to be unrealistic and pointless to donate; therefore, projects with higher goals have even less of a chance of reaching that goal.

Success Rate by Campaign Category

cat1 <- march2019_bool_cleaned %>%
  select(category, state) %>%
  dplyr::group_by(category) %>%
  mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
  mutate(num_projects=n()) %>%
  select(category, success_rate, num_projects) %>%
  arrange(desc(success_rate))
cat1_clean <- cat1[!duplicated(cat1$category),]
print(cat1_clean, n=26)
## # A tibble: 159 x 3
## # Groups:   category [159]
##    category       success_rate num_projects
##    <chr>                 <dbl>        <int>
##  1 Rock                    100         2544
##  2 Nonfiction              100         2610
##  3 CountryFolk             100         2642
##  4 Theater                 100          558
##  5 Photography             100          631
##  6 Music                   100          721
##  7 IndieRock               100         2532
##  8 Crafts                  100         1196
##  9 Fashion                 100          173
## 10 VideoGames              100         2435
## 11 TabletopGames           100         3355
## 12 Technology              100          329
## 13 Games                   100           54
## 14 Shorts                  100         2697
## 15 Fiction                 100         2573
## 16 Dance                   100         1474
## 17 Comics                  100          353
## 18 ComicBooks              100         3267
## 19 ChildrensBooks          100         2755
## 20 ProductDesign           100         3711
## 21 Food                    100          511
## 22 Design                  100          191
## 23 Art                     100          836
## 24 Documentary             100         2707
## 25 Publishing              100          293
## 26 FilmVideo               100          242
## # … with 133 more rows
cat2 <- nov2019_bool_cleaned %>%
  select(category, state) %>%
  dplyr::group_by(category) %>%
  mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
  mutate(num_projects=n()) %>%
  select(category, success_rate, num_projects) %>%
  arrange(desc(success_rate))
cat2_clean <- cat2[!duplicated(cat2$category),]
print(cat2_clean, n=31)
## # A tibble: 160 x 3
## # Groups:   category [160]
##    category       success_rate num_projects
##    <chr>                 <dbl>        <int>
##  1 Accessories             100         3451
##  2 Documentary             100         2739
##  3 ComicBooks              100         3334
##  4 Illustration            100         2857
##  5 Theater                 100          545
##  6 IndieRock               100         2539
##  7 Fiction                 100         2615
##  8 Photography             100          504
##  9 ProductDesign           100         3563
## 10 Dance                   100         1413
## 11 Design                  100          196
## 12 FilmVideo               100          223
## 13 Rock                    100         2556
## 14 Nonfiction              100         2600
## 15 TabletopGames           100         3388
## 16 CountryFolk             100         2641
## 17 Shorts                  100         2748
## 18 Comics                  100          288
## 19 Technology              100          327
## 20 ChildrensBooks          100         2807
## 21 Fashion                 100          151
## 22 ScienceFiction          100           62
## 23 Publishing              100          253
## 24 Food                    100          517
## 25 Music                   100          696
## 26 Crafts                  100         1115
## 27 Punk                    100           31
## 28 Art                     100          857
## 29 Residencies             100           60
## 30 Games                   100           63
## 31 VideoGames              100          179
## # … with 129 more rows
cat3 <- march2020_bool_cleaned %>%
  select(category, state) %>%
  dplyr::group_by(category) %>%
  mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
  mutate(num_projects=n()) %>%
  select(category, success_rate, num_projects) %>%
  arrange(desc(success_rate))
cat3_clean <- cat3[!duplicated(cat3$category),]
print(cat3_clean, n=29)
## # A tibble: 160 x 3
## # Groups:   category [160]
##    category       success_rate num_projects
##    <chr>                 <dbl>        <int>
##  1 ProductDesign           100         3777
##  2 Design                  100          200
##  3 TabletopGames           100         3372
##  4 Rock                    100         2554
##  5 Music                   100          690
##  6 Accessories             100         3499
##  7 ComicBooks              100         3353
##  8 Comics                  100          291
##  9 Illustration            100         2933
## 10 Documentary             100         2692
## 11 CountryFolk             100         2636
## 12 Shorts                  100         2704
## 13 Art                     100          815
## 14 IndieRock               100         2545
## 15 Dance                   100         1387
## 16 Food                    100          520
## 17 Fiction                 100         2609
## 18 ChildrensBooks          100         2799
## 19 Publishing              100          260
## 20 Nonfiction              100         2608
## 21 VideoGames              100         2396
## 22 Theater                 100          540
## 23 Fashion                 100          147
## 24 FilmVideo               100          215
## 25 Technology              100          334
## 26 Photography             100          460
## 27 Translations            100            3
## 28 Crafts                  100         1087
## 29 Games                   100           69
## # … with 131 more rows
cat4 <- nov2020_bool_cleaned %>%
  select(category, state) %>%
  dplyr::group_by(category) %>%
  mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
  mutate(num_projects=n()) %>%
  select(category, success_rate, num_projects) %>%
  arrange(desc(success_rate))
cat4_clean <- cat4[!duplicated(cat4$category),]
print(cat4_clean, n=29)
## # A tibble: 160 x 3
## # Groups:   category [160]
##    category       success_rate num_projects
##    <chr>                 <dbl>        <int>
##  1 ChildrensBooks          100         2850
##  2 Crafts                  100         1080
##  3 ProductDesign           100         3685
##  4 Design                  100          174
##  5 VideoGames              100         2417
##  6 Accessories             100         3547
##  7 Rock                    100         2381
##  8 Documentary             100         2768
##  9 Fashion                 100          127
## 10 IndieRock               100         2380
## 11 Dance                   100         1377
## 12 Fiction                 100         2617
## 13 Photography             100          447
## 14 Food                    100          543
## 15 Vegan                   100          171
## 16 Animation               100          111
## 17 Nonfiction              100         2588
## 18 FilmVideo               100          218
## 19 Shorts                  100         2814
## 20 Theater                 100          548
## 21 Illustration            100         2991
## 22 ComicBooks              100         3316
## 23 Comics                  100          273
## 24 Technology              100          323
## 25 TabletopGames           100         1169
## 26 Publishing              100          262
## 27 CountryFolk             100         2380
## 28 Art                     100          872
## 29 Games                   100           54
## # … with 131 more rows
cat1_low <- march2019_bool_cleaned %>%
  select(category, state) %>%
  dplyr::group_by(category) %>%
  mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
  mutate(num_projects=n()) %>%
  select(category, success_rate, num_projects) %>%
  arrange(success_rate)
cat1_low_clean <- cat1_low[!duplicated(cat1_low$category),]
print(cat1_low_clean, n=20)
## # A tibble: 159 x 3
## # Groups:   category [159]
##    category          success_rate num_projects
##    <chr>                    <dbl>        <int>
##  1 MobileGames               13.2         2387
##  2 Video                     18.2          617
##  3 FoodTrucks                18.2         1881
##  4 Web                       18.3         4322
##  5 Couture                   18.5          265
##  6 Software                  20.4         2447
##  7 Television                20.6         1023
##  8 Bacon                     22.3          188
##  9 LiveGames                 22.5         1020
## 10 Action                    22.7          754
## 11 Apps                      22.8         2422
## 12 Flight                    23.1          385
## 13 InteractiveDesign         23.7          401
## 14 Readytowear               24            950
## 15 FarmersMarkets            24.7          461
## 16 Candles                   25.6          516
## 17 Photo                     25.8          279
## 18 Family                    26.0          358
## 19 Translations              26.5          166
## 20 Academic                  27.4          952
## # … with 139 more rows
cat2_low <- nov2019_bool_cleaned %>%
  select(category, state) %>%
  dplyr::group_by(category) %>%
  mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
  mutate(num_projects=n()) %>%
  select(category, success_rate, num_projects) %>%
  arrange(success_rate)
cat2_low_clean <- cat2_low[!duplicated(cat2_low$category),]
print(cat2_low_clean, n=20)
## # A tibble: 160 x 3
## # Groups:   category [160]
##    category          success_rate num_projects
##    <chr>                    <dbl>        <int>
##  1 MobileGames               13.7         2378
##  2 FoodTrucks                17.3         1959
##  3 Couture                   17.7          288
##  4 Web                       18.9         4350
##  5 Video                     19.7          640
##  6 Television                20.3         1063
##  7 Software                  21.5         2441
##  8 Bacon                     22.0          191
##  9 Action                    22.3          806
## 10 Flight                    22.5          395
## 11 LiveGames                 22.7         1081
## 12 FarmersMarkets            23.1          464
## 13 InteractiveDesign         24.8          428
## 14 Apps                      24.9         2409
## 15 Candles                   25.4          560
## 16 Readytowear               25.4         1036
## 17 Translations              25.7          175
## 18 Photo                     26.4          280
## 19 Family                    26.7          378
## 20 Academic                  27.0         1024
## # … with 140 more rows
cat3_low <- march2020_bool_cleaned %>%
  select(category, state) %>%
  dplyr::group_by(category) %>%
  mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
  mutate(num_projects=n()) %>%
  select(category, success_rate, num_projects) %>%
  arrange(success_rate)
cat3_low_clean <- cat3_low[!duplicated(cat3_low$category),]
print(cat3_low_clean, n=20)
## # A tibble: 160 x 3
## # Groups:   category [160]
##    category          success_rate num_projects
##    <chr>                    <dbl>        <int>
##  1 MobileGames               14.2         2387
##  2 FoodTrucks                17.1         2005
##  3 Web                       19.3         4367
##  4 Couture                   19.5          303
##  5 Television                20.2         1085
##  6 Video                     20.6          631
##  7 Bacon                     21.5          191
##  8 Action                    21.8          818
##  9 Software                  21.9         2447
## 10 Flight                    22.6          399
## 11 FarmersMarkets            22.9          468
## 12 LiveGames                 23.3         1108
## 13 InteractiveDesign         25.4          445
## 14 Readytowear               25.9         1065
## 15 Candles                   26.0          585
## 16 Apps                      26.0         2416
## 17 Academic                  26.0         1048
## 18 Photo                     26.8          284
## 19 Family                    27.2          393
## 20 Architecture              28.1          797
## # … with 140 more rows
cat4_low <- nov2020_bool_cleaned %>%
  select(category, state) %>%
  dplyr::group_by(category) %>%
  mutate(success_rate = length(state[state == "successful"])/n()*100) %>%
  mutate(num_projects=n()) %>%
  select(category, success_rate, num_projects) %>%
  arrange(success_rate)
cat4_low_clean <- cat4_low[!duplicated(cat4_low$category),]
print(cat4_low_clean, n=20)
## # A tibble: 160 x 3
## # Groups:   category [160]
##    category          success_rate num_projects
##    <chr>                    <dbl>        <int>
##  1 MobileGames               14.7         2389
##  2 FoodTrucks                17.3         2082
##  3 Couture                   19.6          311
##  4 Television                19.9         1118
##  5 Web                       20.2         4386
##  6 Video                     20.9          642
##  7 Bacon                     21.4          192
##  8 Flight                    22.0          409
##  9 Action                    22.1          848
## 10 FarmersMarkets            22.2          474
## 11 Software                  22.6         2448
## 12 LiveGames                 24.2         1163
## 13 InteractiveDesign         24.8          467
## 14 Readytowear               25.4         1117
## 15 Translations              26.3          186
## 16 Academic                  26.5         1100
## 17 Candles                   27.0          630
## 18 Architecture              27.1          807
## 19 Photo                     27.4          277
## 20 Events                    27.7          936
## # … with 140 more rows

We compared these lists to each other to get an idea of how successful or unsuccessful certain categories generally were month-to-month. From March 2020 to November 2020, the top categories that had the most dramatic positive change in success were “Children’s Books”, “Crafts”, “Fashion”, and “Video Games”. The top categories which had the most dramatic negative change in success were “Comic Books”, “Comics”, and “Tabletop Games”. There was no dramatic change in the lowest 20 categories, but it is interesting to note that for each month, “Mobile Games” was the lowest category.

From this data, we may conclude that many of the categories with the lowest success rates fall into one of two general distinctions: tech, and niche interests. We believe that many tech projects fail because of the comprehension that is required to fully appreciate the utility of certain tech projects (softwares, web, mobile games, etc). Projects for niche interests also tend to fail more often due to the challenge of finding donors who share that interest; some examples of niche interest categories consistently found in the bottom 20 included bacon, flight, candles, farmers markets, food trucks, and interactive design).

We could also hypothesize that the category of successful projects is relevant to current societal events, such as the Coronavirus pandemic. As people quarantine and spend more and more time at home, they may find themselves trying to keep busy with things like fashion or video games, whilst also dealing with children being at home instead of school; this would explain the increase of successful children’s book and craft projects.

Sentiment Analysis of Campaign Blurbs

march2019_sentiment <- march2019_bool_cleaned %>%
  unnest_tokens(word, blurb) %>%
  inner_join(get_sentiments("afinn"), by= "word") %>%
  group_by(name) %>%
  summarize(sentiment = mean(value), words = n()) %>%
  ungroup()
march2019_bool_cleaned %>%
  inner_join(march2019_sentiment, by = "name") %>%
  select(category, state, blurb, sentiment)
## # A tibble: 131,629 x 4
##    category    state    blurb                                          sentiment
##    <chr>       <chr>    <chr>                                              <dbl>
##  1 MixedMedia  success… "An adorable fantasy enamel pin series of pri…      3   
##  2 Photobooks  success… "Helping a community come together to set the…      1.5 
##  3 Software    failed   "Learn to build 10+ Applications in this comp…      2   
##  4 Accessories success… "'Eclipse' - A 30mm hard enamel pin in jet bl…      0   
##  5 Software    failed   "Let's build and remix a new Personal Web The…      4   
##  6 Sculpture   failed   "A limited edition, signed and numbered hand …      0   
##  7 PlayingCar… failed   "Play this hilarious, nonsensical card game f…      1.5 
##  8 Footwear    success… "Award Winning Footwear Designs | Crafted Usi…      2.75
##  9 Footwear    success… "NAKEFIT: Hypoallergenic adhesive pad for wal…      1   
## 10 PlayingCar… failed   "OVERTHROW THE KING AND TAKE WHAT IS YOURS & …      0.75
## # … with 131,619 more rows
march2019_graph <- march2019_bool_cleaned %>%
  inner_join(march2019_sentiment, by = "name") %>%
  select(state, sentiment)
ggplot(march2019_graph, aes(sentiment, state)) + geom_boxplot()

nov2019_sentiment <- nov2019_bool_cleaned %>%
  unnest_tokens(word, blurb) %>%
  inner_join(get_sentiments("afinn"), by= "word") %>%
  group_by(name) %>%
  summarize(sentiment = mean(value), words = n()) %>%
  ungroup()
nov2019_bool_cleaned %>%
  inner_join(nov2019_sentiment, by = "name") %>%
  select(category, state, blurb, sentiment)
## # A tibble: 130,325 x 4
##    category     state    blurb                                         sentiment
##    <chr>        <chr>    <chr>                                             <dbl>
##  1 PublicArt    success… Help support the legendary B.O.O.G.A. mutant…         2
##  2 Software     failed   21m breaches in 2016 alone. Presenting a vid…        -2
##  3 Spaces       success… We are opening our first restaurant in Cambe…         2
##  4 DIY          failed   Sac  Super Duty construit pour la longévité,…         3
##  5 DIY          success… Great decor and gifts for any occasion!               3
##  6 ClassicalMu… failed   Inedited & original music for Guitar's Orche…         2
##  7 Apparel      success… A new online portal where you can discover f…         3
##  8 NarrativeFi… success… An award-winning ensemble drama charting the…         3
##  9 Ceramics     failed   I'm excited to introduce my Salad serving sp…         3
## 10 Software     failed   To calculate extremely rare Plato's  periodi…         2
## # … with 130,315 more rows
nov2019_graph <- nov2019_bool_cleaned %>%
  inner_join(nov2019_sentiment, by = "name") %>%
  select(state, sentiment)
ggplot(nov2019_graph, aes(sentiment, state)) + geom_boxplot()

march2020_sentiment <- march2020_bool_cleaned %>%
  unnest_tokens(word, blurb) %>%
  inner_join(get_sentiments("afinn"), by= "word") %>%
  group_by(name) %>%
  summarize(sentiment = mean(value), words = n()) %>%
  ungroup()
march2020_bool_cleaned %>%
  inner_join(march2020_sentiment, by = "name") %>%
  select(category, state, blurb, sentiment)
## # A tibble: 133,014 x 4
##    category     state    blurb                                         sentiment
##    <chr>        <chr>    <chr>                                             <dbl>
##  1 Web          failed   "Dedicated to allow Christian Musicians, Pod…       1.5
##  2 Faith        success… "New Music from Marty Mikles!  A new EP all …       2.5
##  3 DIY          failed   "Expand greenhouse and plants. I offer wide …       1  
##  4 Periodicals  failed   "A lifestyle magazine for the intelligent an…       2  
##  5 Periodicals  success… "A new, full-colour magazine devoted to life…       3  
##  6 Blues        success… "A new CD of ten original blues-soul songs b…       4  
##  7 Faith        failed   "An EP of 7 Original Neoclassical Piano Comp…       2  
##  8 ClassicalMu… success… "An album of the previously unrecorded piano…       2  
##  9 Periodicals  failed   "Everyone has a story.  Let's make it a Cove…       2  
## 10 ElectronicM… success… "Help Bluetech release the first volume of S…       2  
## # … with 133,004 more rows
march2020_graph <- march2020_bool_cleaned %>%
  inner_join(march2020_sentiment, by = "name") %>%
  select(state, sentiment)
ggplot(march2020_graph, aes(sentiment, state)) + geom_boxplot()

nov2020_sentiment <- nov2020_bool_cleaned %>%
  unnest_tokens(word, blurb) %>%
  inner_join(get_sentiments("afinn"), by= "word") %>%
  group_by(name) %>%
  summarize(sentiment = mean(value), words = n()) %>%
  ungroup()
nov2020_bool_cleaned %>%
  inner_join(nov2020_sentiment, by = "name") %>%
  select(category, state, blurb, sentiment)
## # A tibble: 130,428 x 4
##    category     state    blurb                                         sentiment
##    <chr>        <chr>    <chr>                                             <dbl>
##  1 Jazz         failed   "This project is designed to help protect th…      1.67
##  2 PublicArt    success… "Help send me to Washington DC where I will …      2   
##  3 Makerspaces  failed   "Help us built a sustainable studio & elimin…      2   
##  4 PublicArt    failed   "\"If I paint something, I don't want to hav…      1   
##  5 ChildrensBo… success… "An illustrated collection of short horror s…     -2   
##  6 Stationery   success… "Make your holidays a little stranger with t…      2.5 
##  7 Restaurants  failed   "Our free app will allow you pool reservatio…     -0.2 
##  8 ChildrensBo… success… "A tender portrait of Afro-Caribbean family …      2   
##  9 ChildrensBo… success… "Penelope's Petition teaches children about …     -3   
## 10 NarrativeFi… success… "What if your dreams attacked you?\nFollow N…      0   
## # … with 130,418 more rows
nov2020_graph <- nov2020_bool_cleaned %>%
  inner_join(nov2020_sentiment, by = "name") %>%
  select(state, sentiment)
ggplot(nov2020_graph, aes(sentiment, state)) + geom_boxplot()

if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

mar19 <- ggplot(march2019_graph, aes(sentiment, state)) + geom_boxplot()
nov19 <- ggplot(nov2019_graph, aes(sentiment, state)) + geom_boxplot()
mar20 <- ggplot(march2020_graph, aes(sentiment, state)) + geom_boxplot()
nov20 <- ggplot(nov2020_graph, aes(sentiment, state)) + geom_boxplot()

library(ggpubr)
ggarrange(mar19, nov19, mar20, nov20 + rremove("x.text"), 
          labels = c("Mar 2019", "Nov 2019", "Mar 2020", "Nov 2020"),
          ncol = 2, nrow = 2)

Here, we performed a sentiment analysis on the blurbs of Kickstarter projects. To do this, we started by unnesting the tokens in the blurb to create a vector of all words used in each blurb. From there, we performed an inner join with the “afinn” sentiment lexicon and the unnested blurbs and grouped those results by project name (“name” variable). We then summarized this variable with the “sentiment” value, which was defined as the mean sentiment values for all words in each blurb. This data was set to a new variable called “monthyear_sentiment”, with month and year pertaining to the respective values (“march2019_sentiment, etc.). From there, we created a variable in preparation for visualizing the sentiment scores called “monthyear_graph” which included the “monthyear_sentiment” data joining with the “monthyear_bool_cleaned data” and selecting only the “state” and “sentiment” columns.

Using ggplot2’s boxplot, we were able to visualize the distribution of sentiment scores between both successful and unsuccessful projects. In doing this, we found that for all four months, the median sentiment score for successful projects was slightly higher than the median sentiment score for unsuccessful projects. Additionally, there was a more polarized distribution of sentiment scores within the failed projects, meaning that while successful projects had sentiment scores that were neither extremely positive nor negative, failed projects had sentiment scores that were very concentrated on either end of the plot. We realize that while a negative sentiment score does not necessarily indicate that a blurb was written with a negative or rude tone, it may indicate that using words with more negative connotations (even in a positive way) could lower the likelihood for that project to reach its goal.

Impact of Spotlight Pages

march2019_spotlight <- march2019_bool_cleaned %>%
  select(spotlight, usd_pledged) %>%
  group_by(spotlight) %>%
  mutate(mean_pledged = mean(usd_pledged)) %>%
  select(spotlight, mean_pledged) %>%
  group_by(spotlight)
march2019_spotlight <- march2019_spotlight[!duplicated(march2019_spotlight$spotlight),]

nov2019_spotlight <- nov2019_bool_cleaned %>%
  select(spotlight, usd_pledged) %>%
  group_by(spotlight) %>%
  mutate(mean_pledged = mean(usd_pledged)) %>%
  select(spotlight, mean_pledged) %>%
  group_by(spotlight)
nov2019_spotlight <- nov2019_spotlight[!duplicated(nov2019_spotlight$spotlight),]

march2020_spotlight <- march2020_bool_cleaned %>%
  select(spotlight, usd_pledged) %>%
  group_by(spotlight) %>%
  mutate(mean_pledged = mean(usd_pledged)) %>%
  select(spotlight, mean_pledged) %>%
  group_by(spotlight)
march2020_spotlight <- march2020_spotlight[!duplicated(march2020_spotlight$spotlight),]

nov2020_spotlight <- nov2020_bool_cleaned %>%
  select(spotlight, usd_pledged) %>%
  group_by(spotlight) %>%
  mutate(mean_pledged = mean(usd_pledged)) %>%
  select(spotlight, mean_pledged) %>%
  group_by(spotlight)
nov2020_spotlight <- nov2020_spotlight[!duplicated(nov2020_spotlight$spotlight),]

march2019_spotlight
## # A tibble: 2 x 2
## # Groups:   spotlight [2]
##   spotlight mean_pledged
##   <lgl>            <dbl>
## 1 TRUE            21474.
## 2 FALSE            1087.
nov2019_spotlight
## # A tibble: 2 x 2
## # Groups:   spotlight [2]
##   spotlight mean_pledged
##   <lgl>            <dbl>
## 1 TRUE            21149.
## 2 FALSE            1073.
march2020_spotlight
## # A tibble: 2 x 2
## # Groups:   spotlight [2]
##   spotlight mean_pledged
##   <lgl>            <dbl>
## 1 TRUE            22047.
## 2 FALSE            1070.
nov2020_spotlight
## # A tibble: 2 x 2
## # Groups:   spotlight [2]
##   spotlight mean_pledged
##   <lgl>            <dbl>
## 1 TRUE            22888.
## 2 FALSE            1034.

Here, we analyzed whether spotlight Kickstarter campaigns receive a higher “usd_pledged” amount. To do this, we calculated the average amount of “usd_pledged” for Kickstarter campaigns that were in the spotlight section of the website versus the campaigns that were not within the months of March and November of 2019 and 2020.

Our research indicated that spotlight campaigns receive a higher amount pledged. All four months (March and November of 2019 and 2020) received between $20,000-$21,000 more for utilizing the spotlight feature.

Findings and Implications

The main findings from this analysis are as follows: while the average amount per pledge has decreased over the last two years, COVID-19 may have actually increased the popularity of certain campaign categories. Things like children’s books, crafts, and video games saw a major increase in successful campaigns over last year. Some categories, however, seem to be consistently less successful, regardless of outside factors. Specifically, the ‘Mobile Games’ category had the lowest number of successful projects across all four time periods. Included in these low-success categories are tech-oriented categories such as web, software, and apps, as well as categories with more niche interests like farmer’s markets, candles, and bacon. We can conclude from these findings that the more niche or advanced in nature a category is, the harder time it will have in securing funding.

Lastly, our analysis leads us to believe that creating a “Spotlight” page for a Kickstarter campaign will play a major role in securing funding, as on average, campaigns with spotlight pages secure approximately $20,000 more in funding than campaigns without a spotlight page. This knowledge would benefit both Kickstarter and creators, as Kickstarter takes a small percentage of any successful projects, and projects are more likely to achieve more funding, and therefore, succeed with spotlight pages.