Crowdfunded Projects on Kickstarter in 2012

#Loading necessary libraries:
library(ggplot2)
library(dplyr)
Kickstarter <- read.csv("Kickstarter.csv", stringsAsFactors = FALSE)
head(Kickstarter)
##   Crowdfunded.projects.on.Kickstarter..2012 Launched Successful
## 1                                     Games     2796        911
## 2                              Film & Video     9600       3891
## 3                                    Design     1882        759
## 4                                     Music     9086       5067
## 5                                Technology      831        312
## 6                                Publishing     5634       1666
##   Money.pledged... Pledges Success.rate... Average.pledge...
## 1         83144565 1378143            32.6              60.3
## 2         57951876  647361            40.5              89.5
## 3         50124041  536469            40.3              93.4
## 4         34953600  522441            55.8              66.9
## 5         29003932  270912            37.5             107.1
## 6         15311251  262738            29.6              58.3
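# Success rate as a proportion (successful projects / launched projects), one value per project type
success_rate <- Kickstarter$Successful/Kickstarter$Launched
print(success_rate)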
##  [1] 0.3258226 0.4053125 0.4032944 0.5576711 0.3754513 0.2957047 0.3763676
##  [8] 0.4855934 0.4632479 0.6681589 0.2616034 0.3567251 0.7441406
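# Replace the long imported column names with shorter ones
colnames(Kickstarter) <- c("Project", "Launched", "Successful", "Money.Pledged",
                           "Pledges", "Success.Rate", "Average.Pledge")
# Express money pledged in millions of dollars for readable axis values
money.pledged <- Kickstarter$Money.Pledged/1000000
print(money.pledged)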
##  [1] 83.144565 57.951876 50.124041 34.953600 29.003932 15.311251 11.117486
##  [8] 10.477939  9.242233  7.084968  6.317799  3.283635  1.773304
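
The two derived vectors above live outside the data frame. As a small sketch of an alternative, they could be stored as columns with dplyr so that every plot layer finds them in the data itself. The stand-in data frame below copies only the six rows printed by head() above, and the names kick_sample, success_rate and money_millions are illustrative assumptions rather than part of the original analysis.

```r
library(dplyr)

# Stand-in for the full data set: only the six rows printed by head() above,
# using the shortened column names assigned with colnames().
kick_sample <- data.frame(
  Project       = c("Games", "Film & Video", "Design", "Music", "Technology", "Publishing"),
  Launched      = c(2796, 9600, 1882, 9086, 831, 5634),
  Successful    = c(911, 3891, 759, 5067, 312, 1666),
  Money.Pledged = c(83144565, 57951876, 50124041, 34953600, 29003932, 15311251),
  stringsAsFactors = FALSE
)

# Keep the derived values next to the raw columns instead of in free-standing vectors.
kick_sample <- kick_sample %>%
  mutate(
    success_rate   = Successful / Launched,  # proportion between 0 and 1
    money_millions = Money.Pledged / 1e6     # dollars converted to millions
  )

print(kick_sample[, c("Project", "success_rate", "money_millions")])
```

With the values stored this way, aes() can refer to success_rate and money_millions as ordinary columns, which avoids relying on vectors in the global environment staying in the same row order as the data frame.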

Exploratory Information

First, I will become more familiar with the data before attempting to create a visual that is more appropriate than the Economist’s journalistic visualization.

Figure 1. Scatterplot of Money Pledged and Success Rate by Project Type

Kickstarter_Success <- ggplot(data=Kickstarter, aes(x= money.pledged, y=success_rate, color=Project))+
  geom_point(size=5)+
  xlab("Money Pledged\n(in millions)") +
  ylab("Success Rate")+
  scale_y_continuous(labels = scales::percent)
Kickstarter_Success

From Figure 1 we can see that dance projects had the highest success rate and also the lowest amount of money pledged. However, this visual does not make it easy to compare the project types on a quantitative basis, so we will continue with other visualizations that can outperform, or at least be as informative as, the Economist’s visual.

After some trial and error with different variables on the x-axis, it made the most sense to use the project type as the x variable. It also made sense to put money pledged, as opposed to success rate, on the first y-axis because of its larger values. When success rate was on the y-axis, the scale was not large enough to show both values clearly. I chose a bar graph so that success rate could be shown as a color scale, which makes it clearer which project types were more successful than others.

Single Visualization

Figure 2. Barplot of Success Rate and Money Pledged by Project Type

# Average pledge (in dollars) per project type, drawn as points on top of the bars
Average.Pledged <- Kickstarter$Average.Pledge
# Bars show money pledged (in millions), ordered from largest to smallest and filled by success rate
BKick <- ggplot(Kickstarter, aes(x=reorder(Project, -money.pledged), y=money.pledged))+
  geom_col(aes(fill=success_rate), color="white")+
  # Label each bar with money pledged, rounded to two decimals
  geom_text(aes(label=round(money.pledged, digits=2)), vjust=-1, color="black", size=2)+
  labs(x = "Project Type", y="Money Pledged\n(in millions)", color="Average Pledged", fill="Success Rate")+
  theme(legend.position = "bottom",
        text = element_text(size=7),
        axis.text.x = element_text(angle=90, hjust=1))+
  geom_point(aes(color="", y=Average.Pledged), size=8)+
  geom_text(aes(label=Average.Pledged, y = Average.Pledged), vjust=0, color="black", size=2)
 
BKick2 <- BKick + scale_y_continuous(sec.axis=sec_axis(~.*1, name="Average Pledged\n(in dollars)"), limits=c(0,110))

BKick2

The first y-axis for Figure 2 had to be based on money pledged because, as mentioned before, the values for money pledged were much larger than the success rate percentages and the average pledged, and so required a broader scale on the primary y-axis.

The assumption would be that the more money pledged to a certain project type, the higher the success rate would be. However, as we can see in Figure 2, this is not always the case. At first I looked at plotting the success rate as points, similar to those now representing average pledged. However, I thought it was more interesting to look at the average amount pledged to each project type. My assumption after creating this visual was that a larger average pledge reflected a better “pitch” to investors, which generated higher confidence in the project and would have led to a higher success rate. Although this assumption remains inconclusive, given that the dataset does not show each project within the larger project type categories, it seemed to hold true for Dance and Theater.

In comparison with the Economist’s visualization, Figure 2 does not show the relationship path between variables. Additionally, the actual success rate is not shown as a percentage: you can see which project types were more successful than others from the color scale, but you cannot read off the exact success percentage for each project type. I opted not to include the actual success percentage because I thought the more important relationship was the comparison of success between project types, and I did not want to add more data points that could introduce confusion given the number of variables already included. However, unlike the Economist’s visualization, Figure 2 shows the money pledged alongside the average pledged. I thought it was interesting that both the Theater and Dance project types seem to have generated higher average pledge amounts but lower volume. This would lead me to believe that these types of projects target a more affluent donor (or maybe just someone who is willing to spend more) and that, even without reaching a broad audience or raising the most funds, they generated higher success rates than other project types. This finding was not easily visible in any of the Economist’s visualizations.

Although I do think there are many ways to improve this visualization, as we’ve learned in the course so far, the message that you want to convey should drive the visualization. After trial and error, I found the most interesting message to be the relationship between average pledged and success rate, rather than between total money pledged and success rate. Given more time to experiment, I would drop the total money pledged and look more closely at the average pledged and success rate. It would also be interesting to look at the projects within the project type categories for individual project success rates and average money pledged.
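
As a starting point for that follow-up, a minimal sketch of average pledge against success rate might look like the following. It uses only the six project types printed by head() earlier, so it cannot show Dance or Theater and should not be read as a result. The names kick_avg and avg_plot are hypothetical.

```r
library(ggplot2)

# Six rows copied from the head() output; the full data set has 13 project types.
kick_avg <- data.frame(
  Project        = c("Games", "Film & Video", "Design", "Music", "Technology", "Publishing"),
  Launched       = c(2796, 9600, 1882, 9086, 831, 5634),
  Successful     = c(911, 3891, 759, 5067, 312, 1666),
  Average.Pledge = c(60.3, 89.5, 93.4, 66.9, 107.1, 58.3),
  stringsAsFactors = FALSE
)
kick_avg$success_rate <- kick_avg$Successful / kick_avg$Launched

# One point per project type, labeled directly so no color legend is needed.
avg_plot <- ggplot(kick_avg, aes(x = Average.Pledge, y = success_rate)) +
  geom_point(size = 3) +
  geom_text(aes(label = Project), vjust = -1, size = 3) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Average Pledge\n(in dollars)", y = "Success Rate")

avg_plot
```

Labeling the points directly is a design choice for a small number of categories; with all 13 project types the labels may overlap and a legend or a label-repelling approach could work better.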

Issues I had:

  1. I was unsure how to fix the legend label for “Success Rate”. I made a few attempts at the legend suggestions from Wickham’s Chapter 6, but it wasn’t until I really thought about the grammar of ggplot that I realized “fill” was the name associated with the color bar that represents the success rate variable.

  2. I could not figure out how to remove the tick marks on the second y-axis, for average pledged. Since the two scales technically use the same values, I was not sure I needed the tick marks. The y-axis labels, however, are definitely necessary, since Money Pledged is in millions and Average Pledged is in dollars.

  3. It took me quite a while to figure out how to round the digits for the money pledged labels but, as always, Stack Overflow was very helpful.

  4. Lastly, I am not sure how to get rid of the “red” text next to the red circle in the legend. I was able to assign the correct variable (Average Pledged) to the red circle but was unable to figure out how to remove the actual text saying “red”. Any suggestions?
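
For issues 2 and 4, one possible approach is sketched below. When a constant string such as "red" is mapped to color inside aes(), ggplot2 treats the string as a category, so it becomes the legend label. Using the desired label as the string and choosing the actual color with scale_color_manual() should avoid the stray text. The tick marks on the right-hand axis can be hidden with the theme element axis.ticks.y.right. The data frame is a six-row stand-in copied from the head() output, and kick_demo, money_millions and demo_plot are illustrative names.

```r
library(ggplot2)

# Six rows copied from the head() output; money is already converted to millions.
kick_demo <- data.frame(
  Project        = c("Games", "Film & Video", "Design", "Music", "Technology", "Publishing"),
  money_millions = c(83.144565, 57.951876, 50.124041, 34.953600, 29.003932, 15.311251),
  Average.Pledge = c(60.3, 89.5, 93.4, 66.9, 107.1, 58.3),
  stringsAsFactors = FALSE
)

demo_plot <- ggplot(kick_demo, aes(x = reorder(Project, -money_millions), y = money_millions)) +
  geom_col(fill = "grey70") +
  # The constant string is the legend label; the color itself is set in the manual scale.
  geom_point(aes(y = Average.Pledge, color = "Average Pledged"), size = 4) +
  scale_color_manual(name = NULL, values = c("Average Pledged" = "red")) +
  scale_y_continuous(
    limits = c(0, 110),
    sec.axis = sec_axis(~ ., name = "Average Pledged\n(in dollars)")
  ) +
  labs(x = "Project Type", y = "Money Pledged\n(in millions)") +
  theme(
    legend.position = "bottom",
    axis.ticks.y.right = element_blank()  # hide tick marks on the secondary axis only
  )

demo_plot
```

Setting name = NULL drops the legend title so that only the key and its label remain. If the numbers on the right-hand axis are also unwanted, axis.text.y.right = element_blank() could be added to the same theme() call.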

Please let me know how you think I can improve this visual to make it even more clear for readers.