#Loading necessary libraries:
library(ggplot2)
library(dplyr)
Kickstarter <- read.csv("Kickstarter.csv", stringsAsFactors = FALSE)
head(Kickstarter)
##   Crowdfunded.projects.on.Kickstarter..2012 Launched Successful
## 1                                     Games     2796        911
## 2                              Film & Video     9600       3891
## 3                                    Design     1882        759
## 4                                     Music     9086       5067
## 5                                Technology      831        312
## 6                                Publishing     5634       1666
##   Money.pledged... Pledges Success.rate... Average.pledge...
## 1         83144565 1378143            32.6              60.3
## 2         57951876  647361            40.5              89.5
## 3         50124041  536469            40.3              93.4
## 4         34953600  522441            55.8              66.9
## 5         29003932  270912            37.5             107.1
## 6         15311251  262738            29.6              58.3
# Success rate as a proportion: successful projects divided by launched projects
success_rate <- Kickstarter$Successful/Kickstarter$Launched
print(success_rate)
## [1] 0.3258226 0.4053125 0.4032944 0.5576711 0.3754513 0.2957047 0.3763676
## [8] 0.4855934 0.4632479 0.6681589 0.2616034 0.3567251 0.7441406
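# Rename the columns to shorter, consistent names
colnames(Kickstarter) <- c("Project", "Launched", "Successful", "Money.Pledged", "Pledges", "Success.Rate", "Average.Pledge")
# Convert money pledged from dollars to millions of dollars
money.pledged <- Kickstarter$Money.Pledged/1000000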
print(money.pledged)
## [1] 83.144565 57.951876 50.124041 34.953600 29.003932 15.311251 11.117486
## [8] 10.477939 9.242233 7.084968 6.317799 3.283635 1.773304
First, I want to become more familiar with the data before attempting a visual that is more appropriate than the Economist’s journalistic visualization.
# Scatter plot of success rate against money pledged, colored by project type
Kickstarter_Success <- ggplot(data = Kickstarter, aes(x = money.pledged, y = success_rate, color = Project)) +
  geom_point(size = 5) +
  xlab("Money Pledged\n(in millions)") +
  ylab("Success Rate") +
  scale_y_continuous(labels = scales::percent)
Kickstarter_Success
From Figure 1 we can see that the highest success rate belongs to dance projects, which also had the lowest amount of money pledged. However, this visual does not make it easy to compare the project types on a quantitative basis. Because of this, we will continue to make other visualizations that can outperform, or at least be as informative as, the Economist’s visual.
After some trial and error with different variables on the x-axis, it made the most sense to use the project type as the x variable. It also made sense to put money pledged, as opposed to success rate, on the first y-axis because of its larger values. When success rate was on the y-axis, the scale wasn’t large enough to see both values easily. I chose a bar graph so that success rate could be shown as a color scale, making it clearer which project types were more successful than others.
Average.Pledged <- Kickstarter$Average.Pledge
# Bars: money pledged per project type (largest first), filled by success rate
BKick <- ggplot(Kickstarter, aes(x = reorder(Project, -money.pledged), y = money.pledged)) +
  geom_col(aes(fill = success_rate), color = "white") +
  # Label each bar with money pledged, rounded to two decimals
  geom_text(aes(label = round(money.pledged, digits = 2), y = money.pledged), vjust = -1, color = "black", size = 2) +
  theme(legend.position = "bottom") +
  labs(x = "Project Type", y = "Money Pledged\n(in millions)", color = "Average Pledged", fill = "Success Rate") +
  theme(text = element_text(size = 7),
        axis.text.x = element_text(angle = 90, hjust = 1)) +
  # Points: average pledge in dollars, read against the secondary y-axis
  geom_point(aes(color = "", y = Average.Pledged), size = 8) +
  geom_text(aes(label = Average.Pledged, y = Average.Pledged), vjust = 0, color = "black", size = 2)
# The secondary axis is an identity transform (~ . * 1), so both axes share the 0 to 110 range
BKick2 <- BKick + scale_y_continuous(sec.axis = sec_axis(~ . * 1, name = "Average Pledged\n(in dollars)"), limits = c(0, 110))
BKick2
The first y-axis for Figure 2 had to be based on money pledged because, as mentioned before, the values for money pledged were much larger than the success rate percentages and the average pledged, and so required a broader scale on the primary y-axis.
The assumption would be that the more money pledged to a certain project type, the higher the success rate would be. However, as we can see in Figure 2, this is not always the case. At first I looked at showing the success rate as points, similar to those now representing average pledged. However, I thought it was more interesting to look at the average amount pledged to a certain project type. My assumption after creating this visual was that a larger dollar amount pledged (on average) represented a better “pitch” to investors, which generated higher confidence in the project and would have led to a higher success rate. Although this assumption is inconclusive, given that this dataset does not show each project within the larger project type categories, it seemed to hold true for Dance and Theater.
In comparison with the Economist’s visualization, Figure 2 does not show the relationship path between variables. Additionally, the actual success rate is not shown as a percentage: you can see which project type was more successful than the others based on the color scale, but you cannot see the actual percentage of success per project type. I opted not to include the actual success percentage because I thought the more important relationship was the comparison of success between project types, and I did not want to add more data points that could introduce confusion given the number of variables already included.
However, compared to the Economist’s visualization, Figure 2 shows the money pledged alongside the average pledged. I thought it was interesting that both the Theater and Dance project types seem to have generated higher average pledge amounts but lower volume. This would lead me to believe that these types of projects target a more affluent donor (or maybe just someone who is willing to spend more) and that, even without reaching a broad audience or raising the most funds, they generated higher success rates than other project types. This finding was not easily visible in any of the Economist’s visualizations.
Although I do think there are definitely many ways to improve this visualization, as we’ve learned in the course so far, the message that you want to convey should drive the visualization. After trial and error, I found the most interesting message to be the average pledged and success rate, rather than the total money pledged and success rate. Given more time to experiment, I would drop the total money pledged and look more closely at the average pledged and success rate. It would also be interesting to look at the projects within the project type categories for individual project success rates and average money pledged.
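As a rough illustration of that follow-up idea, the sketch below plots average pledge against success rate. It is only a sketch under stated assumptions: the data frame `kick` and the object `p_avg` are hypothetical names, and `kick` holds just the six rows printed by `head(Kickstarter)` above, so Dance and Theater are not included and the plot cannot confirm or reject the pattern described for them.

```r
library(ggplot2)

# Hypothetical stand-in data: the six rows shown by head(Kickstarter) above
kick <- data.frame(
  Project = c("Games", "Film & Video", "Design", "Music", "Technology", "Publishing"),
  Launched = c(2796, 9600, 1882, 9086, 831, 5634),
  Successful = c(911, 3891, 759, 5067, 312, 1666),
  Average.Pledge = c(60.3, 89.5, 93.4, 66.9, 107.1, 58.3)
)

# Success rate as a proportion, computed the same way as earlier in the report
kick$success_rate <- kick$Successful / kick$Launched

# Average pledge on the x-axis, success rate on the y-axis, one labeled point per project type
p_avg <- ggplot(kick, aes(x = Average.Pledge, y = success_rate, label = Project)) +
  geom_point(size = 3) +
  geom_text(vjust = -1, size = 3) +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Average Pledged\n(in dollars)", y = "Success Rate")
p_avg
```

With the full dataset in place of `kick`, the same layers would apply to all thirteen project types.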
I was unsure of how to fix the legend label for “Success Rate”. I made a few attempts at the legend suggestions from Wickham’s Chapter 6, but it wasn’t until I really thought about the grammar of ggplot that I realized “fill” was the aesthetic that names the color bar representing the success rate variable.
I could not figure out how to remove the tick marks on the second y-axis, for average pledged. Since the two scales technically show the same values, I was not sure I needed the tick marks. The y-axis labels, however, are definitely necessary, since Money Pledged is in millions and Average Pledged is in dollars.
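One possible way to hide those tick marks is the `axis.ticks.y.right` theme element, which ggplot2 provides for the right-hand axis. The sketch below is a minimal illustration and not the code behind Figure 2: the data frame `kick`, the column `money_millions`, and the object `p_ticks` are hypothetical names, and the data are only the six rows printed by `head(Kickstarter)`.

```r
library(ggplot2)

# Hypothetical stand-in data: the six rows shown by head(Kickstarter) above
kick <- data.frame(
  Project = c("Games", "Film & Video", "Design", "Music", "Technology", "Publishing"),
  Money.Pledged = c(83144565, 57951876, 50124041, 34953600, 29003932, 15311251),
  Average.Pledge = c(60.3, 89.5, 93.4, 66.9, 107.1, 58.3)
)
kick$money_millions <- kick$Money.Pledged / 1e6

p_ticks <- ggplot(kick, aes(x = reorder(Project, -money_millions), y = money_millions)) +
  geom_col() +
  geom_point(aes(y = Average.Pledge), size = 4) +
  scale_y_continuous(
    name = "Money Pledged\n(in millions)",
    limits = c(0, 110),
    sec.axis = sec_axis(~ . * 1, name = "Average Pledged\n(in dollars)")
  ) +
  labs(x = "Project Type") +
  # Drop only the tick marks on the right-hand axis; its title and labels stay
  theme(axis.ticks.y.right = element_blank())
p_ticks
```

Because only the tick marks are blanked, the secondary axis title that distinguishes dollars from millions remains visible.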
It took me quite a while to figure out how to round the digits for the money pledged labels but, as always, Stack Overflow was very helpful.
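For reference, here is a small sketch of two base R options for rounding label values. The vector `money_pledged` is a hypothetical name holding the first three values printed earlier (in millions); `round()` is what the plot code uses, while `sprintf()` is shown only as an alternative.

```r
# Money pledged in millions for the first three project types printed above
money_pledged <- c(83.144565, 57.951876, 50.124041)

# round() returns numbers, so any trailing zeros are dropped when printed
rounded <- round(money_pledged, digits = 2)
print(rounded)

# sprintf() returns character strings that always show exactly two decimals
labels <- sprintf("%.2f", money_pledged)
print(labels)
```

The choice matters mainly for values such as whole numbers, where `round()` would print fewer decimals than `sprintf("%.2f", ...)`.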
Lastly, I am not sure how to get rid of the “red” text next to the red circle in the legend. I was able to assign the correct variable (Average Pledged) to the red circle but was unable to figure out how to remove the actual text saying “red”. Any suggestions?
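One common idiom, offered as a sketch and not verified against Figure 2, is to map `color` inside `aes()` to the text that should appear in the legend and then set the actual color with `scale_color_manual()`. The names `kick` and `p_legend` are hypothetical, and the data are only the six rows printed by `head(Kickstarter)`.

```r
library(ggplot2)

# Hypothetical stand-in data: the six rows shown by head(Kickstarter) above
kick <- data.frame(
  Project = c("Games", "Film & Video", "Design", "Music", "Technology", "Publishing"),
  Average.Pledge = c(60.3, 89.5, 93.4, 66.9, 107.1, 58.3)
)

p_legend <- ggplot(kick, aes(x = Project, y = Average.Pledge)) +
  # The string inside aes() becomes the legend label, not the color itself
  geom_point(aes(color = "Average Pledged"), size = 8) +
  # Assign the real color here; name = NULL drops the separate legend title
  scale_color_manual(name = NULL, values = c("Average Pledged" = "red")) +
  labs(x = "Project Type", y = "Average Pledged\n(in dollars)") +
  theme(legend.position = "bottom")
p_legend
```

If no legend entry is wanted for the points at all, setting `color = "red"` outside `aes()` would keep the points red and leave them out of the legend.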
Please let me know how you think I can improve this visual to make it even more clear for readers.