Assignment 1 (MBA 677)

Crowdfunding at Kickstarter

Rationale for the Visual Encodings

After reviewing the data and examining the charts from the Economist, I thought about what we would want to show in a visualization: what data, which relationships, etc. To me, the “output” of the Kickstarter data is the success rate, and we want to look at the factors that contribute to it. We can examine other components along the way, but I wanted to highlight this variable the most.

data: kickstart, the data set created from the CSV file.

Aesthetics

y variable: I mapped the success rate to the y axis because it is the variable I most wanted to highlight.

x variable: With 13 categories, there were too many to distinguish by shape or by any other in-chart identifier. Conversely, there were not enough data points (1 per category) to justify a facet/grid layout. This led me to map the categories to the x axis. I also ordered them along the axis by success rate, from most to least successful, so the ranking is easy to read.
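
The ordering of categories by success rate can be illustrated on its own. This is a minimal sketch using made-up values (the `demo` data frame is hypothetical, not the Kickstarter figures); it shows how `reorder()` with a negated rate sorts the factor levels from most to least successful.

```r
# Made-up example values, not the Kickstarter figures
demo <- data.frame(
  Category = c("Film", "Dance", "Games"),
  Success.Rate = c(40, 70, 80),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating the rate puts the highest success rate first (leftmost on the x axis).
ordered.category <- reorder(demo$Category, -demo$Success.Rate)

levels(ordered.category)
# "Games" "Dance" "Film"
```

ggplot2 draws a discrete axis in factor-level order, which is why the same call inside aes() controls the left-to-right order of the categories.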

size: I mapped the number of pledges to size. This gives a ballpark idea of how popular any given category was. It shows where the masses invested and where few did, which I think is an interesting consideration when looking at which start-ups were more successful.

color: I mapped the amount pledged to color. While color is not the professionals’ favorite encoding for this purpose, it shows investment dollars well enough to give the viewer a general idea of how much money went into any given category. Again, with fewer categories, I probably would have plotted pledged amounts on an axis and used symbols for the categories.

geometry: I chose geom_point because it shows the information I was interested in as a simple-to-read plot and left me enough options (below) to add detail.

labs: Labeled the chart and all mapped components.

theme: I used themes to make the chart “prettier”. Some text was faint, and with a number of components on the plot, I thought I needed to clarify the axes, labels, etc. I also changed the background colors to make the data points pop a bit better.

scales: I used these to make the variables easier for the reader to comprehend, including formatting labels and breaks. Of note, I changed the legend breaks for size (number of pledges). The defaults were 500k and 1,000k. Several categories have fewer than 100k pledges, so the default legend made it hard to judge how many pledges the lower-participation categories really had. This could be improved further; the challenge is choosing breaks that represent a range of values from 25k to 1,400k. I also limited the success rate axis to 20% through 80%, with breaks every 10%, to use more of the plot and emphasize the change in success rate from category to category. While I was concerned the scale should start at 0%, I felt better after reading this article from Junk Charts.
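
One possible way to approach the size-legend challenge is to hand-pick breaks that are spaced roughly evenly on a log scale, so the low-participation categories get their own reference point. This is only a sketch: the break values in `size.breaks` are an assumption chosen to span the 25k to 1,400k range mentioned above, not values taken from the final plot.

```r
library(ggplot2)
library(scales)

# Hypothetical legend breaks, in thousands of pledges (assumed, not from the data)
size.breaks <- c(25, 100, 400, 1400)

# Same point-size range and comma labels as the plot, with the alternative breaks
size.scale <- scale_size_continuous(
  breaks = size.breaks,
  range = c(5, 15),
  labels = comma_format()
)
```

Adding `size.scale` to a plot in place of the existing scale_size_continuous() call would swap in these breaks. Any break that falls outside the range of the data is left out of the legend.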

Pros & Cons Compared to Economist Visualization

Pros

My visualization is a simpler view, and can be more easily read by a wider audience.
I only listed category labels once, whereas the Economist one had to list them twice.
I show the success rate very clearly.
Since it is a point plot, you can see pockets of trends (such as the most successful categories having few pledges and less money pledged). In the Economist one, you have to follow a shaded line that becomes very hard to discern in the middle.
My plot generally shows relationships better: you can evaluate the amount pledged against success and the number of pledges against success, and get a clearer idea of how these are related. These relationships are harder to see in the Economist visualization.
My color coding/shading is meaningful for a value, whereas the Economist used it to identify category, which is already labeled twice. It seems the Economist had to work hard to identify each category.
My axes and labels are more clearly marked.

Cons

My visualization does not as clearly rank the amount pledged, nor does it show the (rough) proportion of money for each category.
My visualization doesn’t show the position jump as well. The Economist’s shows a pattern in which some of the most pledged categories have the lowest success rates, and vice versa. While I would argue mine does this as well, it does not use position, or a connection between two positions, as clearly, even though I would argue their connections between points are not great.
The Economist’s visualization includes the exact money pledged and success rate. Mine relies on axes or legends for this purpose, which will not give you precise numbers for either variable.
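
One way to narrow the precision gap noted in the last point would be to print the values next to the points. The following is only a sketch on made-up numbers (the `demo` data frame and its values are hypothetical, not the Kickstarter data), and it assumes ggplot2 and scales are installed.

```r
library(ggplot2)
library(scales)

# Hypothetical values for illustration only
demo <- data.frame(
  Category = c("Film", "Dance", "Games"),
  Success.Rate = c(40, 70, 35)
)

demo.plot <- ggplot(demo, aes(x = reorder(Category, -Success.Rate), y = Success.Rate / 100)) +
  geom_point(size = 4) +
  # Print each point's success rate just above it
  geom_text(aes(label = percent(Success.Rate / 100)), vjust = -1.2) +
  scale_y_continuous(labels = percent, limits = c(0.2, 0.8)) +
  labs(x = "Category", y = "Success Rate")

demo.plot
```

With 13 categories and size-scaled points, the labels may crowd the plot, so this is a trade-off rather than a clear improvement.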

Other Notes

Neither mine nor the Economist’s shows the number of projects launched. While this is interesting, I don’t think it has a direct impact on success. What it would show is where there are more start-ups, which is not something I thought was necessary to show in this visualization.
I agree that the Economist made a good decision to eliminate the average amount pledged. The variation is not huge between the lowest and highest categories, making it fairly unexciting.

#read CSV
kickstart <- read.csv("A2_kickstart.csv", stringsAsFactors = FALSE)

#Libraries
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(scales)
library(ggthemes)

#Clean data: rename the columns to consistent, readable names
colnames(kickstart) <- c("Category","No.Launched", "No.Successful", "Pledged.Amt", "No.Pledges", "Success.Rate", "Avg.Pledge$")

#Reusable text styles for the theme
blue.bold.text <- element_text(face = "bold", color = "darkblue")
black.text <- element_text(color = "black")
black.45angle.text <- element_text(angle = 45, hjust = 1, color = "black")

#Plot
#Categories are ordered by descending success rate; counts and dollars are shown in thousands
kickstart.plot <- ggplot(kickstart, aes(x = reorder(Category, -Success.Rate), y = Success.Rate/100,
                                        size = No.Pledges/1000, color = Pledged.Amt/1000)) +
  geom_point() +
  labs(title = "Crowdfunded Projects on Kickstarter in 2012", x = "Category", y = "Success Rate",
       color = "Amount Pledged (in thousands)", size = "Number of Pledges (in thousands)") +
  theme(axis.text.x = black.45angle.text, axis.text.y = black.text,
        plot.title = element_text(hjust = 0.5), axis.title = blue.bold.text,
        panel.background = element_rect(fill = "darkgray", color = "black"),
        plot.background = element_rect(fill = "gray95", color = "black")) +
  scale_size_continuous(breaks = c(100, 500, 1000), range = c(5, 15), labels = comma_format()) +
  scale_y_continuous(labels = percent, limits = c(0.2, 0.8), breaks = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)) +
  scale_color_continuous(labels = dollar_format())

#Display the plot
kickstart.plot