Decisions for Graph

When approaching the assignment I focused on two quantitative variables: total amount pledged and success rate. A scatterplot seemed the best fit for the data. Both measurements are on a continuous scale, so they could form the axes. While other variables would have been informative in their own way, these two carry the underlying message of the data: more money doesn’t guarantee success.

Total amount pledged, in millions of dollars, makes up the y axis, and success rate, in percent, makes up the x axis. I experimented with flipping the axes but found this combination easier to read.

To compare the data points against each other we need a categorical variable. Project type (Dance, Design, etc.) separates projects into like categories, which are represented by points and accompanying text labels on the plot.

Object color and size were both valid candidates for representing a fourth variable, average pledge amount. Both provide only a limited perception of quantity. Color kept the plot neater, and the scale was kept blue to remain visible over the grey plot background.

I wanted to recreate a proportion visual relating each project type back to the total amount pledged. I avoided using shape or any other colors to limit background noise. I found that placing a discrete percentage inside the points did not disrupt the plot area. A label in the top right identifying what the percentages represent had the added benefit of helping fill an area of empty plot space.
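As a minimal sketch of how the in-point percentages can be derived, the snippet below divides each type's pledged total by the grand total. The data frame and its values are made up for illustration; only the column names KS_Type and Money_Pledged follow the script at the end.

```r
# Hypothetical totals in $M; values are illustrative, not the assignment data
toy <- data.frame(
  KS_Type = c("Film", "Games", "Dance"),
  Money_Pledged = c(50, 40, 10)
)

# Share of the grand total for each project type
toy$share <- toy$Money_Pledged / sum(toy$Money_Pledged)

# Format as whole-percent labels using base R only
toy$share_label <- paste0(round(100 * toy$share), "%")
print(toy)
```

The actual script uses percent() from the scales package for the formatting step; base R is used here only to keep the sketch dependency-free.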

Axis limits were used to shrink the plot area to just large enough to fit all the project-type labels. This required some trial and error to avoid overlapping labels or missing values.
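The missing values come from how ggplot2 treats xlim() and ylim(): observations outside the limits are dropped with a warning. A rough way to check candidate limits before plotting is sketched below. The helper within_limits() and the sample vectors are hypothetical; the limits are the ones used in the script.

```r
# Hypothetical helper: TRUE when every value lies within [lo, hi]
within_limits <- function(values, lo, hi) {
  all(values >= lo & values <= hi, na.rm = TRUE)
}

# Made-up success rates (as fractions) and pledged totals ($M)
rate <- c(0.30, 0.45, 0.72)
pledged <- c(1.5, 20, 80)

print(within_limits(rate, 0.26, 0.75))    # x limits from the script
print(within_limits(pledged, -2, 83.5))   # y limits from the script
```

This only checks the points themselves. Nudged text labels can still fall outside the panel, which is why some manual adjustment was needed.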

Comparisons

My graph includes most of the variables in the Economist version. I believe 2 of the 7 variables are not present. Proportion of success rate was not added, although proportion of money pledged is present (the percentage inside the points). I believed a percentage of a dollar figure would be easier to perceive than what is essentially a percentage of a percentage. The other variable I believe is not completely there is rank by money pledged. While it is easy to see what the top 4-5 types received, the y axis is more crowded further down. It is not clear which points outrank others when they are spread across the x axis at almost the same y value.
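For reference, the rank that the scatterplot leaves implicit could be computed directly, as in the sketch below. The named vector holds invented values and is not the assignment data.

```r
# Hypothetical pledged totals in $M, named by project type
pledged <- c(Film = 58, Games = 83, Dance = 2)

# Rank 1 is the largest total; tied totals share the lower rank number
rank_by_pledged <- rank(-pledged, ties.method = "min")
print(rank_by_pledged)
```

Printing the rank next to each label would be one way to restore this variable, at the cost of more text on an already crowded plot.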

I also believe that average amount pledged can only go so far when encoded by color. We’ve learned that hue cannot be perceived quantitatively, so intensity was the way to go. Even so, most other variables are clearer, and average pledge comes close to being background noise.

I must say that you don’t miss stacked bars until you can’t use them!!!

#Load Libraries
library(ggplot2)
library(scales)
library(dplyr)
#Load csv File
ksdata <- read.csv("A2_kickstart - Sheet1.csv")
#Clean Data and Set Variable for Total
ksdata <- ksdata %>% mutate(Money_Pledged = Money_Pledged / 1000000, Success_Rate = Success_Rate / 100)
sum_pledges <- sum(ksdata$Money_Pledged)
#Create Plot
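ggplot(ksdata, aes(Success_Rate, Money_Pledged, col = Average_Pledge)) +
  # One large point per project type, colored by average pledge
  geom_point(size = 8) +
  # Project type label, nudged below each point
  geom_text(aes(label = KS_Type), nudge_y = -3.7, nudge_x = .005, size = 2.8) +
  # Share of total pledged, printed in white inside each point
  geom_text(aes(label = percent(Money_Pledged / sum_pledges)), col = "white", size = 2, nudge_x = .002, nudge_y = .2) +
  # Success_Rate is stored as a fraction, so format the ticks as percentages
  scale_x_continuous(labels = percent, limits = c(.26, .75)) +
  ylim(-2, 83.5) +
  labs(title = "2012 Kickstarter Crowdfunded Projects", x = "Success Rate (%)", y = "Total Amount Pledged ($M)", col = "Average Pledge Amount ($)") +
  # annotate() draws the note once rather than once per row of ksdata
  annotate("text", x = .571, y = 83.5, label = "(%) represents proportion to Total Amount Pledged ($319.8M)", size = 3)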