Before diving into analysis, let’s begin by taking care of some preliminaries:
setwd("~/Documents/MBA 677/Assignment 1")
getwd()
library(ggplot2)
library(tidyr)
library(dplyr)
library(scales)
library(knitr)
library(stats)
library(RColorBrewer)
Let’s begin by reading in our data frame and completing some initial exploration. I expect to reformat the data into a long format too.
Kick <- read.csv('A2_kickstart_WORKING.csv')
str(Kick)
## 'data.frame': 13 obs. of 7 variables:
## $ Kickstarter_Category : Factor w/ 13 levels "Art","Comics",..: 8 6 4 9 12 11 7 1 2 13 ...
## $ Projects_Launched : int 2796 9600 1882 9086 831 5634 1828 3783 1170 1787 ...
## $ Projects_Fully_Funded: int 911 3891 759 5067 312 1666 688 1837 542 1194 ...
## $ Pledge_Amt : int 83144565 57951876 50124041 34953600 29003932 15311251 11117486 10477939 9242233 7084968 ...
## $ Pledge_Ct : int 1378143 647361 536469 522441 270912 262738 138204 155782 177070 95225 ...
## $ Success_Rate : num 0.326 0.405 0.403 0.558 0.375 0.296 0.376 0.486 0.463 0.668 ...
## $ Avg_Pledge_Val : num 60.3 89.5 93.4 66.9 107.1 ...
qplot(Kick, x=Kick$Pledge_Amt, y=Kick$Success_Rate)
#Trying something nicer in ggplot proper
plot1 <- ggplot(Kick, aes(x=(Pledge_Amt/1000000), y=Success_Rate)) +
geom_text(aes(label=Kickstarter_Category)) +
scale_y_continuous(labels = scales::percent) +
ylab("Kickstarter Success Rate") +
scale_x_continuous(labels=dollar) +
xlab("Pledges in Millions $")
plot1
Extending this first plot just a little bit, let’s see if there’s a discernable relationship between these variables:
plot1.1 <- plot1 + geom_smooth(method = "lm", se = FALSE, alpha = 0.2) +
coord_cartesian(ylim = c(0,.8))
plot1.1
By adding the linear trendline, we can more-easily see that there is indeed a negative relationship between success rate and money pledged. However, we’ve introduced some issues in terms of legibility.
One other initial finding to file away - the fine arts categories enjoy the top success rates:
TopCat <- Kick %>% select (Kickstarter_Category, Success_Rate) %>% arrange(desc(Success_Rate)) %>% top_n(4) %>% rename("Kickstarter Category"= Kickstarter_Category) %>% rename("Success Rate" = Success_Rate)
TopCat$`Success Rate` <- TopCat$`Success Rate` * 100
kable(TopCat)
| Kickstarter Category | Success Rate |
|---|---|
| Dance | 74.4 |
| Theater | 66.8 |
| Music | 55.8 |
| Art | 48.6 |
Finally, let’s take a look at the relationship between average size of pledge and success rate - still just exploring these relationships.
plot2 <- ggplot(Kick, aes(x=(Avg_Pledge_Val), y=Success_Rate)) +
geom_text(aes(label=Kickstarter_Category)) +
scale_y_continuous(labels = scales::percent) +
ylab("Kickstarter Success Rate") +
scale_x_continuous(labels=dollar) +
xlab("Average Pledge in $")
plot2
# This view isn't showing a strong relationship - let's try adding the geom_smooth to see if anything jumps out.
plot2 + geom_smooth()
Based on these two exploratory plots, I’m not identifying a strong relationship between the average pledge size and the success rate.
Note that this graph’s x-axis does not begin at zero - this would be a red flag if we were to use this as a final visualization, and must be addressed, lest the graph be called misleading.
Another question we might find interesting is to explore how many total projects and pledges there were by category:
ggplot(Kick,aes(reorder(Kickstarter_Category, desc(Pledge_Ct)), Pledge_Ct/1000, fill=Success_Rate)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle=90, vjust = 0.5)) +
xlab("Kickstarter Category") +
ylab("Number of Pledges (000's)") +
scale_fill_gradient(low = "#8e0152", high = "#006837")
This graph shows another interesting result; some of the categories with the highest success rates (the fine arts we noted earlier, in green) enjoy the lowest number of pledges.
Since we also know the average pledge size for these categories was unremarkable, we can deduce that the projects in those categories tended to be smaller in scope than those in the larger categories.
Taken all together, these findings suggest that Kickstarter projects most likely to be successful are those with lower numbers of pledges, lower total pledge amount, and in the Fine Arts. Let’s think about a vis to highlight these points.
First, we’ll identify a “meta-category” for the Fine Arts, Popular Arts, and Consumer Goods.
Kick.Focused <- Kick %>%
arrange(desc(Success_Rate)) %>%
filter(row_number() %in% c(1, 2, 3, 11, 12, 13))
str(Kick.Focused)
## 'data.frame': 6 obs. of 7 variables:
## $ Kickstarter_Category : Factor w/ 13 levels "Art","Comics",..: 3 13 9 8 11 5
## $ Projects_Launched : int 512 1787 9086 2796 5634 1659
## $ Projects_Fully_Funded: int 381 1194 5067 911 1666 434
## $ Pledge_Amt : int 1773304 7084968 34953600 83144565 15311251 6317799
## $ Pledge_Ct : int 23807 95225 522441 1378143 262738 83067
## $ Success_Rate : num 0.744 0.668 0.558 0.326 0.296 0.262
## $ Avg_Pledge_Val : num 74.5 74.4 66.9 60.3 58.3 76.1
Kick.Focused$Kickstarter_Category
## [1] Dance Theater Music Games Publishing Fashion
## 13 Levels: Art Comics Dance Design Fashion Film & Video Food ... Theater
#Meta_Cat <- factor(c("Fine_Art", "Popular_Art"))
#Meta_Cat
Kick.Meta <- Kick %>% mutate(Meta_Cat = if_else(row_number() %in% c(1,2), "Popular Art", if_else(row_number() %in% c(3), "Consumer Goods", if_else(row_number() %in% c(4), "Fine Art", if_else(row_number() %in% c(5,6,7), "Consumer Goods", if_else(row_number() %in% c(8), "Fine Art", if_else(row_number() %in% c(9), "Popular Art", if_else(row_number() %in% c(10), "Fine Art", if_else(row_number() %in% c(11), "Consumer Goods", if_else(row_number() %in% c(12), "Popular Art", "Fine Art"))))))))))
str(Kick.Meta)
## 'data.frame': 13 obs. of 8 variables:
## $ Kickstarter_Category : Factor w/ 13 levels "Art","Comics",..: 8 6 4 9 12 11 7 1 2 13 ...
## $ Projects_Launched : int 2796 9600 1882 9086 831 5634 1828 3783 1170 1787 ...
## $ Projects_Fully_Funded: int 911 3891 759 5067 312 1666 688 1837 542 1194 ...
## $ Pledge_Amt : int 83144565 57951876 50124041 34953600 29003932 15311251 11117486 10477939 9242233 7084968 ...
## $ Pledge_Ct : int 1378143 647361 536469 522441 270912 262738 138204 155782 177070 95225 ...
## $ Success_Rate : num 0.326 0.405 0.403 0.558 0.375 0.296 0.376 0.486 0.463 0.668 ...
## $ Avg_Pledge_Val : num 60.3 89.5 93.4 66.9 107.1 ...
## $ Meta_Cat : chr "Popular Art" "Popular Art" "Consumer Goods" "Fine Art" ...
#I want to see Meta_Cat as a factor var
Kick.Meta$Meta_Cat <- factor(Kick.Meta$Meta_Cat)
str(Kick.Meta)
## 'data.frame': 13 obs. of 8 variables:
## $ Kickstarter_Category : Factor w/ 13 levels "Art","Comics",..: 8 6 4 9 12 11 7 1 2 13 ...
## $ Projects_Launched : int 2796 9600 1882 9086 831 5634 1828 3783 1170 1787 ...
## $ Projects_Fully_Funded: int 911 3891 759 5067 312 1666 688 1837 542 1194 ...
## $ Pledge_Amt : int 83144565 57951876 50124041 34953600 29003932 15311251 11117486 10477939 9242233 7084968 ...
## $ Pledge_Ct : int 1378143 647361 536469 522441 270912 262738 138204 155782 177070 95225 ...
## $ Success_Rate : num 0.326 0.405 0.403 0.558 0.375 0.296 0.376 0.486 0.463 0.668 ...
## $ Avg_Pledge_Val : num 60.3 89.5 93.4 66.9 107.1 ...
## $ Meta_Cat : Factor w/ 3 levels "Consumer Goods",..: 3 3 1 2 1 1 1 2 3 2 ...
It may have been a bit verbose, but we now have a data frame with a new factor variable which we’ll use to emphasize the success rates enjoyed by Fine Arts on Kickstarter.
final <- ggplot(Kick.Meta, aes(x=(Pledge_Amt/1000000), y=Success_Rate)) +
geom_point(aes(fill = Meta_Cat), size = 5, shape=23, alpha = 0.75) +
scale_y_continuous(labels = percent_format(), limit = c(0,1)) +
ylab("Kickstarter Success Rate") +
scale_x_continuous(labels=dollar) +
xlab("Pledges in Millions $") +
scale_fill_brewer(name = "Kickstarter Project Type", palette="Set1", breaks = c("Fine Art", "Popular Art" , "Consumer Goods")) +
ggtitle("On Kickstarter, Small Fine Arts Projects Find Success") +
geom_smooth(method = "lm", se=FALSE, color = "#984ea3", show.legend = FALSE, alpha = 0.75) +
theme(legend.position = "bottom") +
theme(plot.title = element_text(hjust = 0.5))
final
#Based on classmate feedback; adding a simple Kable table of the encoding from the meta groups.
last_kable <- select(Kick.Meta, c(8,1)) %>% arrange(Meta_Cat) %>% rename("Meta Category" = Meta_Cat) %>% rename ("Kickstarter Category" = Kickstarter_Category)
kable(last_kable)
| Meta Category | Kickstarter Category |
|---|---|
| Consumer Goods | Design |
| Consumer Goods | Technology |
| Consumer Goods | Publishing |
| Consumer Goods | Food |
| Consumer Goods | Fashion |
| Fine Art | Music |
| Fine Art | Art |
| Fine Art | Theater |
| Fine Art | Dance |
| Popular Art | Games |
| Popular Art | Film & Video |
| Popular Art | Comics |
| Popular Art | Photography |
With the final graph, we draw out a few things:
Fine Art projects enjoy a higher success rate than other types. As the total amount pledged for a category increases, the success rate actually decreases - this is perhaps a surprising conclusion. The linear regression line is intended to help make this relationship clear. Most categories of projects enjoy a little less than 50/50 shot of meeting their fundraising goals.
Notes Regarding Encoding Choices:
Comparing with the Economist: