Kickstarter

Author

Joshwin Lourdes Sunder

Published

February 27, 2026

Loading Libraries

# 1. Load libraries
library(dplyr)
library(ggplot2)

# 2. Install 'here' if it is missing, then load it
if (!requireNamespace("here", quietly = TRUE)) install.packages("here")
library(here)

# 3. Load the dataset from the project root
# here::here() builds the path from the project root rather than the current
# working directory, which avoids "No such file or directory" errors
ks_data <- read.csv(here::here("ks-projects-201801.csv"))

# 4. Initial look at the data
glimpse(ks_data)

Rows: 378,661
Columns: 15
$ ID               <int> 1000002330, 1000003930, 1000004038, 1000007540, 10000…
$ name             <chr> "The Songs of Adelaide & Abullah", "Greeting From Ear…
$ category         <chr> "Poetry", "Narrative Film", "Narrative Film", "Music"…
$ main_category    <chr> "Publishing", "Film & Video", "Film & Video", "Music"…
$ currency         <chr> "GBP", "USD", "USD", "USD", "USD", "USD", "USD", "USD…
$ deadline         <chr> "2015-10-09", "2017-11-01", "2013-02-26", "2012-04-16…
$ goal             <dbl> 1000, 30000, 45000, 5000, 19500, 50000, 1000, 25000, …
$ launched         <chr> "2015-08-11 12:12:28", "2017-09-02 04:43:57", "2013-0…
$ pledged          <dbl> 0.00, 2421.00, 220.00, 1.00, 1283.00, 52375.00, 1205.…
$ state            <chr> "failed", "failed", "failed", "failed", "canceled", "…
$ backers          <int> 0, 15, 3, 1, 14, 224, 16, 40, 58, 43, 0, 100, 0, 0, 7…
$ country          <chr> "GB", "US", "US", "US", "US", "US", "US", "US", "US",…
$ usd.pledged      <dbl> 0.00, 100.00, 220.00, 1.00, 1283.00, 52375.00, 1205.0…
$ usd_pledged_real <dbl> 0.00, 2421.00, 220.00, 1.00, 1283.00, 52375.00, 1205.…
$ usd_goal_real    <dbl> 1533.95, 30000.00, 45000.00, 5000.00, 19500.00, 50000…

Processing Data

# Create date and year columns
ks_data <- ks_data %>%
  mutate(
    deadline_date = as.Date(deadline),
    launched_date = as.Date(launched),
    deadline_year = format(deadline_date, "%Y"),
    launched_year = format(launched_date, "%Y")
  )

# Verify the new columns
glimpse(ks_data)

Rows: 378,661
Columns: 19
$ ID               <int> 1000002330, 1000003930, 1000004038, 1000007540, 10000…
$ name             <chr> "The Songs of Adelaide & Abullah", "Greeting From Ear…
$ category         <chr> "Poetry", "Narrative Film", "Narrative Film", "Music"…
$ main_category    <chr> "Publishing", "Film & Video", "Film & Video", "Music"…
$ currency         <chr> "GBP", "USD", "USD", "USD", "USD", "USD", "USD", "USD…
$ deadline         <chr> "2015-10-09", "2017-11-01", "2013-02-26", "2012-04-16…
$ goal             <dbl> 1000, 30000, 45000, 5000, 19500, 50000, 1000, 25000, …
$ launched         <chr> "2015-08-11 12:12:28", "2017-09-02 04:43:57", "2013-0…
$ pledged          <dbl> 0.00, 2421.00, 220.00, 1.00, 1283.00, 52375.00, 1205.…
$ state            <chr> "failed", "failed", "failed", "failed", "canceled", "…
$ backers          <int> 0, 15, 3, 1, 14, 224, 16, 40, 58, 43, 0, 100, 0, 0, 7…
$ country          <chr> "GB", "US", "US", "US", "US", "US", "US", "US", "US",…
$ usd.pledged      <dbl> 0.00, 100.00, 220.00, 1.00, 1283.00, 52375.00, 1205.0…
$ usd_pledged_real <dbl> 0.00, 2421.00, 220.00, 1.00, 1283.00, 52375.00, 1205.…
$ usd_goal_real    <dbl> 1533.95, 30000.00, 45000.00, 5000.00, 19500.00, 50000…
$ deadline_date    <date> 2015-10-09, 2017-11-01, 2013-02-26, 2012-04-16, 2015…
$ launched_date    <date> 2015-08-11, 2017-09-02, 2013-01-12, 2012-03-17, 2015…
$ deadline_year    <chr> "2015", "2017", "2013", "2012", "2015", "2016", "2014…
$ launched_year    <chr> "2015", "2017", "2013", "2012", "2015", "2016", "2014…

Initial Calculations

# Calculating key metrics
total_projects <- nrow(ks_data)
successful_count <- ks_data %>% filter(state == "successful") %>% nrow()
failed_count <- ks_data %>% filter(state == "failed") %>% nrow()
percent_failed <- (failed_count / total_projects) * 100

There are 378,661 projects in the dataset. Of these, 133,956 were marked as “successful” and 197,719 as “failed,” so approximately 52.22% of all projects in the dataset failed.
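
The failure percentage above uses every project as the denominator, including projects that were canceled, suspended, or otherwise never reached a final outcome. As a rough cross-check, the sketch below plugs in the three counts reported above and also computes the failure rate among completed projects only (successful plus failed). The names percent_failed_all and percent_failed_completed are my own and are not part of the original analysis.

```r
# Counts reported in the text above
total_projects   <- 378661
successful_count <- 133956
failed_count     <- 197719

# Share of ALL projects that failed (the figure quoted above)
percent_failed_all <- failed_count / total_projects * 100

# Share of COMPLETED projects (successful + failed) that failed
percent_failed_completed <- failed_count / (successful_count + failed_count) * 100

round(c(all = percent_failed_all, completed = percent_failed_completed), 2)
```

Among completed projects the failure rate comes out near 59.6%, so the headline number depends on which states are counted in the denominator.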

Biggest Non-Success

# Finding the project with highest pledged amount that wasn't successful
biggest_non_success <- ks_data %>%
  filter(state != "successful") %>%
  filter(usd_pledged_real == max(usd_pledged_real, na.rm = TRUE))

The biggest non-success project in the dataset is “The Skarp Laser Razor: 21st Century Shaving (Suspended)”, which had a state of “suspended” despite raising $4,005,111.

Summary of Project: This project was a “laser razor” that promised to shave hair using a low-powered laser instead of blades. Although it raised millions of dollars, Kickstarter suspended the campaign because the creators could not provide a working prototype that met the site’s requirements, leading to the project being shut down.
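
A possible alternative to the filter() and max() combination is dplyr's slice_max(), which returns the top row or rows directly. The sketch below runs on a small made-up data frame (toy_projects is hypothetical and only stands in for ks_data), so the names and amounts are illustrative, not real results.

```r
library(dplyr)

# Hypothetical toy data standing in for ks_data
toy_projects <- data.frame(
  name = c("Project A", "Project B", "Project C", "Project D"),
  state = c("successful", "failed", "suspended", "canceled"),
  usd_pledged_real = c(9000, 1200, 5000, 300)
)

# Drop successful projects, then keep the row(s) with the largest pledge
biggest_non_success_toy <- toy_projects %>%
  filter(state != "successful") %>%
  slice_max(usd_pledged_real, n = 1, with_ties = TRUE) %>%
  select(name, state, usd_pledged_real)

print(biggest_non_success_toy)
```

Selecting only name, state, and usd_pledged_real keeps the result easy to read and easy to quote in inline text.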

Project State

# Summarize by state
state_summary <- ks_data %>%
  group_by(state) %>%
  summarize(count = n()) %>%
  arrange(desc(count))

# Create bar chart
ggplot(state_summary, aes(x = reorder(state, -count), y = count)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(
    title = "Distribution of Kickstarter Project States",
    x = "Project State",
    y = "Number of Projects"
  ) +
  theme_minimal()

Observation: The bar chart clearly shows that “failed” and “successful” are the two most common states, but there is also a significant number of “canceled” projects. The “undefined” and “suspended” states represent a very small fraction of the total data.
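
To put numbers on phrases like "very small fraction", one option is to add a percentage column to the state summary. The sketch below assumes a made-up data frame (toy_states is hypothetical); on the real data the same pipeline would start from ks_data.

```r
library(dplyr)

# Hypothetical toy data; the real analysis would use ks_data
toy_states <- data.frame(
  state = c(rep("failed", 5), rep("successful", 3), "canceled", "suspended")
)

# Count projects per state, largest first, and add each state's share
state_share <- toy_states %>%
  count(state, sort = TRUE, name = "count") %>%
  mutate(percent = count / sum(count) * 100)

print(state_share)
```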

Yearly Summary

# Grouping by launched_year
yearly_summary <- ks_data %>%
  group_by(launched_year) %>%
  summarize(
    count = n(),
    # Success rate: Successes divided by all projects in that year
    percent_success = (sum(state == "successful") / n()) * 100,
    # Using Median to avoid the influence of extreme outliers (like the laser razor)
    median_pledged = median(usd_pledged_real, na.rm = TRUE)
  )

# Display the dataset
print(yearly_summary)

# A tibble: 11 × 4
   launched_year count percent_success median_pledged
   <chr>         <int>           <dbl>          <dbl>
 1 1970              7             0              0  
 2 2009           1329            43.6          580  
 3 2010          10519            43.7          780  
 4 2011          26237            46.4         1021  
 5 2012          41165            43.5         1128  
 6 2013          44851            43.3         1453  
 7 2014          67745            31.2          420  
 8 2015          77300            27.1          222  
 9 2016          57184            32.8          501  
10 2017          52200            35.4          632  
11 2018            124             0             54.3

# Plotting success rate by year
ggplot(yearly_summary, aes(x = launched_year, y = percent_success)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  labs(
    title = "Success Rate of Kickstarter Projects by Year",
    x = "Year Launched",
    y = "Success Percentage (%)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Decisions Explanation:
1. Year: I chose launched_year because it reflects the economic and cultural trends of when people felt inspired to start a project.
2. Percent Success: I calculated this as successful / total projects for that year. I chose this because it gives a realistic view of a creator’s odds of success when they hit the “publish” button.
3. Pledge Summary: I chose median(usd_pledged_real) because Kickstarter data is heavily skewed by a few massive “viral” projects; the median gives a better representation of what a “typical” project raises.
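
The effect of a single outlier on the mean can be illustrated with a tiny example. The pledged vector below is made up: four ordinary amounts plus one value equal to the laser razor's total.

```r
# Hypothetical pledged amounts: four ordinary projects and one viral outlier
pledged <- c(100, 250, 400, 600, 4005111)

mean_pledged   <- mean(pledged)    # pulled far upward by the outlier
median_pledged <- median(pledged)  # middle value, unaffected by the outlier

c(mean = mean_pledged, median = median_pledged)
```

Here the mean lands above 800,000 while the median stays at 400, which is much closer to what most of the toy projects raised.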

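Observation: Based on the chart and summary, the number of projects grew sharply, peaking at 77,300 launches in 2015. The success rate, however, was highest in the early years (roughly 43-46% from 2009 to 2013, peaking at 46.4% in 2011), then fell to 31.2% in 2014 and 27.1% in 2015 before recovering only partially to 35.4% by 2017. The 0% rates for 1970 and 2018 should not be read as trends: the 1970 rows are data errors (see Unusual Data Values), and the file name suggests the data was collected in January 2018, so the 124 projects launched that year had probably not finished yet.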
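
One way to check the trend without the 1970 and 2018 rows is to restrict the summary to years with plausible, complete data. The sketch below rebuilds the table from the values printed above; the 2009 to 2017 cutoff and the names complete_years, best_year, and worst_year are my assumptions.

```r
library(dplyr)

# Values copied from the printed yearly_summary tibble above
yearly_summary <- data.frame(
  launched_year = c("1970", "2009", "2010", "2011", "2012", "2013",
                    "2014", "2015", "2016", "2017", "2018"),
  count = c(7, 1329, 10519, 26237, 41165, 44851,
            67745, 77300, 57184, 52200, 124),
  percent_success = c(0, 43.6, 43.7, 46.4, 43.5, 43.3,
                      31.2, 27.1, 32.8, 35.4, 0)
)

# Assumption: only 2009-2017 contain valid, finished campaigns
complete_years <- yearly_summary %>%
  filter(launched_year %in% as.character(2009:2017))

# Years with the highest and lowest success rates
best_year  <- complete_years %>% slice_max(percent_success, n = 1)
worst_year <- complete_years %>% slice_min(percent_success, n = 1)

print(best_year)
print(worst_year)
```

Passing complete_years to the ggplot() call instead of yearly_summary would also remove the two empty bars from the chart.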

Unusual Data Values

The most unusual values in this dataset are the 7 projects with a launched year of 1970.

Why they are unusual: Kickstarter was founded in 2009, so it is impossible for projects to have launched in 1970.
Guess: This is likely a “Unix Epoch” error. In many computer systems, time is stored as the number of seconds since January 1, 1970. If a date value is missing or corrupted and defaults to “0”, the system interprets it as the beginning of 1970.
Follow-up Question: I would ask the creators: “Were these 1970 entries originally missing data, or did a specific import script error convert actual 2015/2016 dates into zero-values?”
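
The epoch explanation can be checked in R, and the suspect rows can be pulled out for inspection. In the sketch below, the toy data frame and its values are hypothetical stand-ins for ks_data; the 2009 cutoff comes from Kickstarter's founding year mentioned above.

```r
# A timestamp of 0 seconds since the Unix epoch, interpreted in UTC
epoch_zero <- as.POSIXct(0, origin = "1970-01-01", tz = "UTC")
epoch_year <- format(epoch_zero, "%Y")
print(epoch_year)

# Hypothetical toy rows standing in for ks_data
toy <- data.frame(
  name = c("Real project", "Broken timestamp"),
  launched = c("2015-08-11 12:12:28", "1970-01-01 01:00:00"),
  deadline = c("2015-10-09", "2010-09-15"),
  stringsAsFactors = FALSE
)
toy$launched_year <- format(as.Date(toy$launched), "%Y")

# Flag rows launched before Kickstarter existed
suspect_rows <- toy[as.integer(toy$launched_year) < 2009, ]
print(suspect_rows)
```

Comparing the deadline column of the flagged rows with their launched value may hint at when those projects actually ran.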