Wednesday, September 24, 2014

Developer's Paradise

Business Plan

  • Establish a more collaborative code development environment
  • Attract more long-term contributors
  • Become a pseudo-social network for coders, built around shared ideas and techniques


Initial Goal: Understand Existing Users

  • Number of users
  • Languages of users (programming and spoken)
  • Number of projects
  • How these change over time

Number of Existing Users

  • Number of users: 189,900
  • Number of public repositories: 416,546
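The user count is the number of distinct owners among the public repositories (about 2.2 public repositories per user on the figures above). A minimal sketch of that count, assuming each repository record carries an owner field; the record layout and sample data below are hypothetical, not the schema actually mined:

```python
from collections import Counter

# Hypothetical repository records; the "owner" and "name" fields are assumptions.
repos = [
    {"owner": "alice", "name": "parser"},
    {"owner": "alice", "name": "dotfiles"},
    {"owner": "bob", "name": "webapp"},
    {"owner": "carol", "name": "notes"},
]


def summarize(repos):
    """Return (number of distinct owners, number of repositories)."""
    owners = {r["owner"] for r in repos}
    return len(owners), len(repos)


def repos_per_owner(repos):
    """Count how many public repositories each owner has."""
    return Counter(r["owner"] for r in repos)


n_users, n_repos = summarize(repos)
print(n_users, n_repos)  # 3 4
print(repos_per_owner(repos).most_common(1))  # [('alice', 2)]
```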

Caveat: Only Public Repo Creators

  • We do not know, and cannot directly measure, the number of users who have only private repositories or no repositories at all. Users with no repositories, however, would not be considered customers, since they are not using the service.
  • On June 4, 2013, Bitbucket added its millionth user (blog.bitbucket.org)
  • When it was acquired in July 2010, Bitbucket had more than 60,000 users
  • A possible future analysis could estimate the total number of users from the proportion of public-repository users in years where the total is known.
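The proposed estimate could be sketched as a simple ratio: if a fraction p of all users had public repositories at a date where the total is known (for example June 2013, with one million users), the current total is roughly the current public-user count divided by p. This assumes p stays constant over time, which is untested, and the function name and toy numbers below are illustrative only, not measurements:

```python
def estimate_total_users(public_now, public_at_ref, total_at_ref):
    """Scale today's public-user count by the public/total ratio at a reference date.

    Assumes the share of users with public repositories is constant over time
    (a strong, untested assumption). Equivalent to public_now / p, where
    p = public_at_ref / total_at_ref.
    """
    if public_at_ref <= 0 or total_at_ref <= 0:
        raise ValueError("reference counts must be positive")
    return public_now * total_at_ref / public_at_ref


# Toy values for illustration: 100 of 500 users were public at the reference date,
# so p = 0.2, and 200 public users today imply about 1,000 users in total.
print(estimate_total_users(200, 100, 500))  # 1000.0
```

With real inputs, public_now would be 189,900 and total_at_ref 1,000,000, while public_at_ref would still have to be derived (for example from repository creation dates). Repeating the calculation with the 2010 figure would show whether p is actually stable.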

Activity of Projects

[Figure: plot of project activity]

Creation of Projects

[Figure: plot of project creation]

Activity on Old vs Creation of New

In the first few years of Bitbucket, the number of active accounts equals the cumulative sum of created accounts. In more recent years the two diverge, because accounts become inactive or are closed.
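The comparison above can be sketched as follows, assuming each project (the same logic applies to accounts) has a creation year and a last-updated year, and that it counts as active in every year between the two. Both this activity definition and the sample data are assumptions; the gap between the two series is the number of projects that have gone inactive:

```python
from collections import Counter
from itertools import accumulate


def yearly_counts(projects, years):
    """Return (cumulative projects created, projects active) for each year.

    `years` should start at the first year of the service; otherwise projects
    created earlier are left out of the cumulative sum.
    """
    created = Counter(created_year for created_year, _ in projects)
    cumulative = list(accumulate(created.get(y, 0) for y in years))
    # A project is assumed active in year y if created <= y <= last updated.
    active = [sum(1 for c, last in projects if c <= y <= last) for y in years]
    return cumulative, active


# Hypothetical (created_year, last_updated_year) pairs, one per project.
projects = [(2008, 2014), (2008, 2009), (2009, 2014), (2010, 2011), (2010, 2014)]
cumulative, active = yearly_counts(projects, range(2008, 2012))
gap = [c - a for c, a in zip(cumulative, active)]
print(cumulative, active, gap)  # [2, 3, 5, 5] [2, 3, 4, 4] [0, 0, 1, 1]
```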

We fit a line to the first few years of data to show that growth in new projects has since outpaced that linear trend, accelerating roughly exponentially, and that many projects are no longer being worked on. From this we could potentially estimate the typical lifecycle of a project.
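A minimal sketch of the trend fit, assuming an ordinary least-squares line fitted to the first few yearly counts. How many years counted as "first few", and whether the fit used yearly or cumulative counts, is not stated, so both are left as inputs here; the counts below are invented:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def excess_over_trend(years, counts, n_early):
    """Fit a line to the first n_early points and return each point's excess over it."""
    a, b = fit_line(years[:n_early], counts[:n_early])
    return [c - (a + b * y) for y, c in zip(years, counts)]


# Hypothetical yearly counts of new projects.
years = [2008, 2009, 2010, 2011, 2012]
counts = [100, 200, 300, 600, 1200]
print(excess_over_trend(years, counts, n_early=3))  # [0.0, 0.0, 0.0, 200.0, 700.0]
```

An excess that keeps growing after the fitted window is the signature of faster-than-linear growth; refitting on the logarithm of the counts would be one way to check whether that growth is close to exponential.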

Activity on Old vs Creation of New

[Figure: plot of activity on existing projects vs. creation of new projects]

Number of Users by Spoken Language

  • To characterize existing customers, we mined repository descriptions for spoken language
  • Only repositories with at least 10 words in the description were used, so that the language could be estimated reliably (detection was done in Python)
  • Programming language was also considered, but the computation needed to inspect and classify the code in every repository was out of scope
  • This information can support customer segmentation and may serve as a rough proxy for users' country of origin
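The filtering and counting steps above can be sketched as follows. The actual Python language detector is not named, so this sketch swaps in a toy stopword-overlap classifier; the stopword lists, function names, and the rule that a user's language is the most common known language across their repositories are all assumptions:

```python
import re
from collections import Counter

# Toy stopword lists (an assumption for illustration); a real analysis would use
# a dedicated language-identification library instead of these hand-picked words.
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "for", "with", "a", "in", "this"},
    "spanish": {"el", "la", "de", "que", "y", "en", "para", "con", "un", "es"},
    "german": {"der", "die", "das", "und", "ist", "für", "mit", "ein", "zu", "von"},
}
MIN_WORDS = 10  # descriptions shorter than this are not classified


def detect_language(description):
    """Guess the spoken language of a description by stopword overlap."""
    words = re.findall(r"\w+", description.lower())
    if len(words) < MIN_WORDS:
        return "unknown"
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def users_by_language(repos):
    """Assign each owner the most common known language among their repositories."""
    per_owner = {}
    for owner, description in repos:
        per_owner.setdefault(owner, Counter())[detect_language(description)] += 1
    totals = Counter()
    for counts in per_owner.values():
        known = {lang: n for lang, n in counts.items() if lang != "unknown"}
        totals[max(known, key=known.get) if known else "unknown"] += 1
    return totals


# Hypothetical (owner, description) pairs.
repos = [
    ("alice", "A small library for parsing the configuration files of this project and more"),
    ("alice", "my dotfiles"),
    ("bob", "Una herramienta para convertir los archivos de texto en tablas con el formato correcto"),
    ("carol", "todo"),
]
print(users_by_language(repos))
```

Dropping the "unknown" key, and then the "english" key, from the resulting counts gives the two filtered views that follow.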

Users by Spoken Language

[Figure: plot of users by spoken language]

Users by Spoken Language (Excluding Unknowns)

[Figure: plot of users by spoken language, excluding unknowns]

Users by Spoken Language (Excluding Unknowns and English)

[Figure: plot of users by spoken language, excluding unknowns and English]