Overview
Welcome to the monthly COG Team Report!
As the COG Turns is an iterative work in progress intended for IBM Cognitive Opentech (COG), although it is certainly shareable beyond that.
This report seeks to provide insight into the following questions:
- What do AI Communities care about?
- What do organizations engaged with AI Communities care about?
- What impact is our team having in AI Communities?
The report lives in an IBM internal Github repository (Requires IBM Login) and is available both as R markdown and as HTML through Github Pages.
To give feedback on the report, including content suggestions, please file a Github issue (Requires IBM Login).
Dear Gentle Reader,
This month’s report takes a nosy-neighbor approach to commit activity. In addition to the usual rigmarole, we take a small peek at what the affiliates have been up to. While it’s not a huge addition, it is nonetheless an addition, and I hope it provides you, Dear Reader, with some value.
Here’s to celebrating the small victories!
Augustina
New This Month
In addition to the previous month’s analysis, this report also includes:
- Commit Author affiliation over time
- Commit samples from public company affiliates on Github
- Github Events on COG IBM Code repositories
AI Community Trends
What organizations are most visibly active in contribution activities for projects of interest? This does not necessarily indicate that one organization is more influential than another. Having a clearly identifiable affiliation suggests a high degree of “official” interest that could be followed up on.
This section looks specifically at commit and event activity for projects of interest. If you would like to add a project of interest to the list, submit an issue to the “As the COG Turns” Github repo (Requires IBM Login)!
Commit Logs
What organizations had identifiable commit activity in projects of interest? Organizations were identified using the domain name of the author email address in the commit log. When considering the proportion of authors, a network was used to identify single authors who commit under multiple email addresses.
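The report does not say how the author network was built. A minimal sketch, assuming an author name and an email address that appear on the same commit are linked: treat names and emails as nodes, join them with a union-find structure, and read each connected group as one author. The function names `merge_authors` and `email_domain` are hypothetical, and linking by name can wrongly merge two different people who share a name.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure used to merge linked identities."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            # Path halving keeps the trees shallow.
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_authors(commits):
    """Group email addresses that appear to belong to one author.

    commits: iterable of (author_name, author_email) pairs.
    Assumption: a name and an email seen on the same commit are linked,
    so emails connected through a shared name end up in one group.
    """
    pairs = [(name.strip().lower(), email.strip().lower()) for name, email in commits]
    uf = UnionFind()
    for name, email in pairs:
        uf.union(("name", name), ("email", email))
    groups = defaultdict(set)
    for _, email in pairs:
        groups[uf.find(("email", email))].add(email)
    return list(groups.values())


def email_domain(email):
    """Domain passed to the organization look-up, e.g. 'ibm.com'."""
    return email.rsplit("@", 1)[-1].strip().lower()
```

For example, two commits by "Ada Lovelace" from `ada@example.com` and `ada@personal.net` collapse into one author, so that person is counted once when computing the proportion of authors per affiliation.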
A domain look-up service (Clearbit) was used to identify organizations.[1][2] Public status was verified by looking up the stock ticker symbol using the organization name reported by the domain look-up service.[3]
Grouping organizations by type and looking at the proportion of activity we can identify on different projects may give us insight into how developers are engaging with those projects. A note about the “Github” category: these commits came either from an automation or from a contributor committing directly through the Github UI. In the case of a closed-governance project with strong company affiliation (like Tensorflow), one can reasonably assume these came from that company. Because this is not always the case, Github has been called out as a separate category.
The meanings of the categories are as follows:
- Public - The company is publicly traded and has a stock ticker symbol
- Private - A company that is not publicly traded; this could also be a personal domain name
- Personal - Personal domain name
- Non-Profit - Not-for-profit entity (.org)
- No domain record - Clearbit did not have a domain record entry for this domain
- Github - Author email address was "noreply@github.com" and not otherwise affiliated with another contributor
- Education - A university, college, or other institution of learning
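The category definitions above can be read as an ordered set of rules. The sketch below is one hypothetical ordering, not the report's actual code: `lookup` stands in for the domain look-up service's response, reduced to an assumed `{"name": ..., "type": ...}` record per domain (these are not Clearbit's real field names), and `ticker_names` holds organization names confirmed to have a stock ticker symbol. The `users.noreply.github.com` domain is GitHub's privacy address form and is an addition to the report's definition.

```python
def classify_affiliation(email, lookup, ticker_names):
    """Assign one of the report's affiliation types to an author email.

    lookup: hypothetical dict mapping a domain to a record such as
        {"name": "IBM", "type": "private"} (field names are assumptions).
    ticker_names: organization names confirmed to have a stock ticker.
    """
    email = email.strip().lower()
    domain = email.rsplit("@", 1)[-1]
    labels = domain.split(".")

    # GitHub addresses: web UI commits, automation, or hidden personal emails.
    if email == "noreply@github.com" or domain == "users.noreply.github.com":
        return "Github"
    # .edu in the US; an "ac" second-level domain elsewhere (e.g. ox.ac.uk).
    if labels[-1] == "edu" or (len(labels) >= 3 and labels[-2] == "ac"):
        return "Education"
    record = lookup.get(domain)
    if record is None:
        return "No domain record"
    # A stock ticker overrides the look-up type (e.g. IBM reported as private).
    if record.get("name") in ticker_names or record.get("type") == "public":
        return "Public"
    if labels[-1] == "org":
        return "Non-Profit"
    if record.get("type") == "personal":
        return "Personal"
    return "Private"
```

Rule order matters here: checking the ticker before the look-up type is what lets a mislabeled public company come out as Public. Per the Github definition above, the Github label would only be kept for authors whose merged identity has no other affiliated address.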
We can guess whether a project has a diverse contributor base or is mostly associated with its owning company by looking for the following patterns:
- If a project shows authors only from the Github, Personal, and Public categories, and the owner of the project is a Public company (like IBM), it is very likely that the contributors are all from that company.
- Projects owned by Public companies that include authors from Education and a high proportion of authors from Private companies are likely more active.
- Projects from Private companies, like H2O.ai, likely contain a high proportion of contributors from that company if the authors are primarily Private, Personal, and Github.
- The presence of Education authors suggests more widespread community engagement
Proportion of authors grouped by Affiliation Type
A second view plots this over time to see how consistently these affiliations appear.
Affiliation Types having at least one commit, per Month
Another view of type diversity is presented below. Organizations are grouped by type and by whether they had at least one commit in a given month of 2018. The width of each type's bar indicates how many organizations of that type had at least one commit identified in that month, so a wider bar means more organizations of that type were active. If an organization had commits in multiple months, it is counted once for each of those months.
This view also has the advantage of showing how active a project is.
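The counting rule above (an organization counts at most once per month, but once in every month it is active) amounts to de-duplicating before counting. A minimal sketch, assuming commits have already been reduced to hypothetical `(organization, affiliation_type, "YYYY-MM")` tuples:

```python
from collections import Counter


def orgs_per_type_per_month(commits):
    """Count distinct organizations with at least one commit, per month and type.

    commits: iterable of (organization, affiliation_type, "YYYY-MM") tuples.
    Duplicate commits from the same organization in the same month collapse
    into one entry; activity in several months is counted once per month.
    """
    seen = {(month, aff_type, org) for org, aff_type, month in commits}
    return Counter((month, aff_type) for month, aff_type, _ in seen)
```

Each `(month, type)` count corresponds to the width of one bar segment in the plot.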
Affiliation Types having at least one commit
Stratifying by type makes it easier to see which companies had identifiable activity in the repository. Public companies are identified either as such by the domain service or through the presence of a stock ticker symbol.
Public Companies by name
The alternative view below shows how consistently each company has a commit present over time. It also makes it easier to see which companies are engaged with which projects.
Public Companies by name
The following plot shows private companies identified as having at least one commit in each of the listed repositories.
Private Companies by name
Commits from Education institutions are plotted below.
Caveats
- Affiliation identification is limited by the accuracy of Clearbit's domain information. Efforts have been made to correct more obvious errors (like IBM being considered “private”; stock ticker information fixes this).
- Organizations outside the US may not be accurately identified. For example, many educational institutions outside the US use an “ac” domain suffix (as in .ac.uk) rather than .edu. While this case has been corrected, there may be other cases of misidentification.
- Clearbit did not return domain information for all of the domains. A manual check of a random sample indicated these domains did not resolve and no information was available.
- Github has been called out as its own type because Github email addresses are used for commits made through the Github UI, for commits made by automation tools, and by authors wishing to obscure their actual email address.
- This analysis only looked at authors, not committers.
Commit Interval Metric Applied to Selected Deep Learning Repositories
Affiliated Commits
What are our “frenemies” up to? The following plots take a sample of affiliated contributors and look at what else they’re committing to (excluding anything they own). The contributors have been grouped by company.
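The report does not describe how "anything they own" was filtered. A minimal sketch, assuming each sampled commit is a hypothetical `(login, company, "owner/repo")` tuple and that "own" means the repository owner is the contributor's own account:

```python
from collections import Counter, defaultdict


def outside_activity(commits):
    """Tally repositories that affiliated contributors commit to, by company.

    commits: iterable of (login, company, "owner/repo") tuples.
    Assumption: a repository is "owned" by a contributor when its owner
    segment matches the contributor's login; those repositories are skipped.
    """
    by_company = defaultdict(Counter)
    for login, company, repo in commits:
        owner = repo.split("/", 1)[0]
        if owner.lower() == login.lower():
            continue  # skip the contributor's own repositories
        by_company[company][repo] += 1
    return by_company
```

If "own" should also cover repositories under the company's GitHub organization, a second comparison against a company-to-organization mapping would be needed; that mapping is not part of the report.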
COG Advocacy
Code Pattern Github Activity
Events
Stars
Page Views
Stars vs Views
Clones
Future Questions
- What advocacy activities generated the most interest in our code patterns?
- How do views and clones compare with other Github activities?
- How effective are traffic and clones for indicating developer interest?
Next Month
In addition to improved versions of this month’s analysis, next month’s report just might include the following:
- Committers in addition to authors (thought I’d included that this month but I was mistaken!)
- Wrike activity in relation to the traffic data
- Commit trends for top organizations in Github through other means
- Events for the affiliate sample
- Meetup event RSVP trends
Is there something you’d like to see next month? Submit a Github Issue to the As The COG Turns repo! (Requires IBM Login)
Some domain aliases were not fully consolidated, so some affiliations may have been missed. A future version of this report will fix this.