Project: MESOS

Introduction

This report provides an analysis of a project’s change cycle times based on events recorded in Jira versus events recorded in a Git-alike, such as git, GitHub, or GitLab. If the project demonstrates DevOps behavior, the report estimates the change cycle time.

Software development and maintenance projects implementing DevOps practices often measure success based on metrics like change cycle time. This metric reflects whether a project development team(s) is able to respond with a predictable cadence to change requests from teams that are using, maintaining, supporting, deploying, or providing requirements for the development product(s). A project development team(s) that establishes this cadence will be better able to support, and continually improve support for, teams dependent on the product(s) because the frequency of change is predictable, which makes the entire product life cycle more manageable.

Notes:

Analysis

Projects implementing DevOps practices will have a strong cadence at which work is performed, with project changes beginning with a ticket/epic/story/task change request and, through the project’s life cycle, resulting in software code changes, testing, and ultimately deployment to a production environment. To determine the project’s DevOps change cycle time, this report uses data collected from events recorded from the project’s use of Jira and git-alike tools. As project personnel perform work, they make changes to the tickets/epics/stories/tasks with which they are associated in Jira. Likewise, as personnel perform work, components that are controlled by the project using a git-alike tool (for example, GitHub, GitLab, or git itself) are changed. The analysis in this report is based on the assertion that as project personnel perform work, the level of activity recorded in Jira and git-alike tools will increase or decrease in correspondence with the project’s DevOps change cycle time. Work will typically begin by being documented in Jira. As project team members make changes to fulfill the ticket/epic/story/task from Jira, they will make changes to software code and associated artifacts and update the git-alike tool with those changes. In a typical project life cycle for any project (agile, iterative, waterfall, etc.), the work being done is a “wave” that begins in Jira and flows through the git-alike tool. The frequency with which that work wave crests or ebbs is the project’s change cycle time.

The analysis in this report uses a cross-correlation technique from signal processing to compare periods of work increase, decrease, and stasis between Jira and the git-alike tool to characterize the time offset between Jira and the git-alike tool. This time offset is an estimate of the project’s change cycle time.

The stronger the project’s implementation of DevOps techniques and, in particular, the more the project uses its toolchain to support implementation, the stronger the change cycle time estimate based on tool usage data.

This estimate is not perfect, of course. No project performs its process with mechanical perfection or records data like clockwork. The analytic techniques account for the “noise” in the underlying work “signal” in calculating the change cycle time estimate. The techniques also highlight the reliability of the underlying data, which is calculated based on the number of data points available (more data equals better reliability). Regardless of the analytic robustness, the calculated change cycle time is still an estimate.

The analysis uses a variety of metrics derived from the tool transaction data for comparison. Depending on changes being made, some metrics may provide better insight than others. The report provides all of the comparisons, but highlights the comparisons that provide the clearest estimates.

Projects that are not implementing DevOps practices may not have a strong change cycle time indicator, since non-DevOps practices do not necessarily have a consistent change cycle cadence. This is neither good nor bad, but just a reflection of the project’s life cycle.

Projects that use tools in a haphazard or pathological manner may have an unclear or counter-intuitive change cycle time estimate. For example, if a project typically makes software code changes, deploys the code, and only then updates Jira with task descriptions, then the estimated change cycle time will not make sense. This particular report does not analyze the project’s underlying tool usage consistency. Other available analytic reports focus on addressing this area.

Metrics Used

This report calculates metrics from project transaction data generated either through automation or by project personnel with Jira or Git-alike tools. The metrics used are:

Git-Alike Metrics
Metric Description
Number Of Commits Rate The number of commits in the git-alike repository, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Number Of File Changes Rate The number of files changed in the git-alike repository, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Number Of Line Updates Rate The number of lines changed (added, deleted, edited) in the git-alike repository, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Number Of Line Additions Rate The number of lines added in the git-alike repository, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Number Of Line Deletions Rate The number of lines deleted in the git-alike repository, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Jira Metrics
Metric Description
Number Changes Contributor Rate The number of Jira ticket/epic/story/task changes made per all active project members, collating the data monthly/weekly/daily as a time series.
Number Creations Contributor Rate The number of Jira ticket/epic/story/task creations made per all active project members, collating the data monthly/weekly/daily as a time series.
Number Actions Contributor Rate The number of Jira ticket/epic/story/task actions (creations or changes) made per all active project members, collating the data monthly/weekly/daily as a time series.
Number Changes Changer Rate The number of Jira ticket/epic/story/task changes made per each project member listed as making a change, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Number Of Changes Assignee Rate The number of Jira ticket/epic/story/task changes made per project members listed as assigned a change, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Number Of Creations Reporter Rate The number of Jira ticket/epic/story/task creations made per project members listed as the ticket/epic/story/task reporter, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Number Of Resolutions Assignee Rate The number of Jira ticket/epic/story/task resolutions made per project members listed as the ticket/epic/story/task resolvers, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Total Days Worked Resolve Assignee Rate The total calendar days worked to resolve a ticket/epic/story/task per project members listed as the assignee, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.
Mean Days Worked Resolve Assignee Rate The average (arithmetic mean) calendar days worked to resolve a ticket/epic/story/task per project members listed as the assignee, collating the data monthly/weekly/daily as a time series, divided by the number of project contributors.

Metrics With Adequate Supporting Data

The analytic techniques used in this report are likely to give incorrect results if the data being analyzed has more values that are the same (tied) than values that are unique (not tied). In particular, if many data values are absent, the analytic techniques must interpolate these absent values, which often results in a larger number of ties.

For this report, a metric is labeled as having an adequate number of non-tied and present values if fewer than 50% of its values are tied or absent.
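The adequacy rule above can be sketched in a few lines of Python. This is only an illustration, not the report's implementation: the function names are hypothetical, absent values are represented as `None`, and the sketch assumes that every observation whose value occurs more than once counts as tied (the report does not state exactly how ties are counted).

```python
from collections import Counter
from typing import Optional, Sequence


def tied_or_absent_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of observations that are absent (None) or share a value with another one."""
    if not values:
        return 1.0  # an empty series has no usable data
    present = [v for v in values if v is not None]
    counts = Counter(present)
    # Assumption: an observation is "tied" if its value occurs more than once.
    tied = sum(n for n in counts.values() if n > 1)
    absent = len(values) - len(present)
    return (tied + absent) / len(values)


def has_adequate_data(values: Sequence[Optional[float]], threshold: float = 0.5) -> bool:
    """True if fewer than `threshold` of the observations are tied or absent."""
    return tied_or_absent_fraction(values) < threshold
```

Under these assumptions, a daily series that is mostly zeros (days with no commits) would fail the check, which is consistent with fewer metrics qualifying at the day interval than at the month interval.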

The following metrics had an adequate number of non-tied and non-absent values to be used in the analysis:

For Git-alikes:

  • For the month interval:
    • Number Of Commits Rate
    • Number Of File Changes Rate
    • Number Of Line Updates Rate
    • Number Of Line Additions Rate
    • Number Of Line Deletions Rate
  • For the week interval:
    • Number Of File Changes Rate
    • Number Of Line Updates Rate
    • Number Of Line Additions Rate
    • Number Of Line Deletions Rate
  • For the day interval:
    • Number Of Line Updates Rate
    • Number Of Line Deletions Rate

For Jira:

  • For the month interval:
    • Number Of Commits Rate Git
    • Number Of File Changes Rate Git
    • Number Of Line Updates Rate Git
    • Number Of Line Additions Rate Git
    • Number Of Line Deletions Rate Git
  • For the week interval:
    • Number Of File Changes Rate Git
    • Number Of Line Updates Rate Git
    • Number Of Line Additions Rate Git
    • Number Of Line Deletions Rate Git
  • For the day interval:
    • Number Of Line Updates Rate Git
    • Number Of Line Deletions Rate Git

Metrics Without Adequate Supporting Data

The following metrics did not have an adequate number of non-tied and non-absent values to be used in the analysis:

For Git-alikes:

  • For the week interval:
    • Number Of Commits Rate
  • For the day interval:
    • Number Of Commits Rate
    • Number Of File Changes Rate
    • Number Of Line Additions Rate

For Jira:

  • For the week interval:
    • Number Changes Contributor Rate
    • Number Creations Contributor Rate
    • Number Actions Contributor Rate
    • Number Changes Changer Rate
    • Number Of Changes Assignee Rate
    • Number Of Creations Reporter Rate
    • Number Of Resolutions Assignee Rate
  • For the day interval:
    • Number Changes Contributor Rate
    • Number Creations Contributor Rate
    • Number Actions Contributor Rate
    • Number Changes Changer Rate
    • Number Of Changes Assignee Rate
    • Number Of Creations Reporter Rate
    • Number Of Resolutions Assignee Rate

Change Cycle Time Results

The data plots below show the most likely estimates of the DevOps change cycle time for the project.
The “y-axis” shows how strong a relationship exists between the “signal” from the Git-alike metric being used, as documented in the data plot title, versus the “signal” from the Jira metric, also documented in the title.

  • Negative values on the y-axis, with -1 being the minimum possible value, correspond to a negative relationship: as one metric rises, the other falls.
  • A value of 0 on the y-axis means that the metrics have no apparent relationship.
  • Positive values on the y-axis, with 1 being the maximum possible value, correspond to a positive relationship: as one metric rises or falls, the other metric also rises or falls.

The “x-axis” shows the offset in time between the two metrics. For instance, if the plot shows metrics measured on a daily basis, a value of 14 on the x-axis means that the events generating the Jira metric occurred 14 days prior to the events generating the Git-alike metric.

The coloring of data points corresponds to the quality of the underlying data as calculated based on the number of matching data points. The higher the data quality, the more likely the calculated DevOps change cycle time reflects the project’s genuine behavior.

We are most interested in the offset values on the x-axis that correspond to the largest non-negative values on the y-axis and that have high data quality. The points emphasized with arrows and labels are the points where the y-axis value is highest in comparison to all of the values around it; these points are the best estimates for the project’s change cycle time. Furthermore, smaller values on the x-axis are preferred to larger values, since events occurring closer in time are (usually) more likely estimates. Larger x-axis values may just be cyclic repetitions of earlier strong values. For example, a relationship that occurs at 45 days would also be observed at 90 days, 135 days, and so on. The statement “highest in comparison” means that the value is the highest within the following windows:

  • For days, the window is 9 days wide.
  • For weeks, the window is 5 weeks wide.
  • For months, the window is 7 months wide.
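The windowed “highest in comparison” rule can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the list index is taken to be the x-axis offset, the window is assumed to be centered on the candidate offset and truncated at the ends of the series, and the data-quality coloring is ignored.

```python
from typing import List, Optional, Sequence


def window_peaks(correlations: Sequence[Optional[float]], window: int) -> List[int]:
    """Offsets whose correlation is non-negative and the largest within a centered window.

    correlations[i] is the correlation at offset i (None where it could not be computed).
    window is the full window width, e.g. 9 for days, 5 for weeks, 7 for months.
    """
    half = window // 2
    peaks = []
    for i, value in enumerate(correlations):
        if value is None or value < 0:
            continue  # only non-negative correlations are of interest
        lo = max(0, i - half)
        hi = min(len(correlations), i + half + 1)
        neighbourhood = [c for c in correlations[lo:hi] if c is not None]
        # Equal values on a plateau are all reported; no tie-breaking is attempted.
        if value == max(neighbourhood):
            peaks.append(i)
    return peaks
```

For example, `window_peaks([0.1, 0.3, 0.2, 0.1, 0.05, 0.4, 0.1, -0.2, 0.0, 0.1, 0.6, 0.2], window=5)` returns `[1, 5, 10]`; following the preference stated above, the smaller offsets would be examined first.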

The recommended metrics selected to be used in computing the Change Cycle Time are listed below. The figure of merit for each metric is computed based on a set of factors. Each factor is assigned a weight from 0 to 1, with 1 being the highest weight, reflecting that factor’s relative importance in selecting the metrics to be used.

Metric Evaluation Factors For Computing Change Cycle Time
Factor Weight
Metrics with the highest correlation value that is of moderate or better data reliability 0.25
Metric with the earliest peak correlation value within a time interval 0.25
Metrics with peak interval periodicity (for example, a metric with a peak at 2, 4, and 6 weeks) 0.25
Metrics that agree with other metrics about peak correlation values 0.25
The data plots shown are those that have scored highest with respect to the selection criteria. The selected metrics and time period, selection criteria, criteria weights, and metric score are shown below. Any metric and time period combination not shown scored zero.
For this report, the cut-off score was set to 0.25.
Selected Git Versus Jira Metrics Providing The Best Change Cycle Time Estimates
Interval Metric Score
Month Git Metric ‘Number Of Commits Rate’ versus Jira Metric ‘Number Creations Contributor Rate’ 0.375
Month Git Metric ‘Number Of File Changes Rate’ versus Jira Metric ‘Number Creations Contributor Rate’ 0.125
Month Git Metric ‘Number Of Line Additions Rate’ versus Jira Metric ‘Number Creations Contributor Rate’ 0.125
Month Git Metric ‘Number Of Commits Rate’ versus Jira Metric ‘Number Of Creations Reporter Rate’ 0.125

The graphs shown and the points highlighted for the listed time intervals provide the most likely estimate for the project’s DevOps Change Cycle Time.
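One plausible reading of the figure of merit described above is a weighted sum of the four evaluation factors. The sketch below is an assumption, not the report's documented formula: the factor keys are hypothetical shorthand for the rows of the factor table, and each factor is assumed to be scored on a scale from 0 (not satisfied) to 1 (fully satisfied).

```python
from typing import Dict, Mapping

# Weights taken from the "Metric Evaluation Factors" table; the key names are hypothetical.
FACTOR_WEIGHTS: Dict[str, float] = {
    "highest_reliable_correlation": 0.25,
    "earliest_peak": 0.25,
    "peak_periodicity": 0.25,
    "agreement_with_other_metrics": 0.25,
}


def figure_of_merit(factor_values: Mapping[str, float],
                    weights: Mapping[str, float] = FACTOR_WEIGHTS) -> float:
    """Weighted sum of factor values; factors that are not supplied count as 0."""
    for name, value in factor_values.items():
        if name not in weights:
            raise KeyError(f"unknown factor: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"factor value out of range: {name}={value}")
    return sum(w * factor_values.get(name, 0.0) for name, w in weights.items())
```

With equal weights of 0.25, the score ranges from 0 (no factor satisfied) to 1 (all four factors fully satisfied).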

Statistical Analysis Technical Details

Time Series Analysis

A time series is defined in the National Institute of Standards and Technology’s Engineering Statistics Handbook [1] as:

An ordered sequence of values of a variable at equally spaced time intervals. Applications: The usage of time series models is twofold:
  • Obtain an understanding of the underlying forces and structure that produced the observed data.
  • Fit a model and proceed to forecasting, monitoring or even feedback and feedforward control.

Irregular, Regular, and Strictly Regular Time Series

From the R package zoo [2], the time spacing of a time series determines if it is irregular, regular, or strictly regular:

A time series can either be irregular (unequally spaced), strictly regular (equally spaced) or have an underlying regularity, i.e., be created from a regular series by omitting some observations. Here, the latter property is called regular. Consequently, regularity follows from strict regularity but not vice versa.

The analytic techniques used in this report are useful for regular and strictly regular time series but not irregular time series.
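The three categories can be illustrated with a small classifier over integer time indices (for example, day numbers). This is a sketch of one reading of the definition quoted above, not the zoo package's own algorithm: it treats a series as regular when every gap is a whole multiple of the smallest gap, and it returns only the most specific label even though strict regularity implies regularity. The function name is hypothetical.

```python
from typing import Sequence


def classify_regularity(times: Sequence[int]) -> str:
    """Classify strictly increasing integer time indices by their spacing."""
    if len(times) < 2:
        raise ValueError("need at least two observations")
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if any(gap <= 0 for gap in gaps):
        raise ValueError("time indices must be strictly increasing")
    step = min(gaps)
    if all(gap == step for gap in gaps):
        return "strictly regular"  # equally spaced
    if all(gap % step == 0 for gap in gaps):
        return "regular"  # a strictly regular series with some observations omitted
    return "irregular"
```

For instance, daily data with a few missing days, such as the indices `[0, 1, 3, 4]`, is classified as regular rather than strictly regular.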

Most time series generated from either Jira or git-alike events are, at the level of the time stamp actually recorded for each event, irregular. This irregularity is expected, since:

  1. The time stamps are recorded at the second or even millisecond level,
  2. Many Jira and git-alike events are triggered by human actions, which do not happen at regular second or millisecond intervals and which occur throughout a work day.

This report does not provide an analysis of project actions at the second or millisecond time-granularity level. Instead, the report groups data at a daily, weekly, and monthly level as a more useful reflection of project activity. Grouping in this way better reflects the cadence of project activities since, for example, a developer may not commit a code change during any particular minute of the day but may make a commit at some time during the day.
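The grouping step can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than facts about the report: weeks are taken to start on Monday, and each bucket is labeled by its first calendar day. Dividing the resulting counts by the number of contributors, as the metric definitions describe, is left out for brevity.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Iterable


def bucket_start(timestamp: datetime, interval: str) -> date:
    """First calendar day of the day, week, or month that contains the timestamp."""
    day = timestamp.date()
    if interval == "day":
        return day
    if interval == "week":
        return day - timedelta(days=day.weekday())  # assumption: weeks start on Monday
    if interval == "month":
        return day.replace(day=1)
    raise ValueError(f"unknown interval: {interval}")


def events_per_interval(timestamps: Iterable[datetime], interval: str) -> Counter:
    """Count events (for example, commits or ticket changes) in each bucket."""
    return Counter(bucket_start(ts, interval) for ts in timestamps)
```

Two commits recorded at 09:15:02 and 17:40:11 on the same day land in the same daily bucket, which removes the second-level irregularity of the raw time stamps.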

The resulting time series of daily, weekly, and monthly metrics data may still be irregular, regular, or strictly regular. Any metric that is not regular or strictly regular when collected at the daily, weekly, or monthly level is not used in the analysis. The table below lists the regularity properties for each of the metrics used in the report:

Metrics Interval Regular Strictly Regular Irregular
Number Of Line Updates Rate Day TRUE FALSE FALSE
Number Of Line Updates Rate Week TRUE FALSE FALSE
Number Of Line Updates Rate Month TRUE FALSE FALSE
Number Of Line Deletions Rate Day TRUE FALSE FALSE
Number Of Line Deletions Rate Week TRUE FALSE FALSE
Number Of Line Deletions Rate Month TRUE FALSE FALSE
Number Of File Changes Rate Week TRUE FALSE FALSE
Number Of File Changes Rate Month TRUE FALSE FALSE
Number Of Line Additions Rate Week TRUE FALSE FALSE
Number Of Line Additions Rate Month TRUE FALSE FALSE
Number Of Commits Rate Month TRUE FALSE FALSE

Displacement Analysis and Cross-Correlation

From Wikipedia [3]:

In signal processing, cross-correlation is a measure of similarity of two series as a function of the displacement of one relative to the other.

Similarly, if two scalar random vectors X and Y are time series, then

… the cross-correlations of X with Y across time are temporal cross-correlations. In probability and statistics, the definition of correlation always includes a standardising factor in such a way that correlations have values between −1 and +1.

This report uses the cross-correlation, standardized to values between -1 and +1, between all of the possible pairings of Jira and Git-alike tool metrics that have sufficient data points, to detect the strongest matching “signal” between the tools. The strongest matches occur at the displacement where an increase or decrease of activity in Jira corresponds to an increase or decrease in the Git-alike. The displacement between the two time series is the computed change cycle time for the project.
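The displacement search can be sketched as a correlation computed at each candidate offset. This is a simplified illustration, not the report's implementation: the function names are hypothetical, both series are assumed to be equally long and aligned to the same buckets, and the Pearson coefficient is used at each offset for brevity, whereas this report states that it uses Spearman.

```python
from math import sqrt
from typing import Dict, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Standardized correlation in [-1, +1]; None if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def cross_correlation(jira: Sequence[float], git: Sequence[float],
                      max_lag: int) -> Dict[int, Optional[float]]:
    """Correlate jira[t] with git[t + lag] for every lag from 0 to max_lag."""
    if len(jira) != len(git):
        raise ValueError("series must cover the same buckets")
    if max_lag > len(jira) - 2:
        raise ValueError("need at least two overlapping points at every lag")
    result = {}
    for lag in range(max_lag + 1):
        earlier = jira[: len(jira) - lag]  # Jira activity
        later = git[lag:]                  # Git-alike activity `lag` buckets later
        result[lag] = pearson(earlier, later)
    return result
```

If the Git-alike series is simply the Jira series delayed by two buckets, the correlation at a lag of 2 is 1 (up to rounding) and that lag is the estimated displacement.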

Correlation Statistics

Depending on the options chosen, this report calculates cross-correlation using the Pearson Product-Moment, Spearman, or Kendall correlation coefficients. As explained below, each correlation coefficient has different advantages and disadvantages.

The correlation coefficient chosen for this report is Spearman’s correlation coefficient.

Pearson Product-Moment Correlation Coefficient

From Wikipedia [4]:

…the Pearson product-moment correlation coefficient … is a statistic that measures linear correlation between two variables X and Y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.

The Pearson Product-Moment correlation coefficient handles tied values well, but calculates the correlation as the linear fit between two variables. However, two variables could be strongly related to each other in some way without that relationship being linear. The Pearson correlation coefficient also assumes that the two underlying variables being measured, X and Y, are normally distributed, which may not be true. If the normality assumption is violated, then a non-parametric correlation coefficient, such as Spearman or Kendall, would be more robust.

Spearman Correlation Coefficient

From Laerd Statistics [5]:

The Spearman’s rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman’s correlation coefficient measures the strength and direction of association between two ranked variables.
…Spearman’s correlation determines the strength and direction of the monotonic relationship between your two variables rather than the strength and direction of the linear relationship between your two variables. A monotonic relationship is a relationship that does one of the following: (1) as the value of one variable increases, so does the value of the other variable; or (2) as the value of one variable increases, the other variable value decreases.

The Spearman correlation coefficient is useful for determining if two time series are monotonically related, which is key to calculating the change cycle time based on the displacement between metrics generated by Jira and Git-alike tools. However, the Spearman correlation coefficient can perform poorly when too many ties exist in the data.
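As an illustration, Spearman's coefficient can be computed as the Pearson correlation of the ranked values. The sketch below uses average ranks for tied values, which is a common convention but an assumption about how this report treats ties; the function names are hypothetical.

```python
from math import sqrt
from typing import List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Standardized correlation in [-1, +1]; None if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only the ordering matters, a monotonic but non-linear pairing such as `[1, 2, 3, 4, 5]` against `[1, 4, 9, 16, 25]` yields a coefficient of 1. When most values are tied, most ranks collapse to the same average, which is one way to see why heavily tied data weakens the statistic.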

Kendall Correlation Coefficient

From Wikipedia [6]:

…the Kendall rank correlation coefficient … is a statistic used to measure the ordinal association between two measured quantities.

…It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities.

…the Kendall correlation between two variables will be high when observations have a similar (or identical for a correlation of 1) rank (i.e. relative position label of the observations within the variable: 1st, 2nd, 3rd, etc.) between the two variables, and low when observations have a dissimilar (or fully different for a correlation of −1) rank between the two variables.

The Kendall correlation coefficient is useful for determining if two time series are monotonically related. It has a specific formulation for addressing data that has many ties, known as Kendall’s \(\tau\)-b. However, the amount of error in any given Kendall correlation coefficient based on the number of observations is hard to calculate. Kendall is therefore a good candidate for comparing time series, but one for which it is difficult to assess how much error might exist in the calculations.
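Kendall’s \(\tau\)-b can be sketched by counting concordant and discordant pairs and shrinking the denominator by the number of tied pairs in each variable. The function name is hypothetical, and the quadratic pair loop is chosen for clarity rather than speed.

```python
from math import sqrt
from typing import Optional, Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Kendall's tau-b; None if either variable is constant."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1  # pair tied in x (possibly also tied in y)
            if dy == 0:
                tied_y += 1  # pair tied in y (possibly also tied in x)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    total_pairs = n * (n - 1) // 2
    denominator = sqrt((total_pairs - tied_x) * (total_pairs - tied_y))
    if denominator == 0:
        return None
    return (concordant - discordant) / denominator
```

Pairs that are tied in either variable count as neither concordant nor discordant, so they lower the numerator's potential maximum; the adjusted denominator compensates for this.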

Appendices

Metrics Comparisons that Provide Ambiguous Results

The following data plots also had sufficient data to be computed, but were not distinguished from other data plots in providing the most likely estimates of the DevOps change cycle time metric. However, these data plots should not be dismissed, because additional data or adjustments in parameters such as the window width for determining values of interest might cause a re-evaluation of the estimated change cycle time.


  1. Downloaded May 1, 2020 (link).

  2. Downloaded April 29, 2020 (link).

  3. Downloaded May 1, 2020 (link).

  4. Downloaded May 1, 2020 (link).

  5. Downloaded May 1, 2020 (link).

  6. Downloaded May 1, 2020 (link).