Data overview

glimpse(all_data)
## Rows: 1,605
## Columns: 11
## $ repo                <chr> "opennlp", "opennlp", "opennlp", "opennlp", "openn…
## $ pr_number           <int> 153, 197, 207, 216, 216, 216, 216, 216, 240, 240, …
## $ rule                <chr> "java:S4973", "java:S112", "java:S1192", "java:S18…
## $ severity            <chr> "MAJOR", "MAJOR", "CRITICAL", "MAJOR", "MINOR", "C…
## $ file                <chr> "opennlp-tools/src/test/java/opennlp/tools/util/St…
## $ type                <chr> "BUG", "CODE_SMELL", "CODE_SMELL", "CODE_SMELL", "…
## $ status              <chr> "OPEN", "OPEN", "OPEN", "OPEN", "OPEN", "OPEN", "O…
## $ debt                <int> 5, 20, 10, 15, 5, 12, 2, 2, 8, 8, 5, 12, 10, 20, 2…
## $ ncloc_affected_file <int> 60, 23, 117, 32, 32, 32, 32, 32, 64, 64, 3, 132, 1…
## $ complexity          <int> 7, 4, 12, 5, 5, 5, 5, 5, 5, 5, 0, 18, 18, 18, 18, …
## $ origin              <chr> "NEW", "NEW", "PRE-EXISTING", "NEW", "NEW", "PRE-E…

RQ1

TDD (Technical Debt Density) by PR Distribution

## Warning in geom_text(aes(x = 0, y = 0, label = paste("Mean:", round(mean(tdd), : All aesthetics have length 1, but the data has 52 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.
## Warning in geom_text(aes(x = 0, y = 0, label = paste("Median:", round(median(tdd), : All aesthetics have length 1, but the data has 52 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.
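
The warnings above appear when a constant label is mapped inside `aes()` on a layer that inherits all 52 rows, so the same text is drawn once per row. A minimal sketch of the fix the message itself suggests, `annotate()`, follows. The data frame `pr_metrics`, its values, the histogram geometry, and the label placement are all assumptions for illustration; the report's actual plotting code is not shown.

```r
library(ggplot2)

# Hypothetical per-PR values (assumption); the report's data has 52 rows.
pr_metrics <- data.frame(tdd = c(0.5, 1.0, 1.5, 2.0, 10.0))

# Build each label once, outside aes(), so it is a single string.
mean_label   <- paste("Mean:", round(mean(pr_metrics$tdd), 2))
median_label <- paste("Median:", round(median(pr_metrics$tdd), 2))

p <- ggplot(pr_metrics, aes(x = tdd)) +
  geom_histogram(bins = 10) +
  # annotate() draws exactly one label, however many rows the data has.
  annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 1.5,
           label = mean_label) +
  annotate("text", x = Inf, y = Inf, hjust = 1.1, vjust = 3.5,
           label = median_label)
```

The same pattern would apply to any other layer that places a single summary label on a plot.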

TDV (Technical Debt Variation)

TDV Distribution

## Warning in geom_text(aes(x = 0, y = 0, label = paste("Mean:", round(mean(tdv), : All aesthetics have length 1, but the data has 52 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.
## Warning in geom_text(aes(x = 0, y = 0, label = paste("Median:", round(median(tdv), : All aesthetics have length 1, but the data has 52 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.

Percentages of TDV < 0 (TD decreased), TDV = 0 (TD unchanged), and TDV > 0 (TD increased) by repo

Mean of percentages of TDV (mean of repo percentages)
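
The two summaries above can be sketched as follows: per-repo percentages by TDV sign, then an unweighted mean of those percentages so that each repository counts equally regardless of how many PRs it has. The data frame `pr_tdv`, the repo names, and the values are hypothetical, and the sketch assumes one TDV value per PR.

```r
# Hypothetical per-PR TDV values for two repositories (not the report's data).
pr_tdv <- data.frame(
  repo = c("repo_a", "repo_a", "repo_a", "repo_a", "repo_b", "repo_b"),
  tdv  = c(-2, 0, 3, 5, -1, -4),
  stringsAsFactors = FALSE
)

# Classify each PR by the sign of its TDV.
pr_tdv$direction <- factor(
  ifelse(pr_tdv$tdv < 0, "decreased",
         ifelse(pr_tdv$tdv > 0, "increased", "unchanged")),
  levels = c("decreased", "unchanged", "increased")
)

# Rows are repos, columns are directions; each row sums to 100.
pct_by_repo <- 100 * prop.table(table(pr_tdv$repo, pr_tdv$direction),
                                margin = 1)

# Unweighted mean of the per-repo percentages (every repo weighs the same).
mean_pct <- colMeans(pct_by_repo)
```

In this toy data the mean of repo percentages for "decreased" is 62.5, whereas pooling all six PRs would give 50, which is why the two summaries can differ on unbalanced data.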

Percentages of pre-existing TDV by repo

Mean of percentages of pre-existing TDV (mean of repo percentages)

Percentages for new TDV by repo

Mean of percentages of new TDV (mean of repo percentages)

RQ2

Because the data are unbalanced across repositories, as shown in the issue characterization (../issues_characterization.Rmd), we examine the 10 most frequently fixed and the 10 most frequently unfixed rules in each repository, and then aggregate them into a single ranking using the \(PositionScore\) metric, given by the following formula:

\(PositionScore_{i} = \sum\limits_{j=1}^{k} Position_{i,j}\)

For a rule \(i\), \(PositionScore_{i}\) is the sum of its positions \(Position_{i,j}\) across all \(k\) repositories. After computing the score for every rule, we rank the rules into a TOP 10 of the most frequent rules (fixed or unfixed, depending on the context) across repositories. The lower the \(PositionScore\), the more frequent the rule across repositories.
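
A minimal sketch of the PositionScore aggregation, using made-up top-3 lists for two repositories instead of the report's top-10 lists. The report does not say how a rule is scored in a repository where it is outside the top list; this sketch assumes it takes position `top_n + 1`, which is purely an illustrative choice. The names `top_rules`, `top_n`, and `position_score` are also hypothetical.

```r
# Hypothetical top-3 lists for two repositories (the report uses top 10).
top_rules <- data.frame(
  repo     = c("repo_a", "repo_a", "repo_a", "repo_b", "repo_b", "repo_b"),
  rule     = c("java:S1192", "java:S112", "java:S4973",
               "java:S1192", "java:S100", "java:S112"),
  position = c(1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)
top_n <- 3

# Build every (repo, rule) pair so missing positions become explicit.
grid <- expand.grid(repo = unique(top_rules$repo),
                    rule = unique(top_rules$rule),
                    stringsAsFactors = FALSE)
grid <- merge(grid, top_rules, by = c("repo", "rule"), all.x = TRUE)

# ASSUMPTION (not stated in the report): a rule absent from a repo's
# list is given position top_n + 1.
grid$position[is.na(grid$position)] <- top_n + 1

# PositionScore_i = sum over repositories j of Position_{i,j}.
scores <- aggregate(position ~ rule, data = grid, FUN = sum)
names(scores)[2] <- "position_score"

# Lower score means the rule ranks higher across repositories.
scores <- scores[order(scores$position_score), ]
```

Without some penalty for missing positions, a rule that appears in only one repository would receive a misleadingly low score, so whichever convention is used should be stated alongside the ranking.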

TOP 10 fixed per repo ranked by count

All top 10 fixed rules for each repo.

TOP 10 fixed

TOP 10 fixed rules by PositionScore metric.

TOP 10 unfixed per repo ranked by count

All top 10 unfixed rules for each repo.

TOP 10 unfixed

TOP 10 unfixed rules by PositionScore metric.

Outliers

Higher TDD

Higher TDV

Lower TDV

Extra

Number of PRs with pre-existing TD

Number of java:S1192 issues that affect test code

Number of java:S1192 issues that affect non-test code
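
One way these two counts could be obtained is by splitting the java:S1192 issues on their file path. The sketch below assumes that test code is any file under a Maven-style `src/test/` directory, which matches the `file` values shown in the data overview but is not confirmed by the report. The data frame `issues` and its rows are hypothetical stand-ins for the `rule` and `file` columns of `all_data`.

```r
# Hypothetical rows mimicking the `rule` and `file` columns of all_data.
issues <- data.frame(
  rule = c("java:S1192", "java:S1192", "java:S112", "java:S1192"),
  file = c("module/src/test/java/FooTest.java",
           "module/src/main/java/Foo.java",
           "module/src/test/java/BarTest.java",
           "module/src/test/java/BazTest.java"),
  stringsAsFactors = FALSE
)

# Keep only the duplicated-string-literal rule.
s1192 <- issues[issues$rule == "java:S1192", ]

# ASSUMPTION: test code is whatever sits under a src/test/ directory.
is_test <- grepl("/src/test/", s1192$file, fixed = TRUE)

n_test     <- sum(is_test)   # issues in test code
n_non_test <- sum(!is_test)  # issues in non-test code
```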

Summary NCLOC

##      ncloc        
##  Min.   :  18.00  
##  1st Qu.:  72.25  
##  Median : 172.50  
##  Mean   : 480.85  
##  3rd Qu.: 281.00  
##  Max.   :9275.00