glimpse(all_data)
## Rows: 1,605
## Columns: 11
## $ repo <chr> "opennlp", "opennlp", "opennlp", "opennlp", "openn…
## $ pr_number <int> 153, 197, 207, 216, 216, 216, 216, 216, 240, 240, …
## $ rule <chr> "java:S4973", "java:S112", "java:S1192", "java:S18…
## $ severity <chr> "MAJOR", "MAJOR", "CRITICAL", "MAJOR", "MINOR", "C…
## $ file <chr> "opennlp-tools/src/test/java/opennlp/tools/util/St…
## $ type <chr> "BUG", "CODE_SMELL", "CODE_SMELL", "CODE_SMELL", "…
## $ status <chr> "OPEN", "OPEN", "OPEN", "OPEN", "OPEN", "OPEN", "O…
## $ debt <int> 5, 20, 10, 15, 5, 12, 2, 2, 8, 8, 5, 12, 10, 20, 2…
## $ ncloc_affected_file <int> 60, 23, 117, 32, 32, 32, 32, 32, 64, 64, 3, 132, 1…
## $ complexity <int> 7, 4, 12, 5, 5, 5, 5, 5, 5, 5, 0, 18, 18, 18, 18, …
## $ origin <chr> "NEW", "NEW", "PRE-EXISTING", "NEW", "NEW", "PRE-E…
Because the data are unbalanced across repositories, as shown in the issue characterization (../issues_characterization.Rmd), we examine the top 10 fixed issues and the top 10 unfixed issues for each repository and then aggregate them into a single ranking using the \(PositionScore\) metric, given by the following formula:
\(PositionScore_{i} = \sum\limits_{j=1}^{k} Position_{i,j}\)
Given a rule \(i\), \(Position_{i,j}\) is its position in the top 10 of repository \(j\), and \(PositionScore_{i}\) is the sum of these positions across all \(k\) repositories. After calculating the metric for all rules, we rank them into a top 10 of the most frequent rules (fixed or unfixed, depending on the context) across repositories. The lower the PositionScore, the more frequent the rule across repositories.
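The PositionScore aggregation described above can be sketched in base R as follows. This is a minimal illustration on made-up data, not the code used to produce the tables in this report. The `issues` data frame, the `top_n` and `absent_position` variables, and the counts are all hypothetical. In particular, the text does not say how a rule is scored in a repository where it falls outside the top 10; the sketch assumes a penalty position of `top_n + 1`, so that absent rules do not get an artificially low score.

```r
# Toy issue table (hypothetical data; columns mirror `repo` and `rule` in all_data)
issues <- data.frame(
  repo = c(rep("repo_a", 6), rep("repo_b", 5)),
  rule = c("java:S112", "java:S112", "java:S112",
           "java:S1192", "java:S1192", "java:S4973",
           "java:S1192", "java:S1192", "java:S1192",
           "java:S4973", "java:S4973"),
  stringsAsFactors = FALSE
)

top_n <- 10                   # size of each per-repository ranking
absent_position <- top_n + 1  # ASSUMPTION: position charged when a rule is
                              # not in a repository's top_n (not from the source)

# Count issues per (repo, rule); table() also yields zero counts for
# rules that never occur in a repository
counts <- as.data.frame(table(repo = issues$repo, rule = issues$rule),
                        stringsAsFactors = FALSE)

# Position of each rule inside its repository (1 = most frequent)
counts$position <- ave(-counts$Freq, counts$repo,
                       FUN = function(x) rank(x, ties.method = "min"))

# Rules that are absent or fall outside the top_n get the penalty position
counts$position[counts$Freq == 0 | counts$position > top_n] <- absent_position

# PositionScore: sum of a rule's positions across all repositories
scores <- aggregate(position ~ rule, data = counts, FUN = sum)
names(scores)[2] <- "position_score"

# Lower score first; keep the overall top_n rules
scores <- head(scores[order(scores$position_score), ], top_n)
print(scores)
```

In this toy data, `java:S1192` is ranked 2nd in `repo_a` and 1st in `repo_b`, so it gets the lowest score and comes first in the aggregated ranking. Because positions rather than raw counts are summed, a repository with many issues does not dominate the result, which is the reason for using ranks when the data are unbalanced.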
All top 10 fixed rules for each repo.
TOP 10 fixed rules by PositionScore metric.
All top 10 unfixed rules for each repo.
TOP 10 unfixed rules by PositionScore metric.
## ncloc
## Min. : 18.00
## 1st Qu.: 72.25
## Median : 172.50
## Mean : 480.85
## 3rd Qu.: 281.00
## Max. :9275.00