Correlations in the SIP Dataset

What the data are

## Rows: 12,299
## Columns: 17
## $ task_number            <chr> "1735", "1742", "1971", "2134", "2251", "2283",~
## $ summary                <chr> "Flag RI on SCM Message Summary screen using me~
## $ priority               <dbl> 1, 1, 2, 5, 10, 1, 5, 5, 6, 5, 2, 1, 3, 1, 1, 8~
## $ raised_by_id           <chr> "58", "58", "7", "50", "46", "13", "13", "13", ~
## $ assigned_to_id         <chr> "58", "42", "58", "42", "13", "13", "13", "58",~
## $ authorised_by_id       <chr> "6", "6", "6", "6", "6", "58", "6", "6", "6", "~
## $ status_code            <chr> "FINISHED", "FINISHED", "FINISHED", "FINISHED",~
## $ project_code           <chr> "PC2", "PC2", "PC2", "PC2", "PC2", "PC9", "PC2"~
## $ project_breakdown_code <chr> "PBC42", "PBC21", "PBC75", "PBC42", "PBC21", "P~
## $ category               <chr> "Development", "Development", "Operational", "D~
## $ sub_category           <chr> "Enhancement", "Enhancement", "In House Support~
## $ hours_estimate         <dbl> 14.00, 7.00, 0.70, 0.70, 3.50, 7.00, 7.00, 7.00~
## $ hours_actual           <dbl> 1.75, 7.00, 0.70, 0.70, 3.50, 7.00, 7.00, 7.00,~
## $ developer_id           <chr> "58", "42", "58", "42", "13", "13", "43", "58",~
## $ developer_hours_actual <dbl> 1.75, 7.00, 0.70, 0.70, 3.50, 7.00, 7.00, 7.00,~
## $ task_performance       <dbl> 12.25, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00~
## $ developer_performance  <dbl> 12.25, 0.00, 0.00, 0.00, 0.00, 0.00, NA, 0.00, ~

Understanding the data

## # A tibble: 1 x 6
##   projetos categoprias sub_categorias estimativas tasks prioridades
##      <int>       <int>          <int>       <int> <int>       <int>
## 1       20           3             24       12299 10266          10

There are 20 projects, with 12,299 time estimates for tasks. There is not exactly one estimate per task: task_number has only 10,266 distinct values, so some tasks appear in more than one row.
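The comparison between row count and distinct task numbers can be sketched as follows. This is a minimal illustration in base R on a hypothetical toy data frame (the values are invented, not taken from the SIP dataset); only the column names `task_number`, `project_code`, and `hours_estimate` come from the listing above.

```r
# Toy data standing in for the SIP table (illustrative values only)
tasks <- data.frame(
  task_number    = c("1735", "1735", "1742", "1971", "1971", "1971"),
  project_code   = c("PC2", "PC2", "PC2", "PC9", "PC9", "PC9"),
  hours_estimate = c(14, 14, 7, 0.7, 0.7, 0.7),
  stringsAsFactors = FALSE
)

n_rows  <- nrow(tasks)                        # one row per recorded estimate
n_tasks <- length(unique(tasks$task_number))  # distinct task identifiers

# Rows per task, largest first: a count above 1 means the task appears more than once
rows_per_task <- sort(table(tasks$task_number), decreasing = TRUE)

print(c(rows = n_rows, distinct_tasks = n_tasks))
print(rows_per_task)
```

When the number of rows exceeds the number of distinct task numbers, as it does in the real data, at least one task must be recorded more than once.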

## # A tibble: 10,268 x 4
##    task_number category    summary                                             n
##    <chr>       <chr>       <chr>                                           <int>
##  1 10605       Management  Staff Meeting                                       8
##  2 6889        Management  SiP Staff Meeting                                   8
##  3 10089       Operational Office Move and bits and bobs                       7
##  4 10974       Management  Extended SiP Lunch                                  7
##  5 11056       Management  SiP Company Meeting                                 7
##  6 11270       Management  Staff Meeting                                       7
##  7 13124       Management  Company Meeting - scorecard and discussion          7
##  8 13190       Management  Marketing management meeting                        7
##  9 13253       Management  YYY ZZZ's Marketing presentation and meeting        7
## 10 3812        Development Weekly Developer Meeting 14th September 2005 -~     7
## # ... with 10,258 more rows

What is the relationship between estimates and actual hours taken in the company as a whole, and what does this relationship look like in different task subcategories?

## # A tibble: 24 x 3
##    sub_category            tasks correlacao
##    <chr>                   <int>      <dbl>
##  1 Board Meeting              21      0.911
##  2 Business Specification     96      0.861
##  3 General Documentation     254      0.857
##  4 Support                  1045      0.837
##  5 Client Support            616      0.829
##  6 Enhancement              2592      0.821
##  7 Technical Specification    61      0.807
##  8 Third Party                18      0.805
##  9 Conversion                 75      0.799
## 10 Documentation              72      0.788
## # ... with 14 more rows

This section illustrates the relationship between estimated and actual hours across the company as a whole and within the different task subcategories. Graph 1 plots hours_estimate against hours_actual on a base-10 log scale, with estimated hours on the vertical axis and actual hours on the horizontal axis. The strength of the relationship varies by subcategory: Board Meeting, for example, shows a very strong correlation (0.911 in the table above), while Research shows a weaker one.
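A per-subcategory correlation of this kind could be computed roughly as below. This is a sketch under stated assumptions: the report does not say which correlation method was used or whether it was computed on raw or log-transformed hours, so Pearson on log10 values is assumed here. The data frame is a hypothetical toy example, and `log_cor` is a helper name introduced for illustration.

```r
# Toy data (illustrative values only; not taken from the SIP dataset)
tasks <- data.frame(
  sub_category   = rep(c("Enhancement", "Research"), each = 5),
  hours_estimate = c(1, 2, 4, 8, 16,   1, 2, 4, 8, 16),
  hours_actual   = c(1, 2, 4, 8, 16,   8, 1, 16, 2, 4),
  stringsAsFactors = FALSE
)

# Keep strictly positive hours so that log10() is defined
tasks <- subset(tasks, hours_estimate > 0 & hours_actual > 0)

# Assumed method: Pearson correlation of the log10-transformed hours
log_cor <- function(d) cor(log10(d$hours_estimate), log10(d$hours_actual))

overall <- log_cor(tasks)                                      # all rows together
by_sub  <- sapply(split(tasks, tasks$sub_category), log_cor)   # one value per subcategory

print(overall)
print(sort(by_sub, decreasing = TRUE))
```

In the toy data, the Enhancement rows are estimated perfectly and the Research rows are not, so the two subcategories end up at opposite ends of the sorted result.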

Is there a noticeable relationship between team size and the average error of the team's estimates? What does that relationship look like?

Graph 2 shows the relationship between team size and the mean absolute error of the team's estimates. In this sample, there is a noticeable relationship between the size of a team and the average error of its estimates. The graph also reveals atypical observations (outliers and extreme values), which can be examined there.
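One way to build the team-level quantities is sketched below. The report does not define "team" or "error", so this sketch makes two loud assumptions: a team is the set of distinct `developer_id` values within a `project_code`, and the error is `abs(hours_estimate - hours_actual)`. The second assumption is at least consistent with the first row of the listing, where `task_performance` (12.25) equals 14.00 - 1.75. The data below are a hypothetical toy example.

```r
# Toy data (illustrative values only; not taken from the SIP dataset)
tasks <- data.frame(
  project_code   = c("PC2", "PC2", "PC2", "PC9", "PC9", "PC5"),
  developer_id   = c("58", "42", "13", "13", "13", "43"),
  hours_estimate = c(14, 7, 3.5, 7, 7, 2),
  hours_actual   = c(1.75, 7, 3.5, 5, 10, 2),
  stringsAsFactors = FALSE
)

# Assumed error definition: absolute difference between estimate and actual
tasks$abs_error <- abs(tasks$hours_estimate - tasks$hours_actual)

# Assumed team definition: distinct developers per project
team_size  <- tapply(tasks$developer_id, tasks$project_code,
                     function(ids) length(unique(ids)))
mean_error <- tapply(tasks$abs_error, tasks$project_code, mean)

teams <- data.frame(
  project_code   = names(team_size),
  team_size      = as.vector(team_size),
  mean_abs_error = as.vector(mean_error[names(team_size)])
)
print(teams)

# A rank correlation is less sensitive to outliers than Pearson's
print(cor(teams$team_size, teams$mean_abs_error, method = "spearman"))
```

Because the graph contains outliers and extreme values, a rank-based measure such as Spearman's is a reasonable companion to the plot; the choice of measure here is illustrative and not taken from the report.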

What is the relationship between a task's priority and the error in its estimate?

Graph 3 shows the relationship between task priority and the error in its estimate. Task priorities are on the horizontal axis and estimation errors on the vertical axis, displayed on a square-root scale. Kendall's correlation is -0.02498766, which is close to zero and indicates no relationship, positive or negative, between priority and absolute error. In other words, developers make errors of similar size on low-priority and high-priority tasks. We highlight only an outlier among tasks of priority one and five, whose value is atypical compared with the other tasks, as the graph shows.
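The Kendall correlation mentioned above can be computed with base R's `cor()` using `method = "kendall"`. The sketch below uses a hypothetical toy data frame and assumes the error is `abs(hours_estimate - hours_actual)`; it will not reproduce the value reported for the real data.

```r
# Toy data (illustrative values only; not taken from the SIP dataset)
tasks <- data.frame(
  priority       = c(1, 1, 2, 5, 10, 5, 6, 3),
  hours_estimate = c(14, 7, 0.7, 0.7, 3.5, 7, 7, 2),
  hours_actual   = c(1.75, 7, 0.7, 1.2, 3.5, 9, 6, 2)
)

# Assumed error definition: absolute difference between estimate and actual
tasks$abs_error <- abs(tasks$hours_estimate - tasks$hours_actual)

# Kendall's tau handles ordinal data with many ties, such as a 1-10 priority
tau <- cor(tasks$priority, tasks$abs_error, method = "kendall")
print(tau)
```

Kendall's tau compares pairs of observations by rank, which suits a variable like priority that takes only ten distinct values. A value near zero, as in the report, means that higher priority is about as likely to go with a larger error as with a smaller one.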

Fernando Tomaz
