The distribution of worker skill level across questions can bias the outcome of the larger crowdsourcing task. The reason is that a software fault can be overlooked if the questions* covering it were answered mostly by lower-skill workers.
*Each question covers a certain number of source code lines. Questions have the following format: “Do you believe the source code between lines 35 and 45 is related to the described failure?”
Therefore, my goal is to investigate how worker skill level is distributed across questions. Worker skill level was measured by three attributes, which include years of experience and profession. Profession: worker profession is also an indicator of quality. The professions consisted of professional programmers, hobbyists, graduate students, undergraduate students, and others.
The current analysis focuses only on the workers' years of experience.
First, let’s see how the workers are distributed in terms of years of experience.
## years_of_experience worker
## Min. : 0.000 Min. :1
## 1st Qu.: 2.000 1st Qu.:1
## Median : 5.000 Median :1
## Mean : 7.749 Mean :1
## 3rd Qu.:10.000 3rd Qu.:1
## Max. :50.000 Max. :1
We can observe that the workers' reported years of experience concentrate on multiples of 5 and 10.
Since the distribution is heavily skewed to the right, the mean is not a good representation of the data. The alternative in such cases is to look at the quartiles, which are shown in the boxplot below.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.000 5.000 7.749 10.000 50.000
The boxplot shows that half of the data points are below 5 years of experience, and 25% are between 5 and 10 years of experience.
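To illustrate why the quartiles are preferred over the mean here, the following minimal sketch compares both on a small right-skewed sample. The values in `years` are made up for illustration and are not the study data; the sketch uses only Python's standard `statistics` module.

```python
import statistics

# Hypothetical, right-skewed sample of years of experience (NOT the study data).
years = [0, 1, 2, 2, 3, 5, 5, 5, 8, 10, 10, 15, 20, 30, 50]

mean_value = statistics.mean(years)

# Quartile cut points (Q1, median, Q3) of the sample.
q1, q2, q3 = statistics.quantiles(years, n=4, method="inclusive")

# A few large values pull the mean above the median, so the median and the
# quartiles describe the typical worker better than the mean does.
print(f"mean={mean_value:.2f}  Q1={q1}  median={q2}  Q3={q3}")
```

In a right-skewed sample like this one, the mean lies above the median, which matches the pattern in the summary above (mean 7.749, median 5).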
Therefore, in order to investigate the distribution of years of experience across questions, I selected the following four intervals: below 2 years, from 2 to 5 years, from 5 to 10 years, and above 10 years of experience.
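The four intervals can be sketched as a small binning function. The text does not say which interval a boundary value (exactly 2, 5, or 10 years) belongs to, so this sketch assumes each interval includes its lower bound; the function name `experience_bin` and the labels are hypothetical.

```python
from bisect import bisect_right

# Interval boundaries taken from the text (approximately the quartiles).
CUT_POINTS = [2, 5, 10]
LABELS = ["below 2", "2 to 5", "5 to 10", "above 10"]


def experience_bin(years: float) -> str:
    """Map years of experience to one of the four intervals.

    Assumption: a boundary value falls into the upper interval,
    e.g. exactly 5 years is labeled "5 to 10".
    """
    if years < 0:
        raise ValueError("years of experience cannot be negative")
    # bisect_right counts how many cut points are <= years,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(CUT_POINTS, years)]


for value in (0, 2, 7.5, 50):
    print(value, "->", experience_bin(value))
```

If the original analysis treated boundary values differently, only the comparison inside the function would need to change (for example, `bisect_left` to put boundary values in the lower interval).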
As we can see, workers are not equally distributed across questions. This suggests that some faults might have been overlooked because the questions covering them were answered mostly by less experienced workers. As Figures 2, 3, and 4 show, some questions can have almost 5 times more experienced workers than other questions.
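A comparison of this kind can be sketched by counting the experienced workers per question and taking the ratio between the best- and worst-covered question. The `answers` records, the question identifiers, and the threshold of 10 years for "experienced" are all assumptions made for illustration; they are not the study data.

```python
from collections import Counter

# Assumption: "experienced" means at least 10 years of experience.
EXPERIENCED_THRESHOLD = 10

# Hypothetical (question_id, years_of_experience) records, one per answer.
answers = [
    ("Q1", 12), ("Q1", 15), ("Q1", 1), ("Q1", 20), ("Q1", 30),
    ("Q2", 25), ("Q2", 0), ("Q2", 3), ("Q2", 11), ("Q2", 4),
]

# Start every question at zero so that questions with no experienced
# workers still appear in the result.
experienced_per_question = Counter({question: 0 for question, _ in answers})
for question, years in answers:
    if years >= EXPERIENCED_THRESHOLD:
        experienced_per_question[question] += 1

most = max(experienced_per_question.values())
least = min(experienced_per_question.values())

# Ratio between the best- and worst-covered question; infinite if some
# question has no experienced worker at all.
ratio = most / least if least > 0 else float("inf")
print(dict(experienced_per_question), "ratio:", ratio)
```

A ratio far above 1 indicates that experienced workers are unevenly spread, which is the situation described above.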