Do bug-covering questions receive more answers from professional developers than other questions?

Are these differences statistically significant?

Actual analysis

Professional Developers

Can we assert that bug-covering questions receive more answers from professional developers on average?

shapiro.test(bugCoveringList$itemCount)      # Normal: W = 0.94579, p-value = 0.2011
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.94811, p-value = 0.0004707

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1330, p-value = 0.8587

The Wilcoxon rank-sum test could not show that bug-covering and non-bug-covering questions differ in the number of answers they receive. However, we should be cautious: lack of evidence of an effect is not proof of lack of an effect.
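The two-step check used throughout this analysis (Shapiro-Wilk on each group, then a two-sided unpaired comparison) can be sketched as a small helper. This is only an illustration: the function name `compare_answer_counts`, the t-test fallback, and the simulated counts are assumptions and do not appear in the original analysis, which always used the Wilcoxon test because at least one group was non-normal.

```r
# Hypothetical helper: checks normality of both groups, then compares them.
compare_answer_counts <- function(x, y, alpha = 0.05) {
  normal_x <- shapiro.test(x)$p.value >= alpha
  normal_y <- shapiro.test(y)$p.value >= alpha
  if (normal_x && normal_y) {
    # Assumed fallback when both groups look normal (not used in the source)
    test <- t.test(x, y, alternative = "two.sided", paired = FALSE)
  } else {
    # exact = FALSE requests the normal approximation, which is what R
    # falls back to anyway when the data contain ties (common for counts)
    test <- wilcox.test(x, y, alternative = "two.sided", paired = FALSE,
                        exact = FALSE)
  }
  list(both_normal = normal_x && normal_y,
       method = test$method,
       p_value = test$p.value)
}

# Made-up answer counts, for illustration only (not the study's data)
set.seed(1)
bug <- rpois(25, lambda = 9)
not_bug <- rpois(104, lambda = 9)
result <- compare_answer_counts(bug, not_bug)
print(result)
```

A p-value above `alpha` from this helper means only that no difference was detected, matching the caution stated above.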

All non-student answers

shapiro.test(bugCoveringList$itemCount)      # Not normal: W = 0.90395, p-value = 0.02239
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.94641, p-value = 0.0003637

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1196.5, p-value = 0.5333

The Wilcoxon rank-sum test could not show that bug-covering and non-bug-covering questions differ in the number of answers they receive. However, we should be cautious: lack of evidence of an effect is not proof of lack of an effect.
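One way to see why a non-significant result is not proof of no effect is to estimate how often the test would detect a real shift of a given size. The sketch below is a rough simulation under loudly stated assumptions: the group sizes, the Poisson model for answer counts, and the chosen means are all hypothetical and were not taken from the study.

```r
# Illustrative power simulation; group sizes and Poisson means are assumptions.
estimate_power <- function(n_bug, n_not_bug, mean_bug, mean_not_bug,
                           n_sim = 500, alpha = 0.05) {
  rejections <- replicate(n_sim, {
    x <- rpois(n_bug, lambda = mean_bug)
    y <- rpois(n_not_bug, lambda = mean_not_bug)
    wilcox.test(x, y, alternative = "two.sided", paired = FALSE,
                exact = FALSE)$p.value < alpha
  })
  # Fraction of simulated experiments in which the shift was detected
  mean(rejections)
}

set.seed(42)
power_small_shift <- estimate_power(25, 100, mean_bug = 9, mean_not_bug = 8)
power_large_shift <- estimate_power(25, 100, mean_bug = 12, mean_not_bug = 8)
cat("estimated power, shift of 1 answer: ", power_small_shift, "\n")
cat("estimated power, shift of 4 answers:", power_large_shift, "\n")
```

If the estimated power for a small shift is low, a large p-value says little about whether such a shift exists.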

PROFESSION

All non-student answers

OK, filter does not cause bias

shapiro.test(bugCoveringList$itemCount)      # Not normal: W = 0.90395, p-value = 0.02239
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.94641, p-value = 0.0003637

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1196.5, p-value = 0.5333

The Wilcoxon rank-sum test could not show that bug-covering and non-bug-covering questions differ in the number of answers they receive. However, we should be cautious: lack of evidence of an effect is not proof of lack of an effect.

FILTER-OUT Undergraduate_Student,Graduate_Student, hobbyists, others

OK, filter does not cause bias

shapiro.test(bugCoveringList$itemCount)      # Normal: W = 0.94579, p-value = 0.2011
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.94811, p-value = 0.0004707

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1330, p-value = 0.8587

The Wilcoxon rank-sum test could not show that bug-covering and non-bug-covering questions differ in the number of answers they receive. However, we should be cautious: lack of evidence of an effect is not proof of lack of an effect.

FILTER-OUT Graduate_Student, hobbyists, others

OK, filter does not cause bias

shapiro.test(bugCoveringList$itemCount)      # Normal: W = 0.94106, p-value = 0.1566
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.9589, p-value = 0.002643

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1337.5, p-value = 0.8231

The Wilcoxon rank-sum test could not show that bug-covering and non-bug-covering questions differ in the number of answers they receive. However, we should be cautious: lack of evidence of an effect is not proof of lack of an effect.

FILTER-OUT Undergraduate_Student,Graduate_Student, others

OK, filter does not cause bias

shapiro.test(bugCoveringList$itemCount)      # Not normal: W = 0.90256, p-value = 0.02086
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.95576, p-value = 0.001575

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1477.5, p-value = 0.2851

The Wilcoxon rank-sum test could not show that bug-covering and non-bug-covering questions differ in the number of answers they receive. However, we should be cautious: lack of evidence of an effect is not proof of lack of an effect.

PROFESSION AND SCORE

worker score = 100%

Not OK, filter injected bias

shapiro.test(bugCoveringList$itemCount)      # Not normal: W = 0.88385, p-value = 0.008305
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.95902, p-value = 0.002696

mean(bugCoveringList$itemCount)      # 9.48
mean(not_bugCoveringList$itemCount)  # 8.5

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE, conf.int = TRUE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1643, p-value = 0.03867
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  3.075891e-05 1.999998e+00
# sample estimates:
# difference in location
#              0.9999684

The Wilcoxon rank-sum test SHOWED that the number of answers for bug-covering and non-bug-covering questions differs at the 5% significance level (p = 0.03867). Moreover, bug-covering questions received more answers on average (9.48 vs. 8.5). This imbalance implies that we cannot trust that the filter is actually fair. However, the estimated location shift between the two groups is small (approximately one answer).

Therefore, we kept the outcome of the filter but we marked it with a star.
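When reading the `difference in location` reported by `wilcox.test(..., conf.int = TRUE)`, note that the R documentation describes it as the median of the pairwise differences between the two samples, not the difference between the two medians. The sketch below computes these quantities by hand on made-up counts; the vectors `bug` and `not_bug` are hypothetical and are not the study's data.

```r
# Made-up answer counts, for illustration only (not the study's data)
bug <- c(7, 9, 10, 12, 8, 11, 9)
not_bug <- c(6, 8, 9, 7, 10, 8, 9, 7)

# Median of all pairwise differences bug[i] - not_bug[j]
pairwise_diffs <- outer(bug, not_bug, FUN = "-")
median_pairwise_diff <- median(pairwise_diffs)

# Two other summaries that are easy to confuse with it
diff_of_medians <- median(bug) - median(not_bug)
diff_of_means <- mean(bug) - mean(not_bug)

# Estimate reported by wilcox.test; with ties it is found numerically,
# so it may only approximately match the hand-computed value
reported <- wilcox.test(bug, not_bug, alternative = "two.sided",
                        paired = FALSE, conf.int = TRUE,
                        exact = FALSE)$estimate

cat("median of pairwise differences:", median_pairwise_diff, "\n")
cat("difference of medians:         ", diff_of_medians, "\n")
cat("difference of means:           ", diff_of_means, "\n")
cat("wilcox.test estimate:          ", unname(reported), "\n")
```

The three summaries need not coincide, which is why the mean difference and the reported location shift above can differ.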

worker score = 80% or 60%

Not OK, filter injected bias

shapiro.test(bugCoveringList$itemCount)      # Not normal: W = 0.90188, p-value = 0.02016
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.96726, p-value = 0.01121

mean(bugCoveringList$itemCount)      # 10.28
mean(not_bugCoveringList$itemCount)  # 11.33654

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE, conf.int = TRUE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 924.5, p-value = 0.0238
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  -1.999977e+00 -4.944597e-05
# sample estimates:
# difference in location
#              -1.000029

The Wilcoxon rank-sum test SHOWED that the number of answers for bug-covering and non-bug-covering questions differs at the 5% significance level (p = 0.0238). Here, bug-covering questions received fewer answers on average (10.28 vs. 11.34). This imbalance implies that we cannot trust that the filter is actually fair. However, the estimated location shift between the two groups is small (approximately one answer).

Therefore, we kept the outcome of the filter but we marked it with a star.

PROFESSION AND SCORE

Not OK, filter injected bias

non-students score = 100% (Professional_Developers, Hobbyists, and Others)

shapiro.test(bugCoveringList$itemCount)      # Normal: W = 0.95015, p-value = 0.2527
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.96348, p-value = 0.005757

mean(bugCoveringList$itemCount)      # 7.36
mean(not_bugCoveringList$itemCount)  # 6.27

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE, conf.int = TRUE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1715, p-value = 0.01214
# alternative hypothesis: true location shift is not equal to 0
# 95 percent confidence interval:
#  3.068510e-06 1.999979e+00
# sample estimates:
# difference in location
#               1.000018

The Wilcoxon rank-sum test SHOWED that the number of answers for bug-covering and non-bug-covering questions differs at the 5% significance level (p = 0.01214). Moreover, bug-covering questions received more answers on average (7.36 vs. 6.27). This imbalance implies that we cannot trust that the filter is actually fair, therefore the

outcome of this filter cannot be un For this reason, we cannot

FILTER OUT (all students) && (non-students below 60%)

leave only non-students with score 100% and 80%

OK, filter does not cause bias

shapiro.test(bugCoveringList$itemCount)      # Normal: W = 0.93489, p-value = 0.1128
shapiro.test(not_bugCoveringList$itemCount)  # Not normal: W = 0.96072, p-value = 0.003591

mean(bugCoveringList$itemCount)      # 11.28
mean(not_bugCoveringList$itemCount)  # 10.49

wilcox.test(bugCoveringList$itemCount, not_bugCoveringList$itemCount, alternative = "two.sided", paired = FALSE, conf.int = TRUE)
# data: bugCoveringList$itemCount and not_bugCoveringList$itemCount
# W = 1547, p-value = 0.1384

The Wilcoxon rank-sum test COULD NOT SHOW that the number of answers for bug-covering and non-bug-covering questions differs at the 5% significance level (p = 0.1384).