Type of Analysis: Descriptive analysis - Univariate description

Intention of Analysis: 1- Understand how the core WordPress community coproduces code. 2- Identify potential indicators for coherence analysis.

General Question: Which are the attributes of coproduction (columns of the data frame, or variables)? Specific Questions: Which types of developer groups exist in WC? Is it possible to build an indicator from this data?

Source: Data come from the WordPress Trac ticket reporting system. URL Source Dataframe: GitHub Repository

Date collection: 04/07/2019.

1. GENERAL ANALYSIS

#READ DATA
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(tidyr)

TicketW <- read_csv('~/PhD Analysis/1. PhD escriptive exploratory analysis/TicketW.csv')
## Parsed with column specification:
## cols(
##   id = col_double(),
##   Summary = col_character(),
##   Status = col_character(),
##   Version = col_logical(),
##   Owner = col_character(),
##   Type = col_character(),
##   Priority = col_character(),
##   Milestone = col_character(),
##   Component = col_character(),
##   Severity = col_character(),
##   Resolution = col_character(),
##   Created = col_character(),
##   Modified = col_character(),
##   Focuses = col_character(),
##   Reporter = col_character(),
##   Keywords = col_character()
## )
View(TicketW)
dim(TicketW) #dimension
## [1] 2333   16
TicketW[1:5,]  # first 5 rows
## # A tibble: 5 x 16
##      id Summary Status Version Owner Type  Priority Milestone Component
##   <dbl> <chr>   <chr>  <lgl>   <chr> <chr> <chr>    <chr>     <chr>    
## 1 24579 Add Dr~ new    NA      <NA>  enha~ high     Future R~ Upgrade/~
## 2 30361 Correc~ assig~ NA      pento task~ high     <NA>      General  
## 3 32502 Cannot~ new    NA      <NA>  defe~ high     <NA>      Administ~
## 4 36441 Custom~ new    NA      <NA>  defe~ high     Future R~ Customize
## 5 40439 Save p~ assig~ NA      mike~ enha~ high     5.3       Media    
## # ... with 7 more variables: Severity <chr>, Resolution <chr>,
## #   Created <chr>, Modified <chr>, Focuses <chr>, Reporter <chr>,
## #   Keywords <chr>
summary(TicketW)
##        id          Summary             Status          Version       
##  Min.   : 5235   Length:2333        Length:2333        Mode:logical  
##  1st Qu.:34555   Class :character   Class :character   NA's:2333     
##  Median :40511   Mode  :character   Mode  :character                 
##  Mean   :38029                                                       
##  3rd Qu.:44485                                                       
##  Max.   :47640                                                       
##     Owner               Type             Priority        
##  Length:2333        Length:2333        Length:2333       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##   Milestone          Component           Severity        
##  Length:2333        Length:2333        Length:2333       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##   Resolution          Created            Modified        
##  Length:2333        Length:2333        Length:2333       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##    Focuses            Reporter           Keywords        
##  Length:2333        Length:2333        Length:2333       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
## 
glimpse(TicketW)
## Observations: 2,333
## Variables: 16
## $ id         <dbl> 24579, 30361, 32502, 36441, 40439, 41292, 41886, 41...
## $ Summary    <chr> "Add Drag'n'Drop UI to plugin and theme manual uplo...
## $ Status     <chr> "new", "assigned", "new", "new", "assigned", "reope...
## $ Version    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ Owner      <chr> NA, "pento", NA, NA, "mikeschroder", "jnylen0", "me...
## $ Type       <chr> "enhancement", "task (blessed)", "defect (bug)", "d...
## $ Priority   <chr> "high", "high", "high", "high", "high", "high", "hi...
## $ Milestone  <chr> "Future Release", NA, NA, "Future Release", "5.3", ...
## $ Component  <chr> "Upgrade/Install", "General", "Administration", "Cu...
## $ Severity   <chr> "normal", "normal", "major", "normal", "normal", "n...
## $ Resolution <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ Created    <chr> "06/14/2013 05:03:38 PM", "11/17/2014 12:10:55 PM",...
## $ Modified   <chr> "04/12/2019 11:04:54 AM", "06/04/2019 07:42:28 PM",...
## $ Focuses    <chr> NA, "ui, administration", NA, NA, "ui", NA, NA, NA,...
## $ Reporter   <chr> "tw2113", "pento", "ryan", "azaozz", "mikeschroder"...
## $ Keywords   <chr> "ui-feedback ux-feedback needs-patch shiny-updates"...

Function .Unianalysis() - Transform a vector into a data frame with the frequency and proportion of each level

# Transform a vector into a data frame with the frequency of each level and its proportion (in %).
# Note: table() drops NA by default, so proportions are relative to non-missing values only.
.Unianalysis = function (x) {
    y <- as.data.frame(table(x))
    y <- mutate(y, proportion = prop.table(Freq) * 100)  # proportion in percent
    y <- arrange(y, desc(Freq))                          # most frequent level first
    return(y)
}
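
As a quick check of what the helper returns, the sketch below applies the same steps to a small made-up vector (the values are illustrative and not taken from TicketW.csv). `unianalysis_demo` is a hypothetical copy of the helper, repeated only so the example runs on its own. Because table() drops NA by default, the proportions are relative to the non-missing values.

```r
library(dplyr)

# Same steps as .Unianalysis(), repeated here so the example is self-contained
unianalysis_demo <- function(x) {
  y <- as.data.frame(table(x))
  y <- mutate(y, proportion = Freq / sum(Freq) * 100)
  arrange(y, desc(Freq))
}

# Made-up status values; the two NAs are ignored by table()
toy <- c("new", "new", "assigned", NA, "new", "reopened", NA)
res <- unianalysis_demo(toy)
res
```

Here "new" accounts for 3 of the 5 non-missing values, so its proportion is 60 and not 3/7. The same effect explains why the Owner, Milestone, Focuses, and Keywords proportions in this report are computed over fewer than 2333 tickets.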

1.1 Analysis of variables related to members. Goal: Find which variables show high variability, and find a cut-off for each, for use in the Bivariate Analysis.

# Var1
Status<-.Unianalysis(TicketW$Status)

#Var2
TType<-.Unianalysis(TicketW$Type)

#Var3
Priority<-.Unianalysis(TicketW$Priority)

#Var4
Milestone<-.Unianalysis(TicketW$Milestone)

#Var5
Component<-.Unianalysis(TicketW$Component) 
Component = filter(Component, Freq>90) # Keep components with more than 90 tickets
sum(Component$Freq) # Total tickets in the most frequent components (more than 90 tickets)
## [1] 738
#Var6
Severity<-.Unianalysis(TicketW$Severity)

#Var7
Focuses<-.Unianalysis(TicketW$Focuses)
Focuses = filter(Focuses, Freq>10) # Filter Focuses with more than 10 tickets
Focuses<-Focuses[order(Focuses$Freq, decreasing = TRUE),]
sum(Focuses$Freq) # Total tickets in the most frequent focuses (more than 10 tickets)
## [1] 832
#Var8
Keywords<-.Unianalysis(TicketW$Keywords)
Keywords = filter(Keywords, Freq>8)
Keywords<-Keywords[order(Keywords$Freq, decreasing = TRUE),]
sum(Keywords$Freq)
## [1] 1082
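
The Focuses and Keywords columns hold several tags in one string, so the counts above treat each combination, and each ordering of the same tags, as a separate level. One possible alternative, sketched below, is to split the strings and count individual tags. It assumes keywords are space-separated and focuses are comma-separated, as they appear in the glimpse() output. `count_tags` is a hypothetical helper that is not part of the original analysis, and the input vectors are made up.

```r
# Hypothetical helper: split each string on `sep` (a regular expression)
# and count the occurrences of each individual tag
count_tags <- function(x, sep) {
  tags <- unlist(strsplit(x[!is.na(x)], sep))
  tags <- trimws(tags)
  tags <- tags[nzchar(tags)]
  sort(table(tags), decreasing = TRUE)
}

# Made-up keyword strings in the same style as the Keywords column
kw <- c("has-patch has-screenshots", "has-screenshots has-patch",
        "needs-patch", NA, "has-patch needs-testing")
kw_counts <- count_tags(kw, "\\s+")
kw_counts

# Focuses would use a comma separator instead
fc_counts <- count_tags(c("ui, administration", "ui", NA), ",")
fc_counts
```

With this approach the two orderings of `has-patch` and `has-screenshots` contribute to the same two tag counts instead of forming two distinct levels.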

Function to establish groups of active, median, or less active (alien) members, .Grouping():

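#1. Filter a group of agents: keep rows with more < Freq < less.
# Both bounds are exclusive, so a member whose Freq equals a bound is not returned.
.Grouping = function(x, less, more) {
        y = filter(x, Freq < less & Freq > more)
        x <- y[order(y$Freq, decreasing = TRUE),]
        return(x)
}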
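
Both comparisons in .Grouping() are strict, so with the cut-offs used in this report (1000/10, 10/4, 4/0) a member with exactly 10 or exactly 4 tickets would not be assigned to any group. The sketch below shows this on made-up frequencies and compares it with a half-open variant. `grouping_strict` mirrors the filter above, and `grouping_half_open` is a hypothetical alternative, not the function used to produce the results shown here.

```r
library(dplyr)

# Made-up frequency table with members sitting exactly on the cut-offs
freq <- data.frame(x = c("a", "b", "c", "d", "e"),
                   Freq = c(12, 10, 7, 4, 2))

# Strict bounds, as in .Grouping(): more < Freq < less
grouping_strict <- function(x, less, more) {
  x %>% filter(Freq < less, Freq > more) %>% arrange(desc(Freq))
}

# Hypothetical half-open variant: more < Freq <= less
grouping_half_open <- function(x, less, more) {
  x %>% filter(Freq <= less, Freq > more) %>% arrange(desc(Freq))
}

strict_total <- nrow(grouping_strict(freq, 1000, 10)) +
  nrow(grouping_strict(freq, 10, 4)) +
  nrow(grouping_strict(freq, 4, 0))

half_open_total <- nrow(grouping_half_open(freq, 1000, 10)) +
  nrow(grouping_half_open(freq, 10, 4)) +
  nrow(grouping_half_open(freq, 4, 0))

c(strict = strict_total, half_open = half_open_total)
```

In the toy data the strict version places 3 of the 5 members, while the half-open version places all 5. Which side of each cut-off should be inclusive is a design choice: the half-open variant moves a member with exactly 10 tickets into the median group and one with exactly 4 into the less active group.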

1.2 Variable Members Analysis - Reporters: Reporters are WordPress Community members who find a problem in the WP Platform and report it in a ticket.

Goal: Find the types of reporter groups (active, median, and alien), and find a cut-off, in order to use them in the Bivariate Analysis.

# Find the groups and the number of people in each group

Reporter<-.Unianalysis(TicketW$Reporter) 
totalReporter<-nrow(Reporter)

ActiveReporters<-.Grouping(Reporter, 1000, 10)  # more than 10 tickets
totalAR<-nrow(ActiveReporters)

MedianReporters<-.Grouping(Reporter, 10, 4)  # 5 to 9 tickets
totalMR<-nrow(MedianReporters)

LessReporters<-.Grouping(Reporter, 4, 0)  # 1 to 3 tickets
totalLR<-nrow(LessReporters)

1.3 Variable Member Analysis - Owners: Owners are WordPress Community members who pick up a ticket from the WP Platform (sent by a reporter) in order to resolve it.

Goal: Find the types of owner groups (active, median, and alien), and find a cut-off, in order to use them in the Bivariate Analysis.

Owner <-.Unianalysis(TicketW$Owner) 
totalOwner <-nrow(Owner)

# Find the groups and the number of people in each group

ActiveOwner <-.Grouping(Owner, 1000, 10)  # more than 10 tickets
totalAO <-nrow(ActiveOwner)

MedianOwner <-.Grouping(Owner, 10, 4)  # 5 to 9 tickets
totalMO <-nrow(MedianOwner)

LessOwner <-.Grouping(Owner, 4, 0)  # 1 to 3 tickets
totalLO <-nrow(LessOwner)

1.4 Group totals for Reporters and Owners:

# Total members per Reporter group
MembersTotal<-c(totalAR,totalMR,totalLR)
CoreGroup<-c("Active Reporters","Median Reporters","Alien Reporters")
WPCGroupR<-data.frame(CoreGroup,MembersTotal)

# Total members per Owner group
MembersTotal<-c(totalAO,totalMO,totalLO)
CoreGroup<-c("Active Owners","Median Owners","Alien Owners")
WPCGroupO<-data.frame(CoreGroup,MembersTotal)

# Total members for Owners and Reporters overall
MembersTotal<- c(totalOwner,totalReporter)
CoreGroup <- c("Owners","Reporters")
WPCGroups <- data.frame(CoreGroup,MembersTotal)

2. DESCRIPTIVE ANALYSIS REPORT:

2.1 Variables related with members Report: Variables selected for bivariate analysis are Component (more than 90 tickets per level), Focuses (more than 10 tickets per level), Keywords (at least 9 tickets per level), Type, Status.

library(plotrix)
library(plotly)
library(ggplot2)
library(wordcloud)
## Loading required package: RColorBrewer
# Plotting functions
# Fan plot
.Plot_FunPlot = function(x,y) {
  fan.plot(x$Freq,
           max.span=pi,
           labels=paste(x$x, x$Freq, sep=": "),
            main=y,ticks=360)
}
# Word cloud
.Plot_word = function(x, num1, num2) {
     wordcloud(words = x$x, freq = x$Freq, min.freq = num1,
              max.words=num2, random.order=FALSE, rot.per=0.35, 
              colors=brewer.pal(8, "Dark2")) 
}

Status#Var1
##           x Freq proportion
## 1       new 1844  79.039863
## 2  assigned  215   9.215602
## 3  reopened  116   4.972139
## 4 reviewing  105   4.500643
## 5  accepted   53   2.271753
#Graphics
.Plot_FunPlot(Status, "Tickets per Status")

TType# Var2
##                 x Freq proportion
## 1     enhancement 1109  47.535362
## 2    defect (bug) 1012  43.377625
## 3 feature request  168   7.201029
## 4  task (blessed)   44   1.885984
#Graphics
.Plot_FunPlot(TType, "Tickets per Type")

Priority# Var3
##        x Freq proportion
## 1 normal 2265 97.0852979
## 2    low   49  2.1003000
## 3   high   12  0.5143592
## 4 lowest    7  0.3000429
#Graphics
.Plot_FunPlot(Priority, "Tickets per Priority")

Milestone#Var4
##                 x Freq  proportion
## 1 Awaiting Review 1104 60.19629226
## 2  Future Release  544 29.66194111
## 3             5.3  172  9.37840785
## 4   WordPress.org   13  0.70883315
## 5           5.2.3    1  0.05452563
#Graphics
.Plot_FunPlot(Milestone, "Tickets per Milestone")

Component#Var5
##                   x Freq proportion
## 1           General  205   8.786970
## 2             Media  166   7.115302
## 3    Administration  162   6.943849
## 4 Posts, Post Types  114   4.886412
## 5             Users   91   3.900557
#Graphic
.Plot_FunPlot(Component, "Tickets per Component (more than 90 tickets)")

Severity #Var6
##          x Freq  proportion
## 1   normal 2186 93.69909987
## 2    minor   91  3.90055722
## 3    major   27  1.15730819
## 4  trivial   19  0.81440206
## 5 critical    9  0.38576940
## 6  blocker    1  0.04286327
#Graphics
.Plot_FunPlot(Severity, "Tickets per Severity")

Focuses#Var7
##                     x Freq proportion
## 1      administration  192  19.512195
## 2                  ui  103  10.467480
## 3  ui, administration  101  10.264228
## 4           multisite   95   9.654472
## 5                docs   58   5.894309
## 6         performance   55   5.589431
## 7   ui, accessibility   51   5.182927
## 8          javascript   49   4.979675
## 9            template   48   4.878049
## 10           rest-api   30   3.048780
## 11   coding-standards   27   2.743902
## 12      accessibility   23   2.337398
#Graphics
pie(Focuses$Freq, main="Focuses", labels = paste(Focuses$x, Focuses$Freq, sep=": "), col = rainbow(nrow(Focuses)))

.Plot_word(Focuses, 20, 200)

Keywords#Var8
##                               x Freq proportion
## 1                   needs-patch  266 15.2873563
## 2                     has-patch  246 14.1379310
## 3             reporter-feedback   57  3.2758621
## 4                   2nd-opinion   55  3.1609195
## 5       has-patch needs-refresh   47  2.7011494
## 6  needs-patch needs-unit-tests   46  2.6436782
## 7       has-patch needs-testing   44  2.5287356
## 8        has-patch dev-feedback   40  2.2988506
## 9      has-patch has-unit-tests   31  1.7816092
## 10        has-patch 2nd-opinion   29  1.6666667
## 11                 dev-feedback   28  1.6091954
## 12                        close   27  1.5517241
## 13   has-patch needs-unit-tests   24  1.3793103
## 14      needs-patch 2nd-opinion   19  1.0919540
## 15     good-first-bug has-patch   18  1.0344828
## 16  has-patch reporter-feedback   17  0.9770115
## 17              has-screenshots   15  0.8620690
## 18    has-screenshots has-patch   15  0.8620690
## 19    has-patch has-screenshots   14  0.8045977
## 20     dev-feedback needs-patch   13  0.7471264
## 21     needs-patch dev-feedback   13  0.7471264
## 22      2nd-opinion needs-patch    9  0.5172414
## 23       dev-feedback has-patch    9  0.5172414
#Graphics
pie(Keywords$Freq, main="Keywords", labels = paste(Keywords$x, Keywords$Freq, sep=": "), col = rainbow(nrow(Keywords)))

.Plot_word(Keywords, 4, 300)
## Warning in wordcloud(words = x$x, freq = x$Freq, min.freq = num1, max.words
## = num2, : has-patch could not be fit on page. It will not be plotted.

2.2 Reporter Members Analysis Report: The groups of Reporters:

ActiveReporters #Ranking the most active reporters
##                   x Freq proportion
## 1       johnbillion   72  3.0861552
## 2           afercia   53  2.2717531
## 3   johnjamesjacoby   47  2.0145735
## 4  sebastian.pisula   43  1.8431204
## 5          flixos90   42  1.8002572
## 6   danielbachhuber   39  1.6716674
## 7        karmatosed   33  1.4144878
## 8            rmccue   27  1.1573082
## 9  subrataemfluence   25  1.0715817
## 10      westonruter   24  1.0287184
## 11            nacin   23  0.9858551
## 12          desrosj   22  0.9429919
## 13  garrett-eclipse   22  0.9429919
## 14           pbiron   22  0.9429919
## 15           azaozz   21  0.9001286
## 16             dd32   21  0.9001286
## 17     boonebgorges   19  0.8144021
## 18            pento   17  0.7286755
## 19            ramiy   17  0.7286755
## 20           scribu   17  0.7286755
## 21          anevins   16  0.6858123
## 22         dshanske   15  0.6429490
## 23            helen   15  0.6429490
## 24          iseulde   15  0.6429490
## 25      markjaquith   15  0.6429490
## 26        melchoyce   15  0.6429490
## 27   SergeyBiryukov   15  0.6429490
## 28       swissspidy   15  0.6429490
## 29        tazotodua   15  0.6429490
## 30        Presskopp   14  0.6000857
## 31     henry.wright   13  0.5572225
## 32           mark-k   12  0.5143592
## 33         mukesh27   12  0.5143592
## 34    peterwilsoncc   11  0.4714959
MedianReporters #Ranking the Median active reporters
##                 x Freq proportion
## 1       ericlewis    8  0.3429061
## 2         iandunn    8  0.3429061
## 3      jeremyfelt    8  0.3429061
## 4     joostdevalk    8  0.3429061
## 5          kjellr    8  0.3429061
## 6          netweb    8  0.3429061
## 7       pavelevap    8  0.3429061
## 8      programmin    8  0.3429061
## 9       alexvorn2    7  0.3000429
## 10           andy    7  0.3000429
## 11         eclare    7  0.3000429
## 12    Frank Klein    7  0.3000429
## 13       jdgrimes    7  0.3000429
## 14        pbearne    7  0.3000429
## 15     soulseekah    7  0.3000429
## 16 wonderboymusic    7  0.3000429
## 17           xkon    7  0.3000429
## 18            dlh    6  0.2571796
## 19        Ipstenu    6  0.2571796
## 20 jonoaldersonwp    6  0.2571796
## 21         jorbin    6  0.2571796
## 22        kraftbj    6  0.2571796
## 23          mor10    6  0.2571796
## 24      smerriman    6  0.2571796
## 25    WraithKenny    6  0.2571796
## 26          yoavf    6  0.2571796
## 27      allancole    5  0.2143163
## 28         BjornW    5  0.2143163
## 29   ComputerGuru    5  0.2143163
## 30          imath    5  0.2143163
## 31       jipmoors    5  0.2143163
## 32      joemcgill    5  0.2143163
## 33     mikejolley    5  0.2143163
## 34     milana_cap    5  0.2143163
## 35          Rarst    5  0.2143163
## 36          rnaby    5  0.2143163
LessReporters[c(1:30),c(1:2)] # First 30 rows of the less active Reporters group
##                    x Freq
## 1           allendav    3
## 2              arena    3
## 3       chetan200891    3
## 4  chinteshprajapati    3
## 5           danieltj    3
## 6  Denis-de-Bernardy    3
## 7          diddledan    3
## 8            dimadin    3
## 9         dotancohen    3
## 10             duck_    3
## 11           fliespl    3
## 12             GaryJ    3
## 13       hlashbrooke    3
## 14          ishitaka    3
## 15   jason_the_adams    3
## 16         javorszky    3
## 17          joehoyle    3
## 18          joyously    3
## 19       jtsternberg    3
## 20           keraweb    3
## 21  ketanumretiya030    3
## 22            knutsp    3
## 23         krogsgard    3
## 24         mehulwpos    3
## 25      MikeHansenMe    3
## 26         Mista-Flo    3
## 27          mnelson4    3
## 28         monikarao    3
## 29            mrwweb    3
## 30           mt8.biz    3
# Plots of the top 5 Reporters in each group
.Plot_FunPlot(ActiveReporters[1:5, 1:2], "5 Top Active Reporters")

.Plot_FunPlot(MedianReporters[1:5, 1:2], "5 Top Median Reporters")

.Plot_FunPlot(LessReporters[1:5, 1:2], "5 Top Less Active Reporters")

2.3 Owner Members Analysis Report: The groups of Owners:

ActiveOwner#Ranking the most active Owners
##                 x Freq proportion
## 1  SergeyBiryukov   67  18.662953
## 2        audrasjb   18   5.013928
## 3     johnbillion   16   4.456825
## 4       chriscct7   15   4.178273
## 5       joemcgill   14   3.899721
## 6 adamsilverstein   13   3.621170
MedianOwner#Ranking the Median Owners
##                 x Freq proportion
## 1         desrosj    9   2.506964
## 2           nacin    8   2.228412
## 3         afercia    7   1.949861
## 4    boonebgorges    7   1.949861
## 5           pento    7   1.949861
## 6          azaozz    6   1.671309
## 7        flixos90    6   1.671309
## 8      jeremyfelt    6   1.671309
## 9          rmccue    6   1.671309
## 10          westi    6   1.671309
## 11    westonruter    5   1.392758
## 12 wonderboymusic    5   1.392758
LessOwner[c(1:30),c(1:2)] # First 30 rows of the less active Owners group
##                    x Freq
## 1               dd32    3
## 2           joehoyle    3
## 3           tz-media    3
## 4               xkon    3
## 5      davidakennedy    2
## 6           dswebsme    2
## 7          ericlewis    2
## 8    garrett-eclipse    2
## 9        ianbelanger    2
## 10           iseulde    2
## 11           jnylen0    2
## 12       joostdevalk    2
## 13           kapeels    2
## 14       markjaquith    2
## 15        schlessera    2
## 16          sorich87    2
## 17      technosailor    2
## 18      valendesigns    2
## 19     aaroncampbell    1
## 20          adam3128    1
## 21       allisonplus    1
## 22             antpb    1
## 23          bassgang    1
## 24 bhargavbhandari90    1
## 25           birgire    1
## 26   danielbachhuber    1
## 27      danielpataki    1
## 28     davidjlaietta    1
## 29      DrewAPicture    1
## 30             duck_    1
# Plots of the top 5 Owners in each group
.Plot_FunPlot(ActiveOwner[1:5, 1:2], "5 Top Active Owners")

.Plot_FunPlot(MedianOwner[1:5, 1:2], "5 Top Median Owners")

.Plot_FunPlot(LessOwner[1:5, 1:2], "5 Top Less Active Owners")

2.4 Comparison of groups by Reporters and Owners:

WPCGroups
##   CoreGroup MembersTotal
## 1    Owners          116
## 2 Reporters         1028
par(mfrow=c(1,2))
barplot(WPCGroupR$MembersTotal,
        names.arg=WPCGroupR$MembersTotal,
        xlab="Reporter Groups",
        ylab="Total Members",
        legend=WPCGroupR$CoreGroup,
        col=rainbow(8),
        main="Reporters per Groups",border="red")

barplot(WPCGroupO$MembersTotal,
        names.arg=WPCGroupO$MembersTotal,
        xlab="Owners Groups",
        ylab="Total Members",
        legend=WPCGroupO$CoreGroup,
        col=rainbow(8),
        main="Owners per Groups",border="red")

# ggplot2 and dplyr (already attached above)
library(ggplot2)
library(dplyr)
# treemap provides the treemap() function
library(treemap)

# treemap
.Tree_Map = function(x){
  group <- paste(x$CoreGroup, x$MembersTotal)
  value <- x$MembersTotal
  data <- data.frame(group,value)
     treemap(data,
            index="group",
            vSize="value",
            type="index")  
}


par(mfrow=c(1,3))
.Tree_Map(WPCGroups) #Total Owner and Reporters

.Tree_Map(WPCGroupO) #Groups of Owners 

.Tree_Map(WPCGroupR) #Groups of Reporters

3. POTENTIAL INDICATORS:

Hypothesis: \(PropMO = PropMR\)

Meaning: the share of members in each subgroup of the Owner groups is similar to the share of members in the corresponding subgroup of the Reporter groups.

Where PropMO is the proportion of Owners in the active, median, and alien groups, and PropMR is the proportion of Reporters in the active, median, and alien groups, both computed from the member counts in WPCGroupO and WPCGroupR.

Hypothesis test:

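PropMR<-prop.table(WPCGroupR$MembersTotal) # Share of reporters in each group
PropMO<-prop.table(WPCGroupO$MembersTotal) # Share of owners in each group

# Per-group similarity: 1 minus the absolute difference between the two shares
similarityOR<-1-abs(PropMO-PropMR)
similarityOR<-mean(similarityOR) # Average over the three groups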
similarityOR
## [1] 0.9396642

Thus, the Mode_Agents_Rules indicator = similarityOR.
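
Written out, the computation above is one minus the mean absolute difference between the two vectors of group shares. The symbols \(p^{O}_g\) and \(p^{R}_g\) are introduced here only for illustration and denote the entries of PropMO and PropMR for group \(g\).

\[
\text{similarityOR}
= \frac{1}{3} \sum_{g \in \{\text{active},\, \text{median},\, \text{alien}\}} \Bigl( 1 - \bigl| p^{O}_g - p^{R}_g \bigr| \Bigr)
= 1 - \frac{1}{3} \sum_{g} \bigl| p^{O}_g - p^{R}_g \bigr|
\]

Because each vector of shares sums to 1, the sum of absolute differences is at most 2, so the indicator ranges from 1/3 (no overlap between the two distributions) to 1 (identical distributions). A value should be read against that range and not against a 0 to 1 scale.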

4. FINAL CONCLUSIONS:

There are more Reporters than Owners, perhaps because reporting a problem in the WordPress Platform is easier than fixing one. Two members can be considered the super active members of the WP Community: SergeyBiryukov and johnbillion. SergeyBiryukov (number 1 among owners) was responsible for 67 tickets and also reported 15 tickets (tied for 22nd place among reporters). johnbillion (number 1 among reporters) reported 72 tickets and was also responsible for 16 tickets (3rd place among owners).

The similarity indicator (result: 0.9396642) suggests strong cohesion between the formal groups of the core: owners and reporters.

Although most tickets have no owner, there are not many urgent issues. About 94% of tickets are classified as Severity “normal”, and 97% are classified as Priority “normal”.
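
The share of tickets without an owner can be recovered from the tables in this report. Proportions in the owner table are computed over tickets that have an owner, so SergeyBiryukov's 67 tickets at 18.662953% imply the number of owned tickets. A minimal sketch of that arithmetic, using only figures printed above (the variable names are illustrative):

```r
total_tickets  <- 2333        # rows in TicketW, from dim(TicketW)
top_owner_freq <- 67          # SergeyBiryukov, from the ActiveOwner table
top_owner_prop <- 18.662953   # same row, in percent

# Number of tickets with a non-missing Owner
owned <- round(top_owner_freq / (top_owner_prop / 100))

# Share of tickets with no owner, in percent
unowned_share <- (total_tickets - owned) / total_tickets * 100
c(owned = owned, unowned_share = round(unowned_share, 1))
```

This gives 359 owned tickets, so roughly 84.6% of the 2333 tickets have no owner. With the original data loaded, `mean(is.na(TicketW$Owner)) * 100` should give the same figure directly.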

Based on this description, the following variables are recommended for analysis in a bivariate description: