Questions to analyze:
library(bupaR)
##
## Attaching package: 'bupaR'
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:utils':
##
## timestamp
patients
## # Log of 5442 events consisting of:
## 7 traces
## 500 cases
## 2721 instances of 7 activities
## 7 resources
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02
##
## # Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 5,442 × 7
## handling patient employee handling_id regist…¹ time .order
## <fct> <chr> <fct> <chr> <fct> <dttm> <int>
## 1 Registration 1 r1 1 start 2017-01-02 11:41:53 1
## 2 Registration 2 r1 2 start 2017-01-02 11:41:53 2
## 3 Registration 3 r1 3 start 2017-01-04 01:34:05 3
## 4 Registration 4 r1 4 start 2017-01-04 01:34:04 4
## 5 Registration 5 r1 5 start 2017-01-04 16:07:47 5
## 6 Registration 6 r1 6 start 2017-01-04 16:07:47 6
## 7 Registration 7 r1 7 start 2017-01-05 04:56:11 7
## 8 Registration 8 r1 8 start 2017-01-05 04:56:11 8
## 9 Registration 9 r1 9 start 2017-01-06 05:58:54 9
## 10 Registration 10 r1 10 start 2017-01-06 05:58:54 10
## # … with 5,432 more rows, and abbreviated variable name ¹registration_type
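The log contains 7 traces, 500 cases, 7 activities, and 7 resources. Events span 2017-01-02 11:41:53 to 2018-05-05 07:16:02.
To understand the framework, this one figure is all you need.
In a process, the person who performs the work is usually recorded in the resource attribute. Questions of interest include:
1. Who executes the work? (Which people perform which activities.)
2. Who specializes in certain work? (Which people are dedicated to which activities.)
3. Is there a risk of brain drain?
4. Who transfers work to whom?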
library(edeaR)
library(bupaR)
# Distribution of resource frequencies across the whole event log
resource_frequency(sepsis,level = "log") %>% head()
## # A tibble: 1 × 8
## min q1 median mean q3 max st_dev iqr
## <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
## 1 1 33.5 58 585. 206. 8111 1684. 173.
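# For each case: how many distinct resources appear and how their frequencies are distributed.
# Example: case A involves 5 distinct resources, and the busiest of them appears 15 times.
# The other statistics are computed from the per-resource counts within the case.
# To verify, list the resources of a single case and tally them.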
resource_frequency(sepsis,level = "case")%>% head()
## # A tibble: 6 × 11
## case_id nr_of_resour…¹ min q1 mean median q3 max st_dev iqr total
## <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <int>
## 1 A 5 1 1 4.4 1 4 15 6.07 3 22
## 2 AA 3 1 2 2.67 3 3.5 4 1.53 1.5 8
## 3 AAA 6 1 1 1.83 1 2.5 4 1.33 1.5 11
## 4 AB 3 1 2 2.67 3 3.5 4 1.53 1.5 8
## 5 ABA 5 1 1 3.4 1 4 10 3.91 3 17
## 6 AC 6 1 1 2.17 1 3.25 5 1.83 2.25 13
## # … with abbreviated variable name ¹nr_of_resources
# For each activity: how many distinct resources performed it and how their frequencies are distributed
resource_frequency(sepsis,level = "activity")%>% head()
## # A tibble: 6 × 11
## activity nr_of…¹ min q1 mean median q3 max st_dev iqr total
## <fct> <int> <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <int>
## 1 Admission IC 4 1 7 29.2 31 53.2 54 28.2 46.2 117
## 2 Admission NC 20 1 17 59.1 40.5 68.2 216 62.7 51.2 1182
## 3 CRP 1 3262 3262 3262 3262 3262 3262 NA 0 3262
## 4 ER Registra… 2 65 295 525 525 755 985 651. 460 1050
## 5 ER Sepsis T… 2 65 295. 524. 524. 754. 984 650. 460. 1049
## 6 ER Triage 1 1053 1053 1053 1053 1053 1053 NA 0 1053
## # … with abbreviated variable name ¹nr_of_resources
# Count per resource and its share: relative = count / total count
resource_frequency(sepsis,level = "resource")%>% head()
## # A tibble: 6 × 3
## resource absolute relative
## <fct> <int> <dbl>
## 1 B 8111 0.533
## 2 A 3462 0.228
## 3 C 1053 0.0692
## 4 E 782 0.0514
## 5 ? 294 0.0193
## 6 F 216 0.0142
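# Count and share of each resource-activity pair.
# relative_resource = count of the (resource, activity) pair / total count for that resource;
# relative_activity = count of the pair / total count for that activity.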
resource_frequency(sepsis,level = "resource-activity")%>% head()
## # A tibble: 6 × 5
## resource activity absolute relative_resource relative_activity
## <fct> <fct> <int> <dbl> <dbl>
## 1 B Leucocytes 3383 0.417 1
## 2 B CRP 3262 0.402 1
## 3 B LacticAcid 1466 0.181 1
## 4 C ER Triage 1053 1 1
## 5 A ER Registration 985 0.285 0.938
## 6 A ER Sepsis Triage 984 0.284 0.938
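The two relative columns can be reproduced by hand. The following is a minimal base R sketch on a made-up data frame (the `events` data and its column values are hypothetical, and this is not edeaR's implementation); it only illustrates which denominator each share uses.

```r
# Toy event data (hypothetical); one row per activity instance.
events <- data.frame(
  resource = c("A", "A", "A", "B", "B", "C"),
  activity = c("Register", "Register", "Triage", "Triage", "Lab", "Lab"),
  stringsAsFactors = FALSE
)

# Absolute count of every resource-activity pair that actually occurs.
pairs <- as.data.frame(table(resource = events$resource,
                             activity = events$activity),
                       responseName = "absolute",
                       stringsAsFactors = FALSE)
pairs <- pairs[pairs$absolute > 0, ]

# Totals per resource and per activity, used as denominators.
per_resource <- table(events$resource)
per_activity <- table(events$activity)

pairs$relative_resource <- pairs$absolute / as.numeric(per_resource[pairs$resource])
pairs$relative_activity <- pairs$absolute / as.numeric(per_activity[pairs$activity])
print(pairs)
```

In the toy data, resource A performs "Register" twice out of its three events, and nobody else performs "Register", so that pair has relative_resource 2/3 and relative_activity 1.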
The analyses above describe how resources are distributed along several dimensions.
Resource involvement
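# Absolute and relative number of distinct resources per case. This shows which cases are handled
# by a few resources and which need many; the latter indicates higher variability in the process.
# absolute = number of distinct resources in the case; relative = absolute / total number of
# resources. For example, case A involves 5 distinct resources, i.e. 5/26.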
resource_involvement(sepsis,level = "case")%>% head()
## # A tibble: 6 × 3
## case_id absolute relative
## <chr> <int> <dbl>
## 1 ES 9 0.346
## 2 GN 9 0.346
## 3 MN 9 0.346
## 4 NGA 9 0.346
## 5 PGA 9 0.346
## 6 XI 9 0.346
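# Absolute and relative number of cases each resource is involved in (note: number of cases,
# not number of events). This indicates which resources are more "necessary" to the process.
# absolute = number of cases the resource takes part in; relative = absolute / total number of
# cases (1050 here, which is why resource C, involved in every case, scores 1).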
resource_involvement(sepsis,level = "resource")%>% head()
## # A tibble: 6 × 3
## resource absolute relative
## <fct> <int> <dbl>
## 1 C 1050 1
## 2 B 1013 0.965
## 3 A 985 0.938
## 4 E 782 0.745
## 5 ? 294 0.28
## 6 F 200 0.190
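To make the case-based counting concrete, here is a small base R sketch on invented data (the `events` data frame is hypothetical and the code is not edeaR's implementation). It counts, per resource, the number of distinct cases the resource appears in and divides by the total number of cases.

```r
# Toy event data (hypothetical): which resource worked on which case.
events <- data.frame(
  case_id  = c("c1", "c1", "c1", "c2", "c2", "c3"),
  resource = c("A",  "A",  "B",  "A",  "C",  "B"),
  stringsAsFactors = FALSE
)

n_cases_total <- length(unique(events$case_id))

# Keep one row per (case, resource) pair so repeated work in a case counts once.
involved <- unique(events[, c("case_id", "resource")])

absolute <- table(involved$resource)
involvement <- data.frame(
  resource = names(absolute),
  absolute = as.integer(absolute),
  relative = as.integer(absolute) / n_cases_total
)
involvement <- involvement[order(-involvement$absolute), ]
print(involvement)
```

Resource A appears three times in the toy data but only in two of the three cases, so its involvement is 2 (relative 2/3), not 3.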
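# The computation at the resource-activity level looks problematic
Resource specialisation analyses whether resources are dedicated to specific activities. The metric summarises which resources perform certain activities more than others do.
# Log level: how many distinct activities a resource handles. Here one resource handles at most 5 distinct activities.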
resource_specialisation(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 8
## min q1 median mean q3 max st_dev iqr
## <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
## 1 1 1 1 1.62 2 5 1.13 1
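# Activity level: absolute and relative number of distinct resources that execute each activity
# over the whole log. absolute = number of distinct resources performing the activity;
# relative = absolute / total number of resources.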
resource_specialisation(log = sepsis,level = "activity")%>% head()
## # A tibble: 6 × 3
## activity absolute relative
## <fct> <int> <dbl>
## 1 Admission NC 20 0.769
## 2 Admission IC 4 0.154
## 3 ER Registration 2 0.0769
## 4 ER Sepsis Triage 2 0.0769
## 5 IV Antibiotics 2 0.0769
## 6 IV Liquid 2 0.0769
# Resource level: how many distinct activities each resource performs.
# absolute = number of distinct activities; relative = absolute / total number of activities.
resource_specialisation(log = sepsis,level = "resource")%>% head()
## # A tibble: 6 × 3
## resource absolute relative
## <fct> <int> <dbl>
## 1 E 5 0.312
## 2 A 4 0.25
## 3 L 4 0.25
## 4 B 3 0.188
## 5 J 2 0.125
## 6 K 2 0.125
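The resource-level figures can be checked with a few lines of base R. This is only a sketch on hypothetical data, not the package's code: it counts the distinct activities per resource and divides by the number of distinct activities in the log.

```r
# Toy event data (hypothetical).
events <- data.frame(
  resource = c("A", "A", "A", "B", "B", "C"),
  activity = c("Register", "Triage", "Register", "Lab", "Lab", "Triage"),
  stringsAsFactors = FALSE
)

n_activities_total <- length(unique(events$activity))

# Number of distinct activities each resource has performed.
absolute <- tapply(events$activity, events$resource,
                   function(x) length(unique(x)))

specialisation <- data.frame(
  resource = names(absolute),
  absolute = as.integer(absolute),
  relative = as.numeric(absolute) / n_activities_total
)
print(specialisation)
```

A low value means the resource is specialised: in the toy data B only ever performs "Lab", so its relative value is 1/3.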
Create a resource map of an event log based on handover of work.
resource_map(log = sepsis)
Resource precedence matrix: resource_matrix() works on the same principle as the precedence matrix function and gives a more compact representation of handover of work.
sepsis %>%
resource_matrix() %>%
plot()
Some visualization methods:
1. process map
2. trace explorer
3. precedence matrix
# Log level: absolute and relative number of distinct start activities (6 of the 16 activities here)
start_activities(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 2
## absolute relative
## <int> <dbl>
## 1 6 0.375
# Start activity of each case
start_activities(log = sepsis,level = "case")%>% head()
## # A tibble: 6 × 2
## case_id activity
## <chr> <fct>
## 1 A ER Registration
## 2 AA ER Registration
## 3 AAA ER Registration
## 4 AB ER Registration
## 5 ABA ER Registration
## 6 AC ER Registration
# Count and share of each start activity (shares are relative to all start events, one per case)
start_activities(log = sepsis,level = "activity")%>% head()
## # A tibble: 6 × 4
## activity absolute relative cum_sum
## <fct> <int> <dbl> <dbl>
## 1 ER Registration 995 0.948 0.948
## 2 Leucocytes 18 0.0171 0.965
## 3 IV Liquid 14 0.0133 0.978
## 4 CRP 10 0.00952 0.988
## 5 ER Sepsis Triage 7 0.00667 0.994
## 6 ER Triage 6 0.00571 1
# Which resources perform the first activity of a case, with count and share
start_activities(log = sepsis,level = "resource")%>% head()
## # A tibble: 4 × 4
## resource absolute relative cum_sum
## <fct> <int> <dbl> <dbl>
## 1 A 954 0.909 0.909
## 2 L 62 0.0590 0.968
## 3 B 28 0.0267 0.994
## 4 C 6 0.00571 1
# Count and share of each resource-activity pair that starts a case
start_activities(log = sepsis,level = "resource-activity")%>% head()
## # A tibble: 6 × 5
## resource activity absolute relative cum_sum
## <fct> <fct> <int> <dbl> <dbl>
## 1 A ER Registration 933 0.889 0.889
## 2 L ER Registration 62 0.0590 0.948
## 3 B Leucocytes 18 0.0171 0.965
## 4 A IV Liquid 14 0.0133 0.978
## 5 B CRP 10 0.00952 0.988
## 6 A ER Sepsis Triage 7 0.00667 0.994
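As an illustration of what the activity-level table contains, the sketch below finds the first event of every case in a hypothetical data frame and tabulates the result with base R. The data and column names are invented for the example, and the code is not how bupaR computes it internally.

```r
# Toy event data (hypothetical); note that c1's rows are not in time order.
events <- data.frame(
  case_id  = c("c1", "c1", "c2", "c2", "c3"),
  activity = c("Triage", "Register", "Register", "Lab", "Lab"),
  time     = as.POSIXct(c("2024-01-01 09:05:00", "2024-01-01 09:00:00",
                          "2024-01-02 10:00:00", "2024-01-02 10:30:00",
                          "2024-01-03 08:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Order by case and time, then keep the first row of each case.
ordered <- events[order(events$case_id, events$time), ]
first_events <- ordered[!duplicated(ordered$case_id), ]

# Tabulate the start activities, most frequent first.
counts <- sort(table(first_events$activity), decreasing = TRUE)
start_summary <- data.frame(
  activity = names(counts),
  absolute = as.integer(counts),
  relative = as.integer(counts) / nrow(first_events)
)
start_summary$cum_sum <- cumsum(start_summary$relative)
print(start_summary)
```

Because there is exactly one start event per case, the relative column always sums to 1, which is why cum_sum ends at 1.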
# End activities are analysed the same way
end_activities(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 2
## absolute relative
## <int> <dbl>
## 1 14 0.875
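Trace coverage analyses the structure of the log through trace frequencies. The coverage of a trace is how often it occurs, expressed as a share of all cases.
# Log level: distribution of the relative trace frequencies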
trace_coverage(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 8
## min q1 median mean q3 max st_dev iqr
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.000952 0.000952 0.000952 0.00118 0.000952 0.0333 0.00168 0
# Share of each trace: 35 means the trace occurs in 35 cases; relative = 35 / total number of cases (1050)
trace_coverage(log = sepsis,level = "trace")%>% head()
## # A tibble: 6 × 4
## trace absol…¹ relat…² cum_sum
## <chr> <int> <dbl> <dbl>
## 1 ER Registration,ER Triage,ER Sepsis Triage 35 0.0333 0.0333
## 2 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes… 24 0.0229 0.0562
## 3 ER Registration,ER Triage,ER Sepsis Triage,CRP,Leucoc… 22 0.0210 0.0771
## 4 ER Registration,ER Triage,ER Sepsis Triage,CRP,Lactic… 13 0.0124 0.0895
## 5 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes… 11 0.0105 0.1
## 6 ER Registration,ER Triage,ER Sepsis Triage,Leucocytes… 9 0.00857 0.109
## # … with abbreviated variable names ¹absolute, ²relative
# Same as above, but listed per case together with the trace it follows
trace_coverage(log = sepsis,level = "case")%>% head()
## # A tibble: 6 × 4
## case_id trace absolute relative
## <chr> <chr> <int> <dbl>
## 1 Q ER Registration,ER Triage,ER Sepsis Triage 35 0.0333
## 2 SKA ER Registration,ER Triage,ER Sepsis Triage 35 0.0333
## 3 HU ER Registration,ER Triage,ER Sepsis Triage 35 0.0333
## 4 YU ER Registration,ER Triage,ER Sepsis Triage 35 0.0333
## 5 CDA ER Registration,ER Triage,ER Sepsis Triage 35 0.0333
## 6 HMA ER Registration,ER Triage,ER Sepsis Triage 35 0.0333
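The trace-level numbers follow from two steps: collapse every case into its activity sequence, then count identical sequences. A minimal base R sketch on hypothetical data (not edeaR's implementation) looks like this.

```r
# Toy event data (hypothetical); rows are assumed to be in
# chronological order within each case.
events <- data.frame(
  case_id  = c("c1", "c1", "c2", "c2", "c3", "c3", "c3"),
  activity = c("Register", "Triage", "Register", "Triage",
               "Register", "Lab", "Triage"),
  stringsAsFactors = FALSE
)

# Collapse each case into its trace: the ordered activity sequence.
traces <- vapply(split(events$activity, events$case_id),
                 paste, character(1), collapse = ",")

# Count how many cases follow each distinct trace.
counts <- sort(table(traces), decreasing = TRUE)
coverage <- data.frame(
  trace    = names(counts),
  absolute = as.integer(counts),
  relative = as.integer(counts) / length(traces)
)
coverage$cum_sum <- cumsum(coverage$relative)
print(coverage)
```

Here the denominator is the number of cases (3), not the number of distinct traces (2), which matches the 35/1050 = 0.0333 figure in the sepsis output.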
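Trace length. This metric gives an overview of the number of activities that occur in each trace. Importantly, it counts every activity instance, but not the individual lifecycle events.
# Log level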
trace_length(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 8
## min q1 median mean q3 max st_dev iqr
## <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
## 1 3 9 13 14.5 16 185 11.5 7
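# Length of each distinct trace.
# Column 3 (relative_trace_frequency) is the share of cases that follow the trace.
# Column 4 (relative_to_median) is the length relative to the median trace length; the values
# shown are consistent with a median of 14 (e.g. 185 / 14 = 13.2).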
trace_length(log = sepsis,level = "trace")%>% head()
## # A tibble: 6 × 4
## trace absol…¹ relat…² relat…³
## <chr> <int> <dbl> <dbl>
## 1 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,… 185 9.52e-4 13.2
## 2 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,… 170 9.52e-4 12.1
## 3 ER Registration,CRP,Leucocytes,ER Triage,ER Sepsis Tr… 118 9.52e-4 8.43
## 4 Leucocytes,CRP,ER Registration,ER Triage,ER Sepsis Tr… 88 9.52e-4 6.29
## 5 ER Registration,ER Triage,ER Sepsis Triage,IV Liquid,… 84 9.52e-4 6
## 6 IV Liquid,ER Registration,ER Triage,ER Sepsis Triage,… 66 9.52e-4 4.71
## # … with abbreviated variable names ¹absolute, ²relative_trace_frequency,
## # ³relative_to_median
# Trace length per case
trace_length(log = sepsis,level = "case")%>% head()
## # A tibble: 6 × 2
## case_id absolute
## <chr> <int>
## 1 NGA 185
## 2 KM 170
## 3 OD 118
## 4 GK 88
## 5 YX 84
## 6 ZMA 66
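Number of repetitions. A repetition is an activity executed within a case after it has already been executed earlier, with one or more other activities in between. The difference between repeat and redo is explained in the documentation:
# The metric gives summary statistics on the number of repetitions within a case. Each combination of two or more events of the same activity, executed by the same resource but not immediately after one another, counts as one repetition of that activity.
# In other words, a repetition is the same activity handled two or more times by the same resource, without the executions being consecutive.
# Overall statistics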
number_of_repetitions(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 8
## min q1 median mean q3 max st_dev iqr
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0 2 1.64 3 5 1.28 3
# Repetitions per case. relative = number of repetitions / trace length; e.g. KM has 5 repetitions over a trace of length 170: 5/170 = 0.02941176
number_of_repetitions(log = sepsis,level = "case")%>% head()
## case_id absolute relative
## 1 KM 5 0.02941176
## 2 MN 5 0.15151515
## 3 XI 5 0.12820513
## 4 AD 4 0.13793103
## 5 AE 4 0.22222222
## 6 BFA 4 0.19047619
# Repetitions per activity. relative = repetitions of the activity / total occurrences of the activity; e.g. CRP: 692/3262 = 0.2121398
number_of_repetitions(log = sepsis,level = "activity")%>% head()
## activity absolute relative
## 1 CRP 692 0.212139792
## 2 Leucocytes 674 0.199231451
## 3 Admission NC 177 0.149746193
## 4 LacticAcid 170 0.115961801
## 5 Admission IC 6 0.051282051
## 6 ER Triage 3 0.002849003
# Repetitions per resource, with count and share
number_of_repetitions(log = sepsis,level = "resource")%>% head()
## first_resource absolute relative
## 1 B 1536 0.18937246
## 2 G 67 0.45270270
## 3 F 16 0.07407407
## 4 R 13 0.22807018
## 5 I 12 0.09523810
## 6 Q 11 0.17460317
# Repetitions per resource-activity pair
number_of_repetitions(log = sepsis,level = "resource-activity")%>% head()
## resource activity absolute relative_activity relative_resource
## 1 B CRP 692 0.212139792 0.085316237
## 2 B Leucocytes 674 0.199231451 0.083097029
## 3 B LacticAcid 170 0.115961801 0.020959191
## 4 F Admission NC 6 0.005076142 0.027777778
## 5 C ER Triage 3 0.002849003 0.002849003
## 6 T Admission NC 3 0.002538071 0.085714286
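Number of self-loops. This metric gives summary statistics on the number of self-loops in a trace. When activity instances of the same activity type are executed several times in immediate succession by the same resource, they form a self-loop ("length-1-loop"). If the same resource executes the same activity type 3 times in a row, this is defined as a self-loop of size 2.
# Self-loop results are read the same way as repetition results
# Log level: overall self-loop statistics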
number_of_selfloops(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 8
## min q1 median mean q3 max st_dev iqr
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0 0 0.827 1 33 1.82 1
# Self-loops per case
number_of_selfloops(log = sepsis,level = "case")%>% head()
## case_id absolute relative
## 1 NGA 33 0.1783784
## 2 KM 18 0.1058824
## 3 OD 17 0.1440678
## 4 GK 15 0.1704545
## 5 YX 11 0.1309524
## 6 JS 7 0.1794872
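# At the activity level, the absolute and relative number of self-loops per activity can indicate which activities cause the most waste in the process.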
number_of_selfloops(log = sepsis,level = "activity")%>% head()
## activity absolute relative
## 1 Leucocytes 333 0.098433343
## 2 CRP 293 0.089822195
## 3 Admission NC 168 0.142131980
## 4 LacticAcid 73 0.049795362
## 5 Admission IC 1 0.008547009
## 6 ER Registration 0 0.000000000
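# At the resource level, the metric shows which resources most often have to repeat their own work within a case, or whose work has to be redone by another resource in the same case.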
number_of_selfloops(log = sepsis,level = "resource")%>% head()
## first_resource absolute relative
## 1 B 699 0.08617926
## 2 G 67 0.45270270
## 3 N 11 0.23913043
## 4 Q 11 0.17460317
## 5 F 10 0.04629630
## 6 O 10 0.05376344
# At the resource-activity level, the metric can be used to see which activities matter most for which resources.
number_of_selfloops(log = sepsis,level = "resource-activity")%>% head()
## first_resource activity absolute relative_activity relative_resource
## 1 B Leucocytes 333 0.098433343 0.041055357
## 2 B CRP 293 0.089822195 0.036123783
## 3 B LacticAcid 73 0.049795362 0.009000123
## 4 G Admission NC 67 0.056683587 0.452702703
## 5 N Admission NC 11 0.009306261 0.239130435
## 6 Q Admission NC 11 0.009306261 0.174603175
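The difference between a self-loop and a repetition is easiest to see on a single trace. The sketch below uses base R's `rle()` on a made-up activity sequence. It is one plausible reading of the definitions given earlier and deliberately ignores resources, so edeaR's own counts, which do take the resource into account, may differ in detail.

```r
# One hypothetical trace, in execution order.
trace <- c("a", "b", "b", "b", "c", "a", "c", "c")

# Run-length encoding groups consecutive executions of the same activity.
runs <- rle(trace)
# runs$values:  "a" "b" "c" "a" "c"
# runs$lengths:  1   3   1   1   2

# A run longer than 1 is a self-loop; its size is the number of extra executions.
selfloop_sizes <- runs$lengths[runs$lengths > 1] - 1
n_selfloops <- length(selfloop_sizes)

# A repetition is an activity that starts a new run after having occurred before,
# i.e. it comes back with other activities in between.
n_repetitions <- sum(duplicated(runs$values))

print(list(selfloops = n_selfloops,
           selfloop_sizes = selfloop_sizes,
           repetitions = n_repetitions))
```

In this trace, "b" executed three times in a row is one self-loop of size 2, while "a" and "c" each return after other work and therefore each count as one repetition.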
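Activity presence: the percentage of cases in which each activity type occurs.
# absolute = number of cases in which the activity occurs; relative = absolute / total number of cases. This is not the same as activity frequency.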
activity_presence(log = sepsis)%>% head()
## # A tibble: 6 × 3
## activity absolute relative
## <fct> <int> <dbl>
## 1 ER Registration 1050 1
## 2 ER Triage 1050 1
## 3 ER Sepsis Triage 1049 0.999
## 4 Leucocytes 1012 0.964
## 5 CRP 1007 0.959
## 6 LacticAcid 860 0.819
# Activity frequency
activity_frequency(log = sepsis,level = "activity")%>% head()
## # A tibble: 6 × 3
## activity absolute relative
## <fct> <int> <dbl>
## 1 Leucocytes 3383 0.222
## 2 CRP 3262 0.214
## 3 LacticAcid 1466 0.0964
## 4 Admission NC 1182 0.0777
## 5 ER Triage 1053 0.0692
## 6 ER Registration 1050 0.0690
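Presence and frequency answer different questions, and a toy example makes the gap visible. The following base R sketch uses hypothetical data and is not the edeaR implementation; it computes both measures side by side.

```r
# Toy event data (hypothetical).
events <- data.frame(
  case_id  = c("c1", "c1", "c1", "c2", "c2", "c3"),
  activity = c("Lab", "Lab", "Triage", "Lab", "Triage", "Triage"),
  stringsAsFactors = FALSE
)
n_cases_total <- length(unique(events$case_id))

# Frequency: every execution counts.
frequency <- table(events$activity)

# Presence: each activity counts at most once per case.
presence <- table(unique(events[, c("case_id", "activity")])$activity)

summary_tbl <- data.frame(
  activity           = names(frequency),
  frequency_absolute = as.integer(frequency),
  frequency_relative = as.integer(frequency) / nrow(events),
  presence_absolute  = as.integer(presence[names(frequency)]),
  presence_relative  = as.integer(presence[names(frequency)]) / n_cases_total
)
print(summary_tbl)
```

"Lab" and "Triage" are equally frequent (3 executions each), but "Lab" is present in only 2 of 3 cases while "Triage" is present in all of them.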
Process map
process_map(log = sepsis)
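Trace explorer
Visualises the different activity sequences in the log. The type argument lets you explore frequent or infrequent traces. The coverage argument specifies how much of the log you want to explore.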
trace_explorer(log = sepsis,coverage = 0.2)
# coverage: number of traces and their cumulative share
Precedence matrix: shows which activities directly follow which.
precedence_matrix(sepsis)
## Warning: `precedence_matrix()` was deprecated in processmapR 0.4.0.
## ℹ Please use `process_matrix()` instead.
## # A tibble: 135 × 3
## antecedent consequent n
## <fct> <fct> <int>
## 1 ER Sepsis Triage Admission IC 1
## 2 Release A CRP 1
## 3 Admission NC IV Antibiotics 2
## 4 Admission NC ER Triage 1
## 5 LacticAcid Release B 4
## 6 IV Liquid Release A 2
## 7 Leucocytes Return ER 1
## 8 Release A Leucocytes 1
## 9 Admission NC Release D 1
## 10 IV Liquid Admission IC 3
## # … with 125 more rows
plot(precedence_matrix(sepsis))
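Throughput time: summary statistics on the throughput time of cases. The throughput time is the time from the start to the end of a case.
# Log level: distribution of throughput times across the whole event log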
throughput_time(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 8
## min q1 median mean q3 max st_dev iqr
## <drtn> <drtn> <drtn> <drtn> <drt> <drt> <dbl> <drt>
## 1 2.033333 mins 926.5875 mins 7694.475 mins 40995.85 m… 2591… 6081… 87174. 2498…
# Throughput time per trace. relative_trace_frequency is the number of cases following the trace divided by the total number of cases.
throughput_time(log = sepsis,level = "trace")%>% head()
## # A tibble: 6 × 11
## trace relat…¹ min q1 mean median q3 max st_dev iqr total
## <chr> <dbl> <drt> <drt> <drt> <drtn> <drt> <drt> <dbl> <dbl> <drt>
## 1 ER Registrati… 0.0333 2.… 8.… 24.… 17.3… 30.… 105.… 24.5 21.8 848…
## 2 ER Registrati… 0.0229 7.… 17.… 33.… 27.0… 47.… 76.… 21.2 29.5 808…
## 3 ER Registrati… 0.0210 6.… 15.… 27.… 21.3… 31.… 76.… 18.3 16.6 605…
## 4 ER Registrati… 0.0124 19.… 45.… 122.… 78.4… 189.… 382.… 105. 144. 1588…
## 5 ER Registrati… 0.0105 6.… 19.… 30.… 25.8… 32.… 78.… 20.2 13.2 332…
## 6 ER Registrati… 0.00857 130.… 159.… 214.… 170.7… 244.… 373.… 79.0 85.4 1931…
## # … with abbreviated variable name ¹relative_trace_frequency
# Throughput time per case
throughput_time(log = sepsis,level = "case")%>% head()
## case_id throughput_time
## 1 HG 608146.5 mins
## 2 VAA 607486.8 mins
## 3 ZS 574450.1 mins
## 4 LS 571634.5 mins
## 5 UBA 557236.7 mins
## 6 HDA 525518.6 mins
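Since throughput time is just "last timestamp minus first timestamp" per case, it can be sketched in a few lines of base R. The data below is invented for the example and the code is not edeaR's implementation.

```r
# Toy event data (hypothetical).
events <- data.frame(
  case_id = c("c1", "c1", "c2", "c2", "c2"),
  time    = as.POSIXct(c("2024-01-01 09:00:00", "2024-01-01 09:45:00",
                         "2024-01-02 10:00:00", "2024-01-02 11:00:00",
                         "2024-01-02 12:30:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# First and last timestamp of each case, as seconds since the epoch.
first_seen <- tapply(as.numeric(events$time), events$case_id, min)
last_seen  <- tapply(as.numeric(events$time), events$case_id, max)

# Throughput time per case, converted from seconds to minutes.
throughput <- data.frame(
  case_id         = names(first_seen),
  throughput_mins = as.numeric(last_seen - first_seen) / 60
)
print(throughput)
```

Waiting time between activities is included in this number, which is what separates throughput time from processing time.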
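Processing time: summary statistics on the processing time. It can only be computed when activity instances have both start and end timestamps available.
# Log level: distribution of the processing time per case across the whole event log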
processing_time(log = sepsis,level = "log")%>% head()
## # A tibble: 1 × 8
## min q1 median mean q3 max st_dev iqr
## <drtn> <drtn> <drtn> <drtn> <drtn> <drtn> <dbl> <drtn>
## 1 0 secs 0 secs 0 secs 0 secs 0 secs 0 secs 0 0 secs
# Trace level
processing_time(log = sepsis,level = "trace")%>% head()
## # A tibble: 6 × 11
## trace relat…¹ min q1 mean median q3 max st_dev iqr total
## <chr> <dbl> <drt> <drt> <drt> <drtn> <drt> <drt> <dbl> <dbl> <drt>
## 1 ER Registrati… 0.0333 0 se… 0 se… 0 se… 0 secs 0 se… 0 se… 0 0 0 se…
## 2 ER Registrati… 0.0229 0 se… 0 se… 0 se… 0 secs 0 se… 0 se… 0 0 0 se…
## 3 ER Registrati… 0.0210 0 se… 0 se… 0 se… 0 secs 0 se… 0 se… 0 0 0 se…
## 4 ER Registrati… 0.0124 0 se… 0 se… 0 se… 0 secs 0 se… 0 se… 0 0 0 se…
## 5 ER Registrati… 0.0105 0 se… 0 se… 0 se… 0 secs 0 se… 0 se… 0 0 0 se…
## 6 ER Registrati… 0.00857 0 se… 0 se… 0 se… 0 secs 0 se… 0 se… 0 0 0 se…
## # … with abbreviated variable name ¹relative_frequency
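# Other available levels include activity, resource, and resource-activity
The results at those levels are similar (all zeros here), because processing time needs both a start and an end timestamp for each activity instance.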
dotted_chart(sepsis)
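Although bupaR can read event logs from XES files, practitioners usually have to start from raw data and make sure it is converted into an event log correctly.
logbuildR provides a graphical interface that guides the user through the steps of building an event log.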
# devtools::install_github("bupaverse/logbuildR")
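Calling build_log launches a Shiny app. In my tests so far, setting the timestamp always fails.
The daqapo package checks the quality of an event log. Its authors provide fairly detailed documentation in the package vignette: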
https://cran.r-project.org/web/packages/daqapo/vignettes/Introduction-to-DaQAPO.html
processcheckR aims to support rule-based conformance checking. Currently the following declarative rules can be checked:
library(bupaR)
library(processcheckR)
##
## Attaching package: 'processcheckR'
## The following object is masked from 'package:base':
##
## xor
sepsis %>%
# Check whether each case starts with "ER Registration".
check_rule(starts("ER Registration"), label = "r1") %>%
# Check whether "CRP" and "LacticAcid" co-exist: either both occur in a case or neither does.
check_rule(and("CRP","LacticAcid"), label = "r2") %>%
group_by(r1, r2) %>%
n_cases()
## # A tibble: 4 × 3
## r1 r2 n_cases
## <lgl> <lgl> <int>
## 1 FALSE FALSE 10
## 2 FALSE TRUE 45
## 3 TRUE FALSE 137
## 4 TRUE TRUE 858
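To see how the four-row table arises, the sketch below evaluates the same two rules on hand-written traces in base R. The traces are hypothetical, and the reading of `and()` as co-existence (both activities present or both absent) is an assumption; it is consistent with the output above, where r2 holds for 903 cases although LacticAcid occurs in only 860.

```r
# Four hypothetical cases, each given as its ordered activity sequence.
traces <- list(
  c1 = c("ER Registration", "CRP", "LacticAcid"),
  c2 = c("ER Registration", "CRP"),
  c3 = c("CRP", "ER Registration", "LacticAcid"),
  c4 = c("Leucocytes")
)

# r1: the case starts with "ER Registration".
r1 <- vapply(traces, function(tr) tr[1] == "ER Registration", logical(1))

# r2 (assumed co-existence semantics): "CRP" occurs if and only if "LacticAcid" occurs.
r2 <- vapply(traces,
             function(tr) ("CRP" %in% tr) == ("LacticAcid" %in% tr),
             logical(1))

# Cross-tabulate the two rule outcomes, like group_by(r1, r2) %>% n_cases().
print(table(r1 = r1, r2 = r2))
```

Case c4 contains neither activity and therefore satisfies r2 under this reading, which is easy to overlook when interpreting the TRUE counts.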
Step 1: build the causal graph
library(heuristicsmineR)
##
## Attaching package: 'heuristicsmineR'
## The following object is masked from 'package:processmapR':
##
## precedence_matrix
library(eventdataR)
data(patients)
# Dependency graph / matrix
dependency_matrix(patients)
## consequent
## antecedent Blood test Check-out Discuss Results End
## Blood test 0.0000000 0.0000000 0.0000000 0.0000000
## Check-out 0.0000000 0.0000000 0.0000000 0.9979716
## Discuss Results 0.0000000 0.9979716 0.0000000 0.0000000
## End 0.0000000 0.0000000 0.0000000 0.0000000
## MRI SCAN 0.0000000 0.0000000 0.9957806 0.0000000
## Registration 0.0000000 0.0000000 0.0000000 0.0000000
## Start 0.0000000 0.0000000 0.0000000 0.0000000
## Triage and Assessment 0.9957983 0.0000000 0.0000000 0.0000000
## X-Ray 0.0000000 0.0000000 0.9961538 0.0000000
## consequent
## antecedent MRI SCAN Registration Start Triage and Assessment
## Blood test 0.9957806 0.000000 0 0.000000
## Check-out 0.0000000 0.000000 0 0.000000
## Discuss Results 0.0000000 0.000000 0 0.000000
## End 0.0000000 0.000000 0 0.000000
## MRI SCAN 0.0000000 0.000000 0 0.000000
## Registration 0.0000000 0.000000 0 0.998004
## Start 0.0000000 0.998004 0 0.000000
## Triage and Assessment 0.0000000 0.000000 0 0.000000
## X-Ray 0.0000000 0.000000 0 0.000000
## consequent
## antecedent X-Ray
## Blood test 0.0000000
## Check-out 0.0000000
## Discuss Results 0.0000000
## End 0.0000000
## MRI SCAN 0.0000000
## Registration 0.0000000
## Start 0.0000000
## Triage and Assessment 0.9961832
## X-Ray 0.0000000
## attr(,"class")
## [1] "dependency_matrix" "matrix" "array"
# Causal graph / Heuristics net
causal_net(patients)
## Nodes
## # A tibble: 9 × 12
## act from_id n n_dis…¹ bindi…² bindi…³ label color…⁴ shape fontc…⁵ color
## <chr> <int> <dbl> <dbl> <list> <list> <chr> <dbl> <chr> <chr> <chr>
## 1 Blood… 1 237 237 <list> <list> "Blo… 237 rect… black black
## 2 Check… 2 492 492 <list> <list> "Che… 492 rect… white black
## 3 Discu… 3 495 495 <list> <list> "Dis… 495 rect… white black
## 4 End 4 500 500 <list> <list> "End" 500 circ… brown4 brow…
## 5 MRI S… 5 236 236 <list> <list> "MRI… 236 rect… black black
## 6 Regis… 6 500 500 <list> <list> "Reg… 500 rect… white black
## 7 Start 7 500 500 <list> <list> "Sta… 500 circ… chartr… char…
## 8 Triag… 8 500 500 <list> <list> "Tri… 500 rect… white black
## 9 X-Ray 9 261 261 <list> <list> "X-R… 261 rect… black black
## # … with 1 more variable: tooltip <chr>, and abbreviated variable names
## # ¹n_distinct_cases, ²bindings_input, ³bindings_output, ⁴color_level,
## # ⁵fontcolor
## Edges
## # A tibble: 9 × 8
## antecedent consequent dep from_id to_id n label penwi…¹
## <chr> <chr> <dbl> <int> <int> <dbl> <chr> <dbl>
## 1 Triage and Assessment Blood test 0.996 8 1 237 237 2.90
## 2 Discuss Results Check-out 0.998 3 2 495 495 4.96
## 3 MRI SCAN Discuss Results 0.996 5 3 236 236 2.89
## 4 X-Ray Discuss Results 0.996 9 3 261 261 3.09
## 5 Check-out End 0.998 2 4 492 492 4.94
## 6 Blood test MRI SCAN 0.996 1 5 237 237 2.90
## 7 Start Registration 0.998 7 6 500 500 5
## 8 Registration Triage and Asse… 0.998 6 8 500 500 5
## 9 Triage and Assessment X-Ray 0.996 8 9 261 261 3.09
## # … with abbreviated variable name ¹penwidth
Build the causal graph using L_heur_1
# Efficient precedence matrix
m <- precedence_matrix_absolute(L_heur_1)
as.matrix(m)
## consequent
## antecedent a b c d e End Start
## a 0 11 11 13 5 0 0
## b 0 0 10 0 11 0 0
## c 0 10 0 0 11 0 0
## d 0 0 0 4 13 0 0
## e 0 0 0 0 0 40 0
## End 0 0 0 0 0 0 0
## Start 40 0 0 0 0 0 0
# Example from Process mining book
dependency_matrix(L_heur_1, threshold = .7)
## consequent
## antecedent a b c d e End Start
## a 0.0000000 0.9166667 0.9166667 0.9285714 0.8333333 0.0000000 0
## b 0.0000000 0.0000000 0.0000000 0.0000000 0.9166667 0.0000000 0
## c 0.0000000 0.0000000 0.0000000 0.0000000 0.9166667 0.0000000 0
## d 0.0000000 0.0000000 0.0000000 0.8000000 0.9285714 0.0000000 0
## e 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.9756098 0
## End 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0
## Start 0.9756098 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0
## attr(,"class")
## [1] "dependency_matrix" "matrix" "array"
causal_net(L_heur_1, threshold = .7)
## Nodes
## # A tibble: 7 × 12
## act from_id n n_dist…¹ bindi…² bindi…³ label color…⁴ shape fontc…⁵ color
## <chr> <int> <dbl> <dbl> <list> <list> <chr> <dbl> <chr> <chr> <chr>
## 1 a 1 40 40 <list> <list> "a\n… 40 rect… white black
## 2 b 2 21 21 <list> <list> "b\n… 21 rect… black black
## 3 c 3 21 21 <list> <list> "c\n… 21 rect… black black
## 4 d 4 17 13 <list> <list> "d\n… 17 rect… black black
## 5 e 5 40 40 <list> <list> "e\n… 40 rect… white black
## 6 End 6 40 40 <list> <list> "End" 40 circ… brown4 brow…
## 7 Start 7 40 40 <list> <list> "Sta… 40 circ… chartr… char…
## # … with 1 more variable: tooltip <chr>, and abbreviated variable names
## # ¹n_distinct_cases, ²bindings_input, ³bindings_output, ⁴color_level,
## # ⁵fontcolor
## Edges
## # A tibble: 10 × 8
## antecedent consequent dep from_id to_id n label penwidth
## <chr> <chr> <dbl> <int> <int> <dbl> <chr> <dbl>
## 1 Start a 0.976 7 1 40 40 5
## 2 a b 0.917 1 2 21 21 3.1
## 3 a c 0.917 1 3 21 21 3.1
## 4 a d 0.929 1 4 13 13 2.3
## 5 d d 0.8 4 4 4 4 1.4
## 6 a e 0.833 1 5 5 5 1.5
## 7 b e 0.917 2 5 21 21 3.1
## 8 c e 0.917 3 5 21 21 3.1
## 9 d e 0.929 4 5 13 13 2.3
## 10 e End 0.976 5 6 40 40 5
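The dependency values can be recomputed from the absolute precedence matrix printed earlier. The sketch below applies the standard Heuristics Miner dependency measure, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), and |a>a| / (|a>a| + 1) for length-one loops. That heuristicsmineR uses exactly this formula is an assumption here, although the numbers printed above are consistent with it. The matrix entries are copied from the `precedence_matrix_absolute(L_heur_1)` output.

```r
# Absolute directly-follows counts, copied from the output above.
acts <- c("a", "b", "c", "d", "e", "End", "Start")
freq <- matrix(0, nrow = 7, ncol = 7,
               dimnames = list(antecedent = acts, consequent = acts))
freq["a", c("b", "c", "d", "e")] <- c(11, 11, 13, 5)
freq["b", c("c", "e")] <- c(10, 11)
freq["c", c("b", "e")] <- c(10, 11)
freq["d", c("d", "e")] <- c(4, 13)
freq["e", "End"] <- 40
freq["Start", "a"] <- 40

# Dependency between two different activities:
# (|a>b| - |b>a|) / (|a>b| + |b>a| + 1)
dep <- (freq - t(freq)) / (freq + t(freq) + 1)

# Length-one loops use |a>a| / (|a>a| + 1) instead.
diag(dep) <- diag(freq) / (diag(freq) + 1)

# Keep only dependencies at or above the threshold, as with threshold = .7.
threshold <- 0.7
dep[dep < threshold] <- 0

print(round(dep, 7))
```

Activities b and c follow each other equally often (10 times each way), so their mutual dependency is 0 and no edge between them appears in the causal net; this is how the measure treats them as parallel rather than sequential.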
Step 2: convert the causal graph into a Petri net
# Convert to Petri net
library(petrinetR)
cn <- causal_net(L_heur_1, threshold = .7)
#cn <- causal_net(patients)
pn <- as.petrinet(cn)
render_PN(pn)
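Animating moving tokens can be a powerful visualization tool for understanding general process behaviour. The processanimateR package implements an animation library for bupaR that renders interactive process animations using the SVG web standard.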
# install.packages("processanimateR")
library(processanimateR)
library(eventdataR)
The basic usage is to call animate_process directly
animate_process(patients)
You can also change token aesthetics such as size, shape, and colour.
animate_process(example_log, mapping = token_aes(size = token_scale(12), shape = "rect"))
animate_process(example_log, mapping = token_aes(color = token_scale("red")))
animate_process(patients, mode = "relative", jitter = 10, legend = "color",
mapping = token_aes(color = token_scale("employee",
scale = "ordinal",
range = RColorBrewer::brewer.pal(7, "Paired"))))
For more advanced features, see: https://bupaverse.github.io/processanimateR/articles/