The metrics for exploring and describing event data

1. Time perspective
2. Organizational perspective
3. Structuredness perspective: variance, rework

Data

The patients data set is an event log containing 500 patient cases.

library(bupaR)
## 
## Attaching package: 'bupaR'
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:utils':
## 
##     timestamp
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## Warning: package 'readr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks bupaR::filter(), stats::filter()
## x dplyr::lag()    masks stats::lag()
library(edeaR)      # metric functions such as idle_time() and processing_time()
library(eventdataR) # example event logs, including patients
patients %>% head()
## Log of 6 events consisting of:
## 1 trace 
## 6 cases 
## 6 instances of 1 activity 
## 1 resource 
## Events occurred from 2017-01-02 11:41:53 until 2017-01-04 16:07:47 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 6 × 7
##   handling     patient employee handling_id registration_type time               
##   <fct>        <chr>   <fct>    <chr>       <fct>             <dttm>             
## 1 Registration 1       r1       1           start             2017-01-02 11:41:53
## 2 Registration 2       r1       2           start             2017-01-02 11:41:53
## 3 Registration 3       r1       3           start             2017-01-04 01:34:05
## 4 Registration 4       r1       4           start             2017-01-04 01:34:04
## 5 Registration 5       r1       5           start             2017-01-04 16:07:47
## 6 Registration 6       r1       6           start             2017-01-04 16:07:47
## # … with 1 more variable: .order <int>
str(patients)
## eventlog [5,442 × 7] (S3: eventlog/tbl_df/tbl/data.frame)
##  $ handling         : Factor w/ 7 levels "Blood test","Check-out",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ patient          : chr [1:5442] "1" "2" "3" "4" ...
##  $ employee         : Factor w/ 7 levels "r1","r2","r3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ handling_id      : chr [1:5442] "1" "2" "3" "4" ...
##  $ registration_type: Factor w/ 2 levels "complete","start": 2 2 2 2 2 2 2 2 2 2 ...
##  $ time             : POSIXct[1:5442], format: "2017-01-02 11:41:53" "2017-01-02 11:41:53" ...
##  $ .order           : int [1:5442] 1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   handling = col_character(),
##   ..   patient = col_integer(),
##   ..   employee = col_character(),
##   ..   handling_id = col_integer(),
##   ..   registration_type = col_character(),
##   ..   time = col_datetime(format = "")
##   .. )
##  - attr(*, "case_id")= chr "patient"
##  - attr(*, "activity_id")= chr "handling"
##  - attr(*, "activity_instance_id")= chr "handling_id"
##  - attr(*, "lifecycle_id")= chr "registration_type"
##  - attr(*, "resource_id")= chr "employee"
##  - attr(*, "timestamp")= chr "time"

1. Time perspective

Three different time metrics can be computed:

1. Throughput time: the time between the very first event of the case and the very last event.
2. Processing time: the sum of the durations of all activity instances.
3. Idle time: the time when no activity instance is active.
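To make the three definitions concrete, here is a minimal base R sketch on a made-up, single-case log. The data frame `toy_log` and its values are invented for illustration. The idle-time shortcut assumes the activity instances do not overlap.

```r
# Toy log for one case: two activity instances with start and complete times.
# All values are invented for illustration.
toy_log <- data.frame(
  activity = c("Registration", "Triage"),
  start    = as.POSIXct(c("2017-01-02 09:00:00", "2017-01-02 11:00:00"), tz = "UTC"),
  complete = as.POSIXct(c("2017-01-02 10:00:00", "2017-01-02 11:30:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Throughput time: from the very first event to the very last event.
throughput <- as.numeric(difftime(max(toy_log$complete), min(toy_log$start), units = "hours"))

# Processing time: sum of the durations of all activity instances.
processing <- sum(as.numeric(difftime(toy_log$complete, toy_log$start, units = "hours")))

# Idle time: time in which no activity instance is active.
# For non-overlapping instances this equals throughput minus processing.
idle <- throughput - processing

c(throughput = throughput, processing = processing, idle = idle)
```

For this toy case the three values are 2.5, 1.5 and 1 hours.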

Idle Time

Idle time is the time during which no activity instance is active, either within a case or for a resource. The example here computes it per resource, in days.

patients %>%
    idle_time("resource", units = "days")
## # A tibble: 7 × 2
##   employee idle_time
##   <fct>        <dbl>
## 1 r7            464.
## 2 r1            450.
## 3 r4            443.
## 4 r5            430.
## 5 r3            429.
## 6 r6            426.
## 7 r2            215.
patients %>%
    idle_time("resource", units = "days") %>% plot()
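When activity instances overlap, idle time cannot be obtained by simple subtraction; the gaps between busy periods have to be added up. The sketch below shows one way to do that in base R. It is an illustration of the idea, not the implementation used by the package, and `idle_gaps()` is a hypothetical helper name.

```r
# idle_gaps() is a hypothetical helper, not part of bupaR or edeaR.
# start, complete: numeric times (here in hours) of each activity instance.
idle_gaps <- function(start, complete) {
  ord <- order(start)
  start <- start[ord]
  complete <- complete[ord]

  busy_until <- complete[1]  # end of the busy period seen so far
  idle <- 0
  for (i in seq_along(start)[-1]) {
    # A gap exists only if the next instance starts after everything has ended.
    if (start[i] > busy_until) {
      idle <- idle + (start[i] - busy_until)
    }
    busy_until <- max(busy_until, complete[i])
  }
  idle
}

# Two overlapping instances (0-2 and 1-3) followed by one at 5-6.
idle_gaps(start = c(0, 1, 5), complete = c(2, 3, 6))
```

Here the only gap lies between hour 3 and hour 5, so the result is 2.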

Processing Time

Processing time is the sum of the durations of all activity instances. It can be computed at the levels log, trace, case, activity, resource and resource-activity.

patients %>% 
    processing_time("case")
## # A tibble: 500 × 2
##    patient processing_time
##    <chr>             <dbl>
##  1 452                1.59
##  2 63                 1.59
##  3 437                1.54
##  4 70                 1.53
##  5 386                1.52
##  6 291                1.51
##  7 57                 1.50
##  8 307                1.49
##  9 93                 1.49
## 10 476                1.48
## # … with 490 more rows
patients %>% 
    processing_time("case") %>% plot()

Throughput Time

The throughput time is the time from the very first event to the very last event of a case.

patients %>%
    throughput_time("log")
##       min        q1    median      mean        q3       max    st_dev       iqr 
##  1.496088  4.313924  6.085509  6.676308  8.586693 23.106759  3.224242  4.272769 
## attr(,"units")
## [1] "days"
patients %>%
    throughput_time("log") %>%
    plot()
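At the log level the metric condenses the per-case values into summary statistics. A rough base R equivalent on invented values is sketched below. The vector `throughput_days` is made up, and the quantile method is R's default, which is an assumption and may differ from what the package uses internally.

```r
# Hypothetical throughput times in days, one value per case.
throughput_days <- c(2, 4, 6, 8, 10)

# Same statistics, in the same order, as the log-level output above.
log_summary <- c(
  min    = min(throughput_days),
  q1     = quantile(throughput_days, 0.25, names = FALSE),
  median = median(throughput_days),
  mean   = mean(throughput_days),
  q3     = quantile(throughput_days, 0.75, names = FALSE),
  max    = max(throughput_days),
  st_dev = sd(throughput_days),
  iqr    = IQR(throughput_days)
)

log_summary
```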

2. Organizational perspective

Resource Frequency

The resource frequency metric computes how often resources appear in the log, at the levels log, case, activity, resource and resource-activity.

patients %>%
    resource_frequency("case") %>% head()
## # A tibble: 6 × 11
##   patient nr_of_resources   min    q1  mean median    q3   max st_dev   iqr
##   <chr>             <int> <int> <dbl> <dbl>  <dbl> <dbl> <int>  <dbl> <dbl>
## 1 1                     6     1     1     1      1     1     1      0     0
## 2 10                    5     1     1     1      1     1     1      0     0
## 3 100                   5     1     1     1      1     1     1      0     0
## 4 101                   5     1     1     1      1     1     1      0     0
## 5 102                   5     1     1     1      1     1     1      0     0
## 6 103                   6     1     1     1      1     1     1      0     0
## # … with 1 more variable: total <int>

Resource Involvement

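Resource involvement is the number of cases in which a resource is involved. At the case level, as used here, the metric instead reports how many distinct resources worked on each case, both in absolute terms and relative to all resources in the log.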

patients %>%
    resource_involvement(level = "case") %>% head()
## # A tibble: 6 × 3
##   patient absolute relative
##   <chr>      <int>    <dbl>
## 1 1              6    0.857
## 2 103            6    0.857
## 3 104            6    0.857
## 4 105            6    0.857
## 5 106            6    0.857
## 6 110            6    0.857
patients %>%
    resource_involvement(level = "case") %>% plot()
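The two directions of this metric can be illustrated on a small invented log in base R: cases per resource (resource level) and resources per case (case level, as in the output above, where 6 of 7 resources gives 0.857). The data frame `events` and the helper `count_distinct()` are hypothetical, and this is a sketch of the idea rather than the package's code.

```r
# Invented event log: one row per event, with its case and resource.
events <- data.frame(
  case     = c("1", "1", "1", "2", "2"),
  resource = c("r1", "r2", "r3", "r1", "r2"),
  stringsAsFactors = FALSE
)

n_cases     <- length(unique(events$case))
n_resources <- length(unique(events$resource))

# Hypothetical helper: number of distinct values in a vector.
count_distinct <- function(x) length(unique(x))

# Resource level: in how many cases is each resource involved?
cases_per_resource <- vapply(split(events$case, events$resource), count_distinct, integer(1))
resource_level <- data.frame(
  resource = names(cases_per_resource),
  absolute = unname(cases_per_resource),
  relative = unname(cases_per_resource) / n_cases
)

# Case level: how many distinct resources worked on each case?
resources_per_case <- vapply(split(events$resource, events$case), count_distinct, integer(1))
case_level <- data.frame(
  case     = names(resources_per_case),
  absolute = unname(resources_per_case),
  relative = unname(resources_per_case) / n_resources
)

resource_level
case_level
```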

Resource Specialization

The resource specialization metric shows whether resources are specialized in certain activities or not.

patients %>%
    resource_specialisation("case")
## # A tibble: 500 × 11
## # Groups:   patient [500]
##    patient nr_of_activity_typ…   min    q1  mean median    q3   max st_dev   iqr
##    <chr>                 <int> <int> <dbl> <dbl>  <dbl> <dbl> <int>  <dbl> <dbl>
##  1 1                         6     1     1     1      1     1     1      0     0
##  2 10                        5     1     1     1      1     1     1      0     0
##  3 100                       5     1     1     1      1     1     1      0     0
##  4 101                       5     1     1     1      1     1     1      0     0
##  5 102                       5     1     1     1      1     1     1      0     0
##  6 103                       6     1     1     1      1     1     1      0     0
##  7 104                       6     1     1     1      1     1     1      0     0
##  8 105                       6     1     1     1      1     1     1      0     0
##  9 106                       6     1     1     1      1     1     1      0     0
## 10 107                       5     1     1     1      1     1     1      0     0
## # … with 490 more rows, and 1 more variable: total <int>
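One simple way to read specialization is to count how many distinct activity types each resource performs: a resource with a single activity type is fully specialized. The base R sketch below does this on invented data. The `events` data frame is hypothetical, and the package may define the metric in more detail than this.

```r
# Invented event log: which resource performed which activity.
events <- data.frame(
  resource = c("r1", "r1", "r2", "r2", "r2"),
  activity = c("Registration", "Registration", "Triage", "X-Ray", "Triage"),
  stringsAsFactors = FALSE
)

# Number of distinct activity types per resource.
activity_types <- vapply(
  split(events$activity, events$resource),
  function(a) length(unique(a)),
  integer(1)
)

# Share of all activity types in the log that each resource performs.
relative <- activity_types / length(unique(events$activity))

activity_types
relative
```

In this toy log r1 only performs Registration, while r2 covers two of the three activity types.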

3. Structuredness perspective

Activity Presence

Activity presence shows in what percentage of cases an activity is present.

patients %>% activity_presence() %>%
    plot()
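As an illustration of what activity presence measures, the base R sketch below counts, for each activity, the share of cases in which it occurs at least once. The `events` data frame is invented, and the code is a sketch of the definition, not the package's implementation.

```r
# Invented event log: one row per activity instance.
events <- data.frame(
  case     = c("1", "1", "1", "2", "2", "3"),
  activity = c("Registration", "X-Ray", "X-Ray", "Registration", "Blood test", "Registration"),
  stringsAsFactors = FALSE
)

# Keep each (case, activity) pair once, so repeats within a case count once.
pairs <- unique(events[, c("case", "activity")])

# Number of cases containing each activity, and the share of all cases.
absolute <- vapply(split(pairs$case, pairs$activity), length, integer(1))
relative <- absolute / length(unique(events$case))

relative
```

Registration occurs in all three cases, so its presence is 1; X-Ray and Blood test each occur in one case out of three.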

Activity Frequency

The frequency of activities can be calculated using the activity_frequency function, at the levels log, trace, case and activity.

patients %>%
    activity_frequency("case")
## # A tibble: 500 × 3
##    patient absolute relative
##    <chr>      <int>    <dbl>
##  1 1              6        1
##  2 103            6        1
##  3 104            6        1
##  4 105            6        1
##  5 106            6        1
##  6 110            6        1
##  7 111            6        1
##  8 113            6        1
##  9 114            6        1
## 10 116            6        1
## # … with 490 more rows

Start Activities

The start of cases can be described using the start_activities function, at the levels log, case, activity, resource and resource-activity.

patients %>%
    start_activities("case")
## # A tibble: 500 × 2
##    patient handling    
##    <chr>   <fct>       
##  1 1       Registration
##  2 10      Registration
##  3 100     Registration
##  4 101     Registration
##  5 102     Registration
##  6 103     Registration
##  7 104     Registration
##  8 105     Registration
##  9 106     Registration
## 10 107     Registration
## # … with 490 more rows

End Activities

The end_activities function describes the end of cases, at the levels log, case, activity, resource and resource-activity.

patients %>%
    end_activities("resource-activity")
## # A tibble: 5 × 5
##   employee handling              absolute relative cum_sum
##   <fct>    <fct>                    <int>    <dbl>   <dbl>
## 1 r7       Check-out                  492    0.984   0.984
## 2 r6       Discuss Results              3    0.006   0.99 
## 3 r2       Triage and Assessment        2    0.004   0.994
## 4 r5       X-Ray                        2    0.004   0.998
## 5 r3       Blood test                   1    0.002   1
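Conceptually, start and end activities are the first and last activity of each case once its events are ordered by time. A minimal base R sketch on invented data follows. The `events` data frame is hypothetical, and ties in timestamps are not handled.

```r
# Invented event log; rows are deliberately not in time order.
events <- data.frame(
  case     = c("1", "1", "1", "2", "2"),
  activity = c("Triage", "Registration", "Check-out", "Registration", "X-Ray"),
  time     = as.POSIXct(c("2017-01-02 10:00:00", "2017-01-02 09:00:00",
                          "2017-01-02 12:00:00", "2017-01-03 09:00:00",
                          "2017-01-03 10:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Order events within each case by timestamp.
events <- events[order(events$case, events$time), ]
per_case <- split(events$activity, events$case)

# First and last activity of every case.
start_act <- vapply(per_case, function(a) a[1], character(1))
end_act   <- vapply(per_case, function(a) a[length(a)], character(1))

# Relative frequency of each end activity across cases.
end_share <- table(end_act) / length(end_act)

start_act
end_act
end_share
```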

Trace Coverage

The trace coverage metric shows the relationship between the number of different activity sequences (i.e. traces) and the number of cases they cover.

patients %>%
    trace_coverage("trace") %>%
    plot()
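The idea behind trace coverage can be sketched in base R: build one activity sequence per case, count how many cases share each sequence, and accumulate the shares. The `events` data frame is invented and assumed to be already sorted by time within each case.

```r
# Invented event log, assumed to be in time order within each case.
events <- data.frame(
  case     = c("1", "1", "2", "2", "3", "3", "4", "4"),
  activity = c("A", "B", "A", "B", "A", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# One trace (activity sequence) per case.
traces <- vapply(split(events$activity, events$case), paste, character(1), collapse = ",")

# Number of cases per distinct trace, most frequent first.
counts <- sort(table(traces), decreasing = TRUE)

coverage <- data.frame(
  trace    = names(counts),
  absolute = as.integer(counts),
  relative = as.integer(counts) / length(traces),
  stringsAsFactors = FALSE
)
coverage$cum_sum <- cumsum(coverage$relative)

coverage
```

In this toy log the trace A,B covers three of four cases, so two distinct traces are enough to cover the whole log.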

Trace Length

The trace length metric describes the length of traces, i.e. the number of activity instances for each case.

patients %>%
    trace_length("log") %>%
    plot()
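Because an activity instance can span several events (for example a start and a complete event), trace length counts distinct activity instances rather than rows. The base R sketch below illustrates this on invented data. The `events` data frame and its column names are hypothetical.

```r
# Invented event log: each activity instance has a start and a complete event.
events <- data.frame(
  case      = c("1", "1", "1", "1", "2", "2"),
  instance  = c("a1", "a1", "a2", "a2", "a3", "a3"),
  lifecycle = c("start", "complete", "start", "complete", "start", "complete"),
  stringsAsFactors = FALSE
)

# Trace length: number of distinct activity instances per case.
trace_len <- vapply(
  split(events$instance, events$case),
  function(i) length(unique(i)),
  integer(1)
)

trace_len
summary(trace_len)
```

Case 1 has four events but only two activity instances, so its trace length is 2.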