The patients data set is an event log containing 500 patient cases.
library(bupaR)
##
## Attaching package: 'bupaR'
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:utils':
##
## timestamp
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## Warning: package 'readr' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks bupaR::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
patients %>% head()
## Log of 6 events consisting of:
## 1 trace
## 6 cases
## 6 instances of 1 activity
## 1 resource
## Events occurred from 2017-01-02 11:41:53 until 2017-01-04 16:07:47
##
## Variables were mapped as follows:
## Case identifier: patient
## Activity identifier: handling
## Resource identifier: employee
## Activity instance identifier: handling_id
## Timestamp: time
## Lifecycle transition: registration_type
##
## # A tibble: 6 × 7
## handling patient employee handling_id registration_type time
## <fct> <chr> <fct> <chr> <fct> <dttm>
## 1 Registration 1 r1 1 start 2017-01-02 11:41:53
## 2 Registration 2 r1 2 start 2017-01-02 11:41:53
## 3 Registration 3 r1 3 start 2017-01-04 01:34:05
## 4 Registration 4 r1 4 start 2017-01-04 01:34:04
## 5 Registration 5 r1 5 start 2017-01-04 16:07:47
## 6 Registration 6 r1 6 start 2017-01-04 16:07:47
## # … with 1 more variable: .order <int>
str(patients)
## eventlog [5,442 × 7] (S3: eventlog/tbl_df/tbl/data.frame)
## $ handling : Factor w/ 7 levels "Blood test","Check-out",..: 5 5 5 5 5 5 5 5 5 5 ...
## $ patient : chr [1:5442] "1" "2" "3" "4" ...
## $ employee : Factor w/ 7 levels "r1","r2","r3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ handling_id : chr [1:5442] "1" "2" "3" "4" ...
## $ registration_type: Factor w/ 2 levels "complete","start": 2 2 2 2 2 2 2 2 2 2 ...
## $ time : POSIXct[1:5442], format: "2017-01-02 11:41:53" "2017-01-02 11:41:53" ...
## $ .order : int [1:5442] 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, "spec")=
## .. cols(
## .. handling = col_character(),
## .. patient = col_integer(),
## .. employee = col_character(),
## .. handling_id = col_integer(),
## .. registration_type = col_character(),
## .. time = col_datetime(format = "")
## .. )
## - attr(*, "case_id")= chr "patient"
## - attr(*, "activity_id")= chr "handling"
## - attr(*, "activity_instance_id")= chr "handling_id"
## - attr(*, "lifecycle_id")= chr "registration_type"
## - attr(*, "resource_id")= chr "employee"
## - attr(*, "timestamp")= chr "time"
Three different time metrics can be computed:
1. throughput time: the time between the very first event of the case and the very last event
2. processing time: the sum of the durations of all activity instances
3. idle time: the time when no activity instance is active
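The three definitions can be made concrete on a single toy case. This is a minimal base R sketch, not the bupaR implementation. It assumes a hypothetical layout with one row per activity instance holding its start and complete timestamps, whereas the patients log stores separate start and complete events per handling_id. The helper active_time is also hypothetical.

```r
# Hypothetical toy case: one row per activity instance with its start and
# complete timestamps.
at <- function(clock) as.POSIXct(paste("2017-01-02", clock), tz = "UTC")
case1 <- data.frame(
  activity = c("Registration", "Triage", "Blood test", "Check-out"),
  start    = at(c("08:00:00", "09:00:00", "09:30:00", "11:00:00")),
  complete = at(c("08:30:00", "10:00:00", "10:15:00", "11:15:00"))
)

# Hypothetical helper: length in seconds of the union of possibly
# overlapping [start, complete] intervals.
active_time <- function(start, complete) {
  o <- order(start)
  s <- as.numeric(start[o])
  e <- as.numeric(complete[o])
  total <- 0
  cur_s <- s[1]
  cur_e <- e[1]
  for (i in seq_along(s)[-1]) {
    if (s[i] > cur_e) {              # gap: close the current merged interval
      total <- total + (cur_e - cur_s)
      cur_s <- s[i]
      cur_e <- e[i]
    } else {                         # overlap: extend the merged interval
      cur_e <- max(cur_e, e[i])
    }
  }
  total + (cur_e - cur_s)
}

# 1. Throughput time: first event to last event of the case.
throughput <- as.numeric(difftime(max(case1$complete), min(case1$start),
                                  units = "mins"))
# 2. Processing time: sum of the durations of all activity instances.
processing <- sum(as.numeric(difftime(case1$complete, case1$start,
                                      units = "mins")))
# 3. Idle time: part of the throughput time in which no instance is active.
idle <- throughput - active_time(case1$start, case1$complete) / 60

c(throughput = throughput, processing = processing, idle = idle)
# throughput 195, processing 150, idle 75 (minutes)
```

Because Triage and Blood test overlap, processing time plus idle time (225 minutes) exceeds the throughput time (195 minutes), so idle time cannot in general be obtained as throughput time minus processing time.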
The idle time is the time during which there is no activity in a case or for a resource. Below it is computed per resource, in days.
patients %>%
idle_time("resource", units = "days")
## # A tibble: 7 × 2
## employee idle_time
## <fct> <dbl>
## 1 r7 464.
## 2 r1 450.
## 3 r4 443.
## 4 r5 430.
## 5 r3 429.
## 6 r6 426.
## 7 r2 215.
patients %>%
idle_time("resource", units = "days") %>% plot()
The processing time can be computed at the levels log, trace, case, activity and resource-activity.
patients %>%
processing_time("case")
## # A tibble: 500 × 2
## patient processing_time
## <chr> <dbl>
## 1 452 1.59
## 2 63 1.59
## 3 437 1.54
## 4 70 1.53
## 5 386 1.52
## 6 291 1.51
## 7 57 1.50
## 8 307 1.49
## 9 93 1.49
## 10 476 1.48
## # … with 490 more rows
patients %>%
processing_time("case") %>% plot()
The throughput time is the time from the very first event to the very last event of a case.
patients %>%
throughput_time("log")
## min q1 median mean q3 max st_dev iqr
## 1.496088 4.313924 6.085509 6.676308 8.586693 23.106759 3.224242 4.272769
## attr(,"units")
## [1] "days"
patients %>%
throughput_time("log") %>%
plot()
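At the log level, the per-case throughput times are condensed into the summary statistics printed above. The sketch below computes the same set of columns from a hypothetical vector of per-case throughput times; it assumes R's default (type 7) quantiles, and whether bupaR uses the same quantile method is an assumption.

```r
# Hypothetical per-case throughput times in days (not taken from patients).
tp <- c(1.5, 4, 6, 7, 9)

q <- unname(quantile(tp, c(0.25, 0.5, 0.75)))  # R's default (type 7) quantiles
log_summary <- c(
  min = min(tp), q1 = q[1], median = q[2], mean = mean(tp),
  q3 = q[3], max = max(tp), st_dev = sd(tp), iqr = q[3] - q[1]
)
round(log_summary, 3)
# q1 = 4, median = 6, mean = 5.5, q3 = 7, iqr = 3
```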
The resource frequency metric computes the number or frequency of resources at the levels log, case, activity, resource and resource-activity.
patients %>%
resource_frequency("case") %>% head()
## # A tibble: 6 × 11
## patient nr_of_resources min q1 mean median q3 max st_dev iqr
## <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
## 1 1 6 1 1 1 1 1 1 0 0
## 2 10 5 1 1 1 1 1 1 0 0
## 3 100 5 1 1 1 1 1 1 0 0
## 4 101 5 1 1 1 1 1 1 0 0
## 5 102 5 1 1 1 1 1 1 0 0
## 6 103 6 1 1 1 1 1 1 0 0
## # … with 1 more variable: total <int>
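A minimal base R sketch of the case-level idea, on hypothetical toy data with one row per activity instance: count the distinct resources in each case, and summarise how many activity instances each of them executes there. That the min/mean/max columns summarise exactly these per-resource counts is an assumption; in the patients output above every such value is 1.

```r
# Hypothetical toy log: one row per activity instance.
instances <- data.frame(
  case     = c("1", "1", "1", "1", "2", "2", "2"),
  activity = c("Registration", "Triage", "Blood test", "Check-out",
               "Registration", "Triage", "Check-out"),
  resource = c("r1", "r2", "r2", "r7", "r1", "r3", "r7")
)

# For each case: number of distinct resources, plus a summary of how many
# activity instances each of those resources executes in the case.
per_case <- lapply(split(instances$resource, instances$case), function(r) {
  counts <- table(r)
  c(nr_of_resources = length(counts),
    min = min(counts), mean = mean(counts), max = max(counts))
})
res_freq <- do.call(rbind, per_case)
res_freq
```

In case 1, resource r2 executes two instances, so that case gets max 2 while case 2 has 1 everywhere.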
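Resource involvement refers to the number of cases in which a resource is involved. At the case level used below, it gives the number of resources involved in each case (absolute) and that number as a share of all resources (relative): 6 of the 7 employees gives 0.857.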
patients %>%
resource_involvement(level = "case") %>% head()
## # A tibble: 6 × 3
## patient absolute relative
## <chr> <int> <dbl>
## 1 1 6 0.857
## 2 103 6 0.857
## 3 104 6 0.857
## 4 105 6 0.857
## 5 106 6 0.857
## 6 110 6 0.857
patients %>%
resource_involvement(level = "case") %>% plot
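The absolute and relative columns at case level can be reproduced with a few lines of base R. This sketch uses hypothetical toy data with one row per activity instance; with the patients log the same arithmetic gives 6 / 7 ≈ 0.857.

```r
# Hypothetical toy log: one row per activity instance.
instances <- data.frame(
  case     = c("1", "1", "1", "2", "2"),
  resource = c("r1", "r2", "r7", "r1", "r7")
)
# absolute: number of distinct resources involved in each case
absolute <- sapply(split(instances$resource, instances$case),
                   function(r) length(unique(r)))
# relative: share of all resources in the log that take part in the case
relative <- absolute / length(unique(instances$resource))
data.frame(case = names(absolute), absolute = absolute, relative = relative,
           row.names = NULL)
```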
The resource specialization metric shows whether resources are specialized in certain activities or not.
patients %>%
resource_specialisation("case")
## # A tibble: 500 × 11
## # Groups: patient [500]
## patient nr_of_activity_typ… min q1 mean median q3 max st_dev iqr
## <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl>
## 1 1 6 1 1 1 1 1 1 0 0
## 2 10 5 1 1 1 1 1 1 0 0
## 3 100 5 1 1 1 1 1 1 0 0
## 4 101 5 1 1 1 1 1 1 0 0
## 5 102 5 1 1 1 1 1 1 0 0
## 6 103 6 1 1 1 1 1 1 0 0
## 7 104 6 1 1 1 1 1 1 0 0
## 8 105 6 1 1 1 1 1 1 0 0
## 9 106 6 1 1 1 1 1 1 0 0
## 10 107 5 1 1 1 1 1 1 0 0
## # … with 490 more rows, and 1 more variable: total <int>
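The underlying idea is easiest to see from the resource side: count how many distinct activity types each resource executes, where a value of 1 means the resource is specialised in a single activity. The sketch below is a resource-level view on hypothetical toy data, not a reproduction of the case-level call above.

```r
# Hypothetical toy log: one row per activity instance.
instances <- data.frame(
  activity = c("Registration", "Triage", "Blood test", "Check-out",
               "Registration", "Triage", "Check-out"),
  resource = c("r1", "r2", "r2", "r7", "r1", "r2", "r7")
)
# Number of distinct activity types executed by each resource.
specialisation <- sapply(split(instances$activity, instances$resource),
                         function(a) length(unique(a)))
specialisation
# r1 1, r2 2, r7 1
```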
Activity presence shows in what percentage of cases an activity is present.
patients %>% activity_presence() %>%
plot
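Activity presence counts each case at most once per activity and divides by the number of cases. A minimal base R sketch on hypothetical toy data:

```r
# Hypothetical toy log: one row per activity instance.
instances <- data.frame(
  case     = c("1", "1", "1", "2", "2", "3", "3", "3"),
  activity = c("Registration", "X-Ray", "Check-out",
               "Registration", "Check-out",
               "Registration", "X-Ray", "X-Ray")
)
n_cases <- length(unique(instances$case))
# Number of distinct cases in which each activity occurs at least once.
cases_with <- sapply(split(instances$case, instances$activity),
                     function(cs) length(unique(cs)))
presence <- cases_with / n_cases
round(presence, 3)
```

X-Ray occurs twice in case 3 but still counts for only one case, giving a presence of 2 out of 3 cases.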
The frequency of activities can be calculated using the activity_frequency function, at the levels log, trace, case and activity.
patients %>%
activity_frequency("case")
## # A tibble: 500 × 3
## patient absolute relative
## <chr> <int> <dbl>
## 1 1 6 1
## 2 103 6 1
## 3 104 6 1
## 4 105 6 1
## 5 106 6 1
## 6 110 6 1
## 7 111 6 1
## 8 113 6 1
## 9 114 6 1
## 10 116 6 1
## # … with 490 more rows
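At the activity level, the frequency is simply the number of activity instances per activity, with the relative frequency as its share of all instances. This base R sketch uses hypothetical toy data and an activity-level view, unlike the case-level call above.

```r
# Hypothetical toy log: one row per activity instance.
instances <- data.frame(
  case     = c("1", "1", "1", "1", "1", "2", "2", "2", "2"),
  activity = c("Registration", "Triage", "X-Ray", "X-Ray", "Check-out",
               "Registration", "Triage", "X-Ray", "Check-out")
)
absolute <- table(instances$activity)    # activity instances per activity
freq <- data.frame(activity = names(absolute),
                   absolute = as.vector(absolute),
                   relative = as.vector(absolute) / sum(absolute))
freq[order(-freq$absolute), ]
```

Unlike activity presence, the repeated X-Ray in case 1 is counted twice here.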
The start of cases can be described using the start_activities function.
patients %>%
start_activities("case")
## # A tibble: 500 × 2
## patient handling
## <chr> <fct>
## 1 1 Registration
## 2 10 Registration
## 3 100 Registration
## 4 101 Registration
## 5 102 Registration
## 6 103 Registration
## 7 104 Registration
## 8 105 Registration
## 9 106 Registration
## 10 107 Registration
## # … with 490 more rows
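The start activity of a case is the activity of its earliest event. A minimal base R sketch on hypothetical, deliberately unsorted events: sort by case and timestamp, then keep the first row of each case.

```r
# Hypothetical events, deliberately not in time order.
events <- data.frame(
  case     = c("1", "1", "2", "2", "3"),
  activity = c("Triage", "Registration", "Registration", "X-Ray", "Triage"),
  time     = as.POSIXct(c("2017-01-02 08:30:00", "2017-01-02 08:00:00",
                          "2017-01-02 09:00:00", "2017-01-02 09:20:00",
                          "2017-01-02 10:00:00"), tz = "UTC")
)
# Sort by case and time, then keep the first event of every case.
events <- events[order(events$case, events$time), ]
starts <- events[!duplicated(events$case), c("case", "activity")]
starts
```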
The end_activities function describes the end of cases and can be computed at the levels log, case, activity, resource and resource-activity.
patients %>%
end_activities("resource-activity")
## # A tibble: 5 × 5
## employee handling absolute relative cum_sum
## <fct> <fct> <int> <dbl> <dbl>
## 1 r7 Check-out 492 0.984 0.984
## 2 r6 Discuss Results 3 0.006 0.99
## 3 r2 Triage and Assessment 2 0.004 0.994
## 4 r5 X-Ray 2 0.004 0.998
## 5 r3 Blood test 1 0.002 1
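The resource-activity table above can be mimicked in base R: take the last event of each case, count cases per resource-activity pair, and add the relative share and its cumulative sum. This sketch uses hypothetical events with a hypothetical step column giving their order within each case.

```r
# Hypothetical events; step gives the order of events within each case.
events <- data.frame(
  case     = c("1", "1", "2", "2", "3", "3", "4"),
  step     = c(1, 2, 1, 2, 1, 2, 1),
  activity = c("Registration", "Check-out", "Registration", "Check-out",
               "Registration", "X-Ray", "Check-out"),
  resource = c("r1", "r7", "r1", "r7", "r1", "r5", "r7")
)
# Keep the last event of every case.
events <- events[order(events$case, events$step), ]
ends <- events[!duplicated(events$case, fromLast = TRUE), ]

# Cases ending in each resource-activity pair, most frequent first.
ends_ra <- aggregate(list(absolute = ends$case),
                     by = list(resource = ends$resource,
                               activity = ends$activity),
                     FUN = length)
ends_ra <- ends_ra[order(-ends_ra$absolute), ]
ends_ra$relative <- ends_ra$absolute / sum(ends_ra$absolute)
ends_ra$cum_sum  <- cumsum(ends_ra$relative)
ends_ra
```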
The trace coverage metric shows the relationship between the number of different activity sequences (i.e. traces) and the number of cases they cover.
patients %>%
trace_coverage("trace") %>%
plot()
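A trace is the ordered sequence of activities of a case, so trace coverage amounts to counting cases per distinct sequence and accumulating their share, most frequent trace first. A minimal base R sketch, assuming hypothetical toy data that is already ordered by time within each case:

```r
# Hypothetical log, already ordered by time within each case.
instances <- data.frame(
  case     = rep(c("1", "2", "3", "4", "5"), each = 3),
  activity = c("Registration", "Triage", "Check-out",
               "Registration", "Triage", "Check-out",
               "Registration", "X-Ray",  "Check-out",
               "Registration", "Triage", "Check-out",
               "Registration", "X-Ray",  "Check-out")
)
# One trace string per case, e.g. "Registration,Triage,Check-out".
traces <- sapply(split(instances$activity, instances$case),
                 paste, collapse = ",")
# Cases per trace, most frequent first, with the cumulative share of cases.
cases_per_trace <- sort(table(traces), decreasing = TRUE)
coverage <- data.frame(
  trace        = names(cases_per_trace),
  cases        = as.vector(cases_per_trace),
  cum_coverage = cumsum(as.vector(cases_per_trace)) / length(traces)
)
coverage
```

Here the most frequent trace alone covers 60% of the cases, and two traces cover all of them.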
The trace length metric describes the length of traces, i.e. the number of activity instances for each case.
patients %>%
trace_length("log") %>%
plot
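Because the patients log records a start and a complete event for each activity instance (handling_id), trace length counts distinct instances rather than rows. A minimal base R sketch on hypothetical events that reuse the patients column names:

```r
# Hypothetical events: each activity instance (handling_id) has a start and
# a complete event, as in the patients log.
events <- data.frame(
  patient           = c("1", "1", "1", "1", "2", "2",
                        "3", "3", "3", "3", "3", "3"),
  handling_id       = c("1", "1", "7", "7", "2", "2",
                        "3", "3", "9", "9", "12", "12"),
  registration_type = rep(c("start", "complete"), times = 6)
)
# Number of distinct activity instances per case (not the number of events).
n_instances <- sapply(split(events$handling_id, events$patient),
                      function(ids) length(unique(ids)))
n_instances
summary(n_instances)
```

Counting rows instead would double every length, since each instance contributes two events.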