---
title: "BupaR tutorial"
output: html_notebook
---

## Resources

* Documentation: https://bupar.net/
* Cheat sheet: https://www.bupar.net/materials/20170904%20poster%20bupaR.pdf
* GitHub: https://github.com/bupaverse/

## Prerequisites

The easiest way to install the core bupaR packages is to install the bupaverse package.

```{r}
install.packages("bupaverse")
```

```{r}
install.packages("processanimateR")
install.packages("psmineR")
```

You can then load the packages using library().

```{r}
library(bupaverse)
print(R.version.string)
```

If R had to be updated to version 4.4.2, update the packages after installing it.

```{r}
# The first argument of update.packages() is a library path, not a package name,
# so the packages to update are passed through the oldPkgs argument.
update.packages(oldPkgs = c("bupaR", "eventdataR", "xesreadR", "edeaR",
                            "processmapR", "processmonitR"))
```

```{r}
library(bupaverse)
library(dplyr)
library(processanimateR)
library(bpmnR)
library(processcheckR)
```

## Loading an event log

```{r}
log_xes <- read_xes("C:/Users/Admin/Downloads/Sepsis Cases - Event Log.xes")
```

```{r}
log_xes
```

The mapping() function retrieves the metadata of a log object, i.e. the relation between the log identifiers and the corresponding data fields. The functions activities(), cases(), resources() and traces() list the corresponding elements of the log.

```{r}
log_xes %>% mapping()
log_xes %>% activities()
#log_xes %>% cases()
#log_xes %>% resources()
#log_xes %>% traces()
```

## Discovery

### Log statistics

An analysis of the control flow can be done by using metrics on activities and traces.

```{r}
activity_presence <- log_xes %>% activity_presence()
activity_presence %>% plot()
```

The trace coverage metric shows the relationship between the number of different activity sequences (i.e. traces) and the number of cases they cover.

```{r}
trace_coverage <- log_xes %>% trace_coverage("trace")
trace_coverage %>% plot()
```

The trace length metric describes the length of traces, i.e. the number of activity instances for each case. It can be computed at the levels case, trace and log.

```{r}
trace_length <- log_xes %>% trace_length("log")
trace_length %>% plot()
```
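
To make the trace coverage and trace length metrics more concrete, the following base R sketch computes both by hand on a small table. The data frame toy_log and all other toy_ names are hypothetical and are not part of the Sepsis log. This is only an illustration of the definitions, not how bupaR implements trace_coverage() or trace_length().

```{r}
# Hypothetical toy log: one row per event, ordered in time within each case
toy_log <- data.frame(
  case     = c("c1", "c1", "c2", "c2", "c3", "c3", "c3", "c4", "c4"),
  activity = c("A",  "B",  "A",  "B",  "A",  "C",  "B",  "A",  "B"),
  stringsAsFactors = FALSE
)

# A trace is the ordered sequence of activities of one case
toy_traces <- tapply(toy_log$activity, toy_log$case, paste, collapse = ",")

# Number of cases that follow each trace, most frequent first
toy_counts <- sort(table(toy_traces), decreasing = TRUE)

# Trace coverage: absolute, relative and cumulative share of cases per trace
toy_coverage <- data.frame(
  trace      = names(toy_counts),
  absolute   = as.integer(toy_counts),
  relative   = as.numeric(toy_counts) / sum(toy_counts),
  cumulative = cumsum(as.numeric(toy_counts)) / sum(toy_counts)
)
toy_coverage

# Trace length: number of activity instances (rows) per case
toy_length <- table(toy_log$case)
toy_length

# Summary over all cases, comparable in spirit to the "log" level
summary(as.integer(toy_length))
```

In this toy log the trace A,B covers three of the four cases, so a single trace already covers 75% of the cases.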

### Process models

#### Frequency maps

A process map of a log can be created using process_map(). A process map is a directly-follows graph, where each distinct activity is represented by a node, and each directly-follows relationship between activities is shown by directed edges, i.e. arrows between the nodes.

```{r}
frequency <- log_xes %>% filter_trace_frequency(percentage = 0.8)
```

```{r}
frequency %>% process_map(frequency("absolute"))
```
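
As a rough illustration of what a directly-follows graph counts, the base R sketch below derives the edges and their absolute frequencies from a small table. The toy_log data frame is hypothetical, and the sketch leaves out the artificial start and end nodes that process_map() draws. It is not the processmapR implementation.

```{r}
# Hypothetical toy log: one row per event, ordered in time within each case
toy_log <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3"),
  activity = c("A",  "B",  "C",  "A",  "C",  "B",  "A",  "B"),
  stringsAsFactors = FALSE
)

# Pair every event with the next row and keep pairs that belong to the same case
n_events  <- nrow(toy_log)
same_case <- toy_log$case[-n_events] == toy_log$case[-1]
toy_edges <- data.frame(
  from = toy_log$activity[-n_events][same_case],
  to   = toy_log$activity[-1][same_case],
  stringsAsFactors = FALSE
)

# Absolute frequency of each directly-follows relation that occurs at least once
toy_dfg <- as.data.frame(table(toy_edges), stringsAsFactors = FALSE)
toy_dfg <- toy_dfg[toy_dfg$Freq > 0, ]
toy_dfg
```

Each remaining row corresponds to one arrow in the map, and Freq is the number shown on that arrow when frequency("absolute") is used.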

#### Performance maps

Instead of frequencies, process maps can also be used to visualize the performance of the process, by using performance() instead of frequency() to configure the map.

```{r}
performance <- log_xes %>% process_map(performance())
performance
```
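
The idea behind the edge values of a performance map can be sketched in base R as the mean time between two directly following events. The toy_log data frame below is hypothetical, and the calculation is a simplification: it assumes one timestamp per activity instance and ignores activity durations, which performance() can also report on the nodes.

```{r}
# Hypothetical toy log with one timestamp per event, ordered within each case
toy_log <- data.frame(
  case      = c("c1", "c1", "c1", "c2", "c2", "c2"),
  activity  = c("A",  "B",  "C",  "A",  "B",  "C"),
  timestamp = as.POSIXct(c("2015-01-01 08:00:00", "2015-01-01 09:00:00",
                           "2015-01-01 12:00:00", "2015-01-02 08:00:00",
                           "2015-01-02 11:00:00", "2015-01-02 12:00:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Pair every event with the next row and keep pairs within the same case
n_events  <- nrow(toy_log)
same_case <- toy_log$case[-n_events] == toy_log$case[-1]
elapsed   <- difftime(toy_log$timestamp[-1], toy_log$timestamp[-n_events],
                      units = "hours")

toy_flows <- data.frame(
  edge  = paste(toy_log$activity[-n_events], toy_log$activity[-1],
                sep = " -> ")[same_case],
  hours = as.numeric(elapsed)[same_case]
)

# Mean elapsed time per directly-follows relation
toy_performance <- aggregate(hours ~ edge, data = toy_flows, FUN = mean)
toy_performance
```

Replacing mean with another summary function such as median or max in the aggregate() call mirrors passing a different function to performance().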

Information about frequencies and performance, or any other value, can be combined in the same graph.

```{r}
frequency_performance <- log_xes %>% process_map(type_nodes = frequency("relative_case"), type_edges = performance(mean))
frequency_performance
```

#### Animation

It is possible to determine the aesthetics of tokens regardless of the timestamps at which activities occurred. This could be useful if some measurements were taken throughout a process, but the measurement event itself should not be included in the process map.

For example, the LacticAcid measurements could be used in that way:

```{r}
# Extract only the LacticAcid measurements
lactic <- log_xes %>%
    mutate(LacticAcid = as.numeric(LacticAcid)) %>%
    filter_activity(c("LacticAcid")) %>%
    as.data.frame() %>%
    select("case" = CASE_concept_name, 
            "time" =  timestamp, 
            value = LacticAcid) # format needs to be 'case,time,value'

# Remove the measurement events from the sepsis log
sepsisBase <- log_xes %>%
    filter_activity(c("LacticAcid", "CRP", "Leucocytes", "Return ER",
                      "IV Liquid", "IV Antibiotics"), reverse = TRUE) %>%
    filter_trace_frequency(percentage = 0.95)

# Animate with the secondary data frame `lactic`
animate_process(sepsisBase, 
                mode = "relative", 
                duration = 300,
                legend = "color", 
                mapping = token_aes(color = token_scale(lactic, 
                                                        scale = "linear", 
                                                        range = c("#fff5eb","#7f2704")))) 
```

### Process visualizations

#### Process matrix

A process matrix is a two-dimensional matrix showing the flows between activities. Its configuration is exactly the same as that used by process_map().

```{r}
matrix_frequency <- log_xes %>% process_matrix(frequency("absolute")) 
matrix_frequency %>% plot()
```

#### Dotted chart

Dotted charts can be made with dotted_chart(). A dotted chart is a graph in which each activity instance is displayed by a dot. The x-axis refers to the time aspect, while the y-axis refers to cases.

```{r}
dotted_chart <- log_xes %>% dotted_chart(x = "absolute")
dotted_chart
```

#### Trace explorer

Different activity sequences in the log can be visualized with trace_explorer(). It can be used to explore frequent as well as infrequent traces.

```{r}
trace_explorer <- log_xes %>% trace_explorer()
trace_explorer
```

#### Performance spectrum

Both detailed and aggregated performance spectra can be created, using ps_detailed() and ps_aggregated(), respectively.

```{r}
library(psmineR)

spectrum_detailed <- log_xes %>% ps_detailed()
spectrum_detailed
```

## Filtering

### Infrequent flows

Filtering infrequent flows allows us to select a set of cases in which every directly-follows flow has a minimum frequency. For example, consider the process map below.

```{r}
log_xes %>% process_map()
```

In this map, we can observe several unique directly-follows relations, as well as flows occurring fewer than 30 times. Using the filter, we can remove the cases that lead to these flows as follows:

```{r}
log_xes <- log_xes %>%
  mutate(activity_instance_id = as.character(activity_instance_id))

infrequent_flows <- log_xes %>% filter_infrequent_flows(min_n = 30) %>% process_map()
infrequent_flows
```

### Time period

Filtering cases by time period can be done using filter_time_period(). Four values of filter_method act as case filters:

* “start”: all cases started in an interval.
* “complete”: all cases completed in an interval.
* “contained”: all cases contained in an interval.
* “intersecting”: all cases with some activity in an interval.

Using January 2015 as the interval, the dotted chart below shows the result of the "start" method. Change filter_method to compare the results of the other methods.

```{r}
# ymd() comes from lubridate, which is not loaded explicitly above
time_period <- log_xes %>%
  filter_time_period(interval = lubridate::ymd(c(20150101, 20150131)), filter_method = "start") %>%
  dotted_chart()
time_period
```
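
The difference between the four case-level methods can be sketched in base R on a handful of events. The toy_events data frame is hypothetical, and treating both interval boundaries as inclusive is an assumption of this sketch, not a statement about filter_time_period().

```{r}
# Hypothetical events: two per case, ordered in time within each case
toy_events <- data.frame(
  case = rep(c("c1", "c2", "c3", "c4"), each = 2),
  time = as.Date(c("2014-12-20", "2015-01-10",   # c1 starts before the interval
                   "2015-01-05", "2015-01-15",   # c2 lies entirely inside
                   "2015-01-25", "2015-02-05",   # c3 ends after the interval
                   "2015-02-03", "2015-02-10")), # c4 lies entirely outside
  stringsAsFactors = FALSE
)
toy_interval <- as.Date(c("2015-01-01", "2015-01-31"))

# Flag events inside the interval (boundaries assumed inclusive)
toy_events$inside <- toy_events$time >= toy_interval[1] &
  toy_events$time <= toy_interval[2]

# One logical vector per case, in event order
per_case <- split(toy_events$inside, toy_events$case)

toy_selected <- list(
  start        = names(per_case)[sapply(per_case, function(x) x[1])],
  complete     = names(per_case)[sapply(per_case, function(x) x[length(x)])],
  contained    = names(per_case)[sapply(per_case, all)],
  intersecting = names(per_case)[sapply(per_case, any)]
)
toy_selected
```

Here "contained" is the strictest method and keeps only c2, while "intersecting" is the most permissive and keeps every case except c4.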

### Case condition

filter_case_condition() can be used to select cases for which a condition holds. This condition can be related to any of the variables in the log.

For example, select all cases in which the recorded age is 85 or higher.

```{r}
age_85 <- log_xes %>% filter(!is.na(Age)) %>% filter_case_condition(Age >= 85)
age_85
process_map(age_85)
```

### Precedence

filter_precedence() allows us to filter cases based on flows between activities.

If there is more than one antecedent or consequent activity, the filter will test all possible pairs. The filter_method will tell the filter whether all of the rules should hold, at least one, or none are allowed.

The following filter keeps only the cases in which ER Triage is directly followed by Leucocytes.

```{r}
precedence <- log_xes %>%
    filter_precedence(antecedents = "ER Triage",
                      consequents = "Leucocytes",
                      precedence_type = "directly_follows") %>%
    traces()
head(precedence)
```
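
The two precedence types, "directly_follows" and "eventually_follows", can be illustrated in base R on a few traces. The traces and the helper functions toy_directly_follows() and toy_eventually_follows() are hypothetical names made up for this sketch. They only illustrate the idea and are not part of bupaR.

```{r}
# Hypothetical traces: one character vector of activities per case
toy_traces <- list(
  c1 = c("ER Registration", "ER Triage", "Leucocytes"),
  c2 = c("ER Registration", "ER Triage", "CRP", "Leucocytes"),
  c3 = c("ER Registration", "Leucocytes")
)

# TRUE if `consequent` occurs immediately after `antecedent` somewhere in the trace
toy_directly_follows <- function(trace, antecedent, consequent) {
  n <- length(trace)
  if (n < 2) return(FALSE)
  any(trace[-n] == antecedent & trace[-1] == consequent)
}

# TRUE if `consequent` occurs anywhere after the first `antecedent`
toy_eventually_follows <- function(trace, antecedent, consequent) {
  first_antecedent <- match(antecedent, trace)
  !is.na(first_antecedent) &&
    consequent %in% trace[seq_along(trace) > first_antecedent]
}

toy_direct   <- sapply(toy_traces, toy_directly_follows,
                       "ER Triage", "Leucocytes")
toy_eventual <- sapply(toy_traces, toy_eventually_follows,
                       "ER Triage", "Leucocytes")
rbind(directly_follows = toy_direct, eventually_follows = toy_eventual)
```

Case c2 shows the difference: CRP sits between the two activities, so the pair satisfies the eventually-follows check but not the directly-follows check.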

## Conformance checking

### Rule-based conformance

Using the processcheckR package, procedural rules can be checked in an event log. Checking a rule adds a boolean case attribute, which can be used for filtering or in analysis.

Rules can be checked using the check_rule function (see example below). It will create a new logical variable to indicate for which cases the rule holds. The name of the variable can be configured using the label argument in check_rule.

In the following example, the first rule checks the starting activity, while the second rule checks whether CRP and LacticAcid occur together.

```{r}
log_xes %>%
  # check if cases start with "ER Registration"
  check_rule(starts("ER Registration"), label = "r1") %>%
  # check if activities "CRP" and "LacticAcid" occur together
  check_rule(and("CRP","LacticAcid"), label = "r2") %>%
  group_by(r1, r2) %>%
  n_cases()
```
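
What the two rules evaluate per case can be sketched in base R. The traces below are hypothetical. The sketch assumes that and() holds when both activities occur or neither occurs, which is how the rule is understood here. Check the processcheckR documentation before relying on this interpretation.

```{r}
# Hypothetical traces: one character vector of activities per case
toy_traces <- list(
  c1 = c("ER Registration", "CRP", "LacticAcid"),
  c2 = c("ER Triage", "ER Registration", "CRP"),
  c3 = c("ER Registration", "ER Triage")
)

toy_rules <- data.frame(
  case = names(toy_traces),
  # r1: the case starts with "ER Registration"
  r1 = unname(sapply(toy_traces, function(x) x[1] == "ER Registration")),
  # r2 (assumed semantics): "CRP" and "LacticAcid" both occur or both are absent
  r2 = unname(sapply(toy_traces,
                     function(x) ("CRP" %in% x) == ("LacticAcid" %in% x))),
  stringsAsFactors = FALSE
)
toy_rules

# Number of cases per combination of rule outcomes
as.data.frame(table(r1 = toy_rules$r1, r2 = toy_rules$r2))
```

The final table plays the same role as group_by(r1, r2) followed by n_cases() in the chunk above: it counts how many cases fall into each combination of rule outcomes.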

### Alignments

Alignment-based conformance checking is under development and can be used with the bupaRminer library. More information can be found on GitHub: https://github.com/bupaverse/bupaRminer.




