---
title: "bupaR tutorial"
output: html_notebook
---

## Resources

* Documentation: https://bupar.net/
* Cheat sheet: https://www.bupar.net/materials/20170904%20poster%20bupaR.pdf
* GitHub: https://github.com/bupaverse/

## Prerequisites

The easiest way to install the core bupaR packages is to install the bupaverse package.

```{r}
install.packages("bupaverse")
```

```{r}
install.packages("processanimateR")
install.packages("psmineR")
```
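
Before installing anything, it can help to see which of these packages are already available. The chunk below is a minimal sketch in base R; the package names are taken from the install calls above, and nothing is installed or modified.

```{r}
# Packages installed in this section
pkgs <- c("bupaverse", "processanimateR", "psmineR")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

Only the packages reported as missing then need an install.packages() call.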

You can then load the packages using library().

```{r}
library(bupaverse)
print(R.version.string)
```

If R had to be updated to version 4.4.2, update the packages after installing the new R version.

```{r}
# update.packages() takes the package names through `oldPkgs`;
# its first positional argument is a library path, not a package name.
update.packages(oldPkgs = c("bupaR", "eventdataR", "xesreadR",
                            "edeaR", "processmapR", "processmonitR"))
```

```{r}
library(bupaverse)
library(dplyr)
library(xesreadR)   # read_xes()
library(lubridate)  # ymd(), used by the time period filter
library(processanimateR)
library(bpmnR)
library(processcheckR)
```

## Loading an event log

```{r}
log_xes <- read_xes("C:/Users/Admin/Downloads/Sepsis Cases - Event Log.xes")
```

```{r}
log_xes
```

The mapping() function retrieves the metadata of a log object, i.e. the relation between the log identifiers and the corresponding data fields.

```{r}
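# Relation between log identifiers and data fields
log_xes %>% mapping()
# Overview of the activities; the accessors below work the same way
log_xes %>% activities()
#log_xes %>% cases()
#log_xes %>% resources()
#log_xes %>% traces()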
```
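
To make the idea of a mapping concrete, here is a minimal base R sketch. It does not use bupaR: the toy data frame, its column names and the `toy_mapping` vector are all hypothetical and only illustrate how log identifiers point at data fields.

```{r}
# Hypothetical toy log: one row per event (not the sepsis data)
toy_log <- data.frame(
  patient = c("A", "A", "B", "B", "B"),
  step    = c("Register", "Discharge", "Register", "Triage", "Discharge"),
  when    = as.POSIXct(c("2015-01-01 08:00:00", "2015-01-01 12:00:00",
                         "2015-01-02 09:00:00", "2015-01-02 09:30:00",
                         "2015-01-02 15:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# The mapping links each log identifier to the column that holds it
toy_mapping <- c(case_id = "patient", activity_id = "step", timestamp = "when")

# Columns are looked up through the mapping rather than by their raw names
case_ids   <- unique(toy_log[[toy_mapping[["case_id"]]]])
activities <- unique(toy_log[[toy_mapping[["activity_id"]]]])
time_range <- range(toy_log[[toy_mapping[["timestamp"]]]])

case_ids
activities
time_range
```

Because every function goes through the mapping, the same analysis code works on logs whose columns have different names.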

## Discovery

### Log statistics

An analysis of the control flow can be done by using metrics on activities and traces.

```{r}
activity_presence <- log_xes %>% activity_presence()
activity_presence %>% plot()
```

The trace coverage metric shows the relationship between the number of different activity sequences (i.e. traces) and the number of cases they cover.

```{r}
trace_coverage <- log_xes %>% trace_coverage("trace")
trace_coverage %>% plot()
```

The trace length metric describes the length of traces, i.e. the number of activity instances for each case. It can be computed at the case, trace and log levels.

```{r}
trace_length <- log_xes %>% trace_length("log")
trace_length %>% plot()
```
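
The three metrics in this section can be reproduced by hand on a small example. The following is a base R sketch on a hypothetical toy log, assuming the rows are already in chronological order within each case; it illustrates the definitions and is not how bupaR computes them internally.

```{r}
# Hypothetical toy log with four cases
toy_log <- data.frame(
  case     = c("A", "A", "A", "B", "B", "C", "C", "C", "D", "D"),
  activity = c("Register", "Triage", "Discharge",
               "Register", "Discharge",
               "Register", "Triage", "Discharge",
               "Register", "Discharge"),
  stringsAsFactors = FALSE
)
n_cases <- length(unique(toy_log$case))

# Activity presence: share of cases in which each activity occurs
presence <- tapply(toy_log$case, toy_log$activity,
                   function(x) length(unique(x)) / n_cases)

# A trace is the sequence of activities of one case
traces <- tapply(toy_log$activity, toy_log$case, paste, collapse = " > ")

# Trace coverage: cumulative share of cases covered, most frequent trace first
trace_freq <- sort(table(traces), decreasing = TRUE)
coverage <- cumsum(as.vector(trace_freq)) / n_cases

# Trace length: number of activity instances per case
lengths_per_case <- tapply(toy_log$activity, toy_log$case, length)

presence
coverage
summary(as.vector(lengths_per_case))
```

In this toy log, two distinct traces cover all four cases, and "Triage" is present in half of them.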

### Process models

#### Frequency maps

A process map of a log can be created using process_map(). A process map is a directly-follows graph, where each distinct activity is represented by a node, and each directly-follows relationship between activities is shown by directed edges, i.e. arrows between the nodes.

```{r}
# Use a name that does not shadow the frequency() function used below
frequent_log <- log_xes %>% filter_trace_frequency(percentage = 0.8)
```

```{r}
frequent_log %>% process_map(frequency("absolute"))
```
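
The directly-follows relation behind a process map can be counted by hand. Below is a base R sketch on a hypothetical toy log, assuming the rows are sorted by case and then by time; it only counts the flows and does not draw anything.

```{r}
# Hypothetical toy log, sorted by case and time
toy_log <- data.frame(
  case     = c("A", "A", "A", "B", "B", "C", "C", "C"),
  activity = c("Register", "Triage", "Discharge",
               "Register", "Discharge",
               "Register", "Triage", "Discharge"),
  stringsAsFactors = FALSE
)

# Pair every event with the next row, then keep pairs within the same case
from      <- head(toy_log$activity, -1)
to        <- tail(toy_log$activity, -1)
same_case <- head(toy_log$case, -1) == tail(toy_log$case, -1)

# Absolute frequency of each directly-follows flow
dfg <- table(from = from[same_case], to = to[same_case])
dfg
```

Each non-zero cell corresponds to one arrow in the graph, and the cell value is the absolute frequency shown on that arrow.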

#### Performance maps

Instead of frequencies, process maps can also visualize the performance of the process: configure the map with performance() instead of frequency().

```{r}
# Use a name that does not shadow the performance() function
performance_map <- log_xes %>% process_map(performance())
performance_map
```

Information about frequencies and performance, or any other value, can be combined in the same graph.

```{r}
frequency_performance <- log_xes %>% process_map(type_nodes = frequency("relative_case"), type_edges = performance(mean))
frequency_performance
```
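
As a rough illustration of what a performance value on an edge represents, the base R sketch below computes the mean time between consecutive events for each flow. The toy log is hypothetical and has a single timestamp per event, which is a simplifying assumption; it is not the exact computation performed by performance().

```{r}
# Hypothetical toy log, sorted by case and time
toy_log <- data.frame(
  case      = c("A", "A", "A", "B", "B", "B"),
  activity  = c("Register", "Triage", "Discharge",
                "Register", "Triage", "Discharge"),
  timestamp = as.POSIXct(c("2015-01-01 08:00:00", "2015-01-01 08:30:00",
                           "2015-01-01 12:30:00", "2015-01-02 09:00:00",
                           "2015-01-02 10:30:00", "2015-01-02 12:30:00"),
                         tz = "UTC"),
  stringsAsFactors = FALSE
)

# Consecutive rows form a flow only when they belong to the same case
same_case <- head(toy_log$case, -1) == tail(toy_log$case, -1)
edge <- paste(head(toy_log$activity, -1), tail(toy_log$activity, -1),
              sep = " -> ")[same_case]

# Time between the two events of each flow, in minutes
gap_mins <- as.numeric(difftime(tail(toy_log$timestamp, -1),
                                head(toy_log$timestamp, -1),
                                units = "mins"))[same_case]

# Mean duration per flow
mean_gap <- tapply(gap_mins, edge, mean)
mean_gap
```

Replacing `mean` with `median` or `max` in the last step mirrors the choice of aggregation function passed to performance().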

#### Animation

The aesthetics of tokens can also be driven by a secondary data frame, regardless of the timestamps at which activities occurred. This is useful if some measurements were taken throughout a process, but the measurement events themselves should not be included in the process map.

For example, the lactic acid measurements could be used in that way:

```{r}
# Extract only the lactic acid measurements
lactic <- log_xes %>%
    mutate(LacticAcid = as.numeric(LacticAcid)) %>%
    filter_activity(c("LacticAcid")) %>%
    as.data.frame() %>%
    select("case" = CASE_concept_name, 
            "time" =  timestamp, 
            value = LacticAcid) # format needs to be 'case,time,value'

# Remove the measurement events from the sepsis log
sepsisBase <- log_xes %>%
    filter_activity(c("LacticAcid", "CRP", "Leucocytes", "Return ER",
                      "IV Liquid", "IV Antibiotics"), reverse = TRUE) %>%
    filter_trace_frequency(percentage = 0.95)

# Animate with the secondary data frame `lactic`
animate_process(sepsisBase, 
                mode = "relative", 
                duration = 300,
                legend = "color", 
                mapping = token_aes(color = token_scale(lactic, 
                                                        scale = "linear", 
                                                        range = c("#fff5eb","#7f2704")))) 
```

### Process visualizations

#### Process matrix

A process matrix is a two-dimensional matrix showing the flows between activities. Its configuration is exactly the same as that used by process_map().

```{r}
matrix_frequency <- log_xes %>% process_matrix(frequency("absolute")) 
matrix_frequency %>% plot()
```

#### Dotted chart

Dotted charts can be made with dotted_chart(). A dotted chart is a graph in which each activity instance is displayed by a dot. The x-axis refers to the time aspect, while the y-axis refers to cases.

```{r}
dotted_chart <- log_xes %>% dotted_chart(x = "absolute")
dotted_chart
```

#### Trace explorer

Different activity sequences in the log can be visualized with trace_explorer(). It can be used to explore frequent as well as infrequent traces.

```{r}
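# Set coverage explicitly; without it, trace_explorer() warns and
# defaults to coverage = 0.2 for frequent traces
trace_explorer <- log_xes %>% trace_explorer(coverage = 0.2)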
trace_explorer
```

#### Performance spectrum

Both detailed and aggregated performance spectra can be created, using ps_detailed() and ps_aggregated() respectively.

```{r}
library(psmineR)

spectrum_detailed <- log_xes %>% ps_detailed()
spectrum_detailed
```

## Filtering

### Infrequent flows

Filtering infrequent flows allows us to select a set of cases in which every directly-follows flow has a minimum frequency. For example, consider the process map below.

```{r}
log_xes %>% process_map()
```

In this map, we can observe several unique directly-follows relations, as well as flows occurring fewer than 30 times. Using the filter, we can remove the cases that lead to these flows as follows:

```{r}
log_xes <- log_xes %>%
  mutate(activity_instance_id = as.character(activity_instance_id))

infrequent_flows <- log_xes %>% filter_infrequent_flows(min_n = 30) %>% process_map()
infrequent_flows
```

### Time period

Filtering cases by time period can be done using filter_time_period(). There are four values of filter_method that act as case filters:

* "start": all cases started in an interval.
* "complete": all cases completed in an interval.
* "contained": all cases contained in an interval.
* "intersecting": all cases with some activity in an interval.

Using January 2015 as the interval, the dotted chart below shows the result of the "start" method. Change filter_method to compare it with the other methods.

```{r}
time_period <- log_xes %>% filter_time_period(interval = ymd(c(20150101, 20150131)), filter_method = "start") %>% dotted_chart() 
time_period
```
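
The difference between the four case-level methods is easiest to see on a tiny example. The base R sketch below applies the definitions from the list above to a hypothetical toy log; it is an illustration of those definitions, not the implementation of filter_time_period().

```{r}
# Hypothetical toy log: first and last event of four cases
toy_log <- data.frame(
  case = c("A", "A", "B", "B", "C", "C", "D", "D"),
  timestamp = as.Date(c("2014-12-20", "2015-01-10",   # A: starts before, ends inside
                        "2015-01-05", "2015-01-20",   # B: fully inside
                        "2015-01-25", "2015-02-03",   # C: starts inside, ends after
                        "2015-02-10", "2015-02-12")), # D: fully outside
  stringsAsFactors = FALSE
)

# Work with day numbers so that all comparisons are plain numeric
day <- as.numeric(toy_log$timestamp)
lo  <- as.numeric(as.Date("2015-01-01"))
hi  <- as.numeric(as.Date("2015-01-31"))

case_start <- tapply(day, toy_log$case, min)
case_end   <- tapply(day, toy_log$case, max)
ids        <- names(case_start)

selected <- list(
  start        = ids[case_start >= lo & case_start <= hi],
  complete     = ids[case_end >= lo & case_end <= hi],
  contained    = ids[case_start >= lo & case_end <= hi],
  intersecting = unique(toy_log$case[day >= lo & day <= hi])
)
selected
```

Case B is selected by every method, cases A and C only by some of them, and case D by none.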

### Case condition

filter_case_condition() can be used to select cases for which a condition holds. This condition can be related to any of the variables in the log.

For example, select all cases in which an age of 85 or higher is recorded.

```{r}
age_85 <- log_xes %>% filter(!is.na(Age)) %>% filter_case_condition(Age >= 85)
age_85
process_map(age_85)
```
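
The idea of a case-level condition can be sketched in base R: a case is selected as soon as one of its events satisfies the condition, and then all events of that case are kept. The toy log below is hypothetical, and the "at least one event" reading is an assumption about how such a filter behaves rather than a statement about the edeaR implementation.

```{r}
# Hypothetical toy log: Age is only recorded on the first event of each case
toy_log <- data.frame(
  case     = c("A", "A", "B", "B", "C", "C"),
  activity = c("Register", "Triage", "Register", "Triage",
               "Register", "Triage"),
  Age      = c(90, NA, 70, NA, 85, NA),
  stringsAsFactors = FALSE
)

# which() ignores events where the condition evaluates to NA
matching_cases <- unique(toy_log$case[which(toy_log$Age >= 85)])

# Keep every event of the matching cases, including events without an Age
filtered <- toy_log[toy_log$case %in% matching_cases, ]
filtered
```

Note the contrast with a plain row filter on `Age >= 85`, which would keep only the two events that carry a matching value.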

### Precedence

filter_precedence() allows us to filter cases based on flows between activities.

If there is more than one antecedent or consequent activity, the filter will test all possible pairs. The filter_method will tell the filter whether all of the rules should hold, at least one, or none are allowed.

The following filter takes only cases where ER Triage is directly followed by Leucocytes.

```{r}
precedence <- log_xes %>%
    filter_precedence(antecedents = "ER Triage",
                      consequents = "Leucocytes",
                      precedence_type = "directly_follows") %>%
    traces()
head(precedence)
```
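
The two precedence types can be compared on a few hand-written traces. The base R sketch below uses hypothetical traces and two helper functions, `directly_follows()` and `eventually_follows()`, which are written for this illustration and are not part of bupaR.

```{r}
# Hypothetical traces, one character vector per case
traces <- list(
  A = c("ER Registration", "ER Triage", "Leucocytes", "CRP"),
  B = c("ER Registration", "ER Triage", "CRP", "Leucocytes"),
  C = c("ER Registration", "Leucocytes", "ER Triage")
)

# TRUE if b occurs immediately after a somewhere in the trace
directly_follows <- function(trace, a, b) {
  n <- length(trace)
  if (n < 2) return(FALSE)
  any(trace[-n] == a & trace[-1] == b)
}

# TRUE if some occurrence of b comes after some occurrence of a
eventually_follows <- function(trace, a, b) {
  pos_a <- which(trace == a)
  pos_b <- which(trace == b)
  length(pos_a) > 0 && length(pos_b) > 0 && min(pos_a) < max(pos_b)
}

direct   <- vapply(traces, directly_follows, logical(1),
                   a = "ER Triage", b = "Leucocytes")
eventual <- vapply(traces, eventually_follows, logical(1),
                   a = "ER Triage", b = "Leucocytes")

rbind(direct, eventual)
```

Case B shows the difference: Leucocytes comes after ER Triage, but another activity sits in between, so only the eventually-follows check holds.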

## Conformance checking

### Rule-based conformance

Using the processcheckR package, procedural rules can be checked in an event log. Checking a rule adds a boolean case attribute, which can be used for filtering or in analysis.

Rules can be checked using the check_rule function (see example below). It will create a new logical variable to indicate for which cases the rule holds. The name of the variable can be configured using the label argument in check_rule.

In the following example, the first rule checks the starting activity, while the second rule checks whether CRP and LacticAcid occur together.

```{r}
log_xes %>%
  # check whether each case starts with "ER Registration"
  check_rule(starts("ER Registration"), label = "r1") %>%
  # check if activities "CRP" and "LacticAcid" occur together
  check_rule(and("CRP","LacticAcid"), label = "r2") %>%
  group_by(r1, r2) %>%
  n_cases()
```
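
To show what the two logical case attributes look like, the base R sketch below evaluates both rules on hypothetical traces. It assumes that the "and" rule means co-existence (both activities occur, or neither does); that reading is an assumption made for this illustration and should be checked against the processcheckR documentation.

```{r}
# Hypothetical traces, one character vector per case
traces <- list(
  A = c("ER Registration", "CRP", "LacticAcid"),
  B = c("ER Registration", "CRP"),
  C = c("ER Triage", "ER Registration"),
  D = c("ER Registration", "ER Triage")
)

# r1: the case starts with "ER Registration"
r1 <- vapply(traces, function(tr) tr[1] == "ER Registration", logical(1))

# r2: "CRP" and "LacticAcid" occur together (assumed: both or neither)
r2 <- vapply(traces,
             function(tr) ("CRP" %in% tr) == ("LacticAcid" %in% tr),
             logical(1))

# Number of cases per combination of rule outcomes
table(r1, r2)
```

The resulting table has the same shape as the grouped case counts: one cell per combination of rule outcomes.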

### Alignments

Alignments are under development and can be used with the bupaRminer package. More information can be found on GitHub: https://github.com/bupaverse/bupaRminer.




