Contents

Part 1 Introduction

  1. Data Science in Action
  2. Process Mining: The Missing Link

Part 2 Preliminaries

  1. Process Modeling and Analysis
  2. Data Mining

Part 3 From Event Logs to Process Models

  1. Getting the Data
  2. Process Discovery: An Introduction
  3. Advanced Process Discovery Techniques

Part 4 Beyond Process Discovery

  1. Conformance Checking
  2. Mining Additional Perspectives
  3. Operational Support

Part 5 Putting Process Mining to Work

  1. Process Mining Software
  2. Process Mining in the Large
  3. Analyzing Lasagna Processes
  4. Analyzing Spaghetti Processes

Part 6 Reflection

  1. Cartography and Navigation
  2. Epilogue

Part 1 Introduction

1. Data Science in Action

This chapter is an introduction to data science and process mining.

Nowadays, data science has become a new and important discipline. It can be viewed as an amalgamation of classical disciplines such as statistics, data mining, and databases.

We need to combine existing approaches to turn abundantly available data into value for individuals, organizations, and society.

This book focuses on the analysis of behavior based on event data.

Process mining techniques use event data to discover processes, check compliance, analyze bottlenecks, compare process variants, and suggest improvements.

Keyword: event data

1.1 Internet of Events

Problem: One of the main challenges of today’s organizations is to extract information and value from data stored in their information systems.

About the Internet of Events:

Process mining aims to exploit event data in a meaningful way, for example, to provide insights, identify bottlenecks, anticipate problems, record policy violations, recommend countermeasures, and streamline processes. This explains our focus on event data.

1.2 Data Science

Definition of data science:

Data science is an interdisciplinary field aiming to turn data into real value. Data may be structured or unstructured, big or small, static or streaming. Value may be provided in the form of predictions, automated decisions, models learned from data, or any type of data visualization delivering insights. Data science includes data extraction, data preparation, data exploration, data transformation, storage and retrieval, computing infrastructures, various types of mining and learning, presentation of explanations and predictions, and the exploitation of results taking into account ethical, social, legal, and business aspects.

Data scientists assist organizations in turning data into value. A data scientist can answer a variety of data-driven questions, such as:

  • (Reporting) What happened?
  • (Diagnosis) Why did it happen?
  • (Prediction) What will happen?
  • (Recommendation) What is the best that can happen?

The ingredients contributing to data science:

Data science is quite broad and located at the intersection of existing disciplines. It is difficult to combine all the different skills needed in a single person.

1.3 Bridging the Gap Between Process Science and Data Science

Process science is an umbrella term for the broader discipline that combines knowledge from information technology and knowledge from management sciences to improve and run operational processes.

Mainstream data science approaches tend to be process agnostic. Data mining, statistics, and machine learning techniques do not consider end-to-end process models. Process science approaches are process-centric, but often focus on modeling rather than learning from event data.

Process mining only recently emerged as a subdiscipline of both data science and process science, but the corresponding techniques can be applied to any type of operational processes (organizations and systems). Example applications include: analyzing treatment processes in hospitals, improving customer service processes in a multinational corporation, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. What all of these applications have in common is that dynamic behavior needs to be related to process models.

1.4 Outlook

The outlook of this book.

Tool: ProM

Part 2 Preliminaries

3. Process Modeling and Analysis

3.1 The Art of Modeling

Notations covered: transition systems, Petri nets, BPMN, C-nets, EPCs, YAWL, and process trees.

Business processes have become more complex, heavily rely on information systems, and may span multiple organizations. Therefore, process modeling has become of the utmost importance.

BPM and operations management have in common that making a good model is “an art rather than a science”. Creating models is therefore a difficult and error-prone task. Typical errors include:

  1. The model describes an idealized version of reality.
  2. Inability to adequately capture human behavior.
  3. The model is at the wrong abstraction level.

These are just some of the problems organizations face when making models by hand. Only experienced designers and analysts can make models that have a good predictive value and can be used as a starting point for a (re)implementation or redesign. An inadequate model can lead to wrong conclusions. Therefore, we advocate the use of event data. Process mining allows for the extraction of models based on facts. Moreover, process mining does not aim at creating a single model of the process. Instead, it provides various views on the same reality at different abstraction levels.

3.2 Process Models

It is not easy to make good process models. Yet, they are important. Fortunately, process mining can facilitate the construction of better models in less time. Process discovery algorithms like the α-algorithm can automatically generate a process model.

Although the α-algorithm produces a Petri net, it is easy to convert the result into a BPMN model, BPEL model, or UML Activity Diagram.

Figure: a transition system having one initial state and one final state.

3.2.1 Transition Systems

The most basic process modeling notation is a transition system. A transition system consists of states and transitions.

The figure above shows a transition system consisting of seven states. It models the handling of a request for compensation within an airline, as described earlier in the book.

The states are represented by black circles. There is one initial state, labeled s1, and one final state, labeled s7. Each state has a unique label. Transitions are represented by arcs. Each transition connects two states and is labeled with the name of an activity. Multiple arcs can bear the same label.

Definition of transition system:

The sets Sstart and Send are defined implicitly. In principle, S can be infinite. However, for most practical applications the state space is finite. In this case the transition system is also referred to as a Finite-State Machine (FSM) or a finite-state automaton.

The transition system depicted in Fig. 3.1 can be formalized as follows: S = {s1, s2, s3, s4, s5, s6, s7}, Sstart = {s1}, Send = {s7}, A = {register request, examine thoroughly, examine casually, check ticket, decide, reinitiate request, reject request, pay compensation}, and T = {(s1, register request, s2), (s2, examine casually, s3), (s2, examine thoroughly, s3), (s2, check ticket, s4), (s3, check ticket, s5), (s4, examine casually, s5), (s4, examine thoroughly, s5), (s5, decide, s6), (s6, reinitiate request, s2), (s6, pay compensation, s7), (s6, reject request, s7)}.
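
To make this formalization concrete, here is a minimal Python sketch (my own, not from the book) that encodes the transition system of Fig. 3.1 as plain data structures and checks whether a sequence of activities is a possible run from the initial state to a final state:

```python
# Minimal sketch: the transition system of Fig. 3.1 as plain Python data.
S = {"s1", "s2", "s3", "s4", "s5", "s6", "s7"}
S_start, S_end = {"s1"}, {"s7"}
T = {
    ("s1", "register request", "s2"),
    ("s2", "examine casually", "s3"),
    ("s2", "examine thoroughly", "s3"),
    ("s2", "check ticket", "s4"),
    ("s3", "check ticket", "s5"),
    ("s4", "examine casually", "s5"),
    ("s4", "examine thoroughly", "s5"),
    ("s5", "decide", "s6"),
    ("s6", "reinitiate request", "s2"),
    ("s6", "pay compensation", "s7"),
    ("s6", "reject request", "s7"),
}

def is_valid_run(trace):
    """Check whether a sequence of activities leads from an initial to a final state."""
    states = set(S_start)
    for activity in trace:
        states = {dst for (src, act, dst) in T if src in states and act == activity}
        if not states:
            return False
    return bool(states & S_end)

print(is_valid_run(["register request", "check ticket", "examine casually",
                    "decide", "pay compensation"]))  # True
print(is_valid_run(["register request", "decide"]))  # False: decide not enabled in s2
```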

Transition systems are simple but have problems expressing concurrency succinctly. Suppose that there are n parallel activities, i.e., all n activities need to be executed but any order is allowed. There are n! possible execution sequences.

3.2.2 Petri Nets (I do not quite understand this part yet)

Petri nets are the oldest and best investigated process modeling language allowing for the modeling of concurrency.

A marked Petri net:

Definition of Petri net:

The Petri net shown above can be formalized as follows: P = {start, c1, c2, c3, c4, c5, end}, T = {a, b, c, d, e, f, g, h}, and F = {(start, a), (a, c1), (a, c2), (c1, b), (c1, c), (c2, d), (b, c3), (c, c3), (d, c4), (c3, e), (c4, e), (e, c5), (c5, f), (f, c1), (f, c2), (c5, g), (c5, h), (g, end), (h, end)}.

Definition of Firing rule:
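
The formal firing rule is given in the book; as an informal illustration, here is a minimal Python sketch (my own) applied to the Petri net formalized above: a transition is enabled if each of its input places holds a token, and firing it consumes one token per input place and produces one token per output place.

```python
from collections import Counter

# Flow relation F of the Petri net formalized above.
F = {("start", "a"), ("a", "c1"), ("a", "c2"), ("c1", "b"), ("c1", "c"),
     ("c2", "d"), ("b", "c3"), ("c", "c3"), ("d", "c4"), ("c3", "e"),
     ("c4", "e"), ("e", "c5"), ("c5", "f"), ("f", "c1"), ("f", "c2"),
     ("c5", "g"), ("c5", "h"), ("g", "end"), ("h", "end")}
TRANSITIONS = {"a", "b", "c", "d", "e", "f", "g", "h"}

def preset(t):   # input places of transition t
    return [p for (p, x) in F if x == t]

def postset(t):  # output places of transition t
    return [p for (x, p) in F if x == t]

def enabled(marking, t):
    """A transition is enabled iff each of its input places contains a token."""
    return all(marking[p] >= 1 for p in preset(t))

def fire(marking, t):
    """Firing consumes one token per input place and produces one per output place."""
    assert enabled(marking, t), f"{t} is not enabled"
    new = Counter(marking)
    new.subtract(Counter(preset(t)))
    new.update(Counter(postset(t)))
    return +new  # drop places with zero tokens

m = Counter({"start": 1})  # initial marking [start]
m = fire(m, "a")           # yields marking [c1, c2]
print(sorted(t for t in TRANSITIONS if enabled(m, t)))  # ['b', 'c', 'd']
```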

Definition of Labeled Petri net:

Definition of reachability graph:

The reachability graph of the marked Petri net shown in the figure above.

Three Petri nets: (a) a Petri net with an infinite state space, (b) a Petri net with only one reachable marking, (c) a Petri net with 7776 reachable markings

3.2.3 Workflow Nets

When modeling business processes in terms of Petri nets, we often consider a subclass of Petri nets known as WorkFlow nets (WF-nets).

Definition of workflow net:

Definition of Soundness:

3.2.4 YAWL

YAWL is both a workflow modeling language and an open-source workflow system. The acronym YAWL stands for “Yet Another Workflow Language”.

YAWL notation:

Process model using the YAWL notation

3.2.5 Business Process Modeling Notation (BPMN)

BPMN is supported by many tool vendors and has been standardized by the OMG.

BPMN notation:

Process model using the BPMN notation:

3.2.6 Event-Driven Process Chains (EPCs)

EPC notation:

Process model using the EPC notation:

3.2.7 Causal Nets (I do not quite understand this yet)

Causal nets are a representation tailored towards process mining. A causal net is a graph where nodes represent activities and arcs represent causal dependencies.

Definition of causal net:

3.2.8 Process Trees

Definition of process tree:

3.3 Model-Based Process Analysis

mainstream approaches for model-based analysis: verification and performance analysis

Verification is concerned with the correctness of a system or process. Performance analysis focuses on flow times, waiting times, utilization, and service levels.

3.3.1 Verification

3.3.2 Performance Analysis

3.3.3 Limitations of Model-Based Analysis

4 Data Mining

This part is skipped in these notes.

Part 3 From Event Logs to Process Models

5 Getting the Data

5.1 Data Sources

The goal of process mining is to answer questions about operational processes. Examples are:

What really happened in the past? Why did it happen? What is likely to happen in the future? When and why do organizations and people deviate? How to control a process better? How to redesign a process to improve its performance?

5.2 Event Logs

Fragment of an event log:

Structure of event logs:
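
In the simplest view, an event log is a collection of cases (traces), each case is a sequence of events, and each event refers to a case, an activity, a timestamp, and possibly further attributes such as resource or cost. A minimal Python sketch of this structure (the records below are purely illustrative toy data):

```python
# Minimal sketch of an event log: event records grouped into cases (traces).
from collections import defaultdict

events = [  # illustrative records: (case id, activity, timestamp, resource)
    ("case1", "register request",   "2024-01-02 09:00", "Pete"),
    ("case1", "examine casually",   "2024-01-02 10:15", "Mike"),
    ("case1", "check ticket",       "2024-01-02 11:00", "Ellen"),
    ("case1", "decide",             "2024-01-03 09:30", "Sara"),
    ("case1", "pay compensation",   "2024-01-04 12:00", "Ellen"),
    ("case2", "register request",   "2024-01-02 09:20", "Mike"),
    ("case2", "check ticket",       "2024-01-02 13:00", "Mike"),
    ("case2", "examine thoroughly", "2024-01-03 08:00", "Sue"),
    ("case2", "decide",             "2024-01-03 15:00", "Sara"),
    ("case2", "reject request",     "2024-01-04 10:00", "Pete"),
]

# Group events into traces: each case becomes an ordered sequence of activities.
log = defaultdict(list)
for case_id, activity, timestamp, resource in sorted(events, key=lambda e: (e[0], e[2])):
    log[case_id].append(activity)

for case_id, trace in log.items():
    print(case_id, "->", trace)
```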

5.3 XES

MXML (Mining eXtensible Markup Language).

XES (eXtensible Event Stream) is the successor of MXML.

5.4 Data Quality

Data quality is very important for process mining.

6 Process Discovery: An Introduction

From event data to a process model.

Key technique: the α-algorithm.

6.1 Problem Statement

6.2 A Simple Algorithm for Process Discovery

This part introduces the α-algorithm.
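
The α-algorithm starts from the log-based ordering relations that form the footprint of the log: a > b if a is directly followed by b somewhere in the log, a → b if a > b but not b > a, a ∥ b if both hold, and a # b if neither holds. A minimal sketch (my own, assuming the log is simply a list of activity sequences) that computes this footprint; the α-algorithm then derives places and transitions from it:

```python
# Illustrative event log: each trace is a sequence of activities.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]

activities = sorted({a for trace in log for a in trace})

# Directly-follows relation: a > b iff a is immediately followed by b in some trace.
directly_follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}

def relation(a, b):
    """Footprint relation between two activities: ->, <-, ||, or #."""
    ab, ba = (a, b) in directly_follows, (b, a) in directly_follows
    if ab and not ba:
        return "->"
    if ba and not ab:
        return "<-"
    if ab and ba:
        return "||"
    return "#"

# Print the footprint matrix.
print("  " + " ".join(f"{b:>2}" for b in activities))
for a in activities:
    print(f"{a} " + " ".join(f"{relation(a, b):>2}" for b in activities))
```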

6.3 Rediscovering Process Models

6.4 Challenges

Representational bias; noise and incompleteness.

7. Advanced Process Discovery Techniques

7.1 Overview

Overview of the challenges that process discovery techniques need to address.

Typical characteristics of process discovery algorithms:

7.1.1 Characteristic 1: Representational Bias

7.1.2 Characteristic 2: Ability to Deal With Noise

7.1.3 Characteristic 3: Completeness Notion Assumed

7.1.4 Characteristic 4: Approach Used

7.2 Heuristic Mining

Causal Nets

Both the representational bias provided by causal nets and the usage of frequencies make the approach much more robust than most other approaches.
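
Heuristic mining takes the frequencies of the directly-follows relation into account. A central ingredient is the dependency measure between two activities a and b, computed as (|a > b| − |b > a|) / (|a > b| + |b > a| + 1). A minimal sketch (my own, assuming the log is a list of activity sequences):

```python
from collections import Counter

# Illustrative log with frequencies: 40 traces <a,b,d>, 35 traces <a,c,d>, 2 traces <a,d>.
log = 40 * [["a", "b", "d"]] + 35 * [["a", "c", "d"]] + 2 * [["a", "d"]]

# Count the directly-follows frequencies |a > b|.
df = Counter((x, y) for trace in log for x, y in zip(trace, trace[1:]))

def dependency(a, b):
    """Heuristic-miner style dependency measure in [-1, 1]."""
    ab, ba = df[(a, b)], df[(b, a)]
    return (ab - ba) / (ab + ba + 1)

print(f"a => b: {dependency('a', 'b'):.2f}")  # ~0.98: frequent and one-directional
print(f"a => d: {dependency('a', 'd'):.2f}")  # ~0.67: based on only 2 direct observations
print(f"d => a: {dependency('d', 'a'):.2f}")  # negative: d never directly precedes a
```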

7.3 Genetic Process Mining

The α-algorithm and techniques for heuristic and fuzzy mining provide process models in a direct and deterministic manner. Evolutionary approaches use an iterative procedure to mimic the process of natural evolution.

7.4 Region-Based Mining

In the context of Petri nets, researchers have studied the so-called synthesis problem, i.e., constructing a system model from a description of its behavior. State-based regions can be used to construct a Petri net from a transition system. Language-based regions can be used to construct a Petri net from a prefix-closed language. Synthesis approaches based on language-based regions can be applied directly to an event log. To apply state-based regions, one first needs to create a transition system.

7.5 Inductive Mining

The inductive mining framework is highly extendible and allows for many variants of the basic approach. The “family” of inductive mining techniques includes members that can handle infrequent behavior and deal with huge models and logs while ensuring formal correctness criteria such as the ability to rediscover the original model (in the limit). The results returned by these techniques can easily be converted to other notations ranging from Petri nets and BPMN models to process calculi and statecharts. Inductive mining is currently one of the leading process discovery approaches due to its flexibility, formal guarantees and scalability.
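
As a practical aside (not part of the book): the open-source Python library pm4py ships implementations of the inductive miner. Assuming a recent pm4py version and an XES file at a hypothetical path, discovery might look roughly as follows; the exact function names may differ between versions:

```python
# Hedged sketch: applying an inductive-miner implementation via pm4py.
# Function names are from pm4py's simplified interface and may vary by version.
import pm4py

log = pm4py.read_xes("running-example.xes")  # hypothetical file path

# Discover a process tree, then convert it to a Petri net.
tree = pm4py.discover_process_tree_inductive(log)
net, initial_marking, final_marking = pm4py.convert_to_petri_net(tree)

pm4py.view_process_tree(tree)  # requires graphviz to render
```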

7.6 Historical Perspective

An introduction to the history of process mining.

Part 4 Beyond Process Discovery

8 Conformance Checking

Conformance checking relates events in the event log to activities in the process model and compares both. The goal is to find commonalities and discrepancies between the modeled behavior and the observed behavior. Conformance checking is relevant for business alignment and auditing. For example, the event log can be replayed on top of the process model to find undesirable deviations suggesting fraud or inefficiencies. Moreover, conformance checking techniques can also be used for measuring the performance of process discovery algorithms and to repair models that are not aligned well with reality.

8.1 Business Alignment and Auditing

8.2 Token Replay
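
While replaying a trace, token replay maintains four counters: produced tokens p, consumed tokens c, missing tokens m (tokens that had to be added artificially to fire a transition), and remaining tokens r (tokens left behind at the end). Fitness is then 1/2 (1 − m/c) + 1/2 (1 − r/p). A minimal sketch of just this computation (the replay that produces the counters is omitted):

```python
def replay_fitness(produced, consumed, missing, remaining):
    """Token-replay fitness: 1 means the trace fits perfectly; the value drops
    as tokens have to be added (missing) or are left behind (remaining)."""
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# Perfectly fitting trace: no missing and no remaining tokens.
print(replay_fitness(produced=6, consumed=6, missing=0, remaining=0))  # 1.0

# Non-fitting trace: one token had to be created and one was left behind.
print(replay_fitness(produced=7, consumed=7, missing=1, remaining=1))  # ~0.857
```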

8.3 Alignments

If a case does not fit, token replay does not create a corresponding path through the model. Alignments map observed behavior onto modeled behavior to provide better diagnostics and also relate non-fitting cases to the model.

8.4 Comparing Footprints

8.5 Other Applications of Conformance Checking

Conformance checking can be used for improving the alignment of business processes, organizations, and information systems. As shown, replay techniques and footprint analysis help to identify differences between a process model and the real process as recorded in the event log. The differences identified may lead to changes of the model or process. For example, exposing deviations between the model and process may lead to better work instructions or changes in management. Conformance checking is also a useful tool for auditors that need to make sure that processes are executed within the boundaries set by various stakeholders.

9 Mining Additional Perspectives

Timestamps and frequencies of activities can be used to identify bottlenecks and diagnose other performance related problems.

9.1 Perspectives

9.2 Attributes: A Helicopter View

9.3 Organizational Mining

Organizational mining focuses on the organizational perspective

  1. Social Network Analysis
  2. Discovering Organizational Structures
  3. Analyzing Resource Behavior

9.4 Time and Probabilities

The presence of timestamps enables the discovery of bottlenecks, the analysis of service levels, the monitoring of resource utilization, and the prediction of remaining processing times of running cases.
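
For example, if every event carries a timestamp, the time between consecutive events of the same case can be aggregated per activity to see where cases spend most of their time. A minimal sketch under that assumption (toy data, my own attribute layout):

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Toy events: (case id, activity, completion timestamp).
events = [
    ("c1", "register request", "2024-01-02 09:00"),
    ("c1", "check ticket",     "2024-01-02 09:30"),
    ("c1", "decide",           "2024-01-04 16:00"),
    ("c2", "register request", "2024-01-02 10:00"),
    ("c2", "check ticket",     "2024-01-02 11:00"),
    ("c2", "decide",           "2024-01-05 09:00"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

# Group events per case, ordered by time.
cases = defaultdict(list)
for case, activity, ts in sorted(events, key=lambda e: (e[0], e[2])):
    cases[case].append((activity, parse(ts)))

# Time elapsed before each activity (since the previous event in the same case).
elapsed = defaultdict(list)
for trace in cases.values():
    for (prev_act, prev_ts), (act, ts) in zip(trace, trace[1:]):
        elapsed[act].append((ts - prev_ts).total_seconds() / 3600)

for act, hours in elapsed.items():
    print(f"{act}: avg {mean(hours):.1f} h before completion")
# In this toy log, 'decide' stands out as the bottleneck.
```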

9.5 Decision Mining

Decision mining aims to find rules explaining the choices made in the process in terms of characteristics of the case.
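
Decision mining is commonly set up as a classification problem: the case attributes are the features, the branch taken at a choice point is the class, and a decision tree yields readable rules. A hedged sketch using scikit-learn (the library, attribute names, and data are my own illustration, not the book's tooling):

```python
# Hedged sketch: learning a rule for the choice "pay compensation" vs "reject request"
# from case attributes, using scikit-learn's decision tree.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy case attributes: (claimed amount in EUR, policyholder is a frequent flyer).
X = [[100, 0], [250, 1], [900, 0], [1200, 1], [80, 1], [1500, 0]]
y = ["pay", "pay", "reject", "reject", "pay", "reject"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["amount", "frequent_flyer"]))
# The printed rules explain the observed choices, e.g. a split on "amount".
```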

9.6 Bringing It All Together

Steps:

  1. Step 1: obtain an event log.
  2. Step 2: create or discover a process model.
  3. Step 3: connect events in the log to activities in the model.
  4. Step 4: extend the model. Step 4a: add the organizational perspective. Step 4b: add the time perspective. Step 4c: add the case perspective. Step 4d: add other perspectives.
  5. Step 5: return the integrated model.

10 Operational Support

Today, however, many data sources are updated in (near) real-time and sufficient computing power is available to analyze events when they occur. Therefore, process mining should not be restricted to off-line analysis and can also be used for online operational support.

10.1 Refined Process Mining Framework

Refined process mining framework:

10.2 Online Process Mining

10.3 Detect

This can be seen as conformance checking “on-the-fly”.

10.4 Predict

10.5 Recommend

A recommendation is always given with respect to a specific goal.

10.6 Processes Are Not in Steady State!

There are three challenges when dealing with concept drift:

  1. Change point detection: Did the process change? If so, when did it change? (See the sketch below.)
  2. Change localization and characterization: What has changed?
  3. Change process discovery: How to capture and predict “second-order” dynamics?
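
As a simple illustration of the first challenge only (my own sketch, not a technique from the book), one can compare the directly-follows frequencies of consecutive windows of cases and look for windows where the profile shifts markedly:

```python
from collections import Counter

def df_profile(traces):
    """Relative directly-follows frequencies of a window of traces."""
    c = Counter((x, y) for t in traces for x, y in zip(t, t[1:]))
    total = sum(c.values()) or 1
    return {k: v / total for k, v in c.items()}

def distance(p, q):
    """L1 distance between two directly-follows profiles."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

# Toy stream of cases: the process changes after the first 50 cases.
log = 50 * [["a", "b", "c", "d"]] + 50 * [["a", "c", "b", "d"]]

window = 20
for i in range(0, len(log) - 2 * window, window):
    d = distance(df_profile(log[i:i + window]),
                 df_profile(log[i + window:i + 2 * window]))
    print(f"cases {i}-{i + 2 * window}: profile distance {d:.2f}")
# Windows that straddle case 50 show a large distance, hinting at a change point there.
```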

Part 5 Putting Process Mining to Work

11. Process Mining Software

Process discovery, conformance checking, social network analysis, organizational mining, clustering, decision mining, prediction, and recommendation are all supported by ProM plug-ins. However, the usability of the hundreds of available plug-ins varies and the complexity of the tool may be overwhelming for end-users. In recent years, several vendors released dedicated process mining tools (e.g., Celonis, Disco, EDS, Fujitsu, Minit, myInvenio, Perceptive, PPM, QPR, Rialto, and SNP). These tools typically provide less functionality than ProM, but tend to be easier to use.

11.1 Process Mining Not Included!

This section talks about Business Intelligence (BI).

BI: The broad definition provided by Forrester is “BI is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making”.

The typical functionality provided by these products includes:

  1. ETL (Extract, Transform, and Load).
  2. Ad-hoc querying.
  3. Reporting.
  4. Interactive dashboards.
  5. Alert generation.

The mainstream BI products from vendors such as IBM, Oracle, SAP, and Microsoft do not support process mining.

Data mining tool: Weka.

11.2 Different Types of Process Mining Tools

11.3 ProM: An Open-Source Process Mining Platform

This section talks about ProM.

ProM is free and open-source.

11.4 Commercial Software

12. Process Mining in the Large

About big event data.

13. Analyzing “Lasagna Processes”

13.1 Characterization of “Lasagna Processes”

13.2 Use Cases

The goal of process mining is to improve operational processes. In order to judge whether process mining efforts are successful, we need to define Key Performance Indicators (KPIs).

Three classes of KPIs: KPIs related to time (e.g., lead time, service time, waiting time, and synchronization time), KPIs related to costs, and KPIs related to quality.

To evaluate suggested improvements, the effectiveness and efficiency of the as-is and to-be processes need to be quantified in terms of KPIs.

For Lasagna processes, process mining can result in one or more of the following improvement actions:

  1. Redesign. Insights obtained using process mining can trigger changes to the process, e.g., sequential activities no longer need to be executed in a fixed order, checks may be skipped for easy cases, decisions can be delegated if more than 50 cases are queueing, etc. Fraud detected using process mining may result in additional compliance regulations, e.g., introducing the 4-eyes principle for critical activities.
  2. Adjust. Similarly, process mining can result in (temporary) adjustments. For example, insights obtained using process mining can be used to temporarily allocate more resources to the process and to lower the threshold for delegation.
  3. Intervene. Process mining may also reveal problems related to particular cases or resources. This may trigger interventions such as aborting a case that has been queuing for more than 3 months or disciplinary measures for a worker that repeatedly violated compliance regulations.
  4. Support. Process mining can be used for operational support, e.g., based on historic information a process mining tool can predict the remaining flow time or recommend the action with the lowest expected costs.

13.3 Approach

  1. Stage 0: Plan and Justify
  2. Stage 1: Extract
  3. Stage 2: Create Control-Flow Model and Connect Event Log
  4. Stage 3: Create Integrated Process Model
  5. Stage 4: Operational Support

The L∗ life-cycle model describing a process mining project consisting of five stages: plan and justify (Stage 0), extract (Stage 1), create control-flow model and connect event log (Stage 2), create integrated process model (Stage 3), and operational support (Stage 4)

13.4 Applications

This section discusses application cases of process mining.

14. Analyzing “Spaghetti Processes”

14.1 Characterization of “Spaghetti Processes”

14.2 Approach

14.3 Applications

This section discusses application cases.

Part 6 Reflection

15. Cartography and Navigation

Process models can be seen as the “maps” describing the operational processes of organizations. Similarly, information systems can be looked at as “navigation systems” guiding the flow of work in organizations. Unfortunately, many organizations fail in creating and maintaining accurate business process maps. Often process models are outdated and have little to do with reality. Moreover, most information systems fail to provide the functionality offered by today’s navigation systems.

15.1 Business Process Maps

Process models can be seen as the “business process maps” describing the operational processes of organizations [138]. Unfortunately, accurate business process maps are typically missing. Process models tend to be outdated and not aligned with reality. Moreover, unlike geographic maps, process models are typically not well understood by end users.

  1. Map Quality
  2. Aggregation and Abstraction
  3. Seamless Zoom
  4. Size, Color, and Layout
  5. Customization

16. Epilogue

16.1 Process Mining as a Bridge Between Data Mining and Business Process Management

Process mining is an important tool for modern organizations that need to manage non-trivial operational processes. On the one hand, there is an incredible growth of event data. On the other hand, processes and information need to be aligned perfectly in order to meet requirements related to compliance, efficiency, and customer service. The digital universe and the physical universe are amalgamating into one universe where events are recorded as they happen and processes are guided and controlled based on event data.

16.2 Challenges

  1. Process discovery is probably the most important and most visible intellectual challenge related to process mining. As shown, it is far from trivial to construct a process model based on event logs that are incomplete and noisy.

  2. Another challenge is the notion of concept drift, i.e., the process may change while it is being analyzed.

  3. Conformance checking is not well supported by today’s commercial process mining tools. There is a need for better-performing conformance checking techniques.

  4. Process mining heavily depends on the ability to extract suitable event logs. The scope and granularity of an event log should match the questions one would like to answer. Unfortunately, in some information systems event data are just a byproduct for debugging or scattered over many tables. Some systems also “forget” events, e.g., when a record is updated, the old values are simply overwritten.

  5. Another challenge is to produce process models that have a quality and understandability comparable to geographic maps.