This chapter is an introduction to data science and process mining.
Nowadays, data science has become a new and important discipline. It can be viewed as an amalgamation of classical disciplines such as statistics, data mining, and databases.
We need to combine existing approaches to turn abundantly available data into value for individuals, organizations, and society.
This book focuses on the analysis of behavior based on event data.
Process mining techniques use event data to discover processes, check compliance, analyze bottlenecks, compare process variants, and suggest improvements.
Keyword: event data
Problem: One of the main challenges of today’s organizations is to extract information and value from data stored in their information systems.
About the Internet of Events:
Process mining aims to exploit event data in a meaningful way, for example, to provide insights, identify bottlenecks, anticipate problems, record policy violations, recommend countermeasures, and streamline processes. This explains our focus on event data.
Definition of data science:
Data science is an interdisciplinary field aiming to turn data into real value. Data may be structured or unstructured, big or small, static or streaming. Value may be provided in the form of predictions, automated decisions, models learned from data, or any type of data visualization delivering insights. Data science includes data extraction, data preparation, data exploration, data transformation, storage and retrieval, computing infrastructures, various types of mining and learning, presentation of explanations and predictions, and the exploitation of results taking into account ethical, social, legal, and business aspects.
Data scientists assist organizations in turning data into value. A data scientist can answer a variety of data-driven questions.
The ingredients contributing to data science:
Data science is quite broad and located at the intersection of existing disciplines. It is difficult to combine all the different skills needed in a single person.
Process science is an umbrella term for the broader discipline that combines knowledge from information technology and knowledge from management sciences to improve and run operational processes.
Mainstream data science approaches tend to be process agnostic. Data mining, statistics and machine learning techniques do not consider end-to-end process models. Process science approaches are process-centric, but often focus on modeling rather than learning from event data.
Process mining only recently emerged as a subdiscipline of both data science and process science, but the corresponding techniques can be applied to any type of operational processes (organizations and systems). Example applications include: analyzing treatment processes in hospitals, improving customer service processes in a multinational corporation, understanding the browsing behavior of customers using a booking site, analyzing failures of a baggage handling system, and improving the user interface of an X-ray machine. What all of these applications have in common is that dynamic behavior needs to be related to process models.
The goal of process mining is to use event data to extract process-related information, e.g., to automatically discover a process model by observing events recorded by some enterprise system.
Process mining is both data-driven and process-centric: using a combination of event data and process models, a wide range of conformance and performance questions can be answered.
The book then discusses process models.
A Petri net modeling the handling of compensation requests
The same process modeled in terms of BPMN
The BPM life-cycle showing the different uses of process models
Process models play a dominant role in the (re)design and configuration/implementation phases, whereas data plays a dominant role in the enactment/monitoring and diagnosis/requirements phases.
Until recently, there were few connections between the data produced while executing the process and the actual process design.
Process mining is a relatively young research discipline that sits between machine learning and data mining on the one hand and process modeling and analysis on the other hand. The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today’s systems.
Positioning of the three main types of process mining: discovery, conformance, and enhancement:
This part discusses an example.
α-algorithm
One of the key elements of process mining is the emphasis on establishing a strong relation between a process model and “reality” captured in the form of an event log.
Three ways of relating event logs (or other sources of information containing example behavior) and process models: Play-In, Play-Out, and Replay
The process mining spectrum is quite broad and extends far beyond process discovery and conformance checking. Process mining also connects data science and process science.
Business Process Management (BPM) is the discipline that combines approaches for the design, execution, control, measurement and optimization of business processes.
Process mining can be best related to BPM by looking at the so-called BPM life-cycle.
This chapter also introduces more about BPM.
Data mining techniques aim to analyze (often large) data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.
Like process mining, data mining is data-driven.
Unlike process mining, mainstream data mining techniques are typically not process-centric. Process models expressed in terms of Petri nets or BPMN diagrams cannot be discovered or analyzed in any way by the main data mining tools.
Lean Six Sigma is a methodology that combines ideas from lean manufacturing and Six Sigma. The idea is to improve performance by systematically removing waste.
Lean principles originate from the Japanese manufacturing industry. The Toyota Production System (TPS) is a well-known example of a lean manufacturing approach.
The main objectives of TPS are to eliminate “muri” (overburdening of people and equipment), “mura” (unevenness in operations), and “muda” (waste). The emphasis is on waste (“muda”) reduction.
Seven types of waste:
1. Transportation waste
2. Inventory waste
3. Motion waste
4. Unnecessary waiting
5. Over-processing waste
6. Overproduction waste
7. Defects
A typical Lean Six Sigma project follows the so-called DMAIC approach consisting of five steps: Define, Measure, Analyze, Improve, and Control.
Business Process Reengineering (BPR) is a management approach
There is no clear definition of Business Intelligence (BI). On the one hand, it is a very broad term that includes anything that aims at providing actionable information that can be used to support decision making. On the other hand, vendors and consultants tend to conveniently skew the definition towards a particular tool or methodology.
The focus is on querying and reporting combined with simple visualization techniques showing dashboards and scorecards. Some systems provide data mining capabilities or support Online Analytical Processing (OLAP).
Under the BI umbrella, many fancy terms have been introduced to refer to rather simple reporting and dashboard tools.
Process mining complements Complex Event Processing (CEP). CEP combines data from multiple sources to infer events or patterns that suggest higher-level events. The goal of CEP is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible.
Corporate governance, risk, and compliance.
Because conformance checking can be used to reveal deviations, defects, and near incidents, it is a valuable tool to check compliance and manage risks.
Automated Business Process Discovery (ABPD)
Business Process Intelligence (BPI)
Workflow Mining (WM)
The “four V’s of Big Data”: Volume, Velocity, Variety, and Veracity.
These reflect the typical characteristics of some of the exciting new data sources interesting for analysis. Big Data does not focus on a particular type of analysis and is not limited to process-related data.
transition systems, Petri nets, BPMN, C-nets, EPCs, YAWL, and process trees
Business processes have become more complex, heavily rely on information systems, and may span multiple organizations. Therefore, process modeling has become of the utmost importance.
BPM and operations management have in common that making a good model is “an art rather than a science”. Creating models is therefore a difficult and error-prone task. Typical errors include:
These are just some of the problems organizations face when making models by hand. Only experienced designers and analysts can make models that have a good predictive value and can be used as a starting point for a (re)implementation or redesign. An inadequate model can lead to wrong conclusions. Therefore, we advocate the use of event data. Process mining allows for the extraction of models based on facts. Moreover, process mining does not aim at creating a single model of the process. Instead, it provides various views on the same reality at different abstraction levels.
It is not easy to make good process models. Yet, they are important. Fortunately, process mining can facilitate the construction of better models in less time. Process discovery algorithms like the α-algorithm can automatically generate a process model.
Although the α-algorithm produces a Petri net, it is easy to convert the result into a BPMN model, BPEL model, or UML activity diagram.
A transition system having one initial state and one final state
The most basic process modeling notation is a transition system. A transition system consists of states and transitions.
The figure above shows a transition system consisting of seven states. It models the handling of a request for compensation within an airline as described in Sect.
The states are represented by black circles. There is one initial state labeled s1 and one final state labeled s7. Each state has a unique label. Transitions are represented by arcs. Each transition connects two states and is labeled with the name of an activity. Multiple arcs can bear the same label.
Definition of transition systems:
The sets S^start and S^end are defined implicitly. In principle, S can be infinite. However, for most practical applications the state space is finite. In this case the transition system is also referred to as a Finite-State Machine (FSM) or a finite-state automaton.
The transition system depicted in Fig. 3.1 can be formalized as follows: S = {s1, s2, s3, s4, s5, s6, s7}, S^start = {s1}, S^end = {s7}, A = {register request, examine thoroughly, examine casually, check ticket, decide, reinitiate request, reject request, pay compensation}, and T = {(s1, register request, s2), (s2, examine casually, s3), (s2, examine thoroughly, s3), (s2, check ticket, s4), (s3, check ticket, s5), (s4, examine casually, s5), (s4, examine thoroughly, s5), (s5, decide, s6), (s6, reinitiate request, s2), (s6, pay compensation, s7), (s6, reject request, s7)}.
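To make the formalization concrete, here is a minimal Python sketch that encodes the transition relation T listed above and checks whether a sequence of activities leads from the initial state to the final state. The function name `accepts` and the lookup table `step` are illustrative choices, not notation from the book.

```python
from collections import defaultdict

# Initial states, final states, and transitions copied from the formalization.
S_START = {"s1"}
S_END = {"s7"}
T = {
    ("s1", "register request", "s2"),
    ("s2", "examine casually", "s3"),
    ("s2", "examine thoroughly", "s3"),
    ("s2", "check ticket", "s4"),
    ("s3", "check ticket", "s5"),
    ("s4", "examine casually", "s5"),
    ("s4", "examine thoroughly", "s5"),
    ("s5", "decide", "s6"),
    ("s6", "reinitiate request", "s2"),
    ("s6", "pay compensation", "s7"),
    ("s6", "reject request", "s7"),
}

# Index transitions by (source state, activity) for quick lookup.
step = defaultdict(set)
for src, activity, dst in T:
    step[(src, activity)].add(dst)


def accepts(trace):
    """Return True if the trace leads from an initial state to a final state."""
    current = set(S_START)
    for activity in trace:
        current = {dst for s in current for dst in step.get((s, activity), ())}
        if not current:
            return False  # no transition with this label is possible here
    return bool(current & S_END)


print(accepts(["register request", "check ticket", "examine casually",
               "decide", "pay compensation"]))  # True
print(accepts(["register request", "decide"]))  # False
```

The second trace is rejected because no arc labeled "decide" leaves state s2.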
Transition systems are simple but have problems expressing concurrency succinctly. Suppose that there are n parallel activities, i.e., all n activities need to be executed but any order is allowed. There are n! possible execution sequences.
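The blow-up can be checked with a small sketch. It assumes a state is the set of activities executed so far, which gives 2^n states and n·2^(n-1) transitions; these two counts are not stated in the notes but follow from that encoding. The helper name `interleaving_system` is hypothetical.

```python
from itertools import combinations
from math import factorial


def interleaving_system(activities):
    """Build the transition system for activities that may run in any order.

    A state is the frozenset of activities executed so far.
    """
    acts = list(activities)
    states = [frozenset(c) for k in range(len(acts) + 1)
              for c in combinations(acts, k)]
    transitions = [(s, a, s | {a}) for s in states for a in acts if a not in s]
    return states, transitions


# Columns: n, number of states, number of transitions, number of sequences.
for n in range(1, 6):
    states, transitions = interleaving_system(range(n))
    print(n, len(states), len(transitions), factorial(n))
```

Already for n = 5 the transition system needs 32 states and 80 transitions to describe 120 possible orderings, which motivates notations with explicit concurrency.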
Petri nets are the oldest and best investigated process modeling language allowing for the modeling of concurrency.
A marked Petri net:
Definition of Petri net:
The Petri net shown above can be formalized as follows: P = {start, c1, c2, c3, c4, c5, end}, T = {a, b, c, d, e, f, g, h}, and F = {(start, a), (a, c1), (a, c2), (c1, b), (c1, c), (c2, d), (b, c3), (c, c3), (d, c4), (c3, e), (c4, e), (e, c5), (c5, f), (f, c1), (f, c2), (c5, g), (c5, h), (g, end), (h, end)}.
Definition of firing rule:
Definition of labeled Petri net:
Definition of reachability graph:
The reachability graph of the marked Petri net shown in Fig
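As a rough illustration of the firing rule and the reachability graph, the sketch below encodes the flow relation F given earlier and explores all reachable markings breadth-first. The initial marking (one token in place `start`) is an assumption, since the notes do not record it, and the helper names `enabled`, `fire`, `key`, and `reachability_graph` are illustrative.

```python
from collections import Counter, deque

# Flow relation F copied from the formalization of the Petri net.
F = [
    ("start", "a"), ("a", "c1"), ("a", "c2"), ("c1", "b"), ("c1", "c"),
    ("c2", "d"), ("b", "c3"), ("c", "c3"), ("d", "c4"), ("c3", "e"),
    ("c4", "e"), ("e", "c5"), ("c5", "f"), ("f", "c1"), ("f", "c2"),
    ("c5", "g"), ("c5", "h"), ("g", "end"), ("h", "end"),
]
TRANSITIONS = set("abcdefgh")

# Input and output places of every transition, as multisets.
pre = {t: Counter(p for p, x in F if x == t) for t in TRANSITIONS}
post = {t: Counter(p for x, p in F if x == t) for t in TRANSITIONS}


def enabled(marking, t):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking[p] >= n for p, n in pre[t].items())


def fire(marking, t):
    """Consume one token per input arc and produce one per output arc."""
    result = Counter(marking)
    result.subtract(pre[t])
    result.update(post[t])
    return +result  # unary plus drops places with zero tokens


def key(marking):
    """Hashable form of a marking (a multiset of places)."""
    return tuple(sorted(marking.elements()))


def reachability_graph(initial):
    """Breadth-first exploration of all markings reachable from `initial`."""
    seen = {key(initial)}
    edges = []
    queue = deque([initial])
    while queue:
        m = queue.popleft()
        for t in sorted(TRANSITIONS):
            if enabled(m, t):
                m2 = fire(m, t)
                edges.append((key(m), t, key(m2)))
                if key(m2) not in seen:
                    seen.add(key(m2))
                    queue.append(m2)
    return seen, edges


# Assumption: the initial marking puts a single token in place `start`.
markings, edges = reachability_graph(Counter({"start": 1}))
print(len(markings), len(edges))  # 7 11
```

Under that assumed initial marking the exploration finds seven markings and eleven edges, the same size as the transition system for the compensation request process. The search only terminates for nets with a finite state space.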
Three Petri nets: (a) a Petri net with an infinite state space, (b) a Petri net with only one reachable marking, (c) a Petri net with 7776 reachable markings
When modeling business processes in terms of Petri nets, we often consider a subclass of Petri nets known as WorkFlow nets (WF-nets).
Definition of workflow net:
Definition of Soundness:
YAWL is both a workflow modeling language and an open-source workflow system. The acronym YAWL stands for “Yet Another Workflow Language”.
YAWL notation
Process model using the YAWL notation
Business Process Modeling Notation (BPMN)
BPMN is supported by many tool vendors and has been standardized by the OMG
BPMN notation:
Process model using the BPMN notation:
EPC notation:
Process model using the EPC notation:
Causal nets are a representation tailored towards process mining. A causal net is a graph where nodes represent activities and arcs represent causal dependencies.
Definition of causal net:
Definition of process tree:
Mainstream approaches for model-based analysis: verification and performance analysis
Verification is concerned with the correctness of a system or process. Performance analysis focuses on flow times, waiting times, utilization, and service levels.
skip this part
The goal of process mining is to answer questions about operational processes. Examples are:
What really happened in the past? Why did it happen? What is likely to happen in the future? When and why do organizations and people deviate? How to control a process better? How to redesign a process to improve its performance?
Fragment of an event log
Structure of event logs:
MXML (Mining eXtensible Markup Language)
XES (eXtensible Event Stream) is the successor of MXML.
Data quality is very important for process mining.
From event data to process model
α-algorithm
This part introduces the α-algorithm.
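The notes do not record the algorithm itself. As a partial sketch, the code below computes only its starting point: the directly-follows relation and the resulting footprint (causality, parallelism, no relation) for a hypothetical toy log. It does not construct the places of the Petri net, so it should not be read as the full α-algorithm.

```python
from itertools import product

# Hypothetical toy log: each trace is the activity sequence of one case.
log = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
    ("a", "e", "d"),
]

activities = sorted({a for trace in log for a in trace})

# x > y: x is directly followed by y somewhere in the log.
directly_follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}


def relation(x, y):
    """Classify the ordering relation between two activities."""
    xy = (x, y) in directly_follows
    yx = (y, x) in directly_follows
    if xy and not yx:
        return "->"  # causality: x is followed by y, never the reverse
    if yx and not xy:
        return "<-"  # reverse causality
    if xy and yx:
        return "||"  # both orders occur: potentially parallel
    return "#"       # the two never directly follow each other


footprint = {(x, y): relation(x, y) for x, y in product(activities, repeat=2)}

# Print the footprint as a matrix.
print("   " + "  ".join(f"{a:>2}" for a in activities))
for x in activities:
    print(f"{x:>2} " + "  ".join(f"{footprint[(x, y)]:>2}" for y in activities))
```

In this toy log, b and c occur in both orders and are therefore classified as parallel, while a is a cause of b, c, and e.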
Representational bias, noise, and incompleteness
Overview of the challenges that process discovery techniques need to address
typical characteristics of process discovery algorithms.
Causal Nets
Both the representational bias provided by causal nets and the usage of frequencies make the approach much more robust than most other approaches.
The α-algorithm and techniques for heuristic and fuzzy mining provide process models in a direct and deterministic manner. Evolutionary approaches use an iterative procedure to mimic the process of natural evolution.
In the context of Petri nets, researchers have been studying the so-called synthesis problem, i.e., constructing a system model from a description of its behavior. State-based regions can be used to construct a Petri net from a transition system. Language-based regions can be used to construct a Petri net from a prefix-closed language. Synthesis approaches using language-based regions can be applied directly to event logs. To apply state-based regions, one first needs to create a transition system.
The inductive mining framework is highly extendible and allows for many variants of the basic approach. The “family” of inductive mining techniques includes members that can handle infrequent behavior and deal with huge models and logs while ensuring formal correctness criteria such as the ability to rediscover the original model (in the limit). The results returned by these techniques can easily be converted to other notations ranging from Petri nets and BPMN models to process calculi and statecharts. Inductive mining is currently one of the leading process discovery approaches due to its flexibility, formal guarantees and scalability.
Introduction to the history of process mining.
Conformance checking relates events in the event log to activities in the process model and compares both. The goal is to find commonalities and discrepancies between the modeled behavior and the observed behavior. Conformance checking is relevant for business alignment and auditing. For example, the event log can be replayed on top of the process model to find undesirable deviations suggesting fraud or inefficiencies. Moreover, conformance checking techniques can also be used for measuring the performance of process discovery algorithms and to repair models that are not aligned well with reality.
If a case does not fit, the approach does not create a corresponding path through the model. We would like to map observed behavior onto modeled behavior to provide better diagnostics and to relate also non-fitting cases to the model.
Conformance checking can be used for improving the alignment of business processes, organizations, and information systems. As shown, replay techniques and footprint analysis help to identify differences between a process model and the real process as recorded in the event log. The differences identified may lead to changes of the model or process. For example, exposing deviations between the model and process may lead to better work instructions or changes in management. Conformance checking is also a useful tool for auditors that need to make sure that processes are executed within the boundaries set by various stakeholders.
Timestamps and frequencies of activities can be used to identify bottlenecks and diagnose other performance related problems.
Organizational mining focuses on the organizational perspective.
1. Social Network Analysis
2. Discovering Organizational Structures
3. Analyzing Resource Behavior
The presence of timestamps enables the discovery of bottlenecks, the analysis of service levels, the monitoring of resource utilization, and the prediction of remaining processing times of running cases.
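A minimal sketch of such a timestamp-based analysis is given below. The event log fragment is hypothetical, and each event carries a single timestamp, so the elapsed time before an activity mixes waiting time and service time. With separate start and complete timestamps the two could be told apart.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical event log fragment: (case id, activity, timestamp).
events = [
    ("1", "register request", "2024-01-01 09:00"),
    ("1", "check ticket",     "2024-01-01 10:00"),
    ("1", "decide",           "2024-01-03 10:00"),
    ("2", "register request", "2024-01-02 09:00"),
    ("2", "check ticket",     "2024-01-02 09:30"),
    ("2", "decide",           "2024-01-05 09:30"),
]

# Group events per case.
cases = defaultdict(list)
for case_id, activity, ts in events:
    cases[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))

gaps = defaultdict(list)  # hours elapsed before each activity
lead_times = {}           # hours from first to last event of a case
for case_id, trace in cases.items():
    trace.sort()  # order the events of the case by time
    lead_times[case_id] = (trace[-1][0] - trace[0][0]).total_seconds() / 3600
    for (t_prev, _), (t_next, activity) in zip(trace, trace[1:]):
        gaps[activity].append((t_next - t_prev).total_seconds() / 3600)

avg_gap = {activity: mean(hours) for activity, hours in gaps.items()}
bottleneck = max(avg_gap, key=avg_gap.get)

print(lead_times)
print(avg_gap)
print("longest average wait before:", bottleneck)
```

On this toy data the step before "decide" dominates the lead time, which is the kind of bottleneck such an analysis is meant to surface.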
Decision mining aims to find rules explaining such choices in terms of characteristics of the case.
Steps:
Step 1: Obtain an event log.
Step 2: Create or discover a process model.
Step 3: Connect events in the log to activities in the model.
Step 4: Extend the model.
Step 4a: Add the organizational perspective.
Step 4b: Add the time perspective.
Step 4c: Add the case perspective.
Step 4d: Add other perspectives.
Step 5: Return the integrated model.
Today, however, many data sources are updated in (near) real-time and sufficient computing power is available to analyze events when they occur. Therefore, process mining should not be restricted to off-line analysis and can also be used for online operational support.
Refined process mining framework:
This can be seen as conformance checking “on-the-fly”.
A recommendation is always given with respect to a specific goal.
There are three challenges when dealing with concept drift:
1. Change point detection. Did the process change? If so, when did it change?
2. Change localization and characterization. What has changed?
3. Change process discovery. How to capture and predict “second-order” dynamics?
Process discovery, conformance checking, social network analysis, organizational mining, clustering, decision mining, prediction, and recommendation are all supported by ProM plug-ins. However, the usability of the hundreds of available plug-ins varies and the complexity of the tool may be overwhelming for end-users. In recent years, several vendors released dedicated process mining tools (e.g., Celonis, Disco, EDS, Fujitsu, Minit, myInvenio, Perceptive, PPM, QPR, Rialto, and SNP). These tools typically provide less functionality than ProM.
Discussion of BI.
BI: The broad definition provided by Forrester is “BI is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making”.
The typical functionality provided by these products includes:
The mainstream BI products from vendors such as IBM, Oracle, SAP, and Microsoft do not support process mining.
Data mining tool: Weka
Discussion of ProM.
It is free.
About big event data
The goal of process mining is to improve operational processes. In order to judge whether process mining efforts are successful, we need to define Key Performance Indicators (KPIs).
Three classes of KPIs: KPIs related to time (e.g., lead time, service time, waiting time, and synchronization time), KPIs related to costs, and KPIs related to quality.
To evaluate suggested improvements, the effectiveness and efficiency of the as-is and to-be processes need to be quantified in terms of KPIs.
For Lasagna processes, process mining can result in one or more of the following improvement actions:
Redesign. Insights obtained using process mining can trigger changes to the process, e.g., sequential activities no longer need to be executed in a fixed order, checks may be skipped for easy cases, decisions can be delegated if more than 50 cases are queueing, etc. Fraud detected using process mining may result in additional compliance regulations, e.g., introducing the 4-eyes principle for critical activities.
Adjust. Similarly, process mining can result in (temporary) adjustments. For example, insights obtained using process mining can be used to temporarily allocate more resources to the process and to lower the threshold for delegation.
Intervene. Process mining may also reveal problems related to particular cases or resources. This may trigger interventions such as aborting a case that has been queuing for more than 3 months or disciplinary measures for a worker that repeatedly violated compliance regulations.
Support. Process mining can be used for operational support, e.g., based on historic information a process mining tool can predict the remaining flow time or recommend the action with the lowest expected costs.
Stage 0: Plan and Justify
Stage 1: Extract
Stage 2: Create Control-Flow Model and Connect Event Log
Stage 3: Create Integrated Process Model
Stage 4: Operational Support
The L∗ life-cycle model describing a process mining project consisting of five stages: plan and justify (Stage 0), extract (Stage 1), create control-flow model and connect event log (Stage 2), create integrated process model (Stage 3), and operational support (Stage 4)
Discussion of process mining cases.
Process mining is an important tool for modern organizations that need to manage non-trivial operational processes. On the one hand, there is an incredible growth of event data. On the other hand, processes and information need to be aligned perfectly in order to meet requirements related to compliance, efficiency, and customer service. The digital universe and the physical universe are amalgamating into one universe where events are recorded as they happen and processes are guided and controlled based on event data.
Process discovery is probably the most important and most visible intellectual challenge related to process mining. As shown, it is far from trivial to construct a process model based on event logs that are incomplete and noisy.
Another challenge is the notion of concept drift.
Conformance checking is not well supported by today’s commercial process mining tools. There is a need for better performing conformance checking techniques.
Process mining heavily depends on the ability to extract suitable event logs. The scope and granularity of an event log should match the questions one would like to answer. Unfortunately, in some information systems event data are just a byproduct for debugging or scattered over many tables. Some systems also “forget” events, e.g., when a record is updated, the old values are simply overwritten.
Another challenge is to produce process models that have a quality and understandability comparable to geographic maps.