UniTO/tesi/galla/tesi.org

#+TITLE:    An automata-based implementation of a symbolic CTL* model checker
#+AUTHOR:   Francesco Galla'
#+EMAIL:    galla@di.unito.it
#+DATE:

#+LANGUAGE: en

#+LaTeX_CLASS: article
#+LaTeX_HEADER: \usepackage[utf8]{inputenc}
#+LaTeX_HEADER: \usepackage{algorithm}
#+LaTeX_HEADER: \usepackage{algpseudocode}
#+LaTeX_HEADER: \usepackage{amsmath,amssymb,amsthm}
#+Latex_HEADER: \newtheorem{definition}{Definition}
#+LaTeX_HEADER: \usepackage{graphicx}
#+LaTeX_HEADER: \usepackage{listings}
#+LaTeX_HEADER: \usepackage{color}

#+EXPORT_SELECT_TAGS: export
#+EXPORT_EXCLUDE_TAGS: noexport
#+OPTIONS: H:2 toc:nil \n:nil @:t ::t |:t ^:{} _:{} *:t TeX:t LaTeX:t
#+STARTUP: showall

\begin{abstract}

This dissertation presents the implementation of a symbolic CTL* model
checking algorithm based on multi-valued decision diagrams (MDDs).
Given a Petri Net model and a CTL* proposition, the algorithm is capable
of identifying LTL sub-formulae, translate them to Büchi automata and
compute the synchronized product of each LTL formula with the model.
MDDs are used to encode the Petri Net and such composition, provably
lowering both system memory and time required to manipulate its graph
of reachable states with respect to explicit model checking tecniques.
By combining the sets of satisfying states for each LTL sub-formula
according to the temporal quantifiers preceding them, the algorithm is
capable of producing the boolean satisfaction result of the CTL* formula
and a counterexample or witness run describing such outcome.
Timed test runs executed againts a large set of models and specifications of
LTL, CTL and CTL* temporal logics show competitive performance results
with respect to other tools which process CTL* by translating each
formula to $\mu$-calculus.  This algorithm has been implemented as one
of the free and open source programs composing the GreatSPN framework
for formal verification of systems.

\end{abstract}

# Model checking is a formal verification technique intended to analyze
# properties of system designs. Model checking is based upon a formal
# model describing a system and a specification provided using a
# temporal logic formula such as LTL, CTL or CTL*.
# The algorithmic analysis of the Kripke structure representing the
# composition of the model and the specification requires the
# manipulation of the graph of its reachable states, which can be
# arbitrarely large in size depending on model characteristics and
# formula provided. Frequently, explicit enumeration of reachable states
# of a complex model requires a large amount of memory and time.
# Tecniques based on symbolic representation of such reachability graphs
# exploit the abstractions provided by decision diagrams, provably
# allowing for a sensible reduction in time and memory consumption while
# ensuring equivalent results with respect to explicit tecniques.  CTL*
# is a temporal logic defined as a superset of Linear Temporal Logic
# (LTL) and Computational Tree Logic (CTL). CTL* is motivated by the
# incomparable expressiveness of LTL and CTL, which historically caused
# a proliferation of model checking tools geared towards the former or
# the latter logic, each one presenting its own limitations. The
# theoretical literature formalizing symbolic CTL* model checking
# algorithms describes two different approaches. One exploits the
# translation of CTL* formulae to $\mu$-calculus and is currently used
# by all model checking tools we were able to find and inspect.  The
# other identifies LTL sub-formulae contained into a given CTL*
# proposition, obtains the sets of states satisfying such formulae using
# a LTL model checkers and combines them according to the temporal
# quantifier preceding each sub-formula to obtain a satisfaction result
# and a counter-example or witness run on the given model.  This
# dissertation describes the implementation of a symbolic CTL* model
# checker which uses such LTL-based tecnique at its core. Competitive
# performance results were obtained by testing this algorithm against
# other research tools which use the $\mu$-calculus tecnique. This
# algorithm has been implemented as one of the programs composing the
# GreatSPN framework for formal verification of systems based on Petri Nets.


* Introduction

** Model Checking

Model checking is a formal verification technique intended to analyze properties
of system designs. Given a formal model and a specification, the objective of
model checking is to decide whether the behavior of the model satisfies the
specification or not. The model is usually represented as a Kripke structure or
by a high-level formalism that can be transformed into a Kripke structure.
Examples of these formalisms are Petri Nets and Process algebra.

The specification is provided using a temporal logic expression, that is, a
formula that expresses a temporal and logical statement. Temporal logics are
modal logics geared towards the description of the temporal ordering of events.
It is important to clarify that these logics do not consider precise timing
requirements of activities or events, but reason about their
/abstract temporal order/. For this reason, they are particularly useful when
applied to concurrent systems in which all components proceed in a
lock-step fashion over a discrete time domain. The system behavior is assumed to
be observable at integral time points and each time point identifies a snapshot
of all variables of the system, called /state/.

Two brands of temporal logics have been proposed over the years for specifying
the properties of reactive systems. Linear Temporal Logic \cite{Pnueli77} is
based on /linear time/, therefore considering every moment in time as having a
unique possible future. Computational Tree Logic \cite{ClarkeE81} is defined
upon /branching time/: it pictures the structure of time as a tree,
allowing each moment in time to split into different possible futures.
From now on we will refer to these two logics using their well-known acronyms
LTL and CTL respectively.

# IMAGE: ltl vs ctl

The difference between LTL and CTL is rooted in their satisfaction relations,
which are conceptually different. LTL is said to be path-based, since a
system /S/ satisfies a LTL formula \phi if for all initial paths of /S/,
paths starting in an intial state $s_{\textit{0}}$ satisfy \phi.
Conversely, CTL is said to be state-based, since a system /S/ satisfies
a CTL formula \phi if and only if \phi holds in all initial states of /S/.

Furthermore, the expressiveness of LTL and CTL temporal logics is incomparable
\cite{Lamport80}. Conversely, CTL is particularly useful to express the
/possibility/ of existence of a specific path of execution of a model, that
is, the occurrence of an event happening on one branch but not necessarily all
of them. This concept cannot be expressed using a formalism based on linear time
such as LTL, which describes executions of a system, not the way those
executions are organized in a branching tree. On the contrary, CTL cannot
express situations in which the same behavior may occur on distinct branches at
distinct times, while the ability of LTL to describe individual paths is
more convenient in this case. In practice, the LTL formula *FG*\phi is not
expressible in CTL, while the formula *AFAG*\phi is not expressible in LTL.
Given the shortcomings of both these temporal logics, a superset of LTL and CTL
called CTL* \cite{EmersonH86} has been introduced. \\

** A brief history of LTL model checking

Model checking LTL properties comes down to checking language emptyness of the
syncronized product between the Kripke structure representing the model and a
formalism which can represent a LTL formula while being translatable to a
Kripke structure, namely a Büchi automaton. This /automata-theoretic approach/
\cite{Vardi95} treats the synchronized product as a transition system
\cite{Keller76} whose state graph can be analyzed using tecniques
classified in two main categories:

- /Explicit/ methods process the state graph of the synchronized product using
  graph traversal algorithms.
- /Symbolic/ methods represent the state graph using /decision diagrams/ and
  usually apply fixed point algorithms to the set of states to find the strongly
  connected components (SCCs) of the transition system.

Explicit methods are based on graph traversal algorithms but are often limited
by the complexity of the LTL model checking problem, which is constrained by the
size of the state space of the model, the size of the underlying automaton used
to represent the formula and the combined size of the two Kripke structures in
a transition system. More specifically, the model checking problem for LTL is
known to be PSPACE-complete \cite{Sistlac85}.
Historically, the first LTL model checkers were explicit: a notable example is
SPIN \cite{Holzmann97}, which takes advantage of an optimized version of Tarjan
DFS algorithm for finding strongly connected components in a graph,
called /Nested Depth First Search/ \cite{HolzmannPy96}.

In practice, model checkers applied to complex, real-world sistems have to
face the /state space explosion/ problem: the exponential growth in
the number of variables of the state graph dimension. Given that, in general,
a system with /n/ variables over a domain of /k/ possible values requires
at least $n^{\textit{k}}$ states in the reachability set, it is
understandable how even a simple model might necessitate a large reachability
graph. Furthermore, dealing with real-valued variables, which have infinite
possible assignments, results in a reachability graph with infinitely many
states.

This problem has encouraged the development of various tecniques which have
proven to be successful in mitigating the state explosion. Explicit model
checkers have introduced /on-the-fly/ state-space construction
to avoid storing the whole state graph. The most basic /on-the-fly/ algorithm
\cite{FernandezMJJ92} stores states which have already been visited in memory
and is therefore able to check for cycles in the reachability graph while
generating it through DFS. SPIN uses /on-the-fly/ state graph construction
combined with /partial order reduction/ \cite{Peled96}, a tecnique which reduces
the size of the reachability graph by exploiting commutativity of concurrently
executed transitions which result in the same state.

** The origin of symbolic model checking

Efforts towards state space minimization have been particularly successful in
developing clever representation of the state graph. /Symbolic/ model checking
represent the system model and the LTL formula using set of states and set
of transitions. These sets can be represented as solutions to
logical equations, using decision diagrams to represent this
state space implicitly.
Since syntactically small equations can represent large set of states,
this tecnique ultimately avoids building the state graph explicitly,
thus saving space in memory. Above all, Ordered Binary Decision Diagrams (OBDDs)
provide a canonical form for boolean formulae which can be substantially more
compact than conjunctive or disjuntive normal form and efficient algorithms have
been implemented for manipulating them. McMillan was the first to introduce
the use of OBDDs to represent the state space of a model, developing a CTL
model checking tool called SMV \cite{SymbMC}.
Symbolic model checking is considered to be one of the biggest breakthrough in
the history of model checking for its impact on the state explosion problem
\cite{Clarke08}.

Symbolic model checking was initially applied to CTL because of the
significantly lower complexity of the model checking problem with
respect to LTL: CTL model checking is known to be P-Complete and its
time complexity is bilinear in the size of the model and of the
formula \cite{ClarkeES86}. After the introduction of SMV, further research work
increased the capabilities of decision diagrams.  By introducing
Multi-valued decision diagrams (MDDs) , tools were able to represent
integral and real-valued functions, thus enhancing the applications of
symbolic strategies to formal verification.  In recent years, the
LTSmin tool \cite{KantLMPBD15} developed by the University of Twente
employs the SYLVAN multi-core MDDs library \cite{DijkP17} to speed up
symbolic analysis algorithms for CTL model checking.  A different
solution was adopted by an extended version of SMV, NuSMV
\cite{CimattiCGGPRST02}, which mantained the specification language of
McMillan's tool while improving it by introducing LTL model checking
and
Sat-based Bounded model checking \cite{BiereCCZ99}, which exploits propositional
satisfiability without using BDDs to represent the state graph.
This approach was chosen because working with decision diagrams does not
always guarantee an improvement over explicit model checking
tecniques due to the often time-consuming procedure of selecting a variable
ordering for all variables in the system, which is a known NP-complete problem
\cite{BolligW96}.

Until the present day, our GreatSPN framework used multi-valued, multi-terminal
decision diagrams (MTMDDs) provided by the Meddly library \cite{BabarM10} to
perform CTL model checking on Petri Nets \cite{BabarBDM10}. Meddly is possibly
the only open-source library to implement MTMDDs, Edge-valued MDDs (EVMDDs)
while providing state-of-the-art algorithms to manipulate them.
To determine the optimal variable ordering for the MDDs used to represet the
state graph, GreatSPN uses a set of algorithms based on different heuristics
which are run during the state space generation procedure \cite{AmparoreDBGM17}.
As we are going to discuss later in this dissertation, GreatSPN is now capable
of CTL* model checking by reducing symbolic LTL model checking to CTL model
checking with fairness constraints, as demonstrated in a notorious article
\cite{ClarkeGH97} by Clarke et al.

* Background

** Linear Temporal Logic

LTL is a propositional temporal logic with linear time model, meaning that it
considers a single realized future behavior of a system, that is, a single
path in a Kripke structure.

#+ATTR_LATEX: :environment definition
#+BEGIN_definition
/Syntax of LTL/. The formal syntax of LTL is given by the following grammar in
Backus-Naur form (BNF), where /a/ \in /AP/ is an atomic proposition.

#+BEGIN_center
/\phi ::= true | a | \not\phi | \phi \wedge \phi | *X*\phi | \phi *U* \phi/
#+END_center

#+END_definition

Using boolean connectors such as \not and \wedge allows LTL to be treated as
a propositional logic. Other boolean connectives such as disjunction \vee,
implication \rightarrow, equivalence \leftrightarrow and the exclusive or (xor)
operator \oplus can be derived as for any other propositional logic.
This LTL grammar is defined using two basic temporal modalities: *X* (next) and
*U* (until). Using those, we can derive two essential temporal modalities *F*
(eventually) and *G* (always), as follows:

#+BEGIN_center
*F*\phi = /true/\textbf{U}\phi \\
*G*\phi = \not\textbf{F}\not\phi
#+END_CENTER

We now present a list of the temporal modalities which will be used at a later
time in this dissertation.

- *G*\phi: "always" (now and forever in the future \phi is true)
- *F*\phi: "eventually" (eventually in the future \phi is true)
- *X*\phi: "next" (in the next time step \phi is true)
- \psi\textbf{U}\phi: "until" (\psi is true until \phi is true)

By combining the aforementioned temporal operators we obtain other, more complex
modalities. A typical example is the specification which requires a property to
be true /infinitely often/, *GF*\phi.


\paragraph{LTL Semantics}
LTL semantics is defined for /infinite words/ \sigma over the alphabet
2\textsuperscript{AP}. The satisfiability rules are shown below:

- \(\sigma \vDash true\)
- \(\sigma \vDash a\) iff \(a \in A_{0}\)
- \(\sigma \vDash \phi_{1} \wedge \phi_{2}\) iff \(\sigma \vDash \phi_{1}\) and \(\sigma \vDash \phi_{2}\)
- \(\sigma \vDash \neg\phi\) iff \(\neg (\sigma \vDash \phi)\)
- \(\sigma \vDash \textbf{X}\phi\) iff \(\sigma[1...] \vDash \phi\)
- \(\sigma \vDash \phi_{1}\textbf{U}\phi_{2}\) iff \(\exists j \geq 0, \sigma[j...] \vDash \phi_{2}\) and \(\sigma[i...] \vDash \phi_{1}\) for all \(0 \leq i < j\)

A LTL formula \phi is said to be /valid/ with regard
to a Kripke structure /M/ if it holds for all paths of /M/. It is /satisfiable/
if it holds for some path in /M/.

** CTL*

We briefly introduced CTL* while dealing with the shortcomings of LTL and CTL.
CTL* is a branching temporal logic which extends CTL following a proposal by
Emerson and Halpern. Being based on the concept of branching time, CTL* is able
to represent the possibility of existence of a determinate behavior in a tree of
execution, using CTL path quantifiers *E* (for some path) and *A* (for all
paths) to specify whether the required behavior must be verified for some
execution of our system or all possible ones.

CTL* allows path quantifiers to be arbitrarely nested with linear temporal
operators *G*, *F*, *X* and *U*. In contrast, CTL only supports linear
temporal operators if they are immediately preceded by a path quantifier.
In a similar fashion as CTL, the syntax of CTL* distinguishes between
/state/ and /path/ formulae. CTL* path formulae are defined as LTL formulae,
whith the only difference that here state formulae can be used as atoms.

#+BEGIN_definition
/Syntax of CTL*/. The formal syntax of /CTL*/ is made of state formulae \Phi and
path formulae \phi. The syntax of /CTL*/ state formulae \Phi is defined over the
set /AP/ of atomic propositions:

#+BEGIN_center
/\Phi ::= true | a | \not\phi | \phi \wedge \phi | *E*\phi/
#+END_center

The syntax of CTL* path formulae \phi is given by the following grammar, where
\Phi is a state formula:

#+BEGIN_center
/\phi ::= \Phi | \not\phi | \phi \wedge \phi | *X*\phi | \phi *U* \phi/
#+END_center

#+END_definition

As previously seen for LTL, the syntax of CTL* can be treated as any
propositional logic, therefore all boolean connectives can be derived from
\not and \wedge. The missing temporal modalities *F* and *G* descend from
*X* and *U* as it was the case for LTL, while the path quantifier *A* can be
obtained from the following equivalence:

#+BEGIN_center
*A*\phi = \not\textbf{E}\not\phi
#+END_CENTER

\paragraph{CTL* Semantics}
Let a \in /AP/ be an atomic proposition, $TS = (S, Act, \rightarrow, I, AP, L)$
be a transition system without terminal states, state /s/ \in /S/, \Phi and \Psi
be CTL* state formulae and \phi, \phi\textsubscript{1} and \phi\textsubscript{2}
be CTL* path formulae.

\paragraph{State formulae: semantics}

- \(s \vDash a\) iff \(a \in \textit{L(s)}\)
- \(s \vDash \Phi \wedge \Psi\) iff \(s \vDash \Phi\) and \(s \vDash \Psi\)
- \(s \vDash \neg\Phi\) iff \(\neg (s \vDash \Phi)\)
- \(s \vDash \textbf{E}\phi\) iff \(\sigma \vDash \phi\) for some \(\sigma \in \textit{Paths(s)}\)

\paragraph{Path formulae: semantics}

- \(\sigma \vDash \Phi\) iff \(s_{0} \vDash \Phi\)
- \(\sigma \vDash \phi_{1} \wedge \phi_{2}\) iff \(\sigma \vDash \phi_{1}\) and \(\sigma \vDash \phi_{2}\)
- \(\sigma \vDash \neg\phi\) iff \(\neg (\sigma \vDash \phi)\)
- \(\sigma \vDash \textbf{X}\phi\) iff \(\sigma[1...] \vDash \phi\)
- \(\sigma \vDash \phi_{1}\textbf{U}\phi_{2}\) iff \(\exists j \geq 0, \sigma[j...] \vDash \phi_{2}\) and \(\sigma[i...] \vDash \phi_{1}\) for all \(0 \leq i < j\)

** Decision Diagrams

Decision diagrams are directed, acyclic graphs used to represent functions over
variables with finitely many possible assignments. They were originally
studied by Bryant \cite{Bryant86} as a representation of boolean functions
in the form of Binary Decision Diagrams (BDDs).
We focus on a generalization of BDDs called Multi-value Decision Diagrams (MDDs)
\cite{MDD}, whose variable domain can be arbitrarely large and which can be
used to represent functions of integral and real-valued variables.

#+BEGIN_definition
/Multi-valued Decision Diagram/. A multi-valued desicion diagram, or MDD, is an
acyclic graph in which each node represents a function over variables with
finitely many possible assignments, of the form:

#+BEGIN_center
$f : S_{K} \times ... \times S_{1} \rightarrow \{ 0, \ldots, m - 1 \}$
#+END_center

Each one of the sets $S_{k}$ is considered to be finite on an
arbitrarely large domain. We can write:

#+BEGIN_center
$S_{k} = \{ 0, 1, \ldots, n_{k} - 1 \}$
#+END_center

#+END_definition

An MDD is composed of two types of nodes: /terminal/ nodes and /non-terminal/
nodes.

- Terminal nodes are labeled with values from the set $\{0,1,\ldots,m-1\}$,
  representing the constant function:

#+BEGIN_center
$g (x_{K},\ldots,x_{1}) = a$
#+END_center

- Non-terminal nodes are labeled with one of the function variables
  $x_{k}$ and contain $n_{k}$ arcs to other nodes.

A non terminal node labeled with $x\textsubscript{k}$ has an outgoing arc
corresponding to value $v$ which goes to a node representing the function:

#+BEGIN_center
$f_{x_{k}=v}(x_{K},\ldots,x_{1}) \equiv f(x_{K},\ldots,x_{k+1},v,x_{k-1},\ldots,x_{1})$
#+END_center

\paragraph{Reduced ordered MDDs}
In the contest of model checking, MDDs are useful to encode the state space
of a model as a boolean, integral or real-valued function.
To fit this purpose, MDDs must be in /canonical/ form, meaning that
for any given function and variable ordering, there must be exactly /one/
representation of that function as MDD. For this to be true, we have to impose
two constraints on MDDs: that they are /ordered/ and /reduced/ (ROMDDs).

#+BEGIN_definition
/Ordered MDDs/. An MDD is ordered if all paths through the MDD visit
non-terminal nodes accortding to the same variable ordering.
#+END_definition

#+BEGIN_definition
/Reduced Ordered MDDs/. A reduced, ordered MDD is an ordered MDD that contains
no duplicate nodes and no redundant nodes.

- Two nodes are *redundant* if all of its outgoing arcs point to the same node.
- Two terminal nodes are *duplicates* if they have the same label, while two
  non-terminal nodes are duplicates if they have the same variable label and
  the same outgoing arcs for each value.
#+END_definition

# ESEMPI MDD, GraphViz plot di RS (modello semplice, pochi stati)

** Automata over infinite words

We've described how Linear Temporal Logic provides a language to describe the
temporal order of a series of events. When applied to formal verification,
these sequences of events can be interpreted as computations of the program.
A computation is a potentially infinite sequence of program states: each state
is described by a finite set of atomic propositions, which will be referred to
as a nonempty /alphabet/. Therefore a computation can be treated as an
infinite word over the alphabet of truth assignments to the atomic propositions
of a given alphabet.

This reasoning suggests that a LTL specification can be thought of as a
description of a /language/ over some alphabet. This language is made of
infinite /words/, which represent program computations. We can exploit this
equivalence of computations and words to connect linear temporal logic to
automata theory applied on infinite words. As we are going to show, automata
over infinite words depict a suitable formalism to represent LTL specifications.
More precisely, given any propositional temporal formula, one can construct a
/finite automaton/ over infinite words that accepts precisely the computations
satisfied by the formula \cite{VardiW94}. This chapter will present an
introduction to automata theory and describe how LTL model checking employs a
particular class of automata over infinite words, called /Büchi automata/.

# CTL* model checking

# * Emerson-Lei & alternatives
\bibliographystyle{/usr/share/texmf-dist/bibtex/bst/base/acm}
\bibliography{tesi}