% Created 2020-02-13 Thu 18:00 % Intended LaTeX compiler: pdflatex \documentclass[11pt]{article} \usepackage[utf8]{inputenc} \usepackage[T1]{fontenc} \usepackage{graphicx} \usepackage{grffile} \usepackage{longtable} \usepackage{wrapfig} \usepackage{rotating} \usepackage[normalem]{ulem} \usepackage{amsmath} \usepackage{textcomp} \usepackage{amssymb} \usepackage{capt-of} \usepackage{hyperref} \usepackage[utf8]{inputenc} \usepackage{algorithm} \usepackage{algpseudocode} \usepackage{amsmath,amssymb,amsthm} \newtheorem{definition}{Definition} \usepackage{graphicx} \usepackage{listings} \usepackage{color} \author{Francesco Galla'} \date{} \title{An automata-based implementation of a symbolic CTL* model checker} \hypersetup{ pdfauthor={Francesco Galla'}, pdftitle={An automata-based implementation of a symbolic CTL* model checker}, pdfkeywords={}, pdfsubject={}, pdfcreator={Emacs 26.3 (Org mode 9.1.9)}, pdflang={English}} \begin{document} \maketitle \begin{abstract} This dissertation presents the implementation of a symbolic CTL* model checking algorithm based on multi-valued decision diagrams (MDDs). Given a Petri Net model and a CTL* proposition, the algorithm is capable of identifying LTL sub-formulae, translate them to Büchi automata and compute the synchronized product of each LTL formula with the model. MDDs are used to encode the Petri Net and such composition, provably lowering both system memory and time required to manipulate its graph of reachable states with respect to explicit model checking tecniques. By combining the sets of satisfying states for each LTL sub-formula according to the temporal quantifiers preceding them, the algorithm is capable of producing the boolean satisfaction result of the CTL* formula and a counterexample or witness run describing such outcome. Timed test runs executed againts a large set of models and specifications of LTL, CTL and CTL* temporal logics show competitive performance results with respect to other tools which process CTL* by translating each formula to $\mu$-calculus. This algorithm has been implemented as one of the free and open source programs composing the GreatSPN framework for formal verification of systems. \end{abstract} \section{Introduction} \label{sec:org0edf2f0} \subsection{Model Checking} \label{sec:orge90340a} Model checking is a formal verification technique intended to analyze properties of system designs. Given a formal model and a specification, the objective of model checking is to decide whether the behavior of the model satisfies the specification or not. The model is usually represented as a Kripke structure or by a high-level formalism that can be transformed into a Kripke structure. Examples of these formalisms are Petri Nets and Process algebra. The specification is provided using a temporal logic expression, that is, a formula that expresses a temporal and logical statement. Temporal logics are modal logics geared towards the description of the temporal ordering of events. It is important to clarify that these logics do not consider precise timing requirements of activities or events, but reason about their \emph{abstract temporal order}. For this reason, they are particularly useful when applied to concurrent systems in which all components proceed in a lock-step fashion over a discrete time domain. The system behavior is assumed to be observable at integral time points and each time point identifies a snapshot of all variables of the system, called \emph{state}. Two brands of temporal logics have been proposed over the years for specifying the properties of reactive systems. Linear Temporal Logic \cite{Pnueli77} is based on \emph{linear time}, therefore considering every moment in time as having a unique possible future. Computational Tree Logic \cite{ClarkeE81} is defined upon \emph{branching time}: it pictures the structure of time as a tree, allowing each moment in time to split into different possible futures. From now on we will refer to these two logics using their well-known acronyms LTL and CTL respectively. The difference between LTL and CTL is rooted in their satisfaction relations, which are conceptually different. LTL is said to be path-based, since a system \emph{S} satisfies a LTL formula \(\phi\) if for all initial paths of \emph{S}, paths starting in an intial state \(s_{\textit{0}}\) satisfy \(\phi\). Conversely, CTL is said to be state-based, since a system \emph{S} satisfies a CTL formula \(\phi\) if and only if \(\phi\) holds in all initial states of \emph{S}. Furthermore, the expressiveness of LTL and CTL temporal logics is incomparable \cite{Lamport80}. Conversely, CTL is particularly useful to express the \emph{possibility} of existence of a specific path of execution of a model, that is, the occurrence of an event happening on one branch but not necessarily all of them. This concept cannot be expressed using a formalism based on linear time such as LTL, which describes executions of a system, not the way those executions are organized in a branching tree. On the contrary, CTL cannot express situations in which the same behavior may occur on distinct branches at distinct times, while the ability of LTL to describe individual paths is more convenient in this case. In practice, the LTL formula \textbf{FG}\(\phi\) is not expressible in CTL, while the formula \textbf{AFAG}\(\phi\) is not expressible in LTL. Given the shortcomings of both these temporal logics, a superset of LTL and CTL called CTL* \cite{EmersonH86} has been introduced. \\ \subsection{A brief history of LTL model checking} \label{sec:orgd1d295c} Model checking LTL properties comes down to checking language emptyness of the syncronized product between the Kripke structure representing the model and a formalism which can represent a LTL formula while being translatable to a Kripke structure, namely a Büchi automaton. This \emph{automata-theoretic approach} \cite{Vardi95} treats the synchronized product as a transition system \cite{Keller76} whose state graph can be analyzed using tecniques classified in two main categories: \begin{itemize} \item \emph{Explicit} methods process the state graph of the synchronized product using graph traversal algorithms. \item \emph{Symbolic} methods represent the state graph using \emph{decision diagrams} and usually apply fixed point algorithms to the set of states to find the strongly connected components (SCCs) of the transition system. \end{itemize} Explicit methods are based on graph traversal algorithms but are often limited by the complexity of the LTL model checking problem, which is constrained by the size of the state space of the model, the size of the underlying automaton used to represent the formula and the combined size of the two Kripke structures in a transition system. More specifically, the model checking problem for LTL is known to be PSPACE-complete \cite{Sistlac85}. Historically, the first LTL model checkers were explicit: a notable example is SPIN \cite{Holzmann97}, which takes advantage of an optimized version of Tarjan DFS algorithm for finding strongly connected components in a graph, called \emph{Nested Depth First Search} \cite{HolzmannPy96}. In practice, model checkers applied to complex, real-world sistems have to face the \emph{state space explosion} problem: the exponential growth in the number of variables of the state graph dimension. Given that, in general, a system with \emph{n} variables over a domain of \emph{k} possible values requires at least \(n^{\textit{k}}\) states in the reachability set, it is understandable how even a simple model might necessitate a large reachability graph. Furthermore, dealing with real-valued variables, which have infinite possible assignments, results in a reachability graph with infinitely many states. This problem has encouraged the development of various tecniques which have proven to be successful in mitigating the state explosion. Explicit model checkers have introduced \emph{on-the-fly} state-space construction to avoid storing the whole state graph. The most basic \emph{on-the-fly} algorithm \cite{FernandezMJJ92} stores states which have already been visited in memory and is therefore able to check for cycles in the reachability graph while generating it through DFS. SPIN uses \emph{on-the-fly} state graph construction combined with \emph{partial order reduction} \cite{Peled96}, a tecnique which reduces the size of the reachability graph by exploiting commutativity of concurrently executed transitions which result in the same state. \subsection{The origin of symbolic model checking} \label{sec:orgd26054c} Efforts towards state space minimization have been particularly successful in developing clever representation of the state graph. \emph{Symbolic} model checking represent the system model and the LTL formula using set of states and set of transitions. These sets can be represented as solutions to logical equations, using decision diagrams to represent this state space implicitly. Since syntactically small equations can represent large set of states, this tecnique ultimately avoids building the state graph explicitly, thus saving space in memory. Above all, Ordered Binary Decision Diagrams (OBDDs) provide a canonical form for boolean formulae which can be substantially more compact than conjunctive or disjuntive normal form and efficient algorithms have been implemented for manipulating them. McMillan was the first to introduce the use of OBDDs to represent the state space of a model, developing a CTL model checking tool called SMV \cite{SymbMC}. Symbolic model checking is considered to be one of the biggest breakthrough in the history of model checking for its impact on the state explosion problem \cite{Clarke08}. Symbolic model checking was initially applied to CTL because of the significantly lower complexity of the model checking problem with respect to LTL: CTL model checking is known to be P-Complete and its time complexity is bilinear in the size of the model and of the formula \cite{ClarkeES86}. After the introduction of SMV, further research work increased the capabilities of decision diagrams. By introducing Multi-valued decision diagrams (MDDs) , tools were able to represent integral and real-valued functions, thus enhancing the applications of symbolic strategies to formal verification. In recent years, the LTSmin tool \cite{KantLMPBD15} developed by the University of Twente employs the SYLVAN multi-core MDDs library \cite{DijkP17} to speed up symbolic analysis algorithms for CTL model checking. A different solution was adopted by an extended version of SMV, NuSMV \cite{CimattiCGGPRST02}, which mantained the specification language of McMillan's tool while improving it by introducing LTL model checking and Sat-based Bounded model checking \cite{BiereCCZ99}, which exploits propositional satisfiability without using BDDs to represent the state graph. This approach was chosen because working with decision diagrams does not always guarantee an improvement over explicit model checking tecniques due to the often time-consuming procedure of selecting a variable ordering for all variables in the system, which is a known NP-complete problem \cite{BolligW96}. Until the present day, our GreatSPN framework used multi-valued, multi-terminal decision diagrams (MTMDDs) provided by the Meddly library \cite{BabarM10} to perform CTL model checking on Petri Nets \cite{BabarBDM10}. Meddly is possibly the only open-source library to implement MTMDDs, Edge-valued MDDs (EVMDDs) while providing state-of-the-art algorithms to manipulate them. To determine the optimal variable ordering for the MDDs used to represet the state graph, GreatSPN uses a set of algorithms based on different heuristics which are run during the state space generation procedure \cite{AmparoreDBGM17}. As we are going to discuss later in this dissertation, GreatSPN is now capable of CTL* model checking by reducing symbolic LTL model checking to CTL model checking with fairness constraints, as demonstrated in a notorious article \cite{ClarkeGH97} by Clarke et al. \section{Background} \label{sec:org194bb81} \subsection{Linear Temporal Logic} \label{sec:org93349b6} LTL is a propositional temporal logic with linear time model, meaning that it considers a single realized future behavior of a system, that is, a single path in a Kripke structure. \begin{definition} \emph{Syntax of LTL}. The formal syntax of LTL is given by the following grammar in Backus-Naur form (BNF), where \emph{a} \(\in\) \emph{AP} is an atomic proposition. \begin{center} \emph{\(\phi\) ::= true | a | \textlnot{}\(\phi\) | \(\phi\) \(\wedge\) \(\phi\) | \textbf{X}\(\phi\) | \(\phi\) \textbf{U} \(\phi\)} \end{center} \end{definition} Using boolean connectors such as \textlnot{} and \(\wedge\) allows LTL to be treated as a propositional logic. Other boolean connectives such as disjunction \(\vee\), implication \(\rightarrow\), equivalence \(\leftrightarrow\) and the exclusive or (xor) operator \(\oplus\) can be derived as for any other propositional logic. This LTL grammar is defined using two basic temporal modalities: \textbf{X} (next) and \textbf{U} (until). Using those, we can derive two essential temporal modalities \textbf{F} (eventually) and \textbf{G} (always), as follows: \begin{center} \textbf{F}\(\phi\) = \emph{true}\textbf{U}\(\phi\) \\ \textbf{G}\(\phi\) = \textlnot{}\textbf{F}\textlnot{}\(\phi\) \end{center} We now present a list of the temporal modalities which will be used at a later time in this dissertation. \begin{itemize} \item \textbf{G}\(\phi\): "always" (now and forever in the future \(\phi\) is true) \item \textbf{F}\(\phi\): "eventually" (eventually in the future \(\phi\) is true) \item \textbf{X}\(\phi\): "next" (in the next time step \(\phi\) is true) \item \(\psi\)\textbf{U}\(\phi\): "until" (\(\psi\) is true until \(\phi\) is true) \end{itemize} By combining the aforementioned temporal operators we obtain other, more complex modalities. A typical example is the specification which requires a property to be true \emph{infinitely often}, \textbf{GF}\(\phi\). \paragraph{LTL Semantics} LTL semantics is defined for \emph{infinite words} \(\sigma\) over the alphabet 2\textsuperscript{AP}. The satisfiability rules are shown below: \begin{itemize} \item \(\sigma \vDash true\) \item \(\sigma \vDash a\) iff \(a \in A_{0}\) \item \(\sigma \vDash \phi_{1} \wedge \phi_{2}\) iff \(\sigma \vDash \phi_{1}\) and \(\sigma \vDash \phi_{2}\) \item \(\sigma \vDash \neg\phi\) iff \(\neg (\sigma \vDash \phi)\) \item \(\sigma \vDash \textbf{X}\phi\) iff \(\sigma[1...] \vDash \phi\) \item \(\sigma \vDash \phi_{1}\textbf{U}\phi_{2}\) iff \(\exists j \geq 0, \sigma[j...] \vDash \phi_{2}\) and \(\sigma[i...] \vDash \phi_{1}\) for all \(0 \leq i < j\) \end{itemize} A LTL formula \(\phi\) is said to be \emph{valid} with regard to a Kripke structure \emph{M} if it holds for all paths of \emph{M}. It is \emph{satisfiable} if it holds for some path in \emph{M}. \subsection{CTL*} \label{sec:orgad39744} We briefly introduced CTL* while dealing with the shortcomings of LTL and CTL. CTL* is a branching temporal logic which extends CTL following a proposal by Emerson and Halpern. Being based on the concept of branching time, CTL* is able to represent the possibility of existence of a determinate behavior in a tree of execution, using CTL path quantifiers \textbf{E} (for some path) and \textbf{A} (for all paths) to specify whether the required behavior must be verified for some execution of our system or all possible ones. CTL* allows path quantifiers to be arbitrarely nested with linear temporal operators \textbf{G}, \textbf{F}, \textbf{X} and \textbf{U}. In contrast, CTL only supports linear temporal operators if they are immediately preceded by a path quantifier. In a similar fashion as CTL, the syntax of CTL* distinguishes between \emph{state} and \emph{path} formulae. CTL* path formulae are defined as LTL formulae, whith the only difference that here state formulae can be used as atoms. \begin{definition} \emph{Syntax of CTL*}. The formal syntax of \emph{CTL*} is made of state formulae \(\Phi\) and path formulae \(\phi\). The syntax of \emph{CTL*} state formulae \(\Phi\) is defined over the set \emph{AP} of atomic propositions: \begin{center} \emph{\(\Phi\) ::= true | a | \textlnot{}\(\phi\) | \(\phi\) \(\wedge\) \(\phi\) | \textbf{E}\(\phi\)} \end{center} The syntax of CTL* path formulae \(\phi\) is given by the following grammar, where \(\Phi\) is a state formula: \begin{center} \emph{\(\phi\) ::= \(\Phi\) | \textlnot{}\(\phi\) | \(\phi\) \(\wedge\) \(\phi\) | \textbf{X}\(\phi\) | \(\phi\) \textbf{U} \(\phi\)} \end{center} \end{definition} As previously seen for LTL, the syntax of CTL* can be treated as any propositional logic, therefore all boolean connectives can be derived from \textlnot{} and \(\wedge\). The missing temporal modalities \textbf{F} and \textbf{G} descend from \textbf{X} and \textbf{U} as it was the case for LTL, while the path quantifier \textbf{A} can be obtained from the following equivalence: \begin{center} \textbf{A}\(\phi\) = \textlnot{}\textbf{E}\textlnot{}\(\phi\) \end{center} \paragraph{CTL* Semantics} Let a \(\in\) \emph{AP} be an atomic proposition, \(TS = (S, Act, \rightarrow, I, AP, L)\) be a transition system without terminal states, state \emph{s} \(\in\) \emph{S}, \(\Phi\) and \(\Psi\) be CTL* state formulae and \(\phi\), \(\phi\)\textsubscript{1} and \(\phi\)\textsubscript{2} be CTL* path formulae. \paragraph{State formulae: semantics} \begin{itemize} \item \(s \vDash a\) iff \(a \in \textit{L(s)}\) \item \(s \vDash \Phi \wedge \Psi\) iff \(s \vDash \Phi\) and \(s \vDash \Psi\) \item \(s \vDash \neg\Phi\) iff \(\neg (s \vDash \Phi)\) \item \(s \vDash \textbf{E}\phi\) iff \(\sigma \vDash \phi\) for some \(\sigma \in \textit{Paths(s)}\) \end{itemize} \paragraph{Path formulae: semantics} \begin{itemize} \item \(\sigma \vDash \Phi\) iff \(s_{0} \vDash \Phi\) \item \(\sigma \vDash \phi_{1} \wedge \phi_{2}\) iff \(\sigma \vDash \phi_{1}\) and \(\sigma \vDash \phi_{2}\) \item \(\sigma \vDash \neg\phi\) iff \(\neg (\sigma \vDash \phi)\) \item \(\sigma \vDash \textbf{X}\phi\) iff \(\sigma[1...] \vDash \phi\) \item \(\sigma \vDash \phi_{1}\textbf{U}\phi_{2}\) iff \(\exists j \geq 0, \sigma[j...] \vDash \phi_{2}\) and \(\sigma[i...] \vDash \phi_{1}\) for all \(0 \leq i < j\) \end{itemize} \subsection{Decision Diagrams} \label{sec:orgb1ebae6} Decision diagrams are directed, acyclic graphs used to represent functions over variables with finitely many possible assignments. They were originally studied by Bryant \cite{Bryant86} as a representation of boolean functions in the form of Binary Decision Diagrams (BDDs). We focus on a generalization of BDDs called Multi-value Decision Diagrams (MDDs) \cite{MDD}, whose variable domain can be arbitrarely large and which can be used to represent functions of integral and real-valued variables. \begin{definition} \emph{Multi-valued Decision Diagram}. A multi-valued desicion diagram, or MDD, is an acyclic graph in which each node represents a function over variables with finitely many possible assignments, of the form: \begin{center} \(f : S_{K} \times ... \times S_{1} \rightarrow \{ 0, \ldots, m - 1 \}\) \end{center} Each one of the sets \(S_{k}\) is considered to be finite on an arbitrarely large domain. We can write: \begin{center} \(S_{k} = \{ 0, 1, \ldots, n_{k} - 1 \}\) \end{center} \end{definition} An MDD is composed of two types of nodes: \emph{terminal} nodes and \emph{non-terminal} nodes. \begin{itemize} \item Terminal nodes are labeled with values from the set \(\{0,1,\ldots,m-1\}\), representing the constant function: \end{itemize} \begin{center} \(g (x_{K},\ldots,x_{1}) = a\) \end{center} \begin{itemize} \item Non-terminal nodes are labeled with one of the function variables \(x_{k}\) and contain \(n_{k}\) arcs to other nodes. \end{itemize} A non terminal node labeled with \(x\textsubscript{k}\) has an outgoing arc corresponding to value \(v\) which goes to a node representing the function: \begin{center} \(f_{x_{k}=v}(x_{K},\ldots,x_{1}) \equiv f(x_{K},\ldots,x_{k+1},v,x_{k-1},\ldots,x_{1})\) \end{center} \paragraph{Reduced ordered MDDs} In the contest of model checking, MDDs are useful to encode the state space of a model as a boolean, integral or real-valued function. To fit this purpose, MDDs must be in \emph{canonical} form, meaning that for any given function and variable ordering, there must be exactly \emph{one} representation of that function as MDD. For this to be true, we have to impose two constraints on MDDs: that they are \emph{ordered} and \emph{reduced} (ROMDDs). \begin{definition} \emph{Ordered MDDs}. An MDD is ordered if all paths through the MDD visit non-terminal nodes accortding to the same variable ordering. \end{definition} \begin{definition} \emph{Reduced Ordered MDDs}. A reduced, ordered MDD is an ordered MDD that contains no duplicate nodes and no redundant nodes. \begin{itemize} \item Two nodes are \textbf{redundant} if all of its outgoing arcs point to the same node. \item Two terminal nodes are \textbf{duplicates} if they have the same label, while two non-terminal nodes are duplicates if they have the same variable label and the same outgoing arcs for each value. \end{itemize} \end{definition} \subsection{Automata over infinite words} \label{sec:orgeaf1a4d} We've described how Linear Temporal Logic provides a language to describe the temporal order of a series of events. When applied to formal verification, these sequences of events can be interpreted as computations of the program. A computation is a potentially infinite sequence of program states: each state is described by a finite set of atomic propositions, which will be referred to as a nonempty \emph{alphabet}. Therefore a computation can be treated as an infinite word over the alphabet of truth assignments to the atomic propositions of a given alphabet. This reasoning suggests that a LTL specification can be thought of as a description of a \emph{language} over some alphabet. This language is made of infinite \emph{words}, which represent program computations. We can exploit this equivalence of computations and words to connect linear temporal logic to automata theory applied on infinite words. As we are going to show, automata over infinite words depict a suitable formalism to represent LTL specifications. More precisely, given any propositional temporal formula, one can construct a \emph{finite automaton} over infinite words that accepts precisely the computations satisfied by the formula \cite{VardiW94}. This chapter will present an introduction to automata theory and describe how LTL model checking employs a particular class of automata over infinite words, called \emph{Büchi automata}. \bibliographystyle{/usr/share/texmf-dist/bibtex/bst/base/acm} \bibliography{tesi} \end{document}