A Scala library for language processing.
Documentation HomeUp: User Manual, Prev: Relations, Next: Pretty-printing
IMPORTANT NOTE: This page describes Kiama 1.x. Much of it also applies to Kiama 2.x, but not all. Please consult the 2.x release notes for the main differences. We are currently writing comprehensive documentation for 2.x that will eventually replace these pages.
This page provides an overview of Kiama’s support for abstract state machines. Abstract state machines are used in the following examples.
ISWIM RISC
The dynamic semantics of programming languages are often defined in terms of formal machines. These machines are defined by instruction sets that specify the operations that the machine can perform, a machine state that describes the information that is maintained as the machine executes, and evaluation rules that specify how each type of instruction affects the machine state.
For example, languages such as Java and C# are executed on abstract machines implemented by the Java Virtual Machine or the .Net runtime. Functional programming languages are often defined by machines that perform stepwise reduction of function applications or more complex processes such as graph reduction for lazy evaluation.
Kiama’s abstract state machine library provides a way to specify these kinds of machines in a high-level way. Abstract state machines were originally developed to provide a formal notation for generalisations of finite state machines. The idea is that instead of the state of the machine being a single discrete value, it can comprise multiple values of arbitrary structured data.
Kiama’s abstract state machine library allows you to implement these kinds of machines in a high-level way. Abstract state machines were originally developed to provide a theory for generalisations of finite-state machines. In a finite-state machine the states are single discrete atomic values. In abstract state machines the states comprise multiple state items, each of which can be a structured value.
Abstract state machines have been used to formally specify many dynamic processes and their refinement to implementations. In Kiama we are primarily interested in using them to implement abstract machines in a way that closely matches their formal specification, particularly for language prototyping.
Kiama provides a form of abstract state machine that is quite similar to those seen in formal treatments. The rest of this page describes how to write abstract state machines in Kiama. The running example is the simple RISC machine from the RISC example, which is based on the machine described in Wirth’s book Compiler Construction.
The base class of a Kiama abstract machine is org.bitbucket.inkytonik.kiama.machine.Machine
.
package org.bitbucket.inkytonik.kiama.machine
abstract class Machine (val name : String, emitter : Emitter = new Emitter)
The name
argument is used to identify the machine in trace messages.
The emitter
is used to produce output and defaults to standard
output.
To define a new machine, you inherit from Machine
. In the RISC
example, the machine is implemented by the RISC
class which is
constructed from the code sequence that is to be run, a console from
which input should be read and an emitter to which output should be
sent.
import org.bitbucket.inkytonik.kiama.machine.Machine
import org.bitbucket.inkytonik.kiama.util.Console
import org.bitbucket.inkytonik.kiama.util.Emitter
class RISC (code : Code, console : Console, emitter : Emitter)
extends Machine ("RISC", emitter)
The Code
type is a sequence of machine instructions. The actual
instructions are essentially as defined by Wirth and are typical of
RISC machines. They are defined here by case classes in the usual
Scala way. We show a few typical examples; see the code for a complete
list.
type Code = Seq[Instr]
type RegNo = Byte
abstract class Instr
case class MOV (a : RegNo, b : Int, c : RegNo) extends Instr
case class ADD (a : RegNo, b : RegNo, c : RegNo) extends Instr
case class CMPI (b : RegNo, im : Int) extends Instr
case class LDW (a : RegNo, b : RegNo, im : Int) extends Instr
case class BEQ (val label : Int) extends Branch
case class RET (c : RegNo) extends Instr
MOV
is a move from register c
to register a
with a shift c
.
ADD
is a register to register addition. CMPI
sets condition codes
based on a comparison between a register value and an immediate value.
LDW
is a load from a memory location to register a
; the memory
address is given by the second and third arguments. BEQ
performs a
displacement branch if the previous comparison indicated that the two
compared values were equal. Finally, RET
returns from a sub-routine
call using the return address stored in the given register, or halts
the program if it is the main subroutine that is returning.
The state of an abstract state machine comprises distinct named structured values that we call state items. Each state item is initially undefined and may become defined as the machine executes.
The state items are updated by the transition rules of the machine. The effect of executing the transition rules is that zero or more state items are updated.
A crucial property of abstract state machines is that the updates to state items made in one step are not visible until the next execution step. In other words, the updates happen simultaneously at the end of the step. If a particular state item is updated more than once by a single step, the updates are required to be consistent, in that they must all update the state item to the same value. The machine will abort if inconsistent updates are performed.
Execution continues stepwise in this fashion until a step in which no updates are performed, at which time the machine halts. An update that happens to not change the value of a state item is still an update as far as machine termination goes.
In the following we will define an abstract machine to execute RISC
machine code sequences. The main state items of the machine are a
register file called R
and a memory called Mem
.
The register file is indexed by the number of the register. We assume 32 registers, so the indexes will be 0 to 31. Most of the registers are used for general purposes, but some are special: 28 is the program counter, 29 is the frame pointer, 30 is stack pointer and 31 is the link register that holds return addresses during subroutine calls.
The memory consists of words addressed by integer addresses and is assumed to be arbitrarily large for the purposes of this example.
As well as the registers and memory, the machine has two condition
codes Z
and N
, which are set by the comparison instructions and
tested by the conditional branches. Z
is set if two compared values
are equal, while N
is set if the first value is strictly less than
the second value.
The simplest state items of an abstract machine are values of type
State[T]
, where T
is the type of the underlying value. The State
constructor has a parameter sname
which is used to identify the
state item in trace messages.
class State[T] (sname : String)
We can therefore define the Z
and N
state items as follows.
val Z = new State[Boolean] ("Z")
val N = new State[Boolean] ("N")
The machine also has a simple Boolean state item halt
that indicates
whether it should keep executing or not.
val halt = new State[Boolean] ("halt")
Simple state items are sufficient for many purposes, but it is common
to want parameterised state where many state values are collectively
named and indexed by a parameter. Kiama provides ParamState
to
implement parameterised state items.
class ParamState[T,U] (val sname : String) extends State[HashMap[T,U]] (sname)
A parameterised state item is implemented as a simple state item whose value is a map.
In the RISC machine, the register file is parameterised by the register number.
val R = new ParamState[RegNo,Int] ("R")
Aliases are declared for the special purpose registers such as the
program counter (PC
) and the link register used for return addresses
in subroutine calls (LNK
). These definitions use a function call
notation to access a particular component of the parameterised state
item R
.
val PC = R (28)
val FP = R (29)
val SP = R (30)
val LNK = R (31)
The memory of the RISC machine is also represented by a parameterised state item, where the parameter in this case is the memory address.
val Mem = new ParamState[Int,Int] ("Mem")
Initialisation of state is performed by the init
transition rule.
Machine
provides a default definition that does nothing. Most
machines override this definition to properly initialise their state
items. We say that uninitialised state items are undefined and any
attempt to use their values causes the machine to abort.
Transition rules are implemented by regular Scala functions or methods, using a special assignment notation to indicate state updates. Here is the initialisation for the RISC machine that sets the program counter to the beginning of the code sequence, ensures that register zero is zero, clears the condition codes, and sets the halt flag to false.
override def init {
PC := 0
R (0) := 0
Z := false
N := false
halt := false
}
When a rule wants to record a state update for the current step, it
uses the :=
operator instead of a normal Scala assignment statement.
The left-hand side of an update can be a simple state item or a
parameterised state item indexed by an appropriate value. The
right-hand side of an update can be any expression with a value of the
appropriate type.
As discussed above, the recorded updates do not actually happen until the end of the current step. All uses of a state item within a single step will see the same value. The updates are recorded for consistency checking at the end of the step and, if the updates are consistent, the affected state items will be updated at that point before the next step begins.
To preserve the formal properties of a machine, all state changes that
affect the operation of the transition rules should be performed via
updates of State
or ParamState
values, rather than by side-effects
on Scala variables.
Every abstract machine must define a main
transition rule. It is
called once for each step of the machine. main
can perform state
updates itself, or more usually, it calls other functions or methods
that implement different aspects of the machine’s semantics.
In the RISC machine the main rule is
def main {
if (!halt)
execute (code (PC))
}
This rule checks that the machine is not supposed to halt, and if so,
runs the execute
rule on the instruction at the current program
counter location. Recall that PC
is the alias for the part of the
register state associated with index 28. The state implementation is
hidden, so that the value can be accessed as if PC
was just a
regular integer value.
The execute
rule takes the current instruction as a parameter and
calls other rules that handle each of the different classes of machine
instruction.
def execute (instr : Instr) {
arithmetic (instr)
memory (instr)
control (instr)
inputoutput (instr)
}
As defined here, execute
does not care which of the other functions
actually handles the instruction. In fact, zero or more of the
functions may do so, depending on how the machine semantics has been
decomposed. All that matters is that the state updates performed by
the functions are consistent.
In the decomposition used in the example, each of the rules
arithmetic
, memory
, control
and inputoutput
deals with a
subset of the instruction types and ignores the other types. Since
every side-effect that can affect the operation of the machine is
performed using a deferred update on a state item, the order in which
the functions are called does not matter.
To execute arithmetic instructions, the arithmetic
rule matches on
the instruction type and updates the appropriate state. We show the
cases for MOV
, ADDI
and CMPI
.
def arithmetic (instr : Instr) {
instr match {
case MOV (a, b, c) => R (a) := R (c) << b
case ADD (a, b, c) => R (a) := R (b) + R (c)
case CMPI (b, im) => Z := R (b) =:= im
N := R (b) < im
...
case _ =>
}
Note again the use of the :=
operator to record updates. The
right-hand sides are more complex than seen earlier in the init
rule. In almost all cases, standard Scala operators can be used to
calculate with state item values. A notable exception is equality
comparison which should be performed using the =:=
operator, which
compares the underlying state values, instead of the normal ==
,
which compares the state items themselves.
Here are some more cases from the rules to illustrate how other
typical instructions are handled. From memory
:
case LDW (a, b, im) => R (a) := Mem ((R (b) + im) / 4)
From control
:
case b : BEQ if (Z) => PC := PC + b.disp
case RET (c) => PC := R (c)
if (R (c) =:= 0) halt := true
Note how the halt
state item is set to true here if the return
address is zero, indicating that the main subroutine is returning.
This change will be detected by the main
rule.
The basic functionality shown above is sufficient to define deterministic abstract state machines. We plan to add some support for non-determinism in a future version of the library.
Normally updates are performed automatically by the machine at the end
of the init
rule and then at the end of each step. Sometimes it is
useful to be able to force updates to happen immediately. Call the
performUpdates
method to get this effect. One use case for
performing updates immediately is when the init
rule needs to
initialise some state items using the values of other state items. The
latter items should be initialised first, then performUpdates
called, then the dependent state items should be initialised.
You can set the debug
flag of a machine to true to obtain a trace of
steps and state updates as the machine runs.
override def debug = true
Here is beginning part of the debug trace for the RISC machine showing the initialisation and the first three steps of executing a program. The library prints the state updates and step indicators, while the machine itself prints the executed instructions.
RISC.R(28) := 0
RISC.R(0) := 0
RISC.Z := false
RISC.N := false
RISC.halt := false
RISC step 0
RISC exec: ADDI(29,0,0)
RISC.R(29) := 0
RISC.R(28) := 1
RISC step 1
RISC exec: ADDI(30,0,12)
RISC.R(30) := 12
RISC.R(28) := 2
RISC step 2
RISC exec: RD(1)
RISC.R(28) := 3
Up: User Manual, Prev: Attribution, Next: Pretty-printing