A Scala library for language processing.
Documentation HomeUp: User Manual, Prev: Machines, Next: Messaging
IMPORTANT NOTE: This page describes Kiama 1.x. Much of it also applies to Kiama 2.x, but not all. Please consult the 2.x release notes for the main differences. We are currently writing comprehensive documentation for 2.x that will eventually replace these pages.
This page provides an overview of Kiama’s support for pretty printing. Pretty printing is used in the following examples.
Fragments from these examples are used as illustrations below.
More information about Kiama pretty-printing can be found via the Research page.
Kiama’s pretty printing library is based on the paper “Linear, bounded, functional pretty-printing” by Swierstra and Chitil (Journal of Functional Programming, 19 (1), 1) which details various pretty printing libraries for Haskell. We use the continuation-based form of pretty printer discussed in this paper.
Swierstra and Chitil’s work is descended from earlier libraries by Hughes and Wadler. We have also incorporated combinators from the Haskell PPrint library by Leijen.
Pretty printing is the task of producing a pleasing output representation from some data structure. In the Kiama context, we usually mean the task of turning an abstract syntax tree representing some software artefact into the textual syntax for that artefact. For example, a code generator might create an abstract syntax tree that represents the code to be generated. Pretty printing would turn that tree into the generated source code.
The simplest form of this process is to simply traverse the tree in some fashion, printing the textual equivalent of nodes as they are encountered and inserting punctuation and other special characters where needed. This process is usually called unparsing since it performs the opposite translation to that performed by a parser.
However, we usually want more control than an unparser can provide. For example, we might want to ensure that statement blocks are indented to a level deeper than the statement in which they occur. Or we might want to ensure that no line of the output is longer than a certain length. Obeying these kinds of constraints are what distinguishes a pretty printer from an unparser.
It is quite difficult to perform pretty printing during a single simple traversal of a tree. Consequently, pretty printing libraries often adopt a document model view. The idea is that we construct a value that represents the document to be created as the result of the pretty printing. The document embodies the constant text that must be present, optional parts, and the constraints if any on the final output.
Kiama’s pretty printing library is in the org.bitbucket.inkytonik.kiama.output.PrettyPrinter
trait. You must mix this trait into whichever class or object needs to perform
pretty printing. For example, if you have
object P extends org.bitbucket.inkytonik.kiama.output.PrettyPrinter
then within P
or via P
’s public methods, you can access the pretty
printing functionality.
Documents are represented by the type PrettyPrinter.Doc
. Pretty printing
amounts to constructing a Doc
value then converting it to a string using
the pretty
method.
def pretty (d : Doc, w : Width = defaultWidth) : Layout
(Layout
is a type alias for String
.) pretty
takes the given
document and a maximum output width measured in characters. It returns
the pretty printed form of the document as a string, which can then be
output or sent elsewhere as needed.
defaultWidth
is 75, but can be overridden when the PrettyPrinter
trait is mixed in.
The pretty printing library provides ways to make basic documents and ways to combine documents into more complex documents.
The simplest document is an empty one, created by the empty
combinator:
def empty : Doc
text
turns a string into a document. The string should not contain
newlines if they are intended to be significant to the pretty printer.
implicit def text (t : String) : Doc
text
is implicit so you can use string literals as documents in most
cases, and we will do so in the following.
Documents can be combined in a simple way using the <>
method of
Doc
which creates a document whose content is the concatenation of
its receiver and the argument.
class Doc {
...
def <> (e : Doc) : Doc
...
}
(Actually, methods including <>
are defined in a DocOps
trait
that is mixed in to a specific implementation of Doc
, but the
difference does not matter to a client of the library.)
Thus, the document
"Hello" <> empty <> " world!"
creates a document that will be pretty printed as “Hello world!
”.
A fundamental operation for this kind of pretty printing library is grouping. The idea is that we group a document when it has multiple ways of being printed and we want the pretty printer to consider these different ways to pick the best.
Use the group
combinator to indicate a group.
def group (d : Doc) : Doc
The constraint that we want to satisfy in order to pick the best output is to make sure the pretty printed version fits in the output width. A pretty print result that fits in the output width is preferred over one that doesn’t.
As a result of this constraint, a group will be formatted either horizontally (if it fits on the current line) or vertically (if it doesn’t fit). Groups occurring within a horizontal group will also be formatted horizontally, but groups occurring within a vertical group are not constrained in this way.
The difference between horizontal formatting and vertical formatting
is expressed by line breaks, expressed in their simplest form by
the line
combinator.
def line : Doc
The idea is that we indicate where the potential line breaks go by
putting line
documents there. If a line break occurs within a group,
it will be replaced by a space if horizontal formatting is being used
for that group, or by a newline if vertical formatting is being used.
A line break occurring outside any group always results in a newline.
For example, the document
"Hello" <> line <> "world!"
will result in
Hello
world!
if it is not within a group.
In contrast, the document
group ("Hello" <> line <> "world!")
will be printed as “Hello world!
” if that text would fit on the
current line, or as
Hello
world!
if there is insufficient room.
linebreak
is the same as line
except that it is replaced by
nothing if horizontal formatting is being used for the group in which
it appears.
def linebreak : Doc
line
and linebreak
are just special cases ofline (repl)
where
repl
is an arbitrary string replacement.
For example, line ("; ")
will be replaced by a semicolon followed by
a space, if the newline is omitted.
def linebreak : Doc
It is common to want to indent pretty printed output. By default, no
indentation is inserted, so documents must be created to handle the
indentation explicitly. The nest
combinator makes it easy to deal
with indentation in a general way.
def nest (d : Doc, j : Indent = defaultIndent) : Doc
The result of nest (d, j)
is a document that has the same content as
d
, except that it is indented j
spaces further than the current
indentation level. The indentation level is zero when pretty printing
of a document begins.
defaultIndent
is four, but can be overridden when the
PrettyPrinter
trait is mixed in.
Indentation interacts closely with the presence or absence of line breaks, since indentation is only inserted after a line break. For example, the document
"Hello" <> nest (line <> "world!")
will produce “Hello world!
” if the line break is omitted, but
Hello
world!
if it is not omitted. Note that the line
has to be inside the
nest
for the indentation that follows it to be affected.
The combinators discussed so far form the basic primitives of all documents. The library provides an extensive collection of other combinators as short-hand notations for common situations.
This page summarises the kinds of other combinators and lists some instances of each kind. See the PrettyPrinter trait for a complete list.
First up is string
which is like text
except that the string is
allowed to contain newlines. Each newline is turned into an invocation
of line
.
def string (s : String) : Doc
char
is the same as string
but just for a single character.
implicit def char (c : Char) : Doc
char
is implicit so you can use string literals as documents in most
cases.
Single characters have combinators so that they
can be made easier to read. These include comma
and dot
, plus bracketing
characters such as lparen
and rparen
(but also see the surround
and
enclose
combinators described in the next section).
def comma : Doc
def dot : Doc
def lparen : Doc
def rparen : Doc
Explicit spaces can be produced using the space
and spaces
combinators. The latter produces exactly the number of spaces given by
its integer argument, or an empty document if the argument is not
positive.
def space : Doc
def spaces (n : Int) : Doc
The softline
and softbreak
combinators produce optional line
breaks that are individually grouped.
def softline : Doc
def softbreak : Doc
In addition to the basic concatenation method <>
, there are
varieties called <+>
, <@>
, <@@>,
</>
, and <\>
that include a
space
, line
, linebreak
, softline
and softbreak
,
respectively, between the concatenated documents.
class Doc {
...
def <+> (e : Doc) : Doc
def <@> (e : Doc) : Doc
def <@@> (e : Doc) : Doc
def </> (e : Doc) : Doc
def <\> (e : Doc) : Doc
...
}
The surround
combinator surrounds a document d
with two
occurrences of another document b
, using <>
for concatenation.
def surround (d : Doc, b : Doc) : Doc
Combinators such as dquotes
(for double quotes) and squotes
(for single quotes) are special cases of surround
.
enclose
is a form of surround
where the document on the left might
be different to the one on the right.
def enclose (l : Doc, d : Doc, r : Doc) : Doc
braces
and parens
are commonly used to enclose a document in
braces or parentheses, respectively.
def braces (d : Doc) : Doc
def parens (d : Doc) : Doc
value
turns an arbitrary value into a document using its toString
method and string
.
def value (v : Any) : Doc
Sequences (instances of Seq
) can be pretty printed using a collection
of combinators of which hsep
and vsep
are typical.
hsep
concatenates the items in the sequence using <+>
, while
vsep
uses <@>
.
def hsep (ds : Seq[Doc]) : Doc
def vsep (ds : Seq[Doc]) : Doc
There are many other variants, such as ones that allow the separator
to be specified, and lterm
that prints lists where the elements
are individually terminated rather than separated.
PrettyPrintable
traitThe library comes with a variant of pretty
that accepts an instance
of the PrettyPrintable
trait instead of a document. This variant can
help to simplify some pretty printers.
def pretty (p : PrettyPrintable) : Layout
trait PrettyPrintable {
def toDoc : Doc
}
This version of pretty
calls the object’s toDoc
method to obtain
the document to be printed. By default, toDoc
just calls the value
combinator on the object. There is also an implicit conversion from
Any
to PrettyPrintable
that uses value
to implement toDoc
.
The idea is that PrettyPrintable
is mixed in to domain classes that
implement the toDoc
method. Values of those domain classes can then
be pretty printed directly without having to explicitly call a method
to convert them to a document. The implicit conversion allows any
value to be pretty printed very easily.
The list
combinator pretty prints a List
in Scala notation,
inserting grouping, nesting and line breaks as needed. For example,
the document
list (List (1, 2, 3))
results in “List(1, 2, 3)
” if it fits on the current line, or
List(
1,
2,
3)
if it doesn’t.
list
can be customised in many ways. Here is the full signature:
def list[T] (l : List[T], prefix : String = "List",
elemToDoc : T => Doc = (x : T) => value (x),
sep : Doc = comma,
sepfn : (Seq[Doc], Doc) => Doc = lsep) : Doc =
prefix
is the string that is printed at the beginning of the list
(default: “List”). elemToDoc
provides a way to convert the list
elements to documents (default: using the value
method). sep
is used to separate each element (default: a comma). Finally,
sepfn
is used to format the elements with the separator (default:
lsep
).
For example, the alternative layout for our example list
List(1
, 2
, 3
)
can be achieved using
list (List (1, 2, 3), sepfn = lsep2)
if the available line width is not sufficient.
See the API documentation
for your Kiama release for detailed descriptions of lsep
and
lsep2
.
The any
combinator provides a convenient way to pretty print any
value, most commonly instances of case classes, but also common
data structures such as tuples, vectors, and maps.
For example, given the classes
abstract class Exp
case class Num (i : Int) extends Exp
case class Add (l : Exp, r : Exp) extends Exp
the document
any (Add (Num (1), Num (2)))
might be pretty-printed as
Add (Num (1), Num (2))
or
Add (
Num (
1),
Num (
2))
depending on line width.
As a more complex example, consider
pretty printing
for the Imperative language. We implement a function show
that
converts any imperative language abstract syntax tree node into a
document. show
uses showbin
that abstracts documents for
parenthesised applications of binary operators. This is an
example of a functional-style pretty printer, where the pretty
printing logic is kept outside of the data that is to be printed.
(See the next example for a more object-oriented style.)
object PrettyPrinter extends org.bitbucket.inkytonik.kiama.output.PrettyPrinter {
import AST._
def pretty (t : ImperativeNode) : String =
super.pretty (show (t))
def show (t : ImperativeNode) : Doc =
t match {
case Num (d) => value (d)
case Var (s) => s
case Neg (e) => parens ("-" <> show (e))
case Add (l, r) => showbin (l, "+", r)
case Sub (l, r) => showbin (l, "-", r)
case Mul (l, r) => showbin (l, "*", r)
case Div (l, r) => showbin (l, "/", r)
case Null () => semi
case Seqn (ss) => braces (nest (line <> ssep (ss map show, line)) <> line)
case Asgn (v, e) => show (v) <+> "=" <+> show (e) <> semi
case While (e, b) => "while" <+> parens (show (e)) <> nest (line <> show (b))
}
def showbin (l : ImperativeNode, op : String, r : ImperativeNode) : Doc =
parens (show (l) <+> op <+> show (r))
}
The program
{ i = 10; count = 0; while (i) { count = count + 1; i = 1 + i; } }
is represented by the abstract syntax tree
Seqn (
List (
Asgn (Var ("i"), Num (10.0)),
Asgn (Var ("count"), Num (0.0)),
While (
Var ("i"),
Seqn (
List (
Asgn (
Var ("count"),
Add (Var ("count"), Num (1.0))),
Asgn (Var ("i"), Add (Num (1.0), Var ("i"))))))))
This abstract syntax tree text was produced using show
and
product
. Notice how the second last Asgn
node has been broken over
two lines (breaking at the comma) because it would not fit in the
default output width, whereas the last one has not been broken, since
it fits. The other nodes in the while loop above the assignments have
been broken in a similar fashion.
Omitting product
, we get
{
i = 10.0;
count = 0.0;
while (i)
{
count = (count + 1.0);
i = (1.0 + i);
}
}
A more object-oriented approach to pretty printing is to add the
conversion logic to the data classes. In this case we make the pretty
printable data implement the PrettyPrintable
class of the pretty
printer and override the toDoc
method to provide custom conversion.
First, we define a pretty printer and import its members into the current scope.
object MyPrettyPrinter extends org.bitbucket.inkytonik.kiama.output.PrettyPrinter
import MyPrettyPrinter._
Now consider two classes that implement a special kind of pair whose components are a special kind of integer.
class MyPair (val l : MyInt, val r : MyInt) extends PrettyPrintable {
override def toDoc : Doc =
brackets (l.toDoc <> ":" <> r.toDoc)
}
class MyInt (val i : Int) extends PrettyPrintable {
override def toDoc : Doc =
value (i) <> "!"
}
The MyPair
and MyInt
classes are PrettyPrintable
and their
toDoc
methods provide the desired custom conversion for those
types.
The program can create a MyPair
and pretty print it as follows.
val l = new MyPair (new MyInt (1), new MyInt (2))
println (pretty (l))
The result is as follows.
[1!:2!]
Expression-oriented languages use parentheses to control the application of operators. We can print such programs with parentheses around every expression and be sure that the printed program will respect the operator precedence and associativity that was intended.
Alternatively, we can just put in parentheses where they are needed.
For example, if we are printing 1 + 2 + 3
then parentheses are not
needed at all, but in (1 + 2) * 3
we need the parentheses around
1 + 2
but not another set around the whole expression.
Kiama includes the org.bitbucket.inkytonik.kiama.output.ParenPrettyPrinter
trait that
makes it easy to define pretty-printers that perform minimal
parenthesisation. The module uses algorithms published by Norman Ramsey
in his paper
“Unparsing expressions with prefix and postfix operators”
(Software: Practice and Experience, 28 (12), 1998).
See the Oberon0 example for an illustration.
The module org.bitbucket.inkytonik.kiama.output.Filters
provides functions that may be
usefully applied after pretty-printing.
Sometimes the pretty-printed output is designed to go in fixed space.
The filters keepMaxChars
and keepMaxLines
can be used to discard
excess characters or lines. E.g., keepMaxLines (5) (s)
will return
the first five lines of s
or the whole of s
if s
contains fewer
than five lines.
Similarly, maxWords
filters to a maximum word limit.
Finally, keepMaxIndent (n, s)
filters s
by replacing all runs of
consecutive lines that have indentation at least n
spaces with "..."
indented by n
spaces. Thus, keepMaxIndent
can be used to summarise s
by
omitting information nested below a certain level. An optional extra argument
can be used to specify alternatives to "..."
.
Up: User Manual, Prev: Machines, Next: Messaging