Ztrategic - A library for Strategic Programming with Attribute Grammars in Haskell
Overview:
Ztrategic
What is Ztrategic
We strongly recommend reading our publication "Zipping Strategies and Attribute Grammars" regarding this repository for more context and explanation; nevertheless, we provide a tutorial based on previous work in this README. Ztrategic is a library built on top of the Zipper
data structure, and it allows for data traversal, manipulation and reduction, regardless of the underlying data structure. For more complex transformations, it is possible to integrate Attribute Grammars with Ztrategic.
Ztrategic is a standalone library and it is intended to be simple to use - just import it, respect any needed boilerplate, and define your transformations.
Getting Started
Let us consider a data type for the representation of Let
expressions:
data Let = Let List Exp
data List
= NestedLet Name Let List
| Assign Name Exp List
| EmptyList
data Exp = Add Exp Exp
| Sub Exp Exp
| Neg Exp
| Var Name
| Const Int
type Name = String
A Let
expression contains a List
of declarations followed by an Exp
to be evaluated. A List
can be an EmptyList
or either a NestedLet
(that is, another Let
block nested inside our current block) or an Assign
(which is a declaration of a variable); any of these are followed by a List
containing the rest of the declarations.
We denote that this is not an optimal definition of a Let
block: we could use Haskell lists instead of explicit recursion on List
, and Let
values themselves could be inside Exp
. Nevertheless, this data type is heterogenous (we have data types Let
, List
and Exp
intertwined) and therefore not exactly easy to handle in classic Haskell fashion.
This tutorial will break on your machine because there is no definition of StrategicData
for data type Let
. We discuss this later here, but in the meanwhile you can skip this problem by adding the following line to your module:
instance StrategicData Let
Type-Preserving Strategies
Let us now consider a series of arithmetic expression optimizations:
1) add(e, const(0)) -> e
2) add(const(0), e) -> e
3) add(const(a), const (b)) -> const (a+b)
4) sub(a, b) -> add(a, neg(b))
5) neg(neg(e)) -> e
6) neg (const(a)) -> const (-a)
7) var (id) | (id, just(e)) ∈ env -> e
Let us consider only the first 6 rules at first. Using regular Haskell code, these rules are easy to implement but they are burdensome: None of these rules really care about the Let
and List
data types, but we would be forced to write code to recurse through them anyway. For now, let us write only a function to handle Exp
values, returning Just
the result when appliable and Nothing
otherwise:
expr :: Exp -> Maybe Exp
expr (Add e (Const 0)) = Just e
expr (Add (Const 0) e) = Just e
expr (Add (Const a) (Const b)) = Just (Const (a+b))
expr (Sub a b) = Just (Add a (Neg b))
expr (Neg (Neg e)) = Just e
expr (Neg (Const a)) = Just (Const (-a))
expr _ = Nothing
This expr
function contains all the work that is, in fact, relevant for our use case. In regular Haskell code, the rest of the code can be considered boilerplate / copy rules, only recusing through data structures with no real work. Instead, we will be using a strategy to apply expr
to our data type.
There are two types of strategies in our library:
- Type-Preserving (TP) strategies preserve the type of the data structure, changing only its contents. They are useful for transforming and manipulating data structures. They can be thought of as map operations.
- Type-Unifying (TU) strategies produce a new value out of data from the input data structure. They are useful for gathering data out of and/or analysing data structures. They can be thought of as reduce operations.
For this, we will use a Type-Preserving (TP) strategy. From the several strategies available for this, we will choose the innermost strategy. This strategy will apply a transformation as many times as possible until a fixed point is reached. Because applying a transformation in part of an expression can enable new transformations in different parts of expressions, we we might need to re-apply expr
to previously-processed nodes, so this strategy handles re-application for us.
The full code to apply expr
with an innermost strategy is as follows:
opt :: Zipper Let -> Maybe (Zipper Let)
opt t = applyTP (innermost step) t
where step = failTP `adhocTP` expr
There is a lot to unpack here. For starters, our input is a Zipper Let
and our output is a Maybe (Zipper Let)
. This is not a problem, as we can convert to and from Zipper
with the functions toZipper
and fromZipper
easily (these are from the Data.Generics.Zipper
module from the syz
package).
Then, we define a step
worker function. This will be the function to be applied to every single node on the data type we are traversing. Note that this function can be applied to all nodes, even primitive data types such as Int
or Char
. We define this function by defining its default behaviour, which we define as failTP
here, meaning that by default this worker function fails silently. We add more behaviours to it with the composing function adhocTP
and its variations. We then add expr
to our worker function. The resulting step
function will apply expr
if possible (when visiting an Exp
node), otherwise applying failTP
and therefore failing.
We then define what strategy we intend to use. We opt for innermost
, and thus our strategy will be innermost step
, meaning to apply step
in a innermost
way. (See more strategies here) This means that step
will be applied as many times as possible, until it fails on all nodes. It is important that the step
function respects this and fails when no work is to be done; if it does not fail, then the innermost
strategy has no way to know it should stop, resulting in an endless loop. Choosing to use the failing function failTP
or the identity function idTP
as a default behaviour must depend on what kind of strategy is to be used.
Finally, we apply this strategy with the applyTP
function. This strategy will apply the optimization rules we have defined as many times as possible until no optimizations are left. We only express transformations on the Exp
data type, and we ignore the Let
and List
datatypes entirely, instead letting the strategic machinery handle them for us.
Type-Unifying Strategies
Let us now take a look at Type-Unifying strategies. These traverse a data structure and produce a new result out of it, and therefore they are perfect for reduce operations. We will define a function to count the number of assigned variables in a Let
.
There is a very important consideration when using Type-Unifying strategies. We can produce any result using them, but it must be an instance of the Semigroup
Monoid
class so that intermediate results can be concatenated into a single result. If you also need to select only part of the results and/or to stop the strategy halfway through the traversal, your result must be an instance of Alternative
MonadPlus
for the selection of only one alternative. If you don't understand what this means or are unsure on what to do, assuming you need to produce various a
elements as an output, you can use a [a]
as an output for the former behaviour only or a Maybe [a]
for the latter (the alternative behaviour for lists is concatenation, not selection of a sublist, while the alternative behaviour for Maybe
values is the selection of the left element). If you are also unsure of this, always use Maybe [a]
.
Returning to our example, let us define the worker function. It will gather all the declared variables in a list, keeping the considerations above in mind:
countAssigns :: List -> Maybe [String]
countAssigns (Assign s _ _) = Just [s]
countAssigns (NestedLet s _ _) = Just [s]
countAssigns _ = Nothing
There is not much to unpack here. We count both Assign
and NestedLet
nodes, as they are both declarations of variables. For each, we return the name of the variable itself. As per the considerations above, we could return only a [String]
, with no real difference for this example.
Next, we apply countAssigns
to all nodes of our data. For this, we will be using a full_tdTU
strategy, meaning full, top-down, type-unifying (See more strategies here. This will traverse the data structure exactly once, from the top to the bottom, applying our worker function whenever possible:
declarations :: Zipper Let -> Int
declarations t = let result = applyTU (full_tdTU step) t
step = failTU `adhocTU` countAssigns
in case result of
Nothing -> 0
Just l -> length l
As before, we define a step
function, failing by default and applying countAssigns
whenever possible. Note the TU
suffix on the fail
and adhoc
functions. Our result
will be of type Maybe [String]
, which will be Nothing
if we do not gather any results, or Just l
in which l
is a list of declared names; we count them with length
.
Strategic Data
The StrategicData
module is an optimization baked into the Ztrategic
library. For a data structure to be traversable by this library, it must be an instance of StrategicData
, and for most cases it is enough to use the default implementation. For this, we just include the line
instance StrategicData <datatype>
anywhere in our code. However, we can instead decide we want to skip certain nodes in our traversal. Let us recall the declarations
strategy we define above. We are counting declarations, but we are traversing all nodes - even nodes we know are completely irrelevant for us! For example, we are traversing into String
values, which is completely useless. Due to Haskell's implementation of strings being a list of characters, which are in turn defined as a character appended to a list of characters, String
values are deceptively deep and nested structures. The string Test
contains 5 strings in it, specifically Test
, est
, st
, t
and the empty string, and it also contains each of the characters that make up the string, totalling 9 actual nodes visited by a strategy when the Test
string is found (We prove this in module Examples.Let.LetMisc
).
TODO: continue this
API Reference
Type-Preserving
Strategies:
- full_tdTP - Full, top-down, type-preserving strategy. Will fail if any node fails, so take care to only fail if the strategy needs to be interrupted. Use
idTP
to signal no changes on a node.
- full_buTP - Full, bottom-up, type-preserving strategy. Will fail if any node fails, so take care to only fail if the strategy needs to be interrupted. Use
idTP
to signal no changes on a node.
- once_tdTP - Once, top-down, type-preserving strategy. Will traverse through fails until it succeeds in applying a worker function once, then it will stop. Use
failTP
to signal a node is not changed by default.
- once_buTP - Once, bottom-up, type-preserving strategy. Will traverse through fails until it succeeds in applying a worker function once, then it will stop. Use
failTP
to signal a node is not changed by default.
- stop_tdTP - Stop, top-down, type-preserving strategy. Will stop traversing into the subtrees when it succeeds in applying a worker function once. This is different from the
once_
strategies as several subtrees can be branched into and traversed partially - it does not compute only once, instead it stops traversing downwards when it succeeds. Use failTP
to signal a node is not changed by default.
- stop_buTP - Stop, bottom-up, type-preserving strategy. Will stop traversing into the subtrees when it succeeds in applying a worker function once. This is different from the
once_
strategies as several subtrees can be branched into and traversed partially - it does not compute only once, instead it stops traversing downwards when it succeeds. Use failTP
to signal a node is not changed by default.
- innermost - Applies the given worker function as many times as possible, starting from the innermost node. This will loop infinitely if we do not signal fail properly when no changes are to be made. Use
failTP
to signal a node is not changed by default.
- outermost - Applies the given worker function as many times as possible, starting from the outermost node. This will loop infinitely if we do not signal fail properly when no changes are to be made. Use
failTP
to signal a node is not changed by default.
- breadthFirst - Full, top-down, type-preserving strategy. It traverses breadth-first instead of the depth-first behaviour of the previous strategies. Use it with caution as this strategy is particularly volatile, and changing the shape of the data structure being traversed breaks this strategy. Use
idTP
to signal no changes on a node.
- full_uptdTP - Full, top-down, type-preserving strategy. Similar to
full_tdTP
but it makes use of the advantages of Zippers to only affect nodes above the starting node.
- full_upbuTP - Full, bottom-up, type-preserving strategy. Similar to
full_buTP
but it makes use of the advantages of Zippers to only affect nodes above the starting node.
- once_uptdTP - Full, top-down, type-preserving strategy. Similar to
once_tdTP
but it makes use of the advantages of Zippers to only affect nodes above the starting node.
- once_upbuTP - Full, bottom-up, type-preserving strategy. Similar to
once_buTP
but it makes use of the advantages of Zippers to only affect nodes above the starting node.
- mutations - Given a function
tr
that changes one node on a data structure, generates all possible combinations of data structures similar to the input, in which one node was changed by tr
. The type of this strategy is an outlier and it reads mutations :: a -> (n -> Maybe n) -> [a]
, meaning the input is a data structure and the aforementioned tr
function, and the output is a list of variations of the input.
Composition:
- adhocTP - Compose two functions. Apply the rightmost if possible, and if the types don't match, apply the leftmost one.
- adhocTPSeq - Compose two functions. Apply the rightmost and then the leftmost in sequence. This combinator makes it possible to compose two functions that operate on the same type, but this is generally less efficient (but possibly more expressive) than using
adhocTP
and a single worker function.
- adhocTPZ - Similar to
adhocTP
, but the worker function has access to a Zipper
pointing to the current node. This enables the usage of Attribute Grammars and/or nested strategies.
- adhocTPZSeq - Similar to
adhocTPSeq
, but the worker function has access to a Zipper
pointing to the current node. This enables the usage of Attribute Grammars and/or nested strategies.
Type-Unifying
Strategies:
- full_tdTU - Full, top-down, type-unifying strategy. Will traverse all nodes on a data structure, collecting whichever data we require and storing it into a structure. Use
failTP
to skip irrelevant nodes.
- full_buTU - Full, bottom-up, type-unifying strategy. Will traverse all nodes on a data structure, collecting whichever data we require and storing it into a structure. Use
failTP
to skip irrelevant nodes.
- once_tdTU - Once, top-down, type-unifying strategy. Will traverse all nodes on a data structure, until a node succeeds in producing data, in which case the traversal stops and said data is returned. Use
failTP
to skip irrelevant nodes.
- once_buTU - Once, bottom-up, type-unifying strategy. Will traverse all nodes on a data structure, until a node succeeds in producing data, in which case the traversal stops and said data is returned. Use
failTP
to skip irrelevant nodes.
- stop_tdTU - Stop, top-down, type-unifying strategy. Will stop traversing into the subtrees when it succeeds in applying a worker function once. This is different from the
once_
strategies as several subtrees can be branched into and traversed partially - it does not compute only once, instead it stops traversing downwards when it succeeds. Use failTP
to skip irrelevant nodes.
- stop_buTU - Stop, bottom-up, type-unifying strategy. Will stop traversing into the subtrees when it succeeds in applying a worker function once. This is different from the
once_
strategies as several subtrees can be branched into and traversed partially - it does not compute only once, instead it stops traversing downwards when it succeeds. Use failTP
to skip irrelevant nodes.
- full_uptdTU - Full, top-down, type-unifying strategy. Similar to
full_tdTU
but it makes use of the advantages of Zippers to only affect nodes above the starting node.
- full_upbuTU - Full, bottom-up, type-unifying strategy. Similar to
full_buTU
but it makes use of the advantages of Zippers to only affect nodes above the starting node.
- once_uptdTU - Once, top-down, type-unifying strategy. Similar to
once_tdTU
but it makes use of the advantages of Zippers to only affect nodes above the starting node.
- once_upbuTU - Once, bottom-up, type-unifying strategy. Similar to
once_buTU
but it makes use of the advantages of Zippers to only affect nodes above the starting node.
- foldr1TU - Folds relevant nodes right-to-left into a single result. Shorthand for
foldr1
after a strategy.
- foldl1TU - Folds relevant nodes left-to-right into a single result. Shorthand for
foldl1
after a strategy.
- foldrTU - Folds relevant nodes right-to-left into a single result, using supplied value as a starting value. Shorthand for
foldr
after a strategy.
- foldlTU - Folds relevant nodes left-to-right into a single result. using supplied value as a starting value. Shorthand for
foldl
after a strategy.
Composition:
- adhocTU - Compose two functions. Apply the rightmost if possible, and if the types don't match, apply the leftmost one.
- adhocTUZ - Similar to
adhocTU
, but the worker function has access to a Zipper
pointing to the current node. This enables the usage of Attribute Grammars and/or nested strategies.
TODO: Finish small tutorial.
Repository Layout
Here we showcase the different files in our repository and how they are organized.
Library
This folder contains all the relevant files of the core Ztrategic library - these are what makes Ztrategic work, and will be imported throughout any relevant examples using this library.
Ztrategic.hs
The main Ztrategic module. There are similar implementations for memoized versions in the Memo folder.
StrategicData.hs
A module that defines the StrategicData
class, which is mandatory for the main data type being traversed by Ztrategic. We use it to define terminal symbols that can be skipped by the traversals. A correct usage of this class can net relevant performance improvements.
TODO: add 2 examples, one for never skip, one for skip strings.
Attribute Grammars
Combining AGs with Strategies