Last several months in my spare time I was working on implementation of Rodrigo's vision about new simplified layer of frontend AST for Boo. I used existing unit tests for Boo Parser as base for unit tests for my new project which has working name TinyAST. I added my project to my branch of the project Boo-Extensions on GitHub.
Couple days ago I finished the first phase of the project. I would raither call it "Alpha version" than "Proof of Concept" because I was able to implement about 90% of Boo syntax using the new approach. During this phase I wanted to find an answers to the following questions:
- Which AST classes should be added to the new API?
- Which syntax features could be implemented using the new approach?
- How simple the parser for the new AST could be?
- How to integrate the new API with existing compiler pipeline?
- How to do conversion from TinyAst to Boo AST?
- How to deal with unsupported syntax?
Below are the answers in the corresponding order.
Which AST classes should be included to the new API?
Currently I came up with the following set of AST classes which is almost identical to Rodrigo's one
ast Form = \
Identifier(Name as string, IsKeyword as bool, IsSymbol as bool) | \
Brackets(Form, Type as BracketsType) | \
Literal(Value as object, astLiteral as LiteralExpression) | \
Infix(Operator as Identifier, Left, Right) | \
Prefix(Operator, Operand, IsPostfix as bool) | \
Tuple(Forms as (Form)) | \
Pair(Left, Right, IsMultiline as bool, Doc as string) | \
Block(Forms as (Form))
Which syntax features could be implemented using the new approach?
This is the list of syntax features which are currently not supported:
- Enumerable type shortcut
- Generic placeholders
- Goto and goto labels
- Hash literals
- Regular expressions
- Slicing
- Timespan literals
How simple the parser for new AST could be?
Currently the new parser includes about 50 rules excluding rules related to integration with compiler pipeline, tokens and literals. This is the list of the rules:
form_stmt, block, begin_block, begin_block_with_doc, form, infixr closure_separation, inline_block, prefix_expression, pair, single_line_pair, tuple, pair_expression, brakets_prefix, assignment, high_pr_pair, infix or_expression, infix and_expression, prefix_keyword not_expression, infix membership_expression, infix identity_test_expression, infix isa_expression, infix comparison, infix bitwise_or_expression, infix bitwise_xor_expression, infix bitwise_and_expression, infix term, infix factor, infix bitwise_shift_expression, prefix_symbol signalled_expression, prefix_symbol ones_complement_expression, infix exponentiation_expression, infixr as_operator, infix cast_operator, postfix_operator, infix member_reference, prefix_symbol splice, prefix at_operator, high_priority_prefix, hp_tuple, prefix_of_brackets, infixr of_operator, sticky_brackets, atom = exp_in_brackets | prefix_of_brackets | identifier | literal, identifier, exp_in_brackets, paren_brackets, qq_brackets, square_brackets, curly_brackets
The source code of the parser is in the file TinyAstParser.boo.
I think that the next big challenge is to make the parser simpler by creating more special macroses which will help to declare rules inside OMeta macros. I believe all rules could be formalized using notions token, keyword, atom, infix, prefix and mechanism overloading default operator priorities.
How to integrate the new API with existing compiler pipeline?
To integrate the new API the following has been done:
- Standard parser step is replaced with the new one.
- The new parser converts everything except namespaces and imports to special macros tinyAst and add it to module's globals. AST objects are stored in the macros annotation.
- Reference to tinyAst assembly is added to the compiler references so MacroExpansion step can find macros tinyAst.
- Macros tinyAst converts new front end AST to standard AST using TinyAstEvaluator.
- New class Module is created with overriden function ToCodeString(). It is necessary for testing purposes. Also a new class TinyAstPrinterVisitor has been created to support printing of code provided by method ToCodeString().
How to do conversion from TinyAst to Boo AST?
I used Boo.OMeta to transform object graph of TinyAst to object graph of Boo AST. Several minor changes has been done to the Boo.OMeta to improve ObjectMatching.
After spending several months experimenting with object graph transformations and keeping in mind that scenario I working on is pretty complex I can assert that Boo.OMeta is very powerful tool for this purpose. It is actually subject for a separate article. The code doing transformations is in class TinyAstEvaluator. When I started creating transformation rules for a object graph (or tree) I always had to deal with tree structures. It required to create hard to read rules which mimic structure of the tree they try to recognise. For example this is the rule for statement for:
stmt_for = Prefix(
Operator: FOR,
Operand:
Pair(
Left:
Infix(
Operator: IN,
Left: declaration_list >> dl,
Right: rvalue >> r
),
Right:
block >> body
)
) \
, or_block >> orBlock \
, then_block >> thenBlock ^ newForStatement(dl, r, body, orBlock, thenBlock)
Finally I came up with a set of auxiliary rules which now allows to create production rules in a regular maner. For example this is how class definition is implemented using auxiliary rules here, next, prefix, optional_prefix_operand, prefix_or_rule, optional, nothing.
class_def = --attributes_line >> att, here >> i, inline_attributes >> in_att, member_modifiers >> mod, prefix[CLASS], class_body >> body \
, optional_prefix_operand[super_types] >> superTypes, prefix_or_rule[id] >> className, optional[generic_parameters] >> gp, nothing \
, next[i] ^ newClass([att, in_att], mod, className, gp, superTypes, body)
Rules here, next and nothing can be used in any object pattern matching scenario. Other rules were created specially for TinyAst object tree but this approach can be used for other scenarios.
Dealing with unsupported syntax
This subject is not currently completely resolved but there are already some rules which can be used to deal with the problem. In case the statement cannot be parsed by TinyAstParser, it could call original Boo.OMeta.Parser using function callExternalParser which is currently implemented in TinyAstEvaluator. This approach was used to deal with string interpolation during evalutaion of TinyAst tree:
booparser_string_interpolation = $(callExternalParser("string_interpolation", "Boo.OMeta.Parser.BooParser", input)) \
| $(callExternalParser("string_literal", "Boo.OMeta.Parser.BooParser", input))
I think that original parser could be always called in parallel with TinyAst parser and results could be compared in some way to figure out if there was a problem with parsing. Also result of original parser execution could be stored in annotations of Form to be used in case evaluator can't recognise the statement. And of course as an option the problematic syntax also could be removed from the language to improve its metaprogramming features.
To do
- Create examples of use cases for new AST. Now when minimal functionality of the new API is implemented it is possible to test it in real world scenarios.
- Simplify TinyAst parser or even create a framework allowing developers easily create parsers for the new API.
- 1,000,000 other things
Conclusions
I hope this project showed some perspectives of Rodrigo's Idea. I think that it could significantly improve metaprogramming features of Boo. Currently code is not polished and sometimes it looks like a battlefield but I hope it is possible to get the idea. It is only a first step and it would be great to be sure that it is done in right directions so opinion of Boo comrades would be very helpfull.
Special thanks to Rodrigo who created such a powerfull framework as Boo.OMeta using Alessandro Warth's OMeta specification.