Monolith Design 3
My first design for Monolith involved a completely extensible compiler – projects could have a ‘custom syntax file’ which added patterns to the parser and linked them to compile-time code generation/transformation, much like MetaLua’s. This was Monolith Design 1 (known as Epsilon at the time).
After researching D, Scala and ANTLR and realizing how difficult it would be to build an extensible context-free grammar parser in C++, I decided to downscope the design. Monolith Design 2 would have a static grammar and would focus on its type system and memory management. This seemed like a reasonable move at the time – Scala achieves great expressiveness without any metaprogramming, and D’s syntax implies that metaprogramming can be added as an afterthought. However, it continued to bother me that this design wouldn’t be nearly as elegant as Lisp, and that the fixed syntax would still impair the design of DSLs somewhat.
I took two architectures of Design 2 to the proof-of-concept stage, where an AST would be transformed into LLVM calls that compile an executable. Unfortunately, during both attempts, Life took over and prevented me from finishing them.
Since being forced to abandon Design 2, I’ve continued contemplating how to make an ideal programming language – a language at least as expressive and safe as Scala, as flexible as Lisp and as convenient as Python or Factor. Design 3 emerged as an attempt to completely abandon the C-style syntax and instead pursue a “Lisp without parentheses” design. Rather than being one of the countless variations of Lisp, D3 will have static typing from end to end and will use several word/operator precedence rules, most of which can be redefined. The basic structure will still be a 1:1 mapping of words to AST nodes, with AST nodes able to enumerate/serialize themselves much as they can in the Lisps. Departing from Lisp, these AST nodes will be able to attach data to values (such as type information), override the name lookup within their domain, manipulate the current function/class context, etc. An example follows:
func get_total_length:
    (strs: list string) // arguments go on the first line of a function body
    returns int // other descriptions (such as the return value) also go at the top of a function body
    return sum mapcall(strs, &length)
Let me break down the example into its separate AST nodes:
- list string – as generic instantiation is treated with the same syntax as function calls, this is valid. ‘list string’, ‘list(string)’ and ‘(list string)’ are identical
- return – emits the LLVM ‘ret’ opcode and adds the type of its operand to the function’s accumulated ret-type inference list
- sum – mentioning method names automatically executes them, greedily consuming the next value (usually to the end of the line, unless encased in parentheses)
- mapcall – takes the first argument as an iterable for the ‘map’ function. In its second argument, it mixes in the first argument’s context so that child members can be referred to without the ‘strs.’ or ‘list<string>::’ part
- &length – the & is a necessary flag to prevent the parser from assuming that the method is executed
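To make the word rules above concrete, here is a minimal sketch in Python (not Monolith) of an evaluator for the last line of the example. Everything in it is an assumption made for illustration: the WORDS and MEMBERS tables, the Ref class and the evaluate function are hypothetical names, and a real compiler would build AST nodes and emit LLVM calls rather than evaluate directly. It only shows three behaviours: a word greedily consumes the next value when no parentheses follow, & stops a word from being executed, and mapcall resolves the referenced name in the scope of its first argument's elements.

```python
import re
from dataclasses import dataclass


@dataclass
class Ref:
    """A word flagged with '&': named, but not executed."""
    name: str


# Hypothetical member table standing in for a type's own name-lookup scope.
MEMBERS = {str: {"length": len}}


def mapcall(items, ref):
    # Resolve the reference inside each element's type scope, so the caller
    # can write &length without spelling out the owning type.
    return [MEMBERS[type(item)][ref.name](item) for item in items]


# Hypothetical word table: every name here executes as soon as it is mentioned.
WORDS = {"sum": sum, "mapcall": mapcall}


def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|[(),&]", src)


def parse_expr(tokens, pos, env):
    """Evaluate one expression starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok == "&":
        # '&' suppresses execution and yields a reference to the word.
        return Ref(tokens[pos + 1]), pos + 2
    if tok == "(":
        value, pos = parse_expr(tokens, pos + 1, env)
        if tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    if tok in WORDS:
        word = WORDS[tok]
        pos += 1
        if pos < len(tokens) and tokens[pos] == "(":
            # Explicit argument list: word(a, b, ...)
            args = []
            pos += 1
            while tokens[pos] != ")":
                arg, pos = parse_expr(tokens, pos, env)
                args.append(arg)
                if tokens[pos] == ",":
                    pos += 1
            return word(*args), pos + 1
        # No parentheses: greedily consume the next value as the only argument.
        arg, pos = parse_expr(tokens, pos, env)
        return word(arg), pos
    # Anything else is a plain variable lookup.
    return env[tok], pos + 1


def evaluate(src, env):
    tokens = tokenize(src)
    value, pos = parse_expr(tokens, 0, env)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return value


total = evaluate("sum mapcall(strs, &length)", {"strs": ["ab", "cde", ""]})
```

With this sketch, `sum mapcall(strs, &length)` and `sum(mapcall(strs, &length))` evaluate to the same value, which mirrors the claim that the parenthesized and unparenthesized forms are identical.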
The net result of these AST nodes running is roughly equivalent to making this call inside the compiler:
context.addWord("get_total_length", WordPrecedence.FUNC, context.createFunc((strs: list(string)), int, { return sum(map($1, list(string).&length)) }))
Words such as mapcall can easily be created as though they were Lisp macros, and they have (nearly) full control over how the remainder of the sentence is parsed.
Another cool feature: class members, etc. also map to words which can be overridden. This means that you could make a function that safely has a different return type depending on what argument is supplied. Obviously, if the argument supplied was non-const, then it would have to return either the supertype of all possible return values, or a type disjunction. This basically removes all the hackery of template metaprogramming, allowing fully featured metaprogramming while maintaining static type safety.
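As a rough analogy only, Python's type hints can express a similar idea: a return type that depends on a constant argument, with a fallback to a type disjunction when the argument is not known statically. This is not how Monolith would implement it; parse_field and its "int"/"text" kinds are hypothetical names invented for this sketch, and the static checking is done by an external type checker rather than by the language itself.

```python
from typing import Literal, Union, overload


# When the argument is a literal constant, the checker can pick a precise return type.
@overload
def parse_field(kind: Literal["int"], raw: str) -> int: ...
@overload
def parse_field(kind: Literal["text"], raw: str) -> str: ...
# When the argument is only known to be some string, fall back to the disjunction.
@overload
def parse_field(kind: str, raw: str) -> Union[int, str]: ...


def parse_field(kind, raw):
    if kind == "int":
        return int(raw)
    if kind == "text":
        return raw
    raise ValueError(f"unknown kind: {kind!r}")


count = parse_field("int", "42")      # a type checker infers int here
label = parse_field("text", "hello")  # a type checker infers str here
```

The overloads play the role that overridable words would play in Design 3: the constant argument selects the signature, and the non-constant case gets the union of all possible return types.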
Overall, this form of metaprogramming allows a huge amount of convenience around “glue” code, for example: generating your ORM at runtime, emulating higher-level languages’ more advanced type systems (I’m looking at you, Haskell), and embedding another language within Monolith scripts almost seamlessly. My only worry is that between the freedom to omit dots and parentheses, the lack of delimiters marking what code is treated as a closure, and the fact that you can even replace the = operator if you desire, the code may look like garbage in the hands of unskilled coders… But then again, what languages other than VB.Net and Python actually have heavy enough syntax rules to not be horrific in the hands of newbs?
Stay tuned for my next article, where I’ll go over the details of the syntax and precedence rules.