achester88

Creating NIL Lang

7 min read


Background

It all started about 3 weeks ago when I first heard about Hack Club’s Arcade program (if you haven’t already, check it out here) and discovered the langjam hackathon [1]. I played around with the idea of joining for a while and ultimately decided to, despite the fact that the jam had already been going on for 3 weeks by that point.

NIL was designed from the jump to be a hard-to-write, esoteric, compiled programming language. So, using the experience gained writing my first programming language, lang, I followed jauhien’s iron-kaleidoscope guide [2] (which proved to be a terrible, terrible mistake).

Building NIL

Everybody starts off Lexwhere

I started off with the lexer, following the guide but tweaking the code to fit my needs: parsing lines in reverse, only parsing the code surrounded by /* */ characters, etc., etc. It mainly just loops through the input by line in reverse, uses regex to detect language items, and classifies them accordingly before finally wrapping them up with the source code position in a Rust struct. The tokens were then collected and passed as a Vec to the Parser.
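To make that a bit more concrete, here is a minimal sketch of the idea, not NIL's actual lexer. The token kinds, the `lex` and `lex_region` names, and the exact rules (lines walked last-to-first, a region must open and close on the same line) are all assumptions for illustration. It also swaps the regex matching for hand-rolled character checks so that it runs with only the standard library.

```rust
/// Hypothetical token kinds; NIL's real set is not shown in this post.
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Number(f64),
    Ident(String),
    Symbol(char),
}

/// A token wrapped up with its source position.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenKind,
    pub line: usize, // 1-based line number in the original source
    pub col: usize,  // 0-based byte offset within that line
}

/// Lex `source` from the last line to the first, keeping only the text
/// that sits between `/*` and `*/` on a single line.
pub fn lex(source: &str) -> Vec<Token> {
    let lines: Vec<&str> = source.lines().collect();
    let mut tokens = Vec::new();
    for (idx, &line) in lines.iter().enumerate().rev() {
        let mut offset = 0;
        while let Some(open) = line[offset..].find("/*") {
            let body_start = offset + open + 2;
            let len = match line[body_start..].find("*/") {
                Some(len) => len,
                None => break, // unterminated region: ignore the rest of the line
            };
            lex_region(&line[body_start..body_start + len], idx + 1, body_start, &mut tokens);
            offset = body_start + len + 2;
        }
    }
    tokens
}

/// Classify the characters of one code region into tokens.
fn lex_region(text: &str, line: usize, base_col: usize, out: &mut Vec<Token>) {
    let chars: Vec<(usize, char)> = text.char_indices().collect();
    let mut i = 0;
    while i < chars.len() {
        let (start, c) = chars[i];
        let mut j = i + 1;
        if c.is_whitespace() {
            i = j;
            continue;
        }
        let kind = if c.is_ascii_digit() {
            while j < chars.len() && chars[j].1.is_ascii_digit() {
                j += 1;
            }
            let end = chars.get(j).map_or(text.len(), |&(pos, _)| pos);
            TokenKind::Number(text[start..end].parse().expect("ASCII digits parse as f64"))
        } else if c.is_alphabetic() || c == '_' {
            while j < chars.len() && (chars[j].1.is_alphanumeric() || chars[j].1 == '_') {
                j += 1;
            }
            let end = chars.get(j).map_or(text.len(), |&(pos, _)| pos);
            TokenKind::Ident(text[start..end].to_string())
        } else {
            TokenKind::Symbol(c)
        };
        out.push(Token { kind, line, col: base_col + start });
        i = j;
    }
}
```

Calling `lex` on a two-line input yields the tokens of the second line first, and anything outside the /* */ markers never becomes a token.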

Parser, Passing, Pausing

The Parser. Oh, the Parser, this is where it all started. Still dutifully following the iron kaleidoscope guide, I got started on the Parser. While my first language used a pure single-function recursive parser, iron kaleidoscope used a recursive descent parser. Already in somewhat unfamiliar territory, I made the choice to remove the REPL part of iron kaleidoscope, meaning all of the parser code needed to be modified to remove the NotComplete logic.

Similar to the Lexer, the Parser loops through a Vec of grammar tokens, determining the token type and passing the list of tokens to the proper function to be processed. This allows the functions it hands off to to operate with certain assumptions based on the designed syntax.

For example:

;{output} (5 + 5)

This code would first hit the main function loop before being passed to the parse_expression function to be wrapped as an anonymous function (done for the sake of simplicity while working with LLVM), then to the parse_expr function, and so on and so on. The result of this little dance then makes its way all the way back to the main function to be added to the AST (Abstract Syntax Tree) and passed to the IR builder. While this method does lead to a large codebase, it makes changing, expanding, and modifying the language much, much easier down the road.
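The hand-off between functions can be sketched roughly like this. Only the parse_expression and parse_expr names come from the description above; the `Tok`, `Expr`, `Function`, and `Parser` types, the `parse_primary` helper, and the tiny grammar (numbers, names, parentheses, `+` and `-` with no precedence) are assumptions made up for the example, and the `;{output}` prefix is left out entirely.

```rust
/// Hypothetical token type, standing in for whatever the Lexer produces.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Num(f64),
    Ident(String),
    Sym(char),
}

/// Hypothetical AST node.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Num(f64),
    Var(String),
    Binary(char, Box<Expr>, Box<Expr>),
}

/// A function definition; an empty name marks an anonymous function.
#[derive(Debug, Clone, PartialEq)]
pub struct Function {
    pub name: String,
    pub body: Expr,
}

pub struct Parser {
    tokens: Vec<Tok>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Tok>) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<&Tok> {
        self.tokens.get(self.pos)
    }

    fn advance(&mut self) -> Option<Tok> {
        let tok = self.tokens.get(self.pos).cloned();
        if tok.is_some() {
            self.pos += 1;
        }
        tok
    }

    /// Top level: wrap a bare expression in an anonymous function.
    pub fn parse_expression(&mut self) -> Result<Function, String> {
        let body = self.parse_expr()?;
        Ok(Function { name: String::new(), body })
    }

    /// expr := primary (('+' | '-') primary)*
    fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_primary()?;
        while let Some(Tok::Sym(op)) = self.peek().cloned() {
            if op != '+' && op != '-' {
                break;
            }
            self.pos += 1; // consume the operator
            let rhs = self.parse_primary()?;
            lhs = Expr::Binary(op, Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }

    /// primary := number | identifier | '(' expr ')'
    fn parse_primary(&mut self) -> Result<Expr, String> {
        match self.advance() {
            Some(Tok::Num(n)) => Ok(Expr::Num(n)),
            Some(Tok::Ident(name)) => Ok(Expr::Var(name)),
            Some(Tok::Sym('(')) => {
                // Having seen '(', this branch can assume an expression follows.
                let inner = self.parse_expr()?;
                match self.advance() {
                    Some(Tok::Sym(')')) => Ok(inner),
                    other => Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}
```

Each function only has to deal with the one construct it was handed, which is where the "certain assumptions" come from: by the time parse_primary sees an opening parenthesis, it knows an expression and a closing parenthesis must follow.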

F*CK LLVM

Step 3 in the iron kaleidoscope guide is where my plans for a compiled language all fell apart. It was supposed to be simple: install iron-llvm, then write some code to convert the AST to LLVM IR. So I added the iron-llvm dependency, naively expecting a simple and straightforward process. After trying to compile the project (and thus iron-llvm) with cargo, I was hit with a simple error that would go on to haunt my dreams.

Couldn’t execute llvm-config. Error: No such file or directory (os error 2)

OK. Not that bad, I thought, one simple error. Shouldn’t take more than 5 min to fix.

So I took the obvious next step and googled llvm-config, and after some digging discovered that, duh, I needed to install LLVM. So one dnf install later I tried building again, only to once again be hit with:

Couldn’t execute llvm-config. Error: No such file or directory (os error 2)

OK, time for some more digging, and about 15 min later I learned that the Fedora llvm package did not include the development features and, as such, I would need to build LLVM from source. As it was already late at the time, I decided to follow the LLVM guide to building from source with ninja and let it build overnight, to (hopefully) wake up to a working iron-llvm version and get back to work. Of course, in typical development fashion, I woke to a half-complete LLVM build and a full disk. It seemed like I had underestimated the size of the LLVM build and overestimated my 40GB Linux partition. Simple, I thought, I’ll just clear some room and build the min size release.

Look, I’ll save you (and me) the trouble of reliving that cycle, but after many (many) installation attempts and a whole lot of file deleting, I was able to get a working version of LLVM to build. So with a working development version of LLVM, I once again tried to build nil, and once again:

Couldn’t execute llvm-config. Error: No such file or directory (os error 2)

Great, just great.

It was at this point I took a closer look at the iron kaleidoscope repo and realized that both the guide and the iron-llvm package hadn’t been touched in 8 years (9 at the time of writing). Still, I dug deeper and took a look at iron-llvm’s GitHub repo with the hopes of finding an installation guide or at least some docs. It was then I felt true despair, when I found nothing but a link back to the useless iron kaleidoscope guide and some LLVM documentation. After even more investigating, I finally tracked down the documentation for the llvm-sys crate and was able to get past the llvm-config error by setting the LLVM_SYS_39_PREFIX environment variable and adding the LLVM bin directory to PATH.

Finally, after what was at this point several days, I was going to be able to move on with nil and get it running.

Nope. I was hit with at least 100 errors from LLVM, with 0 leads as to how to fix them. So I made the very hard decision to throw away all of my work over the past couple of days on getting iron kaleidoscope to build and swap to an interpreter in order to meet the deadline of the jam.

Eval for you, Eval for me

Since the iron kaleidoscope guide was for a compiler, I had to build NIL’s Evaluator from scratch, which, thanks to the powerful Parser, proved to be a relatively easy task. Using similar logic to the Parser, the Evaluator looped over the AST, determined the type of each node, and passed the tree down to a corresponding function. When the Evaluator finally reached a function call or operation, it first evaluated the arguments. Then it referred to a special forms HashMap connecting names/symbols to built-in Rust functions, which took in said arguments, performed an operation, and returned a value. Of course, for user-defined variables and functions, the process was different.
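A stripped-down sketch of that lookup might look like the following. The `Node` type, the `Builtin` signature, and the `add`, `mul`, `special_forms`, and `eval` names are all hypothetical, and values are reduced to plain f64 numbers to keep it short; NIL's real Evaluator handles far more than this.

```rust
use std::collections::HashMap;

/// Hypothetical AST node: a number literal or a call with arguments.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Num(f64),
    Call(String, Vec<Node>),
}

/// Signature assumed for every built-in: evaluated arguments in, one value out.
type Builtin = fn(&[f64]) -> Result<f64, String>;

fn add(args: &[f64]) -> Result<f64, String> {
    Ok(args.iter().sum())
}

fn mul(args: &[f64]) -> Result<f64, String> {
    Ok(args.iter().product())
}

/// The table connecting names/symbols to built-in Rust functions.
pub fn special_forms() -> HashMap<&'static str, Builtin> {
    let mut forms: HashMap<&'static str, Builtin> = HashMap::new();
    forms.insert("+", add);
    forms.insert("*", mul);
    forms
}

/// Evaluate a node: arguments first, then look the name up and call it.
pub fn eval(node: &Node, forms: &HashMap<&'static str, Builtin>) -> Result<f64, String> {
    match node {
        Node::Num(n) => Ok(*n),
        Node::Call(name, args) => {
            let values = args
                .iter()
                .map(|arg| eval(arg, forms))
                .collect::<Result<Vec<f64>, String>>()?;
            let builtin = forms
                .get(name.as_str())
                .ok_or_else(|| format!("unknown function `{}`", name))?;
            builtin(&values)
        }
    }
}
```

Adding a new built-in in this sketch is just one more `insert` into the table, with no changes to `eval` itself.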

Zooming in on Scope

Within the NIL language, Scope was represented as a struct with two fields, one for variables and one for user-defined functions. The function field contains a simple HashMap mapping the names of functions to an AST to be executed when called. The variable field is a Vec of HashMaps mapping variable names to their current value. A Vec is used in order to allow for different scope levels. For example, when the Evaluator runs the then part of an if statement, a new HashMap is created for the new scope and removed after the then is evaluated. While easy to implement, this turned out to be very slow when recursion came into the picture, but by this point time was running out, so ¯\_(ツ)_/¯
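In Rust, that layout could be sketched along these lines. The field and method names, the placeholder `Ast` type, the f64-only values, and the choice that `set` always writes to the innermost level are assumptions for the example rather than NIL's real code.

```rust
use std::collections::HashMap;

/// Placeholder for a function body; the real AST type is not shown here.
#[derive(Debug, Clone, PartialEq)]
pub enum Ast {
    Num(f64),
}

pub struct Scope {
    /// User-defined functions: name -> AST to run when called.
    functions: HashMap<String, Ast>,
    /// One HashMap per scope level; the last entry is the innermost level.
    variables: Vec<HashMap<String, f64>>,
}

impl Scope {
    pub fn new() -> Self {
        Scope {
            functions: HashMap::new(),
            variables: vec![HashMap::new()], // the global level
        }
    }

    /// Enter a new level, e.g. before running the then part of an if.
    pub fn push(&mut self) {
        self.variables.push(HashMap::new());
    }

    /// Leave the innermost level; the global level is never removed.
    pub fn pop(&mut self) {
        if self.variables.len() > 1 {
            self.variables.pop();
        }
    }

    /// Assumption: a write always lands in the innermost level.
    pub fn set(&mut self, name: &str, value: f64) {
        if let Some(level) = self.variables.last_mut() {
            level.insert(name.to_string(), value);
        }
    }

    /// Look a variable up from the innermost level outward.
    pub fn get(&self, name: &str) -> Option<f64> {
        self.variables
            .iter()
            .rev()
            .find_map(|level| level.get(name).copied())
    }

    pub fn define_function(&mut self, name: &str, body: Ast) {
        self.functions.insert(name.to_string(), body);
    }

    pub fn function(&self, name: &str) -> Option<&Ast> {
        self.functions.get(name)
    }
}
```

In this sketch every lookup walks the Vec from the innermost level outward, and every block entered allocates a fresh HashMap, so deeply nested or recursive code does this work over and over.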

Version 1.0!

This brings us to the Version 1.0 release. As of now, error handling is only half done, with only the Lexer (and part of the Parser) giving out “good” errors. Besides that, all other features of the language are working, so feel free to try it out right now: Getting Started with NIL

Deranged Maniac Author’s Note

Here we are, all done. This is my first time writing something like this, so please forgive the, well… all of it. Jokes aside, if you did manage to get through all of that, thank you for reading, and I’ll try to improve with the next post (and add more pictures).

References

[1] Langjam Website
[2] Iron Kaleidoscope Guide