This is the third installment in creating a brainfuck (BF) compiler using F# and FParsec. Previous posts discussed the parsing and the IL generation. Now it it time to pull it all together into something useful.
Series:
Part 1 (Parsing)
Part 2 (IL Generation)
Part 3 (Compiler)
Part 4 (Optimization)
Using Paket, here is a sample paket.dependencies file.
1 | source https://nuget.org/api/v2 |
Here is the standard boilerplate for loading assemblies and generally getting ready to go.
1 | System.IO.Directory.SetCurrentDirectory(__SOURCE_DIRECTORY__) |
I’ll start with the parsing code. I include it here for completeness sake, but there isn’t much new here. From a logic perspective, I added Comment support into the parser (basically any char that isn’t a valid character). I also wrapped the parsing code in a module for organizational purposes.
1 | // Language operators (used in parsing and emitting) |
This is the start of the new IL emitting code. Sticking with the standard definition, the memory array will be of size 30,000. LocalBuilder
is an IL local variable object. To protect me from myself, I leverage F# types to ensure that I’m using the correct variable in the correct place. For those not familar with F#, this will become more obvious where the variables are used. Simply, the type safety this brings is immensely useful and offers me a piece of mind. As a sidenote, BF defines the use of a program counter. I don’t actually need a program counter to keep track of where I am, so it’s excluded from my implementation. It’s here, but commented out primarily for reference-sake.
1 | module Emitter = |
There will some variable creation along the way. To be consistent I create some functions to consistently implement that process. ilGenerator is basically the function the code is attached to. Creating a scalar is straight forward. Creating an array is a bit more involved. I need to create the variable, then call Newarr
with the element type and array size to initialize the array. Once complete, I return the created variable
1 | // Create IL local variable |
Both the add and sub perform similar actions, add a value to a local variable. As a quick refresher, there will be lots of pushing onto the stack, performing actions, and popping results off a stack. As a result, these functions follow a pattern that will show up repeatably in this post. LdLoc
pushes a local variable value onto the stack. Ldc_I4
pushes a 32bit int onto the stack; in this case the amount to be added/subtracted to/from the local variable value. Add
and Sub
pop the top 2 values off the top of the stack, perform their respective actions, and push the result onto the stack. StLoc
pops a value off the stack and stores it in the local variable.
1 | // Add int to IL local variable |
The functions emitGt and emitLt increment and decrement the data pointer by leveraging the previously defined add/sub functions. This is also an important place to mention type safety (See, I said I’d get back to that). I can ensure that I don’t ever accidently send anything other than the data pointer into these functions. That can simply be done with a (dataPointer:DpPointer)
, but I want to do a bit more; I want to use the LocalBuilder
value within the type. If you’re not familar with the syntax, (DP(dataPointer):DpPointer)
effectively ensures that dataPointer is of type DpPointer, and extracts the LocalBuilder instance into the dataPointer variable. This is a nice construct to not only ensure I send in a LocalBuilder, but that I send in the correct LocalBuilder. The additional level of security is one of the strengths type safety brings to the table.
1 | // Increment dataPointer |
emitPlus and emitMinus increment and decrement the value at data[dataPointer], respectively. This is similar to before, the primary difference is how to get/set an array value. Also note that again I use F# types to ensure I am only ever working with the data[] and dataPointer variables. To get a value from an array onto the stack I push the array variable reference on the stack Ldloc <array variable>
, then the index Ldloc <index>
, then a command to push the value at array[index] onto the stack Ldelem_I4
. To set a value of an array the process is similar. Push the array reference, index, and value on the stack, then call a command to pop the top value of the stack into array[index], Stelem_I4
. Beyond getting values in and out of arrays, I just have to do the appropriate add/subtract to the value before doing the set.
1 | // Increment value in data[dataPointer] |
Now it is time to tackle the IO portion of BF. emitComma reads a character from stdin and saves it into data[dataPointer]. To do this, I need to read the keypress from the console and convert it to a char. In F# I would do let (keypressChar:char) = Console.ReadKey().KeyChar
. Using straight IL, I call the ReadKey
and get_KeyChar
functions. After that, it is a matter of saving the value into the data array as an int. emitPeriod displays the value at data[dataPointer] to stdout as a char. There isn’t a lot new here, push the value from the array onto the stack, then call Console.Write
. Of possible interest here is the parameter type char. If I make the parameter type be typeof<int>
, it would instead print the actual int value in the data array. Although not what I want as final output, its a useful note for debugging.
1 | // Read char from stdin into data[dataPointer] |
Now that all the ActionOps are implemented, I make a generic emitOp function. There isn’t anything particularly exciting here.
1 | // Generate IL for an operator |
Time to implement loop blocks, program, and ast processor. These are mutually recursive functions. For those new to F#, this is accomplished by let rec foo = <foo code> and bar <bar code>
syntax. It is a necessary construct when foo needs to call bar, and vice versa.
The loop is a list of operators, with an associated predicate to determine if the list should continue to execute. To do this the loop leverages labels for jumping between sections of code (the predicate section and the end of loop). This is done in a two step process. One, call DefineLabel()
to create the label, this allows it to be referenced. Two, MarkLabel
says where in the code the label belongs. As you can imagine, I put the predicate label at the begining of the predicate, and the loop-end label after the loop’s block of code.
Tackling the predicate logic; the looping works as follows, “Continue loop as long as data[dataPointer] != 0”. To test this, I push the value from the array onto the stack, then a 0. Ceq
does the compare, pushing the result onto the stack and Brtrue
branches out of the loop if the value equals 0.
Once the branching logic is complete, all that’s left is processing the list of ops. This is a good segue to emitOpList. Both a loop and program are a list of operators. This is simply a matter of iterating over the list and feeding each one into processAst.
Even though processAst is the controlling function for the emitting of the AST -> IL, it looks pretty tame to everything else so far. But that’s because the hard stuff has already been done. processAst provides a match that handles the 3 types of operators; and their respective functions take care of the rest of the work.
1 | // Generate IL for a loop |
Taking a breather from all that new stuff, I head back into familiar territory. This is the code from my second post in the series; emitting an IL program. The only change of significance is that I now insert the BF logic. This consists of creating the necessary variables and generating IL based on the AST. First, the variables: data, dataPointer, and programCounter. As mentioned above I don’t need programCounter, so it’s here for reference only. I use my previously defined “create variable” functions for this. I also wrap each variable in it’s respective type (MEM, DP, PC). This ensures that as I use them, I don’t send the wrong variable to the wrong function. Second, the IL generation with processAst.
1 | // Remove non-alphanumeric, to avoid object naming issues |
Tying the parser and emitter is done in the compile function. FParsec provides a Success/Failure when parsing a string. In this case, if success I emit a compiled program, otherwise I display the parsing error. I also filter out the comments prior to sending it to emitter.
1 | module Compiler = |
Here is a compilation of a Hello World program.
1 | let programName = "HelloWorld" |
Quite exciting, it worked!
Taking it a bit further, I can compile bf files from the commandline. To support this I remove the programName and code variables, and replace them with the below block. This will compile all *.bf files provided as arguments.
1 | Environment.GetCommandLineArgs() |
The primary goal is complete. I can now compile BF to run in the CLR. There are a couple directions I could follow at this point. Next time, going deeper down the rabbit hole: optimization.