Continuing with the BF compiler, it’s time to look at how to create code targeting the CLR. As before, I will be using F# to generate the target MSIL.
Series:
Part 1 (Parsing)
Part 2 (IL Generation)
Part 3 (Compiler)
Part 4 (Optimization)
There are a few aspects to IL generation. First, the how: System.Reflection and System.Reflection.Emit are the namespaces that contain the functionality of interest. Second, the what: a basic application shell is a good place to start. Once that’s in place, I’ll discuss the third part, the what inside the what: using F# to emit IL into the sample application for expanded functionality.
The ultimate goal is to compile BF into MSIL. Before doing the fun stuff, it’s a good idea to step back and see what is involved in creating a basic application: without any of the fancy things, what is the bare minimum I need to get something that runs? So my short-term target is a basic application that executes. Once this is in place I can start looking at custom IL code. The base application that I generate has the general structure:
Application Domain
  Assembly
    Module
      Class (Program)
        Method (Main)
          - Code for main
The application generation code is short, so I’m just going to put it all together below. For now I bootstrap AppDomain creation by just using the current domain. Then I create the assembly and module. When I define the main class Program, I want it to be in the Foo namespace; this is done by using the fully qualified name in the DefineType call. For the sample, I will have just one method, Main. If I wanted multiple methods in the Program class, I could make additional MethodBuilder instances attached to programType. Once Main is created, I make it the entry point for the assembly. GetILGenerator is a glimpse of things to come: this is how the IL creation happens. Since the generator is attached to the MethodBuilder for Main, the code is emitted into that method’s body. For the example it is a simple WriteLine and return. With the Program class complete, I create it. All that is left is to write the assembly to a file. Well, that was easy.
System.IO.Directory.SetCurrentDirectory(__SOURCE_DIRECTORY__)
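As a rough illustration of the steps described above, here is a minimal sketch of the same shell. It is not the listing from this post. It assumes AssemblyBuilderAccess.Run, so the assembly lives only in memory and Main is invoked through reflection, whereas the post’s version sets an entry point and writes an exe to disk. The names assemblyBuilder, moduleBuilder, mainMethod, and createdType are chosen here for illustration.

```fsharp
open System
open System.Reflection
open System.Reflection.Emit

// Assumption: an in-memory (Run) assembly rather than one saved to disk.
let assemblyName = AssemblyName("Foo")
let assemblyBuilder =
    AssemblyBuilder.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run)
let moduleBuilder = assemblyBuilder.DefineDynamicModule("Foo")

// The fully qualified name places Program in the Foo namespace.
let programType =
    moduleBuilder.DefineType("Foo.Program", TypeAttributes.Public ||| TypeAttributes.Class)

// A public static Main with no parameters and no return value.
let mainMethod =
    programType.DefineMethod(
        "Main",
        MethodAttributes.Public ||| MethodAttributes.Static,
        typeof<Void>,
        Type.EmptyTypes)

// The generator is attached to Main, so these instructions become its body.
let mainIl = mainMethod.GetILGenerator()
mainIl.EmitWriteLine("A small program")
mainIl.Emit(OpCodes.Ret)

// Finish the type, then call Main via reflection instead of running an exe.
let createdType = programType.CreateType()
createdType.GetMethod("Main").Invoke(null, [||]) |> ignore
```

Because nothing is saved, this variant is handy for experimenting in a script, but it does not produce a file to decompile.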
And here is the program running. It’s not much to look at, but it is good for a starting framework.
Now that I’ve seen it run, it’s time to take a look at what was generated. For this I use JetBrains’ dotPeek decompiler. Below is a screenshot of the decompilation. The application is about as minimal as it gets, but the code is as expected. This is also pretty close to a bare C# console application. It has a Program class in the Foo namespace. There is a Main method with the WriteLine call. Looks like things have worked as planned.
Time to take things up a small notch. I now want to add 5 + 37 and display the result. Before I get started I’m going to include a reference to the IL OpCodes. I also want to mention that this isn’t meant to be a deep dive into the CLR internals; it’s just a basic starter. But you should be fine if you know that you push things onto a stack to use them, and they get popped off as they are used. You also have access to local variables for more “persistent” storage. Remember those old assembly language classes from school? It’s kind of like that.
All of the following code will be inserted between the existing lines mainIl.EmitWriteLine("A small program") and mainIl.Emit(OpCodes.Ret) of the above code. The goal is to add functionality to the Main method.
First, push the numbers 5 and 37 onto the stack. Second, add the top two values of the stack (5 & 37) and push the result onto the stack (42).
mainIl.Emit(OpCodes.Ldc_I4, 5)
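To see the push-push-add sequence in isolation, here is a small sketch that emits it into a DynamicMethod rather than into Main. This is an assumption made for illustration: the method name AddSketch and the variable names are hypothetical, and the method returns the value left on the stack so it can be inspected from F#.

```fsharp
open System
open System.Reflection.Emit

// Hypothetical stand-alone method: no parameters, returns an int.
let addMethod = DynamicMethod("AddSketch", typeof<int>, Type.EmptyTypes)
let addIl = addMethod.GetILGenerator()

addIl.Emit(OpCodes.Ldc_I4, 5)    // stack: 5
addIl.Emit(OpCodes.Ldc_I4, 37)   // stack: 5, 37
addIl.Emit(OpCodes.Add)          // pops both values, pushes their sum: 42
addIl.Emit(OpCodes.Ret)          // returns the value on top of the stack

let sum = addMethod.Invoke(null, [||]) :?> int
printfn "%d" sum   // prints 42
```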
It’s important to remember that when something uses a value off the stack, it’s popped and gone forever. I want to do two things with my resulting 42, so I’ll emit a Dup instruction, which duplicates the value on top of the stack. Now I have two 42s at the top of my stack.
mainIl.Emit(OpCodes.Dup)
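A quick way to convince yourself that Dup really leaves two copies is the sketch below. It is an illustration only, using a hypothetical DynamicMethod named DupSketch: one copy is thrown away with Pop, and the other is still there to be returned.

```fsharp
open System
open System.Reflection.Emit

// Hypothetical stand-alone method: no parameters, returns an int.
let dupMethod = DynamicMethod("DupSketch", typeof<int>, Type.EmptyTypes)
let dupIl = dupMethod.GetILGenerator()

dupIl.Emit(OpCodes.Ldc_I4, 5)    // stack: 5
dupIl.Emit(OpCodes.Ldc_I4, 37)   // stack: 5, 37
dupIl.Emit(OpCodes.Add)          // stack: 42
dupIl.Emit(OpCodes.Dup)          // stack: 42, 42
dupIl.Emit(OpCodes.Pop)          // consumes one copy; stack: 42
dupIl.Emit(OpCodes.Ret)          // returns the remaining copy

let remaining = dupMethod.Invoke(null, [||]) :?> int
printfn "%d" remaining   // prints 42
```

Without the Dup, the Pop would empty the stack and Ret would have nothing to return, which makes the method body invalid.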
Now for the result, I’m going to pop the first 42 off the stack and print its integer representation. Then I’ll pop the second 42 off the stack and print its ASCII char representation (decimal 42 = *). When printing, the distinction is made by the parameter type of the call. To parse the emit in more detail, what is happening?
EmitCall(OpCodes.Call: going to make a function call.
typeof<Console>.GetMethod("Write", [| typeof<int> |]): the function to call is Console.Write, which has 1 input parameter (an int in this case).
null: the optional parameter types, which apply only to varargs methods; Console.Write has none.
This matches with my understanding of the call when used in C#. Since the call takes 1 parameter, it will pop 1 value off the stack to meet its needs. This is the general pattern for function calling. You’ll see more of this in future posts.
// Print the numeric value on the top of the stack (42)
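Putting the two calls together, a self-contained sketch might look like the following. As with the earlier sketches, it uses a hypothetical DynamicMethod (PrintSketch) instead of the post’s Main, and it loads 42 directly rather than computing it, so treat it as an approximation of the listing rather than a copy of it.

```fsharp
open System
open System.Reflection.Emit

// Hypothetical stand-alone method: no parameters, no return value.
let printMethod = DynamicMethod("PrintSketch", typeof<Void>, Type.EmptyTypes)
let printIl = printMethod.GetILGenerator()

printIl.Emit(OpCodes.Ldc_I4, 42)   // stack: 42
printIl.Emit(OpCodes.Dup)          // stack: 42, 42

// Print the numeric value on the top of the stack (42).
printIl.EmitCall(OpCodes.Call, typeof<Console>.GetMethod("Write", [| typeof<int> |]), null)

// Print the remaining value as a char (42 = '*').
printIl.EmitCall(OpCodes.Call, typeof<Console>.GetMethod("Write", [| typeof<char> |]), null)

printIl.Emit(OpCodes.Ret)          // the stack is empty again

printMethod.Invoke(null, [||]) |> ignore   // writes 42*
```

The same integer is on the stack for both calls; only the overload selected by GetMethod decides whether it is written as a number or as a character.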
Take two, now with more awesomeness.
Taking a look at this decompiled version, the results are a bit more interesting. The middle panel shows the C# representation: 5 + 37 into a variable, then two Console.Write calls, one with a char cast. The right panel shows the IL, which looks remarkably like what I emitted. Again, it looks like things are working as planned.
This concludes part 2 of the series. I can now parse BF source code, generate IL, and create an exe. Next time I start putting these pieces together into something more interesting.