In the last post we took a look at how to manually remove invalid opcodes from an obfuscated assembly. We did this by decompiling the assembly, replacing the invalid opcodes with the nop opcode and then recompiling. We used this manual method because Mono.Cecil crashed at the sight of some of the invalid opcodes. In this post we take a look at a tiny "hack" to Mono.Cecil which allows us to do the same thing in an automated manner.

Recap: What needs fixing?

Please note: An assumption is made in this article that all invalid opcodes are single byte opcodes; this example does not cater for invalid double byte opcodes.

Well, to work out what needs fixing, we'll firstly write some code that we'll use to break Mono.Cecil (and for testing):

//Load the assembly
var assembly = AssemblyFactory.GetAssembly(
    @"D:\temp\Obfuscated\SimpleLibrary.dll");

//Output the il for each method in the assembly
foreach (TypeDefinition type in assembly.MainModule.Types)
{
    //Go through each method
    foreach (MethodDefinition def in type.Methods)
    {
        //Check the body
        if (def.HasBody)
        {
            //Get the CIL worker
            CilWorker worker = def.Body.CilWorker;

            //Chuck the bad instructions in here to avoid modifying the collection
            List<Instruction> instructionsToFix = new List<Instruction>();

            //Go through each instruction
            foreach (Instruction instr in def.Body.Instructions)
            {
                //TODO: Somehow figure out if it is one to fix and add it to be fixed
            }

            //Go through the ones to fix and replace
            foreach (Instruction instr in instructionsToFix)
            {
                Instruction newInstr = worker.Create(OpCodes.Nop);
                worker.Replace(instr, newInstr);
            }
        }
    }
}

//Save the assembly
AssemblyFactory.SaveAssembly(assembly, @"D:\temp\Obfuscated\SimpleLibrary.new.dll");

This is some pretty basic code which simply goes through each type and each method inside an assembly and replaces all invalid opcodes with a nop.

When we run this code using the default version of Mono.Cecil we unfortunately come across an error:

[Screenshot: Mono.Cecil didn't like an opcode]

Now we know what we're fixing!

Getting the source

First of all, we need to get the source for Mono.Cecil to start working with it. Rather than get the entire Mono system, I decided to just check out the project that I needed via SVN:

svn co svn://anonsvn.mono-project.com/source/trunk/mcs/class/Mono.Cecil

Unfortunately the project won't compile by itself due to the .snk file being located in a directory one up from Mono.Cecil. For this example I simply turned off assembly signing to get this compiling, however please feel free to download the .snk file and place it in the appropriate location to have a fully signed version of Mono.Cecil.

Hacking Mono.Cecil

Now that we've got the source and it's compiling, let's hack it. From the screenshot you'll see that the error originates in the CodeReader class on line 207 (in my copy anyway). Looking at the code around that line, we see the opcode lookup followed by a switch statement:

if (cursor == 0xfe)
    op = OpCodes.TwoBytesOpCode [br.ReadByte ()];
else
    op = OpCodes.OneByteOpCode [cursor];

Instruction instr = new Instruction ((int) offset, op);
switch (op.OperandType) {
case OperandType.InlineNone :
    break;
...
case OperandType.InlineTok :
    MetadataToken token = new MetadataToken (br.ReadInt32 ());
    switch (token.TokenType) {
...
    default:
        throw new ReflectionException ("Wrong token: " + token);
    }
    break;
}

That's our error message alright, and it seems to be happening because execution is going into the OperandType.InlineTok case. Hmmm... well, ideally we'd like it to go into InlineNone, since an invalid opcode has no subsequent operand. As you can see, the OperandType comes from the variable op, which is set by the lines:

if (cursor == 0xfe)
    op = OpCodes.TwoBytesOpCode [br.ReadByte ()];
else
    op = OpCodes.OneByteOpCode [cursor];

Well, since we're only working with one byte opcodes in this example, let's concentrate on that. The OpCodes.OneByteOpCode variable is actually an array which stores each opcode at the index matching its byte representation; for example: index 0 = 0x00 = nop, index 1 = 0x01 = break ... etc. In one of our previous articles, we placed several invalid opcode bytes throughout the code, all within a certain subset: 0xbe, 0xc0, 0xc1... etc. Therefore, our invalid opcodes should be at the corresponding indexes of OneByteOpCode; i.e. 190, 192, 193... etc.
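To make the indexing concrete, here is a minimal sketch of a byte-indexed lookup table. It is a toy stand-in rather than Mono.Cecil's own code: the class OpcodeTableSketch, its methods and the three opcodes filled in are all assumptions made for illustration.

```csharp
using System.Collections.Generic;

// Toy stand-in for a byte-indexed opcode table; this is NOT Mono.Cecil's code.
public static class OpcodeTableSketch
{
    // Hypothetical table: the index is the opcode byte, the value is its mnemonic.
    // Only a handful of real one-byte opcodes are filled in; the rest stay null.
    private static readonly string[] OneByteNames = BuildTable();

    private static string[] BuildTable()
    {
        var table = new string[256];
        table[0x00] = "nop";
        table[0x01] = "break";
        table[0x2a] = "ret";
        return table;
    }

    // Returns the mnemonic stored at the opcode byte's index, or null when the slot is empty.
    public static string Lookup(byte opcode)
    {
        return OneByteNames[opcode];
    }

    // Returns the table index of every given byte whose slot has no entry.
    public static List<int> EmptySlots(params byte[] opcodes)
    {
        var result = new List<int>();
        foreach (byte b in opcodes)
        {
            if (OneByteNames[b] == null)
                result.Add(b);
        }
        return result;
    }
}
```

Calling OpcodeTableSketch.EmptySlots(0xbe, 0xc0, 0xc1) on this toy table reports the indexes 190, 192 and 193, which are the positions to inspect in the real array.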

Still following? Essentially, to solve this problem we need to see what opcodes are defined at these indexes in Mono.Cecil at runtime. Well, as we all know, a struct is never null, therefore the object at each of those "unused" opcode indexes is a default struct (i.e. every field left at its zero value). Due to the way that the Mono.Cecil OpCode object works, this gives us a confusing result stating that the size of the OpCode is two bytes, even though it is in the one byte array (check out the OpCode.Size property to see why).
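The effect can be reproduced with a small toy model. This is a sketch under stated assumptions, not Mono.Cecil's source: ToyOpCode, its field names and the size rule (a first byte of 0xff means a one byte opcode, anything else means two bytes) are my guesses at the shape of the real struct, based on the behaviour described above.

```csharp
// Toy model of an opcode struct; names and layout are assumptions, not Mono.Cecil's source.
public struct ToyOpCode
{
    public readonly byte Op1;   // 0xff marks a one-byte opcode in this sketch
    public readonly byte Op2;   // the opcode byte itself

    public ToyOpCode(byte op1, byte op2)
    {
        Op1 = op1;
        Op2 = op2;
    }

    // Assumed size rule: anything whose first byte is not 0xff counts as two bytes.
    public int Size
    {
        get { return Op1 == 0xff ? 1 : 2; }
    }
}

public static class DefaultStructSketch
{
    // Size reported by a slot that was never assigned: every field is zero.
    public static int SizeOfUninitialisedSlot()
    {
        var table = new ToyOpCode[256];   // all 256 entries are default(ToyOpCode)
        return table[0xbe].Size;
    }

    // Size reported by a slot that was explicitly set up as a one-byte opcode.
    public static int SizeOfInitialisedSlot()
    {
        var table = new ToyOpCode[256];
        table[0xbe] = new ToyOpCode(0xff, 0xbe);
        return table[0xbe].Size;
    }
}
```

In this model the untouched slot claims a size of two because its Op1 is zero rather than 0xff, while the explicitly initialised slot reports one.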

No wonder it causes problems! So how do we fix this? Well, for a start we should initialise the array inside the OpCodes class to avoid this issue:

static OpCodes()
{
    //Start from first index to avoid nop
    for (int i = 1; i < OneByteOpCode.Length; i++)
    {
        //Check to see if it is listed as an arglist... but not one
        if (OneByteOpCode[i].Op2 == 0x00 && OneByteOpCode[i].Code != Code.Arglist)
        {
            OneByteOpCode[i] = new OpCode(0xff, (byte) i, Code.Unused, FlowControl.Next, OpCodeType.Primitive,
                                          OperandType.InlineNone, StackBehaviour.Pop0, StackBehaviour.Push0);
        }
    }
}

Basically we are looking for all OpCodes that haven't been initialised properly; that is, those with Op2 = 0x00. We have to be careful however: both Nop and Arglist use an empty Op2 correctly, therefore we intentionally skip these ones. Now, if you copied and pasted this into your code it will complain about Code.Unused. To make things cleaner I simply added a new member to the Code enum so that identification of invalid OpCodes is nice and easy. The reason I use the word "unused" is so that it is in line with how ILDASM sees an invalid OpCode.

Before we finish hacking Mono.Cecil, there is one more "aesthetic" change that I thought I'd make. Technically, the change above fixes the issue for us; however, being the pedantic guy that I am, I also wanted to fix the "ToString()" method so that it'd display "unused" instead of "arglist" when an invalid OpCode is present. Well, it actually isn't a hard aesthetic fix to make. Simply find the Name property in the OpCode class, and use the following:

public string Name {
    get {
        int index = (Size == 1) ? Op2 : (Op2 + 256);
        return OpCodeNames.names [index] ?? "unused";
    }
}
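The lookup above can be tried in isolation with a toy name table. This is only a sketch: NameFallbackSketch, its 512-entry table and the two names filled in are assumptions for illustration, and the real OpCodeNames.names array may be laid out differently.

```csharp
// Toy version of the name lookup; the table layout is an assumption, not Mono.Cecil's.
public static class NameFallbackSketch
{
    // Hypothetical name table: 0-255 for one-byte opcodes, 256 and up for two-byte ones.
    private static readonly string[] Names = BuildNames();

    private static string[] BuildNames()
    {
        var names = new string[512];
        names[0x00] = "nop";            // one-byte opcode 0x00
        names[256 + 0x00] = "arglist";  // two-byte opcode 0xfe 0x00
        return names;
    }

    // Same index arithmetic as the Name property, with "unused" as the fallback.
    public static string NameOf(int size, byte op2)
    {
        int index = (size == 1) ? op2 : (op2 + 256);
        return Names[index] ?? "unused";
    }
}
```

With this table, NameOf(2, 0x00) gives "arglist", which is what an uninitialised struct (size two, Op2 zero) would display, whereas NameOf(1, 0xbe) gives "unused" once the slot is initialised as a one byte opcode.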

Now to test it all...

Testing our results

As you'll remember, I declared a new enum member: Code.Unused. It comes in handy when we rewrite our testing program:

//Load the assembly
var assembly = AssemblyFactory.GetAssembly(
    @"D:\temp\Obfuscated\SimpleLibrary.dll");

//Output the il for each method in the assembly
foreach (TypeDefinition type in assembly.MainModule.Types)
{
    //Go through each method
    foreach (MethodDefinition def in type.Methods)
    {
        //Check the body
        if (def.HasBody)
        {
            //Get the CIL worker
            CilWorker worker = def.Body.CilWorker;

            //Chuck the bad instructions in here to avoid modifying the collection
            List<Instruction> instructionsToFix = new List<Instruction>();

            //Go through each instruction
            foreach (Instruction instr in def.Body.Instructions)
            {
                //Queue the invalid opcode for replacement
                if (instr.OpCode.Code == Code.Unused)
                    instructionsToFix.Add(instr);
            }

            //Go through the ones to fix and replace
            foreach (Instruction instr in instructionsToFix)
            {
                Instruction newInstr = worker.Create(OpCodes.Nop);
                worker.Replace(instr, newInstr);
            }
        }
    }
}

//Save the assembly
AssemblyFactory.SaveAssembly(assembly, @"D:\temp\Obfuscated\SimpleLibrary.new.dll");

We use Code.Unused to test for an invalid opcode to replace. What are the results? Well, Reflector can now decompile the code as per usual (again):

[Screenshot: Reflector now works ok again]

Conclusion

This week we took a look at "fixing" the problem Mono.Cecil has when it reaches an invalid OpCode. Essentially, fixing the problem in Mono.Cecil involved:

  • Creating a new enum member Code.Unused so that we can identify invalid opcodes
  • Initialising the unused positions of the static array OpCodes.OneByteOpCode with placeholder opcodes. This gives us accurate opcode descriptions in those unused positions.
  • (Optional) Changing OpCode.Name to return an accurate friendly name for invalid opcodes.

Once Mono.Cecil could handle these opcodes, we had no problem whatsoever writing an automated tool to "fix" the assembly for us. It certainly doesn't take much to reverse some of the "value added" obfuscation techniques, does it!?

Next time

Well, that's all for this week. If you have any questions/suggestions/notes, then please let me know. Not sure what the next article will be about yet, however I'll be sure to make it something interesting (perhaps tamper proofing?). What are your thoughts?


3 comments:

  1. Alex said...

    great article!

    as usual for all your articles!

  2. Anonymous said...

    I have been following your posts for some time now and I appreciate your work! Two things that I would like to see: Can you use mono.cecil to call a string decryption method in a loaded assembly? I think that would be better than having to copy the il code and "paste it" into your own assembly. Also I would like to see some discussion on code flow transformations.

    Keep up the good work!

  3. paulmason said...

    Thanks for your feedback. I'll keep these in mind and post about each in future articles!