Code Injection

Dec 4, 2022

I recently released an update to Bit Slicer that adds Code Injection support for arm64 or native Apple Silicon processes. Code injection allows injecting new code and logic into a running program. I will discuss how this functionality works in a debugger like Bit Slicer and what changes were made to enable this functionality for Apple Silicon.

How Code Injection Works

Bit Slicer has provided assembly-level Code Injection support for many years now for Intel or x86 processes.

First the user needs to find an unused region of memory or (more commonly for Bit Slicer) allocate a new region of memory to inject new code into. This is often called a “code cave” by game hackers.

Then the user finds an area of known used instructions and has Bit Slicer replace however many instructions are needed with a branch instruction that jumps to the new code cave. The user also inserts their custom code into this code cave, optionally copying the original instructions that were sacrificed for the jump.

The last instruction written in this code cave finally jumps back to the original code, resuming right after the instruction that jumps to the code cave. An illustration of these steps is below:

Illustration of Code Injection

The result is that new code and logic can be added at runtime without overriding large essential portions of existing code.

Code Injection Using Branches

For Intel, Bit Slicer prefers to use a jmp instruction, which occupies 5 bytes. Four of those bytes hold the relative offset to jump to, which is measured from the end of the jmp instruction.

For example, if the original instruction a user wants to overwrite is at 0x1000 and the new code cave is at 0x4000, then the original instruction can be overwritten as jmp 0x4000 and the offset at that location will be encoded as (0x4000 - (0x1000 + 0x5)) or simply 0x2FFB.
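The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the encoding, not Bit Slicer's actual code; the function name encode_jmp_rel32 is made up for this example.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 opcode for jmp with a 32-bit relative offset
JMP_REL32_LENGTH = 5     # 1 opcode byte + 4 offset bytes


def encode_jmp_rel32(source: int, destination: int) -> bytes:
    """Encode `jmp destination` for a jmp located at `source` (hypothetical helper)."""
    # The offset is measured from the end of the 5-byte jmp instruction
    offset = destination - (source + JMP_REL32_LENGTH)
    if not -(1 << 31) <= offset < (1 << 31):
        raise ValueError("destination is out of range for a relative jmp")
    # Opcode byte followed by the signed 32-bit offset in little-endian order
    return struct.pack("<Bi", JMP_REL32_OPCODE, offset)


print(encode_jmp_rel32(0x1000, 0x4000).hex())  # e9fb2f0000
```

The trailing four bytes fb 2f 00 00 are the offset 0x2FFB stored in little-endian order.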

Note instructions vary in size on Intel, so Bit Slicer may end up overwriting multiple instructions depending on the chosen insertion point. Any remaining unused bytes from replaced instructions are overwritten with 1-byte NOP, or no-operation, instructions.
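As a rough sketch of that padding step, assume the 5-byte jmp lands on a 3-byte instruction followed by a 4-byte instruction, so 7 bytes are replaced in total. The helper name pad_with_nops and the instruction sizes are assumptions made for this example.

```python
NOP = b"\x90"  # 1-byte x86 no-operation instruction


def pad_with_nops(patch: bytes, overwritten_length: int) -> bytes:
    """Fill the bytes left over from replaced instructions with NOPs (hypothetical helper)."""
    if len(patch) > overwritten_length:
        raise ValueError("patch does not fit in the overwritten instructions")
    return patch + NOP * (overwritten_length - len(patch))


# A 5-byte jmp written over a 3-byte and a 4-byte instruction (7 bytes total)
jmp = bytes.fromhex("e9fb2f0000")
print(pad_with_nops(jmp, 3 + 4).hex())  # e9fb2f00009090
```

Padding this way keeps the bytes that follow the patch aligned with the start of the next untouched instruction.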

In x86_64 or 64-bit Intel processes, a relative jmp will not suffice if the code cave is so far away that the signed offset cannot be represented in 32 bits. In this case, Bit Slicer falls back to an indirect jump, which needs 3 instructions and, with the standard encodings of the sequence below, overwrites 13 bytes of existing code (1 for push rax, 10 for mov rax with a 64-bit immediate, and 2 for jmp rax) plus 1 more for the pop rax at the return point, instead of the 5 bytes of a relative jmp:

ORIGINAL_CODE:

; Branch to the code cave
; The original instruction(s) that were here were replaced by these in order to jump to the code cave
push rax
mov rax, NEW_CODE
jmp rax

BACK_TO_ORIGINAL_CODE:

pop rax
; ..rest of original code is here..

; ----------

NEW_CODE:

pop rax
; ..insert new code here..

; Now branch back to the original code
push rax
mov rax, BACK_TO_ORIGINAL_CODE
jmp rax

Overwriting more instructions, modifying the stack, and spilling and restoring registers like this is less ideal than using a 5-byte relative jmp instruction. Overwriting more space requires more caution in choosing an adequate insertion point, especially if any of the instructions being overwritten encode relative offsets themselves.

Code Injection Using Breakpoints

On arm64, code injection using a relative branch instruction (b) can be done similarly with one important caveat: the relative offset can only hold up to 2²⁸ values (a 26-bit field counting 4-byte instructions, so roughly ±128 MB) rather than 2³² values for Intel. This is a big difference. For Intel, I was content with 32 bits because this was often large enough in my experiments and often did not require falling back to an indirect jump. Also, instructions are variable size on Intel, so finding an insertion point that does not overwrite too many instructions can be feasible.

On arm64, however, 28 bits is often insufficient for jumping to a newly allocated block of code due to it being too far away. Performing an indirect jump using a 64-bit address while saving register state can occupy many more bytes and instructions than on Intel (around 6 instructions or 24 bytes by my calculations). Instructions on arm64 are always 4 bytes long which also makes finding a good insertion point more difficult.
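To make the range limit concrete, here is a small sketch of how a b instruction is encoded and when a destination is out of reach. The helper names can_reach_with_b and encode_b are made up for this example.

```python
import struct

B_OPCODE = 0x14000000  # arm64 unconditional branch (b) with a zero offset
B_RANGE = 1 << 27      # b reaches byte offsets in [-2^27, 2^27), i.e. about +/-128 MB


def can_reach_with_b(source: int, destination: int) -> bool:
    """Check whether a b instruction at `source` can branch to `destination`."""
    offset = destination - source  # measured from the b instruction itself
    return offset % 4 == 0 and -B_RANGE <= offset < B_RANGE


def encode_b(source: int, destination: int) -> bytes:
    """Encode `b destination` for a branch located at `source` (hypothetical helper)."""
    if not can_reach_with_b(source, destination):
        raise ValueError("destination is out of range for a relative b")
    # The 26-bit field stores the offset divided by the 4-byte instruction size
    imm26 = ((destination - source) >> 2) & 0x3FFFFFF
    return struct.pack("<I", B_OPCODE | imm26)


print(encode_b(0x1000, 0x4000).hex())                   # 000c0014
print(can_reach_with_b(0x1_0000_0000, 0x1_1000_0000))   # False: 256 MB away
```

Unlike the Intel jmp, the offset here is measured from the address of the b instruction rather than from its end.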

Rather than implementing an approach again that would overwrite several instructions and spill several registers, I decided to go a different route. Bit Slicer now uses breakpoints and handles moving the instruction pointer to and from the code cave.

This approach works as follows: Bit Slicer overwrites an original instruction with a breakpoint instruction. When that instruction is executed, an exception for hitting a breakpoint is raised and Bit Slicer catches it. Bit Slicer then updates the instruction pointer register of the suspended thread to point to the code cave and resumes execution. Similarly, a breakpoint instruction is inserted at the end of the code cave so Bit Slicer can move execution back to the original code.
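The redirect logic can be modelled as a lookup table from breakpoint addresses to the addresses execution should continue at. The sketch below is a simplified simulation rather than Bit Slicer's implementation: ThreadState, handle_breakpoint, and all addresses are hypothetical, and it assumes the reported instruction pointer is the address of the breakpoint instruction itself.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    pc: int  # instruction pointer of the suspended thread


def handle_breakpoint(state: ThreadState, redirects: dict) -> bool:
    """Move the instruction pointer if the breakpoint belongs to an injection."""
    target = redirects.get(state.pc)
    if target is None:
        return False  # not one of the injection breakpoints; leave the thread alone
    state.pc = target
    return True


redirects = {
    0x1000: 0x4000,  # breakpoint over the original instruction -> start of the code cave
    0x4010: 0x1004,  # breakpoint ending the code cave -> instruction after the original
}

state = ThreadState(pc=0x1000)
handle_breakpoint(state, redirects)
print(hex(state.pc))  # 0x4000
```

In a real debugger the equivalent of ThreadState would be read from and written back to the suspended thread before it is resumed.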

The advantages of this approach are that only one instruction needs to be overwritten and no registers need to be spilled to jump to a faraway location. The disadvantages are that this requires debugger support and the program incurs a small performance penalty from having the debugger catch and handle the exception (however, the debugger tries to be fast!). Bit Slicer shows a fake “emulated” branch instruction to the user when they look at the live disassembly of the app (similar to how debuggers show the original instructions when they are overwritten with breakpoints).

Code Injection Example

At last, here is a simple example of using code injection in my game.

When players are knocked off the checkerboard stage, they normally move downwards on the z axis. The code in the game that handles this decreases the player’s z position variable repeatedly at a frequent interval.

With code injection, I can add additional code here that also alters the player’s x location so that the player is falling in a more slanted manner.

Here is an example of the changes:

ORIGINAL_CODE:
fsub s0, s0, s1

; Branch to the code cave
; Note this might actually be a breakpoint underneath and be emulated
; This instruction was replaced and previously: str s0, [x8, 0x8]
b NEW_CODE

BACK_TO_ORIGINAL_CODE:
; This is an original instruction that remains untouched
ldur x8, [x29, -0x20]

; ------------

NEW_CODE:
; Copy over and execute the original instruction making the player’s z position decrease
str s0, [x8, 0x8]

; Preserve registers for s0 and s1 on the stack which will be used later
; Note the stack must be 16-byte aligned on Apple platforms
stp s0, s1, [sp, -0x10]!

; Subtract 0.25 from the player’s x position too
; (Note if the player's z position is a float at offset 8, then it's very likely the y position is at offset 4 and the x position is at offset 0)
ldr s0, [x8]
fmov s1, 0.25
fsub s0, s0, s1
str s0, [x8]

; Restore registers for s0 and s1
; (post-indexed: load from sp first, then add 0x10 back to sp)
ldp s0, s1, [sp], 0x10

; Branch back to the original code
; Note this might actually be a breakpoint underneath and be emulated
b BACK_TO_ORIGINAL_CODE
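
To show the intended effect on the game's data, here is a small Python model of the player's position as three consecutive 32-bit floats. The layout (x at offset 0, y at offset 4, z at offset 8) follows the guess made in the comments above, and the function names and starting values are invented for this sketch.

```python
import struct


def original_step(memory: bytearray, base: int, fall_speed: float) -> None:
    """Model of the original code: decrease z (the float at offset 8)."""
    (z,) = struct.unpack_from("<f", memory, base + 8)
    struct.pack_into("<f", memory, base + 8, z - fall_speed)


def injected_step(memory: bytearray, base: int, fall_speed: float) -> None:
    """Model of the code cave: run the original store, then also decrease x."""
    original_step(memory, base, fall_speed)         # str s0, [x8, 0x8]
    (x,) = struct.unpack_from("<f", memory, base)   # ldr s0, [x8]
    struct.pack_into("<f", memory, base, x - 0.25)  # fsub, then str s0, [x8]


player = bytearray(struct.pack("<fff", 10.0, 2.0, 5.0))  # x, y, z
injected_step(player, 0, 0.5)
print(struct.unpack("<fff", player))  # (9.75, 2.0, 4.5)
```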

This is just a simple example, but the possibilities are endless.

Conclusion

I went over how Bit Slicer provides assembly-level code injection and how I adapted the debugger to make injecting code on arm64 more feasible.

As a final note, this is not the only method to inject code. Some other ways not discussed here include:

Injecting a compiled dylib when spawning the process using the DYLD_INSERT_LIBRARIES environment variable. This might include an initializer that runs when the library is loaded.
Creating and injecting a new thread that executes new code the user crafts.
Exploiting a bug in the targeted software.