I recently released an update to Bit Slicer that adds Code Injection support for arm64 or native Apple Silicon processes. Code injection allows injecting new code and logic into a running program. I will discuss how this functionality works in a debugger like Bit Slicer and what changes were made to enable this functionality for Apple Silicon.
Bit Slicer has provided assembly-level Code Injection support for many years now for Intel or x86 processes.
First the user needs to find an unused region of memory or (more commonly for Bit Slicer) allocate a new region of memory to inject new code into. This is often called a “code cave” by game hackers.
Then the user finds an area of known used instructions and has Bit Slicer replace however many instructions are needed with a branch instruction that jumps to the new code cave. The user also inserts their custom code into this code cave, optionally copying the original instructions that were sacrificed for the jump.
The last instruction written in this code cave jumps back to the original code, resuming at the instruction that follows the jump into the code cave. An illustration of these steps is below:
The result is that new code and logic can be added at runtime without overriding large essential portions of existing code.
For Intel, Bit Slicer prefers to use a jmp instruction, which occupies 5 bytes. Four of those bytes hold the relative offset to jump to, measured from the end of the jmp instruction.
For example, if the original instruction a user wants to overwrite is at 0x1000 and the new code cave is at 0x4000, then the original instruction can be overwritten as jmp 0x4000, and the offset at that location will be encoded as (0x4000 - (0x1000 + 0x5)) or simply 0x2FFB.
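The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the encoding, not Bit Slicer's actual code; the function name encode_jmp_rel32 is made up for this example, and it assumes the standard 0xE9 opcode for a near relative jmp.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jmp with a 32-bit relative offset
JMP_REL32_SIZE = 5       # 1 opcode byte + 4 offset bytes


def encode_jmp_rel32(source: int, target: int) -> bytes:
    """Encode a 5-byte relative jmp placed at `source` that lands on `target`.

    Hypothetical helper for illustration only.
    """
    # The offset is measured from the end of the jmp instruction.
    offset = target - (source + JMP_REL32_SIZE)
    if not -(1 << 31) <= offset < (1 << 31):
        raise ValueError("target is out of range for a 32-bit relative jmp")
    # x86 stores the offset as a little-endian signed 32-bit integer.
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<i", offset)


patch = encode_jmp_rel32(0x1000, 0x4000)
print(patch.hex(" "))  # e9 fb 2f 00 00
```

The last four bytes are 0x2FFB in little-endian order, matching the offset worked out by hand.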
Note that instructions vary in size on Intel, so Bit Slicer may end up overwriting multiple instructions depending on the chosen insertion point. Any remaining unused bytes from replaced instructions are overwritten with 1-byte NOP, or no-operation, instructions.
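As a rough sketch of the padding step, the patch written over the original instructions could be built like this. The helper name build_site_patch and the 7-byte example are assumptions for illustration; the caller is assumed to already know how many bytes the replaced instructions span.

```python
import struct

JMP_REL32_SIZE = 5  # size of the relative jmp
NOP = 0x90          # 1-byte x86 no-operation instruction


def build_site_patch(site: int, cave: int, overwritten_size: int) -> bytes:
    """Return the bytes to write at `site`: a jmp to `cave`, then NOP padding.

    `overwritten_size` is the total size of the whole instructions being
    replaced (hypothetical input; a disassembler would supply it).
    """
    if overwritten_size < JMP_REL32_SIZE:
        raise ValueError("need at least 5 bytes of instructions to overwrite")
    offset = cave - (site + JMP_REL32_SIZE)
    jmp = b"\xE9" + struct.pack("<i", offset)
    # Fill whatever is left of the replaced instructions with NOPs so the
    # code after the patch still starts on an instruction boundary.
    return jmp + bytes([NOP]) * (overwritten_size - JMP_REL32_SIZE)


# Example: the replaced instructions span 7 bytes, so 2 NOPs are appended.
print(build_site_patch(0x1000, 0x4000, 7).hex(" "))  # e9 fb 2f 00 00 90 90
```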
In x86_64 or 64-bit Intel processes, a relative jmp will not suffice if the code cave is so far away that the signed offset cannot be represented using 32 bits. In this case, Bit Slicer falls back to an indirect jump, which needs 3 instructions and overwrites 11 bytes of existing code instead of the 5 bytes for a relative jmp:
Overwriting more instructions, modifying the stack, and preserving registers that are spilled like this is less ideal than using a 5-byte relative jmp instruction. Overwriting more space requires more caution on choosing an adequate insertion point, especially if any of the instructions that are being overwritten encode relative offsets themselves.
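The decision between the two kinds of jump comes down to a range check on the signed offset. Here is a minimal sketch of that check; choose_jump_kind is a hypothetical name, and the actual indirect jump sequence is deliberately left out.

```python
JMP_REL32_SIZE = 5


def fits_rel32(source: int, target: int) -> bool:
    """True if a relative jmp at `source` can reach `target`."""
    offset = target - (source + JMP_REL32_SIZE)
    return -(1 << 31) <= offset < (1 << 31)


def choose_jump_kind(source: int, target: int) -> str:
    """Pick the cheaper relative jmp whenever the code cave is reachable."""
    return "relative" if fits_rel32(source, target) else "indirect"


print(choose_jump_kind(0x1000, 0x4000))                      # relative
print(choose_jump_kind(0x1_0000_0000, 0x7FFF_0000_0000))     # indirect
```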
On arm64, code injection using a relative branch instruction (b) can be done similarly with one important caveat: the relative offset can only hold up to 2^28 values rather than 2^32 values for Intel (a reach of roughly ±128 MB instead of ±2 GB). This is a big difference. For Intel, I was content with 32 bits because this was often large enough in my experiments and often did not require falling back to an indirect jump. Also, instructions are variable size on Intel, so finding an insertion point that does not overwrite too many instructions can be feasible.
On arm64, however, 28 bits is often insufficient for jumping to a newly allocated block of code because the block is too far away. Performing an indirect jump using a 64-bit address while saving register state can occupy many more bytes and instructions than on Intel (around 6 instructions or 24 bytes by my calculations). Instructions on arm64 are always 4 bytes long, which also makes finding a good insertion point more difficult.
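To make the 28-bit limit concrete, here is a sketch of how an arm64 b instruction could be encoded. It assumes the standard A64 encoding (a 26-bit immediate counted in 4-byte units, relative to the address of the b itself); encode_arm64_b is a hypothetical helper, not Bit Slicer's code.

```python
import struct

B_OPCODE = 0x14000000    # A64 unconditional branch (immediate)
IMM26_MASK = 0x03FFFFFF  # 26-bit immediate field
B_RANGE = 1 << 27        # signed 28-bit byte offset: -128 MiB .. +128 MiB


def encode_arm64_b(source: int, target: int) -> bytes:
    """Encode `b target` for an instruction located at `source`."""
    # Unlike x86, the offset is relative to the branch instruction itself.
    offset = target - source
    if offset % 4 != 0:
        raise ValueError("arm64 branch targets must be 4-byte aligned")
    if not -B_RANGE <= offset < B_RANGE:
        raise ValueError("target is out of range for a relative b")
    # The immediate stores the offset divided by 4, in two's complement.
    imm26 = (offset >> 2) & IMM26_MASK
    return struct.pack("<I", B_OPCODE | imm26)


print(encode_arm64_b(0x1000, 0x4000).hex(" "))  # 00 0c 00 14
```

Any code cave more than 128 MiB away from the insertion point makes this helper raise, which is the situation described above.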
Rather than implementing an approach again that would overwrite several instructions and spill several registers, I decided to go a different route. Bit Slicer now uses breakpoints and handles moving the instruction pointer to and from the code cave.
This approach works as follows: Bit Slicer overwrites an original instruction with a breakpoint instruction. When that instruction is executed, an exception for hitting a breakpoint is raised and Bit Slicer catches it. Bit Slicer then updates the instruction pointer register of the suspended thread to point to the code cave and resumes execution. Similarly, a breakpoint instruction is inserted at the end of the code cave so Bit Slicer can move execution back to the original code.
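The bookkeeping behind this can be modeled with a small simulation. This is a simplified sketch and not how Bit Slicer is actually structured: ThreadState and BreakpointRedirector are hypothetical names, real thread state and exception handling are replaced by plain Python objects, and it assumes the reported program counter is the address of the breakpoint instruction that was hit.

```python
from dataclasses import dataclass, field
from typing import Dict

INSTRUCTION_SIZE = 4  # every arm64 instruction is 4 bytes long


@dataclass
class ThreadState:
    pc: int  # stand-in for the suspended thread's instruction pointer


@dataclass
class BreakpointRedirector:
    # Maps the address of a breakpoint to where execution should continue.
    redirects: Dict[int, int] = field(default_factory=dict)

    def install(self, site: int, cave_start: int, cave_end: int) -> None:
        # Breakpoint over the original instruction: enter the code cave.
        self.redirects[site] = cave_start
        # Breakpoint at the end of the cave: return to the instruction
        # that follows the overwritten one.
        self.redirects[cave_end] = site + INSTRUCTION_SIZE

    def handle_breakpoint(self, thread: ThreadState) -> bool:
        """Move the thread if the breakpoint is ours; report if handled."""
        target = self.redirects.get(thread.pc)
        if target is None:
            return False  # some other breakpoint; not an injection site
        thread.pc = target
        return True


redirector = BreakpointRedirector()
redirector.install(site=0x1000, cave_start=0x4000, cave_end=0x4010)

thread = ThreadState(pc=0x1000)
redirector.handle_breakpoint(thread)
print(hex(thread.pc))  # 0x4000

thread.pc = 0x4010     # the cave ran and hit its trailing breakpoint
redirector.handle_breakpoint(thread)
print(hex(thread.pc))  # 0x1004
```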
The advantages of this approach are that only one instruction needs to be overwritten and no registers need to be spilled to jump to a faraway location. The disadvantages are that this requires debugger support and the program will incur a small performance penalty by having the debugger catch and handle the exception (however, the debugger tries to be fast!). Bit Slicer shows a fake “emulated” branch instruction to the user when they look at the live disassembly of the app (similar to how debuggers show the original instructions when they are overwritten with breakpoints).
At last, here is a simple example of using code injection in my game.
When players are knocked off the checkerboard stage, they normally move downwards on the z axis. The code in the game that handles this decreases the player’s z position variable repeatedly at a frequent interval.
With code injection, I can add additional code here that also alters the player’s x location so that the player is falling in a more slanted manner.
Here is an example of the changes:
This is just a simple example, but the possibilities are endless.
I went over how Bit Slicer provides assembly-level code injection and how I adapted the debugger to handle injecting code in arm64 more feasibly.
As a final note, this is not the only method to inject code. Some other ways not discussed here may be:
Loading a dynamic library into the process via the DYLD_INSERT_LIBRARIES environment variable. The library might include an initializer that runs when it is loaded.