# Sheb-Teth (LayerOne 2017 CTF)

[We get a binary!](Sheb-Teth.bin_pristine) I don't remember any more hints to it than just that. There was flavor text, but I've lost it.

If we `strings Sheb-Teth.bin` we'll see some helpfully embedded strings to guide us in the right direction:

<pre>
    [snip]
    ...
    [^_]
    [^_]
    <b>#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.
    . @emptymonkey            https://github.com/emptymonkey            2017-03-10 #
    #                                                                              .
    . Welcome to the LayerOne 2017 CTF "Sheb-Teth" RE challenge.                   #
    # There are six flags. flag_0, flag_1, ..., flag_5.                            .
    . Each flag is in a uuid format. e.g. c8da2132-382e-11e7-9862-507b9d8156b4     #
    # When prompted, enter the flag you would like to test.                        .
    . This program will tell you if the flag you entered is a valid flag.          #
    # (You know how to speak R'lyehian, right?!)                                   .
    . Enjoy!                                                                       #
    .#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#.#</b>
    uln sheb teth  n'ghaoth  sll'ha throd
    mglw'nafh fhthagn-ngah cf'ayak 'vulgtmm vugtlag'n
    mnahn' r'luh
    ...
    [snip]
</pre>

Bold added by me, for emphasis. It probably would have been a good idea to check out that github link, but I totally tunnel vision'd over it at first.

So, we know this binary is what we'll be getting used to for all six flags in Sheb-Teth.

## Flag 0

Continuing to look through `strings` output we'll find `f36c454e-3831-11e7-a8b5-3fb45ccb41a5`, and if we try to run the program with that as input... we'll find it quits with an error message:

    Unclean mortal! Do not return until you wield *true* power!!

.. wtf, right?

Opening this up in radare2 and looking at the start of `sym.main`:
(for those playing along at home, `aaa` to look for functions, then `pdf @ sym.main` to **p**rint the **d**isassembled **f**unction _at_ the **sym**bol **main**)

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.main' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

in main it puts some strings in locals, which is fine, whatever... and then calls getuid(), followed by testing if the returned uid is 0 (eg, root). So if this program is not run as root, it'll write one `0x3d` byte string to stderr, the `Unclean mortal!` thing we saw before.

I dunno about you, but I don't want to run this as root. I don't know GDB too well, and didn't want to figure out how to script it, so the easier option is to patch<sup>[1](#patching_footnote)</sup> the binary. So I replaced the 0x74 at file offset 0x96b with 0xeb, which changes `je 0x804898e` into `jmp 0x804898e`.

With that out of the way, and the patch applied, we can run the program, enter the flag from before, and see it is, in fact, a flag.

+50.

(yes, you can easily see the string in the diassembly too)

.. and remember to actually submit the flag, or someone else on your team might do it first! I say, speaking from personal experience..

## Flag 1

No more flags in the strings output, it's about time to start looking a little more in-depth at this thing.

Going back to `main`, we can see the gist of this program is:

1. getuid() to check for root
1. printf() some lovecrafty text
1. fgets() to read a string
1. print a "\\n" after user input
1. strchr() the input for '\\n' (no idea why!)
1. `if (check_flag_0(input)) { puts("confirmed"); return }`
1. `if (check_flag_1(input)) { puts("confirmed"); return }`
1. `if (check_flag_2(input)) { puts("confirmed"); return }`
1. `if (check_flag_3(input)) { puts("confirmed"); return }`
1. `if (check_flag_4(input)) { puts("confirmed"); return }`
1. `if (check_flag_5(input)) { puts("confirmed"); return }`

which means we can look at each flag check independently, and hopefully none of the `check_flag_N` functions interact with each other or some global state. By luck and the benevolence of emptymonkey, they happen to not, and we can in fact look at the flag checking functions independently.

`pdf @ sym.check_flag_1` (most of it, anyway):

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.check_flag_1' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

Immediately we can see that a bunch of data is being `mov`'d onto the stack, followed by a `calloc()`, a call to some `xorscura_decrypt()` function, and a `strncmp()`. Interestingly, the stack data is written one byte at a time, suggesting the compiler didn't want to optimize too hard - going four bytes at a time would save ~81 bytes.

In the interest of lazy, it's probably safe to assume that the decoded flag is being `strncmp()`'d with the user input, so it's time to learn how to use GDB a little!

`gdb Sheb-Teth.bin` is good to get started, then set a breakpoint at the `strncmp()` call of interest with `b *0x8048db6`. The `*` notation is GDB for specifying an address (rather than a symbol).

Now, `r` to run the program, enter some string as input, `c` to continue on and hope all goes as planned with us eventually hitting that breakpoint.

We do hit it, and at this point the program is paused immediately before calling `strncmp()`, with the stack set up just as it needs to be. `x/3x $esp` shows the three 32-bit integers from where `esp` points, going upwards with each successive one, which corresponds to `strncmp()` arguments in left to right order

* `0x0804c868` should be one string (which turns out to be the flag)
* `0x0804c008` should be another (which turns out to be ours)
* `0x00000024` is the `size_t n` parameter, the length to compare up to.

Flags are 36-character UUIDs, so `0x24` makes sense. `x/s 0x804c008` shows our initial input, so `x/s 0x804c868` should show us a flag - which it pleasantly does.

+100.

## Flag 2

`pdf @ sym.check_flag_2`

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.check_flag_2' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

This is where things get more fun. The rest of the challenges involve some measure of anti-debugging, and here we can see:

1. `open()` `/proc/self/status`
1. store that file descriptor at `ebp + 0x18`
1. `read()` that file descriptor a bit
1. `read()` a little more
1. `read()` one last time into the buffer at `ebp + 0x59h`. Looking a little more closely shows all `nbyte` values for the *number* of bytes to read is 1 here, and before.
1. `close()` the file descriptor
1. `atoi()` the byte that was read into `ebp + 0x59h`
1. if `atoi()` returned 0, skip over `eldritch_function()`, otherwise call it.
1. same flag-checking mechanics from here as in `check_flag_1`

So, clearly, something is interesting about `/proc/self/status`. I'm not familiar with it, so it's time to take a look:

        Name:	vim
    State:	R (running)
    Tgid:	25100
    Ngid:	0
    Pid:	25100
    PPid:	11187
    TracerPid:	0
    Uid:	1000	1000	1000	1000
    Gid:	1000	1000	1000	1000
    [snip]

Most of these fields are clear enough, though I'm not sure about `Tgid`, `Ngid`, `PPid`, or `TracerPid`. It's probably safe to assume the `gid` are related to the group id of the `self` process, where `PPid` is probably the parent PID, leaving `TracerPid` as an unknown. I'm going to guess that if a debugger is attached to the process, TracerPid probably indicates what's *tracing* the process, so probably the pid of the debugger attached, with 0 for `there isn't a debugger attached`.

Going back to the code, the first `read()` use is in a loop counting down from 7, comparing each byte read to `0xa` (eg, `"\n"`), so it's seeking down to the 8th line. That would be the `Uid` line, which threw me initially. This is right after the `TracerPid` line which is significantly more interesting here. I'm quite certain this is actually a bug on most systems but *happens* to work because `Uid` is "supposed" to be 0 (run as root). I patched the `7` in `mov dword [local_10h], 7` to `6` so it looks in the right place and doesn't segfault when run as non-root outside a debugger.

Then, the second `read()` loop goes forward until it reads a `0x9` ("\\t") (or a null, but it won't). The fields in `/proc/self/status` are tab-delimited, so it's scanning forward to the first number.

The third `read()` reads the first digit of whatever number we're at, which will be 0 if the number is 0, or any non-0 digit otherwise, beause these values are not 0-padded.

So when the program `atoi()`'s whatever digit we read, it'll that number as a value in eax - if the intent was to read the number on the `TracerPid` line, this would check for a debugger. As things stand, this is a duplicate `uid` check for 0.. I think.

One way or the other, this ends up calling `eldritch_function` which ends with a fairly nonsensical `ret 0xcfd`, almost guaranteeing a segfualt when it returns. Which it, in fact, does.

Again the fan of binary patching, I replaced the `0x74` at `0xf38` in the binary with `0xeb`, changing the conditional jump into a constant jump, bypassing `eldritch_function()` no matter what.

Throw a breakpoint on the `strncmp` here at `0x8048ff0`, run again, and `x/3x $esp` and follow the two pointers there for `x/s 0x804c868`, which prints out another flag!

+200

## Flag 3

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.check_flag_3' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

This uses `signal()`, which I'm not very familiar with either. A quick jaunt over to `man 2 signal` suggests that `signal()` will be called with a signal number, and a handler - some a function pointer. This makes some sense given radare is telling us that the arguments to `signal()` would be `5` and `sym.trap`, an address of the function in this binary named `trap`.

Additional interesting bits are the facts that `signal()` is called in a loop, and that `signal()` is immediately followed by an `int3`, which signals a breakpoint for an attached debugger.

So, what's special about `5`? `/usr/include/asm-generic/signal.h` defines signal numbers for us, showing `5` is the number for `SIGTRAP`. `man 7 signal` further informs us that `SIGTRAP` is a signal for `Trace/breakpoint trap`.

At this point the trick gets clearer: `signal()` is being used to set a handler on SIGTRAP, but if a debugger is attached *it* will get the breakpoint signal first, and probably not duplicate that along to the debugged process out of the box, so the handler (`trap()`) will never get called. Looks like we'll have to see what `trap()` itself does.

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.trap' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

`trap()` does nothing but load a global variable, `elder_sign`, add to it, and store it back!

Skipping forward in `check_flag_3` a little, we can see `elder_sign` is used before calling `xorscura_compare()`, so `trap` being called the correct number of times for it all to line up is probably important.

I'd look into trying to send SIGTRAP to the debugee and scripting that in gdb, but it seems simpler to patch the binary again to behave in a nicer way. Instead of calling `trap()` in a roundabout way through signal handlers, let's patch the loop body to just.. call `trap()`. So that's exactly what I did! At 0x10e2 in the binary, I nop'd out until 0x10f5 with 0x90 to clear some space. I stopped writing out nops there because we'd overwrite the `add dword [ebp - 0xc], 1` that eventually trips the loop exit condition, which we want to preserve. Afterward, I wrote in `call sym.trap` at 0x10e2. Beause call is an instruction with a four byte offset, we need to figure out the relative distance to `trap()` from 0x80490ea. This comes out to 0xffffff38, for the whole instruction being `e838ffffff`. Writing that into the binary at 0x10ea should preserve the "call trap() once each loop" behavior, but not break under debugging.

### `xorscura_compare()`

With that rudeness out of the way, we can move on with debugging this program, which gets into `xorscura_compare()`, a function we hadn't seen until now.

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.xorscura_compare' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

It looks like there are two main branches to this function, one of which is taken if a value in the struct passed in `local_14h` is null, the other taken if it's set. If it's set, `xorscura_compare_prng` is called, so I imagine that'll come into play for a later flag, with this value used in relation to the prng.

In the other case, which is what we see here, some `xor`'ing and `cmp`'ing is done. It's probably safe to say the flag is decoded and compared to user iput, so by putting a breakpoint on each `cmp` after a `xor`, we should be able to see flag bytes in registers.

There's an added complication, in that it bails out on the first mismatch. So let's patch a that out... the `jne 0x8049e30` at `0x8049e00` has got to go. That bails if the first bytes don't match. At 0x1e00, I replaced that with `6690`, a two-byte nop. Now we have to give the same treatment to the `jne 0x8049e30` at `0x8049e20`.

At this point, breakpoint'ing at `0x8049dfe` and `0x8049e1e` are sufficient to get us flag bytes, but we've accidentally made this function *always* return "true". That means `flag 4` and `flag 5` will be hard to get to. So let's fix that up by transmuting `xor eax, eax` at `0x80489e29` into `inc eax; inc eax`. That way it only returns 0 if `eax` happened to be -2, which is... exceedingly unlikely. That one just replacing the `31c0` at `0x1e29` in the binary with `4040`.

And it is at exactly this point we may realize that for flag 3, we actually follow the branch to `xorscura_compare_prng()`. Don't worry, the above patching comes in handy for later flags.

#### `xorscura_compare_prng()`

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.xorscura_compare_prng' ./Sheb-Teth.bin | aha --no-header --stylesheet
</div>

1. `calloc()` 0x1c bytes
1. `initstate_r` with that struct
1. do some stuff and maybe `random_r` with that struct
1. after `random_r`, bytes are decoded one at a time, in groups of four, each compared to a corresponding byte of user input
1. if any decoded byte does not match up, return 1.

Much like in `xorscura_compare()`, this will exit early if any bytes of the flag don't match, but it'd be nice for it to just print out all bytes of the flag.

nop'ing out conditional jumps at `0x8049cf6`, `0x8049d1c`, `0x8049d38`, and `0x8049d54` should do the trick, though we'll need to "fix" `xor eax, eax` at `0x8049d70` just as before, where `inc eax; inc eax` will do.

<div class="codebox">
#eval radare2 -q -e io.cache=true -c 'wx 666666666690 @ 0x8049cf6' -c 'wx 6690 @ 0x8049d1c' -c 'wc 6690 @ 0x8049d38' -c 'pd 51 @ 0x8049cd9' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

Now with breakpoints at `0x8049cf4`, `0x8049d1a`, `0x8049d36`, and `0x8049d52` and `display/c $eax` we'll get a flag one byte at a time when run and debugged!

+300

## Flag 4

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.check_flag_4' ./Sheb-Teth.bin | aha --no-header --stylesheet
</div>


This function just *looks* annoying, what with the opens and `ptrace()` and all this.. _stuff_ going on. Also the `fork()`, meaning I'd have to learn how to have gdb debug child processes as well. All of that sounds annoying, but here's an idea: this still calls `xorscura_compare()`. And, it turns out `check_flag_5` *also* calls `xorscura_compare()`. So let's put this fork/ptrace business aside a moment and look at that.

Now, avoiding the anti-debug tricks is annoying. Wouldn't it be easier if the program just printed out the flag bytes?

How could we make that happen?

### `puts()` to the rescue

I noticed earlier that `puts()` is used in `main()`, and it has a dead-simple interface. Pass it a pointer, it prints characters until it reaches a null.

Flags here are available 1-4 bytes at a time, but tinkering with imports to get `putchar` exposed is absolutely not how I wanted to spend time - `puts()`'ing the flag one byte at a time is plenty good for CTF.

How can we cram a `puts()` call in though? Calls are large (5 byte) instructions, and we only have a few bytes to work with from the `cmp` and corresponding `jne`.

.. Or do we? If we assume inputs are all multiples of 4 (which they are, because flags are multiples of 4 and we'll just type in the right stuff), we can remove all the `lea eax, [ebx + N]; cmp edx, eax; je 0x8049da0`. That lets us go from 4 bytes to 11 bytes to work with. We can even discard the user input string entirely and consider `movzx ebp, byte [edi + ebx + 1]` space up for grabs too. That puts us at 16 bytes of code space to mess with for each byte of the flag we want to print, with four bytes we want to print.

#### *no matter what we need to be careful to not clobber registers*

So what registers are preserved by the ABI? That is, what do we have to worry about being trashed by putting in a random `puts()` or four?

`ebx`, `esi`, `edi`, `ebp`, and `esp`

At each of these `xor`, `edi`, `esi`, `edx`, `ecx`, `ebx`, and `eax` are used.

So we have to worry about `edx`, `ecx`, and `eax`. Push/pop are one byte each, call is five, for a total of 11 bytes if we can arrange the stack right. This isn't quite small enough.

Maybe we can move one of these values into a register that won't be trashed by `puts()`? We won't be using `edi` anymore, which *was* used for the user-input flag string. So let's patch the `mov edx, dword [esi]` into a `mov edi, dword [esi]` instead, saving us two bytes in future patches, making this insanity workable! In addition, we'll need to patch `cmp` at `0x8049d01`, `0x8049d21`, `0x8049d3d`, and `0x8049d59` to all use `edi` instead of `edx`. Respectively, they change to `39c7`, `39c7`, `39c7`, and `39fd`. The `mov` at `0x8049cfc` also will turn into `8b3e`.

For some reason, the compiler decided to put the last value of each loop in `edi` instead of `ebp`, but that doesn't matter because we'll overwrite that soon anyway.

What's the nicest way to get the current flag byte at a pointer to print, though? In these, probably prefixing a call `puts()` with `push eax`, `push esp`. `eax` already having the character to compare to for the flag, `esp` then being a pointer to it for `puts()`.

We can take advantage of `eax` being an argument to only push it once, meaning at the end of it we'll want write in something that calls `puts()` but also preserves registers that have values that will be used - `ebx`, `ecx`, and `esp`. `ebx` and `esp` are preserved by the ABI, so we only have to save `ecx` ourselves.  I went with something like `push ecx; push eax; push esp; call sym.puts; pop eax; pop eax; pop ecx`. This gives us a total of 11 bytes of code to insert, so let's see if that's small enough...

In `xorscura_compare_prng()` we'll want to patch at `0x8049ce9`, `0x8049d0e`, `0x8049d2a`, and `0x8049d46`. For all of these, we'll want to retain the `xor al, byte [...]` and move it up appropriately so the printed character is still decoded!

If we're just patching out the comparison, the `cmp` at `0x8049cf4`, `0x8049d1a`, `0x8049d38`, and `0x8049d54` aren't necessary anymore, nor are the `movzx` into `ebp` or `edx` for the user input string to check against. We can also remove the `jne` for when the strings are not equal too. That puts us at 4 + 3 + 2 + 6 = 15 bytes to work with in the first region, and 5 + 3 + 2 + 2 = 12 bytes in the others. The other comparisons against `ebx + 1`, `ebx + 2`, `ebx + 3` are to check bounds against input sizes - since inputs are multiples of four we don't *need* these, but not removing them means less work. So let's not. The patch itself is 11 bytes, so this fits nicely as long as we move some code around.

#### `0x8049ce9`

The compiler used `0f85` and `0f84` for `jne` and `je` by `0x8049ce9` because the jump targets are >128 bytes forward, so we have a little extra room to work with here, but it's not actually too important.

Here we'll want to rewrite right after the `movzx eax, byte [ecx + ebx]`, writing in:
    xor al, byte [esp + 0x1c]
    push ecx
    push eax
    push esp
    call sym.puts ; more on this in a moment
    pop eax       ; to fix up the stack
    pop eax       ; restore eax
    pop ecx       ; restore ecx

This assembles to `3244241c515054e8XXXXXXXX585859`, for 15 bytes. At `0x8049ce9` we have `0fb6141f3244241c0fbec039c20f8584000000`, 19 bytes, so let's throw `66666690` at the end for a `nop` to fill the extra space.

XXXXXXXX is an offset to the start of `puts()`. Or, more specifically, where the loader fixed up a `jmp` to go to `puts()`

I'm not sure where that would be, so the easiest thing I imagined was to look at what another `puts()` goes. Say, the one in `main` sounds good!

    0x08048b94      e867fbffff     call sym.imp.puts

The offset here is `0xfffffb67`, and the call destination is that offset plus the address of this instruction, plus the length of this instruction, which comes out to `0x8048b94 + 0xfffffb67 + 5`, for our `puts()` target being at `0x8048700`. So in our patch, we'll want `XXXXXXXX` to be `0x8048700 - (0x8049cf0 + 5)`, for `0xffffea0b`. Because little-endian, the whole instruction is written out as `e80beaffff`.

Now we have a patch! `3244241c515054e80beaffff58585966666690 @ 0x8049ce9`.

#### `0x8049d0e`

Same process holds here, too. Same patch fits, but the `nop` at the end will have to be differently sized to fit in the space - just `90` instead of the four byte version, here and for the next two.

We also need to be careful to xor against `esp + 0x1d` here, not `esp + 0x1c` again.

Repeating the process for computing a call destination, we get our patch: `3244241d515054e8e6e9ffff58585990 @ 0x8049d0e`

#### `0x8049d2a`

Again, next byte from `random_r()`, so `xor al, byte [esp + 0x1e]`

Same size region, so we just have to figure out the call destination here: `3244241e515054e8cae9ffff58585990 @ 0x8049d2a`

#### `0x8049d46`

Yadda yadda yadda, xor against next byte, spoiler alert: `3244241f515054e8aee9ffff58585990 @ 0x8049d46`

#### Aaaand patch!

Before:  
<div class="codebox">
#eval radare2 -c 'pd 51 @ 0x8049cd9' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

After:  
<div class="codebox">
#eval radare2 -q -e io.cache=true -c 'wx 666666666690 @ 0x8049cf6' -c 'wx 6690 @ 0x8049d1c' -c 'wc 6690 @ 0x8049d38' -c 'wx 3244241c515054e80beaffff58585966666690 @ 0x8049ce9' -c 'wx 3244241d515054e8e6e9ffff58585990 @ 0x8049d0e' -c 'wx 3244241e515054e8cae9ffff58585990 @ 0x8049d2a' -c 'wx 3244241f515054e8aee9ffff58585990 @ 0x8049d46' -c 'pd 51 @ 0x8049cd9' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

Wonderful - they are in fact correctly lined up calls to `puts()`, everything seems to line up... Now we should be able to run the program and, regardless of input, it'll print out flags.

Well, it does.

+400

## Flag 5

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.check_flag_5' ./Sheb-Teth.bin | aha --no-header --stylesheet
</div>

Interestingly, after the prior change, not much was necessary to make `flag-5` work out. Just run the program and copy out the flag. Almost.

Except that this check, it turns out, uses the comparison in `xorscura_compare` rather than `xorscura_compare_prng`, so we'll have to go back and patch that the same way.

<div class="codebox">
#eval radare2 -q -A -c 'pdf @ sym.xorscura_compare' ./Sheb-Teth.bin | aha --no-header --stylesheet
</div>

There's a `xor` in a loop and a `xor` not in a loop, which is curious. It turns out the first byte of the user input vs decoded flag is tested before even entering the loop! We can patch out the `jmp` at `0x8049e00` and let execution fall through into the loop with `eax` starting at 0, which will result in the first flag byte being computed twice. This lets us only patch the loop body, which is a bit easier.

`6690 @ 0x8049e00` does the trick here:

<div class="codebox">
#eval radare2 -q -A -e io.cache=true -c 'wx 6690 @ 0x8049e000' -c 'pdf @ sym.xorscura_compare' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

Looking back to the loop body, we have `edx` being `xor`'d, so that's probably where flag bytes pass through. We can get rid of `movzx ebp, byte [edi + eax]`, `movsx edx, dl`, `cmp ebp, edx`, and `jne 0x8049e30` to make space for our patch, giving 4 + 3 + 3 + 2 + 2 = 14 bytes.

`ecx` holds the lenth of the flag, so we still have to save that, but `eax` is a counter here, and the flag byte is sitting in `edx`. To make matters worse, `ebx` is also used to hold the pointer into the xor'd flag string.

Since we won't be needing the user-input string, which *was* in `edi`, we can use `edi` for the flag string instead, letting the ABI keep it preserved for us.

Patching `58` at `0x8049de1` to `78` changes the destination of `mov ebx, dword [eax + 8]` to `edi`, `nop`'ing `mov edi, dword [eax + 4]` at `0x08049df0` keeps `edi` safe, and patching `03` at `0x8049e13` to `07` changes `movzx edx, byte [ebx + eax]` to `movzx edx, byte [edi + eax]`. Now we don't have to worry about preserving `ebx` through `puts()` calls.

This means 16 bytes of patch code, in a 14 byte hole. Thankfully, we can get two more bytes by replacing `add eax, 1` at `0x8049e22` with `inc eax`, which is just 40`. Now it's 16 bytes of code into a 16-byte hole, fitting perfectly:

    3214065151505254e8e1e8ffff5a5a5859 @ 0x8049e14

<div class="codebox">
#eval radare2 -q -A -e io.cache=true -c 'wx 6690 @ 0x8049e000' -c 'wx 3214065151505254e8e1e8ffff5a5a5859 @ 0x8049e14' -c 'pdf @ sym.xorscura_compare' ./Sheb-Teth.bin_pristine | aha --no-header --stylesheet
</div>

and now we've patched it up to print yet another flag when run!

For this one, run without sudo, the last flag prints as gibberish because `ptrace()` to poke around in memory silently fails, pointing at incorrect data for the flag. `sudo ./Sheb-Teth.bin`, however, will print out the last flag.

+500 and we're done!

At the end you'll have a binary that looks very much like [mine for this writeup](Sheb-Teth.bin_patched)

I absolutely glazed over the anti-debugging for flags 4 and 5, but that's entirely because I have no idea how to get gdb to trace properly through the forking and ptrace'ing - setting it to follow child processes simply didn't! Reading through that trickery and figuring out how it worked was quite entertaining, highly recommended as an exercise left to the reader.

PS: the binary I had in finishing the challenges was [a little different](Sheb-Teth.bin). My initial attempt included me wanting to debug into a segfault I caused by patching `xorscura_compare` wrong (I trashed a register that shouldn't have been.. oops), so I actually looked into the ptraces and mimicked the IPC via `POKEDATA` with a `mov`. I also intially `nop`'d out all the ptrace and fork nonsense after that, since it was throwing GDB off the scent. That's not necessary if we just get the patching right from the first time around, perhaps after knowing what *not* to do..

# BONUS:

I fixed up some issues in the binary after my patching, now [this binary](Sheb-Teth.bin_fancy) will print all of the flags when run, no longer prompting for input.

The only extra patching necessary is really to replace `strncmp()` calls with `puts()` and ensure the right parameter is passed for flags 0, 1, and 2. In `xorscura_compare()`, patching out comparisons as described above does the trick just fine.

Mind you, this writeup is *much* cleaner than the actual hacking I did at this binary during the CTF - I elided a lot of false starts and broken assembly and register fiddling.

---

<a name="patching_footnote">1</a>: When I say "patch", I do mean quite literally typing bytes into a hex editor. I personally use hexedit, but would happily accept suggestions for better terminal-based hex editors. Bless and HxD are also good, depending on your OS and preferences. Addresses like `0x804XXXX` here refer to addresses in the virtual address space of the process - to convert those to addresses in the file, for this particular binary, you would subtract the base address the file expects to be loaded at. For this program it's `0x804800`, which you can find through a little guesswork or looking at ELF headers with `objdump -p Sheb-Teth.bin`:

<pre class="codebox">
Program Header:
    PHDR off    0x00000034 vaddr 0x08048034 paddr 0x08048034 align 2**2
         filesz 0x00000100 memsz 0x00000100 flags r-x
  INTERP off    0x00000134 vaddr 0x08048134 paddr 0x08048134 align 2**0
         filesz 0x00000013 memsz 0x00000013 flags r--
    <b>LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
         filesz 0x00002f68 memsz 0x00002f68 flags r-x</b>
    LOAD off    0x00003000 vaddr 0x0804b000 paddr 0x0804b000 align 2**12
         filesz 0x000001a8 memsz 0x000001b8 flags rw-
 DYNAMIC off    0x0000300c vaddr 0x0804b00c paddr 0x0804b00c align 2**2
         filesz 0x000000e8 memsz 0x000000e8 flags rw-
    NOTE off    0x00000148 vaddr 0x08048148 paddr 0x08048148 align 2**2
         filesz 0x00000044 memsz 0x00000044 flags r--
EH_FRAME off    0x000028d4 vaddr 0x0804a8d4 paddr 0x0804a8d4 align 2**2
         filesz 0x000000a4 memsz 0x000000a4 flags r--
   STACK off    0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**4
         filesz 0x00000000 memsz 0x00000000 flags rw-
    LOAD off    0x00000000 vaddr 0x08048000 paddr 0x08048000 align 2**12
</pre>
The bolded entry here tells us that the region at file offset 0 maps to the virtual address 0x8048000, with the file and in-memory regions being the following 0x2f68 bytes. If you see a virtual address below `0x8048000 + 0x2f68`, this is the mapping that applies!

In radare2, you could instead `?p <virtual_addr>` to convert to a file offset, and not worry about these specifics.