author iximeow <me@iximeow.net> 2024-07-14 14:11:47 -0700
committer iximeow <me@iximeow.net> 2024-07-14 14:11:47 -0700
commit 2dc28a68f04c92adecefb05bbe8fa1ebb24d4189 (HEAD, master)
tree 6b5f656d81de77f138cfa5d06eef5d116736de06 /source
parent 4d7e9a5a41e769b2548cfae26c7472a01b3246e3

normalize headings, include toc
Diffstat (limited to 'source')
-rw-r--r-- source/blog/yax/avnera/notes.md | 59
1 files changed, 47 insertions, 12 deletions
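some of the toc anchors this commit adds look odd at first glance: `#f` for the `48..4f` heading, `#section` for `00..07`, and `#f-sub-or-cmp` for the `78..7f` … `sub` or `cmp`? heading. the `%` line added at the top of the file is pandoc's title block syntax, and the anchors appear consistent with pandoc's default `auto_identifiers` rules: lowercase, drop punctuation other than `_`, `-`, and `.`, turn whitespace into hyphens, drop everything before the first letter, and fall back to `section` if nothing is left. here is a minimal python sketch of those rules. it assumes headings are passed in as plain text with backticks already stripped; `heading_anchor` is a hypothetical helper, not a pandoc API. it collapses runs of whitespace into one hyphen (an assumption that reproduces `f-sub-or-cmp` here) and ignores pandoc's suffixing of duplicate ids.

```python
def heading_anchor(heading: str) -> str:
    """Hypothetical helper: approximate pandoc's default auto_identifiers
    rules for a heading given as plain text (markdown formatting removed)."""
    # keep letters, digits, '_', '-', '.', and whitespace; drop other punctuation
    # (this removes '?', '!', ',', the curly apostrophe, and the ellipsis)
    kept = "".join(
        c for c in heading.lower()
        if c.isalnum() or c in "_-." or c.isspace()
    )
    # join the remaining words with single hyphens (whitespace runs collapse)
    ident = "-".join(kept.split())
    # identifiers may not start with a digit or punctuation: skip to the first letter
    for i, c in enumerate(ident):
        if c.isalpha():
            return ident[i:]
    # nothing left (e.g. "00..07"): fall back to "section"
    return "section"


if __name__ == "__main__":
    for heading in ["which way is up?", "48..4f", "00..07",
                    "78..7f … sub or cmp?", "inc and dec are a loop’s best friend"]:
        print(f"{heading!r} -> #{heading_anchor(heading)}")
```

running it prints `#which-way-is-up`, `#f`, `#section`, `#f-sub-or-cmp`, and `#inc-and-dec-are-a-loops-best-friend`, matching the corresponding `href`s in the toc this commit adds.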
diff --git a/source/blog/yax/avnera/notes.md b/source/blog/yax/avnera/notes.md
index a338a7b..bdc9777 100644
--- a/source/blog/yax/avnera/notes.md
+++ b/source/blog/yax/avnera/notes.md
@@ -1,3 +1,5 @@
+% learning an ISA by force of will
+
cpu instruction sets are one of my special interests. whitequark posted about a weird instruction set. so of course i asked for a copy of the binary. it indulged me! it's called `noes`, who knows why.
<pre class="codebox">
@@ -43,7 +45,38 @@ and so here is where i started:
i also often think about [this lovely writeup](https://www.robertxiao.ca/hacking/dsctf-2019-cpu-adventure-unknown-cpu-reversing/) from Robert Xiao on a similar problem presented as a Dragon CTF teaser challenge a few years ago. working from an unknown data encoding all the way out to an instruction set and high level behavior is certainly _possible_, but it's not an opportunity that comes up often. it sounds fun! so i decided to chew on `noes` with as little context as i could have - the opportunity doesn't come up too often!
-### which way is up?
+making heads or tails of the binary turned out to be quite a few words, which i've roughly broken up as:
+
+<ul>
+<li><a href="#which-way-is-up">which way is up?</a></li>
+<li><a href="#one-instruction-to-many-instructions">one instruction, to many instructions</a></li>
+<li><a href="#a-virtuous-cycle">a virtuous cycle</a></li>
+<li><a href="#control-flow">control flow!!</a></li>
+<li><a href="#loads-and-stores">loads and stores!!</a></li>
+<li><a href="#it-does-in-fact-have-an-alu">it does, in fact, have an ALU</a></li>
+<li><a href="#inc-and-dec-are-a-loops-best-friend">inc and dec are a loop’s best friend</a></li>
+<li><a href="#more-subtle-loads-or-stores">more subtle loads or stores?</a></li>
+<li><a href="#a-multiplier">a multiplier!</a></li>
+<li><a href="#whats-left">what’s left?</a></li>
+<li><a href="#whittling-down-the-last-few-opcodes">whittling down the last few opcodes…</a><ul>
+<li><a href="#f"><code>48..4f</code></a></li>
+<li><a href="#or-a-wild-guess-towards-58..5f"><code>59</code> … or a wild guess towards <code>58..5f</code>?</a></li>
+<li><a href="#where-possible"><code>60..67</code> … where possible</a></li>
+<li><a href="#ba"><code>ba</code></a></li>
+<li><a href="#section"><code>00..07</code></a></li>
+</ul></li>
+<li><a href="#mostly-done-whats-left-in-the-encoding-space">mostly done, what’s left in the encoding space?</a></li>
+<li><a href="#f-sub-or-cmp"><code>78..7f</code> … <code>sub</code> or <code>cmp</code>?</a></li>
+<li><a href="#what-is-a0">what is <code>a0</code>?</a></li>
+<li><a href="#what-are-c0..c7">what are <code>c0..c7</code>?</a></li>
+<li><a href="#but-wait-what-happened-with-jcc">but wait! what happened with <code>jcc</code>?</a></li>
+<li><a href="#last-thoughts">last thoughts</a></li>
+<li><a href="#conclusion">conclusion</a><ul>
+<li><a href="#summarized-materials">summarized materials</a></li>
+</ul></li>
+</ul>
+
+## which way is up?
even just at the bottom of this first window it's clear there's some kind of structure to this thing. but if it's code or data, who knows. i did luck out that the terminal size i happened to open `noes` with showed some structure, otherwise i'd have resorted to the same age-old trick of "resize the window until it looks right".
@@ -116,7 +149,7 @@ and so seeing `41f0`, `40f0`, `51f0`, `50f0`, `43f0`, `42f0`, `53f0`, `52f0`, an
this is a great start: there's some kind of structure, something that looks like a workable guess for how at least one instruction is strucutred, values that look like addresses - or at least relative offsets. even if this is more data than code, there's enough structure here to chew on and learn more about the firmware.
-### one instruction, to many instructions
+## one instruction, to many instructions
if i were stumped at this point i'd have started looking for common byte sequences, working through a list to guess what might be function prologues or epilogues, and go from there. but, being neither stumped nor interested in switching away from the next most advanced tool i have on hand - `xxd -ps noes | vim -` - i stuck with eyeballing common bytes. `8e` stuck out:
<pre class="codebox">
@@ -158,7 +191,7 @@ cc86f2cd87f2e004c882f2b928c855f3e01ac854f3e1fee850f321c850f3e0
</pre>
nothing huge, seems like a workable assumption.
-### a virtuous cycle
+## a virtuous cycle
with a guess of function prologues ane epilogues, i can guess at the instructions around the entry/exit of these "theoriezed functions".
@@ -247,7 +280,8 @@ so 90XX as conditional branch... here's a function i'd guessed at instruction bo
<pre class="codebox">
87 86
cc27efcd28ef28 c8f1b0 e002 c863ef e05a c862ef
-cd65ef 14 cc64ef e201 e400 bf83e9 76 11 77 71 16 19 90 03 bf420b e8f1b0 <span class=blue>; 9003 is one instruction, not two</span>
+cd65ef 14 cc64ef e201 e400 bf83e9
+76 11 77 71 16 19 90 03 bf420b e8f1b0 <span class=blue>; 9003 is one instruction, not two</span>
98fb
8e 8f b9
</pre>
@@ -257,7 +291,8 @@ fixing that up with what i know now it looks more like...
87 86
cc27efcd28ef28 c8f1b0 e002 c863ef e05a c862ef
cd65ef 14 cc64ef
-e201 e400 bf83e9 76 11 77 71 16 19
+e201 e400 bf83e9
+76 11 77 71 16 19
9003 bf420b <span class=blue>; jCC $+3</span>
e8f1b0
98fb <span class=blue>; is this jCC $-5?</span>
@@ -267,7 +302,7 @@ e8f1b0
so maybe 9X is a whole family of conditional branches? plausible...
<pre class="codebox">
e008 c803f3 bf52d8 28 c8e6ed c8eeed
-8eb9e8eded9001 <span class=blue>; hadn't noticed this 9001 at first. conditional branch over a ret?</span>
+8eb9e8eded9001 <span class=blue>; hadn't noticed this 9001 at first. jcc over a ret?</span>
b9 28 c8eeed c8eded
e4f1 e5ed bf82c3 bffedc bf106f
</pre>
@@ -321,7 +356,7 @@ bf420b 8eb9e400bf
this is great; `7x` definitely seems like it generates some kind of branch condition, and `9xXX` seems like a conditional branch based on that result.
-### control flow!!
+## control flow!!
from this point onward, i'll be marking up approximate level of nesting with indentation. for each branch over a byte of code, it will be indented an additional level. when the branch target is reached, unindent. for simple control flow this gives a general idea of how PC moves through a region.
@@ -427,7 +462,7 @@ bfc62e
e101 799803bc38c0e803f3609809e841b6e942b6
</pre>
-## loads/stores!!
+## loads and stores!!
`e8 XXXX` is probably a load! then `c8 XXXX` is a store! might be an absolute 16b address then? does that suggest `e0` is a relative load? maybe some kind of banked load.
@@ -456,7 +491,7 @@ e1bf <span class=blue>; r1&lt;-[...0xbf]</span>
e823f2 <span class=blue>; r0&lt;-[0xf223]</span>
21 <span class=blue>; ???</span>
c823f2 <span class=blue>; [0xf223]&lt;-r0</span>
-</pre>`
+</pre>
this is great: control flow, loads/stores, this is enough to start finding
where registers are read and written, and start figuring out arithmetic or
@@ -667,7 +702,7 @@ e835ef e932ef 19 e933ef 19 e934ef 19
</pre>
where `19` would mean this `or`s all four bytes and checking for.. zero? non-zero?
-## inc/dec is a loop's best friend
+## inc and dec are a loop's best friend
and maybe 4X is dec? shr? `40` agrees, here's a branch table or smth?
<pre class="codebox">
@@ -774,7 +809,7 @@ ah, still unsure about `bf`, but this seems useful:
8e b9
</pre>
-# more subtle load/store?
+## more subtle loads or stores?
distracted by `fe01`. found this:
<pre class="codebox">
@@ -858,7 +893,7 @@ also, that tells us `90` is `jnz`. `98` then is probably `jz`. that's consistent
</pre>
implementing a branch table for `i` in `0..5`?
-# a multiplier!
+## a multiplier!
`69` and `38..3f` make more sense from this loop:
<pre class="codebox">