From 2dc28a68f04c92adecefb05bbe8fa1ebb24d4189 Mon Sep 17 00:00:00 2001
From: iximeow
Date: Sun, 14 Jul 2024 14:11:47 -0700
Subject: normalize headings, include toc

---
 source/blog/yax/avnera/notes.md | 59 ++++++++++++++++++++++++++++++++---------
 1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/source/blog/yax/avnera/notes.md b/source/blog/yax/avnera/notes.md
index a338a7b..bdc9777 100644
--- a/source/blog/yax/avnera/notes.md
+++ b/source/blog/yax/avnera/notes.md
@@ -1,3 +1,5 @@
+% learning an ISA by force of will
+
 cpu instruction sets are one of my special interests. whitequark posted about a weird instruction set. so of course i asked for a copy of the binary. it indulged me! it's called `noes`, who knows why.
@@ -43,7 +45,38 @@ and so here is where i started:
 
 i also often think about [this lovely writeup](https://www.robertxiao.ca/hacking/dsctf-2019-cpu-adventure-unknown-cpu-reversing/) from Robert Xiao on a similar problem presented as a Dragon CTF teaser challenge a few years ago. working from an unknown data encoding all the way out to an instruction set and high level behavior is certainly _possible_, but it's not an opportunity that comes up often. it sounds fun! so i decided to chew on `noes` with as little context as i could have - the opportunity doesn't come up too often!
 
-### which way is up?
+making heads or tails of the binary turned out to be quite a few words, which i've roughly broken up as:
+
+
+
+## which way is up?
 
 even just at the bottom of this first window it's clear there's some kind of structure to this thing. but if it's code or data, who knows. i did luck out that the terminal size i happened to open `noes` with showed some structure, otherwise i'd have resorted to the same age-old trick of "resize the window until it looks right".
 
@@ -116,7 +149,7 @@ and so seeing `41f0`, `40f0`, `51f0`, `50f0`, `43f0`, `42f0`, `53f0`, `52f0`, an
 
 this is a great start: there's some kind of structure, something that looks like a workable guess for how at least one instruction is strucutred, values that look like addresses - or at least relative offsets. even if this is more data than code, there's enough structure here to chew on and learn more about the firmware.
 
-### one instruction, to many instructions
+## one instruction, to many instructions
 
 if i were stumped at this point i'd have started looking for common byte sequences, working through a list to guess what might be function prologues or epilogues, and go from there. but, being neither stumped nor interested in switching away from the next most advanced tool i have on hand - `xxd -ps noes | vim -` - i stuck with eyeballing common bytes. `8e` stuck out:
 
@@ -158,7 +191,7 @@ cc86f2cd87f2e004c882f2b928c855f3e01ac854f3e1fee850f321c850f3e0
 
 nothing huge, seems like a workable assumption.
 
-### a virtuous cycle
+## a virtuous cycle
 
 with a guess of function prologues ane epilogues, i can guess at the instructions around the entry/exit of these "theoriezed functions".
 
@@ -247,7 +280,8 @@ so 90XX as conditional branch... here's a function i'd guessed at instruction bo
 87 86
 cc27efcd28ef28 c8f1b0 e002 c863ef e05a c862ef
-cd65ef 14 cc64ef e201 e400 bf83e9 76 11 77 71 16 19 90 03 bf420b e8f1b0 ; 9003 is one instruction, not two
+cd65ef 14 cc64ef e201 e400 bf83e9
+76 11 77 71 16 19 90 03 bf420b e8f1b0 ; 9003 is one instruction, not two
 98fb
 8e 8f b9
 
@@ -257,7 +291,8 @@ fixing that up with what i know now it looks more like...
 87 86
 cc27efcd28ef28 c8f1b0 e002 c863ef e05a c862ef
 cd65ef 14 cc64ef
-e201 e400 bf83e9 76 11 77 71 16 19
+e201 e400 bf83e9
+76 11 77 71 16 19
 9003 bf420b ; jCC $+3
 e8f1b0
 98fb ; is this jCC $-5?
@@ -267,7 +302,7 @@ e8f1b0
 so maybe 9X is a whole family of conditional branches? plausible...
 e008 c803f3 bf52d8 28 c8e6ed c8eeed
-8eb9e8eded9001                        ; hadn't noticed this 9001 at first. conditional branch over a ret?
+8eb9e8eded9001                        ; hadn't noticed this 9001 at first. jcc over a ret?
 b9 28 c8eeed c8eded
 e4f1 e5ed bf82c3 bffedc bf106f
 
@@ -321,7 +356,7 @@ bf420b 8eb9e400bf
 
 this is great; `7x` definitely seems like it generates some kind of branch condition, and `9xXX` seems like a conditional branch based on that result.
 
-### control flow!!
+## control flow!!
 
 from this point onward, i'll be marking up approximate level of nesting with indentation. for each branch over a byte of code, it will be indented an additional level. when the branch target is reached, unindent. for simple control flow this gives a general idea of how PC moves through a region.
 
@@ -427,7 +462,7 @@ bfc62e e101 799803bc38c0e803f3609809e841b6e942b6
-## loads/stores!!
+## loads and stores!!
 
 `e8 XXXX` is probably a load! then `c8 XXXX` is a store! might be an absolute 16b address then? does that suggest `e0` is a relative load? maybe some kind of banked load.
 
@@ -456,7 +491,7 @@ e1bf ; r1<-[...0xbf]
 e823f2 ; r0<-[0xf223]
 21 ; ???
 c823f2 ; [0xf223]<-r0
-`
+
 this is great: control flow, loads/stores, this is enough to start finding where registers are read and written, and start figuring out arithmetic or
@@ -667,7 +702,7 @@ e835ef e932ef 19 e933ef 19 e934ef 19
 
 where `19` would mean this `or`s all four bytes and checking for.. zero? non-zero?
 
-## inc/dec is a loop's best friend
+## inc and dec are a loop's best friend
 
 and maybe 4X is dec? shr? `40` agrees, here's a branch table or smth?
 
@@ -774,7 +809,7 @@ ah, still unsure about `bf`, but this seems useful:
 8e b9
 
-# more subtle load/store?
+## more subtle loads or stores?
 
 distracted by `fe01`. found this:
 
@@ -858,7 +893,7 @@ also, that tells us `90` is `jnz`. `98` then is probably `jz`. that's consistent
 
 implementing a branch table for `i` in `0..5`?
 
-# a multiplier!
+## a multiplier!
 
 `69` and `38..3f` make more sense from this loop:
 
-- 
cgit v1.1