From de07fbcbe3b31c5b90c5e18191e34ca9aa0afbf9 Mon Sep 17 00:00:00 2001 From: iximeow Date: Sun, 14 Jul 2024 13:14:53 -0700 Subject: avnera notes editing, color adjustment --- content_template.pandoc | 4 +- source/blog/yax/avnera/notes.md | 1298 ++++++++++++++++++++------------------- 2 files changed, 680 insertions(+), 622 deletions(-) diff --git a/content_template.pandoc b/content_template.pandoc index cc0d3f6..ce15605 100644 --- a/content_template.pandoc +++ b/content_template.pandoc @@ -22,9 +22,9 @@ applicable for aha-rendered ANSI text */ .codebox .black { color: #2e3436; } .codebox .red { color: #cc0000; } - .codebox .green { color: #4e9a06; } + .codebox .green { color: #6eaa26; } .codebox .yellow { color: #c4a000; } - .codebox .blue { color: #3465a4; } + .codebox .blue { color: #5485d4; } .codebox .purple { color: #75507b; } .codebox .cyan { color: #06989a; } .codebox .white { color: #d3d7cf; } diff --git a/source/blog/yax/avnera/notes.md b/source/blog/yax/avnera/notes.md index a5797eb..a338a7b 100644 --- a/source/blog/yax/avnera/notes.md +++ b/source/blog/yax/avnera/notes.md @@ -1,16 +1,16 @@ cpu instruction sets are one of my special interests. whitequark posted about a weird instruction set. so of course i asked for a copy of the binary. it indulged me! it's called `noes`, who knows why. -``` +
 > ls -al noes
 -rw-rw-r-- 1 iximeow iximeow 12935 May 23 18:12 noes
-```
+
so, 12.6KiB of some firmware for a headset or something, and an otherwise unknown instruction set. this is catnip, to me. and so here is where i started: -``` +
 00000000   BC 60 BB 68  E4 E3 E5 ED  E2 8F E3 01  28 42 99 03  43 91 05 D4  C4 BC 69 BB  .`.h........(B..C.....i.
 00000018   BC 26 BE E0  04 C8 41 F0  E0 44 C8 40  F0 E0 BB C8  51 F0 E0 94  C8 50 F0 EC  .&....A..D.@....Q....P..
 00000030   E3 ED BF D8  0E BC E0 05  BF D0 0E C8  E3 ED CC 19  B9 E0 04 C8  41 F0 E0 64  ....................A..d
@@ -18,7 +18,7 @@ and so here is where i started:
 00000060   E0 BC C8 42  F0 E0 BB C8  53 F0 E0 D3  C8 52 F0 E4  01 BF 90 75  BC E9 72 E8  ...B....S....R.....u..r.
 00000078   99 B7 90 03  BC E9 72 E2  1C E3 00 E0  72 C8 43 F0  E0 E7 C8 42  F0 E0 BB C8  ......r.....r.C....B....
 00000090   53 F0 E0 B4  C8 52 F0 BC  C0 72 E0 13  C8 20 B5 E0  EC C8 1F B5  BC 59 40 E1  S....R...r... .......Y@.
-000000A8   08 E8 48 B6  79 90 0A 28  C8 4C B6 C8  4B B6 BC 3C  6B E8 48 B6  90 07 C8 4C  ..H.y..(.L..K..
 
 i also often think about [this lovely writeup](https://www.robertxiao.ca/hacking/dsctf-2019-cpu-adventure-unknown-cpu-reversing/) from Robert Xiao on a similar problem presented as a Dragon CTF teaser challenge a few years ago. working from an unknown data encoding all the way out to an instruction set and high level behavior is certainly _possible_, but it's not an opportunity that comes up often. it sounds fun! so i decided to chew on `noes` with as little context as i could have - the opportunity doesn't come up too often!
 
@@ -50,7 +50,7 @@ even just at the bottom of this first window it's clear there's some kind of str
 so there's some structure, the file is kind of tiny, the file is notionally a firmware for a processor, so presumably the processor also is kind of tiny. the bytes here are not obviously an 8080/6502/etc. probably not a tiny ARM core, because the repetition at the end of the above is offset by 1: this processor must be OK with instructions at odd addresses.
 
 scrolling through the file for anything else interesting and this stands out:
-```
+
 00001358   B9 80 81 82  83 84 85 E8  E6 B0 80 E8  E7 B0 80 E8  E8 B0 80 E8  E9 B0 80 E8  ........................
 00001370   EA B0 80 E8  EB B0 80 E8  EC B0 80 E8  ED B0 80 E8  EE B0 80 E8  EF B0 80 E8  ........................
 00001388   F0 B0 80 86  87 E1 03 E8  59 F3 21 98  3D BF 8E 31  BC 2E CF 80  81 82 83 84  ........Y.!.=..1........
@@ -59,20 +59,20 @@ scrolling through the file for anything else interesting and this stands out:
 000013D0   CD DF 8F 8E  88 C8 F0 B0  88 C8 EF B0  88 C8 EE B0  88 C8 ED B0  88 C8 EC B0  ........................
 000013E8   88 C8 EB B0  88 C8 EA B0  88 C8 E9 B0  88 C8 E8 B0  88 C8 E7 B0  88 C8 E6 B0  ........................
 00001400   8D 8C 8B 8A  89 88 BA E8  F3 B4 C8 5C  ED E8 F2 B4  C8 5B ED B9  E0 03 C8 0D  ...........\.....[......
-```
+
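one way to put a number on "very few ascii bytes" is to measure the printable fraction of a window. this is a minimal sketch, not something from the original notes: `printable_fraction` is a hypothetical helper, and "printable" is assumed to mean 0x20 through 0x7e.

```python
def printable_fraction(window: bytes) -> float:
    """Fraction of bytes in `window` that are printable ascii (0x20..0x7e)."""
    if not window:
        return 0.0
    printable = sum(1 for b in window if 0x20 <= b <= 0x7E)
    return printable / len(window)


# first row of the dump above: almost nothing printable
row = bytes.fromhex("b9808182838485e8e6b080e8e7b080e8e8b080e8e9b080e8")
print(printable_fraction(row))
```

running this over a sliding window of the whole file would give a rough map of which regions look unlike the rest.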
this is different, which makes it interesting! this is a long span of bytes with very few ascii bytes, unlike the rest of the file which has a more frequent mix of bytes in `[0, 255]`. the content starts with an increasing series, `80 81 82 83 84 85 E8 E6 B0 80 E8 E7 B0 80 ...`, and towards the end has `8D 8C 8B 8A 89 88`. this might be data? maybe a lookup table? there are other regions of clear structure, like: -``` +
 00002478   6A F1 ED 6B  F1 12 E9 6C  F1 21 72 13  E9 6D F1 21  73 14 E9 6E  F1 21 74 15  j..k...l.!r..m.!s..n.!t.
 00002490   E9 6F F1 21  75 12 E9 25  EE 21 72 13  E9 26 EE 21  73 14 E9 27  EE 21 74 15  .o.!u..%.!r..&.!s..'.!t.
 000024A8   E9 28 EE 21  75 CA 68 F1  CB 69 F1 CC  6A F1 CD 6B  F1 12 1B 1C  1D 98 07 E4  .(.!u.h..i..j..k........
-```
+
but what does `21 72 13 E9` mean? or `21 73 14 E9`? `21 74 15 E9`? maybe four-byte instructions with different operands? ok. time to break out the big tools. -``` +
 # iximeow> xxd -ps noes | head -n 20
 bc60bb68e4e3e5ede28fe30128429903439105d4c4bc69bbbc26bee004c8
 41f0e044c840f0e0bbc851f0e094c850f0ece3edbfd80ebce005bfd00ec8
@@ -94,23 +94,23 @@ e050c844f0e0bbc855f0e0f6c854f0e104121972e06ac847f0e05ec846f0
 e0bcc857f0e003c856f0e108121972e061c849f0e0e1c848f0e0bcc859f0
 e024c858f0e110121972e057c84bf0e089c84af0e0bcc85bf0e027c85af0
 e120121972e032c84df0e0c8c84cf0e0bcc85df0e046c85cf0e140121972
-```
+
more structure to this, highlighting helps.. -``` -308eb9e200e004
c8
41f0e044
c8
40f0e0bb
c8
51f0e094
c8
50f0e101121972 -e072
c8
43f0e0bc
c8
42f0e0bb
c8
53f0e0d3
c8
52f0e102121972e040
c8
45f0 -e050
c8
44f0e0bb
c8
55f0e0f6
c8
54f0e104121972e06a
c8
47f0e05e
c8
46f0 -e0bc
c8
57f0e003
c8
56f0e108121972e061
c8
49f0e0e1
c8
48f0e0bc
c8
59f0 -e024
c8
58f0e110121972e057
c8
4bf0e089
c8
4af0e0bc
c8
5bf0e027
c8
5af0 -e120121972e032
c8
4df0e0
c8
c8
4cf0e0bc
c8
5df0e046
c8
5cf0e140121972 -``` +
+308eb9e200e004c841f0e044c840f0e0bbc851f0e094c850f0e101121972
+e072c843f0e0bcc842f0e0bbc853f0e0d3c852f0e102121972e040c845f0
+e050c844f0e0bbc855f0e0f6c854f0e104121972e06ac847f0e05ec846f0
+e0bcc857f0e003c856f0e108121972e061c849f0e0e1c848f0e0bcc859f0
+e024c858f0e110121972e057c84bf0e089c84af0e0bcc85bf0e027c85af0
+e120121972e032c84df0e0c8c84cf0e0bcc85df0e046c85cf0e140121972
+
if `c8` marks the start of some instruction or sequence, then those sequences are something like: -``` +
 c841f0e044 c840f0e0bb c851f0e094 c850f0e101 ...
 ... c843f0e0bc c842f0e0bb c853f0e0d3 c852f0e1 ...
-```
+
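the same grouping can be pulled out with a few lines of python. this is only a sketch under assumptions: `c8_operands` is a made-up name, the two bytes after each `c8` are assumed to be one little-endian 16-bit value, and the scan is byte-wise rather than instruction-aware, so a `c8` that is really an operand of something else will give a bogus hit.

```python
def c8_operands(data: bytes) -> list[int]:
    """Return the 16-bit little-endian value following each 0xc8 byte.

    Naive byte scan, not a disassembler: it cannot tell an opcode 0xc8
    from a 0xc8 that happens to be an operand.
    """
    found = []
    for i, byte in enumerate(data):
        # only count a hit if two more bytes actually follow it
        if byte == 0xC8 and i + 2 < len(data):
            found.append(int.from_bytes(data[i + 1:i + 3], "little"))
    return found


# first highlighted row from the dump above
line = bytes.fromhex("308eb9e200e004c841f0e044c840f0e0bbc851f0e094c850f0e101121972")
print([hex(v) for v in c8_operands(line)])
```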
and so seeing `41f0`, `40f0`, `51f0`, `50f0`, `43f0`, `42f0`, `53f0`, `52f0`, and others like it, immediately suggests something little-endian is happening. those might be offsets for a memory access? `e044`, `e0bb`, etc could be other immediates or operand selectors. maybe `c8` is an opcode itself? @@ -119,43 +119,43 @@ this is a great start: there's some kind of structure, something that looks like ### one instruction, to many instructions if i were stumped at this point i'd have started looking for common byte sequences, working through a list to guess what might be function prologues or epilogues, and go from there. but, being neither stumped nor interested in switching away from the next most advanced tool i have on hand - `xxd -ps noes | vim -` - i stuck with eyeballing common bytes. `8e` stuck out: -``` -75bffce2fe0671fe0272fe0373f219d2
8e
8fb98786ca56ef147615771674 -1775bffce2ea56eff674bfe444
8e
8fb9878628c857eec856eee412bf14e9 -761177e010de03e20016741775bf5f05
8e
8fb9e
8e
9ed9803bccfe2e85aee -e95bee19982de855eec84befc94defe85aeec84cefe26
8e
3eee461e5eebf +
+75bffce2fe0671fe0272fe0373f219d28e8fb98786ca56ef147615771674
+1775bffce2ea56eff674bfe4448e8fb9878628c857eec856eee412bf14e9
+761177e010de03e20016741775bf5f058e8fb9e8e9ed9803bccfe2e85aee
+e95bee19982de855eec84befc94defe85aeec84cefe268e3eee461e5eebf
 dfe3ea5aeeeb5beefa079026c85beec85aeebcc0e3e854eec84befe859ee
-c84defe85
8e
ec84cefe26
8e
3eee461e5eebfdfe3e001c84befe85deec84d -``` +c84defe858eec84cefe268e3eee461e5eebfdfe3e001c84befe85deec84d +
and in fact the longer common sequences are `8e8fb98786`: -``` -75bffce2fe0671fe0272fe0373f219d2
8e8fb98786
ca56ef147615771674 -1775bffce2ea56eff674bfe444
8e8fb98786
28c857eec856eee412bf14e9 +
+75bffce2fe0671fe0272fe0373f219d28e8fb98786ca56ef147615771674
+1775bffce2ea56eff674bfe4448e8fb9878628c857eec856eee412bf14e9
 761177e010de03e20016741775bf5f058e8fb9e8e9ed9803bccfe2e85aee
 e95bee19982de855eec84befc94defe85aeec84cefe268e3eee461e5eebf
 dfe3ea5aeeeb5beefa079026c85beec85aeebcc0e3e854eec84befe859ee
 c84defe858eec84cefe268e3eee461e5eebfdfe3e001c84befe85deec84d
-```
+
this shows up across the file, but `8786` is only sometimes present. so maybe this is the epilogue of one function, and the prologue of the next? in which case the epilogue would be `8e8fb9` and the prologue is `8786`. then `b9` is `ret`? `8e8f` and `8786` are `pop` and `push` respectively? lets see if that gives us reasonably-sized functions. as some examples: -``` -...
8e8fb9
+
+... 8e8fb9
 
-8786cae5eecce6eee412bf14e9761177e8e6eede03e8e5eede04e20016741775bf5f05
8e8fb9
+8786cae5eecce6eee412bf14e9761177e8e6eede03e8e5eede04e20016741775bf5f058e8fb9 e200e45390d4b9e4f5e5edbf82c3e0d4c85cefe0e8c85bef28c85eefe003 [ 210 bytes ] -bf2bd7
8e8fb9
+bf2bd78e8fb9 e500e406e260e3ed14e1005272110b73f27215527504e118 [ 240 bytes ] -19e0f2c8ecee177116c0c9eeeec8edeee1fff65172e412bf4bd4
8e8fb9
+19e0f2c8ecee177116c0c9eeeec8edeee1fff65172e412bf4bd48e8fb9 cc86f2cd87f2e004c882f2b928c855f3e01ac854f3e1fee850f321c850f3e0 [ 420 bytes ] -75bf8cdaea0aef02ca0aefe19912799181
8e8fb9
-``` +75bf8cdaea0aef02ca0aefe199127991818e8fb9 +
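checking sizes by eye works, but the split is also easy to script. a rough sketch, assuming `8e8fb9` really is an epilogue and that every function ends with one (`chunk_sizes` is a hypothetical helper, not part of any tool used here):

```python
def chunk_sizes(data: bytes, epilogue: bytes = bytes.fromhex("8e8fb9")) -> list[int]:
    """Sizes of the byte runs that end in `epilogue`, epilogue included."""
    pieces = data.split(epilogue)
    # the last piece is whatever trails the final epilogue, so it is not counted
    return [len(piece) + len(epilogue) for piece in pieces[:-1]]


# two tiny made-up "functions", each ending in the guessed epilogue
sample = bytes.fromhex("8786aa8e8fb98786bbcc8e8fb9")
print(chunk_sizes(sample))
```

something like `max(chunk_sizes(open("noes", "rb").read()))` would then show whether any chunk is implausibly large for a single function.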
nothing huge, seems like a workable assumption. ### a virtuous cycle @@ -163,129 +163,129 @@ nothing huge, seems like a workable assumption. with a guess of function prologues and epilogues, i can guess at the instructions around the entry/exit of these "theorized functions". some more looking around, `c8` is pretty common and seems to be followed by two bytes that might be an address? -``` -52e8ec76edbfdee6e876ed9805e401bc52e8b928
c8
06ee
c8
05ee
c8
47eee8 -03f3619016e003
c8
4bee
e001
c8
4aee
e140e8a3f919
c8
a3f9bc3b60e102e8 -4aee799026e003
c8
4beee8eded9805e008
c8
03f3
e008
c8
8cf971e898f919 -
c8
98f9e1bfe8a3f921
c8
a3f9b9e010
c8
03f3
bf9f5ce108e898f919
c8
98f9 -e1bfe8a3f921
c8
a3f9e001
c8
46eeb98786e101e847ee79902c28
c8
47eee4 -05e5eebf82c3e0
c8
c8
52b9e071
c8
51b9e001
c8
54b9e0f4
c8
53b9e201e400 -bf9513c906ee
c8
05eee102e8e6ed799003bf2dc2e2cce342e85bede95ced -``` +
+52e8ec76edbfdee6e876ed9805e401bc52e8b928c806eec805eec847eee8
+03f3619016e003c84beee001c84aeee140e8a3f919c8a3f9bc3b60e102e8
+4aee799026e003c84beee8eded9805e008c803f3e008c88cf971e898f919
+c898f9e1bfe8a3f921c8a3f9b9e010c803f3bf9f5ce108e898f919c898f9
+e1bfe8a3f921c8a3f9e001c846eeb98786e101e847ee79902c28c847eee4
+05e5eebf82c3e0c8c852b9e071c851b9e001c854b9e0f4c853b9e201e400
+bf9513c906eec805eee102e8e6ed799003bf2dc2e2cce342e85bede95ced
+
there are definitely other `c8`s here that don't make sense yet, but `06ee .. 05ee` and `4bee .. 4aee` look like sequential addresses, and `03f3` shows up a few times which suggests these addresses are probably absolute. in-between there are several `e0` followed by a relatively low byte, `e001` between two `c8` sequences, `e008`, `e010`, an `e071` once. the second byte might be an immediate, maybe an offset? the values tend towards bitmasks, for whatever reason. this also happens with `e1` and an `e2` in the same region: -``` +
 52e8ec76edbfdee6e876ed9805e401bc52e8b928c806eec805eec847eee8
 03f3619016e003c84beee001c84aeee140e8a3f919c8a3f9bc3b60e102e8
 4aee799026e003c84beee8eded9805e008c803f3e008c88cf971e898f919
 c898f9e1bfe8a3f921c8a3f9b9e010c803f3bf9f5ce108e898f919c898f9
 e1bfe8a3f921c8a3f9e001c846eeb98786e101e847ee79902c28c847eee4
-05e5eebf82c3e0c8c852b9e071c851b9e001c854b9e0f4c853b9
e201
e400 -bf9513c906eec805eee102e8e6ed799003bf2dc2
e2cc
e342e85bede95ced -``` +05e5eebf82c3e0c8c852b9e071c851b9e001c854b9e0f4c853b9e201e400 +bf9513c906eec805eee102e8e6ed799003bf2dc2e2cce342e85bede95ced +
so maybe `eX` is a whole range of instructions with one-byte immediates? with this, lets see how a hypothesized function breaks apart.. -``` +
 87 86
 14761577f698 19e0f2
 c8ecee 177116c0 c9eeee c8edee e1ff f65172 e412 bf4bd4
 8e 8f b9
-```
+
7X is another one-byte instruction maybe? 1X too? calling `86` "push A" and `87` "push B", similarly with `8e 8f`, that gives us: -``` -87 86 ; push B; push A +
+87 86                       ; push B; push A
 14 76 15 77 f698 19e0f2
 c8ecee 17 71 16 c0 c9eeee c8edee e1ff f651 72 e412 bf4bd4
-8e 8f b9                    ; pop A; pop B; ret
-```
+8e 8f b9                    ; pop A; pop B; ret
+
-checking that other blocks seem to break apart reasonably as "functions", this is how vim started looking. knowing `c8XXXX` is an instruction in turn makes other instructions more clear: -``` +checking that other blocks seem to break apart reasonably as "functions", this is how vim starts to look. knowing `c8XXXX` is an instruction in turn makes other instructions more clear: +
 87 86
-28 c809ef 72 e461 e5ee bf29e3       ; 28 looks like something, 72 looks like something, bf?
-e200 e468 e5ee bf29e3 bf2ae0 bffedc ; bf?
+28 c809ef 72 e461 e5ee bf29e3       ; 28 looks like something, 72 looks like something, bf?
+e200 e468 e5ee bf29e3 bf2ae0 bffedc ; bf?
 28 76 77 e84cee e94dee e201a
-6939384290fac923f5c822f5e101e826f519c826f5c981f1e850f1649806 ; dunno about these
-28c878ed9818e803f3619807e001c809ef900bc6e1081679e107174991bc ; dunno about these
-bf4fdde826f56090fabf33e1e0072f9008e0082e9003bf2dc2e809ef9803 ; dunno about these
+6939384290fac923f5c822f5e101e826f519c826f5c981f1e850f1649806 ; dunno about these
+28c878ed9818e803f3619807e001c809ef900bc6e1081679e107174991bc ; dunno about these
+bf4fdde826f56090fabf33e1e0072f9008e0082e9003bf2dc2e809ef9803 ; dunno about these
 bf2bd7
-8e 8f b9 ; but an epilogue
-```
+8e 8f b9 ; but an epilogue
+
lots that would be too early to guess about, but `28` seems like a functional instruction, as does `72`. `bf` might be a relative load or store? if this is a vaguely normal 8-bit CPU, there ought to be conditional relative branches around somewhere too, which can help point towards instruction boundaries. most relative branches are short, either in the positive or negative direction (for loops), so that's worth keeping in mind. keeping an eye out is the best option, not really sure how to proactively find them. at the very least, it's probably not `e0..e7` as the conditional branches, because the following byte is sometimes `ff` (branch `$-1`??) or `00` (branch `$`???) continuing on, picking function boundaries somewhat arbitrarily on "seen `b9`", this is illustrative: -``` +
 ... snip ...
 b9
 
 ; new function?
-e500 e406 e260 e3ed 14 e100 52 72   ; 14, 52, also 72 instructions?
-110b 73f2 7215 52 7504 e118         ; not sure if this makes sense
+e500 e406 e260 e3ed 14 e100 52 72   ; 14, 52, also 72 instructions?
+110b 73f2 7215 52 7504 e118         ; not sure if this makes sense
 14 79 91e8 15
-b9                                  ; ret
+b9                                  ; ret
 
 ; new function?
 28 c83bb5 e0fc c83cb5 28 c83db5 c83eb5 c842b5 e017 c841b5
 e0ed c840b5 e061 c83fb5 bfded5 c865ed bf314d 71 9003 e006 b9
 
-28 b9                               ; something, ret
+28 b9                               ; something, ret
 
 ; new prologue
 86 ...more...
-```
+
`28 b9` seems too short to be a function (why call to `28`? if you want `28` just inline it), so that's noteworthy. but 9003 is 3 bytes before it. 9003 as a `jz $+3`? and `28 b9` is an alternate ret? that skips over `e006; ret`? seems workable. so 90XX as conditional branch... here's a function i'd guessed at instruction boundaries for early on, and i'd gotten wrong: -``` +
 87 86
 cc27efcd28ef28 c8f1b0 e002 c863ef e05a c862ef
-cd65ef 14 cc64ef e201 e400 bf83e9 76 11 77 71 16 19 90 03 bf420b e8f1b0 ; 9003 is one instruction, not two
+cd65ef 14 cc64ef e201 e400 bf83e9 76 11 77 71 16 19 90 03 bf420b e8f1b0 ; 9003 is one instruction, not two
 98fb
 8e 8f b9
-```
+
-so fixing that up... -``` +fixing that up with what i know now it looks more like... +
 87 86
 cc27efcd28ef28 c8f1b0 e002 c863ef e05a c862ef
 cd65ef 14 cc64ef
 e201 e400 bf83e9 76 11 77 71 16 19
-9003 bf420b                                     ; jCC $+3
+9003 bf420b                                     ; jCC $+3
 e8f1b0
-98fb                                            ; is this jCC $-5?
+98fb                                            ; is this jCC $-5?
 8e 8f b9
-```
+
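the `$+3` / `$-5` readings fall out of treating the second byte as a signed 8-bit displacement. a small sketch of that arithmetic, with the assumption (consistent with the examples here, but still a guess) that the displacement is relative to the end of the two-byte branch:

```python
def rel8(operand: int) -> int:
    """Interpret one byte as a signed 8-bit displacement."""
    return operand - 0x100 if operand & 0x80 else operand


def branch_target(addr: int, operand: int) -> int:
    """Target of a hypothesized two-byte `9X YY` branch located at `addr`.

    Assumes the displacement is counted from the byte after the branch.
    """
    return addr + 2 + rel8(operand)


print(rel8(0x03), rel8(0xFB))
```

with that base, `98fb` placed right after the three-byte `e8f1b0` lands back on the start of `e8f1b0`, which would make it a polling loop.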
so maybe 9X is a whole family of conditional branches? plausible... -``` +
 e008 c803f3 bf52d8 28 c8e6ed c8eeed
-8eb9e8eded9001                        ; hadn't noticed this 9001 at first. conditional branch over a ret?
+8eb9e8eded9001                        ; hadn't noticed this 9001 at first. conditional branch over a ret?
 b9 28 c8eeed c8eded
 e4f1 e5ed bf82c3 bffedc bf106f
-```
+
adjusting that a bit: -``` +
 e008 c803f3 bf52d8 28 c8e6ed c8eeed
 8eb9e8eded
 9001 b9
 28 c8eeed c8eded
 e4f1 e5ed bf82c3 bffedc bf106f
-```
+
finding other interesting patterns around 9Xs, this function: -``` -87 86 15 71 14 e304 53 9101 0176 ; 91XX as jCC? +
+87 86 15 71 14 e304 53 9101 0176              ; 91XX as jCC?
 1177 e101 fe01 79 e104 f679 9034 e102 fe01
-79 9006 bf3cd5bccdc7e101fe01                  ; 79 is a test or cmp or sub maybe?
+79 9006 bf3cd5bccdc7e101fe01                  ; 79 is a test or cmp or sub maybe?
 79 9803 bccdc7e1f7e854f121c854f1e400bffa48e1fde856f121c856f1bf
 17d5bccdc7e110f6799034e850f164986c e101 fe01
 79 900f e401 bf67c5
@@ -294,19 +294,17 @@ e200 e423 bf2b31 e850f1 62 983d e400 983b e102 f679 90 38 e850f1 64 9832 fe01 79
 e401 bf67c5 e852f1 60 9819 e400 9812 e101 fe01 79 900e e1f7 e854f1 21 c854f1
 e401 bf67c5
 8e8fb9
-```
+
following offsets for the proposed jCC in the third and fourth lines yields: -``` -79 9006 bf3cd5bccdc7 e101 fe01 ; so fe01 is something (`feXX`?) -79 9803 bccdc7 e1f7 e854f1 21 c854f1 e400 ; bcXXXX is something +
+79 9006 bf3cd5bccdc7 e101 fe01                ; so fe01 is something (`feXX`?)
+79 9803 bccdc7 e1f7 e854f1 21 c854f1 e400     ; bcXXXX (or more?) is something
 bffa48e1fde856f121c856f1bf
-```
+
-incidentally in the literal next function that knowledge of bf breaks things up into another pattern, -``` - ; 7X definitely generates some branch condition. - ; nested conditions ahead, indentation roughly for control flow +incidentally in the literal next function that knowledge of `bf` breaks things up into another pattern, +
 86
 e406 bf551b 76 980b e406 bf551b 76 9803 bf420b e406
 bf8f4d 71 9003 bf420b e0ee c840b5 e0f3 c83fb5 28 c842b5 e016 c841b5
@@ -319,14 +317,20 @@ e806ef 79 900c
 e807ef 7a 9006
 e808ef 7b 9803
 bf420b 8eb9e400bf
-```
+
-same with more structure... -``` +this is great; `7x` definitely seems like it generates some kind of branch condition, and `9xXX` seems like a conditional branch based on that result. + +### control flow!! + +from this point onward, i'll be marking up approximate level of nesting with indentation. for each branch over a byte of code, it will be indented an additional level. when the branch target is reached, unindent. for simple control flow this gives a general idea of how PC moves through a region. + +revisiting the above with this additional structure is immediately informative! +
 86
-e406 bf551b   ; this and the 76 after are the same as the one two lines down
+e406 bf551b   ; this and the 76 after are the same as the one two lines down
 76 980b
-  e406 bf551b ; doing something to r4? getting a condition out? does bf551b reference memory?
+  e406 bf551b ; doing something to r4? getting a condition out? does bf551b reference memory?
   76 9803
     bf420b
 e406 bf8f4d
@@ -355,52 +359,52 @@ e80db1 e90eb1 ea0fb1 eb10b1 ec05ef
       e808ef
       7b 9803
         bf420b
-; note 8e b9 here, some kind of early ret?
-; missed that at first!
-8eb9e400bf
+; note 8e b9 here, some kind of early ret?
+; missed that at first!
+8eb9 e400bf
 52e8ec76edbfdee6e876ed9805e401bc52e8b928c806eec805eec847eee8
 03f3619016e003c84beee001c84aeee140e8a3f919c8a3f9bc3b60e102e8
 4aee799026e003c84beee8eded9805e008c803f3e008c88cf971e898f919
 c898f9e1bfe8a3f921c8a3f9b9e010c803f3bf9f5ce108e898f919c898f9
 e1bfe8a3f921c8a3f9e001c846ee
 b9
-```
+
reconsidering other lines, there's this from early on which is not obviously wrong but now clearly has an error: -``` +
 79 9009 fe07 72 fe08 73 e015 d216 74 17 75 bfead9 bc3bdb 16 74 17 75 bfe605
       ^                        ^
       $+9 is an instruction    is $+9, the split was wrong
-```
+
so this should be -``` +
 79 9009 fe07 72 fe08 73 e015 d2 16 74 17 75 bfead9 bc3bdb 16 74 17 75 bfe605
       ^                         ^
       $+9 is an instruction     is $+9d
-```
+
`16` is an instruction on its own, and so is `d2`. back to looking for interesting structures, and here's part of a larger function: -``` +
 e108
 e8d5ee
 79 9113
   e1f8 51 72 e8d6ee e100
-  9802          ; jump forward to 42..
+  9802          ; jump forward to 42..
     50 31 42
-  99fb          ; jump backwards to 50..
+  99fb          ; jump backwards to 50..
   bc45eb
 e9d5ee e008 59 49 71 e8d6ee bc40eb 69 38 41 99fb e100 76 11 77 e9d7ee
 16 79 e9d8ee 17 49
 9959
-```
+
so there's a short loop, the loop's body is `50 31 42`, and some condition means the loop is entered skipping `50 31`. different topic for a moment, there are lots of e8XXXX/c8XXXX. what's going on with that? something to orient with... -``` +
 87 86
 28
 c8f4b0
@@ -413,93 +417,101 @@ e0af c8c1b4
 e434 bf8f4d
 71 9003
   bc87c0
-e83bb5 c807ee ; the immediates here are interesting actually
-e83cb5 c808ee ; incrementing by 1
-e83db5 c809ee ; on the first and second instruction
-e83eb5 c80aee ; 0xb53e, 0xee0a ?
+e83bb5 c807ee ; the immediates here are interesting actually
+e83cb5 c808ee ; incrementing by 1
+e83db5 c809ee ; on the first and second instruction
+e83eb5 c80aee ; 0xb53e, 0xee0a ?
 bf2fd6
 bfdcdb
 bfc62e
 e101 799803bc38c0e803f3609809e841b6e942b6
-```
+
+ +## loads/stores!! `e8 XXXX` is probably a load! then `c8 XXXX` is a store! might be an absolute 16b address then? does that suggest `e0` is a relative load? maybe some kind of banked load. seems like `c9` is also a store, probably all c8-cf and e8-ef are store/load? -``` +
 bf8ac0 e1fe e850f3 21 c850f3
 b9 28 c8feed c8fded 72 e449 bcc2d4
-e829b4 c863ef                       ; another 32b copy
+e829b4 c863ef                       ; another 32b copy
 e828b4 c862ef
 e825b4 c865ef
 e824b4 c864ef
 e200 e400
 bf83e9
-c9f2ed c8f1ed e8f1ed e9f2ed         ; [edf2]->r1; [edf1]->r0; r0->[edf1]; r1->[edf2]? this is wrong
+c9f2ed c8f1ed e8f1ed e9f2ed         ; [edf2]->r1; [edf1]->r0; r0->[edf1]; r1->[edf2]? this is wrong
 bc8ad9
 e40e
 bf4204
-c929b4 c828b4                       ; again, storing and then loading later?
+c929b4 c828b4                       ; again, storing and then loading later?
 e00d
-ea28b4 eb29b4                       ; but ea/eb would be r4, r5 maybe
-```
+ea28b4 eb29b4                       ; but ea/eb would be r4, r5 maybe
+
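the load/store theory is regular enough to write down as a toy decoder. this is a sketch of the hypothesis only: it assumes `e8..ef` load and `c8..cf` store, that the low three opcode bits pick the register, and that the operand is a little-endian absolute address. `decode_mem_op` is an invented name.

```python
def decode_mem_op(insn: bytes) -> str:
    """Render a hypothesized 3-byte absolute load/store in the notation used here."""
    if len(insn) != 3:
        raise ValueError("expected a 3-byte instruction")
    opcode = insn[0]
    reg = opcode & 0x07                         # assumed: low 3 bits select rN
    addr = int.from_bytes(insn[1:3], "little")  # assumed: 16-bit little-endian address
    if opcode & 0xF8 == 0xE8:
        return f"r{reg}<-[0x{addr:04x}]"
    if opcode & 0xF8 == 0xC8:
        return f"[0x{addr:04x}]<-r{reg}"
    raise ValueError(f"opcode {opcode:#04x} is not in c8..cf or e8..ef")


for raw in ("e83bb5", "c807ee", "c9f2ed"):
    print(raw, decode_mem_op(bytes.fromhex(raw)))
```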
elsewhere is another interesting sequence, annotating by the theory so far, -``` -e1bf ; r1<-[...0xbf] -e823f2 ; r0<-[0xf223] -21 ; ??? -c823f2 ; [0xf223]<-r0 -```` +
+e1bf    ; r1<-[...0xbf]
+e823f2  ; r0<-[0xf223]
+21      ; ???
+c823f2  ; [0xf223]<-r0
+
` + +this is great: control flow, loads/stores, this is enough to start finding +where registers are read and written, and start figuring out arithmetic or +other operations. + +## it does, in fact, have an ALU so `21` is maybe, `op r0, r1`? `21` can't encode two registers (would be `001y yzzz`? not enough space to say `r4, r5` here). so might be an implicit r0. `28` is a different `op2 r0, r0`? consider -``` -e803f3 ; r0<-[0xf303] -60 9009 ; also 60: generates a status from r0? - 28 ; definitely an instruction - c83dee ; [0xee3d]<-r0 - e001 ; r0<-[..0x01] - c85aed ; [0xed5a]<-r0 +
+e803f3    ; r0<-[0xf303]
+60 9009   ; also 60: generates a status from r0?
+  28      ; definitely an instruction
+  c83dee  ; [0xee3d]<-r0
+  e001    ; r0<-[..0x01]
+  c85aed  ; [0xed5a]<-r0
 
   bfd547
   71
 98fa
-```
+
`28` might be `xor r0, r0`, it often precedes a `c8` store: -``` +
 87 86
-28 c8f4b0         ; xor r0, r0 (?); [0xb0f4]<-r0
+28 c8f4b0         ; xor r0, r0 (?); [0xb0f4]<-r0
 bf03bd
 e013 c820b5
 ... ...
-b9                ; ret
-28                ; first instruction in the block? function?
-c8feed c8fded     ; [0xedfe]<-r0; [0xedfd]<-r0
-72 e449 bcc2d4    ; op r0, r2; r4<-[..0x49]; ??
+b9                ; ret
+28                ; first instruction in the block? function?
+c8feed c8fded     ; [0xedfe]<-r0; [0xedfd]<-r0
+72 e449 bcc2d4    ; op r0, r2; r4<-[..0x49]; ??
 e829b4 c863ef
 e828b4 c862ef
 e825b4 c865ef
 e824b4 c864ef
-```
+
`78` is not present as an instruction it seems, `79` is? -``` -bfc62e ; unknown -e101 ; r1<-[..0x01] -79 9803 ; op r0, r1?; jCC $+3 - bc38c0 ; unknown -e803f3 ; r0<-[0xf303] -``` +
+bfc62e    ; unknown
+e101      ; r1<-[..0x01]
+79 9803   ; op r0, r1?; jCC $+3
+  bc38c0  ; unknown
+e803f3    ; r0<-[0xf303]
+
`7a` is a single-byte instruction, as is `74` and `b4`: -``` +
 29 4c 912d
-  ea5bed eb5ced e048 e120 ; r2<-[0xed5b]; r3<-[0xed5c]; r0<-[..0x48]; r1<-[..0x20]
-  7a e080 2b              ; 28 seems like xor r0, r0, so 2b is xor r0, r3?
-  74 e080 29              ; 29 as xor r0, r1?
+  ea5bed eb5ced e048 e120 ; r2<-[0xed5b]; r3<-[0xed5c]; r0<-[..0x48]; r1<-[..0x20]
+  7a e080 2b              ; 28 seems like xor r0, r0, so 2b is xor r0, r3?
+  74 e080 29              ; 29 as xor r0, r1?
   4c 9118
     e850f1
     64 9012
@@ -511,54 +523,54 @@ bfc62e e101
   e10f e8b0b4
   79 9009
     e8b1 b4 62 9803 61 900a
-```
+
fishing around to find more about the `1X` and `2X` opcodes, this region is interesting: -``` +
 62 9811
-  e108 e854f1             ; r1<-[..0x08]; r0<-[0xf154]
-  19                      ; op r0, r1? ;
-  c854f1                  ; [0xf154]<-r0; maybe 0001_1xxx is add?
-  e852f1                  ; r0<-[0xf152]
-  60                      ; something on r0 producing a condition..
+  e108 e854f1             ; r1<-[..0x08]; r0<-[0xf154]
+  19                      ; op r0, r1? ;
+  c854f1                  ; [0xf154]<-r0; maybe 0001_1xxx is add?
+  e852f1                  ; r0<-[0xf152]
+  60                      ; something on r0 producing a condition..
   9802
-    e600                  ; 1110_0xxx yyyyyyyy may actually be "load imm8 into rX"
-16                        ;
+    e600                  ; 1110_0xxx yyyyyyyy may actually be "load imm8 into rX"
+16                        ;
 74 bf67c5
-                          ; the sequence here is eventful
-e18f                      ; r1<-0x8f
-e825f2                    ; r0<-[0xf225]
-21                        ; op r0, r1
-e170                      ; r1<-0x70
-19                        ; op r0, r1
-c825f2                    ; [0xf225]<-r0
+                          ; the sequence here is eventful
+e18f                      ; r1<-0x8f
+e825f2                    ; r0<-[0xf225]
+21                        ; op r0, r1
+e170                      ; r1<-0x70
+19                        ; op r0, r1
+c825f2                    ; [0xf225]<-r0
 e008 c803f3 bf52d8
 28 c8e6ed c8eeed
-```
+
some evidence that `21` may be `and` specifically: -``` +
+e1bf      ; r1<-0xbf
+e823f2    ; r0<-[0xf223]
+21        ; op r0, r1     ; if op were add, presumably there is a sub, why not sub 0x40?
+c823f2    ; [0xf223]<-r0  ; and masks bits, makes somewhat more sense...
+
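the mask reading is easy to sanity-check: `0xbf` is every bit except `0x40`, so an `and` with it clears exactly one bit. the sketch below also models the earlier `e18f 21 e170 19` sequence as "keep some bits, then set others", which only holds if `19` is an `or`; that part is an assumption, and both helper names are made up.

```python
def clear_bits(value: int, mask: int) -> int:
    """8-bit `and`: keep only the bits that are set in `mask`."""
    return value & mask & 0xFF


def replace_field(value: int, keep_mask: int, new_bits: int) -> int:
    """`and` then `or`: keep `keep_mask` bits of `value`, then set `new_bits`.

    Assumes opcode 19 is `or`, which is still only a guess at this point.
    """
    return ((value & keep_mask) | new_bits) & 0xFF


print(hex(clear_bits(0xFF, 0xBF)))           # only bit 0x40 is dropped
print(hex(replace_field(0x25, 0x8F, 0x70)))  # bits 4..6 overwritten
```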
`78`..`7f` looks like `cmp/test/sub r0, rN`: -``` +
 e812b1 e913b1 ea14b1 eb15b1
 ecfcee
-7c 9012           ; is [0xeefc] == [0xb112]?
+7c 9012           ; is [0xeefc] == [0xb112]?
   e8fdee
-  79 900c         ; is [0xeefd] == [0xb113]?
+  79 900c         ; is [0xeefd] == [0xb113]?
     e8feee
-    7a 9006       ; is [0xeefe] == [0xb114]?
+    7a 9006       ; is [0xeefe] == [0xb114]?
       e8ffee
-      7b 9803     ; is [0xeeff] == [0xb115]?
+      7b 9803     ; is [0xeeff] == [0xb115]?
         bf420b
 e80db1 e90eb1 ea0fb1 eb10b1
 ec05ef
-7c 9012           ; is the same for [0xb10d..0xb110] == [0xef05..0xef08]
+7c 9012           ; is the same for [0xb10d..0xb110] == [0xef05..0xef08]
   e806ef
   79 900c
     e807ef
@@ -567,76 +579,76 @@ ec05ef
       7b 9803
         bf420b
 8e b9
-```
+
notably `78` does not seem to appear as an instruction. preference for `xor r0, r0 (0x28)`? or not sub? this may help make sense of operand ordering as well, -``` -87 86 ; push r7; push r6 -14 76 15 77 f698 ; mov r4, r0; sub r0, r6, r6; mov r5, r0; sub r0, r7, r7 -19 e0f2 c8ecee ; mov r0->r1; r0<-0xf2; r0->[0xeeec] -17 ; this is why it's likely that the selected register is a destination, - ; 17 would be mov r0, r7. if 77 modified r0, r7 would be unmodified, and 77 would be dead code - ; instead if 77 modifies r7, this moves `r7_in - r5` into r0 -71 ; then this subtracts from r1, preservation of r0 after f698 (or 15) - ; otherwise 19 would be pointless -16 ; r6->r0 -c0 ; ??? -c9eeee ; why it would modify from r1, `[0xeeee]<-r1` -c8edee ; and `[0xeeed]<-r0` +
+87 86             ; push r7; push r6
+14 76 15 77 f698  ; mov r4, r0; sub r0, r6, r6; mov r5, r0; sub r0, r7, r7
+19 e0f2 c8ecee    ; mov r0->r1; r0<-0xf2; r0->[0xeeec]
+17                ; this is why it's likely that the selected register is a destination, 17 would
+                  ; be mov r0, r7. if 77 modified r0, r7 would be unmodified, and 77 would be dead
+                  ; code. instead if 77 modifies r7, this moves `r7_in - r5` into r0
+71                ; then this subtracts from r1, preservation of r0 after f698 (or 15)
+                  ; otherwise 19 would be pointless
+16                ; r6->r0
+c0                ; ???
+c9eeee            ; why it would modify from r1, `[0xeeee]<-r1`
+c8edee            ; and `[0xeeed]<-r0`
 e1ff f651
 72 e412 bf4bd4
 8e 8f b9
-```
+
but does any of this mean: -``` -c83bef ; [0xef3b]<-r0 -c93cef ; [0xef3c]<-r1 -ca3def ; [0xef3d]<-r2 -cb3eef ; [0xef3e]<-r3 -e825ee ; r0<-[0xee25] -e93bef ; r1<-[0xef3b] -19 c825ee ; op r0, r1; [0xee25]<-r0 ; 18..1f is likely not add, sub, could be adc/sbc, maybe `or` -e826ee ; r0<-[0xee26] -e93cef ; r0<-[0xef3c] -19 c826ee ; op r0, r1; [0xee26]<-r0 ; if 19 is `or`, this is computing or of two 32 regions -e827ee ; r0<-[0xee27] -e93def ; r1<-[0xef3d] -19 c827ee ; op r0, r1; [0xee27]<-r0 -e828ee ; r0<-[0xee28] -e93eef ; r1<-[0xef3f] -19 c828ee ; op r0, r1; [0xee28]<-r0 -ea3bef ; r2<-[0xef3b] ; then .. something? -eb3cef ; r3<-[0xef3c] -ec3def ; r4<-[0xef3d] -ed3eef ; r5<-[0xef3e] -80 e837ef ; op; r0<-[0xef37] -2280 ; op r0, r2; op -e838ef ; r0<-[0xef38] -2373 ; op r0, r3; op r0, r3 -e839ef ; r0<-[0xef39] -2474 ; op r0, r4; op r0, r4 -e83aef ; r0<-[0xef3a] -2575 ; op r0, r5; op r0, r5 -``` +
+c83bef      ; [0xef3b]<-r0
+c93cef      ; [0xef3c]<-r1
+ca3def      ; [0xef3d]<-r2
+cb3eef      ; [0xef3e]<-r3
+e825ee      ; r0<-[0xee25]
+e93bef      ; r1<-[0xef3b]
+19 c825ee   ; op r0, r1; [0xee25]<-r0 ; 18..1f is likely not add, sub, could be adc/sbc, maybe `or`
+e826ee      ; r0<-[0xee26]
+e93cef      ; r1<-[0xef3c]
+19 c826ee   ; op r0, r1; [0xee26]<-r0 ; if 19 is `or`, this is computing `or` of two 32b regions
+e827ee      ; r0<-[0xee27]
+e93def      ; r1<-[0xef3d]
+19 c827ee   ; op r0, r1; [0xee27]<-r0
+e828ee      ; r0<-[0xee28]
+e93eef      ; r1<-[0xef3e]
+19 c828ee   ; op r0, r1; [0xee28]<-r0
+ea3bef      ; r2<-[0xef3b]            ; then .. something?
+eb3cef      ; r3<-[0xef3c]
+ec3def      ; r4<-[0xef3d]
+ed3eef      ; r5<-[0xef3e]
+80 e837ef   ; op; r0<-[0xef37]
+2280        ; op r0, r2; op
+e838ef      ; r0<-[0xef38]
+2373        ; op r0, r3; op r0, r3
+e839ef      ; r0<-[0xef39]
+2474        ; op r0, r4; op r0, r4
+e83aef      ; r0<-[0xef3a]
+2575        ; op r0, r5; op r0, r5
+
or this: -``` -e876b4 ; r0<-[0xb476] -e977b4 ; r1<-[0xb476] -c8e6b0 ; [0xb0e6]<-r0 -c9e7b0 ; [0xb0e7]<-r1 -28 c8e8b0 ; [0xb0e8]<-0 -c8e9b0 ; [0xb0e9]<-0 +
+e876b4      ; r0<-[0xb476]
+e977b4      ; r1<-[0xb477]
+c8e6b0      ; [0xb0e6]<-r0
+c9e7b0      ; [0xb0e7]<-r1
+28 c8e8b0   ; [0xb0e8]<-0
+c8e9b0      ; [0xb0e9]<-0
 72 73
-e9d8ee      ; r1<-[0xeed8]
-e8d7ee      ; r0<-[0xeed7]
-bff5ec      ; ?
-ecd5ee      ; r4<-[0xeed5]
+e9d8ee      ; r1<-[0xeed8]
+e8d7ee      ; r0<-[0xeed7]
+bff5ec      ; ?
+ecd5ee      ; r4<-[0xeed5]
 bc63ea
-            ; so this loop is... do { X r1, X r3, X r2, X r1, X r0, X r4 } while cond(r4)?
+            ; so this loop is... do { X r1, X r3, X r2, X r1, X r0, X r4 } while cond(r4)?
   69 3b 3a 39 38 44
 99f8
 ecd3ee 5474 e8d4ee 097114
@@ -646,17 +658,19 @@ e8daee
 c877b4
 e8d9ee
 c876b4
-```
+
`18..1f` seem like `r0 |= rX`: -``` +
 e835ef e932ef 19 e933ef 19 e934ef 19
 9019
-```
+
where `19` would mean this `or`s all four bytes and checking for.. zero? non-zero? +## inc/dec is a loop's best friend + and maybe 4X is dec? shr? `40` agrees, here's a branch table or smth? -``` +
 9814
   11 19
   9810
@@ -667,130 +681,132 @@ and maybe 4X is dec? shr? `40` agrees, here's a branch table or smth?
     40 98df
     40
 bf57d0 bfbfcf
-```
+
seems like `40` is `dec r0`, consider this loop: -``` +
   e103
   e87fb4
-  21              ; op r1
-  74              ; op r4
-  e500            ; r5<-0x00
-  e00b            ; r0<-0x0b
+  21              ; op r1
+  74              ; op r4
+  e500            ; r5<-0x00
+  e00b            ; r0<-0x0b
 loop:
-    69 34 35 40   ; op r1?; op? r4; op? r5; dec r1
-  90 fa           ; jnz loop
-```
+    69 34 35 40   ; op r1?; op? r4; op? r5; dec r0
+  90 fa           ; jnz loop
+
so 0100_0XXX seems like `dec rN`. 0011_0XXX may be `inc rN`? and what is 0x69. some more about the low 7Xs: -``` +
 86
-14 76 e8e2ed    ; r4->r0? ; ...??? r6; r0<-[0xede2] .. maybe 76 is xchg r0, r6?
-7e 9817         ; 7e is maybe "compare r0 and r6"; jz?
+14 76 e8e2ed    ; r4->r0? ; ...??? r6; r0<-[0xede2] .. maybe 76 is xchg r0, r6?
+7e 9817         ; 7e is maybe "compare r0 and r6"; jz?
   e003 c858ef e0ff c859ef ce5aefe458e5efbfded6cee2ed
 8e b9
-```
+
since other ops seem oriented around operations on r0 and modifying r0, the low `7x`'s might be moving from `r0` to a different register? in contrast to low `1x` which move into r0. for example in the partially-disassembled snippet, -``` -r1 <- 0x04 -r0 <- r2 +
+r1 <- 0x04
+r0 <- r2
 r0 |= r1
 op7x.lo r0, r2
-r0 <- 0x32
-[0xf029] <- r0
-```
+r0 <- 0x32
+[0xf029] <- r0
+
it's loaded `r0`, modified it, and would clobber it after the unknown op. `op7x.lo` must at least read `r0` and write `r2` or other state. there aren't any other instructions to read flags or anything before the next `op7x.lo`, -``` -r1 <- 0x10 -r0 <- r2 +
+r1 <- 0x10
+r0 <- r2
 r0 |= r1
 op7x.lo r0, r2
-```
+
so it could be an add/sub to store back into r2, but the `or` wouldn't make sense. if the `r0` is the only register that can be modified by arithmetic instructions - instructions seem small so there's not much encoding space - then modifying a value would look like "copy to r0, modify, copy back". meanwhile `78..7f` is probably a `cmp` (rather than `sub`): in a sequence like -``` - r1 <- 0x03 - r0 <- [0xb475] - op7xhi r0, r1 ; byte 0x79 - jcc.lo.0 $+0x10 ; bytes 9010, destination `dest` - r1 <- 0x04 - r0 <- [0xeed2] +
+  r1 <- 0x03
+  r0 <- [0xb475]
+  op7xhi r0, r1   ; byte 0x79
+  jcc.lo.0 $+0x10 ; bytes 9010, destination `dest`
+  r1 <- 0x04
+  r0 <- [0xeed2]
   op7xhi r0, r1
   jcc.hi.0 $+0x08
-  r0 <- 0x03
-  [0xeed2] <- r0
+  r0 <- 0x03
+  [0xeed2] <- r0
   op.bc ec2c
 dest:
-  r1 <- 0x03
-```
+  r1 <- 0x03
+
so if `op7xhi r0, r1` modified the destination, that modification is clobbered. it generates flags (consumed by `jcc.lo.0`). `79` is a very common prefix to `90xx` or `98xx` branches, but uncommon to stand alone. counterpoint though, sequences like -``` -e100 ; r1 <- 0x00 -bfecac ; unknown -c9dfee c8deee ; [0xeedf] <- r1; [0xeede] <- r0 -e8eded 902f ; r0 <- [0xeded]; jcc $+0x2f? -``` +
+e100            ; r1 <- 0x00
+bfecac          ; unknown
+c9dfee c8deee   ; [0xeedf] <- r1; [0xeede] <- r0
+e8eded 902f     ; r0 <- [0xeded]; jcc $+0x2f?
+
have a useful branching condition with only loads (barring `bfecac` generating a status). and even if `bfecac` did generate a status, the next code if this is taken would be -``` -e8e8ed 9841 ; r0 <- [0xede8]; jcc $+0x41 -``` +
+e8e8ed 9841     ; r0 <- [0xede8]; jcc $+0x41
+
so either the `e8` load is enough to generate a status or the `98` branch is fully determined from the earlier `bfecac`. it's possible; the branches could be a pair like `jnz` and `ja`, where there is a third reasonable condition (`jb`) that becomes the implicit third outcome. but in that case why `e8e8ed` before the branch? so perhaps the `90`/`98` conditions are predicated fully on the contents of `r0`? ah, still unsure about `bf`, but this seems useful: -``` +
 86
 14 76 e8e2ed
 7e 9817
-  e003 c858ef   ; [0xef58] <- 0
+  e003 c858ef   ; [0xef58] <- 0x03
   e0ff
-  c859ef ce5aef ; [0xef59] <- 0; [0xef5a] <- 0xff
-  e458          ; r4 <- 0x58
-  e5ef          ; r5 <- 0xef ; so r5 and r4 together hold `ef58`, just assigned
-  bfded6        ; consumes r4, r5, writes r6?
-  cee2ed        ; [0xed2e] <- r6
+  c859ef ce5aef ; [0xef59] <- 0xff; [0xef5a] <- r6
+  e458          ; r4 <- 0x58
+  e5ef          ; r5 <- 0xef ; so r5 and r4 together hold `ef58`, just assigned
+  bfded6        ; consumes r4, r5, writes r6?
+  cee2ed        ; [0xede2] <- r6
 8e b9
-```
+
+ +# more subtle load/store? distracted by `fe01`. found this: -``` -87 86 ; push r7; push r6 -28 c83df3 ; xor r0, r0; [0xf33d] <- r0 -e00a c83cf3 ; r0 <- 0x0a; [0xf33c] <- r0 -28 c8c2ee ; xor r0, r0; [0xeec2] <- r0 -e6ca e7ee ; r6<-ca; r7<-ee ; r7:r6 = 0xeeca -fe02 c837f3 ; fe02 ; [0xf337] <- r0 -fe01 c836f3 ; fe01 ; [0xf336] <- r0 -fe03 c838f3 ; fe03 ; [0xf338] <- r0 -fe04 c839f3 ; fe04 ; [0xf339] <- r0 -fe05 c83af3 ; fe05 ; [0xf33a] <- r0 -fe06 c83bf3 ; fe06 ; [0xf33b] <- r0 -f6 9827 ; -``` +
+87 86       ; push r7; push r6
+28 c83df3   ; xor r0, r0; [0xf33d] <- r0
+e00a c83cf3 ; r0 <- 0x0a; [0xf33c] <- r0
+28 c8c2ee   ; xor r0, r0; [0xeec2] <- r0
+e6ca e7ee   ; r6<-ca; r7<-ee ; r7:r6 = 0xeeca
+fe02 c837f3 ; fe02 ; [0xf337] <- r0
+fe01 c836f3 ; fe01 ; [0xf336] <- r0
+fe03 c838f3 ; fe03 ; [0xf338] <- r0
+fe04 c839f3 ; fe04 ; [0xf339] <- r0
+fe05 c83af3 ; fe05 ; [0xf33a] <- r0
+fe06 c83bf3 ; fe06 ; [0xf33b] <- r0
+f6 9827     ;
+
so `fe0X` writes to `r0`. before `feXX` are issued, `r6` and `r7` are often loaded with values that are also similar to nearby pointer values. so `r7:r6` usually forms a valid pointer. `fe00` does not exist in the image. is there a shorter instruction for a load of `[r7:r6 + 0]`? separately, looks like `deXX` is `store r0 to [r7:r6 + XX]`. consider this code: -``` -87 86 ; push r7; push r6 -ca40ef cc41ef ; [0xef40] <- r2; [0xef41] <- r4 -e412 bf14e9 76 11 77 ; r4 <- 0x12; call? ; r0->r6; r1->r0; r0->r7 -e072 ; r0 <- 0x72 -de03 ; hmm -e841ef de04 ; r0 <- [0xef41]; hmm -e840ef de05 ; r0 <- [0xef40]; hmm -e83eee de06 ; r0 <- [0xee3e]; hmm -e200 16 74 17 75 bf5f05 ; e2 <- 00; r6->r0; r0->r4; r7->r0; r0->r5; call? -8e 8f b9 ; pop r6; pop r7; ret -``` +
+87 86                   ; push r7; push r6
+ca40ef cc41ef           ; [0xef40] <- r2; [0xef41] <- r4
+e412 bf14e9 76 11 77    ; r4 <- 0x12; call? ; r0->r6; r1->r0; r0->r7
+e072                    ; r0 <- 0x72
+de03                    ; hmm
+e841ef de04             ; r0 <- [0xef41]; hmm
+e840ef de05             ; r0 <- [0xef40]; hmm
+e83eee de06             ; r0 <- [0xee3e]; hmm
+e200 16 74 17 75 bf5f05 ; r2 <- 0x00; r6->r0; r0->r4; r7->r0; r0->r5; call?
+8e 8f b9                ; pop r6; pop r7; ret
+
so if the move of `r1:r0` to `r7:r6` is for a reason, that likely means: * the calling convention returns pointers as `r1:r0` @@ -798,61 +814,63 @@ so if the move of `r1:r0` to `r7:r6` is for a reason, that likely means: then between each `deXX` the program only loads `r0` with an `e8XXXX`, so `deXX` does not modify `r0`. if it modifies other registers, it's not `r2` (clobbered later), not `r4`, `r5` (clobbered later). if it's an indirect store through `r7:r6` it doesn't seem to increment (if it does, this is a .... very strange access pattern). -most likely seems to be `r0 -> [r7:r6 + imm8]`. that seems like a plausible function: -``` +most likely seems to be `r0 -> [r7:r6 + imm8]`. that seems like a plausible function: +
 push r7; push r6;
-[0xef40] <- r2; [0xef41] <- r4;
-r4 <- 0x12; call 0xe914;
-r1:r0 -> r7:r6                          ; grouped a few moves together for this overall effect
-r0 <- 0x72;     [r7:r6 + 3] <- r0
-r0 <- [0xef41]; [r7:r6 + 4] <- r0
-r0 <- [0xef40]; [r7:r6 + 5] <- r0
-r0 <- [0xee3e]; [r7:r6 + 6] <- r0
-r2 <- 0x00; r7:r6 -> r5:r4; call 0x055f ; eliding more movs
+[0xef40] <- r2; [0xef41] <- r4;
+r4 <- 0x12; call 0xe914;
+r1:r0 -> r7:r6                          ; grouped a few moves together for this overall effect
+r0 <- 0x72;     [r7:r6 + 3] <- r0
+r0 <- [0xef41]; [r7:r6 + 4] <- r0
+r0 <- [0xef40]; [r7:r6 + 5] <- r0
+r0 <- [0xee3e]; [r7:r6 + 6] <- r0
+r2 <- 0x00; r7:r6 -> r5:r4; call 0x055f ; eliding more movs
 pop r6; pop r7;
 ret
-```
+
and `18..1f` is `or`! here's another region: -``` -e400 bf4204 ; r4 <- 0x00; call -76 11 77 ; r1:r0 -> r7:r6 ; similar to before: exact movs are r0->r6; r1->r0; r0->r7 -71 16 ; r0 -> r1; r6 -> r0 -19 ; unknown -9003 ; jcc $+3 - bf420b ; call -e0ff de03 ; r0 <- 0xff; [r7:r6 + 3] <- r0 -28 de04 ; xor r0, r0; [r7:r6 + 4] <- r0 -e005 de05 ; r0 <- 0x05; [r7:r6 + 5] <- r0 -28 de06 ; xor r0, r0; [r7:r6 + 6] <- r0 -``` +
+e400 bf4204       ; r4 <- 0x00; call
+76 11 77          ; r1:r0 -> r7:r6      ; similar to before: exact movs are r0->r6; r1->r0; r0->r7
+71 16             ; r0 -> r1; r6 -> r0
+19                ; unknown
+9003              ; jcc $+3
+  bf420b          ; call
+e0ff de03         ; r0 <- 0xff; [r7:r6 + 3] <- r0
+28 de04           ; xor r0, r0; [r7:r6 + 4] <- r0
+e005 de05         ; r0 <- 0x05; [r7:r6 + 5] <- r0
+28 de06           ; xor r0, r0; [r7:r6 + 6] <- r0
+
`bf4204` returned a pointer that would be used in `de03` and later, below. before it is used there though, `71 16 19` does something with the two bytes of pointer before conditionally calling(?) something(?). there aren't many useful operations on the two bytes. it's probably `or`, meaning `71 16 19` forms a null check, and there are likely other hits for that sequence... there are seven. four have the condition branch over a `bf420b`, so maybe `0xb42` is a fault handler? reset? some kind of trap. it probably doesn't return here since i'm certain that `r7:r6` is not useful for writing anyway. also, that tells us `90` is `jnz`. `98` then is probably `jz`. that's consistent with sequences from earlier, like -``` -11 19 9810 ; r0 <- r1; r0 |= r1 ; jz $+10 -40 98b6 ; dec r0 ; jz $-0x4a -40 98ba ; dec r0 ; jz $-0x46 -40 98d8 ; dec r0 ; jz $-0x28 -40 98db ; dec r0 ; jz $-0x25 -40 98df ; dec r0 ; jz $-0x21 -``` +
+11 19 9810  ; r0 <- r1; r0 |= r1  ; jz $+10
+40 98b6     ; dec r0              ; jz $-0x4a
+40 98ba     ; dec r0              ; jz $-0x46
+40 98d8     ; dec r0              ; jz $-0x28
+40 98db     ; dec r0              ; jz $-0x25
+40 98df     ; dec r0              ; jz $-0x21
+
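one way to read the `dec r0; jz` chain above, as a rough sketch: each `40` knocks one off `r0`, and the first `98` that sees zero picks the case. this python is only a model of the control flow, assuming `40` is `dec r0` (wrapping at 8 bits) and `98` is `jz`. `dispatch` and the case labels are made up for illustration, they are not anything from the image.

```python
def dispatch(r1: int) -> str:
    """model of `11 19 9810` followed by five `40 98xx` pairs."""
    r0 = r1 & 0xFF              # 11: r0 <- r1
    r0 |= r1 & 0xFF             # 19: r0 |= r1, presumably only to set the zero flag
    if r0 == 0:                 # 98: jz
        return "case 0"
    for case in range(1, 6):
        r0 = (r0 - 1) & 0xFF    # 40: dec r0
        if r0 == 0:             # 98: jz
            return f"case {case}"
    return "fall through"       # whatever code follows the chain


print([dispatch(i) for i in range(7)])
```

if that reading is right, any selector above 5 just falls out the bottom of the chain.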
implementing a branch table for `i` in `0..5`? +# a multiplier! + `69` and `38..3f` make more sense from this loop: -``` -28 c8e8b0 c8e9b0 ; xor r0, r0; [0xb0e8] <- r0; [0xb0e9] <- r0; -72 73 ; r0 -> r2; r0 -> r3 -e9d8ee e8d7ee ; r1:r0 <- [0xeed7:0xeed8] -bff5ec ; call -ecd5ee ; r4 <- [0xee5d] -bc63ea ; dunno - 69 3b 3a 39 38 44 ; ??? but r3, r2, r1, r0, then dec r4 -99f8 ; conditional branch to ??? -``` +
+28 c8e8b0 c8e9b0    ; xor r0, r0; [0xb0e8] <- r0; [0xb0e9] <- r0;
+72 73               ; r0 -> r2; r0 -> r3
+e9d8ee e8d7ee       ; r1:r0 <- [0xeed7:0xeed8]
+bff5ec              ; call
+ecd5ee              ; r4 <- [0xeed5]
+bc63ea              ; dunno
+  69 3b 3a 39 38 44 ;  ??? but r3, r2, r1, r0, then dec r4
+99f8                ; conditional branch to ???
+
so for each of `3b..38` it operates on `rN`, maybe `r0`. but if it accumulates into `r0`, why loop `r4` times? if `3b` mutates only `r3`, then there are few operations that make sense for all four registers: * not `adc/sbc` (add/sub X to each byte?) @@ -866,55 +884,55 @@ if it's `rcr/rcl` then `69` clears the carry flag between loops so the loop impl assuming `38..3f` is `rcr` since `r3:r2:r1:r0` matches endianness seen elsewhere. seems like `fa` is similar to `fe`, but loading through `r3:r2` instead of `r7:r6`. -``` -fe07 72 fe08 73 fa03 c872ed fa02 c871ed ; [r7:r6 + 7..8] -> r3:r2; fa03; store [0xed72]; fa02; store [0xed71] +
+fe07 72 fe08 73 fa03 c872ed fa02 c871ed     ; [r7:r6 + 7..8] -> r3:r2; fa03; store [0xed72]; fa02; store [0xed71]
 fe07 72 fe08 73 e004 d2 bceacc
-fe07 72 fe08 73 e873ed da02                 ; [r7:r6 + 7..8] -> r3:r2; load [0xed73]; da02
-fe07 72 fe08 73 e875ed da04 e874ed da03     ; [r7:r6 + 7..8] -> r3:r2; load [0xed73]; da02
-```
+fe07 72 fe08 73 e873ed da02                 ; [r7:r6 + 7..8] -> r3:r2; load [0xed73]; da02
+fe07 72 fe08 73 e875ed da04 e874ed da03     ; [r7:r6 + 7..8] -> r3:r2; load [0xed75]; da04; load [0xed74]; da03
+
`fa` might clobber `r3:r2`? again if it was "load and increment" or "store and increment" then the immediate offsets are very odd. could just be redundant loads of `r3:r2`? also from this, `da` looks similar to `de`: `store r0 to [r3:r2 + XX]`. `21` might be `and r0, r1`? `r1` seem to often have some immediate consecutive bitmasky thing loaded shortly before `21`. same for `22`. check this out: -``` +
 f2 74
-e001 e100 e200 e300 ; r3:r2:r1:r0 <- 0
-9804                ; ???
-  50 31 32 33       ; op ?r0?; op r1, op r2, op r3 - maybe
-                    ; ` r0; rcl r1; rcl r2; rcl r3
-
-  44                ; dec r4
-99f9                ; conditional loop
-ec2eef 24 80        ; load r4; op r4; push r0
-e82fef 21 71        ; load r0; op r2; r0->r1
-e830ef 22 72        ; load r0; op r2; r0->r2
-e831ef 23 73        ; load r0; op r3; r0->r3
-88                  ; pop r0
-c832ef c933ef ca34ef cb35ef ; store r3:r0 -> [0xef32:0xef35]
-```
+e001 e100 e200 e300 ; r3:r2:r1:r0 <- 0
+9804                ; ???
+  50 31 32 33       ; op ?r0?; op r1, op r2, op r3 - maybe
+                    ; ` r0; rcl r1; rcl r2; rcl r3
+
+  44                ; dec r4
+99f9                ; conditional loop
+ec2eef 24 80        ; load r4; op r4; push r0
+e82fef 21 71        ; load r0; op r1; r0->r1
+e830ef 22 72        ; load r0; op r2; r0->r2
+e831ef 23 73        ; load r0; op r3; r0->r3
+88                  ; pop r0
+c832ef c933ef ca34ef cb35ef ; store r3:r0 -> [0xef32:0xef35]
+
looks like the loop is building up a 32b bitmask, `and`ing, then storing back? ok, different function: -``` - e408 ; r4 <- 8 +
+  e408                ; r4 <- 8
 back:
-    69 3d               ; ccf; rcr r5
-    911d                ; jcc.lo.1 forward
+    69 3d               ; ccf; rcr r5
+    911d                ; jcc.lo.1 forward
 back:
 
-      e8eab0 56 c8eab0  ; r0 <- [0xb0ea]; op5x r0, r6; [0xb0ea] <- r0
-      e8ebb0 09 c8ebb0  ; r0 <- [0xb0eb]; op0x r0, r1; [0xb0eb] <- r0
-      e8ecb0 0a c8ecb0  ; r0 <- [0xb0ec]; op0x r0, r2; [0xb0ec] <- r0
-      e8edb0 0b c8edb0  ; r0 <- [0xb0ed]; op0x r0, r3; [0xb0ed] <- r0
+      e8eab0 56 c8eab0  ; r0 <- [0xb0ea]; op5x r0, r6; [0xb0ea] <- r0
+      e8ebb0 09 c8ebb0  ; r0 <- [0xb0eb]; op0x r0, r1; [0xb0eb] <- r0
+      e8ecb0 0a c8ecb0  ; r0 <- [0xb0ec]; op0x r0, r2; [0xb0ec] <- r0
+      e8edb0 0b c8edb0  ; r0 <- [0xb0ed]; op0x r0, r3; [0xb0ed] <- r0
 
-      69                ; ccf
+      69                ; ccf
 forward:
-    36 31 32 33 44    ; something on r6, r1, r2, r3; dec r4
-  90d8                ; jnz back
-  b9                  ; ret
-```
+    36 31 32 33 44    ; something on r6, r1, r2, r3; dec r4
+  90d8                ; jnz back
+  b9                  ; ret
+
first observation: `91` is probably `jnc`. if it's `jc` then the loop would be entered with a carry flag set ... only on the first iteration. it seems more likely this is relying on knowing `cf` is unset to not execute `ccf` needlessly at the jump target. compared with other codegen, maybe this is a hand-written intrinsic? @@ -923,16 +941,16 @@ second observation: `30..37` might be `rcl`? whatever `56 .. 09 .. 0a .. 0b` doe then if `30..37` is `rcl`, the loop implements ` << 8`. why is TBD, but it seems like a plausible high-level behavior. elsewhere, this helps explain `50`: -``` -f6 74 ; ; r0 -> r4 -e001 e100 e200 e300 ; r3:r2:r1:r0 <- 00_00_00_01 -9804 ; jcc $+4 - 50 31 32 33 ; ; rcl r1; rcl r2; rcl r3 +
+f6 74                         ; ; r0 -> r4
+e001 e100 e200 e300           ; r3:r2:r1:r0 <- 00_00_00_01
+9804                          ; jcc $+4
+  50 31 32 33                 ; ; rcl r1; rcl r2; rcl r3
 
-  44                          ; dec r4
-99f9                          ; jcc $-7
-c83bef c93cef ca3def cb3eef   ; [0xef3b:0xef3e] <- r3:r2:r1:r0
-```
+  44                          ; dec r4
+99f9                          ; jcc $-7
+c83bef c93cef ca3def cb3eef   ; [0xef3b:0xef3e] <- r3:r2:r1:r0
+
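a candidate reading of the loop body above, sketched in python: if `50` is `add r0, r0` and `31..33` are `rcl`, then one pass shifts the 32-bit value in `r3:r2:r1:r0` left by one bit. both of those opcode meanings are guesses at this point, and the loop-exit condition is left out entirely (the model just takes a count). `rcl`, `shift_left_32` and `as_u32` are names invented for this sketch.

```python
def rcl(value: int, carry: int) -> tuple[int, int]:
    """rotate an 8-bit value left through carry; returns (new value, carry out)."""
    wide = (value << 1) | carry
    return wide & 0xFF, wide >> 8


def shift_left_32(regs: list[int], count: int) -> list[int]:
    """regs is [r0, r1, r2, r3], low byte first; runs the loop body `count` times."""
    r0, r1, r2, r3 = regs
    for _ in range(count):
        wide = r0 + r0                        # 50: add r0, r0 (carry out is the old bit 7)
        r0, carry = wide & 0xFF, wide >> 8
        r1, carry = rcl(r1, carry)            # 31: rcl r1
        r2, carry = rcl(r2, carry)            # 32: rcl r2
        r3, carry = rcl(r3, carry)            # 33: rcl r3
    return [r0, r1, r2, r3]


def as_u32(regs: list[int]) -> int:
    """glue the four byte registers back into one integer, r0 lowest."""
    return regs[0] | (regs[1] << 8) | (regs[2] << 16) | (regs[3] << 24)


print(hex(as_u32(shift_left_32([0x01, 0x00, 0x00, 0x00], 9))))
```

starting from `00_00_00_01`, that lines up with a plain `1 << count` for counts below 32.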
if `50` were `add r0, r0`, this implements `1u32 << r4` - `add r0, r0` is functionally the same as shifting `r0` left by 1 with highest bit carried @@ -942,74 +960,76 @@ seems that `cf` is indeterminate. this region also reinforces that `99` is `jnc`. if `99` were `jc` the loop would be taken at most once, but as `jnc` it is taken until `r4 == 0`. this in turn helps explain `08..0f`: -``` -e408 ; r4 <- 8 +
+e408                    ; r4 <- 8
 bit:
-  69 3d 911d            ; ccf; rcr r5; jc clear
-    e8eab0 56 c8eab0    ; add [0xb0ea], r6 ; (taking creative liberties with the isa)
-    e8ebb0 09 c8ebb0    ; op [0xb0eb], r1
-    e8ecb0 0a c8ecb0    ; op [0xb0ec], r2
-    e8edb0 0b c8edb0    ; op [0xb0ed], r3
+  69 3d 911d            ; ccf; rcr r5; jnc clear
+    e8eab0 56 c8eab0    ; add [0xb0ea], r6 ; (taking creative liberties with the isa)
+    e8ebb0 09 c8ebb0    ; op [0xb0eb], r1
+    e8ecb0 0a c8ecb0    ; op [0xb0ec], r2
+    e8edb0 0b c8edb0    ; op [0xb0ed], r3
 
 clear:
-  69 36 31 32 33 44     ; ccf; rcl r6:r1:r2:r3; dec r4
-90d8                    ; jnz bit
-b9                      ; ret
-```
+  69 36 31 32 33 44     ; ccf; rcl r6:r1:r2:r3; dec r4
+90d8                    ; jnz bit
+b9                      ; ret
+
so... this would be a 32b by 8b multiply.. but only if `op` is `adc`. for each set bit in `r5`, add `r6:r1:r2:r3` into `0xb0ea`. shift `r6:r1:r2:r3` left 1 regardless of bit being set in `r5`. repeat 8 times for each bit in `r5`. ... that said, the calling convention for this is different from every other function, and is moderately unhinged: why is `r4` unused? why is `r0` unused? why is `r6` *used*??? either way. `08..0f` is `adc`. but this function is weird enough to try figuring that out sooner than later. looking for the memory address referenced here, `0xb0ea` there's this region i'd looked at very early on that seems relevant: -``` +
 c870ef c971ef
 e878b4 e979b4 ec70ef 59 4c 74 11 e971ef 49 71 14 c977b4
 c876b4 28 c8beb9 e0ea c8c0b9 e00d c8bfb9 e201 e405 bcc632 e105 e875b4
 79 9004
   28 c8d2ee
-b98485 86 e600        ; something; push r6; r6 <- 0
-ceeab0                ; [0xb0ea] <- 0
-ceebb0                ; [0xb0eb] <- 0
-ceecb0                ; [0xb0ec] <- 0
-ceedb0                ; [0xb0ed] <- 0
-76 ede6b0             ; r0 -> r6 ; r5 <- [0xb0e6]
-bf 2f                 ; op; op
-ed ed e7b0bf 2f       ; ??
-ed ed e8b0bf 2f       ; ??
-ed ed e9b0bf 2f       ; ??
-ed                    ; ??
-e8eab0                ; [0xb0ea:0xbeed] <- r3:r2:r1:r0
+b98485 86 e600        ; something; push r6; r6 <- 0
+ceeab0                ; [0xb0ea] <- 0
+ceebb0                ; [0xb0eb] <- 0
+ceecb0                ; [0xb0ec] <- 0
+ceedb0                ; [0xb0ed] <- 0
+76 ede6b0             ; r0 -> r6 ; r5 <- [0xb0e6]
+bf 2f                 ; op; op
+ed ed e7b0bf 2f       ; ??
+ed ed e8b0bf 2f       ; ??
+ed ed e9b0bf 2f       ; ??
+ed                    ; ??
+e8eab0                ; [0xb0ea:0xb0ed] <- r3:r2:r1:r0
 e9ebb0
 eaecb0
 ebedb0
 8e 8d 8c b9
-```
+
but the whole thing in the middle is nonsense. taking a much closer look, though, this was before i'd learned... many things about the instruction set. first, on line 6 the first instruction is not `b98485`! it is just `b9` - `ret`. so this region is actually the end of one function and start of the next. `84 85 86` are pushes in the prologue of the real function of interest. -additionally, `bf` is not a standalone instruction, it takes two bytes as an immediate to `call`. and `ed` is not an instruction on its own, it is `r5 <- [imm16]`. so lets delineate that correctly... -``` -84 85 86 e600 ; push r4; push r5; push r6; r6 <- 0 -ceeab0 ; [0xb0ea] <- 0 -ceebb0 ; [0xb0eb] <- 0 -ceecb0 ; [0xb0ec] <- 0 -ceedb0 ; [0xb0ed] <- 0 -76 ; r0 -> r6 -ede6b0 bf2fed ; r5 <- [0xb0e6]; call 32x8b multiply? -ede7b0 bf2fed ; r5 <- [0xb0e7]; call 32x8b multiply? -ede8b0 bf2fed ; r5 <- [0xb0e8]; call 32x8b multiply? -ede9b0 bf2fed ; r5 <- [0xb0e9]; call 32x8b multiply? -e8eab0 ; [0xb0ea:0xbeed] <- r3:r2:r1:r0 +additionally, `bf` is not a standalone instruction, it takes two bytes as an immediate to `call`. and `ed` is not an instruction on its own, it is `r5 <- [imm16]`. so lets delineate that correctly... +
+84 85 86 e600         ; push r4; push r5; push r6; r6 <- 0
+ceeab0                ; [0xb0ea] <- 0
+ceebb0                ; [0xb0eb] <- 0
+ceecb0                ; [0xb0ec] <- 0
+ceedb0                ; [0xb0ed] <- 0
+76                    ; r0 -> r6
+ede6b0 bf2fed         ; r5 <- [0xb0e6]; call 32x8b multiply?
+ede7b0 bf2fed         ; r5 <- [0xb0e7]; call 32x8b multiply?
+ede8b0 bf2fed         ; r5 <- [0xb0e8]; call 32x8b multiply?
+ede9b0 bf2fed         ; r5 <- [0xb0e9]; call 32x8b multiply?
+e8eab0                ; [0xb0ea:0xb0ed] <- r3:r2:r1:r0
 e9ebb0
 eaecb0
 ebedb0
 8e 8d 8c b9
-```
+
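to sanity check the shape of this, here is a small python model of the two routines, under a pile of assumptions: `08..0f` is `adc`, `30..37` is `rcl`, `91` is `jnc`, the byte-wise `add`/`adc` chain is collapsed into one 32-bit add, and the shifted multiplicand in `r3:r2:r1:r6` survives from one call to the next. `mul32x8` and `mul32x32` are names made up here, and which operand arrives in registers versus memory is read off the listing, not confirmed.

```python
MASK32 = 0xFFFFFFFF


def mul32x8(acc: int, multiplicand: int, r5: int) -> tuple[int, int]:
    """inner routine: acc models [0xb0ea..0xb0ed], multiplicand models r3:r2:r1:r6."""
    for _ in range(8):                                # e408 ... 44 90d8: eight rounds
        bit, r5 = r5 & 1, r5 >> 1                     # 69 3d: clear carry; rcr r5
        if bit:                                       # 911d: skip the adds if carry is clear
            acc = (acc + multiplicand) & MASK32       # 56 / 09 / 0a / 0b
        multiplicand = (multiplicand << 1) & MASK32   # 69 36 31 32 33: shift left by one
    return acc, multiplicand


def mul32x32(a: int, b: int) -> int:
    """outer routine: a arrives in r3:r2:r1:r0, b sits at [0xb0e6..0xb0e9]."""
    acc = 0                                           # ceeab0 .. ceedb0 with r6 == 0
    multiplicand = a & MASK32                         # 76: r0 -> r6, rest already in r1..r3
    for i in range(4):
        r5 = (b >> (8 * i)) & 0xFF                    # ede6b0 .. ede9b0
        acc, multiplicand = mul32x8(acc, multiplicand, r5)
    return acc                                        # left in [0xb0ea..0xb0ed]


print(hex(mul32x32(0xDEADBEEF, 0x12345678)))
```

under those assumptions the four calls add up to a truncating 32-bit multiply, which is at least what the surrounding code seems to want.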
-and so here we are: this function implements a 32b x 32 multiply of the integers in `b0ea:b0ed` and `b0e6:b0e9`, storing the result in `b0ea:b0ed`. notable mention to `r0`, which happens to be the low byte of the last round of multiplication, so the `e8eab0: [0xb0ea] <- r0` is in fact correctly storing the low byte of this whole thing to the output region. notable mention, too, to `76: r0 -> r6`, because by leaving `r0` free for clobber the inner multiply routine does not need to move the `r0` argument elsewhere to free `r0` for use in `add/adc`. and loads from memory are no more expensive (in terms of code size) when loading to an alternate register, so it's simple enough to load directly to `r6` for the to-multiply byte of reach step. +and so here we are: this function implements a 32b x 32 multiply of the integers in `b0ea:b0ed` and `b0e6:b0e9`, storing the result in `b0ea:b0ed`. notable mention to `r0`, which happens to be the low byte of the last round of multiplication, so the `e8eab0: [0xb0ea] <- r0` is in fact correctly storing the low byte of this whole thing to the output region. notable mention, too, to `76: r0 -> r6`, because by leaving `r0` free for clobber the inner multiply routine does not need to move the `r0` argument elsewhere to free `r0` for use in `add/adc`. and loads from memory are no more expensive (in terms of code size) when loading to an alternate register, so it's simple enough to load directly to `r6` for the to-multiply byte of reach step. -OK. this is great progress so far. many instructions make sense, composition of those instructions seems reasonable. the only remaining encoding regions that are unknown are: +## what's left? + +OK. this is great progress so far. many instructions make sense, composition of those instructions seems reasonable. the only remaining encoding regions that are unknown are: * `00..07` * `48..4f` * `58..5f` @@ -1024,20 +1044,20 @@ OK. this is great progress so far. 
many instructions make sense, composition of and as a bonus, knowing the relationship of the last two functions i'd looked at, i know the base address of this rom (finally!!): the inner multiply routine starts at `0xed2f`, so the first byte of this image is at address `0xed2f (mapped) - 0x31c3 (file) == 0xbb5c`. -theory for `d2`, `d4`, `d6`, as well as `e2`, `e4`, `e6`: like their `d` counterparts but with no immediate offset. that is, `d4` is `[r5:r4] <- r0`? heres a hex region to help inform this theory: -``` - 900d ; jcc later - ea15ee eb16ee ; r4<-[0xee15]; r5<-[0xee16] - fa01 dc01 ; r0<-[r3:r2+1]; [r5:r4+1]<-r0 - f2 d4 ; ?? ?? - b9 ; ret +theory for `d2`, `d4`, `d6`, as well as `e2`, `e4`, `e6`: like their `d` counterparts but with no immediate offset. that is, `d4` is `[r5:r4] <- r0`? heres a hex region to help inform this theory: +
+  900d              ; jcc later
+    ea15ee eb16ee   ; r2<-[0xee15]; r3<-[0xee16]
+    fa01 dc01       ; r0<-[r3:r2+1]; [r5:r4+1]<-r0
+    f2 d4           ; ?? ??
+    b9              ; ret
 
 later:
-  ea15ee eb16ee     ; r4<-[0xee15]; r5<-[0xee16]
-  fa03 dc01         ; r0<-[r3:r2+3]; [r5:r4+1]<-r0
-  fa02 d4           ; r0<-[r3:r2+2]; ??
-  b9                ; ret
-```
+  ea15ee eb16ee     ; r2<-[0xee15]; r3<-[0xee16]
+  fa03 dc01         ; r0<-[r3:r2+3]; [r5:r4+1]<-r0
+  fa02 d4           ; r0<-[r3:r2+2]; ??
+  b9                ; ret
+
so, this seems like a conditional branch to move 16b from one part of a struct or another, to a single destination location. `d4` probably stores the lower byte being copied, evidenced by `fa02` to load it in the later branch. then `f2` is probably a load of the lower byte, to store it in the earlier case. there is no `dc00` or `fa00` or similar.... probably because for offset-by-zero cases, there are these shorter instructions for the same outcome. this happens to make for a neat pattern as well for opcodes like `0b11x1_iNNN`: @@ -1047,52 +1067,64 @@ so, this seems like a conditional branch to move 16b from one part of a struct o and so this opens more questions than it answers! what happens if `NNN` an odd register? can this machine indirect through a register pair like `r4:r3`? why is the pair `r1:r0` never used? what about `r7:r6`? in fact `rEven:rOdd` seems never used, are those instructions entirely different? -[week long pause here] +... [week long pause here. Destiny 2: The Final Shape launched, and everything else ground to a halt] ... + +## whittling down the last few opcodes... -OK. short list of remaining instructions. still, want to figure out as many as possible. seems like `48..4f` has some kind of a lead here: -``` +OK. short list of remaining instructions. motivation and optimism are starting to fade.. but i want to figure out as many as possible. + +### `48..4f` + +seems like `48..4f` has some kind of a lead here: +
 e878b4 e979b4 ec70ef 59 4c 74 11 e971ef 49 71 14 c977b4
-```
+
which at first only looks interesting for its use of `4c`, not used much at all in this program. structuring that slightly differently makes some of the relationships a little clearer: -``` -e878b4 e979b4 ; r0:r1 <- [0xb478:0xb479] -ec70ef 59 4c 74 11 ; r4 <- [0xef70]; ???; ???; r0 -> r4; r1 -> r0 -e971ef 49 71 14 ; r1 <- [0xef71]; ???; r0 -> r1; r4 -> r0 -c977b4 c876b4 ; [0xb477:0xb478] <- r0:r1 -``` +
+e878b4 e979b4       ; r0:r1 <- [0xb478:0xb479]
+ec70ef 59 4c 74 11  ; r4 <- [0xef70]; ???; ???; r0 -> r4; r1 -> r0
+e971ef 49 71 14     ; r1 <- [0xef71]; ???; r0 -> r1; r4 -> r0
+c977b4 c876b4       ; [0xb477:0xb476] <- r1:r0
+
the `4c` and `49` operations clearly modify r0. `59` might modify a register or so something else; if it modifies a register, it's probably `r1` which *is* used later. it seems like r1 is the high byte of a 16b integer, so an operation directly on that byte seems a little unlikely. `59` might be a mirror of `69` (clear carry flag), setting the carry flag instead? as for `49` and `4c`, best guesses are heavily informed by what i already know: this isn't `adc`, `or`, `and`, `add`, rotate left or right, ... but given the seeming 16b value being operated on, maybe these are `sbc`. that would mean with `59` being `set cf`, this is computing something like `*0xb479 -= *0xef70 + 1`. this isn't a lot to go on for `sbc`, but double-checking a different function, it's at least coherent: -``` +
 e8e8ed 9841
   e103 fe01 79 9807
     e102 fe01 79
   9022
   ea03ee eb04ee
   e058 e11b
-  59 4a 72                    ; ??? ; sbc r0, r2; r0 -> r2
-  11 4b 73                    ; r1 -> r0 ; sbc r0, r3; r0 -> r3
-  e8deee 7a e8dfee 4b 9108    ; r0 <- [0xeede]; r0 -= r2; r0 <- [0xeedf]; sbc r0, r3; jc $+8
+  59 4a 72                    ; ??? ; sbc r0, r2; r0 -> r2
+  11 4b 73                    ; r1 -> r0 ; sbc r0, r3; r0 -> r3
+  e8deee 7a e8dfee 4b 9108    ; r0 <- [0xeede]; r0 -= r2; r0 <- [0xeedf]; sbc r0, r3; jc $+8
     bf27c5 e400 bff4dd
   e102 fe01 79 9803
     bc05c7
   28 c8e8ed bc05c7
 e101 fe01
 79 9008
-```
+
i've marked up the most relevant lines: `59` is a leader again, and `r1` is used here, but if `59` modifies `r1` then, again, it's something that makes sense to do first and only to the upper byte of a 16bit number. `r2:r3` seem subtracted into, and with the `load; sub; load; sbc; jc` sequence this implements something like `if (r2:r3 - 0x1b58 >= [0xeede:0xeedf])` +### `59` ... or a wild guess towards `58..5f`? + going to also assume that `59` is `set carry flag`, since no other `58..5f` instructions seem to be present here.. this mirrors `69` as well. +### `60..67` ... where possible + `61` shows up before conditional branches, usually after loading from `0xf303`..? is that maybe a gpio address? `60` and `62` are also present .... here: -``` +
 fe07 72 fe08 73 fa04 c875ed fa03 c874ed e850f1 62 9003 bceacc e852f1 60 9003 bceacc e400 bf67c5 bceacc
-```
+
... is `60..67` something like "extract bit N of r0"? r0 is typically loaded before it's executed, and conditional branches are always present after. probably not consuming an `rN` and probably modifies r0 for the condition. difficult to imagine another purpose for a 3-bit field at that point. -`ba` is probably `iret`? in this: -``` +### `ba` + +another region i looked at very early on has a "ba" in it at least. it's a remarkably rare opcode, it seems: +
 8f 8e 88
 c8f0b0 88 c8efb0 88 c8eeb0 88 c8edb0 88
 c8ecb0 88 c8ebb0 88 c8eab0 88 c8e9b0 88
@@ -1100,59 +1132,63 @@ c8e8b0 88 c8e7b0 88 c8e6b0
 8d 8c 8b 8a 89 88
 bae8f3 b4 c85ced e8f2b4 c85bed
 b9
-```
-... is probably actually split up wrong, rather than `bae8f3 b4`, this is `ba e8f3b4`! matching with the `e8f2b4` to load one byte lower a few instructions later. fixed up that looks like this:
-```
+
+... is actually split up wrong, rather than `bae8f3 b4`, this is `ba e8f3b4`! matching with the `e8f2b4` to load one byte lower a few instructions later. fixed up that looks like this: +
 [elided restore of 0xb0e6:0xb0f0]
 8d 8c 8b 8a 89 88
 ba
 
 e8f3b4 c85ced e8f2b4 c85bed
 b9
-```
+
+ +so then this routine is restoring the region of bytes used for 32b x 32b multiply, all registers, then almost-but-not-ret. given the full-restore including scratch memory, this seems like the end of an interrupt routine. so `ba` is `iret`? consistent with what might be an ISR return at least. there happens to be another small routine directly after. -so then this routine is restoring the region of bytes used for 32b x 32b multiply, all registers, then `iret`. consistent with what might be an ISR return. there happens to be another small routine directly after. +### `00..07` turning all the way back, this pattern gives an idea for `00..07`: -``` -e850ef e951ef ; r1:r0 <- [0xef51]:[0xef50] -e404 ; r4 <- 0x04 -54 ; r0 += r4 -9101 ; jnc $+1 - 01 ; ??? -80 ; push r0 -f0 ; r0 <- [r1:r0] -74 ; r4 <- r0 -88 ; pop r0 -f801 ; r0 <- [r1:r0 + 1] -75 ; r5 <- r0 -``` +
+e850ef e951ef   ; r1:r0 <- [0xef51]:[0xef50]
+e404            ; r4 <- 0x04
+54              ; r0 += r4
+9101            ; jnc $+1
+  01            ; ???
+80              ; push r0
+f0              ; r0 <- [r1:r0]
+74              ; r4 <- r0
+88              ; pop r0
+f801            ; r0 <- [r1:r0 + 1]
+75              ; r5 <- r0
+
or this, -``` -15 71 14 ; r1:r0 <- r5:r4 -e304 ; r3 <- 0x04 -53 ; r0 += r3 -9101 ; jnc $+1 - 01 ; ??? -76 11 77 ; r7:r6 <- r1:r0 -``` +
+15 71 14        ; r1:r0 <- r5:r4
+e304            ; r3 <- 0x04
+53              ; r0 += r3
+9101            ; jnc $+1
+  01            ; ???
+76 11 77        ; r7:r6 <- r1:r0
+
so here, `01` is only conditionally executed if adding produced a carry out. `r0` and `r1` seem to be operated on together, so the two might be logically a 16-bit integer. so `01` might be `inc rN`? in that case the carry out is being conditionally added into the higher byte. nothing else has seemed obviously like an `inc` yet. this seems a little odd on the whole in the first snippet, since the result of addition doesn't seem to be preserved.. `r0` is clobbered in the last load. could that whole region have been `f805 74 f806 75`? might be missing some additional behavior. other uses of `00..07` don't obviously disagree with this though. for example, `00`: -``` -e8eeed 00 c8eeed ; [0xedee] += 1 +
+e8eeed 00 c8eeed  ; [0xedee] += 1
 ...
-fa03 00 da03      ; [r3:r2 + 3] += 1
+fa03 00 da03      ; [r3:r2 + 3] += 1
 ...
-fe04 00 de04      ; [r7:r6 + 4] += 1
+fe04 00 de04      ; [r7:r6 + 4] += 1
 ...
-e8c0ee 00 c8c0ee  ; [0xeec0] += 1
-```
+e8c0ee 00 c8c0ee  ; [0xeec0] += 1
+
so, maybe `00` actually is inc. -this all is some progress, not much unknown left. from the earlier list: +## mostly done, what's left in the encoding space? + +this all is some progress, not much unknown left. from the earlier list: * `00..07` - `inc rN` * `48..4f` @@ -1168,81 +1204,81 @@ this all is some progress, not much unknown left. from the earlier list: * `b8..bf`, except `b9`, `ba`, `bc` (maybe jump?), `bf` * `c0..c7`, which is remarkably rare. `c0`, `c4`, `c6`? * `d0..df`, ~except `da`, `de`. seen but not understood: `db`, `dc`~ - `d0..d7`, evens, are `[rN+1:rN] <- r0` - `d8..df`, evens, are `[rN+1:rN + imm] <- r0` + `d0..d7`, evens, are `[rN+1:rN] <- r0` + `d8..df`, evens, are `[rN+1:rN + imm] <- r0` `db` is not actually present, was a misreading of the program * `f0..ff`, ~except `fa`, `fe`. seen but not understood: `f0`, `f2`, `f3`, `f4`, `f6`~ - `f0..f7`, evens, are `r0 <- [rN+1:rN]` - `f8..ff`, evens, are `r0 <- [rN+1:rN + imm]` + `f0..f7`, evens, are `r0 <- [rN+1:rN]` + `f8..ff`, evens, are `r0 <- [rN+1:rN + imm]` `f3` is not actually present, was a misreading of the program -so.. last questions: +so.. last questions: * is `78..7f` actually `sub` or `cmp`? * what is `a0`? * what are `c0..c7`? -### `78..7f` ... `sub` or `cmp`? +## `78..7f` ... `sub` or `cmp`? the question really is, "does this instruction modify `r0`?" - it's possible that the instruction computes `sub`, stores the result, and the program never actually uses that result, either because substraction isn't often used or because of a compiler deficiency, something else, whatever. so the best guess here is, "is `r0` ever preserved after a `78..7f`?" or asked differently, "does `r0` get preserved/restored around a `78..7f`?" the only hint that `78..7f` might clobber `r0` comes from regions like this: -``` -e103 fe01 79 9807 ; r1 <- 3; r0 <- [r7:r6]; sub r0, r1; jz ... - e102 fe01 79 9022 ; r1 <- 2; r0 <- [r7:r6]; sub r0, r1; jnz ... +
+e103 fe01 79 9807               ; r1 <- 3; r0 <- [r7:r6 + 1]; sub r0, r1; jz ...
+  e102 fe01 79 9022             ; r1 <- 2; r0 <- [r7:r6 + 1]; sub r0, r1; jnz ...
     ea03ee eb04ee e058 e11b ...
-```
+
if `79` were `cmp` and did not modify `r0`, there wouldn't be a need to reload it in `fe01`. ... but this may be poor code, and the reload may actually be redundant. since this seems to implement `if ([r7:r6] == 3 || [r7:r6] == 2) { .. load registers }`, and there are no other signs that `78..7f` clobbers `r0`, this might actually be `cmp`. -### what is `a0`? +## what is `a0`? this seems to be the only place `a0` is present: -``` +
 14 71
 bcabe6
-bfd00e        ; call 0xed0 (???)
-c8bcee        ; [0xeebc] <- r0
-a0            ; ???
-72 11 73      ; r3:r2 <- r1:r0
-fa0b c8bfee   ; [0xeebf] <- [r3:r2 + 0x0b]
-fa0a c8beee   ; [0xeebe] <- [r3:r2 + 0x0a]
-e046 40       ; r0 <- 0x46; dec r0 (???)
-da0a          ; [r3:r2 + 0x0a] <- r0
-e0e6 9901     ; r0 <- 0xe6; jc $+01
-  40          ; dec r0 (???)
-da0b          ; [r3:r2 + 0x0b] <- r0
+bfd00e        ; call 0xed0 (???)
+c8bcee        ; [0xeebc] <- r0
+a0            ; ???
+72 11 73      ; r3:r2 <- r1:r0
+fa0b c8bfee   ; [0xeebf] <- [r3:r2 + 0x0b]
+fa0a c8beee   ; [0xeebe] <- [r3:r2 + 0x0a]
+e046 40       ; r0 <- 0x46; dec r0 (???)
+da0a          ; [r3:r2 + 0x0a] <- r0
+e0e6 9901     ; r0 <- 0xe6; jc $+01
+  40          ; dec r0 (???)
+da0b          ; [r3:r2 + 0x0b] <- r0
 b9
-```
+
whatever it is, it presumably operates on at least `r0`, writes to `r0` and `r1`. the routine at `0xed0` (outside the image?) may say more about what the registers are at its return, but from this alone it's hard to guess. it is interesting and remarkable that only `r0` is saved to `[0xeebc]`, not `r1`! -### what are `c0..c7`? +## what are `c0..c7`? seems like the most informative region to hint at these instructions: -``` +
 e81ff8 61 903c
-  ea29ef eb2aef   ; r5:r4 <- [0xef2a]:[0xef29]
-  fa08 71 fa07    ; r1:r0 <- [r5:r4 + 8]:[r5:r4 + 7]
-  c0              ; ??
-  74 11 75        ; r5:r4 <- r1:r0
-  12              ; r0 <- r2
-  e92aef          ; r1 <- [0xef2a]
-  e307 53 9101    ; r3 <- 0x07; add r0, r3; jnc $+1
-    01            ; inc r1
-  80 f0 72 88     ; r2 <- [r1:r0]
-  f801 73         ; r3 <- [r1:r0 + 1]
-  f2              ; r0 <- [r3:r2]
-  e1ff 51         ; r0 += 0xff
-  77              ; r7 <- r0
-e600              ; r6 <- 0
-9806              ;
-  f4 c88cf8       ; [0xf88c] <- [r5:r4]
-  c4              ; ??
-  06              ; inc r6
-
-  16 7f           ; r0 <- r6; cmp r0, r7
-91f6              ; jb $-0x0a
-```
+  ea29ef eb2aef   ; r3:r2 <- [0xef2a]:[0xef29]
+  fa08 71 fa07    ; r1:r0 <- [r3:r2 + 8]:[r3:r2 + 7]
+  c0              ; ??
+  74 11 75        ; r5:r4 <- r1:r0
+  12              ; r0 <- r2
+  e92aef          ; r1 <- [0xef2a]
+  e307 53 9101    ; r3 <- 0x07; add r0, r3; jnc $+1
+    01            ; inc r1
+  80 f0 72 88     ; r2 <- [r1:r0]
+  f801 73         ; r3 <- [r1:r0 + 1]
+  f2              ; r0 <- [r3:r2]
+  e1ff 51         ; r0 += 0xff
+  77              ; r7 <- r0
+e600              ; r6 <- 0
+9806              ; jz $+6 (to the cmp below)
+  f4 c88cf8       ; [0xf88c] <- [r5:r4]
+  c4              ; ??
+  06              ; inc r6
+
+  16 7f           ; r0 <- r6; cmp r0, r7
+91f6              ; jb $-0x0a
+
the ending loop makes some sense: load from a 16-bit pointer, store to maybe-IO-register(?), increment `r6`, repeat until `r6 == r7`. in other contexts where `c0` is used, `r1:r0` is recently populated with a 16-bit integer too. so it seems likely that `c[0-7]` operates on at least `rN`, maybe `rN+1` if it's more like the `d_` or `f_` two-registers-as-an-address instructions. @@ -1251,78 +1287,78 @@ if `c4` were a load or store it would probably operate with respect to `r0` and looking at the `c0` earlier in this block `r1:r0` is loaded immediately before, and then read (copied to `r5:r4`) immediately after. `r5:r4` is used for the `f4` load, so those registers form something like a pointer. compare with the other use of `c4` in this program here: -``` - e4e3 e5ed ; r4 <- 0xe3; r5 <- 0xed - e28f e301 ; r2 <- 0x8f; r3 <- 0x01 - 28 ; xor r0, r0 +
+ e4e3 e5ed ; r4 <- 0xe3; r5 <- 0xed + e28f e301 ; r2 <- 0x8f; r3 <- 0x01 + 28 ; xor r0, r0 loop: - 42 9903 ; dec r2; jnc body - 43 9105 ; dec r3; jc exit + 42 9903 ; dec r2; jnc body + 43 9105 ; dec r3; jc exit body: - d4 ; [r5:r4] <- r0 - c4 ; ??? - bc69bb ; jmp loop + d4 ; [r5:r4] <- r0 + c4 ; ??? + bc69bb ; jmp loop exit: - bc26be ; jmp ... somewhere ... -``` + bc26be ; jmp ... somewhere ... + -`r5:r4` is written through, but the combined `dec r2; jnc body; dec r3; jc exit; ... jmp loop` forms a a loop that repeats until `r3:r2` is decremented past zero. the loop body is simply `[r5:r4] <- r0`, `r0` set to zero, `c4` probably operates on `r5:r4`, and if it modifies `r0` then `r0` is left in that modified state for the next store through `[r5:r4]`. +`r5:r4` is written through, but the combined `dec r2; jnc body; dec r3; jc exit; ... jmp loop` forms a a loop that repeats until `r3:r2` is decremented past zero. the loop body is simply `[r5:r4] <- r0`, `r0` set to zero, `c4` probably operates on `r5:r4`, and if it modifies `r0` then `r0` is left in that modified state for the next store through `[r5:r4]`. looking at other 8-bit processors for inspiration regarding `c[0246]`, it seems plausible that it is in fact an increment for a register pair. in that case, the loop forms a memset, clearing `0x1c0` bytes of memory. this is also almost at the start of the image - not knowing where execution begins, it still seems likely enough that this is related to initialization. there might not be a corresponding 16-bit decrement instruction? or if there is, like the 8080, it might not set flags, and so would not be useful to decrement `r3:r2` in this loop. looking back at the other loop earlier: -``` -e600 ; r6 <- 0 -9806 ; - f4 c88cf8 ; [0xf88c] <- [r5:r4] - c4 ; inc r5:r4 - 06 ; inc r6 - - 16 7f ; r0 <- r6; cmp r0, r7 -91f6 ; jc $-0x0a -``` +
+e600              ; r6 <- 0
+9806              ; jz $+6 (to the cmp below)
+  f4 c88cf8       ; [0xf88c] <- [r5:r4]
+  c4              ; inc r5:r4
+  06              ; inc r6
+
+  16 7f           ; r0 <- r6; cmp r0, r7
+91f6              ; jc $-0x0a
+
then taking `c4` to be `inc r5:r4` makes this a loop writing the bytes from a buffer `r7` bytes long at `r5:r4` into the address `f88c`. why not decrement `r7` instead of the inc/mov/compare?? -### but wait! what happened with `jcc`? +## but wait! what happened with `jcc`? in writing this up i flip-flopped on the meaning of `91` and `99` jumps without entirely realizing it. two different regions of code suggest different semantics! first, the inner multiply loop from earlier: -``` - e408 ; r4 <- 8 +
+  e408                  ; r4 <- 8
 back:
-    69 3d               ; ccf; rcr r5
-    911d                ; jcc.lo.1 forward
+    69 3d               ; ccf; rcr r5
+    911d                ; jcc.lo.1 forward
 back:
 
-      e8eab0 56 c8eab0  ; add [0xb0ea], r6
-      e8ebb0 09 c8ebb0  ; add [0xb0eb], r1
-      e8ecb0 0a c8ecb0  ; add [0xb0eb], r2
-      e8edb0 0b c8edb0  ; add [0xb0eb], r3
+      e8eab0 56 c8eab0  ; add [0xb0ea], r6
+      e8ebb0 09 c8ebb0  ; add [0xb0eb], r1
+      e8ecb0 0a c8ecb0  ; add [0xb0ec], r2
+      e8edb0 0b c8edb0  ; add [0xb0ed], r3
 
-      69                ; ccf
+      69                ; ccf
 forward:
-    36 31 32 33 44    ; ccf; rcl r6:r1:r2:r3; dec r4
-  90d8                ; jnz back
-  b9                  ; ret
-```
+    36 31 32 33 44    ; rcl r6:r1:r2:r3; dec r4
+  90d8                ; jnz back
+  b9                  ; ret
+
where `91` seems like `jnc` - "jump past adding in the multiplier if the next bit in the multiplicand was 0". but a different loop suggests the opposite reading: -``` - e4e3 e5ed ; r4 <- 0xe3; r5 <- 0xed - e28f e301 ; r2 <- 0x8f; r3 <- 0x01 - 28 ; xor r0, r0 +
+  e4e3 e5ed   ; r4 <- 0xe3; r5 <- 0xed
+  e28f e301   ; r2 <- 0x8f; r3 <- 0x01
+  28          ; xor r0, r0
 loop:
-  42 9903     ; dec r2; jnc body
-  43 9105     ; dec r3; jc exit
+  42 9903     ; dec r2; jnc body
+  43 9105     ; dec r3; jc exit
 body:
-  d4          ; [r5:r4] <- r0
-  c4          ; ???
-  bc69bb      ; jmp loop
+  d4          ; [r5:r4] <- r0
+  c4          ; ???
+  bc69bb      ; jmp loop
 exit:
-  bc26be      ; jmp ... somewhere ...
-```
+  bc26be      ; jmp ... somewhere ...
+
where instead it's `99` that looks like a `jnc` - "if decrementing r2 did not borrow, do not decrement r3 and continue another loop iteration". and `91` is what looks like a `jc`- "if decrementing r3 borrowed, skip past the loop body". either "jc" and "jnc" are conditional on more than it first seems, or perhaps more likely, `dec` produces a carry bit any time the result is not zero. as an example: @@ -1336,6 +1372,27 @@ either "jc" and "jnc" are conditional on more than it first seems, or perhaps mo that would bring this all back together: `99` is `jc`, `91` is `jnc`. it's rare that there's a `dec; jcc` (one other instance in this program at `0x16db`), so it's hard to cross-check this interpretation. +## last thoughts + +in looking at this i was very surprised by how informative loops - especially +short loops - are for finding bounds of what a program may or likely does not +do. this isn't very surprising in retrospect; short programs don't have +opportunities to do very much, and doing the same not-very-much in a loop has +even fewer opportunities to do something useful. _and_ loops are usually +conditioned on a relatively simple predicate: `while x < 10 do { ... }`, or +`do { ... } while x > 10`, or `while x != 0 { x = loop_body() }`. a _lot_ of +behavior fell out of finding short loops and making sense of the instructions +used to drive them. + +this definitely applies when you *do* know the instruction set but are trying +to make sense of a larger program - it's just good advice when reverse +engineering a program. it's neat to see the idea carry through when you're +figuring out the instruction set itself. + +additionally: this is doable! what's totally unknown at this point is mostly +instructions that don't appear in this program (at which point it's hard to +guess about behavior...) + ## conclusion that seems to be the ISA, at least as used in this program. 
this architecture seems like an outsider art re-envisioning of the 8080, with fewer register to register movs, and more indexed memory accesses. it seems interesting that this architecture has loads like `[r7:r6]` and `[r7:r6 + N]` but not `[r7:r6 + rN]`. having non-offset load/store through a register pair seems a bit out of place in its own right: it's a lot of encoding space to reserve for a relatively rare operation. maybe it's more common in some reference program, and this firmware is the odd one? @@ -1343,10 +1400,10 @@ that seems to be the ISA, at least as used in this program. this architecture se `a0..af` are almost nonexistent here, and might be other 8080-style instructions. `b0..b8` are not represented in this program, and would be prime encoding space for conditional returns. it wouldn't be terribly shocking if a compiler didn't know to use conditional returns and instead conditionally branched over returns. the moment _i_, at least, have been waiting for, after describing as much of the ISA as possible, is to compare notes with others who have looked at this CPU or programs for it: -* whitequark's binja plugin: https://github.com/whitequark/binja-avnera -* several years ago: https://github.com/Prehistoricman/AV7300 +* whitequark's binja plugin: [https://github.com/whitequark/binja-avnera](https://github.com/whitequark/binja-avnera) +* several years ago: [https://github.com/Prehistoricman/AV7300](https://github.com/Prehistoricman/AV7300) -... we almost entirely agree! it seems that Prehistoricman tested with a physical CPU, and has some notes for opcodes that are otherwise not present: https://github.com/Prehistoricman/AV7300/blob/master/Instruction%20set%20notes.txt#L195-L198 +... we almost entirely agree! 
it seems that Prehistoricman tested with a physical CPU, and has some notes for opcodes that are otherwise not present: [https://github.com/Prehistoricman/AV7300/blob/master/Instruction%20set%20notes.txt#L195-L198](https://github.com/Prehistoricman/AV7300/blob/master/Instruction%20set%20notes.txt#L195-L198) whitequark records `58..5f` and `68..6f` as `set` and `clr` respectively, which i suspect are the same as i'd understood: set (or clear) bit in status register. this is also what Prehistoricman understood them to mean. @@ -1356,18 +1413,19 @@ if you happen to want to disassemble programs for Avnera processors - it's not a ### summarized materials -there are a few more programs reportedly for this architecture here, from Prehistoricman: -[link 1, sha256] -[link 2, sha256] +there are a few more programs reportedly for this architecture here, from Prehistoricman: +[link 1, sha256] +[link 2, sha256] -the program i reference heavily in this post is here: -[link 3, sha256] +the program i reference heavily in this post is here: +[link 3, sha256] +* [noes](./noes) [mirror] -whitequark's excellent cheatsheet of the encoding space: +whitequark's excellent cheatsheet of the encoding space: * https://github.com/whitequark/binja-avnera/tree/main?tab=readme-ov-file#cheatsheet this last one i find interesting as history for what i guessed right, wrong, and revisited how early on - my notes as i touched up and revisited `noes` with -increasingly-better understanding: -* [yax/avnera/disasm/](./disasm) +increasingly-better understanding: +* [yax/avnera/disasm/](./disasm/) -- cgit v1.1