# Renesas RX

notes from writing `yaxpeax-rx`, largely from reading the rx v1/v2/v3 manuals:

* `rxv1`: RX Family RXv1 Instruction Set Architecture (User's Manual: Software), Rev. 1.30 (Dec 2019)
  * retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv1-instruction-set-architecture-users-manual-software-rev130
  * sha256: `e659dd509141da6bb1cfabf26c9f9ab5996d02060acaad2b5702963116834415`
* `rxv2`: RX Family RXv2 Instruction Set Architecture (User's Manual: Software), Rev. 1.00 (Nov 2013)
  * retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv2-instruction-set-architecture-users-manual-software
  * sha256: `c12fc8d16adf1530f2cad3f75974d2a29062580a984a71fd9461417b66bba18a`
* `rxv3`: RX Family RXv3 Instruction Set Architecture (User's Manual: Software), Rev. 1.00 (Nov 2018)
  * retrieved 2023-12-16 from https://www.renesas.com/us/en/document/mas/rx-family-rxv3-instruction-set-architecture-users-manual-software-rev100
  * sha256: `829815515a57d077bdfa418e0e167b512f2a04b3db3613329a4d8980399cf74c`

broadly: of all the instruction sets, this is definitely one of them. 16
general-purpose registers. some instructions have shorter-form encodings that
use only three bits for register selection, rather than four. so i imagine a
preference to use the low eight registers for code density reasons. i'm curious
how that works out for real programs and compilers weighing register choice
like that.

`BMCnd` stands out as an interesting instruction; `Conditional bit transfer`
undersells it. it moves the state of a condition, `0` or `1`, to the specified
bit in a destination. the destination can either be a register or memory, and
otherwise leaves the destination value unmodified. `SCCnd` is similar but
behaves more like x86's `setcc` instructions: set the entire destination
byte/register to `0` or `1` depending on the condition.
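a minimal sketch of the difference between the two, assuming a 32-bit register destination. `bmcnd` and `sccnd` here are plain functions named after the instructions for illustration, not anything from `yaxpeax-rx`, and the condition is passed in as an already-evaluated `bool`:

```rust
/// BMCnd-like behavior: copy the condition (0 or 1) into bit `bit` of
/// `dest`, leaving every other bit of `dest` as it was.
/// `bit` is assumed to be in 0..=31.
fn bmcnd(dest: u32, bit: u32, cond: bool) -> u32 {
    let mask = 1u32 << bit;
    if cond {
        dest | mask
    } else {
        dest & !mask
    }
}

/// SCCnd-like behavior: the whole destination becomes 0 or 1,
/// regardless of what it held before.
fn sccnd(cond: bool) -> u32 {
    cond as u32
}
```

so `bmcnd(0xffff_0000, 3, true)` only touches bit 3, while `sccnd(true)` would replace the whole value with `1`.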

## rx v2

v2 adds a smattering of new instructions, and architectural extensions - see section `3.2 List of RXv2 Extended Instruction Set`.

* a second accumulator register was added, bringing the set to `a0` and `a1`.
* many instructions were extended to operate on either `a0` or `a1`, in place of prior `a0`-only forms.
* `fsqrt`! new! and 3-operand forms of `fadd`, `fmul`, and `fsub`.
* and, accumulators are 72-bit now.

## rx v3

v3 adds less, but also more. again, section `3.2 List of RXv3 Extended Instructions` for exact info.

* `bfmov`/`bfmovz`, which i talk a bit more about below, for bulk bit transfers between words
* a 3-operand form of `xor`, giving it parity with other instructions like `add`, `sub`, etc
* AND AN ENTIRE SET OF DOUBLE-PRECISION INSTRUCTIONS AND 16 NEW DOUBLE-PRECISION REGISTERS.

practically speaking, the summaries here are accurate to what i found when
reading through the manuals' contents. why did i have to read through the
manuals meticulously?

# decode table, or lack thereof

instruction encodings are listed in alphabetical order of instruction mnemonics. this is not amenable to writing a disassembler.. so i went through all three versions of the manual and transcribed encodings *from* the manual into a text file i could easily reorder. and so [notes/encoding_table](https://github.com/iximeow/yaxpeax-rx/blob/no-gods-no-/notes/encoding_table) was born. reorder that to be approximately by bits, and you get [notes/reordered_encodings](https://github.com/iximeow/yaxpeax-rx/blob/no-gods-no-/notes/reordered_encodings). finally, i tried finding patterns across encodings and simplifying the total number of encodings across all instructions, and that left me with [notes/grouped_encodings](https://github.com/iximeow/yaxpeax-rx/blob/no-gods-no-/notes/grouped_encodings).

vendors! please do not make me write things like this!! i'm not good at it!!!
```
0 0 0 0 0 1 1 0 | mi  [ opc ] ld  | [ rs  ] [ rd  ]                   SUB src, dest (v1, v2, v3)
                  0 0 => B    0 0 => [Rs]                                                        
                  0 1 => W    0 1 => dsp:8[Rs]                                                   
                  1 0 => L    1 0 => dsp:16[Rs]                                                  
                  1 1 => UW   1 1 => Rs                                                          
    opc={sub, cmp, add, mul, and, or, X, X, see below}                                           
                                                                                                 
0 0 0 0 0 1 1 0 | mi  1 0 0 0 ld  | 0 0 0 [  opc  ] | [ rs  ] [ rd  ] SBB src, dest (v1, v2, v3) 
                  1 0 => L                                                                       
                  _ _ => invalid                                                                 
                              00 => [Rs]                                                         
                              01 => dsp:8[Rs]                                                    
                              10 => dsp:16[Rs]                                                   
    opc={                                                                                        
      sbb(mi=10,ld!=11), X, adc(mi=10,ld!=11), X,                                                
      max, min, emul, emulu,                                                                     
      div, divu, X, X                                                                            
      tst, xor, X, X,                                                                            
      xchg, itof, X, X,                                                                          
      X, utof(v2, v3), X, X,                                                                     
      X, X, X, X,                                                                                
      X, X, X, X,                                                                                
    }                                                                                            
                                                                                                 
0 0 0 0 1 [dsp]                                                       BRA.S src (v1, v2, v3)     
                                                                                                 
0 0 0 1 c [dsp]                                                       BCnd.S src (v1, v2, v3)    
        0 => beq/bz   (src = if dsp > 2 { dsp } else { dsp + 8 })                                
        1 => bne/bnz                                                                             
                                                                                                 
0 0 1 0 [ cnd ] | [    pcdsp    ]                                     BCnd.B src (v1, v2, v3)    
        cnd => {eq, ne, geu, ltu, gtu, leu, pz, n, ge, lt, gt, le, o, no, bra.b, Reserved}       
```

the disassembler itself is largely transcription of this table into source code. including, unfortunately, a massive chain of if/else from `0b00000000` stopping at dozens of points on the way to `0b11111111`. :')
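to give a feel for what that transcription looks like, here is a small sketch covering only the three short-branch rows at the end of the table excerpt. this is not the actual `yaxpeax-rx` code: the `ShortBranch` type and function names are made up for illustration, the `dsp` remapping is assumed to apply to `BRA.S` the same way as `BCnd.S`, and `pcdsp` is assumed to be a signed byte.

```rust
/// Short branch forms from the table excerpt (hypothetical type).
#[derive(Debug, PartialEq)]
enum ShortBranch {
    BraS { disp: u8 },
    BeqS { disp: u8 },
    BneS { disp: u8 },
    /// `cnd` is the 4-bit condition index (14 is `bra.b`);
    /// `disp` is the pcdsp byte, assumed signed here.
    BCndB { cnd: u8, disp: i8 },
}

/// 3-bit dsp field: 3..=7 mean themselves, 0..=2 mean 8..=10.
fn short_disp(field: u8) -> u8 {
    if field > 2 { field } else { field + 8 }
}

/// Walk the first byte upward, stopping at each boundary in the table.
fn decode_short_branch(bytes: &[u8]) -> Option<ShortBranch> {
    let first = *bytes.first()?;
    if first < 0b0000_1000 {
        None // other encodings, not covered by this sketch
    } else if first < 0b0001_0000 {
        Some(ShortBranch::BraS { disp: short_disp(first & 0b111) })
    } else if first < 0b0001_1000 {
        Some(ShortBranch::BeqS { disp: short_disp(first & 0b111) })
    } else if first < 0b0010_0000 {
        Some(ShortBranch::BneS { disp: short_disp(first & 0b111) })
    } else if first < 0b0011_0000 {
        let cnd = first & 0b1111;
        if cnd == 0b1111 {
            return None; // listed as Reserved
        }
        let disp = *bytes.get(1)? as i8;
        Some(ShortBranch::BCndB { cnd, disp })
    } else {
        None // everything from 0b0011_0000 up, not covered by this sketch
    }
}
```

now imagine that chain continuing for every row of the grouped table.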

# encoding notes

## operands...

instructions with `ld` or `ls` fields encode an operand that is either `[Reg]`,
`disp[Reg]`, or `Reg` (just the register, no memory access). some of these
instructions, like the `06` encodings of `sub`, `cmp`, `add`, ... also have a
`mi` field that indicates how the memory operand is extended for use with the
second operand - which may be used only as a second source, or sometimes used
as a source+destination.

so, suppose `ld` is `0b11`, indicating a `Reg`, and `mi` indicates, for example,
`.B`, meaning sign extension of a byte. there is no indication in the manual
that, for example, `sub` would have an encoding that would mean `sub.b r1, r5`.
so what does `mi = 0b00 = b` mean for these instructions? no idea! `yaxpeax-rx`
assumes the bits are ignored for direct register operands. someone please prove
this wrong! or right. either is fine.
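a sketch of that assumption, using the `ld` and `mi` values from the table excerpt earlier. the `Operand` and `MemExt` types are hypothetical and do not mirror the real `yaxpeax-rx` types, and the displacement bytes themselves are not read here:

```rust
/// Source operand shapes selected by `ld` (hypothetical type).
#[derive(Debug, PartialEq)]
enum Operand {
    Deref { base: u8 },       // [Rs]
    DerefDisp8 { base: u8 },  // dsp:8[Rs]
    DerefDisp16 { base: u8 }, // dsp:16[Rs]
    Reg { reg: u8 },          // Rs, no memory access
}

/// Memory extension selected by `mi` (hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum MemExt {
    B,
    W,
    L,
    UW,
}

/// Decode the `ld`/`mi` pair. For a direct register operand the `mi`
/// bits are dropped, which is the guess described above, not something
/// the manual states.
fn decode_src(ld: u8, mi: u8, rs: u8) -> (Operand, Option<MemExt>) {
    let ext = match mi & 0b11 {
        0b00 => MemExt::B,
        0b01 => MemExt::W,
        0b10 => MemExt::L,
        _ => MemExt::UW,
    };
    match ld & 0b11 {
        0b00 => (Operand::Deref { base: rs }, Some(ext)),
        0b01 => (Operand::DerefDisp8 { base: rs }, Some(ext)),
        0b10 => (Operand::DerefDisp16 { base: rs }, Some(ext)),
        _ => (Operand::Reg { reg: rs }, None), // `mi` ignored
    }
}
```

with this reading, `ld=0b11` decodes to the same register operand no matter what `mi` holds.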

## stnz/stz v2+ encoding typo

encoding `(2)` of both of these instructions is a new extension in `RXv2`. unfortunately the manual has a typo: it says that `stnz` encoding 2 looks like...

```
(2) STNZ src, dest                                   
                                                     
b7           b0 | b7           b0 | b7           b0  
1 1 1 1 1 1 0 0 | 0 1 0 0 1 0 1 1 | [ rs  ] [ rd  ]  
                          ^^^^^^^ relevant           
```

while encoding 2 of `stz`...
```
(2) STZ src, dest                                    
                                                     
b7           b0 | b7           b0 | b7           b0  
1 1 1 1 1 1 0 0 | 0 1 0 0 1 0 1 1 | [ rs  ] [ rd  ]  
                          ^^^^^^^ same as above!     
```

are `stz` and `stnz` somehow encoded the same? confusion abounds. internet dog the6p4c had the good idea to check binutils to cross check with what Renesas themselves might have said on the matter. they found:

[`[PATCH v2][RX] Add RXv2 Instructions`](https://sourceware.org/legacy-ml/binutils/2015-04/msg00081.html)
```
+                                                   
+/** 1111 1100 0100 1011 rsrc rdst  stz %1, %0 */   
+  ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);       
+                                                   
+/** 1111 1100 0100 1111 rsrc rdst  stnz  %1, %0 */ 
+  ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);       
```

which pretty clearly says "`stz` has the low bits of `1011`", "`stnz` has the low bits of `1111`". confusion resolved. EXCEPT: this includes a *different* copy/paste error! both instructions here have `S2cc(RXC_z)`. there's a followup commit for this,

```
commit 239efab16429cad466591ccd1c57bba786171765             
Author: Yoshinori Sato <ysato@users.sourceforge.jp>         
Date:   Thu Dec 17 01:42:34 2015 +0900                      
                                                            
    RXv2 support update                                     
                                                            
    2015-12-22  Yoshinori Sato <ysato@users.sourceforge.jp> 
                                                            
    opcodes/                                                
            * rx-decode.opc (movco): Use uniqe id.          
            (movli): Likewise.                              
            (stnz): Condition fix.                          
                                                            
[...snip...]                                                
                                                            
 /** 1111 1100 0100 1111 rsrc rdst      stnz    %1, %0 */   
-  ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_z);               
+  ID(stcc); SR(rsrc); DR(rdst); S2cc(RXC_nz);              
                                                            
[...snip...]                                                

```

so eventually everything ended up in the right state. but it's *very* funny to
look through the history and realize there were two copy-paste errors in
different directions about these two instructions. cursed additions!
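taking the binutils bit patterns at face value, the distinction comes down to one bit in the second byte. a minimal sketch, with a hypothetical `StCc` type and function name, covering only these two register-to-register forms:

```rust
/// The two conditional-store mnemonics (hypothetical type).
#[derive(Debug, PartialEq)]
enum StCc {
    Stz,
    Stnz,
}

/// Decode encoding (2) of stz/stnz, following the binutils patterns
/// `1111 1100 0100 1011 rsrc rdst` (stz) and
/// `1111 1100 0100 1111 rsrc rdst` (stnz).
/// Returns (mnemonic, rs, rd).
fn decode_stcc_reg(bytes: [u8; 3]) -> Option<(StCc, u8, u8)> {
    if bytes[0] != 0b1111_1100 {
        return None;
    }
    let op = match bytes[1] {
        0b0100_1011 => StCc::Stz,
        0b0100_1111 => StCc::Stnz,
        _ => return None, // some other instruction, not handled here
    };
    Some((op, bytes[2] >> 4, bytes[2] & 0x0f))
}
```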

## cmp...

cmp encoding (2), for `cmp #uimm:8` could be read as the bit pattern
```
0 1 1 1 0 1 li  | [ opc ] [ rs2 ]
```
like `cmp` encoding `(3)`, or similar encodings of `mul`, `and`, `or`, but with `opc=0b101`. it has the additional constraint of `li=0b01` in such a reading, but this raises a question.. if `opc=0b000` allows four immediate operand lengths - 8, 16, 24, and 32 bits, sign-extended to 32 bits - why not allow all operand lengths with zero-extension for `opc=0b101`?? alas.

## double-precision instructions...

also in the area of
```
0 1 1 1 0 1 li ...
```
instructions, in RXv3 a new set of double-precision and related instructions was added. this makes another pattern with this encoding clearer: `li` picks the number of bytes to be read for operands, even though none of the operands are necessarily interpreted as an immediate.

`li=0b01` usually represents a 32-bit immediate encoded as a sign-extended 8-bit value. so, read `0x75`, read a byte for the opcode and destination register, then read one byte for the immediate. but for instructions like `int`, the encoding works out as
```
0 1 1 1 0 1 0 1 | 0 1 1 0 0 0 0 0 | [ uimm:8 ]                              
            li=01 opc=0110 rd=0000  ^ and read the 1-byte immediate of li=01
```

RXv3 extends this - where a 2-byte immediate might be involved in an instruction like
```
0 1 1 1 0 1 1 0 | 0 0 0 1 0 1 1 0 | 0 1 0 1 0 1 0 1 | 1 0 1 0 1 0 1 0
            li=10 opc=0001 rs2=0110 imm=0x55AAi16                    
```
other new instructions, like `dadd r6, r5, r4`, are encoded.... *similarly*
```
0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | 0 1 0 1 0 0 0 0 | 0 1 1 0 0 1 0 0
            "li=10"  reserved?      rs2=0101 opc=0000 rd=0110 rs=0100
```
`li` still means "read two bytes"! they're just not an immediate anymore. wild.
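read that way, `li` is really a length field. a sketch of that idea follows; the function names are invented for this example, and the `li=0b11` (three bytes) and `li=0b00` (four bytes) cases are my assumption based on the 24-bit and 32-bit immediate lengths mentioned for `cmp`:

```rust
/// Number of bytes that follow the two leading bytes, chosen by `li`.
fn trailing_len(li: u8) -> usize {
    match li & 0b11 {
        0b01 => 1,
        0b10 => 2,
        0b11 => 3, // assumption: the 24-bit case
        _ => 4,    // assumption: li=0b00 is the 32-bit case
    }
}

/// Split a `0 1 1 1 0 1 li` instruction into its second byte and the
/// trailing bytes, without deciding whether those bytes are an
/// immediate or more opcode/register bits.
fn split_0111_01(bytes: &[u8]) -> Option<(u8, &[u8])> {
    let first = *bytes.first()?;
    if first >> 2 != 0b0111_01 {
        return None;
    }
    let n = trailing_len(first & 0b11);
    let second = *bytes.get(1)?;
    let rest = bytes.get(2..2 + n)?; // None if the input is too short
    Some((second, rest))
}
```

the `int` example splits into `0x60` plus one trailing byte, and the `dadd` example (`76 90 50 64`) splits into `0x90` plus two trailing bytes, even though only the first of those is an immediate.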

## opcode selectors move around!

in RXv3, with the new double-precision instructions, there is an interesting consistency decision to note...

consider the `{dadd,dsub,dmul}` encoding pattern of
```
0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs2 ] [ opc ] | [ rd  ] [ rs  ]
```
for these instructions, the exact opcode is chosen by the four `opc` bits in the low nibble of the third byte. sure, that's fine! one of the possible opcodes here is `dcmp`, whose condition is indicated by the value of `rd`. this means that `dcmp` is encoded like:

```
0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs2 ] [ opc ] | [ rd  ] [ rs  ]       
                                            opc=0111  rd=cm={.., UN, EQ, ..}
```
in other words, the instruction is shaped like `double-OP src, src2`, with the `dest` field repurposed to hold the condition.

this is in contrast to other two-operand instructions like `dabs`, encoded like:
```
0 1 1 1 0 1 1 0 | 1 0 0 1 0 0 0 0 | [ rs  ] [ opc ] | [ rd  ] [ opc2]  
                                            opc=1100          opc2=0001
```
where the instruction has a skeleton more like `double-OP src, dest`, with `rs` being the repurposed field. this follows! the instruction no longer has two source operands, but does have a destination operand.

i'm deeply curious why `rs` is the repurposed field here, rather than `rs2`. in that case, the "opcode" would be the third byte in its entirety, which seems like a nice property on its own. alternatively, maybe keeping the semantics of register selector bits the same simplifies decoder hardware...
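the two readings of the same two trailing bytes can be sketched as a pair of field extractors. the struct and function names are hypothetical, and this only splits nibbles as drawn in the diagrams above, without deciding which `opc` values select which reading:

```rust
/// Third and fourth bytes read as `[ rs2 ] [ opc ] | [ rd ] [ rs ]`.
#[derive(Debug, PartialEq)]
struct ThreeOperand {
    rs2: u8,
    opc: u8,
    rd: u8, // holds the condition instead for `dcmp`
    rs: u8,
}

/// Third and fourth bytes read as `[ rs ] [ opc ] | [ rd ] [ opc2 ]`.
#[derive(Debug, PartialEq)]
struct TwoOperand {
    rs: u8,
    opc: u8,
    rd: u8,
    opc2: u8,
}

fn three_operand(b3: u8, b4: u8) -> ThreeOperand {
    ThreeOperand { rs2: b3 >> 4, opc: b3 & 0x0f, rd: b4 >> 4, rs: b4 & 0x0f }
}

fn two_operand(b3: u8, b4: u8) -> TwoOperand {
    // the source register moves to where `rs2` was, and the low nibble
    // of the last byte becomes a second opcode selector.
    TwoOperand { rs: b3 >> 4, opc: b3 & 0x0f, rd: b4 >> 4, opc2: b4 & 0x0f }
}
```

for the `dadd` bytes shown earlier, `three_operand(0x50, 0x64)` gives `rs2=5, opc=0, rd=6, rs=4`. the awkward part is that a decoder has to look at `opc` before it knows which of the two structs the low nibble of the last byte belongs to.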

## float instruction encodings

the three-operand forms of float instructions have similar mappings from bits to opcodes, compared to scalar operations.

|bits|scalar|float|
|----|----|----|
|`0000`|`sub`|`fsub`|
|`0001`|`cmp`|`undef`|
|`0010`|`add`|`fadd`|
|`0011`|`mul`|`fmul`|

this does not continue to be the case for double-precision instructions, unfortunately. for those instructions, `0001` tends to select `dadd`, rather than leave space for a future `fcmp`.

## bitfields

`bfmov` and `bfmovz` include a triplet of immediates to describe "move N bits starting from bit A out of source and into dest at bit B". the manual then goes on to say,

> If (slsb + width) > 32 and (dlsb + width) > 32, then dest becomes undefined.

... but that implies that if only one of the two overflows, dest is well-defined somehow? i think the manual *means* `or` in that sentence, alas.
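a sketch of the `or` reading, modelling "undefined" as `None`. everything here is illustrative: the function name and the `zero_rest` flag are mine, the flag reflects my understanding that `bfmovz` clears the destination bits outside the field while `bfmov` keeps them, and treating a width of 0 as invalid is an assumption of this sketch.

```rust
/// Move `width` bits starting at bit `slsb` of `src` into `dest`
/// starting at bit `dlsb`. Returns None where this sketch considers
/// the result undefined: if EITHER field runs past bit 31.
fn bfmov(src: u32, dest: u32, slsb: u32, dlsb: u32, width: u32, zero_rest: bool) -> Option<u32> {
    if width == 0
        || slsb.saturating_add(width) > 32
        || dlsb.saturating_add(width) > 32
    {
        return None;
    }
    // width is 1..=32 here; build the mask in u64 so that width == 32
    // does not overflow the shift.
    let field_mask = ((1u64 << width) - 1) as u32;
    let field = (src >> slsb) & field_mask;
    let dest_mask = field_mask << dlsb;
    // bfmovz-like: start from zero. bfmov-like: keep the other bits.
    let base = if zero_rest { 0 } else { dest & !dest_mask };
    Some(base | (field << dlsb))
}
```

under the manual's literal `and` wording, a call like `bfmov(src, dest, 28, 0, 8, false)` would have to produce something well-defined even though the source field runs off the end of the word.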