yaxpeax-x86/src/long_mode, branch x86-generic

remove a few duplicate impls, add stubs for generic translations

2023-01-02T16:50:23+00:00

generate_opcode.py has quickly grown into generating much more than just
opcode definitions, and now handles a few duplicate impls across the
different decode modes as well. some of the added impl generation
conflicts with still-existing hand-written impls from yore, so they
needed a bit of removing.

next will be the addition of a generic module for "probably what you
want" disassembly of x86, sidestepping the 64-/32-/16-bit split of the
architecture family by attempting to decode the likeliest reading of a
byte sequence. it needs a little more work still, but the TODO stubs
added here support that new module.

codegen `Colorized` impl and normalize `name()` implementation

2023-01-02T16:50:23+00:00

unfortunately, because of the layout of instruction information, this
*adds* lines rather than removing them.

yax builds again with opcodes generated by type

2023-01-02T16:50:22+00:00

fix incorrect rex selection and field description offsets

2022-12-03T23:11:09+00:00

66 prefixes are common, 0f opcodes are common

2022-12-03T23:11:09+00:00

support a fast path through the decoder for [rex-prefixed]opcode insts

2022-12-03T23:11:09+00:00

the overwhelming majority of x86 instructions are either a single-byte
opcode or a single-byte opcode with a rex prefix. supporting these
specially means that we don't have to length-check on every byte or
go through the full decode loop while reading the most likely
instructions. this is a significant improvement on typical x86 streams,
but comes at a moderate penalty for crafted x86 instructions.

the penalty is still not very bad, as the fast path is exited in favor
of the full decode loop as soon as we see a non-rex prefix byte; this
adds maybe a dozen instructions to the slow path.
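
a minimal sketch of the fast-path shape described above, not the actual
yaxpeax-x86 code. `Head`, `decode_head`, `slow_path` and the helper
predicates are hypothetical names made up for illustration, and the
sketch assumes 64-bit mode (where 0x40..=0x4f are rex bytes). it treats
0x0f escapes and vex/evex bytes as plain opcode bytes, which the real
decoder does not.

```rust
/// hypothetical summary of the start of an instruction: the rex byte
/// (if one applies), the opcode byte, and how many bytes were consumed.
#[derive(Debug, PartialEq)]
struct Head {
    rex: Option<u8>,
    opcode: u8,
    len: usize,
}

/// in 64-bit mode, 0x40..=0x4f are rex prefixes.
fn is_rex(b: u8) -> bool {
    b & 0xf0 == 0x40
}

/// segment overrides, operand/address size, lock, repnz, rep.
fn is_legacy_prefix(b: u8) -> bool {
    matches!(
        b,
        0x26 | 0x2e | 0x36 | 0x3e | 0x64 | 0x65 | 0x66 | 0x67 | 0xf0 | 0xf2 | 0xf3
    )
}

/// the general loop: bounds-checks every byte and handles any mix of
/// prefixes. an x86 instruction is at most 15 bytes long.
fn slow_path(bytes: &[u8]) -> Option<Head> {
    let mut rex = None;
    for (i, &b) in bytes.iter().enumerate().take(15) {
        if is_legacy_prefix(b) {
            // a rex byte only applies if it immediately precedes the opcode
            rex = None;
        } else if is_rex(b) {
            rex = Some(b);
        } else {
            return Some(Head { rex, opcode: b, len: i + 1 });
        }
    }
    None
}

/// try `opcode` and `rex opcode` first; bail to the full loop as soon
/// as anything else shows up.
fn decode_head(bytes: &[u8]) -> Option<Head> {
    let first = *bytes.first()?;
    if is_rex(first) {
        let second = *bytes.get(1)?;
        if !is_rex(second) && !is_legacy_prefix(second) {
            return Some(Head { rex: Some(first), opcode: second, len: 2 });
        }
    } else if !is_legacy_prefix(first) {
        return Some(Head { rex: None, opcode: first, len: 1 });
    }
    slow_path(bytes)
}
```

in this sketch `48 89 c3` and `90` never enter the loop, while
`66 48 89 ..` pays for the failed fast-path checks before falling
through to `slow_path`.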

just a bit more code motion that seemed to help things sometimes

2022-12-03T23:11:09+00:00

reorder prefix checks, extract vex/evex prefix handling

2022-12-03T23:11:09+00:00

sharing vex/evex invalid prefix checks improves codegen a bit, but
ordering prefix checks by likeliest prefix first reduces time falling
through prefix handling arms. both together are a notable improvement in
throughput on typical x86 code.

bundled in here is some code motion changing where `mem_size = 0` and
`operand_count = 2` are executed; this is because, at least on zen2 and
cascade lake parts, bunching all stores to the instruction together
caused small stalls getting into the decoder. spreading out stores seems
to mix these assignments with parts of the code that were not using
memory anyway, and pipelines better.
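
the prefix-ordering half of this change can be sketched roughly as
below. this is an illustration only: `Prefix` and `classify_prefix` are
hypothetical names, and the order after 0x66 is a guess at likelihood
rather than anything measured in this commit.

```rust
/// hypothetical classification of a legacy prefix byte.
#[derive(Debug, PartialEq)]
enum Prefix {
    OperandSize,
    Rep,
    Repnz,
    Lock,
    AddressSize,
    Segment(u8),
}

/// check the prefixes assumed likeliest first, so the common case
/// leaves after one or two comparisons instead of falling through
/// every arm.
fn classify_prefix(b: u8) -> Option<Prefix> {
    if b == 0x66 {
        return Some(Prefix::OperandSize);
    }
    if b == 0xf3 {
        return Some(Prefix::Rep);
    }
    if b == 0xf2 {
        return Some(Prefix::Repnz);
    }
    // the rarer prefixes share one catch-all match
    match b {
        0xf0 => Some(Prefix::Lock),
        0x67 => Some(Prefix::AddressSize),
        0x26 | 0x2e | 0x36 | 0x3e | 0x64 | 0x65 => Some(Prefix::Segment(b)),
        _ => None,
    }
}
```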

move opcode lookup tables into const arrays

2022-12-03T23:11:09+00:00

cleanliness, but also slightly better codegen somehow?

replace size lookup logic with a LUT

2022-12-03T23:11:09+00:00

the match compiled into some indirect branch awfulness!! no thank you
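
for illustration, the general shape of a match-to-LUT swap looks like
the sketch below. `OperandSize`, `SIZE_LUT`, `size_match` and `size_lut`
are hypothetical names, not the ones in the decoder, and whether a given
`match` really turns into an indirect branch depends on its arms and on
the compiler version.

```rust
/// hypothetical operand size tag; discriminants are 0..=3 so the
/// value can index a table directly.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OperandSize {
    Byte = 0,
    Word = 1,
    Dword = 2,
    Qword = 3,
}

/// the "before" shape: a match the compiler is free to lower however
/// it likes, including as a jump table.
fn size_match(s: OperandSize) -> u8 {
    match s {
        OperandSize::Byte => 1,
        OperandSize::Word => 2,
        OperandSize::Dword => 4,
        OperandSize::Qword => 8,
    }
}

/// the "after" shape: a const table, indexed by discriminant.
const SIZE_LUT: [u8; 4] = [1, 2, 4, 8];

fn size_lut(s: OperandSize) -> u8 {
    SIZE_LUT[s as usize]
}
```

the table version is a single indexed load; the enum keeps the index in
range, so the bounds check can usually be elided.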