Age | Commit message | Author |
|
this measures a bit faster. it doesn't seem like it should. the rex
prefix checks compile identically, but a lea for a later expression gets
moved up, which pipelines better?
|
|
also remove redundant assignments of operand_count and some OperandSpec,
bulk-assign all registers and operands on entry to `read_instr`. this
all, taken together, shaves off about 7 cycles per decode.
|
|
|
|
|
|
|
|
|
|
|
|
also some long-mode cleanup in corresponding areas
|
|
|
|
|
|
|
|
|
|
|
|
i really didn't know rust could do this
|
|
|
|
instructions
|
|
|
|
|
|
in the future these can and will change (new operands, new instructions) and i would prefer they not be major breaking changes. applications can ignore undesired variants, and probably do anyway.
if you want to write a 1120-variant match, are you me? why would you do this
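a minimal sketch of how a downstream match stays source-compatible when variants are added later. it assumes the mechanism here is rust's `#[non_exhaustive]` attribute (the subject line isn't shown, so that is a guess), and the four-variant `Opcode` and `is_arithmetic` are hypothetical stand-ins, not the crate's real definitions.

```rust
// hypothetical, tiny stand-in for a much larger opcode enum.
// `#[non_exhaustive]` forces code in *other* crates to include a wildcard
// arm, so adding a variant later is not a breaking change for them.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Opcode {
    ADD,
    SUB,
    MOV,
    NOP,
}

// a consumer that only cares about a few variants and ignores the rest.
pub fn is_arithmetic(op: Opcode) -> bool {
    match op {
        Opcode::ADD | Opcode::SUB => true,
        // covers MOV, NOP, and any variant added in the future
        _ => false,
    }
}
```

inside the defining crate the attribute has no effect on matching; the wildcard is only mandatory for external users.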
|
|
the in-repo benchmark got better with this inlined but it's probably
better to leave it up to the compiler when finally stitching stuff
together. i suspect that having read_operands inlined resulted in just
too many live values, and the compiler was inspired to play hijinks that
pipelined poorly. disas-bench shows a ~15% improvement from this change.
|
|
|
|
vmov* are.. somehow messed up too
|
|
does intel know no bounds
|
|
|
|
|
|
|
|
|
|
decoder flag to come
|
|
this is... a more significant rewrite than i expected yaxpeax-x86 to
ever need. it turns out that capstone is extremely permissive about
duplicative 66/f2/f3 prefixes to the point that the implemented prefix
handling was unsalvageable.
while this replaces the *0f* opcode tables, i haven't profiled these
changes. it's possible this is a net improvement for single-byte
opcodes, it could be a net loss. code size may be severely impacted.
there is still work to do.
but this in total gets very close to iced/xed/zydis parity, far more
than before.
also adds several small extensions, gfni, 3dnow, enqcmd, invpcid, some
of cet, and a few missing avx instructions.
|
|
|
|
|
|
initial work to optionally discard any instruction printing support.
when using `-Z build-std` to fully remove .eh_frame, a stripped
long_mode_no_fmt .so is 61kb!
|
|
|
|
clearing reg_rrr and reg_mmm more efficiently is an extremely small win,
but a win.
read_imm_signed generally should inline well but runs afoul of some
heuristic. inlining gets about 8% improved throughput on the
(unrealistic) in-repo benchmark.
it would be great to be able to avoid bounds checks somehow; it looks
like they alone are another ~10% of decode time. i'm not sure how to
pull that off while retaining the generic iterator parameter. might just
not be possible.
|
|
* `mwaitx`, `monitorx`, `rdpru`, and `clzero` are now supported
* swapgs is no longer decoded in protected mode
* rdpkru and wrpkru are no longer decoded if mod bits != 11
|
|
base 0b101
for memory operands with a base, index, and displacement either
the wrong base would be selected (register number ignored, so only
`*ax` or `r8*` would be reported), or yaxpeax-x86 would report a
base register is present when it is not (`RegIndexBaseScaleDisp`
when the operand is actually `RegScaleDisp`)
thank you to Evan Johnson for catching and reporting this bug!
also bump crate version to 0.1.4 as this will be immediately tagged and
released.
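for reference, a rough sketch of the SIB rule this fix is about: a base field of 0b101 means "no base register" only when the modrm mod bits are 00, otherwise it names `*bp`/`r13`. the `MemOperand` enum and `decode_sib` function are simplified, hypothetical stand-ins (plain register numbers instead of real register types, displacement passed in already read), not yaxpeax-x86's actual code. the no-index case (index field 0b100 without REX.X) is out of scope and returns `None`.

```rust
// simplified stand-in for the two operand shapes named in the message above.
#[derive(Debug, PartialEq)]
enum MemOperand {
    // [index * scale + disp], no base register
    RegScaleDisp { index: u8, scale: u8, disp: i32 },
    // [base + index * scale + disp]
    RegIndexBaseScaleDisp { base: u8, index: u8, scale: u8, disp: i32 },
}

// `modbits` is the 2-bit mod field of the modrm byte, `sib` the SIB byte,
// `rex_x`/`rex_b` the REX extension bits, `disp` the decoded displacement.
fn decode_sib(modbits: u8, sib: u8, rex_x: bool, rex_b: bool, disp: i32) -> Option<MemOperand> {
    let scale = 1u8 << (sib >> 6);
    let index = ((sib >> 3) & 0b111) | ((rex_x as u8) << 3);
    let base_low = sib & 0b111;
    // the full register number must include REX.B, not just the low bits
    let base = base_low | ((rex_b as u8) << 3);

    if index == 0b100 {
        // index 0b100 without REX.X means "no index"; not handled in this sketch
        return None;
    }

    if base_low == 0b101 && modbits == 0b00 {
        // base field 0b101 with mod == 00: no base, only index*scale + disp32
        Some(MemOperand::RegScaleDisp { index, scale, disp })
    } else {
        Some(MemOperand::RegIndexBaseScaleDisp { base, index, scale, disp })
    }
}
```

with `sib = 0x8d` (scale 4, index 1, base 0b101), mod 00 gives `RegScaleDisp`, while mod 10 gives `RegIndexBaseScaleDisp` with base 5, or base 13 when REX.B is set.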
|
|
|
|
|
|
also bump to 0.1.1
|
|
add doc comments for public items, record changelog, and let's ship this!!
|
|
`OperandCode` (obviously) wildly varies depending on how i feel on a
given week, so it's now hidden to avoid people depending on numerical
values of its discriminants.
`RegisterBank` got a similar treatment with a new `RegisterClass` struct
that's suitable for public use.
|
|
|
|
|
|
|
|
rep_any will get speculated `false` quite quickly, whereas checking if
the opcode is a string instruction will be costly no matter what. in the
rare case rep_any is true, i don't care how costly displaying the
instruction is - string instructions are relatively rare, and rep movs
is typically not more than one instance when it shows up.
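the ordering argument, as a small sketch: put the cheap, well-predicted flag test first so the opcode check only runs in the rare case a rep prefix is present. `Prefixes`, `Opcode`, `is_string_instruction` and `mnemonic_prefix` here are all hypothetical simplifications, not the crate's real types.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Opcode {
    MOVS,
    STOS,
    ADD,
}

struct Prefixes {
    rep: bool,
    repnz: bool,
}

impl Prefixes {
    // cheap flag test; almost always false, so it predicts well
    fn rep_any(&self) -> bool {
        self.rep || self.repnz
    }
}

// stands in for the comparatively expensive "is this a string op?" check
fn is_string_instruction(op: Opcode) -> bool {
    matches!(op, Opcode::MOVS | Opcode::STOS)
}

// `&&` short-circuits: is_string_instruction only runs when rep_any() is true
fn mnemonic_prefix(prefixes: &Prefixes, op: Opcode) -> &'static str {
    if prefixes.rep_any() && is_string_instruction(op) {
        if prefixes.rep { "rep " } else { "repnz " }
    } else {
        ""
    }
}
```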
|
|
the arms of the match in regspec_label referenced tables that were not
const. consequently, they would be rebuilt when reached, every time the
match is incanted. this holds true even when regspec_label is
inlined.
each arm could be a const array for a small and easy change, but to
avoid the indirect dispatch on spec.bank i've reorganized register names
into a single const array and selected values for `RegisterBank` such
that indices into that array can be formed.
for my next trick, i may make `REG_NAMES` a `*const u8`, with indices
picking offsets into the table - 8-byte offsets might do? this should
compact down size a little more by removing a pointer and size qword
for each string.
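roughly what the single-table layout looks like, as a sketch with only two banks. the `Bank` enum, its discriminant values, and this 32-entry `REG_NAMES` are illustrative assumptions; the real `RegisterBank` values and table contents are not shown in this log.

```rust
// discriminants are chosen as offsets into REG_NAMES, so a lookup is one
// add and one index instead of a match over banks.
#[derive(Clone, Copy)]
#[repr(u8)]
enum Bank {
    Q = 0,  // 64-bit general purpose registers start at index 0
    D = 16, // 32-bit general purpose registers start at index 16
}

// one const table for every bank, laid out bank after bank
const REG_NAMES: [&str; 32] = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi",
    "r8d", "r9d", "r10d", "r11d", "r12d", "r13d", "r14d", "r15d",
];

fn regspec_label(bank: Bank, num: u8) -> &'static str {
    // mask to 4 bits so the index always stays inside the bank's 16 entries
    REG_NAMES[bank as usize + (num & 0xf) as usize]
}
```

since `REG_NAMES` is a `const` (a `static` would work too), nothing is rebuilt per call; `regspec_label(Bank::D, 9)` is a plain indexed load.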
|
|
likely operands are now also required to have contiguous special cases
0..31. this is to avoid generating a massively sparse jump table for no
reason twice - once for unlikely_operands is quite enough as-is.
this will undoubtedly be a wildly annoying maintenance burden. if this
pans out (initial experiments suggest it might) then maybe a macro will
do...
|
|
|
|
This reverts commit 21cc850afc108c147871c70240eda62ad13f34e0.
|
|
|