Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
This reverts commit 15c821a2d3fbf2fc0458090b6cc12f2ac093f075.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
not a huge improvement, but something
|
|
these instructions ignored rex bits even for xmm reigsters, which is
incorrect (so says xed)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
vex/rex prefix cleanup, finally profitable to inline read_0f*_opcode
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
set_embedded_instructions was unnecessarily appilied to many operand
codes; this was never a correctness issue, but meant many operand
decodings took a few more instruction than necessary to do nothing.
setting all registers to `rax` is unnecessary, only the first register's
defaulting to `rax` is effectual. this allows for not using a movabs to
load initial rax state.
adjust vex decoder inlining. this will be followed up by some cleanup
for vex operand codes.
|
|
slightly fewer (perfectly predicted anyway) branches this way
|
|
|
|
|
|
|
|
|
|
now the bits line up with enum variants directly (hopefully..)
|
|
the expectation here is that we can set a default `vqp_size` pretty
cheaply (Prefixes::new is one store, on x86_64 anyway...). then, when we
see an `operand_size` prefix, it's rare enough we can pay a little extra
to speculate on *likely* implication, and update some state (`vqp_size`
is *probably* going to be 2 because of it) accordingly. the cases where
`vqp_size` would go unused and this was wasted effort are relativlely
rare.
on the other hand, we can't profitably give `rex` this treatment:
`rex.w` would set `vqp_size` to `qword`, but rex-prefixed instructions
are so often byte-size registers that updating `vqp_size`
(conditionally, no less), is only break-even. so, keep a check for
`rex.w` at use site, where it's only a choice between `qword` or
`whatver-size-a-non-rex.w-prefixed-instruction-would-be-sized`, which
has been kept up to date by speculation when detecting `operand_size`.
|
|
in the process, fixed a decoding bug dealing with a0/a1/a2/a3 movs
(respected rex.b when rex.b should have been ignored)
this seems to maybe improve runtime ever so slightly, but this is really
meant as a cleanup commit more than anything.
|
|
these coincidentally have the general-purpose banks (rB excepted) matching their size in bytes
|
|
|
|
|
|
this request/suggestion comes from
[github](https://github.com/iximeow/yaxpeax-x86/issues/29)! thank you!
|
|
unlike every other function to test if a particular selector is picked
by prefixes, `Prefixes::cs` does not return bool, nor does it check the
currently-selected prefix. instead, it modifies the decoded `Prefixes`
to set the current prefix to `cs`.
this has been a bug all the way since 0.0.1 was released. the function
now does nothing, and is marked deprecated.
in a future 2.x release, the function will be changed to return `bool`
and be in-line with other segment selector-checking functions. in the
mean time, a new `Prefixes::selects_cs()` does the correct thing.
thank you to @meithecatte who pointed this out in
https://github.com/iximeow/yaxpeax-x86/issues/28!
|
|
|
|
|
|
the overwhelming majority of x86 instructions are either a single-byte
opcode or a single-byte opcode with a rex prefix. supporting these
specially means that we don't have to length-check on every byte or
go through the full decode loop while reading the most likely
instructions. this is a significant improvement on typical x86 streams,
but comes at a moderate penalty for crafted x86 instructions.
the penalty is still not very bad, as the fast path is exited in favor
of the full decode loop as soon as we see a non-rex prefix byte; this
adds maybe a dozen instructions to the slow path.
|
|
|
|
sharing vex/evex invalid prefix checks improves codegen a bit, but
ordering prefix checks by likeliest prefix first reduces time falling
through prefix handling arms. both together are a notable improvement in
throughput on typical x86 code.
bundled in here is some code motion to where `mem_size = 0` and
`operand_count = 2` are executed; this is because, at least on zen2 and
cascade lake parts, bunching all stores to the instruction together
caused small stalls getting into the decoder. spreading out stores seems
to mix these assignments with parts of code that was not using memory
anyway, and pipelines better.
|
|
cleanliness, but also slightly better codegen somehow?
|