Age | Commit message | Author |
|
cleanliness, but also slightly better codegen somehow?
|
|
the match compiled into some indirect branch awfulness!! no thank you
|
|
|
|
this includes a `Makefile` that exercises the various crate configs.
most annoyingly, several doc comments needed to grow
`#[cfg(feature="fmt")]` blocks so docs continue to build with that
feature enabled or disabled.
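for illustration, one way such gating can look (this is a general
rustdoc pattern with illustrative names, not necessarily the exact
approach taken in this commit) is to attach the example lines with
`cfg_attr`:

```rust
pub struct InstDecoder;

impl InstDecoder {
    /// decodes one instruction.
    ///
    /// the doctest below is only attached when the `fmt` feature is
    /// enabled, so docs build cleanly with the feature on or off.
    #[cfg_attr(feature = "fmt", doc = "```")]
    #[cfg_attr(feature = "fmt", doc = "// code using the fmt-gated Display impl would go here")]
    #[cfg_attr(feature = "fmt", doc = "```")]
    pub fn decode(&self) {}
}

fn main() {}
```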
carved out a way to run exhaustive tests; they should be marked
`#[ignore]`, and the makefile will run even ignored tests, on the
expectation that this runs the exhaustive (but slower) suite.
exhaustive tests are not yet written. they'll probably involve sweeping
all 4-byte sequences from 0 to 2^32-1.
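a sketch of what one of those tests could look like (the helper here is
hypothetical; nothing like it exists in the crate yet). `cargo test --
--ignored`, or the makefile, would be what actually runs it:

```rust
// hypothetical exhaustive test: sweep every 4-byte sequence and check
// that decoding never panics. far too slow for normal test runs, so
// it is #[ignore]d by default.
#[test]
#[ignore]
fn sweep_all_four_byte_sequences() {
    for word in 0u32..=u32::MAX {
        let bytes = word.to_le_bytes();
        // `decode_does_not_panic` stands in for whatever assertion the
        // real suite ends up making against the decoder.
        decode_does_not_panic(&bytes);
    }
}
```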
|
|
Closes https://github.com/iximeow/yaxpeax-x86/issues/16
|
|
not only did the instruction have wrong data, but if displayed, the
formatter would panic.
|
|
in the process, fix 64-bit rex-byte limit, 32/16-bit mode mask reg limit
|
|
|
|
This makes generated docs link to the type and show it in the list of all structs, rather than rustdoc rendering gray, unlinked text in return types.
quote doc references
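assuming this refers to rustdoc's bracketed intra-doc references, a
small illustrative sketch (the names here are stand-ins):

```rust
/// a decoded instruction. (stand-in type for illustration.)
pub struct Instruction;

/// returns an [`Instruction`] decoded from `bytes`.
///
/// written as bare `Instruction`, rustdoc renders the return type as
/// plain gray text; quoted as [`Instruction`], it links to the struct
/// and the struct shows up in the generated lists of types.
pub fn decode_one(_bytes: &[u8]) -> Instruction {
    Instruction
}

fn main() { let _ = decode_one(&[]); }
```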
|
|
|
|
|
|
|
|
this gets yaxpeax-x86 in no-inline configurations back to building as it
did before, but is quite a blunt hammer. it seems that extra calls to
`sink.record` trip the inlining thresholds for `read_with_annotation`,
and then for its caller, and its caller's caller, even when one of them
is just a delegation to its inner call.
this is particularly unfortunate because yaxpeax-x86 is now making a
decision about the inlining of a rather large function at the public
edge of its API, but these attributes match the inlining decisions that
LLVM was making before adding `DescriptionSink`. hopefully not too bad.
not sure how to handle this in the future.
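a standalone sketch of the shape of the problem and the fix; names and
signatures here are illustrative, not the crate's actual API:

```rust
pub trait DescriptionSink {
    fn record(&mut self, start_bit: u32, end_bit: u32);
}

/// a sink that discards every record.
pub struct NullSink;
impl DescriptionSink for NullSink {
    fn record(&mut self, _start_bit: u32, _end_bit: u32) {}
}

// without an explicit attribute, the `sink.record` calls in the inner
// function can push it, then its callers, past llvm's inlining
// thresholds; pinning the decision here restores the old codegen.
#[inline(always)]
pub fn read(data: &[u8]) -> u64 {
    // the public entry point is just a delegation to the annotating
    // variant with a sink that ignores everything.
    read_with_annotation(data, &mut NullSink)
}

pub fn read_with_annotation<S: DescriptionSink>(data: &[u8], sink: &mut S) -> u64 {
    let mut acc = 0u64;
    for (i, b) in data.iter().enumerate() {
        sink.record(i as u32 * 8, i as u32 * 8 + 7);
        acc = acc.wrapping_mul(31).wrapping_add(u64::from(*b));
    }
    acc
}

fn main() {
    println!("{}", read(&[0x90, 0xc3]));
}
```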
|
|
|
|
|
|
|
|
even though `NullSink`'s methods are all no-ops, it causes llvm to not inline this function, for a net perf reduction
|
|
|
|
|
|
|
|
these instructions had memory sizes reported for the operand if it was
a memory operand, but for versions with non-memory operands the decoded
`Instruction` would imply that no memory access would happen at all.
now, decoded instructions in these cases will report a more useful
memory size.
|
|
and ip/flags
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
not that xop will ever be wanted, rip
|
|
|
|
this profiles slightly better? not entirely sure why...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
the evex route would allow "valid" instructions that have the opcode
`invalid`. this is... not correct.
|
|
|
|
at least on my zen2.
when reading prefixes, optimize for the likely case of reading an
instruction rather than an invalid run of prefixes. checking if we've
exceeded the x86 length bound immediately after reading the byte is only
a benefit if we'd otherwise read an impossibly-long instruction; in this
case we can exit exactly at prefix byte 15 rather than potentially later
at byte 16 (assuming a one-byte instruction like `c3`), or byte ~24 (a
more complex store with immediate and displacement).
these cases are extremely unlikely in practice. more likely is that
the prefix byte being read is one of the first two or three bytes of an
instruction, and we will never benefit from checking the x86 length
bound at this point. instead, only check length bounds after decoding
the entire instruction. this penalizes the slowest path through the
decoder but speeds up the likely path about 5% on my zen2 processor.
additionally, begin reading instruction bytes as soon as we enter the
decoder, and before initial clearing of instruction data. again, this is
for zen2 pipeline reasons. reading the first byte and corresponding
`OPCODES` entry improves the odds that this data is available by the
time we check for `Interpretation::Prefix` in the opcode scanning
loop. then, if we did *not* load an instruction, we immediately know
another byte must be read; begin reading this byte before applying `rex`
prefixes, and as soon as a prefix is known to not be one of the
escape-code prefix bytes (c5, c4, 62, 0f). this clocked in at another ~5%
in total.
i've found that `read_volatile` is necessary to force rust to begin the
load where it's written, rather than reordering it past other work. i'm
not committed to this being a guaranteed truth.
also, don't bother checking for `Invalid`. again, `Opcode::Invalid` is a
relatively unlikely path through the decoder and `Nothing` is already
optimized for `None` cases. this appears to be another small improvement
in throughput but i wouldn't want to give it a number - it was
relatively small and may not be attributable to this effect.
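a minimal sketch of the `read_volatile` trick, with illustrative names,
and with the same caveat as above that this is not a guaranteed
behavior:

```rust
use core::ptr;

// illustrative: pin the first byte's load at the top of the function,
// before instruction-state setup, instead of letting the compiler sink
// it down to its first use.
fn decode_sketch(bytes: &[u8]) -> Option<u8> {
    let first = bytes.first()?;
    // an ordinary read here may be reordered past the clearing below;
    // read_volatile keeps the load where it's written.
    let opcode_byte = unsafe { ptr::read_volatile(first as *const u8) };

    // ... initial clearing of instruction data would happen here ...

    Some(opcode_byte)
}

fn main() {
    assert_eq!(decode_sketch(&[0xc3]), Some(0xc3));
}
```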
|
|
|
|
|