aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-07-04new record: 50.56cpi (2290ms)iximeow
2023-07-04new perf record: 50.79cpi (2316ms)iximeow
2023-07-04best: 54.3cpi (2512ms)iximeow
2023-07-04new perf record: 51.88cpi (2363ms)iximeow
2023-07-04wipiximeow
2023-07-04more micro-opts...iximeow
set_embedded_instructions was unnecessarily appilied to many operand codes; this was never a correctness issue, but meant many operand decodings took a few more instruction than necessary to do nothing. setting all registers to `rax` is unnecessary, only the first register's defaulting to `rax` is effectual. this allows for not using a movabs to load initial rax state. adjust vex decoder inlining. this will be followed up by some cleanup for vex operand codes.
2023-07-04move some unlikely checks behind a branch that implies their possibilityiximeow
slightly fewer (perfectly predicted anyway) branches this way
2023-07-04fidget with read_E inlining AGAINiximeow
2023-07-04make operandcode 16b againiximeow
2023-07-04line up Opcode values for simple translation from opc bytesiximeow
2023-07-04fixup: handle mnemonic ordering tooiximeow
2023-07-04avoid committing values to instructions until necessary, likely opc tweaksiximeow
2023-07-04make base opcode map translation a bit simpleriximeow
now the bits line up with enum variants directly (hopefully..)
2023-07-04store non-rex expected bank when first witnessing operand size prefixiximeow
the expectation here is that we can set a default `vqp_size` pretty cheaply (Prefixes::new is one store, on x86_64 anyway...). then, when we see an `operand_size` prefix, it's rare enough we can pay a little extra to speculate on *likely* implication, and update some state (`vqp_size` is *probably* going to be 2 because of it) accordingly. the cases where `vqp_size` would go unused and this was wasted effort are relativlely rare. on the other hand, we can't profitably give `rex` this treatment: `rex.w` would set `vqp_size` to `qword`, but rex-prefixed instructions are so often byte-size registers that updating `vqp_size` (conditionally, no less), is only break-even. so, keep a check for `rex.w` at use site, where it's only a choice between `qword` or `whatver-size-a-non-rex.w-prefixed-instruction-would-be-sized`, which has been kept up to date by speculation when detecting `operand_size`.
2023-07-04fix some dancing between bank size and RegisterBank enum valuesiximeow
in the process, fixed a decoding bug dealing with a0/a1/a2/a3 movs (respected rex.b when rex.b should have been ignored) this seems to maybe improve runtime ever so slightly, but this is really meant as a cleanup commit more than anything.
2023-07-04pick useful numeric values for RegisterBankiximeow
these coincidentally have the general-purpose banks (rB excepted) matching their size in bytes
2023-07-04OperandCode as a u16 caused gross movzwl, this seems just a bit betteriximeow
2023-07-04try slimming down read_opc_hotpath moreiximeow
2023-07-04found an awk ci bugiximeow
2023-07-04disable goodfile builds for benchmakr purposesiximeow
there are a few test breakages i need to go fix now
2023-07-04goodfile: uses steps, dependencies interfaceiximeow
2023-03-05add `Opcode::is_jcc`, `Opcode::is_setcc`, and `Opcode::is_cmovcc` helpersiximeow
this request/suggestion comes from [github](https://github.com/iximeow/yaxpeax-x86/issues/29)! thank you!
2023-02-19deprecate `pub fn cs`, which is an intensely embarrassing bug of a functioniximeow
unlike every other function to test if a particular selector is picked by prefixes, `Prefixes::cs` does not return bool, nor does it check the currently-selected prefix. instead, it modifies the decoded `Prefixes` to set the current prefix to `cs`. this has been a bug all the way since 0.0.1 was released. the function now does nothing, and is marked deprecated. in a future 2.x release, the function will be changed to return `bool` and be in-line with other segment selector-checking functions. in the mean time, a new `Prefixes::selects_cs()` does the correct thing. thank you to @meithecatte who pointed this out in https://github.com/iximeow/yaxpeax-x86/issues/28!
2023-01-02do benchmarking in ci tooiximeow
2023-01-02add a goodfile, will this.. work?iximeow
2022-12-24update old yaxpeax-arch versions in ffi crates to compatible versionsiximeow
2022-12-03bump Cargo.toml to 1.1.51.1.5iximeow
2022-12-03include typo fixes in the changelog!iximeow
2022-12-03describe optimizations included in 1.1.5iximeow
2022-12-03roll up decoding loop changes for 16-bit and 32-bit decodersiximeow
this applies * f338c74656f6eef8b3080fa9f249b1cb733fd1a9 * bece19e6a69b158893abbf56a6cac25eb25d9a32 * 6353f58170d28a142e3b012c2c86f684d50dea45 * 67be1c0983244645a3c762b7aa0601f0d0ba4bb3 * 091f1d66ef853d6339a96e43d71c137ee7d3907a as one unit to both the 16-bit and 32-bit decoders.
2022-12-03apply e7f49509 to 16-bit and 32-bit decodersiximeow
2022-12-03apply 2444de11 to 16-bit and 32-bit decodersiximeow
these don't need the extra `rex`-supporting index space, so they don't have it.
2022-12-03fix incorrect rex selection and field description offsetsiximeow
2022-12-0366 prefixes are common, 0f opcodes are commoniximeow
2022-12-03support a fast path through the decoder for [rex-prefixed]opcode instsiximeow
the overwhelming majority of x86 instructions are either a single-byte opcode or a single-byte opcode with a rex prefix. supporting these specially means that we don't have to length-check on every byte or go through the full decode loop while reading the most likely instructions. this is a significant improvement on typical x86 streams, but comes at a moderate penalty for crafted x86 instructions. the penalty is still not very bad, as the fast path is exited in favor of the full decode loop as soon as we see a non-rex prefix byte; this adds maybe a dozen instructions to the slow path.
2022-12-03just a bit more code motion that seemed to help things sometimesiximeow
2022-12-03reorder prefix checks, extract vex/evex prefix handlingiximeow
sharing vex/evex invalid prefix checks improves codegen a bit, but ordering prefix checks by likeliest prefix first reduces time falling through prefix handling arms. both together are a notable improvement in throughput on typical x86 code. bundled in here is some code motion to where `mem_size = 0` and `operand_count = 2` are executed; this is because, at least on zen2 and cascade lake parts, bunching all stores to the instruction together caused small stalls getting into the decoder. spreading out stores seems to mix these assignments with parts of code that was not using memory anyway, and pipelines better.
2022-12-03move opcode lookup tables into const arraysiximeow
cleanliness, but also slightly better codegen somehow?
2022-12-03replace size lookup logic with a LUTiximeow
the match compiled into some indirect branch awfulness!! no thank you
2022-09-23Fix some typos.Bruce Mitchener
2022-05-30pshufb annotations use incorrect register banks (for now?)iximeow
the correct bank is applied far after register numbers are read. a correct annotation would need to know to defer emission until setting register banks, but also would need to work backwards for the number of bits between the current byte and modrm. not impossible, but substantial refactoring.
2022-05-07more annotation fixes?iximeow
2022-05-01add testing setup for field descriptionsiximeow
2022-04-30support 0x9a callf in 16/32-bit modesiximeow
2022-04-24fix a few issues preventing no-std builds from ... buildingiximeow
this includes a `Makefile` that exercises the various crate configs. most annoyingly, several doc comments needed to grow `#[cfg(feature="fmt")]` blocks so docs continue to build with that feature enabled or disabled. carved out a way to run exhaustive tests; they should be written as `#[ignore]`, and then the makefile will run even ignored tests on the expectation that this will run the exhaustive (but slower) suite. exhaustive tests are not yet written. they'll probably involve spanning 4 byte sequences from 0 to 2^32-1.
2022-01-12fuzz DisplayStyle::C and fix corresponding issues1.1.4iximeow
2022-01-02update changelogiximeow
2022-01-02fix incorrect decoder used in docs testiximeow
2022-01-02actually include a linkiximeow
2022-01-02explicit inline annotations for kinda_uncheckedsiximeow
unfortunately something about the wrapper functions adjusts codegen even when the wrapper functions themselves are just calls to inner functions. the in-tree benchmark (known to not be comprehensive, but enough to spot a difference), showed a ~3.5% regression in throughput with the prior commit, even though it doesn't change behavior at all. explicit #[inline(always)] gets things to a state where the wrapper functions do not penalize performance. for an example of the differences in codegen, see below. before: ``` < 141d4: 48 39 fa cmp %rdi,%rdx < 141d7: 0f 84 0b 4d 00 00 je 18ee8 <_ZN5bench16do_decode_swathe17h694154735739ce4cE+0x4e58> < 141dd: 0f b6 0f movzbl (%rdi),%ecx < 141e0: 48 83 c7 01 add $0x1,%rdi < 141e4: 48 89 7c 24 38 mov %rdi,0x38(%rsp) ... snip ... ``` after: ``` > 141d4: 48 39 ea cmp %rbp,%rdx > 141d7: 0f 84 97 4c 00 00 je 18e74 <_ZN5bench16do_decode_swathe17h694154735739ce4cE+0x4de4> > 141dd: 0f b6 4d 00 movzbl 0x0(%rbp),%ecx > 141e1: 48 83 c5 01 add $0x1,%rbp > 141e5: 48 89 6c 24 38 mov %rbp,0x38(%rsp) ... snip ... ``` there are several spans of code with this kind of change involved; there are no explicit calls to `get_kinda_unchecked` or `unreachable_kinda_unchecked` but clearly a difference did make it through to the benchmark's code. while the choice of `rbp` instead of `rdi` wouldn't seem very interesting, the instructions themselves are more substantially different. `0fb60f` vs `0fb64d00`; to encode `[rbp + 0]`, the instruction requires a displacement, and is one byte longer as a result. there are several instructions so-impacted, and i suspect the increased code size is what ended up changing benchmark behavior. after adding these `#[inline(always)]` annotations, there is no difference in generated code with or without the `kinda_unchecked` helpers!