yaxpeax-x86 - yaxpeax x86 decoder

Age	Commit message	Author
2023-07-05	re-correct operand order of movdq2q	iximeow

2023-07-04	more read_E hoisting	iximeow

2023-07-04	regalloc magic? no useful diff but better perf. 49.61cpi (2233ms)	iximeow

2023-07-04	two more test cases	iximeow

2023-07-04	incidental cleanup, see if inlining in evex helps/hurts (it hurts)	iximeow

2023-07-04	fix xbegin/xend (broken in DecodeCtx::rrr)	iximeow

2023-07-04	finally delete top-level modrm (50.10cpi, 2322ms)	iximeow

2023-07-04	begin project to hoist all read_E (perf better again! 50.21cpi)	iximeow

2023-07-04	fix f6 test imm lengths (perf regression :( )	iximeow

2023-07-04	new high score 49.89cpi (2259ms)	iximeow
	vex/rex prefix cleanup, finally profitable to inline read_0f*_opcode
2023-07-04	more read_E cleanup	iximeow

2023-07-04	new struct for temporary decode context (prefix management)	iximeow

2023-07-04	new record: 50.56cpi (2290ms)	iximeow

2023-07-04	new perf record: 50.79cpi (2316ms)	iximeow

2023-07-04	best: 54.3cpi (2512ms)	iximeow

2023-07-04	new perf record: 51.88cpi (2363ms)	iximeow

2023-07-04	wip	iximeow

2023-07-04	more micro-opts...	iximeow
	set_embedded_instructions was unnecessarily applied to many operand codes; this was never a correctness issue, but meant many operand decodings took a few more instructions than necessary to do nothing. setting all registers to `rax` is unnecessary, only the first register's defaulting to `rax` is effectual. this allows for not using a movabs to load initial rax state. adjust vex decoder inlining. this will be followed up by some cleanup for vex operand codes.
2023-07-04	move some unlikely checks behind a branch that implies their possibility	iximeow
	slightly fewer (perfectly predicted anyway) branches this way
2023-07-04	fidget with read_E inlining AGAIN	iximeow

2023-07-04	make operandcode 16b again	iximeow

2023-07-04	line up Opcode values for simple translation from opc bytes	iximeow

2023-07-04	fixup: handle mnemonic ordering too	iximeow

2023-07-04	avoid committing values to instructions until necessary, likely opc tweaks	iximeow

2023-07-04	make base opcode map translation a bit simpler	iximeow
	now the bits line up with enum variants directly (hopefully..)
2023-07-04	store non-rex expected bank when first witnessing operand size prefix	iximeow
	the expectation here is that we can set a default `vqp_size` pretty cheaply (Prefixes::new is one store, on x86_64 anyway...). then, when we see an `operand_size` prefix, it's rare enough we can pay a little extra to speculate on the likely implication, and update some state (`vqp_size` is probably going to be 2 because of it) accordingly. the cases where `vqp_size` would go unused and this was wasted effort are relatively rare. on the other hand, we can't profitably give `rex` this treatment: `rex.w` would set `vqp_size` to `qword`, but rex-prefixed instructions are so often byte-size registers that updating `vqp_size` (conditionally, no less) is only break-even. so, keep a check for `rex.w` at use site, where it's only a choice between `qword` or `whatever-size-a-non-rex.w-prefixed-instruction-would-be-sized`, which has been kept up to date by speculation when detecting `operand_size`.
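a minimal sketch of the speculation described above. the struct layout and the method names (`see_operand_size`, `see_rex`, `vqp_size()`) are assumptions made up for illustration, not the crate's actual `Prefixes`:

```rust
/// Hypothetical prefix state; NOT the real yaxpeax-x86 `Prefixes`.
#[derive(Debug, Clone, Copy)]
pub struct Prefixes {
    /// Size in bytes an operand would have when `rex.w` is absent.
    vqp_size: u8,
    /// Whether a `rex` prefix with the W bit set has been seen.
    rex_w: bool,
}

impl Prefixes {
    /// Cheap default: assume dword operands until told otherwise.
    pub fn new() -> Self {
        Prefixes { vqp_size: 4, rex_w: false }
    }

    /// Called on a 0x66 prefix: speculate that operands become word-sized.
    pub fn see_operand_size(&mut self) {
        self.vqp_size = 2;
    }

    /// Called on a `rex` byte: only record W, leave `vqp_size` untouched.
    pub fn see_rex(&mut self, rex: u8) {
        self.rex_w = rex & 0b1000 != 0;
    }

    /// Use site: `rex.w` wins, otherwise use the speculated size.
    pub fn vqp_size(&self) -> u8 {
        if self.rex_w { 8 } else { self.vqp_size }
    }
}
```

the only conditional on `rex.w` lives in the accessor, so seeing a `rex` prefix never has to rewrite the speculated size.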
2023-07-04	fix some dancing between bank size and RegisterBank enum values	iximeow
	in the process, fixed a decoding bug dealing with a0/a1/a2/a3 movs (respected rex.b when rex.b should have been ignored). this seems to maybe improve runtime ever so slightly, but this is really meant as a cleanup commit more than anything.
2023-07-04	pick useful numeric values for RegisterBank	iximeow
	these coincidentally have the general-purpose banks (rB excepted) matching their size in bytes
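a rough sketch of what "values matching size in bytes" could look like. the variant names and the value given to the rex-byte bank are assumptions for illustration; the crate's real `RegisterBank` may differ:

```rust
/// Hypothetical register banks; names and the `RexByte` value are made up.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Bank {
    Byte = 1,
    Word = 2,
    Dword = 4,
    Qword = 8,
    /// The exception: its discriminant is not its size in bytes.
    RexByte = 16,
}

impl Bank {
    /// For the general-purpose banks the discriminant already is the size,
    /// so only the rex-byte bank needs a special case.
    pub fn size_in_bytes(self) -> u8 {
        match self {
            Bank::RexByte => 1,
            other => other as u8,
        }
    }
}
```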
2023-07-04	OperandCode as a u16 caused gross movzwl, this seems just a bit better	iximeow

2023-07-04	try slimming down read_opc_hotpath more	iximeow

2023-07-04	found an awk ci bug	iximeow

2023-07-04	disable goodfile builds for benchmark purposes	iximeow
	there are a few test breakages i need to go fix now
2023-07-04	goodfile: uses steps, dependencies interface	iximeow

2023-03-05	add `Opcode::is_jcc`, `Opcode::is_setcc`, and `Opcode::is_cmovcc` helpers	iximeow
	this request/suggestion comes from [github](https://github.com/iximeow/yaxpeax-x86/issues/29)! thank you!
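a small sketch of the shape such helpers might take. the five-variant `Opcode` enum below is a stand-in made up for illustration (the real enum is far larger), and the method bodies are an assumption about how the helpers could be written:

```rust
/// Toy stand-in for the crate's `Opcode`; only a handful of variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Opcode {
    JO,
    JNZ,
    SETZ,
    CMOVZ,
    MOV,
}

impl Opcode {
    /// True for conditional jumps.
    pub fn is_jcc(&self) -> bool {
        matches!(self, Opcode::JO | Opcode::JNZ)
    }

    /// True for conditional byte sets.
    pub fn is_setcc(&self) -> bool {
        matches!(self, Opcode::SETZ)
    }

    /// True for conditional moves.
    pub fn is_cmovcc(&self) -> bool {
        matches!(self, Opcode::CMOVZ)
    }
}
```

callers can then classify control flow with `opcode.is_jcc()` instead of matching every conditional-jump variant by hand.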
2023-02-19	deprecate `pub fn cs`, which is an intensely embarrassing bug of a function	iximeow
	unlike every other function to test if a particular selector is picked by prefixes, `Prefixes::cs` does not return bool, nor does it check the currently-selected prefix. instead, it modifies the decoded `Prefixes` to set the current prefix to `cs`. this has been a bug all the way since 0.0.1 was released. the function now does nothing, and is marked deprecated. in a future 2.x release, the function will be changed to return `bool` and be in line with other segment selector-checking functions. in the meantime, a new `Prefixes::selects_cs()` does the correct thing. thank you to @meithecatte who pointed this out in https://github.com/iximeow/yaxpeax-x86/issues/28!
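a sketch of the deprecation pattern described above, on a made-up `Prefixes` type. the `Segment` enum, the `segment` field, and `set_cs` are assumptions for illustration; only the names `cs` and `selects_cs` come from the commit message:

```rust
/// Hypothetical segment selector state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Segment {
    DS,
    CS,
}

/// Toy stand-in; NOT the real yaxpeax-x86 `Prefixes`.
#[derive(Debug, Clone, Copy)]
pub struct Prefixes {
    segment: Segment,
}

impl Prefixes {
    pub fn new() -> Self {
        Prefixes { segment: Segment::DS }
    }

    /// Hypothetical setter, as a decoder would use on seeing a 0x2e prefix.
    pub fn set_cs(&mut self) {
        self.segment = Segment::CS;
    }

    /// Kept for compatibility: the signature is unchanged, the body is a no-op.
    #[deprecated(note = "does not test anything; use `selects_cs` instead")]
    pub fn cs(&mut self) {}

    /// The query callers actually wanted: read-only, returns `bool`.
    pub fn selects_cs(&self) -> bool {
        self.segment == Segment::CS
    }
}
```

keeping the old signature as a no-op avoids a breaking change inside the 1.x series, while the deprecation note points callers at the query method.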
2023-01-02	do benchmarking in ci too	iximeow

2023-01-02	add a goodfile, will this.. work?	iximeow

2022-12-24	update old yaxpeax-arch versions in ffi crates to compatible versions	iximeow

2022-12-03	bump Cargo.toml to 1.1.5	iximeow

2022-12-03	include typo fixes in the changelog!	iximeow

2022-12-03	describe optimizations included in 1.1.5	iximeow

2022-12-03	roll up decoding loop changes for 16-bit and 32-bit decoders	iximeow
	this applies f338c74656f6eef8b3080fa9f249b1cb733fd1a9, bece19e6a69b158893abbf56a6cac25eb25d9a32, 6353f58170d28a142e3b012c2c86f684d50dea45, 67be1c0983244645a3c762b7aa0601f0d0ba4bb3, and 091f1d66ef853d6339a96e43d71c137ee7d3907a as one unit to both the 16-bit and 32-bit decoders.
2022-12-03	apply e7f49509 to 16-bit and 32-bit decoders	iximeow

2022-12-03	apply 2444de11 to 16-bit and 32-bit decoders	iximeow
	these don't need the extra `rex`-supporting index space, so they don't have it.
2022-12-03	fix incorrect rex selection and field description offsets	iximeow

2022-12-03	66 prefixes are common, 0f opcodes are common	iximeow

2022-12-03	support a fast path through the decoder for [rex-prefixed]opcode insts	iximeow
	the overwhelming majority of x86 instructions are either a single-byte opcode or a single-byte opcode with a rex prefix. supporting these specially means that we don't have to length-check on every byte or go through the full decode loop while reading the most likely instructions. this is a significant improvement on typical x86 streams, but comes at a moderate penalty for crafted x86 instructions. the penalty is still not very bad, as the fast path is exited in favor of the full decode loop as soon as we see a non-rex prefix byte; this adds maybe a dozen instructions to the slow path.
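a simplified sketch of the fast-path idea for 64-bit mode. `Path`, `pick_path`, and the helper functions are hypothetical names; the real decoder handles far more (escape bytes, vex/evex, operand decoding) and is structured differently:

```rust
/// Which decode route a byte sequence would take in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Path {
    /// An optional rex byte followed directly by an opcode byte.
    Fast { rex: Option<u8>, opcode: u8 },
    /// Anything involving a non-rex prefix goes to the full decode loop.
    Slow,
}

/// In 64-bit mode, 0x40..=0x4f are rex prefixes.
fn is_rex(b: u8) -> bool {
    (0x40..=0x4f).contains(&b)
}

/// Segment overrides, operand/address size, lock, repne/rep.
fn is_legacy_prefix(b: u8) -> bool {
    matches!(
        b,
        0x26 | 0x2e | 0x36 | 0x3e | 0x64 | 0x65 | 0x66 | 0x67 | 0xf0 | 0xf2 | 0xf3
    )
}

/// Returns `None` when the input runs out before an opcode byte is found.
pub fn pick_path(bytes: &[u8]) -> Option<Path> {
    let first = *bytes.first()?;
    // At most two bytes are inspected, so at most two length checks.
    let (rex, opcode) = if is_rex(first) {
        (Some(first), *bytes.get(1)?)
    } else {
        (None, first)
    };
    // Any other prefix (or a second rex) abandons the fast path.
    if is_legacy_prefix(opcode) || is_rex(opcode) {
        return Some(Path::Slow);
    }
    Some(Path::Fast { rex, opcode })
}
```

for example, `48 89 c3` (`mov rbx, rax`) stays on the fast path, while `66 90` falls through to the slow path on its first byte.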
2022-12-03	just a bit more code motion that seemed to help things sometimes	iximeow

2022-12-03	reorder prefix checks, extract vex/evex prefix handling	iximeow
	sharing vex/evex invalid prefix checks improves codegen a bit, but ordering prefix checks by likeliest prefix first reduces time falling through prefix handling arms. both together are a notable improvement in throughput on typical x86 code. bundled in here is some code motion to where `mem_size = 0` and `operand_count = 2` are executed; this is because, at least on zen2 and cascade lake parts, bunching all stores to the instruction together caused small stalls getting into the decoder. spreading out stores seems to mix these assignments with parts of the code that were not using memory anyway, and pipelines better.
2022-12-03	move opcode lookup tables into const arrays	iximeow
	cleanliness, but also slightly better codegen somehow?
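a minimal sketch of a lookup table held in a `const` array and built at compile time. `OpClass`, `build_table`, `OPCODE_TABLE`, and `classify` are hypothetical names, and the table covers only three opcode ranges; the crate's real tables are much richer:

```rust
/// Hypothetical opcode classes for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpClass {
    Invalid,
    Push,
    Pop,
    Nop,
}

/// Builds the 256-entry table at compile time.
const fn build_table() -> [OpClass; 256] {
    let mut table = [OpClass::Invalid; 256];
    let mut i: usize = 0x50;
    // 0x50..=0x57: push r64
    while i < 0x58 {
        table[i] = OpClass::Push;
        i += 1;
    }
    // 0x58..=0x5f: pop r64
    while i < 0x60 {
        table[i] = OpClass::Pop;
        i += 1;
    }
    table[0x90] = OpClass::Nop;
    table
}

/// Evaluated once by the compiler; lookups are a single indexed load.
pub const OPCODE_TABLE: [OpClass; 256] = build_table();

pub fn classify(byte: u8) -> OpClass {
    // A `u8` index into a 256-entry array is always in bounds.
    OPCODE_TABLE[byte as usize]
}
```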