From d8083b08dc987adeda73fb13298383c6cf519596 Mon Sep 17 00:00:00 2001
From: iximeow
Date: Fri, 15 Jan 2021 18:15:04 -0800
Subject: small perf tweaks

clearing reg_rrr and reg_mmm more efficiently is an extremely small win,
but a win

read_imm_signed generally should inline well and runs afoul of some
heuristic. inlining gets about 8% improved throughput on the
(unrealistic) in-repo benchmark

it would be great to be able to avoid bounds checks somehow; it looks
like they alone are another ~10% of decode time. i'm not sure how to
pull that off while retaining the generic iterator parameter. might just
not be possible.
---
 CHANGELOG | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'CHANGELOG')

diff --git a/CHANGELOG b/CHANGELOG
index 1325597..c608941 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -3,6 +3,8 @@
 - AMD-only `monitorx`, `mwaitx`, `clzero`, and `rdpru` are now supported
 - `swapgs` is invalid in non-64-bit modes
 - `rdpkru` and `wrpkru` were incorrectly decoded when modrm bits were not `11`
+* small performance tweaks. read_imm_signed is now inline(always) and some
+  pre-decode initialization is a bit better-packed
 
 ## 0.1.4
 * [long mode only]: fix decoding of rex-prefixed modrm+sib operands selecting index 0b100 and base 0b101
-- 
cgit v1.1