From d8083b08dc987adeda73fb13298383c6cf519596 Mon Sep 17 00:00:00 2001
From: iximeow <me@iximeow.net>
Date: Fri, 15 Jan 2021 18:15:04 -0800
Subject: small perf tweaks

clearing reg_rrr and reg_mmm more efficiently is an extremely small win,
but a win

read_imm_signed generally should inline well, but runs afoul of some
inlining heuristic. forcing it inline improves throughput by about 8% on
the (unrealistic) in-repo benchmark
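
for illustration, a rough sketch of what forcing the inline looks like.
this is not the crate's actual code: the name read_imm_signed_sketch, its
signature, and the width handling are all assumptions made up for the
example. only the #[inline(always)] attribute reflects the change
described here.

```rust
/// Hypothetical stand-in for a signed-immediate reader (assumed signature).
/// Reads `width` little-endian bytes and sign-extends the result to i64.
#[inline(always)] // ask the compiler to inline at every call site
fn read_imm_signed_sketch<T: Iterator<Item = u8>>(bytes: &mut T, width: u8) -> Option<i64> {
    // Reject unsupported widths before consuming any input.
    if !matches!(width, 1 | 2 | 4 | 8) {
        return None;
    }
    let mut buf = [0u8; 8];
    for slot in buf.iter_mut().take(width as usize) {
        // `?` bails out with None if the byte source runs dry.
        *slot = bytes.next()?;
    }
    Some(match width {
        1 => buf[0] as i8 as i64,
        2 => i16::from_le_bytes([buf[0], buf[1]]) as i64,
        4 => i32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as i64,
        _ => i64::from_le_bytes(buf), // width == 8
    })
}
```

inline(always) is a strong hint rather than a guarantee, so whether it
helps is still something to confirm with a benchmark.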

it would be great to be able to avoid bounds checks somehow; it looks
like they alone are another ~10% of decode time. i'm not sure how to
pull that off while retaining the generic iterator parameter. might just
not be possible.
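
to make the tradeoff concrete, a minimal sketch with two hypothetical
helpers (read_u32_generic and read_u32_slice are invented for this
example and are not in the crate). the generic version has to check for
exhaustion on every byte; the slice version does one length check up
front, at the cost of giving up the generic iterator parameter.

```rust
/// Generic path (hypothetical): each `next()` call must check whether
/// the byte source is exhausted, so a 4-byte read does four checks.
fn read_u32_generic<T: Iterator<Item = u8>>(bytes: &mut T) -> Option<u32> {
    let mut buf = [0u8; 4];
    for slot in buf.iter_mut() {
        *slot = bytes.next()?;
    }
    Some(u32::from_le_bytes(buf))
}

/// Slice path (hypothetical): one explicit length check covers all four
/// bytes. Returns the decoded value and the remaining input.
fn read_u32_slice(bytes: &[u8]) -> Option<(u32, &[u8])> {
    if bytes.len() < 4 {
        return None;
    }
    let (head, rest) = bytes.split_at(4);
    // `head` is known to hold exactly 4 bytes here, so the optimizer can
    // usually drop the per-index checks (an expectation, not a guarantee).
    Some((u32::from_le_bytes([head[0], head[1], head[2], head[3]]), rest))
}
```

whether the slice form actually recovers the ~10% is not measured here.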
---
 CHANGELOG | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'CHANGELOG')

diff --git a/CHANGELOG b/CHANGELOG
index 1325597..c608941 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -3,6 +3,8 @@
   - AMD-only `monitorx`, `mwaitx`, `clzero`, and `rdpru` are now supported
   - `swapgs` is invalid in non-64-bit modes
   - `rdpkru` and `wrpkru` were incorrectly decoded when modrm bits were not `11`
+* small performance tweaks. read_imm_signed is now inline(always) and some
+  pre-decode initialization is a bit better-packed
 
 ## 0.1.4
 * [long mode only]: fix decoding of rex-prefixed modrm+sib operands selecting index 0b100 and base 0b101
-- 
cgit v1.1