<feed xmlns='http://www.w3.org/2005/Atom'>
<title>yaxpeax-x86, branch opts</title>
<subtitle>yaxpeax x86 decoder</subtitle>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/'/>
<entry>
<title>describe optimizations included in 1.1.5</title>
<updated>2022-12-03T23:03:13+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-12-03T23:03:13+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=8eef2114ece0b4a96866f075e87f195a804d61cb'/>
<id>8eef2114ece0b4a96866f075e87f195a804d61cb</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>roll up decoding loop changes for 16-bit and 32-bit decoders</title>
<updated>2022-12-03T22:53:40+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-12-03T22:53:40+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=64abb4e439230c8b4b8a6534989784e362efb12d'/>
<id>64abb4e439230c8b4b8a6534989784e362efb12d</id>
<content type='text'>
this applies
* f338c74656f6eef8b3080fa9f249b1cb733fd1a9
* bece19e6a69b158893abbf56a6cac25eb25d9a32
* 6353f58170d28a142e3b012c2c86f684d50dea45
* 67be1c0983244645a3c762b7aa0601f0d0ba4bb3
* 091f1d66ef853d6339a96e43d71c137ee7d3907a

as one unit to both the 16-bit and 32-bit decoders.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
this applies
* f338c74656f6eef8b3080fa9f249b1cb733fd1a9
* bece19e6a69b158893abbf56a6cac25eb25d9a32
* 6353f58170d28a142e3b012c2c86f684d50dea45
* 67be1c0983244645a3c762b7aa0601f0d0ba4bb3
* 091f1d66ef853d6339a96e43d71c137ee7d3907a

as one unit to both the 16-bit and 32-bit decoders.
</pre>
</div>
</content>
</entry>
<entry>
<title>apply e7f49509 to 16-bit and 32-bit decoders</title>
<updated>2022-12-03T22:24:27+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-12-03T22:24:27+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=e871aa62141cd29a76c8901fbe97912142226d4a'/>
<id>e871aa62141cd29a76c8901fbe97912142226d4a</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>apply 2444de11 to 16-bit and 32-bit decoders</title>
<updated>2022-12-03T22:12:38+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-12-03T22:12:38+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=c549f10115e727a6c0694e1ca59704826910e165'/>
<id>c549f10115e727a6c0694e1ca59704826910e165</id>
<content type='text'>
these don't need the extra `rex`-supporting index space, so they don't
have it.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
these don't need the extra `rex`-supporting index space, so they don't
have it.
</pre>
</div>
</content>
</entry>
<entry>
<title>fix incorrect rex selection and field description offsets</title>
<updated>2022-05-30T18:18:22+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-05-01T20:53:51+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=091f1d66ef853d6339a96e43d71c137ee7d3907a'/>
<id>091f1d66ef853d6339a96e43d71c137ee7d3907a</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>66 prefixes are common, 0f opcodes are common</title>
<updated>2022-05-30T18:18:22+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-04-23T02:49:43+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=67be1c0983244645a3c762b7aa0601f0d0ba4bb3'/>
<id>67be1c0983244645a3c762b7aa0601f0d0ba4bb3</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>support a fast path through the decoder for [rex-prefixed]opcode insts</title>
<updated>2022-05-30T18:18:22+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-04-21T09:35:38+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=6353f58170d28a142e3b012c2c86f684d50dea45'/>
<id>6353f58170d28a142e3b012c2c86f684d50dea45</id>
<content type='text'>
the overwhelming majority of x86 instructions are either a single-byte
opcode or a single-byte opcode with a rex prefix. supporting these
specially means that we don't have to length-check on every byte or
go through the full decode loop while reading the most likely
instructions. this is a significant improvement on typical x86 streams,
but comes at a moderate penalty for crafted x86 instructions.

the penalty is still not very bad, as the fast path is exited in favor
of the full decode loop as soon as we see a non-rex prefix byte; this
adds maybe a dozen instructions to the slow path.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
the overwhelming majority of x86 instructions are either a single-byte
opcode or a single-byte opcode with a rex prefix. supporting these
specially means that we don't have to length-check on every byte or
go through the full decode loop while reading the most likely
instructions. this is a significant improvement on typical x86 streams,
but comes at a moderate penalty for crafted x86 instructions.

the penalty is still not very bad, as the fast path is exited in favor
of the full decode loop as soon as we see a non-rex prefix byte; this
adds maybe a dozen instructions to the slow path.
</pre>
</div>
</content>
</entry>
<entry>
<title>just a bit more code motion that seemed to help things sometimes</title>
<updated>2022-05-30T18:18:21+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-04-21T09:35:09+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=bece19e6a69b158893abbf56a6cac25eb25d9a32'/>
<id>bece19e6a69b158893abbf56a6cac25eb25d9a32</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>reorder prefix checks, extract vex/evex prefix handling</title>
<updated>2022-05-30T18:16:52+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-04-21T09:31:40+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=f338c74656f6eef8b3080fa9f249b1cb733fd1a9'/>
<id>f338c74656f6eef8b3080fa9f249b1cb733fd1a9</id>
<content type='text'>
sharing vex/evex invalid prefix checks improves codegen a bit, and
ordering prefix checks by likeliest prefix first reduces time spent
falling through prefix handling arms. both together are a notable
improvement in throughput on typical x86 code.

bundled in here is some code motion changing where `mem_size = 0` and
`operand_count = 2` are executed; this is because, at least on zen2 and
cascade lake parts, bunching all stores to the instruction together
caused small stalls getting into the decoder. spreading out stores seems
to mix these assignments with parts of the code that were not using
memory anyway, and pipelines better.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
sharing vex/evex invalid prefix checks improves codegen a bit, and
ordering prefix checks by likeliest prefix first reduces time spent
falling through prefix handling arms. both together are a notable
improvement in throughput on typical x86 code.

bundled in here is some code motion changing where `mem_size = 0` and
`operand_count = 2` are executed; this is because, at least on zen2 and
cascade lake parts, bunching all stores to the instruction together
caused small stalls getting into the decoder. spreading out stores seems
to mix these assignments with parts of the code that were not using
memory anyway, and pipelines better.
</pre>
</div>
</content>
</entry>
<entry>
<title>move opcode lookup tables into const arrays</title>
<updated>2022-05-30T18:16:52+00:00</updated>
<author>
<name>iximeow</name>
<email>me@iximeow.net</email>
</author>
<published>2022-04-21T09:28:34+00:00</published>
<link rel='alternate' type='text/html' href='http://git.iximeow.net/yaxpeax-x86/commit/?id=e7f4950985ab9976e9d00599c9225327c64f6439'/>
<id>e7f4950985ab9976e9d00599c9225327c64f6439</id>
<content type='text'>
cleanliness, but also slightly better codegen somehow?
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cleanliness, but also slightly better codegen somehow?
</pre>
</div>
</content>
</entry>
</feed>
