On 1/8/2019 3:51 PM, Guy Sotomayor Jr via cctalk wrote:
Some architectures (I'm thinking of the latest Intel
CPUs) have a small loop cache
whose aim is to keep a loop entirely within that cache. That cache operates at the
full speed of the instruction fetch/execute cycle (actually I think it keeps the
decoded uOps), i.e. you can't go faster. L1 cache accesses impose a penalty, and of
course there is instruction decode time as well; both of those are avoided.
TTFN - Guy
I bet I/O loops throw everything off.
Ben.