OT? Upper limits of FSB
Guy Sotomayor Jr
ggs at shiresoft.com
Tue Jan 8 16:51:51 CST 2019
Some architectures (I’m thinking of the latest Intel CPUs) have a small loop cache
whose aim is to hold a loop entirely within that cache. That cache runs at the full
speed of instruction fetch/execute (actually I think it holds the already-decoded uOps),
i.e. you can’t go any faster. It avoids both the L1 cache access penalty and the
instruction decode time.
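As a minimal sketch (not from the original exchange), here is a C comparison of a tight
summation loop against a hand-unrolled version. Whether the tight or unrolled loop wins
depends on the microarchitecture (loop/uOp cache size, decode bandwidth, branch handling),
which is exactly the point being discussed; the timings are only illustrative, and you'd
want to compile at a low optimization level (e.g. -O1) so the compiler doesn't rewrite
the loops itself.

#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define N 100000000ULL   /* iteration count, divisible by 4 for the unrolled loop */

int main(void)
{
    volatile uint64_t sink;   /* keeps the compiler from discarding the results */
    uint64_t s;
    clock_t t0, t1;

    /* Tight loop: body is small enough to stay resident in a loop/uOp cache. */
    s = 0;
    t0 = clock();
    for (uint64_t i = 0; i < N; i++)
        s += i;
    t1 = clock();
    sink = s;
    printf("tight loop:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    /* Unrolled x4: fewer branch instructions executed, but a larger loop body. */
    s = 0;
    t0 = clock();
    for (uint64_t i = 0; i < N; i += 4) {
        s += i;
        s += i + 1;
        s += i + 2;
        s += i + 3;
    }
    t1 = clock();
    sink = s;
    printf("unrolled loop: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    (void)sink;
    return 0;
}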
TTFN - Guy
> On Jan 8, 2019, at 2:43 PM, Chuck Guzis via cctalk <cctalk at classiccmp.org> wrote:
>
> On 1/8/19 1:23 PM, Tapley, Mark via cctalk wrote:
>
>> Why so (why surprising, I mean)? Understood an unrolled loop executes
>> faster...
>
> That can't always be true, can it?
>
> I'm thinking of an architecture where the instruction cache is slow to
> fill and multiple overlapping operations are involved and branch
> prediction assumes a branch taken. I'd say it was very close in that case.
>
> --Chuck
>