On Apr 26, 2013, at 12:45 PM, Mouse <mouse at Rodents-Montreal.ORG> wrote:
Tools from Xilinx, Altera, Lattice, and MicroSemi run
on Linux as
well as Windows.
Mouse does not run Linux or Windows
Right.
or x86 processors, to my knowledge.
Wrong. :( I do have and run some x86 machines, because those are what
other people are throwing out. I don't like them; I use them in places
where I am mostly isolated from them, such as backup server and house
routers; my screen-and-keyboard machines are SPARCstation 20s. Most of
my other architectures (68k, MIPS, ARM, PowerPC, Alpha) I have/run in
order to help keep my code CPU-portable. Two others have idiosyncratic
bases: VAX for the emotional attachments I have to it and Super-H
because the one I have has interesting hardware integrated with it which
I want to play with.
My bad, I thought x86 was more or less banished from your domain. :-)
Is the Super-H machine you're talking about a Dreamcast? Those are
pretty neat machines, though I never had one myself. People are still
releasing new (good) games for them, too!
Xilinx and
Altera both used to run on Solaris and HP-UX for Sparc and
HP-PA, respectively, but those were discontinued many years ago. For
modern FPGAs, you really need to be running pretty beefy, modern iron
to compile in any sort of reasonable timeframe.
Why? I can't see this as anything but the fault of the compiler.
It's the complexity of the devices. It's not like CPU code, where
everything is executed linearly; it's a lot closer to a PCB
auto-router, because it's a search over a 2D space of possible
routes, except the problem is blown up to several hundred
thousand pre-routed nets to select from. The actual synthesis
code (taking the
HDL and turning it into a device-independent intermediate netlist)
takes about as long as compiling the equivalent length of C code;
it's the "fitting" process (taking the intermediate netlist,
placing all the nodes, then choosing routes that will meet the
timing constraints) that takes so long.
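(Purely as a toy illustration of why fitting dominates the runtime: the
vendors' actual algorithms are trade secrets, so everything below is my own
invented sketch, not how Quartus or ISE really works. It models placement as
simulated annealing over a tiny grid, minimizing total Manhattan wire length.)

```python
import math
import random

random.seed(1)

# Hypothetical toy model: place a handful of connected nodes on a small
# 2D grid and minimize total Manhattan wire length.  Real FPGA fitters
# are far more elaborate (timing-driven, congestion-aware, hierarchical),
# but the shape of the search is similar: perturb a placement, keep
# improvements, and occasionally accept a regression so the search can
# escape local minima.

GRID = 8
NETS = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # pairs of connected nodes

def wirelength(place):
    """Total Manhattan distance over all nets."""
    return sum(abs(place[a][0] - place[b][0]) + abs(place[a][1] - place[b][1])
               for a, b in NETS)

def anneal(nodes=4, steps=2000, temp=4.0, cool=0.999):
    place = {n: (random.randrange(GRID), random.randrange(GRID))
             for n in range(nodes)}
    cost = wirelength(place)
    for _ in range(steps):
        n = random.randrange(nodes)
        saved = place[n]
        place[n] = (random.randrange(GRID), random.randrange(GRID))
        new_cost = wirelength(place)
        # Always keep an improvement; keep a regression with probability
        # exp(-delta/T), which shrinks as the temperature cools.
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
        else:
            place[n] = saved
        temp *= cool
    return place, cost
```

Even this toy has to re-evaluate every net on every move; scale the grid and
net count up five orders of magnitude and add timing analysis per pass, and
the 45-minute compile stops being surprising.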
My 8-core Xeon
with 8 GB of RAM takes 45 minutes to compile even a half-full
design for a medium-sized Arria II (Altera's mid-range FPGA), so
I don't think a SPARCstation 20 would be an ideal candidate for
running the exhaustive search algorithms necessary for building
modern devices,
"Necessary"? Or "necessary to get high utilization percentages" or
some such? That is, is that exhaustive search necessary for
correctness or is it akin to optimization in a C compiler?
Optimization in a C compiler usually involves some comparatively
simple dataflow analysis and then eliminating redundancies which
are revealed by said analysis. Fitting an FPGA involves making
a best guess at what the optimum placement is (sometimes with
"floorplan" guidance from the designer, which can help a lot)
and then making many, many passes at what the optimum routes are
to meet the timing constraints specified. I don't know what the
exact algorithms are, since they're considered trade secrets,
but I know the way at least Xilinx's software iterates looks
close to a genetic algorithm (it keeps the most successful pass
so far and varies it until it either closes timing or gives up).
How long it takes obviously depends on the complexity of the
design. If you just throw a 16-bit counter design in there,
or a moderately complex design with a very slow clock, it's
going to take a minute and a half to compile even on the biggest
FPGA out there because it doesn't have to make many tries before
it meets timing. But if you crank the clock in the timing model
up to 1 GHz, it's going to try a whole lot more potential routes
before it throws its hands up in disgust (within reason; if you
did it with the 16-bit counter, it would very quickly converge on
a "this is as fast as she goes, Cap'n" value anyway).
I suspect there are much cheaper algorithms that would
get most of the
way there, wherever "there" is - but I can't experiment with that
either because of the vendors' draconian IP stupidity.
Maybe. CAD algorithm design is a huge and active area of
academic research, because there's a lot of money to be made if
you can develop an algorithm that closes faster and gives higher
maximum frequencies. I agree that it's incredibly dumb that the
bitstream semantics are hidden. There's always an opportunity
to reverse-engineer from subtly different builds, but I do think
the EULAs probably prohibit that... if you did it for your own
edification, though, no one would know.
but if you
were able to write your own generation software, you could
at least route it yourself (with all the hazards that entails).
If. Exactly.
Alas.
- Dave