On Tue, 10 Apr 2012, Pontus Pihlgren wrote:
Just a few pointers.
Check out http://forums.nekochan.net. The site is down at the moment but
is usually quite alive and has more than a few members with Onyx2 racks
(myself included).
The NekoChan guys are great - but, as you say, dead in the water recently
or I would've posted there first. I'm being impatient :)
The documentation seems to be pretty sparse on these sorts of specifics -
I've been vaguely wondering if an SGI tech generally taught owners this
sort of thing.
When it comes to spares... well, the PSUs seem to go, and I'd like to keep
my spares for when I bring my own rack up. I could probably part with a
few node boards (uncertain what I actually have, will have to check).
Believe me, I hear you. And the PSUs are unbelievably heavy, so shipping
would kill. And complicated. I was delighted when I popped it out and
refreshed my memory on its output though - 3.45 V at 375 amps. 375!
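
For a sense of scale, that output works out to roughly 1.29 kW. A minimal
sketch of the arithmetic, assuming the two figures quoted above are the
rated voltage and current of a single output (the function name is purely
illustrative):

```python
def rail_power_watts(volts: float, amps: float) -> float:
    """DC power delivered by one supply output: P = V * I."""
    return volts * amps


# Figures quoted above for the PSU output (assumed to be rated values).
power = rail_power_watts(3.45, 375)
print(f"{power:.2f} W")
```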
The MMSC fails in the display, if I'm not mistaken; perhaps a replacement
LCD could be found? But you should be able to fire up the whole system
with just one.
My understanding is the MMSCs are far more prone to fail than the screens.
The MMSCs will bring up the NIC lights on start, so I've been using that
as an indicator and only one seems to work.
More importantly, unfortunately, every rack needs an MMSC to fire. The
MMSC connects to the MSC to co-ordinate the power sequencing of the
modules and each MMSC can only talk to two modules at once.
And finally, I love your YouTube channel! Keep the videos coming :)
Thank you!
I'm trying to put up something w/all these damned SGIs for a joke but it
keeps getting further delayed because of all the freakin' problems I'm
having. I'll get there in the end. Three racks gave me 24 processors the
other day, which blew me away. But I'd sure like to go for 40 (I've got a
couple spare nodeboards).
- JP
On Tue, Apr 10, 2012 at 08:47:33AM -0500, JP Hindin wrote:
Greetings;
Long-story-short, a couple of years ago I picked up six Onyx2 racks and
have been moving them around with me without ever actually firing them up.
I've finally got myself sorted and have been slowly working through
bringing things up and having some successes, but every step closer has me
finding a new problem.
My set-up right now has one graphics head and five compute nodes cabled
together in a daisy-chain (not enough CrayLinks for anything else). It
appears all but one of the MMSCs are shot, so I'm doing manual start-ups
using the keys.
My current confusion is how to nominate which system becomes the Global
Master. For some odd reason whenever I bring up three racks the machine
I've "picked" as the master (keyboard/mouse/gfx head) comes up just fine
and boots into IRIX, but whenever I add two more nodes things get a bit
more fuzzy and the Global Master appears to migrate around.
I had initially believed that the last rack in the power-up sequence would
always become the Global Master, since it goes and finds all the rest, but
this apparently is not the case... or perhaps there are corollaries I'm
unaware of.
The more times I turn this thing on and off the more hardware is failing
on me, not unexpectedly. I've lost a PSU, a node board and now one of the
racks has started making a worryingly hot-electrical smell. I'd really
like to get it all working together just once before I get old and grey.
Cheers;
- JP