Component level repair

24 Jan 2007

...
  Perhaps you'd like to explain your industry experience to some of our
  customers then? They might listen to such reason coming from you.
I did not have to explain anything to my customers - they already knew
how things worked.
I started out at US Robotics, in their manufacturing division. For the
years I was there, I became quite intimate with the process for
building many of their modems and systems (Total Control). For a while
USR was suffering from poor quality and yields. When they found I was a
very good quality auditor, they gave me nearly free rein to walk the
floor and nitpick. It was actually sort of fun. Because of this
nitpicking job I came to know the techs, and how and why they work.
The techs were responsible for doing reworks from the manufacturing
floor, as well as repairing returned products. Manufacturing reworks
were given priority over customer returns, simply because the returns
had unknown histories - USR just did not know what the customers had
been doing to the products. Each product had a test station or two,
and a complex formula to see if a device was worth repairing.
Sometimes the stations would instantly reject a board, especially if
it was a low cost device. Most of the time it would diagnose the
problem, then the tech would have a fixed amount of time to do the
repair - and the times were not very long at all. If the tech did not
think he could do the repair in the time, was backed up, or had more
pressing repairs to do, he could scrap the device. This happened quite
a bit. The techs at USR were expensive. Their test stations were
expensive. The tools were expensive. The lost profits from scrapping
boards start to fade when compared to the infrastructure costs to
repair them. This is why the bean counters did not complain. The model
USR used made them happy - and it is a model shared by nearly every
electronics manufacturing firm.
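
A rough sketch of the kind of repair-or-scrap check described above. The actual USR formula is not given here, so the field names, the labor rate, and the overhead figure are all hypothetical and only illustrate the trade-off between a board's value and the cost of fixing it.

```python
from dataclasses import dataclass


@dataclass
class RepairJob:
    board_value: float         # value recovered if the repair succeeds (hypothetical)
    est_minutes: float         # tech's estimate of how long the repair takes
    time_limit_minutes: float  # fixed time allowance for this product
    tech_backed_up: bool = False


def should_repair(job, tech_rate_per_hour, overhead_per_repair):
    """Return True if the board looks worth repairing, False if it should be scrapped."""
    # The tech may scrap a board if backed up or if the repair will not fit the allowance.
    if job.tech_backed_up or job.est_minutes > job.time_limit_minutes:
        return False
    # Labor plus a share of test station and tool costs (assumed flat per repair).
    repair_cost = tech_rate_per_hour * job.est_minutes / 60 + overhead_per_repair
    return job.board_value > repair_cost


cheap_modem = RepairJob(board_value=20, est_minutes=10, time_limit_minutes=15)
rack_card = RepairJob(board_value=500, est_minutes=10, time_limit_minutes=15)
print(should_repair(cheap_modem, tech_rate_per_hour=60, overhead_per_repair=25))
print(should_repair(rack_card, tech_rate_per_hour=60, overhead_per_repair=25))
```

With these made-up numbers the low cost board is scrapped and the expensive one is repaired, which matches the pattern described: the cheaper the device, the more often it is rejected outright.
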
Additionally, a lot of this has to do with reliability in the field.
Repaired or reworked devices are not as reliable as something that
comes off the line perfectly. Every time a board is handled,
statistically a little bit of damage occurs. Every time solder hits the
board more damage occurs. Even when a board is dropped, bad things
happen. Most often these are not catastrophic failures, but latent
failures. Latent failures can be weakened chips, cracked chip dies,
cracked crystal oscillators, damaged vias, and so forth. Boards with
latent failures still work - they will pass most of the tests at the
stations - then fail in the field. And back into the repair stream they
go, but now with an angry customer. With too many failures the customer
shops elsewhere. The bean counters get very cross when customers leave.
Then I moved over from being a USR employee to being a USR customer,
while still using some of the same products. ANS hired me to work on
the AOL network backbone (the remains of the old NSFnet). This was a
bunch of T3s when most of the world was a bunch of T1s - VERY serious
amounts of traffic. Downtimes were kept to a minimum, due to
redundancies, but every so often something bad would happen, or some
downtime was needed for an upgrade. AOL watched like a hawk, and
counted the downtime in very small increments (I think seconds per
month, as well as lost packets). When in the field doing a repair or
swap (we engineers did it, as AOL did not trust anyone else), we had
to be very quick about it, as downtime cost AOL enormous amounts of
money per second. Obscene amounts with lots of zeros at the end. It
was not just because people could not connect from their home
computers, so they might go elsewhere. It was also big businesses that
used the AOL backbone that could not connect, and advertisers
whose internet ads were going in the bit bucket. The effects of
downtime spread out fast.
Obviously the way to keep the downtime low, AOL off our backs, and the
bean counters (again) happy was to do swaps as fast as possible,
using factory fresh parts. There was no time to probe around with a
scope or DMM to find a bad resistor or something. A field repair like
this would have saved AOL maybe $7000 on a new Cisco FDDI card, but
incurred $120000 worth of damages (these are realistic numbers). One
does not need to be an accountant to see this.
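
For anyone who wants the arithmetic spelled out, here is a minimal sketch. The $7000 and $120000 figures are the ones quoted above. The per-second downtime rate used for the break-even is purely an assumption for illustration, since no actual rate is given.

```python
def net_benefit_of_field_repair(part_cost, extra_downtime_damages):
    """Money saved by not buying a new part, minus damages from the longer outage."""
    return part_cost - extra_downtime_damages


def break_even_extra_seconds(part_cost, downtime_cost_per_second):
    """Extra seconds of downtime at which a field repair stops paying for itself."""
    return part_cost / downtime_cost_per_second


# Figures from the post: save a $7000 card, incur $120000 in damages.
print(net_benefit_of_field_repair(7_000, 120_000))

# Hypothetical rate of $100 per second of downtime (an assumption, not a quoted number).
print(break_even_extra_seconds(7_000, 100))
```

The first result is negative, meaning the field repair loses money. Under the assumed rate, the second shows that a field repair would have to add barely more than a minute of downtime before a fresh card becomes the cheaper option.
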
And now, after leaving AOL, I see the third part of the triad - the
scrap market. But I have talked about that already.
So you see why I wonder about people who think that component level
troubleshooting works in industry these days. I know there are still a
very few niche parts of the industry that can tolerate this, but they
are very small and shrinking fast.
--
Will
