On Fri, 22 Jun 2001, Lawrence LeMay wrote:
> Really? I never had problems with Smartdrive on 286 and 386's. I remember
> running an IBM-AT with something like 10 Megs of ram, and using about 6-8
> Megs with smartdrive as a huge disk cache. Great way to make the system
> run like a bat outta hell (well, hard drives were big and slow back then).
> That was probably on a DOS 3.3 system, maybe with Windows 3.0.
> Hmm, of course we never used disk compression, so maybe that's why I saw
> no problems and only major benefits to using Smartdrive.
No, the problems were not related to compression, although that could
exacerbate any other problems.
The problem was with the delayed write.
If you had SMARTDRV configured for read caching only, then it worked fine.
To turn off write caching, invoke it with /X; alternatively, SMARTDRV C
(not SMARTDRV /C) would turn on read caching with no write caching for
drive C. I also like to always use the /V (verbose) option with SMARTDRV,
MSCDEX, etc.
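For concreteness, the invocations described above would look something like
this in AUTOEXEC.BAT (exact switch behavior varied somewhat between SMARTDRV
versions, and the paths here are typical rather than guaranteed, so treat
this as a sketch):

```
REM Read caching only for drive C:, no write caching, verbose output:
C:\DOS\SMARTDRV.EXE C /V

REM Or disable write-behind caching with the /X switch:
C:\DOS\SMARTDRV.EXE /X /V
```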
The problems with write caching:
Example 1: a user saves a WordPervert document, exits WordPervert, and,
being "done", shuts the computer off. The write cache had not been
written to disk.
Example 2: power failure while there is content in the write cache.
Example 3: SMARTDRV rearranged the sequence of writes for greater
efficiency. If interrupted, the alteration of the sequence of writes meant
serious discrepancies between what the DIRectory showed v. what was on the
disk (as opposed to "normal" disk I/O, where an interruption would mean
that later changes were lost, but earlier changes had been completed).
Example 4: disk error during write. In "normal" disk I/O, an error writing
to the disk would invoke the critical error handler, aka "ABORT, RETRY,
IGNORE, FAIL". If RETRY doesn't do it, it is possible to ABORT out of the
currently running program, or IGNORE the error (continue on, pretending
that nothing had happened).
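The failure modes above all come down to one thing: a write-behind cache
tells the caller "done" before anything hits the disk. Here is a toy model
of that behavior (names and structure are mine for illustration, not
SMARTDRV's actual implementation):

```python
# Toy model of a write-behind (write-back) cache: writes are acknowledged
# immediately but only buffered in RAM; they reach "disk" on an explicit
# flush. Illustrative sketch only -- not how SMARTDRV was actually coded.

class WriteBackCache:
    def __init__(self):
        self.disk = {}      # sector -> data actually on the disk
        self.pending = {}   # sector -> data acknowledged but not yet written

    def write(self, sector, data):
        self.pending[sector] = data
        return True         # the caller is told the write "succeeded" NOW

    def flush(self):
        # May be reordered for efficiency (here: sorted by sector number),
        # which is Example 3's hazard if interrupted partway through.
        for sector in sorted(self.pending):
            self.disk[sector] = self.pending[sector]
        self.pending.clear()

    def power_off_without_flush(self):
        # Simulate the user switching the machine off "when done":
        # everything still buffered in RAM is simply lost.
        self.pending.clear()

cache = WriteBackCache()
cache.write(7, "document text")   # application believes the save succeeded
cache.power_off_without_flush()   # Example 1: user shuts off the machine
print(cache.disk.get(7))          # -> None: the document never hit the disk
```

The application has no way to know the difference: `write()` returned
success either way.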
But with write caching, SMARTDRV has already told the system that the
write had been successfully completed, before it actually tried. (Kinda
like when you tell your boss that you finished the project, but left it
at home. At lunchtime you dash home, fire up the computer, and it blue
screens.) SMARTDRV can RETRY. But it CAN'T ABORT out of the
middle of the program, because the program was "successfully" finished
long ago. It can't even IGNORE.
During Windoze 3.10 installation (I was a Beta), if you hit a disk write
error (hard drives were not as reliable then as now), you could write down
which file failed, IGNORE the error, finish the installation, then go back
and manually install the WINGBAT ITALICS file, or whatever. BUT Windoze
3.10 installed SMARTDRV with write caching enabled. It would not give ANY
choice except RETRY. Power cycling was the only way to regain
control. But although all but 3 of the files had been copied, the
DIRectory update was still in the write cache and had not been written,
so, once rebooted,
there was no visible trace of the partial installation. I called the BETA
support staff, and explained the problem. They said, "but that's a
HARDWARE problem". I said, "1) It is the responsibility of the operating
system to handle hardware problems without causing further
complications. 2) leaving it that way was going to cost millions of
dollars in support, and the need for free upgrades to fix it."
When MS-DOS 6.00 came out, without any notification to the user, nor
request for confirmation, it installed SMARTDRV with write caching
enabled. Few users were aware of SMARTDRV, or that it had been
installed. The user was given an option to run drive compression. Every
user who ran drive compression was aware that they had done so, but
unaware that they were also running SMARTDRV.
Users started losing data (see above examples). They KNEW that they were
using drive compression, but DIDN'T know that they were using SMARTDRV.
Infoworld wrote several DOZEN articles: "Users of new DOS having
problems", "DBLSPACE USERS losing data", "More problems with DBLSPACE",
... ALL blaming the drive compression, but never isolating WHAT was going
wrong.
The Infoworld "lab" set up a "test" system (there IS a difference between
demonstrating the existence of a problem v. isolating it!) that consisted
of a script that would do some spreadsheet macros, then do some
WordPervert macros, and then do a cold reboot and repeat (see above
examples!). Sure 'nuff they started getting corruption of the disk,
therefore "PROVING that DBLSPACE was corrupting disks". Bill Gates called
the editor and claimed that the Infoworld test wasn't valid and did NOT
prove that DBLSPACE was at fault. (He did NOT explain that there was
another known MICROS~1 culprit). The editor reported it as an attempt at
intimidation.
Eventually, MICROS~1 had to "fix the problems with DBLSPACE". They came
out with DOS 6.20. They implemented a long feature wish list of
reliability related features, including things like asking for
confirmation before over-writing an existing file. They also "fixed
DBLSPACE".
The changes that they made to "fix DBLSPACE":
1) turned off write caching in SMARTDRV
2) IF the user turned write caching back on, then when exiting a program,
SMARTDRV would not let DOS display the prompt until the write cache had
been written (thus taking care of the user who THOUGHT that it was done).
3) if write caching was enabled, SMARTDRV didn't rearrange the write
sequence.
[see what those fixes to DBLSPACE have in common?]
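The key fix can be sketched in a few lines (again, the names and structure
here are my illustration, not MICROS~1's code): the prompt does not come
back until the cache has been flushed, and pending writes are committed in
the order they were issued rather than being rearranged.

```python
# Sketch of the DOS 6.20 behavior change: control does not return to the
# user until every buffered write is on disk, and writes are committed in
# issue order (no reordering). Illustrative only.

class SaferCache:
    def __init__(self):
        self.disk = []
        self.pending = []   # kept in issue order -- fix #3: no reordering

    def write(self, data):
        self.pending.append(data)

    def flush(self):
        self.disk.extend(self.pending)   # committed in original order
        self.pending.clear()

    def return_to_prompt(self):
        # Fix #2: the prompt is not displayed until the cache is flushed,
        # so a user who sees "C:\>" and powers off loses nothing.
        self.flush()
        return "C:\\>"

cache = SaferCache()
cache.write("chapter 1")
cache.write("chapter 2")
prompt = cache.return_to_prompt()
print(cache.disk)   # ['chapter 1', 'chapter 2'] -- safe to power off now
```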
Infoworld reported that MICROS~1 had "Fixed DBLSPACE in the new version".
If you never turned off the machine prematurely, and never had an
unrecoverable write error, then you might never have experienced the
problems. You were lucky.
--
Grumpy Ol' Fred cisin(a)xenosoft.com