On Tue, Apr 17, 2018 at 1:49 PM, Fred Cisin via cctalk <
cctalk at classiccmp.org> wrote:
> I always found it amusing that many programs (even FORMAT!) would fail
> with the wrong error message if their internal DMA buffers happened to
> straddle a 64K block boundary. THAT was a direct result of failure to
> adequately integrate, or at least ERROR-CHECK!, the segment-offset kludge
> bag. Different device drivers and TSRs could affect at 16 byte intervals
> where the segment of a program ended up loading.
> It was NOT hard to normalize the Segment:Offset address and MOVE the
> buffer to another location if it happened to be straddling.
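
For anyone who has not had to do the arithmetic, here is a minimal sketch
in Python (not period code) of the straddle check and the normalization
described above. It assumes the usual real-mode rule that the physical
address is segment * 16 + offset, and that the limit is a 64K physical page
because the PC's DMA hardware cannot carry a transfer across one. The
function names and the example segment values are made up for illustration.

```python
PAGE = 0x10000  # one 64K physical page

def physical(segment, offset):
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset

def normalize(segment, offset):
    """Equivalent segment:offset pair with the offset reduced to 0..15."""
    addr = physical(segment, offset)
    return addr >> 4, addr & 0xF

def straddles_64k(segment, offset, length):
    """True if a buffer of `length` bytes crosses a 64K physical boundary."""
    start = physical(segment, offset)
    end = start + length - 1
    return start // PAGE != end // PAGE

def relocate(segment, offset, length):
    """Return a normalized address for the buffer that does not straddle.

    Assumption for this sketch: length <= 64K and the memory at the start
    of the next 64K page is free to use. A real program would have to
    allocate that space.
    """
    if not straddles_64k(segment, offset, length):
        return normalize(segment, offset)
    next_page = (physical(segment, offset) // PAGE + 1) * PAGE
    return next_page >> 4, 0
```

With a 512-byte buffer at offset 0, a load segment of 0x1FE0 is fine, but
0x1FE1, one 16-byte paragraph higher, puts the last bytes of the buffer over
the boundary. That is how an extra driver or TSR could flip the behavior.
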
Huh. I would guess that this is the source of a DOS bug that I found back
in the day, reported to MS, and never heard back.
Working on a large application, ground out a new release, got a call from
production (the guy who ran the floppy duplicator) saying that his quality
control tests were failing -- the application on the floppies wouldn't
start. I grabbed one, it ran on my machine ok, wouldn't run on production's
test machine. Confiscated that machine and started swapping out hardware,
nothing helped. Tried adding tracing code to the application to see if I
could narrow down the failure point, but discovered that changing the
executable would change the behavior -- a heisenbug. Eventually worked out
that the crash was related to the address that the executable was loaded at,
which was dependent on the various TSRs that were loaded -- with the
production test machine driver configuration, the load address would
reliably crash the application. If I adjusted the load address to match on
my machine, I would get the same crash.
To figure out what the crash was about, I ended up writing a small TSR that
set the "break on every instruction" bit, and would push the PC and opcode
out the serial port, and collected the data streams for the crashing and
non-crashing configurations. Diffed the data streams to find where they
were diverging.
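
The diffing step amounts to finding the first record where the two traces
disagree. A minimal sketch in Python, not the tooling used at the time; it
assumes each captured stream has already been parsed into a list of
(pc, opcode) pairs, and the function name is hypothetical.

```python
def first_divergence(good, bad):
    """Return (index, good_record, bad_record) at the first mismatch.

    `good` and `bad` are lists of (pc, opcode) tuples. If one trace is a
    strict prefix of the other, the missing record is reported as None.
    Returns None when the traces are identical.
    """
    for i, (g, b) in enumerate(zip(good, bad)):
        if g != b:
            return i, g, b
    if len(good) != len(bad):
        i = min(len(good), len(bad))
        g = good[i] if i < len(good) else None
        b = bad[i] if i < len(bad) else None
        return i, g, b
    return None
```

The record just before the reported index is the last instruction both
runs agreed on, which is where to start looking.
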
The application was large enough to have overlays -- as the program was
starting up, an overlay with run-once initialization code would be read
from disk and jumped into; in the crash configuration, the overlay code
seemed not to be read, or to be read incorrectly -- the first opcode in the
overlay was wrong.
Wrote a simple program that read a data file of the same size as the
overlay into different locations in memory and checked whether the data was
read correctly, demonstrating that DOS was failing for one buffer address
but not another, documented it, sent it off to MS, and told management that
the bug was MS's and there really wasn't anything we could do.
Never heard back from them, and have actively avoided MS software ever
since.
A buffer boundary straddling error certainly sounds like the issue I was
seeing; it feels very odd to see a plausible explanation 35 years later.
-- Charles