Don, you're just seeing one side of the picture.
From the PoV of the programmer just seeking the easiest way of coding a
problem on an 8-bit, byte-oriented machine, it is indeed true that an
undifferentiated sequence of bytes has considerable advantages.
However, in the historical world - and even nowadays in parts of the real
world - other considerations come into play, most notably efficiency and
performance.
It is a truism throughout computing that an important skill is the selection
of the correct level of abstraction (and often of indirection) for the
problem.
For underutilised machines (e.g. personal desktops), programmer ease is
typically the primary requirement. For the heavily scheduled batch machines
of 40 years ago, and for real-time applications, one may need to get nearer
the metal.
Andy Holt wrote:
Yes, I *know* this has been done other ways in the past.
What I am trying to figure out is the rationale behind why it has
(apparently) migrated into the file *name*.
That, I think, was a necessary side effect of the original Unix design
decision that "a file is a sequence of characters" without special
properties that are known to the operating system.
IMO, a file *is* an untyped string of *bytes*. The OS shouldn't
care about its representation (none of this "text mode" vs.
"binary mode" crap). Its "attributes" should solely be things
like size, creation time, ACLs, etc.
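To make that view concrete: a minimal C sketch, assuming a POSIX system, of
roughly everything the kernel will tell you about a file through stat() -
size, timestamps, permission bits - and nothing at all about record
structure or character representation:

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat sb;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &sb) != 0) {
        perror("stat");
        return 1;
    }
    printf("size:  %lld bytes\n", (long long)sb.st_size);
    printf("mode:  %o\n", (unsigned)(sb.st_mode & 07777));
    printf("mtime: %lld\n", (long long)sb.st_mtime);
    /* no field here says "80-byte fixed records" or "EBCDIC text" -
       the kernel simply does not know */
    return 0;
}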
An old problem of differing character sizes - typically on 36-bit-word
machines, where character sizes might be 6, 7, 8, or 9 bits - is now
reappearing with Unicode.
I could argue that application programs ought to be "blind" to the
representation of characters in a simple serial text file. In Unix the "dd"
program tries, with modest success, to handle the problem. In the '60s and
'70s the saving in file space from using fixed-length records without
storing "newline" (or whatever) could be vital.
So, as far as the OS was concerned, files might be serial, sequential,
indexed sequential, random (and perhaps other organisations), with fixed or
variable record sizes (see the DCB card in OS/360 JCL); there may be a
IMO, this was a mistake. It forces the OS to know too much
about the applications that run on it -- instead of being a
resource manager. I.e. it should implement mechanisms, not
policy.
There is a problem here - if you present only an abstraction of the hardware
to the programmer, you have no means of using knowledge of the underlying
hardware to gain performance. Forty years ago there were large books on how
to design indexed-sequential files ... and for good reason. If your
carefully optimised layout gets abstracted away from under you, performance
can drop by orders of magnitude.
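Even the simplest case shows the gap. With fixed-length records, record N
sits at a known offset and can be fetched with a single seek; in a plain
byte-stream file you would have to scan for newlines from the beginning.
(Indexed-sequential proper was far more elaborate; the record length and
file name below are made up for illustration.)

#include <stdio.h>

#define RECLEN 80

/* fetch record n directly: one seek, no scanning */
int read_record(FILE *f, long n, char *buf)
{
    if (fseek(f, n * (long)RECLEN, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, RECLEN, f) == RECLEN ? 0 : -1;
}

int main(void)
{
    FILE *f = fopen("master.dat", "rb");
    char buf[RECLEN];

    if (f && read_record(f, 12345, buf) == 0)
        fwrite(buf, 1, RECLEN, stdout);
    if (f)
        fclose(f);
    return 0;
}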
complex set of access permissions (not just read/write/modify, or even an
access control list, but possibly password-controlled or time-of-day-limited
access as well); and there is probably also a large set of backup options.
These could make it very difficult, for example, to write a COBOL program
whose output was source for the FORTRAN compiler (and even harder to do the
reverse - COBOL, at least, could probably handle most file types).
Exactly. Or, any "unforeseen" file types...
Oh yes - your argument also has its points.
But try feeding a Unicode source text file into your copy of GCC and see
what happens.
Andy