Sean Conner wrote:
It was thus said that the Great Don once stated:
Even in the limit case (i.e. only one application
exists that you could use to "open" *all* JPEGs), is there still any reason why you
need to name a file:
PictureOfUsSkiing.<photograph>
Granted, you might have:
Skiing.<photograph> // i.e. photo
Skiing.<expense_tabulation> // i.e. spreadsheet
Skiing.<test> // i.e. invitation to ski trip
...
But, they could still all be *called* "Skiing" -- with some
other attribute (e.g. file creator) that actually differentiates
them.
Heh. About ten years ago I got into a similar discussion with some
friends about this, and even designed a file system that not only could
support user-added metadata, but even the concept of a "name" was fluid (you
could actually use an audio clip as the "name", or a graphic). It would
also allow one to "cd" into a jpeg (for instance) and see all the segments
that make up a JPEG file (once you concede that a directory is nothing more
than a special file that "points" to or "contains" other files, then this
type of stuff just kind of falls out) or even "cd" into an executable and see the
code and data segments (which means no special tools required to support
"fat" binaries, and if you want to strip out the 68K code portions because
you're on an x86 platform, you can use the regular delete command from the
shell, stuff like that).
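The "cd into a JPEG" idea is less exotic than it sounds, since a JPEG file really is a flat sequence of marker-delimited segments. A sketch of the directory-style listing (generic JPEG marker parsing, not the file system described above; the sample bytes are a hand-built skeleton, not a real image):

```python
# Sketch: list the segments of a JPEG the way a "cd into the file"
# view might.  A JPEG is a sequence of segments, each introduced by
# a 0xFF marker byte; most segments carry a 2-byte big-endian
# length field that includes the length bytes themselves.
import struct

# Markers that stand alone (no length field follows them).
STANDALONE = {0xD8, 0xD9} | set(range(0xD0, 0xD8))  # SOI, EOI, RSTn

def list_segments(data: bytes):
    segments = []
    i = 0
    while i < len(data) - 1:
        assert data[i] == 0xFF, "expected marker byte"
        marker = data[i + 1]
        if marker in STANDALONE:
            segments.append((marker, 0))
            i += 2
        else:
            (length,) = struct.unpack(">H", data[i + 2:i + 4])
            segments.append((marker, length))
            i += 2 + length
        if marker == 0xD9:        # EOI: end of image
            break
    return segments

# A minimal, hand-built "JPEG skeleton": SOI, an APP0 segment, EOI.
app0_body = b"JFIF\x00" + b"\x01\x01\x00\x00\x01\x00\x01\x00\x00"
app0 = b"\xff\xe0" + struct.pack(">H", 2 + len(app0_body)) + app0_body
jpeg = b"\xff\xd8" + app0 + b"\xff\xd9"

for marker, length in list_segments(jpeg):
    print(f"segment FF{marker:02X}  length {length}")
```

Stripping a segment (or, in the fat-binary case, a code segment you don't need) is then just "delete this entry" instead of a format-specific tool.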
Excellent! In my case, directories are active objects
and the "file system" is really more appropriately called
the "name space". Since directories are active, the operations
that can be applied to a directory (i.e. that the directory
applies to *itself*) can be unconventional. And, the objects
that it can "contain" (reference?) can be quite varied.
E.g., some objects may be "volatile", others "static", etc.
But how would I copy a file "named" <FX of screeching tires> to some
other system, like Unix? It's a source file containing C code, but the
metadata includes the latest version number, which project it belongs to,
the owner (me), and an extensive list of changes to the file since it was
first created.
This would be the problem of the "export method". E.g., how does
a digital camera copy its files to your $computer? In the
camera, there is no notion of JPEG, TIFF, etc. They are just
values from a CCD stored in memory in some convenient order
for the hardware to generate and process. Obviously (?), the
data isn't represented internally *as* a JPEG *before* image
processing is done (e.g., jitter reduction, color compensation,
etc.). Rather, JPEG is just "an acceptable way" of exporting
the data (rather than some other oddball format that might
require the user to run some "converter" on the data prior
to use).
So, I'm still wondering *why* this came to be (hence the historical reference)
Probably had something to do with the popularity of Unix and MS-DOS. Unix
started out treating files as a stream (or bag) of bytes---no structure was
implied or enforced by the operating system and from what I understand, at
the time that was pretty revolutionary. I'm also guessing that at the time,
you really only had three different types of files (excluding the special
device files)---executables, object code and text files. And even *if* you
But, even "text files" have different types. E.g., shell scripts
are "different" than "ascii text". (and this is determined by
inspecting the file's *contents*, not *name*!)
wanted to waste some disk space on tracking file type information, how
much space do you set aside in the inode for such information? (my guess is that
at the time, the creators of Unix didn't think such information was all that
important and besides, with 14 character file names, why not just let
convention win and stick the "type" as an extension?)
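Don's point above about telling a shell script from plain ASCII by inspecting the contents is exactly what Unix's file(1) does, via a table of "magic" byte patterns. A minimal sketch (the prefixes listed are real magic numbers; the tiny table and the function are illustrative -- the real magic database has thousands of entries):

```python
# Sketch: classify a file by its *contents*, the way file(1) does,
# using a few well-known "magic" prefixes.
MAGIC = [
    (b"#!",            "script (shell, perl, ...)"),
    (b"\x7fELF",       "ELF executable"),
    (b"\xff\xd8\xff",  "JPEG image"),
    (b"%PDF-",         "PDF document"),
]

def sniff(data: bytes) -> str:
    for prefix, description in MAGIC:
        if data.startswith(prefix):
            return description
    return "plain data (no magic matched)"

print(sniff(b"#!/bin/sh\necho hello\n"))  # a script, whatever it's named
print(sniff(b"Just some ASCII text.\n"))
```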
But you're already using 2 - 4 bytes in the inode to track this
type! I.e., '.' followed by 1 - 3 (or more) characters of extension.
And, for all practical purposes, you are poorly utilizing those
4 bytes! The first conveys no information other than "an extension
follows". And, of the 1 - 3 (typical) that follow, you really only
see 36^3 different file "types" (i.e. case neutral, and typically
only alphanumerics). So, you're storing 46656 data types in a field
that could store 4294967296. (i.e. you could use 2 bytes instead
of the 4 you are using)
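The arithmetic is easy to check (a sketch; note the 36^3 figure above counts only the exactly-three-character extensions, so the total including shorter ones is slightly higher, but still fits comfortably in two bytes):

```python
# Check the arithmetic: how many distinct extensions does the
# ".xyz" convention give you vs. how many values the bytes spent
# on it could encode?
ALPHABET = 36                      # case-insensitive a-z plus 0-9

three_char = ALPHABET ** 3         # extensions of exactly 3 chars
up_to_three = sum(ALPHABET ** n for n in (1, 2, 3))
four_byte_field = 256 ** 4

print(three_char)        # 46656, the figure quoted above
print(up_to_three)       # 47988 counting 1- and 2-char extensions too
print(four_byte_field)   # 4294967296
print(up_to_three < 2 ** 16)   # True: two bytes would indeed suffice
```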
Now, shuffle over to CP/M (precursor to MS-DOS) and there, the three
letter extension *is* the file type---in all the documentation I've read
about CP/M, the three character extension is used to designate the file
type (and said extension is restricted to the letters 'A' through 'Z' (and
any trailing space)). It was a separate field from the name (which was
eight characters long if I recall). MS-DOS picked up on this, and yes, if
you check the documentation for MS-DOS, the three letter extension is again
a separate field in the directory entry, and it followed the same
restrictions as CP/M.
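Those separate fields are visible right in the on-disk layout. A sketch of pulling the name and extension out of a FAT-style 32-byte directory entry (the 8-byte name and 3-byte extension offsets follow the published FAT layout; the sample entry is hand-built, with the remaining fields zeroed):

```python
import struct

# Sketch: unpack the name and extension fields of a FAT (MS-DOS)
# 32-byte directory entry.  Name and extension are separate,
# space-padded fields -- the "." the user types is never stored.
def parse_dirent(entry: bytes):
    name, ext = struct.unpack("8s3s", entry[:11])
    return name.rstrip(b" ").decode(), ext.rstrip(b" ").decode()

# A hand-built entry for VACATION.JPG (the remaining 21 bytes --
# attributes, timestamps, cluster, size -- zeroed for the sketch).
entry = b"VACATION" + b"JPG" + bytes(21)
print(parse_dirent(entry))   # ('VACATION', 'JPG')
```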
But CP/M didn't *do* anything with the file extension (for all
practical purposes). I.e. you could have a text file called
foo.foo and your text editor would gladly open it. If an
*application* wanted to insist on particular file extensions
then it could do so -- usually pissing you off in the process
("No, Editor, I am writing an INCLUDE file, I want to name foo.inc
*not* foo.doc")
So MS-DOS *does* store file type information in the directory entry
(however restricted it is).
Now, this metainformation about the file is easy to carry across between
systems (like Unix, and even the Macintosh) if you tack on the extension as
part of the "name" of the file. So, if you move a file "vacation.jpg" from
Unix (which doesn't care what type of file it is) to MS-DOS, automagically
it gets the *type* when copied (JPEG image file).
This, IMO, is the only "real" reason that file type has migrated
into the namespace. It lets everyone avoid the issue of
moving files between systems by simply stating that files are
*just* bytes -- even if your applications think otherwise.
I.e. if your MS machine doesn't know what .HQX means, <shrug>.
It's harder if this metainformation is stored as something else (like the
4 byte type field on the Macintosh---not sure what it's called exactly).
So, on a Mac you have the file "vacation" but the type is (as an example)
0x4A504547. It's meaningless on Unix, and it's a value that won't fit into
the MS-DOS extension field.
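That 0x4A504547 is not an arbitrary number: it is the four ASCII bytes "JPEG" packed big-endian, in the classic Mac four-character-code style. A quick check:

```python
import struct

# Sketch: the Mac's 4-byte type field is four ASCII characters
# packed into a big-endian 32-bit integer ("JPEG" here).
type_code = 0x4A504547
print(struct.pack(">I", type_code).decode("ascii"))   # JPEG

# And back again:
packed = int.from_bytes(b"JPEG", "big")
print(hex(packed))   # 0x4a504547
```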
BeOS (on topic actually) had a very cool system whereby the user could
attach arbitrary meta information to a file, and the file types were stored
as MIME types (so "vacation" would have a type of "image/jpeg"). But again,
It still doesn't address the issue raised by an earlier respondent
(i.e. tagging files for special treatment on open) but that could
be done by fabricating your own file type.
how can one transfer user-added metadata of a file to another system?
This would be worth looking into. Does BeOS run on "special"
hardware?
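One conventional answer to the metadata-transfer question is an "export method" that bundles the bytes and the metadata into a single self-describing stream -- the idea behind MacBinary, AppleDouble, and MIME messages. A toy sketch (the framing and field names here are invented purely for illustration):

```python
import json

# Toy "export method": bundle a file's bytes with arbitrary
# user-added metadata into one self-describing stream, so both
# survive a trip through a metadata-ignorant system.  Framing:
# a 4-byte header length, a JSON header, then the raw bytes.
def export(data: bytes, metadata: dict) -> bytes:
    header = json.dumps(metadata).encode()
    return len(header).to_bytes(4, "big") + header + data

def import_(blob: bytes):
    hlen = int.from_bytes(blob[:4], "big")
    metadata = json.loads(blob[4:4 + hlen])
    return blob[4 + hlen:], metadata

meta = {"type": "image/jpeg", "owner": "me", "version": 3}
blob = export(b"\xff\xd8...raw jpeg bytes...\xff\xd9", meta)
data, meta_back = import_(blob)
print(meta_back["type"])   # image/jpeg
```

The receiving system only has to understand the container, not the (arbitrary) metadata inside it -- which is exactly the trick the extension-in-the-name convention avoids having to play.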
-spc (So it just kind of evolved that the file type is tacked on to the
end of the file name.)