Sean Conner wrote:
It was thus said that the Great Don once stated:
Even in the limit case (i.e. only one application
exists that you could use to "open" *all* JPEGs), is there still any reason why you
need to name a file:
PictureOfUsSkiing.<photograph>
Granted, you might have:
Skiing.<photograph> // i.e. photo
Skiing.<expense_tabulation> // i.e. spreadsheet
Skiing.<test> // i.e. invitation to ski trip
...
But, they could still all be *called* "Skiing" -- with some
other attribute (e.g. file creator) that actually differentiates
them.
Heh. About ten years ago I got into a similar discussion with some
friends about this, and even designed a file system that not only could
support user-added metadata, but even the concept of a "name" was fluid (you
could actually use an audio clip as the "name", or a graphic). It would
also allow one to "cd" into a jpeg (for instance) and see all the segments
that make up a JPEG file (once you concede that a directory is nothing more
than a special file that "points" to or "contains" other files, then this
type of stuff just kind of falls out) or even "cd" into an executable and see the
code and data segments (which means no special tools required to support
"fat" binaries, and if you want to strip out the 68K code portions because
you're on an x86 platform, you can use the regular delete command from the
shell, stuff like that).
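The "cd into a JPEG" idea is less exotic than it sounds, since a JPEG file really is a flat sequence of marker-delimited segments. A sketch of the directory-style listing (generic JPEG marker parsing, not the file system described above; the sample bytes are a hand-built skeleton, not a real image):

```python
# Sketch: list the segments of a JPEG the way a "cd into the file"
# view might.  A JPEG is a sequence of segments, each introduced by
# a 0xFF marker byte; most segments carry a 2-byte big-endian
# length field that includes the length bytes themselves.
import struct

# Markers that stand alone (no length field follows them).
STANDALONE = {0xD8, 0xD9} | set(range(0xD0, 0xD8))  # SOI, EOI, RSTn

def list_segments(data: bytes):
    segments = []
    i = 0
    while i < len(data) - 1:
        assert data[i] == 0xFF, "expected marker byte"
        marker = data[i + 1]
        if marker in STANDALONE:
            segments.append((marker, 0))
            i += 2
        else:
            (length,) = struct.unpack(">H", data[i + 2:i + 4])
            segments.append((marker, length))
            i += 2 + length
        if marker == 0xD9:        # EOI: end of image
            break
    return segments

# A minimal, hand-built "JPEG skeleton": SOI, an APP0 segment, EOI.
app0_body = b"JFIF\x00" + b"\x01\x01\x00\x00\x01\x00\x01\x00\x00"
app0 = b"\xff\xe0" + struct.pack(">H", 2 + len(app0_body)) + app0_body
jpeg = b"\xff\xd8" + app0 + b"\xff\xd9"

for marker, length in list_segments(jpeg):
    print(f"segment FF{marker:02X}  length {length}")
```

Stripping a segment (or, in the fat-binary case, a code segment you don't need) is then just "delete this entry" instead of a format-specific tool.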
Excellent! In my case, directories are active objects
and the "file system" is really more appropriately called
the "name space". Since directories are active, the operations
that can be applied to a directory (i.e. that the directory
applies to *itself*) can be unconventional. And, the objects
that it can "contain" (reference?) can be quite varied.
E.g., some objects may be "volatile", others "static", etc.
But how would I copy a file "named" <FX of screeching tires> to some
other system, like Unix? It's a source file containing C code, but the
metadata includes the latest version number, which project it belongs to,
the owner (me), and an extensive list of changes to the file since it was
first created.
This would be the problem of the "export method". E.g., how does
a digital camera copy its files to your $computer? In the
camera, there is no notion of JPEG, TIFF, etc. They are just
values from a CCD stored in memory in some convenient order
for the hardware to generate and process. Obviously (?), the
data isn't represented internally *as* a JPEG *before* image
processing is done (e.g., jitter reduction, color compensation,
etc.). Rather, JPEG is just "an acceptable way" of exporting
the data (rather than some other oddball format that might
require the user to run some "converter" on the data prior
to use).
So, I'm still wondering *why* this came to be (hence the historical reference)
Probably had something to do with the popularity of Unix and MS-DOS. Unix
started out treating files as a stream (or bag) of bytes---no structure was
implied or enforced by the operating system and from what I understand, at
the time that was pretty revolutionary. I'm also guessing that at the time,
you really only had three different types of files (excluding the special
device files)---executables, object code and text files. And even *if* you
But, even "text files" have different types. E.g., shell scripts
are "different" than "ascii text". (and this is determined by
inspecting the file's *contents*, not *name*!)
wanted to waste some disk space on tracking file type information, how
much space do you set aside in the inode for such information? (my guess is that
at the time, the creators of Unix didn't think such information was all that
important and besides, with 14 character file names, why not just let
convention win and stick the "type" as an extension?)
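Don's point above about telling a shell script from plain ASCII by inspecting the contents is exactly what Unix's file(1) does, via a table of "magic" byte patterns. A minimal sketch (the prefixes listed are real magic numbers; the tiny table and the function are illustrative -- the real magic database has thousands of entries):

```python
# Sketch: classify a file by its *contents*, the way file(1) does,
# using a few well-known "magic" prefixes.
MAGIC = [
    (b"#!",            "script (shell, perl, ...)"),
    (b"\x7fELF",       "ELF executable"),
    (b"\xff\xd8\xff",  "JPEG image"),
    (b"%PDF-",         "PDF document"),
]

def sniff(data: bytes) -> str:
    for prefix, description in MAGIC:
        if data.startswith(prefix):
            return description
    return "plain data (no magic matched)"

print(sniff(b"#!/bin/sh\necho hello\n"))  # a script, whatever it's named
print(sniff(b"Just some ASCII text.\n"))
```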
But you're already using 2 - 4 bytes in the inode to track this
type! I.e., '.' followed by 1 - 3 (or more) characters of extension.
And, for all practical purposes, you are poorly utilizing those
4 bytes! The first conveys no information other than "an extension
follows". And, of the 1 - 3 (typical) that follow, you really only
see 36^3 different file "types" (i.e. case neutral, and typically
only alphanumerics). So, you're storing 46656 data types in a field
that could store 4294967296. (i.e. you could use 2 bytes instead
of the 4 you are using)
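The arithmetic is easy to check (a sketch; note the 36^3 figure above counts only the exactly-three-character extensions, so the total including shorter ones is slightly higher, but still fits comfortably in two bytes):

```python
# Check the arithmetic: how many distinct extensions does the
# ".xyz" convention give you vs. how many values the bytes spent
# on it could encode?
ALPHABET = 36                      # case-insensitive a-z plus 0-9

three_char = ALPHABET ** 3         # extensions of exactly 3 chars
up_to_three = sum(ALPHABET ** n for n in (1, 2, 3))
four_byte_field = 256 ** 4

print(three_char)        # 46656, the figure quoted above
print(up_to_three)       # 47988 counting 1- and 2-char extensions too
print(four_byte_field)   # 4294967296
print(up_to_three < 2 ** 16)   # True: two bytes would indeed suffice
```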
Now, shuffle over to CP/M (precursor to MS-DOS) and there, the three
letter extension *is* the file type---in all the documentation I've read
about CP/M, the three character extension is used to designate the file
type (and said extension is restricted to the letters 'A' through 'Z' (and
any trailing space)). It was a separate field from the name (which was
eight characters long if I recall). MS-DOS picked up on this, and yes, if
you check the documentation for MS-DOS, the three letter extension is again
a separate field in the directory entry, and it followed the same
restrictions as CP/M.
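Those separate fields are visible right in the on-disk layout. A sketch of pulling the name and extension out of a FAT-style 32-byte directory entry (the 8-byte name and 3-byte extension offsets follow the published FAT layout; the sample entry is hand-built, with the remaining fields zeroed):

```python
import struct

# Sketch: unpack the name and extension fields of a FAT (MS-DOS)
# 32-byte directory entry.  Name and extension are separate,
# space-padded fields -- the "." the user types is never stored.
def parse_dirent(entry: bytes):
    name, ext = struct.unpack("8s3s", entry[:11])
    return name.rstrip(b" ").decode(), ext.rstrip(b" ").decode()

# A hand-built entry for VACATION.JPG (the remaining 21 bytes --
# attributes, timestamps, cluster, size -- zeroed for the sketch).
entry = b"VACATION" + b"JPG" + bytes(21)
print(parse_dirent(entry))   # ('VACATION', 'JPG')
```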
But CP/M didn't *do* anything with the file extension (for all
practical purposes). I.e. you could have a text file called
foo.foo and your text editor would gladly open it. If an
*application* wanted to insist on particular file extensions
then it could do so -- usually pissing you off in the process
("No, Editor, I am writing an INCLUDE file, I want to name foo.inc
*not* foo.doc")
So MS-DOS *does* store file type information in the directory entry
(however restricted it is).
Now, this metainformation about the file is easy to carry across between
systems (like Unix, and even the Macintosh) if you tack on the extension as
part of the "name" of the file. So, if you move a file "vacation.jpg" from
Unix (which doesn't care what type of file it is) to MS-DOS, automagically
it gets the *type* when copied (JPEG image file).
This, IMO, is the only "real" reason that file type has migrated
into the namespace. It lets everyone avoid the issue of
moving files between systems by simply stating that files are
*just* bytes -- even if your applications think otherwise.
I.e. if your MS machine doesn't know what .HQX means, <shrug>.
It's harder if this metainformation is stored as something else (like the
4 byte type field on the Macintosh---not sure what it's called exactly).
So, on a Mac you have the file "vacation" but the type is (as an example)
0x4A504547. It's meaningless on Unix, and it's a value that won't fit into
the MS-DOS extension field.
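That 0x4A504547 is not an arbitrary number: it is the four ASCII bytes "JPEG" packed big-endian, in the classic Mac four-character-code style. A quick check:

```python
import struct

# Sketch: the Mac's 4-byte type field is four ASCII characters
# packed into a big-endian 32-bit integer ("JPEG" here).
type_code = 0x4A504547
print(struct.pack(">I", type_code).decode("ascii"))   # JPEG

# And back again:
packed = int.from_bytes(b"JPEG", "big")
print(hex(packed))   # 0x4a504547
```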
BeOS (on topic actually) had a very cool system whereby the user could
attach arbitrary meta information to a file, and the file types were stored
as MIME types (so "vacation" would have a type of "image/jpeg"). But again,
It still doesn't address the issue raised by an earlier respondent
(i.e. tagging files for special treatment on open) but that could
be done by fabricating your own file type.
how can one transfer user-added metadata of a file to another system?
This would be worth looking into. Does BeOS run on "special"
hardware?
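One conventional answer to the metadata-transfer question is an "export method" that bundles the bytes and the metadata into a single self-describing stream -- the idea behind MacBinary, AppleDouble, and MIME messages. A toy sketch (the framing and field names here are invented purely for illustration):

```python
import json

# Toy "export method": bundle a file's bytes with arbitrary
# user-added metadata into one self-describing stream, so both
# survive a trip through a metadata-ignorant system.  Framing:
# a 4-byte header length, a JSON header, then the raw bytes.
def export(data: bytes, metadata: dict) -> bytes:
    header = json.dumps(metadata).encode()
    return len(header).to_bytes(4, "big") + header + data

def import_(blob: bytes):
    hlen = int.from_bytes(blob[:4], "big")
    metadata = json.loads(blob[4:4 + hlen])
    return blob[4 + hlen:], metadata

meta = {"type": "image/jpeg", "owner": "me", "version": 3}
blob = export(b"\xff\xd8...raw jpeg bytes...\xff\xd9", meta)
data, meta_back = import_(blob)
print(meta_back["type"])   # image/jpeg
```

The receiving system only has to understand the container, not the (arbitrary) metadata inside it -- which is exactly the trick the extension-in-the-name convention avoids having to play.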
-spc (So it just kind of evolved that the file type is tacked on to the
end of the file name.)