On Mar 15, 2007, at 3:58 AM, Adrian Graham wrote:
> But by cat-ing several mpegs together you DO end up with a
> single header,
No, you end up with an invalid target with multiple beginning-of-file
headers throughout the file.
Ah, my bad. I naturally assumed that cat would strip those off and
just have
How would cat know what part of the stream was the header? Were you
talking about *nix cat or some other cat?
I was kind of taking the simplistic view that each file had a
standard sized header that at least had a magic number* and some
indication of length of said file. I can also see that it might just
read-until-EOF so maybe I was crediting it with too much intelligence.
*osf1/tru64 files have a magic number in the header that describes
the file type which you can determine with the 'file' command. Since
Tru64 is part BSD and part SYSV I thought said behaviour must've come
from one of those.
Nope, there is no such header in any UNIX implementation that I'm
aware of. The "magic number" you speak of isn't part of any
header...the "file" program opens the target file, looks at the first
few bytes, and then looks up the pattern in its database to arrive at
an *educated guess* as to the type of file it's looking at...for
example, if bytes 7-10 of the file are 0x4a464946 (ascii "JFIF"), it
is most likely (but not definitely!) a JPEG image file. Similarly,
if bytes 1-6 are 0x474946383961 (ascii "GIF89a") the file is most
likely a v89a GIF image file, and if bytes 1-4 are 0xfeedface, it's
most likely a Mach-O executable from a Mac OS X system.
It is important to understand, though, that this has nothing at
all to do with the operating system, and there is no common header
format of any sort. It just so happens that many types of files are
consistent in what their first few bytes contain. The "magic number"
is just a pattern of bytes that are known to be consistent from file
to file of the same type.
Try this experiment on pretty much any UNIX-like system:
apophis$ cat > foo.txt
GIF89a is a half-decent file format.
apophis$ file foo.txt
foo.txt: GIF file, v89
apophis$
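To make the guessing concrete, here is a rough Python sketch of the same
kind of lookup. It is not how "file" is actually implemented; the table
MAGIC_TABLE and the function guess_type are names I made up for
illustration, and the table holds only the three patterns mentioned
above (with 0xfeedface assumed to be stored in the byte order shown).

```python
# Hypothetical, tiny stand-in for the pattern database that "file" consults.
# Each entry: (zero-based offset, expected bytes, guessed description).
MAGIC_TABLE = [
    (6, b"JFIF", "JPEG image (JFIF)"),                     # bytes 7-10
    (0, b"GIF89a", "GIF image, version 89a"),              # bytes 1-6
    (0, bytes.fromhex("feedface"), "Mach-O executable"),   # bytes 1-4
]


def guess_type(data: bytes) -> str:
    """Return an educated guess based purely on byte patterns."""
    for offset, pattern, description in MAGIC_TABLE:
        if data[offset:offset + len(pattern)] == pattern:
            return description
    return "data"  # no pattern matched; nothing more can be said


# The text file from the experiment is "recognized" as a GIF.
print(guess_type(b"GIF89a is a half-decent file format.\n"))
```

Note that the function never asks the operating system anything about
the file; it only compares bytes, which is why a plain text file can
fool it.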
Similarly, if you create a file on a Mac OS X system whose first
four bytes (not six, as with the GIF signature) are 0xfeedface, the
"file" command will guess it is a Mach-O executable file...and if
you make that file executable and run it, the OS will attempt to
execute it but fail.
One of the primary tenets of the UNIX philosophy is that a file is
NOTHING MORE than a sequence of bytes. This point is central and
important enough that I'd argue that any UNIX implementation that
adds such headers is no longer UNIX.
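This is also why cat cannot strip anything. As a minimal sketch (in
Python, with a made-up function name cat and an assumed chunk size, not
the real implementation), all it can do is copy bytes from each input
until EOF:

```python
import io


def cat(sources, sink, chunk_size=4096):
    """Copy every byte of each source to sink, in order, until EOF."""
    for src in sources:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            sink.write(chunk)


# Two in-memory "files" that each start with the same signature.
first = io.BytesIO(b"GIF89a first file")
second = io.BytesIO(b"GIF89a second file")
out = io.BytesIO()
cat([first, second], out)

# Both signatures survive, one at the start and one in the middle.
print(out.getvalue())
```

Nothing in the loop knows or cares what the bytes mean, so the second
file's "header" simply ends up in the middle of the output.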
Look at the man pages for the "file" command on your system to see
where it stores its database. This is almost always a text file
whose contents are well-commented; it will be fairly obvious how
"file" uses it to scan a file to try to guess its type.
-Dave
--
Dave McGuire
Port Charlotte, FL