I wrote this note yesterday, but replied to cctalk-request by mistake.
(The listserv engine wasn't impressed, by the way).
...
If you're using the built-in import or export filters provided with
Word, WordPerfect, OpenOffice, or whatever, for your own documents,
check over the results *very carefully* on a computer that does not
have the old word processor installed. For anything more
complex than a grocery list, the conversion tools provided with
document processing programs are not quite what you'd hope for.
One bit of advice I have is, if you want to convert old documents, do
it now. As each new edition of Word or Corel Office comes out, tools
for converting the oldest formats either disappear, or they stop
working and the vendors aren't doing the QA to know this. (The current
version of Quattro Pro crashes trying to open many spreadsheets in the
old .WB2 and .WB3 formats, for instance). People in this group may have
intermediate, older software versions lying around, but most folks
don't.
My experience is that both import and export tools do a decent job of
getting raw text in and out. Formatting, styling, graphics, equations,
cross references and outline numbering schemes? Not so much. But this
is where the investment in a document lies. Text is cheap. One way of
converting legacy formats is to have printed pages retyped in China or
India at under a dollar a page. (This is a real industry.) But
formatting is really expensive. Complex documents with drawings, math
or tables can cost a business $25 to $100 a page, or more, to have
reworked by skilled clerical or technical staff.
Word's import filters are *horrible*. The result *might* look ok at
first, but it goes downhill fast once you start editing or try to apply
a consistent stylesheet. Their treatment of equations and graphics is
downright dangerous. The filter that extracts WordPerfect's WPG
graphics only converts the basic [0,x] character set for text that
appears in labels in graphics. Characters in the other sets turn into
spaces. I saw a drawing that used the "1/2" symbol in a label that read
something along the lines of "IMPORTANT: TIGHTEN THIS NUT TO 2 1/2 FOOT
POUNDS" come out "IMPORTANT: TIGHTEN THIS NUT TO 2 FOOT POUNDS." The
equation importer is no better. "grad" (nabla, the upside-down
triangle) and "Delta" both come out as an uppercase Delta. Some other
symbols are dropped entirely. Many equations simply don't convert at
all. This has been the state of the filters since at least Office 97.
Microsoft has made no corrections.
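One cheap way to catch this kind of silent damage is to compare the symbols in the original and the converted text. Here is a minimal Python sketch, not a real converter check. It assumes you have already extracted both versions to Unicode plain text by some other means, and the function name symbol_drift is made up for illustration.

```python
from collections import Counter


def symbol_drift(original_text, converted_text):
    """Return (lost, gained) counts of non-ASCII characters.

    Assumption: both arguments are Unicode plain text that was already
    extracted from the original and the converted document.
    """
    def non_ascii(text):
        # Count only characters outside basic ASCII, since those are
        # the ones the filters tend to drop or substitute.
        return Counter(ch for ch in text if ord(ch) > 127)

    before = non_ascii(original_text)
    after = non_ascii(converted_text)
    # Counter subtraction keeps only positive differences.
    lost = before - after
    gained = after - before
    return dict(lost), dict(gained)


if __name__ == "__main__":
    lost, gained = symbol_drift(
        "TIGHTEN THIS NUT TO 2\u00bd FOOT POUNDS; \u2207f, \u0394x",
        "TIGHTEN THIS NUT TO 2  FOOT POUNDS; \u0394f, \u0394x",
    )
    print("lost:", lost)      # the one-half sign and the nabla
    print("gained:", gained)  # an extra uppercase Delta
```

Anything reported as lost or gained is a reason to go look at that page by eye. A clean report proves nothing about layout, equation structure or graphics that were never extracted as text.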
OpenOffice? I've found it crashy, especially when trying to export or
import. Small documents, OK. Big, complex documents, no.
Export tools are typically worse than import tools. Vendors aren't
really motivated to provide a good tool that you can use to escape from
their product line. You can usually get away with *publishing* to
another format, but you usually won't get a maintainable *source*
document. (By maintainable, I mean a document with cross-references
intact, outline numbering organized the way the new environment's
native numbering system works, typography governed by a single
consistent style sheet, and so on).
My experience with exporting things from WordPerfect is that equations
are not editable, or are not converted at all. Graphics exported to any
vector format (e.g. WMF) still use the proprietary WordPerfect fonts,
so again, symbols beyond basic ASCII are a big problem. Once you move
the exported document to a system that doesn't have WordPerfect
installed, symbols disappear from the graphics. PDFs with embedded
fonts get around this, but PDFs are a really bad format for documents
that have to be maintained.
(A couple of years ago we lost a bid to do a conversion for a major US
oil company. They decided to convert a huge archive of refinery
operating manuals in-house on the cheap, by making PDFs and then
converting the PDFs to Word. They will be losing all automatic cross
references and numbering. After a few future edits, "Before opening
this valve, see the critical warnings in step 3.32.4 on page 147" will
be pointing to the wrong comments on the wrong page because these
numbers are now just literal text. Management didn't seem to be at all
concerned about this. If I lived next to a refinery on the Gulf Coast,
I'd move.)
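To give a sense of how you might at least find the references that need checking after a conversion like that, here is a rough Python sketch. It assumes the converted document's text is available as a plain string, and the pattern only covers wording like the example above ("step 3.32.4 on page 147"). The name find_literal_refs is hypothetical. It can't tell a live cross-reference field from literal text. It only lists candidates for a person to review.

```python
import re

# Matches "step 3.32.4" optionally followed by "on page 147".
# Assumption: references are worded this way; real manuals will need
# more patterns (section, figure, table, and so on).
REF_PATTERN = re.compile(
    r"\bstep\s+(\d+(?:\.\d+)*)(?:\s+on\s+page\s+(\d+))?",
    re.IGNORECASE,
)


def find_literal_refs(text):
    """Return a list of (step_number, page_number) strings found in text.

    page_number is an empty string when no page is mentioned.
    """
    return REF_PATTERN.findall(text)


if __name__ == "__main__":
    sample = ("Before opening this valve, see the critical warnings "
              "in step 3.32.4 on page 147.")
    for step, page in find_literal_refs(sample):
        print("step", step, "page", page or "(none)")
```

Each hit still has to be compared against the current numbering by hand or by a proper converter, which is exactly the work that gets skipped when the numbers have become literal text.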
There are a lot of other details that get lost in translation by the
native tools that won't matter as much to an individual, but do matter
a lot to organizations that have thousands of documents to maintain. My
business is there, so I've written converters from scratch. Keeps me
employed!
Brian
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
_| _| _| Brian Knittel
_| _| _| Quarterbyte Systems, Inc.
_| _| _| Tel: 1-510-559-7930
_| _| _|
http://www.quarterbyte.com