Linearizing PDF scans
J. David Bryan
jdbryan at acm.org
Sun Aug 15 00:29:37 CDT 2021
On Sunday, August 15, 2021 at 12:55, Kevin Parker wrote:
> I think it used to be called Byte Range Serving i.e. it would only
> serve up the page requested so URL's like
> somewebsite.com/myfile.pdf#page=4 would only send page 4 to the browser
> - I think this is what you are talking about
That's it exactly.
> ...but on my limited understanding it required support from the web
> server to actually give effect to this.
I believe that's right. At least, all of the servers I've used seemed to
support byte serving.
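For anyone unfamiliar with the mechanics: byte serving is ordinary HTTP
range requests. The client sends a Range header, and a server that honors
it answers 206 Partial Content with a Content-Range header; a server that
ignores it just returns 200 with the whole file. A minimal sketch of that
exchange in Python follows. The helper names are made up for illustration
and are not from any particular library.

```python
import re


def range_header(first_byte: int, last_byte: int) -> dict:
    """Build the request header asking for an inclusive span of bytes."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("need 0 <= first_byte <= last_byte")
    return {"Range": f"bytes={first_byte}-{last_byte}"}


def parse_content_range(value: str):
    """Parse 'bytes 0-1023/52000' into (first, last, total).

    total is None when the server reports the length as '*'.
    """
    m = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if m is None:
        raise ValueError(f"unrecognized Content-Range: {value!r}")
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)


def server_honored_range(status: int, headers: dict) -> bool:
    """True if the response is 206 Partial Content with a Content-Range."""
    names = {name.lower() for name in headers}
    return status == 206 and "content-range" in names
```

A viewer fetching only part of a linearized file would send something like
range_header(0, 1023) first and then check the reply with
server_honored_range() before asking for further spans.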
> I don't know if PDF's are optimised out of the box these days for this
> but if you optimise a PDF for web delivery it should have the markers
> in it for byte range serving. While the markers may add a bit to the
> file size, which I suspect would be negligible, the action of
> optimising it for web delivery should reduce file size quite noticeably
I use Ghostscript to perform the linearization as a post-processing step on
the tumble-produced PDFs. Across the dozen or so files I've produced in both
formats, it seems to add about 5-10% to the file size.
Assuming one only looks at a few pages, linearization would certainly reduce
the amount of data served, though, of course, if one requested the entire
file, the extra size would actually be a slight disadvantage.
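As a rough illustration of the post-processing step, here is a sketch in
Python that builds a Ghostscript command line using the pdfwrite device
with -dFastWebView=true, which is the documented switch for linearized
output. The exact options used in the process described above are not
given in this message, so treat the option set, the function names, and
the file names as assumptions.

```python
import subprocess


def linearize_command(src: str, dst: str, gs: str = "gs") -> list:
    """Return a Ghostscript argument list that rewrites src as a
    linearized ("fast web view") PDF at dst."""
    return [
        gs,
        "-q",                  # suppress the startup banner
        "-sDEVICE=pdfwrite",   # re-emit the input as PDF
        "-dFastWebView=true",  # ask pdfwrite to linearize the output
        "-o", dst,             # output file; also implies batch mode
        src,
    ]


def linearize(src: str, dst: str, gs: str = "gs") -> None:
    """Run Ghostscript; raises CalledProcessError if it fails."""
    subprocess.run(linearize_command(src, dst, gs), check=True)
```

Note that pdfwrite regenerates the whole file rather than just reordering
it, so it is worth comparing the output against the original before
discarding anything.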
> Is it useful these days - probably not so much because of better
> bandwidth in my view (although directing a browser to open to a
> specific page can still be useful) but that is conditional on having
> well prepared PDF files.
It's an extra, albeit automated, step in my process, so it requires only
limited effort on my part. But for a version or two, GS linearization was
broken, so I wound up with a mix of linearized and non-linearized files,
which had me wondering whether it was worth going back and linearizing the
ones that weren't.
I could see it being most useful for something like IC databooks, where one
might only want a one-time lookup of a couple of pages out of a
several-hundred-page PDF. For something like a service manual, though, I'd
anticipate that folks would want to download the whole manual rather than a
page here and a page there.
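For sorting out a mix of linearized and non-linearized files like the one
mentioned above, a quick check is possible because the PDF specification
requires the linearization parameter dictionary (the one containing the
/Linearized key) to sit within the first 1024 bytes of the file. The
Python sketch below relies on that; the function names are hypothetical,
and it is only a heuristic, since a file that was linearized and later
modified by an incremental update would still pass.

```python
from pathlib import Path


def looks_linearized(head: bytes) -> bool:
    """Heuristic: the /Linearized key appears in the first 1024 bytes."""
    return b"/Linearized" in head[:1024]


def find_non_linearized(folder: str) -> list:
    """Return the PDFs in folder that do not look linearized."""
    missing = []
    for path in sorted(Path(folder).glob("*.pdf")):
        with path.open("rb") as f:
            if not looks_linearized(f.read(1024)):
                missing.append(path)
    return missing
```

Running find_non_linearized() over a directory of scans would give the
list of candidates for reprocessing.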
As you say, it requires server support, and to be honest I've not checked
recently to see if servers bother byte-serving anymore. Maybe pipes are
too big to worry about it.
Anyway, I was wondering if I was a dinosaur to keep linearizing these
things if no one else was.
Thanks for your thoughts.