Linearizing PDF scans

J. David Bryan jdbryan at acm.org
Sun Aug 15 00:29:37 CDT 2021


On Sunday, August 15, 2021 at 12:55, Kevin Parker wrote:

> I think it used to be called Byte Range Serving, i.e. it would only
> serve up the page requested, so URLs like
> somewebsite.com/myfile.pdf#page=4 would only send page 4 to the browser
> - I think this is what you are talking about

That's it exactly.


> ...but on my limited understanding it required support from the web
> server to actually give effect to this.

I believe that's right.  At least all of the servers I used seemed to 
support this option.
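
For anyone who wants to check a particular server, here is a minimal
Python sketch.  Two things it assumes: the "#page=4" fragment is never
sent to the server, so it is the PDF viewer that asks for pieces of the
file with HTTP Range requests, and a server that honours them either
advertises "Accept-Ranges: bytes" or answers a ranged request with
status 206.  The function names and the 1 KB probe size are
illustrative choices, not taken from any particular tool.

```python
import urllib.request


def supports_byte_ranges(status, headers):
    """Judge byte-range support from a response status and its headers.

    `headers` may be any mapping with .items(); names are compared
    case-insensitively.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    if status == 206:
        # 206 Partial Content: the server honoured our Range header.
        return True
    tokens = [t.strip().lower()
              for t in lowered.get("accept-ranges", "").split(",")]
    return "bytes" in tokens


def probe(url, timeout=10):
    """Request the first 1 KB of `url` and report whether ranges work."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return supports_byte_ranges(resp.status, resp.headers)
```

Calling probe() on the URL of one of your own hosted PDFs returns True
if the server will byte-serve it.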


> I don't know if PDFs are optimised out of the box these days for this
> but if you optimise a PDF for web delivery it should have the markers
> in it for byte range serving. While the markers may add a bit to the
> file size, which I suspect would be negligible, the action of
> optimising it for web delivery should reduce file size quite noticeably
> anyway.

I use Ghostscript to perform the linearization as a post-process of the 
tumble-produced PDFs.  It seems to add about 5-10% in size to the dozen or 
so files I've produced in both formats.
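
As a rough illustration of such a post-processing step, the sketch
below wraps Ghostscript's pdfwrite device with the -dFastWebView
switch, which is the option documented for linearized output.  The
exact invocation used here is not given above, so treat the flag set,
the helper names and the "gs" executable name as assumptions, and check
that your Ghostscript version still supports the switch.

```python
import subprocess


def linearize_command(src, dst, gs="gs"):
    """Build a Ghostscript command line that rewrites src as dst.

    -o sets the output file and implies batch, no-pause operation;
    -dFastWebView=true asks pdfwrite for linearized output.
    """
    return [
        gs,
        "-q",
        "-sDEVICE=pdfwrite",
        "-dFastWebView=true",
        "-o", str(dst),
        str(src),
    ]


def linearize(src, dst, gs="gs"):
    """Run Ghostscript; raises CalledProcessError if it fails."""
    subprocess.run(linearize_command(src, dst, gs), check=True)
```

Note that pdfwrite regenerates the whole file rather than just
reordering it, so the output size can change for reasons other than
the linearization tables.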

Assuming one only looks at a few pages, it would certainly reduce the 
amount of data served, though, of course, if one requested the entire file, 
it would actually be a slight disadvantage.


> Is it useful these days - probably not so much because of better
> bandwidth in my view (although directing a browser to open to a
> specific page can still be useful) but that is conditional on having
> well prepared PDF files.

It's an extra, albeit automated, step in my process, so it requires only
limited effort on my part.  But for a version or two, GS linearization was
broken, so I wound up with a mix of linearized and non-linearized files,
which had me wondering whether it was worth going back and linearizing the
ones that weren't.
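
Sorting out such a mix can be automated.  A linearized PDF carries a
linearization parameter dictionary, containing a /Linearized key, that
the PDF specification requires to sit within the first 1024 bytes of
the file.  The sketch below is a heuristic built on that rule, not a
full validation: a file modified after linearization may still pass.
The function names and the "*.pdf" glob are illustrative.

```python
from pathlib import Path


def looks_linearized(data):
    """Heuristic: PDF header and /Linearized key in the first 1024 bytes."""
    head = data[:1024]
    return b"%PDF-" in head and b"/Linearized" in head


def unlinearized_files(directory):
    """Return the *.pdf files in `directory` that do not look linearized."""
    result = []
    for path in sorted(Path(directory).glob("*.pdf")):
        with path.open("rb") as f:
            if not looks_linearized(f.read(1024)):
                result.append(path)
    return result
```

Running unlinearized_files() over an archive directory gives the list
of candidates for a second pass.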

I could see it being most useful for something like IC databooks, where one
might only want a one-time lookup of a couple of pages out of a
several-hundred-page PDF.  For something like a service manual, though, I'd
anticipate that folks would want to download the whole manual rather than a
page here and a page there.

As you say, it requires server support, and to be honest I've not checked 
recently to see if servers bother byte-serving anymore.  Maybe pipes are 
too big to worry about it.

Anyway, I was wondering if I was a dinosaur to keep linearizing these 
things if no one else was.

Thanks for your thoughts.

                                      -- Dave


