Risks of DJVU/lossy compression - Re: If you OCR, always archive the bitmaps too

Toby Thain toby at telegraphics.com.au
Sun Sep 27 15:16:57 CDT 2015


On 2015-09-27 4:14 PM, Toby Thain wrote:
> On 2015-09-27 2:33 PM, Fred Cisin wrote:
>> On Sun, 27 Sep 2015, Pontus Pihlgren wrote:
>>> It seems to me that a better tool could solve the issue. One that
>>> could display the OCR:ed content only and the scanned content
>>> only when desired, for instance when you suspect an error.
>>> Is there such a reader? Is the content organised to make it
>>> possible?
>>
>> I haven't seen one.
>>
>>
>> I did start trying to write an heuristic probabilistic OCR one 25 years
>> ago.  The idea being to overlay the OCR'd (displayed with matching
>> fonts) over the scanned content. ...
>>
>>
>>
>
> DJVU compression is somewhat analogous to this process, ...
>
> There was a somewhat scary case study on the web a few years ago (not
> sure if it's still out there, haven't been able to find it)

Here it is.
https://news.ycombinator.com/item?id=6156238

The compression method in that case was apparently JBIG2, but the same
problem could also affect DJVU.

--Toby

> ... The risks are obvious(*).
>
> --Toby
>
>
> * - Hat tip to PGN. comp.risks digest.
>


