At 01:57 PM 2/12/2019 -0700, you wrote:
>On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk <cctalk at classiccmp.org>
>wrote:
>
>> When I corresponded with Al Kossow about format several years ago, he
>> indicated that CCITT Group 4 lossless compression was their standard.
>>
>
>There are newer bilevel encodings that are somewhat more efficient than G4
>(ITU-T T.6), such as JBIG (T.82) and JBIG2 (T.88), but they are not as
>widely supported, and AFAIK JBIG2 is still patent encumbered. As a result,
>G4 is still arguably the best bilevel encoding for general-purpose use. PDF
>has natively supported G4 for ages, though it gained JBIG and JBIG2 support
>in more recent versions.
>
>Back in 2001, support for G4 encoding in open source software was really
>awful; where it existed at all, it was horribly slow. There was no good
>reason for G4 encoding to be slow, which was part of my motivation in
>writing my own G4 encoder for tumble (an image-to-PDF utility). However, G4
>support is generally much better now.
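(As an aside, producing G4 output is indeed no longer the hard part; with Pillow, for example, a bilevel scan can be written out as a G4-compressed TIFF in a couple of lines. A sketch only, with placeholder filenames:)

# Minimal sketch: write a bilevel scan as a CCITT Group 4 compressed TIFF.
# The filenames are placeholders, not anything from this thread.
from PIL import Image

page = Image.open("page_scan.png").convert("1")       # force 1 bit/pixel
page.save("page_scan_g4.tif", compression="group4")   # Pillow's G4 TIFF writer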
Mentioning JBIG2 (or any of its predecessors) without noting that it is
completely unacceptable as a scanned-document compression scheme demonstrates
a lack of awareness of the defects it introduces in encoded documents.
See http://everist.org/NobLog/20131122_an_actual_knob.htm#jbig2
JBIG2 typically produces visually appalling results, and also introduces so
many actual factual errors (typically substituted letters and numbers) that
documents encoded with it have been ruled inadmissible as evidence in court.
Sucks to be an engineering or financial institution that scanned all its
archives with JBIG2 and then shredded the paper originals to save space.
The fuzziness of JBIG2 is adjustable, but fundamentally there will always
be some degree of visible patchiness and a risk of incorrect substitution.
As for G4 bilevel encoding, the only reasons it isn't treated with the same
disdain as JBIG2 are:
1. The bandwagon effect: "It must be OK because so many people use it."
2. Indifference from people with little or no awareness of typography, the
visual quality of text, or anything to do with preserving the historical
character of printed works. For them, "I can read it OK" is the sole requirement.
G4 compression was invented for fax machines. No one cared much about the visual
quality of faxes; they just had to be readable. Also, the technology of fax
machines was only capable of two-tone B&W reproduction, so that's what G4
encoding provided.
Thinking this kind of visual degradation is acceptable when scanning documents
for long-term preservation is both short-sighted and ignorant of what can
already be achieved with better technique.
For example, B&W text and line-diagram material can be presented very nicely
using 16-level gray shading; that's enough to visually preserve all the
line and edge quality. PNG provides a color-indexed 4 bits/pixel format,
combined with PNG's lossless filtering and deflate compression. When documents
are scanned with sensible thresholds and post-processed to ensure all white
paper is actually #FFFFFF and solid blacks are actually #000000, while edges
retain adequate gray shading, PNG achieves an excellent level of file-size
compression. The visual results are _far_ superior to G4 and JBIG2 coding, and
surprisingly the file sizes can actually be smaller. It's easy to achieve
on-screen results that are visually indistinguishable from looking at the paper
original, with quite acceptable file sizes.
And that's the way it should be.
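For concreteness, the kind of processing I mean can be sketched roughly like
this, using Pillow; the thresholds and filenames are only illustrative, and a
real workflow needs the levels judged page by page:

# Rough sketch of the 16-level gray approach: clamp near-white to pure white
# and near-black to pure black, keep the gray edge shading, and store the
# result as a 4-bit indexed PNG.  Thresholds and filenames are illustrative.
from PIL import Image

def to_16_level_png(src_path, dst_path, white_cut=230, black_cut=40):
    img = Image.open(src_path).convert("L")       # 8-bit grayscale

    def clamp(v):
        if v >= white_cut:
            return 255    # paper background -> pure white
        if v <= black_cut:
            return 0      # solid ink -> pure black
        return v          # keep anti-aliased edges as gray

    img = img.point(clamp)
    img = img.quantize(colors=16)                 # reduce to 16 gray levels
    img.save(dst_path, optimize=True, bits=4)     # 4 bits/pixel indexed PNG

if __name__ == "__main__":
    to_16_level_png("scan_page_001.png", "scan_page_001_16gray.png")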
Which brings us to PDF, which most people love because they use it all the
time, have never looked into the details of its internals, and can't imagine
anything better.
Just one point here. PDF does not support PNG image encoding. *All* of the
image compression schemes PDF does support are flawed in various cases.
But because PDF structuring is opaque to users, very few are aware of
this and its other problems, or of why PDF isn't acceptable as a
container for long-term archiving of _scanned_ documents for historical
purposes, even though PDF was at least extended to include an 'archival'
form (PDF/A) in which all the font definitions must be embedded.
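If anyone wants to see what their own PDFs actually contain, one way (a sketch
only, assuming the pikepdf library; 'scan.pdf' is a placeholder name) is to
list the /Filter entry of every page image:

# List the compression filter of each image in a PDF.  A sketch assuming
# the pikepdf library is available; "scan.pdf" is a placeholder filename.
import pikepdf

with pikepdf.open("scan.pdf") as pdf:
    for page_number, page in enumerate(pdf.pages, start=1):
        for name, image in page.images.items():
            # /Filter shows what the PDF really stores, e.g. /CCITTFaxDecode
            # (G4), /DCTDecode (JPEG), /FlateDecode (zlib), /JBIG2Decode or
            # /JPXDecode (JPEG 2000); never PNG as such.
            filt = image.stream_dict.get("/Filter", "uncompressed")
            print(f"page {page_number} {name}: {filt}")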
When I scan things I'm generally doing it in an experimental sense,
still exploring solutions to various issues, such as the best way to deal
with screened print images and cases where ink screening for tonal images
has been overlaid with fine-detail line art and text, which makes processing
to a high-quality digital image quite difficult.
But PDF literally cannot be used as a wrapper for the results, since
it doesn't incorporate the required image compression formats.
This is why I use things like HTML structuring, wrapped as either a zip
file or RARbook format: because there is no other option at present.
There will be eventually. Just not yet. PDF has to be either greatly
extended, or replaced.
And that's why I get upset when people physically destroy rare old documents
during or after scanning them with today's methods. It happens so frequently
that, by the time we have a technically adequate document coding scheme, a lot
of old documents won't have any surviving paper copies.
They'll be gone forever, with only really crap-quality scans surviving.
Guy
I've just had the pleasure of taking a new machine into my collection, a Sol 20.
It's particularly interesting for several reasons. First, it was once in the
possession of Jim Willing (zoom into the label next to the control key):
http://wsudbrink.dyndns.org:8080/images/fixed_sol/20191125_195224.jpg
For those that don't know, Jim was a very early collector of vintage computers
and one of the first collectors to put up a web site with pictures of his
collection, scans of documents and the like. Also, he was one of the first
posters to the original classic computer mailing list:
http://ana-3.lcs.mit.edu/~jnc/cctalk/
That's the first old name.
Other interesting things about the Sol include that it has an 80/64 video
modification (with patches all over):
http://wsudbrink.dyndns.org:8080/images/fixed_sol/20191125_202606.jpg
and a patched personality module socket with a custom ROM:
http://wsudbrink.dyndns.org:8080/images/fixed_sol/20191125_195249.jpg
which leads to the second old name, one that I don't know:
http://wsudbrink.dyndns.org:8080/images/fixed_sol/20191125_211019.jpg
Every time the machine boots it displays that banner:
*** DAN CETRONE ***
I've done some googling but I can't find out anything about him. I've started
to disassemble the contents of the ROM. There are some blocks that look like
the Micro Complex ROM, but other sections don't match. I'll publish it when
I'm done. Anyway, I don't know if Dan was the author or just wanted to
uniquely identify his Sol. If anyone knows, knew, or knew about Dan, I'd love
to hear about it.
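A rough sketch of the kind of block-by-block comparison against the Micro
Complex image (the filenames and the 256-byte block size are just placeholders):

# Compare a ROM dump against a reference image block by block and report
# which regions match.  Filenames and block size are placeholder assumptions.

def compare_roms(dump_path, reference_path, block_size=256):
    with open(dump_path, "rb") as f:
        dump = f.read()
    with open(reference_path, "rb") as f:
        ref = f.read()

    length = min(len(dump), len(ref))
    for offset in range(0, length, block_size):
        a = dump[offset:offset + block_size]
        b = ref[offset:offset + block_size]
        same = sum(x == y for x, y in zip(a, b))
        pct = 100.0 * same / len(a)
        status = "MATCH" if same == len(a) else f"{pct:5.1f}% match"
        print(f"0x{offset:04X}-0x{offset + len(a) - 1:04X}: {status}")

if __name__ == "__main__":
    compare_roms("sol20_custom_rom.bin", "micro_complex_rom.bin")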
Thanks,
Bill Sudbrink
At 01:20 AM 3/12/2019 -0200, you wrote:
>I cannot understand your problems with PDF files.
>I've created lots and lots of PDFs, with treated and untreated scanned
>material. All of them are very readable and in use for years. Of course,
>garbage in, garbage out. I take the utmost care in my scans to have good
>enough source files, so I can create great PDFs.
>
>Of course, Guy's commens are very informative and I'll learn more from it.
>But I still believe in good preservation using PDF files. FOR ME it is the
>best we have in encapsulating info. Forget HTMLs.
I don't propose HTML as a viable alternative; it has massive inadequacies
for representing physical documents. I just use it for experimenting and
as a temporary wrapper, because it's entirely transparent and malleable,
i.e. I have total control over the result (within the bounds of what HTML
can do.)
>Please, take a look at this PDF, and tell me: Isn't that good enough for
>preservation/use?
>https://drive.google.com/file/d/0B7yahi4JC3juSVVkOEhwRWdUR1E/view
OK, not too bad in comparison to many others. But a few comments:
* The images are fax-mode, and although the resolution is high enough for there to be
no ambiguities, it still looks bad and stylistically differs greatly from the original.
It's a pity I don't have a copy of the original, to make demonstration scans of a few
illustrations showing what it could look like at a similar file size.
* The text is OCR, with a font that I expect approximates the original fairly well,
though I'd like to see the original. I suspect the PDF font is a bit 'thick' due to
an incorrect gray threshold.
Also, it's searchable, except that the OCR process included paper blemishes as
'characters', so if you copy-paste the text elsewhere you have to vet it carefully.
And not all searches will work.
This is an illustration of the point that until we achieve human-level AI, it's never
going to be possible to go from images to abstracted OCR text automatically without considerable
human oversight and proof-reading. And... human-level AI won't _want_ to do drudgery like that.
* Your automated PDF generation process did a lot of silly things, like chaotic attempts to
OCR 'elements' of diagrams. Just try moving a text-selection box over the diagrams and you'll
see what I mean. Try several diagrams; it's very random.
* The PCB layouts, e.g. PDF pages 28 and 29: I bet the original used light shading to represent
copper, and details over the copper were clearly visible. But when you scanned it in bi-level,
all that was lost. These _have_ to be in gray scale, and preferably post-processed to posterize
the flat shading areas (for better compression as well as visual accuracy; see the
posterization sketch after this list.)
* Why are the diagram pages all different widths? I expect the original pages (foldouts?)
had common sizes. This variation is because either you didn't use a fixed recipe for scanning
and processing, or your PDF generation utility 'handled' that automatically (and messed up.)
* You don't have control of what was OCR'd and what wasn't. For instance, why OCR table
contents if the text-selection results are garbage? For example, select the entire block at
the bottom of PDF page 48. Does the highlighting create a sense of confidence that this is
going to work? Now copy and paste it into a text editor. Is the result useful? (No.)
OCR can be over-used.
* 'Ownership.' As well as your introduction page, you put your tag on every single page.
Pretty much everyone does something like this, as if by transcribing the source material you
acquired some kind of ownership or bragging rights. But no: others put a very great deal of
effort into creating that work, and you just made a digital copy, one the originators would
probably consider an aesthetic insult to their efforts. So why the proud tags everywhere?
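On the posterization point above, a minimal sketch of the idea, assuming Pillow;
the thresholds and filenames are purely illustrative, and real pages need the
levels picked by eye:

# Flatten the light "copper" shading of a PCB layout scan to one uniform
# gray level so it compresses well, while leaving darker linework alone.
# Thresholds, the output level and filenames are illustrative assumptions.
from PIL import Image

def posterize_copper(src_path, dst_path, shade_low=150, shade_high=230,
                     flat_level=200):
    img = Image.open(src_path).convert("L")

    def flatten(v):
        if v > shade_high:
            return 255           # bare paper -> pure white
        if v >= shade_low:
            return flat_level    # screened copper area -> one flat tone
        return v                 # keep dark ink and linework untouched

    img.point(flatten).save(dst_path, optimize=True)

if __name__ == "__main__":
    posterize_copper("pcb_page_28.png", "pcb_page_28_flat.png")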
Summary: It's fine as a working copy for practical use. Better to have made it than not, so long
as you didn't destroy the paper original in the process. But if you're talking about an archival
historical record, one that someone can look at in 500 years (or 5000) and know what the original
actually looked like and how much effort went into making that ink crisp and accurate, then no,
it's not good enough.
To be fair, I've never yet seen any PDF scan of any document that I'd consider good enough.
Works created originally in PDF as line art are a different class, and typically OK, though
some other flaws of PDF do come into play: difficulty of content export, problems with global
page parameters, font failures, sequential vs. content page numbers, etc.
With scanning there are multiple points of failure right through the whole process at present,
ranging from misunderstandings of the technology among people doing the scanning, problems with
scanners (why are edge scanners so rare!?), and a lack of critical capabilities in post-processing
utilities (line art on top of ink screening is a nightmare; also most people can't use
Photoshop well, and it's necessary), to failings built unavoidably into PDF and not-so-great
PDF viewer utilities. That's quite apart from the intrinsic issues (and a few advantages) of
on-screen display and controls compared to paper.
I hope I have not offended you. Btw, my pickiness comes from growing up in a family involved
in commercial art, typography, printing and technical art, and from having assisted a little
with such things in later years. So at least I know how much effort goes into them.
Keep the original. Methods and utilities will improve, and in 10 or 20 years it may be possible
to make a visually perfect digital copy (with minimal effort), worthy of becoming a sole record
of that thing (if history goes that way.)
Guy
Ethan O'Toole wrote:
> We owe a ton of props to the Internet Archive. While they might not
have
> everything, they have a glimpse into the early days of the internet
and
> have been at it since early on.
Hear, hear. I very much second Ethan's sentiments regarding the
Internet Archive.
It's a daunting effort to scrape and store all that information.
Fortunately, deduplication and compression technologies have come a
long way, and long-term, online storage of large amounts of data
processed as such has become much less expensive due to the huge
decreases in the cost-per-bit of spinning rust.
Despite all of that, it's still a lot to store, and even with these
technologies, there are costs involved for staffing, servers, as well as
continually adding storage.
Any and all support the Internet Archive can be given is well-deserved,
in my opinion.
Shameless plug:
I make regular donations to the Internet Archive, and right now they
have a 2-to-1 matching-gift campaign going on, due to pledges from
corporate and institutional donors. So if you can possibly make a
donation, head over to https://archive.org and help support this
valuable /free/ resource. I just made a $25 donation myself. Every
little bit helps.
Best wishes for a happy and safe Thanksgiving holiday to all,
-Rick
--
Rick Bensene
The Old Calculator Museum
http://oldcalculatormuseum.com
Beavercreek, Oregon USA
Hi, I've made a number of updates to the sale pages on my site, and brought
back a copy of my commercial site (good for downloads).
Unfortunately I screwed up the .html pages and lost some links.
Should all be fixed now.
Added an FAQ and some more parts (eg: 8008 CPU for MOD8), plus some sample
pricing (please see the FAQ before complaining).
If you've looked at the site before, do refresh each page as you go to
it, as many browsers cache pages and will happily show you the old one.
http://www.classiccmp.org/dunfield/sale/index.htm
Dave
All,
I've recently scratched a curiosity itch on what it would take to build
a multi-port Twin-Ax to WiFi bridge. The electrical interface is easy
enough and ESP32s are cheap. So I built a bridge PCB-to-FPGA adapter
and connected my System/36 (5362), an InfoWindow II (address 0 and 1),
and my board during IPL and sign-on to see what I could sniff. The
result is here:
https://www.retrotronics.org/tmp/s36_ipl_twinax_decode_30nov19.zip
I get occasional decode errors, called out with 'BAD FRAME'. The [SPF]
flags next to bytes mean bad start bit (0), parity error, or non-zero fill
bytes, respectively. And I occasionally get a sync pattern followed by
either illegal Manchester transitions or a return to idle without any
bytes (and thus no address): the zero frames in the log.
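For anyone looking at the log, this is roughly how I'm classifying cells as
legal or illegal Manchester; the half-bit sampling and polarity convention
here are assumptions on my part and may be inverted relative to the actual line:

# Reference Manchester (biphase) decode over half-bit line samples.
# Each data bit occupies two half-bit samples and a valid cell has a
# mid-bit transition: (0,1) or (1,0).  (0,0) and (1,1) are illegal.
# The polarity convention is an assumption and may be inverted.

def manchester_decode(half_bits):
    """half_bits: sequence of 0/1 samples, two per data bit.
    Returns (bits, bad_cells) where bad_cells lists offending cell indexes."""
    bits, bad_cells = [], []
    for i in range(0, len(half_bits) - 1, 2):
        cell = (half_bits[i], half_bits[i + 1])
        if cell == (0, 1):
            bits.append(0)
        elif cell == (1, 0):
            bits.append(1)
        else:
            bits.append(None)        # no mid-bit transition: illegal cell
            bad_cells.append(i // 2)
    return bits, bad_cells

if __name__ == "__main__":
    samples = [0, 1, 1, 0, 1, 0, 1, 1]   # last cell is deliberately illegal
    print(manchester_decode(samples))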
My main question is that I need help with the next step. For a brief moment, I
was under the impression SNA LU6 or LU7 ran on top of the Twin-Ax line
layer. But that doesn't appear to be the case. I'm not sure it's
direct 5250 either. Can anyone familiar with IBM-Midrange-World take a
look at the decode and point me to the next protocol layer up the stack?
Even the slightest breadcrumbs would be appreciated as I know very
little about the Midrange world.
Additionally, if anyone is familiar with the wire level and could assist
with some of the framing errors, that would help as well. The twin-ax
cables are less than 2m each so the line should be 100% clean. The
problems are likely something I am doing wrong in the interpreter.
Thanks,
-Alan Hightower
Greetings
I think the time has come for me to part with my collection of PC 9821
hardware. It has deteriorated over time, but I think it all still works. I
have two laptops and a desktop system. I used it to test FreeBSD/pc98 for
years, but support was dropped a few years ago and I have no further need
for it. It's a bit oddball for here, perhaps, but I don't want to just
scrap it all... Anybody interested?
Warner