Gaertk at aol.com writes:
> In a message dated Fri, 5 Jul 2002 3:07:43 PM Eastern Standard Time, David Dyer-Bennet <dd-b at dd-b.net> writes:
>
> >Obviously you need to have a flatbed scanner to do this, and OCR
> >software. The package I used is downloadable for a 15-hour trial
> >(hours of actual use, not elapsed hours). And you need to be willing
> >to risk or sacrifice a copy of the book in question; we don't have a
> >budget or a pile of free copies of the books sitting around anywhere.
> >
> >It was easiest to do with the pages loose rather than bound, which was
> >pretty easy to arrange -- for $0.50 Kinko's cut the binding off for me
> >on their big guillotine paper cutter. This does have the downside of
> >ruining the book; but a used paperback in poor condition works fine
> >for scanning. Doing it with the pages bound probably ruins the book
> >anyway, through pressing it down flat on the scanner. When we get to
> >the rare books, it may be worth considering alternatives, like
> >photographing the pages with a digital camera.
>
> I scanned my copy of _Athyra_ on a flatbed scanner, and it
> didn't cause much damage. It won't be mistaken for a new
> book, but it's still in better condition than many other books
> I purchased used. In fact, the only "damage" seemed to be
> a tilt to the spine, and that disappeared after being
> wedged on my bookshelf for a couple months.
>
> BTW, whose job is it to correct all the mistakes the OCR
> software puts in? For those of you who haven't tried
> scanning text, current state of the art OCR software is
> about 99% accurate. That means about 1 out of every 100
> *characters* will be wrong. This is why Eric Flint over
> on rasfw said it's cheaper to simply hire a professional
> typist to retype the whole thing.

The software I used produces copy considerably cleaner than that produced by a good copy-typist, once you run the text through its "spell check" function.
That function both does real spell checking (you have to add a number of weird names to the dictionary when working with fantasy) and, in this product, *also* lets you look at and correct any word recognitions it's doubtful about. It shows you both the bitmap image and the recognized text, and you can go in and edit the text arbitrarily to fix things it *didn't* have a question about if you happen to notice them. My 2-3 hours for each of the first 3 Vlad books *included* the time to run through this spell / confidence check. I'm hoping to get results of roughly that quality from other people as well. (I found several actual typos in Jhereg in the process.)

I looked into commercial scanning and retyping services, and into paying friends to type stuff in, and after I tried a good modern OCR product I find that OCR is *immensely* faster and cheaper than the other options. A good commercial typist won't exceed 100 words/minute over the course of a novel, so call it 1000 minutes, or 16 2/3 hours, for a 100,000-word book (a good average length, though the first three are shorter than that). And that's before any proofreading. I did Jhereg in *2* hours, *including* the spell / confidence check. (The current state of the art is considerably better than 99% on pure recognition, and *then* uses the dictionary to figure things out after that.) My experience with 1996-vintage OCR strongly suggested that paying people to type the books in would be a win if I had to work with that level of OCR.

-- 
David Dyer-Bennet, dd-b at dd-b.net / New TMDA anti-spam in test
John Dyer-Bennet 1915-2002 Memorial Site http://john.dyer-bennet.net
Book log: http://www.dd-b.net/dd-b/Ouroboros/booknotes/
New Dragaera mailing lists, see http://dragaera.info