ejp626 Posted October 8, 2009
With GOCR I'm not sure if there's some sort of wrapper that lets you "batch OCR" multiple images (for example, all TIFFs in the folder my_book: /images/my_book/*.TIF) and stream them into ONE text file. On my platform I've been using tesseract in combination with ocube, which lets you do exactly that. There should be a tesseract binary for Windows, but I don't know if there's a wrapper equivalent to ocube. Maybe try a web search for "windows batch ocr" or something like that.
This is a little out of my comfort level. But I do have a question about GOCR. Does it handle tables at all? Or allow part of a page to be saved as an image (and then embedded into the resulting document)? Those are things I absolutely need before I start monkeying around with learning new OCR software. If not, I'll just stick with some of the commercial versions. Thanks. Eric
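The batch idea quoted in the post above (OCR every TIFF in a folder and stream the results into one text file) could be sketched roughly as follows. This is only an illustration, not the ocube wrapper mentioned there: it assumes the `tesseract` command-line program is installed and on the PATH, and the function names `tesseract_page` and `batch_ocr` are made up for this example. Pages are processed in filename order, so the scans would need names that sort correctly (page_001.TIF, page_002.TIF, and so on).

```python
import subprocess
import tempfile
from pathlib import Path


def tesseract_page(image_path):
    """OCR one image with the tesseract CLI and return the recognized text.

    Assumes `tesseract` is installed and on the PATH.
    """
    with tempfile.TemporaryDirectory() as tmp:
        out_base = Path(tmp) / "page"
        # `tesseract <image> <outputbase>` writes its text to <outputbase>.txt
        subprocess.run(["tesseract", str(image_path), str(out_base)], check=True)
        return out_base.with_suffix(".txt").read_text(
            encoding="utf-8", errors="replace"
        )


def batch_ocr(folder, output_file, ocr_page=tesseract_page, pattern="*.TIF"):
    """OCR every image matching `pattern` in `folder` into ONE text file.

    `ocr_page` is any function that takes an image path and returns text,
    so a different OCR engine could be swapped in. Returns the page count.
    """
    # Sort by filename so the pages land in the output in a stable order.
    images = sorted(Path(folder).glob(pattern))
    with open(output_file, "w", encoding="utf-8") as out:
        for image in images:
            out.write(ocr_page(image))
            out.write("\n")  # keep consecutive pages on separate lines
    return len(images)
```

A call such as `batch_ocr("/images/my_book", "my_book.txt")` would then produce a single text file for the whole folder. Whether the pattern matches `.tif` as well as `.TIF` depends on the operating system, and the result would still need proofreading page by page.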
AllenLowe Posted October 9, 2009 Author Well, it looks like I may have to make a go of it myself - Kinko's would end up in the $700-$800 range and I can't do that - I really do appreciate all the advice - BUT - can we distill this to a single program that works with Windows XP, text only, one sheet at a time (I don't mind if it takes a few weeks to do this), and that will convert to an editable Word format? And one that I can buy easily? (Edited October 9, 2009 by AllenLowe)
Dan Gould Posted October 9, 2009 I don't think there is a single program, but like I said, you can output PDF files from a scanner such as mine (though a flatbed like mine would take a long time), then use the software I mentioned to convert PDF to Word. The only time-consuming thing there would be that you'd open each PDF file separately to convert it, creating 350 Word docs. Then you'd have to copy and paste 349 times to bring it all together into one Word doc.
Aggie87 Posted October 9, 2009 If you simply commit to typing in 10 pages a day, you'd be done in 35 days, a little over a month. 5 pages a day = 70 days. 5 pages a day wouldn't be all that much typing, really. 30 minutes max per day, if that much? Unless you're a two-finger typist.
AllenLowe Posted October 9, 2009 Author I type about 40 words/minute, that's about 8 minutes a page, but it's just something I really do not want to do (I also have carpal tunnel in both hands, manageable but problematic). That may be the way to go, unfortunately, as money is a problem at least until next spring.
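For anyone weighing the retyping option, the numbers in the two posts above can be put side by side. This is a back-of-the-envelope sketch only: it assumes the book runs to 350 pages (the figure implied earlier in the thread) and takes the 8 minutes per page quoted above at face value. The function name `typing_plan` is made up for this example.

```python
import math


def typing_plan(total_pages, pages_per_day, minutes_per_page):
    """Return (days needed, minutes of typing per day, total hours)."""
    days = math.ceil(total_pages / pages_per_day)
    minutes_per_day = pages_per_day * minutes_per_page
    total_hours = total_pages * minutes_per_page / 60
    return days, minutes_per_day, total_hours


# Assumed inputs: 350 pages, 8 minutes per page (about 40 words/minute).
for pages_per_day in (5, 10):
    days, minutes, hours = typing_plan(350, pages_per_day, 8)
    print(f"{pages_per_day} pages/day: {days} days, "
          f"{minutes} min/day, about {hours:.0f} hours in total")
```

At 8 minutes a page, 5 pages a day comes to 40 minutes of typing rather than 30, and the whole job is a bit under 47 hours however it is split up.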
ejp626 Posted October 9, 2009 Well, ABBYY FineReader Express (which I will probably order) costs about $50 and runs on XP. A bunch of these OCR software packages give you a 15-day free trial, but you can only OCR one page at a time -- which sounds like exactly what you are trying to do, so you might go that route. I still don't understand GOCR well enough to try it, but there is surely a way to get it working on your system as well, and it should be fine for text only -- and it is free. My main bit of advice is to save often (not as big a deal when doing one page at a time).
Christiern Posted October 9, 2009 When I was preparing to update and expand my Bessie Smith biography (which I originally wrote on a typewriter), I used OmniPage (a Mac version) to scan in all the pages from an original edition. It really didn't take long and it was pretty accurate (the software should be even better now, 7 years later). You have to go over it, page by page, to catch the inevitable mis-reads, but that is to your advantage, because you will always find something that cries out for change. Were I to do the same thing today, I would probably read the book into my Mac. I find that to be even more accurate these days (I use MacSpeech Dictate). Much of what I post on my blog is dictated.
ejp626 Posted October 9, 2009 Yeah, I used to really like OmniPage, but Caere was bought out, and the buzz is that the newest versions are buggy and that when you need to download a driver or something, you are out of luck. Oh, and they charge you $20 to speak to anyone in customer service when this happens.
Van Basten II Posted October 10, 2009 Am I the only one who, every time he sees this thread, thinks that Allen has gotten into Radiohead?
spinlps Posted October 10, 2009 Here's another *free* possibility: Google scans and OCRs all PDFs it crawls. Allen, you would have to scan the book into PDF format and load it up to a web page for Google to crawl. Google will crawl the document and output it to HTML for web viewing. At that point, you can open the HTML file in Word and save it as a Word file. OR - Send me the unbound book. I'll scan the pages to PDF on the whiz-bang-copier-imager-printer-scanner and just convert the file to Word for you, since I sent you on the $800 Kinko's OCR safari. PM me if interested. (Edited October 10, 2009 by spinlps)