Re: [orthodox-synod] OCR of Old Orthography Russian
- Fr Alexander Lebedeff wrote:
> Russian in old orthography **can** be successfully produced on web pages.
> This site requires the user to have (or download) a particular font
> (Palatino Linotype).
JRS: I clicked on the link, and immediately saw the text spelled out in the old orthography,
without needing to download or install anything.
If one turns to the section titled "old orthography", there are some examples of ambiguity
caused by the suppression of the letter yat'!
A parallel in English: a man sent his wife an e-mail to the effect that he had obtained tickets
to the opera that night; but "gotten" came out "got ten", so she invited all her friends, only to
have an embarrassment!
Many pairs of Russian words are distinguished in writing only by the letter yat'.
Fr. John R. Shaw
- Dear Stephen,
There IS a program that recognizes the Russian Old Orthography. It's called ABBYY (that's the company name) "FineReader". Last I tried, it was version 7.0, but they might have a newer version by now.
It's a Russian company and it's **much** cheaper to order it straight from Russia. I actually did that at one time, though I can't remember how--I think through the Russian web page, but it could have been through another Russian site. If you order from America it's hundreds of dollars--from Russia, only a fraction of that. I seem to have the "Professional" edition and there must have been a reason I would have paid the extra--maybe that was the only edition that supported the "extra" languages.
When you scan, before you do anything else, you have to go up into, I believe, "Tools," and select something like "additional languages, Russian Old Spelling." E-mail me after you get it and I can walk you through it if you like. You also have to set the output font. Set it to your Old Orthography fonts. I have "Royal Times New Roman" and "Royal Arial," which you can do a search for, find, and download--or, if you like and the mail program doesn't kill the attachment, I can try e-mailing them to you. As far as I know, you have to set these settings EVERY time you use it, which is a pain, but worth it. I also found a cool decorative title font to use--can't remember the name of it right now but if you e-mail me I'll tell you. There is also "MS Mincho," which has extra spacing between the letters and is very 19th century but neat for titles.
You can then output to Word or whatever. This scan engine is excellent. Once you train it you will have hardly a mistake. Except that I have not yet figured out how to get it to read two fonts at once. It makes mistakes on my title fonts (like reading a "c" as an "e" because the font is fancy), but it does fine on the main font. If anyone who has this can clue me in on how to get it to read both of them at once I would be grateful. As it is, it's not a big deal to correct a few letters in a title, as opposed to the whole long text.
Hope this helps. Good luck with your project and please let me know your web site!
Does anyone have any experience of OCR of scanned old orthography
Russian? I would like to be able to scan printed material in the
pre-revolutionary alphabet to be published on the web (which, so far as I
know, would require converting it to the modern Russian alphabet).
I would be most grateful for any information on which OCR programs
(if any) provide recognition for the additional letters.
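For anyone weighing the conversion to the modern alphabet mentioned above, here is a minimal Python sketch of what the letter-level part of that conversion might look like. It assumes only the letter changes of the 1918 reform are wanted (yat', decimal i, fita, izhitsa, and the word-final hard sign) and does not touch spelling changes in endings and prefixes. The names `LETTER_MAP`, `FINAL_HARD_SIGN`, and `modernize` are made up for illustration and do not come from any OCR product.

```python
import re

# Letters dropped by the 1918 reform, mapped to their modern replacements.
# Written as escapes so look-alike Latin letters cannot sneak in.
LETTER_MAP = str.maketrans({
    "\u0463": "е", "\u0462": "Е",  # yat'
    "\u0456": "и", "\u0406": "И",  # decimal i
    "\u0473": "ф", "\u0472": "Ф",  # fita
    "\u0475": "и", "\u0474": "И",  # izhitsa
})

# A hard sign at the very end of a word was dropped by the reform;
# a hard sign inside a word (as in "объявить") is kept.
FINAL_HARD_SIGN = re.compile(r"[ъЪ]\b")


def modernize(text: str) -> str:
    """Convert old-orthography letters to modern ones (letter level only)."""
    text = FINAL_HARD_SIGN.sub("", text)
    return text.translate(LETTER_MAP)
```

The conversion is one-way. For example, `modernize` turns both миръ (peace) and міръ (world) into мир, and the yat' in нѣкогда disappears in the same fashion, so the distinctions discussed earlier in this thread cannot be recovered from the converted text.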
Archives located at http://www.egroups.com/group/orthodox-synod
- The ABBYY FineReader OCR package can read old Russian orthography quite
well. I've used version 7, and now I see that they have version 8 on
their website. However, I do not know what is included in their
standard US version.