Postby baihe shifu » Sun Apr 15, 2012 2:48 am

Okay all, I'm looking for a good (actually working) program that can take scanned PDF copies of Chinese book pages with Chinese text and convert them into English text! Surely there must be a program "out there" that can do this. Let me know???
Postby Sanfung » Tue May 29, 2012 5:11 pm

There are optical character recognition programs available that can take Chinese text and transfer it into a text file, though it might generate a lot of errors. If you're willing, you could theoretically use an IME Pad to individually draw out characters. You can install an IME Pad for Chinese, Japanese and Korean text from the Microsoft Windows XP disc. Assuming, of course, you're running that operating system.

If you're just looking for a program that can take Chinese text and attempt to translate it into English, you might want to have a look at Bing Translator:

or the old standby of Yahoo! Babel Fish:

Either can do a passable job, but it still helps to have a working knowledge of the language you're trying to read. You'll naturally get very robotic and sometimes badly off text this way, so do beware.

Are you trying to work with Traditional or Simplified Chinese?
Postby chh » Tue May 29, 2012 9:45 pm


Google Translate can now process .pdf documents. If you go to the Google Translate page and select 'translate a document', it will let you upload a .pdf and select the source language and target language. I quickly tried it with a rather long document in Chinese and it produced a reasonably good looking result, in terms of what translation programs are capable of right now.

Like Sanfung mentioned, you may need to run optical character recognition on the document if that hasn't already been done. For all I know, Google Translate will do this too. If it won't accept the document, that could be the problem.
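
One rough way to tell whether a PDF already has a text layer is to copy a passage out of it, paste it somewhere, and see whether any Chinese characters actually come through. Here is a minimal Python sketch of that check, assuming you have pasted the copied passage into a string. The function names and the 0.3 threshold are made up for illustration, and it only looks at the basic CJK Unified Ideographs range:

```python
def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are basic CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # U+4E00..U+9FFF is the basic CJK Unified Ideographs block.
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def looks_like_chinese_text(text: str, threshold: float = 0.3) -> bool:
    """Guess whether pasted text is real Chinese text (threshold is arbitrary)."""
    return cjk_ratio(text) >= threshold
```

If the pasted passage comes back empty or as junk, the pages are probably just images and need OCR before any translator can use them.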

No program is going to do a great job of giving you a translation. There will be chunks of what you get back that don't make sense or aren't interpretable as English. If the document is archaic or filled with jargon (like a martial arts text might be) this could be especially problematic. Google has a pretty awesome translation service, but it won't do what a knowledgeable human translator can do. One option would be to make note of the bungled parts when it looks like an important bit, and then show it to a native speaker of Chinese if they're willing to help.

Good luck.

Edit: good lord, the document Google Translate returned actually looks kind of rank, although I don't have any other translation to compare it to. If you try a couple of the suggestions in this thread and one works especially well, please let us know. I'd like to know what's out there!
Postby yeniseri » Wed May 30, 2012 12:04 pm

Chinese does not translate well with techno-scientific documents. The same applies to yangshengong/qigong sources. If you have some previous exposure, then you can sift through the 'weird, odd' English derived from the original.
Normal English translation for basic documents should be an easier task, though.

Postby Sanfung » Thu May 31, 2012 2:42 am

Machine translation of normal documents sounds rather robotic, but it often ends up passable. While grammar might take a serious hit when a computer mangles a piece of text, someone with at least a bit of knowledge about the language being processed should be able to handle it. However, like chh and yeniseri said, machine translators have a rough go of technical jargon.

The one thing I have found machine translators useful for is converting between Traditional Chinese text and Simplified Chinese text. I have a hard time reading Simplified Chinese characters myself. Some machine translation websites can also romanize text written in a non-Roman script, though these are usually focused on turning Japanese Kanji, Hiragana and Katakana text into Roman text. Let me know if any of these might be helpful.
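
To give an idea of what Traditional-to-Simplified conversion looks like under the hood, here is a toy Python sketch. The six-character table and the function name are only for illustration. A real converter needs a far larger table and has to deal with characters whose mapping depends on context, so treat this as a demonstration of the idea rather than a usable tool:

```python
# Tiny illustrative table of Traditional -> Simplified characters.
# A real converter would need thousands of entries.
T2S = str.maketrans({
    "極": "极",
    "氣": "气",
    "國": "国",
    "學": "学",
    "書": "书",
    "體": "体",
})


def to_simplified(text: str) -> str:
    """Replace each known Traditional character; leave everything else alone."""
    return text.translate(T2S)
```

With this table, to_simplified("太極拳") gives "太极拳", and characters that aren't in the table pass through untouched.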

You might want to try translating little bits and pieces at a time. If you translate a couple of sentences, they might make a little more sense. Is there any way you can copy and paste from the PDF files, or are they just made up of photographic scans of the books in question?
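
If you can get the text out of the PDF, the bits-and-pieces approach could be sketched in Python like this. It is only a rough sketch: it assumes sentences end with the Chinese full stop, exclamation mark or question mark, and the function names and the default of two sentences per chunk are my own choices:

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split Chinese text into sentences, keeping the closing punctuation."""
    # A run of non-terminator characters, then an optional terminator.
    parts = re.findall(r"[^。！？]+[。！？]?", text)
    return [p.strip() for p in parts if p.strip()]


def chunk_sentences(text: str, per_chunk: int = 2) -> list[str]:
    """Group sentences into small chunks to paste into a translator one at a time."""
    sentences = split_sentences(text)
    return [
        "".join(sentences[i:i + per_chunk])
        for i in range(0, len(sentences), per_chunk)
    ]
```

Each chunk can then be pasted into whichever translator you're using, which makes it easier to spot exactly where the output stops making sense.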
