KFS (KM) in China:
VII Instant Messaging interview with Li Chen
Position: Programmer at a US multinational IT company
Interviewer: Stefan Broda
Topic: Technical implications on natural language processing: Chinese Language
Li Chen: yes
Author: I found you via bluepages when I looked for "NLP"
Author: natural language processing
Li Chen: it's my former job responsibility :)
Author: I'm doing a study on KM in China and was analyzing the differences between Latin-based languages (English, German, etc.) and Chinese
Author: The differences are striking. E.g., both languages have around 50,000 words, but in English and German you use 10,000 syllables to make them, while in Chinese you have fewer than 500. That must have immense implications for NLP, right?
Li Chen: that's interesting. I'm not very familiar with German.
Author: German and English are very similar from the language structure point of view. German has more complicated grammar though (e.g. there are 3 types of "the")
Li Chen: I see
Li Chen: the size of the syllable set makes a difference in speech recognition. In written language, the same syllable may map to many (10-100) characters; each Chinese character is a single syllable
Author: I'm sorry that I contacted you without a proper introduction. I just find the topic very interesting and was very interested to talk to one of our experts at IBM.
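To illustrate Li Chen's point about one syllable mapping to many characters, here is a minimal Python sketch. The character table is a tiny hand-picked sample with toneless pinyin, not a real lexicon, and the names `CHAR_TO_SYLLABLE`, `group_by_syllable` and `candidates` are hypothetical, introduced only for this illustration.

```python
from collections import defaultdict

# Assumption: a tiny hand-picked sample of characters with toneless pinyin.
# A real lexicon would list far more characters per syllable.
CHAR_TO_SYLLABLE = {
    "是": "shi", "十": "shi", "时": "shi", "事": "shi", "市": "shi", "师": "shi",
    "妈": "ma", "马": "ma", "吗": "ma",
    "中": "zhong", "钟": "zhong",
}


def group_by_syllable(char_to_syllable):
    """Invert the table: syllable -> list of characters pronounced that way."""
    groups = defaultdict(list)
    for char, syllable in char_to_syllable.items():
        groups[syllable].append(char)
    return dict(groups)


HOMOPHONES = group_by_syllable(CHAR_TO_SYLLABLE)


def candidates(syllable):
    """Characters a recognizer would have to choose between for one syllable."""
    return HOMOPHONES.get(syllable, [])


for syllable in ("shi", "ma", "zhong"):
    print(syllable, len(candidates(syllable)), "".join(candidates(syllable)))
```

Even in this toy table, hearing "shi" leaves six written candidates, so a speech recognizer needs surrounding context to pick the right character.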
Li Chen: It's all right.
Author: I also found that Chinese characters are very different from Latin-based writing
Li Chen: can you read Chinese?
Li Chen: unfortunately, the pictographic information is lost in the computer. A character is just a double-byte code
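The "double-byte code" remark can be made concrete with a short Python sketch. The interview does not name an encoding; GB2312 is assumed here purely as an example of a double-byte Chinese encoding, and `double_byte_code` is a hypothetical helper name.

```python
def double_byte_code(char, encoding="gb2312"):
    """Return the raw bytes that stand for one character.

    Assumption: GB2312 is used as a sample double-byte encoding;
    the interview does not say which encoding was meant.
    """
    return char.encode(encoding)


for ch in "中文":
    code = double_byte_code(ch)
    # The bytes are an arbitrary table index; nothing about the
    # character's shape or components can be read from them.
    print(ch, code.hex(), len(code))
```

To the program, each character is only a two-byte number. Any information carried by the written shape would have to come from a separate lookup table.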
Author: I can read a little bit. But what would you say are the differences in algorithms regarding English and Chinese?
Li Chen: the major difference could be: for Chinese we usually rely on a dictionary to identify words, whereas in English you can use spaces as delimiters. The approach deeply affects the ability to handle new words, since it's very hard to update the dictionary with new words. Actually, the criteria for what counts as a new word are very fuzzy
Author: you mean new Chinese words are more difficult to handle?
Li Chen: yes
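The contrast between space-delimited and dictionary-based word identification can be sketched in Python. Forward maximum matching is used here as one common dictionary-based technique; the interview does not say which algorithm Li Chen's team used. `TOY_DICTIONARY`, `forward_max_match` and `max_word_len` are hypothetical names, and the word list is a toy sample.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Segment text greedily, preferring the longest dictionary word.

    Characters that start no known word are emitted one by one.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a hit;
        # a single character is always accepted as a fallback.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Assumption: a toy dictionary, not a real lexicon.
TOY_DICTIONARY = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}

# English: spaces already delimit the words.
print("we like natural language processing".split())

# Chinese: words must be looked up in the dictionary.
print(forward_max_match("我们喜欢自然语言处理", TOY_DICTIONARY))

# A word missing from the dictionary ("博客", blog) breaks into single characters.
print(forward_max_match("我们喜欢博客", TOY_DICTIONARY))

# Only after the dictionary is updated is the new word recognized.
print(forward_max_match("我们喜欢博客", TOY_DICTIONARY | {"博客"}))
```

The third call shows the new-word problem from the interview: until "博客" is added to the dictionary, the segmenter splits it into two unrelated characters, whereas an English tokenizer would pass an unseen word through unchanged.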