Definite's Extractor

My findings on Life, Linux, Open Source, and so on.


Review on “Chinese Eye Tracking Study: Baidu Vs Google”


Today I came across an “interesting” blog post about an eye-tracking study comparing Google and Baidu. That post does make some valid points, such as the reasoning under “Q: What’s the difference in user experience between Baidu and Google?” and the first two factors under “Q: Why choose Baidu?”

However, that post has several major reasoning flaws:
1. The third factor under “Q: Why choose Baidu?” is misleading. Multi-tab browsing is wanted by users all over the world, not only by Chinese users.

2. The actual third factor is the G.F.W. (the Great Firewall of China). Baidu follows Chinese policy closely and usually does not show target pages that are blocked by the firewall; Google, on the other hand, does not comply as much as Baidu does, so its search results are likely to lead to “dead” links, which upsets ordinary end users.

3. It claims that Chinese is hard to skim because it has too many distinct characters and no spaces to split the text into comprehensible units. It even tried to emphasize this point with the following all-uppercase, no-space paragraph:


(To try to put in a Western conceptual framework, imagine how difficult it would be to scan meaning from this paragraph if our alphabet was extended to 2000 characters, presented in block letters and all the spaces between words were removed.)

That reasoning is quite silly. If it were true, the Chinese would have abandoned that writing system eons ago, since few people could read it or would be willing to pass it down through generations.

Actually, as comment 1 on that post said, most concepts in Chinese can be expressed in no more than two characters, and native names seldom exceed three characters; European languages, on the other hand, often require you to look through many more characters to get a meaningful word.

In terms of display length, Chinese text is much shorter than English, yet carries the same amount of information. Using his example:

(To try to put in a Western conceptual framework, imagine how difficult it would be to scan meaning from this paragraph if our alphabet was extended to 2000 characters, presented in block letters and all the spaces between words were removed.)

(以西方的概念架構來說,很難想像如果我們的字母增加至2000個,全以大寫顯示,移除空白的話,要怎麼讀這段文章。 )

As you can see, at the same font size the Chinese version doesn’t even occupy half the visual area. Sharp-eyed readers might also notice that punctuation marks provide the spacing needed to scan the meaning of the paragraph. 😛
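A rough way to put a number on the “half the visual area” claim: in a monospace terminal, CJK ideographs and full-width punctuation take two columns, which Python’s standard `unicodedata.east_asian_width` reports as `"W"` or `"F"`. The sketch below (the helper name `display_width` is my own) only approximates monospace layout, not the proportional fonts a browser would use.

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate monospace columns: wide/fullwidth characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

EN = ("(To try to put in a Western conceptual framework, imagine how difficult "
      "it would be to scan meaning from this paragraph if our alphabet was "
      "extended to 2000 characters, presented in block letters and all the "
      "spaces between words were removed.)")
ZH = "(以西方的概念架構來說,很難想像如果我們的字母增加至2000個,全以大寫顯示,移除空白的話,要怎麼讀這段文章。 )"

print(display_width(EN), display_width(ZH))
```

With the two strings quoted above, the Chinese version comes out at under half the column count of the English one, even though each ideograph counts double.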

The anime “Porco Rosso” (“Red Pig”) also provides another visual comparison among major languages at its beginning. 🙂


RHEL/CentOS Skype Chinese font display problem (Simplified/Traditional Chinese display in Skype)

My Skype could show only English and Traditional Chinese, not Simplified Chinese.
I had no motivation to solve this problem, because most of the time
I used Skype only for video calls.

Nevertheless, today I decided to tackle it and found some interesting results.
For the impatient: just run qtconfig4 and set the default font to
“Serif”, or to “AR PL ShanHeiSun Uni”/“AR PL ZenKai” if you can endure poor rendering of English text.

What about other fonts? First of all, font substitution does not work; at least, Skype ignores it. Only “Serif” seems to have a correct substitution rule. I did not dig further into fontconfig, but it works anyway.
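If you do want to dig into fontconfig, its own `fc-match` tool shows which font a family name resolves to under a language hint such as `Serif:lang=zh-cn`. Below is a minimal Python sketch wrapping it; the function names are hypothetical, and it only shows fontconfig’s choice, which may differ from what Skype/Qt actually ends up using.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def parse_fc_match(line: str) -> Tuple[str, str, str]:
    """Split fc-match's default output, e.g. 'file.ttf: "Family" "Style"'."""
    filename, rest = line.split(":", 1)
    quoted = rest.strip().split('"')[1::2]  # text between each pair of quotes
    family = quoted[0] if quoted else ""
    style = quoted[1] if len(quoted) > 1 else ""
    return filename.strip(), family, style


def match_for_lang(family: str, lang: str = "zh-cn") -> Optional[Tuple[str, str, str]]:
    """Ask fontconfig which font it picks for `family` with a language hint."""
    if shutil.which("fc-match") is None:
        return None  # fontconfig command-line tools not installed
    result = subprocess.run(["fc-match", f"{family}:lang={lang}"],
                            capture_output=True, text=True, check=True)
    return parse_fc_match(result.stdout.strip())
```

For example, comparing `match_for_lang("Serif")` with `match_for_lang("Sans")` shows whether only the serif alias lands on a font with Simplified Chinese coverage.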

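(Chinese summary, translated: My Skype could display only Traditional Chinese; Simplified Chinese characters turned into boxes. Today I experimented a bit and found that simply setting the default font to “Serif” in qtconfig4 fixes it.

As for why this works? I don’t know either. :-P)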


[Solved] Opera Chinese font display problem.

I use Opera every day, but it does not display Chinese fonts well. For some characters, such as 灣, the bottom part is cut off.

I traced Opera with --debugfont and --debugfontfilter and found that it was using Baekmuk (a Korean font) instead. Removing that font solved the cut-off problem but introduced a new one: the characters became blurred.
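Before removing a font, it helps to see which matching families are installed; fontconfig’s `fc-list : family` prints the family names of every installed font, one font per line. A minimal Python sketch (helper names are hypothetical; it inspects fontconfig only, not Opera’s own font selection):

```python
import shutil
import subprocess
from typing import List


def families_matching(fc_list_output: str, keyword: str) -> List[str]:
    """Family names in `fc-list : family` output that contain keyword."""
    found: List[str] = []
    for line in fc_list_output.splitlines():
        # A line can hold several (localized) names separated by commas.
        for name in line.split(","):
            name = name.strip()
            if name and keyword.lower() in name.lower() and name not in found:
                found.append(name)
    return found


def installed_families(keyword: str) -> List[str]:
    """Run fc-list (if available) and filter its family names."""
    if shutil.which("fc-list") is None:
        return []  # fontconfig command-line tools not installed
    result = subprocess.run(["fc-list", ":", "family"],
                            capture_output=True, text=True, check=True)
    return families_matching(result.stdout, keyword)
```

Calling `installed_families("baekmuk")` lists the Baekmuk families fontconfig knows about; which package provides them varies by distribution.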

I also tried qtconfig-qt4, but to no avail.

Finally, after playing with the font settings in the KDE4 control center (command: systemsettings), it works!
What I did was:
1. systemsettings -> Font -> Use anti-aliasing: set to Enabled.
2. Press the “Configure” button.
3. Exclude range: set to disabled.
4. Apply.

And that’s it.
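For a setup without KDE, a similar effect might be had directly in fontconfig. This rests on assumptions I have not verified: that KDE’s anti-aliasing switch is what actually fixed Opera’s blur, and that Opera honors fontconfig’s `antialias` property. The Python helper below (hypothetical name) just builds a fontconfig document that assigns `antialias` to true for every font at every size, i.e. with no exclude range, mirroring step 3.

```python
import xml.etree.ElementTree as ET

FONTCONFIG_HEADER = ('<?xml version="1.0"?>\n'
                     '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">\n')


def antialias_config(enabled: bool = True) -> str:
    """Return a fontconfig document that sets antialias on every font."""
    root = ET.Element("fontconfig")
    # target="font" edits the final font pattern; no size test, so all sizes
    match = ET.SubElement(root, "match", target="font")
    edit = ET.SubElement(match, "edit", name="antialias", mode="assign")
    ET.SubElement(edit, "bool").text = "true" if enabled else "false"
    return FONTCONFIG_HEADER + ET.tostring(root, encoding="unicode") + "\n"


print(antialias_config())
```

The output belongs in ~/.fonts.conf (older fontconfig) or ~/.config/fontconfig/fonts.conf (newer); if that file already exists, merge the `<match>` element into it rather than overwriting it.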

Are Pinyin and Zhuyin functionally dependent on each other? Not really.

Many people think that Hanyu Pinyin and Zhuyin are functionally dependent on each other, that is, Hanyu Pinyin can be converted to Zhuyin without loss, and vice versa.[1] I used to hold this belief, until I started developing libUnihan 0.5.

By design, Hanyu Pinyin does have a two-way functional dependency with Zhuyin.
In practice, however, Hanyu Pinyin covers only 414/416 ≈ 99.519% of Zhuyin (ignoring tone marks): the 416 distinct Zhuyin syllables collapse into 414 pinyin spellings.
The two exceptions are:

| Zhuyin symbols (pinyin equivalents) | Usually mapped to | Example |
|---|---|---|
| ㄝ (ê), ㄜ (e) | e | 誒 (ㄝˋ) and 惡 (ㄜˋ) both map to è |
| ㄌㄩㄢ (lüan), ㄌㄨㄢ (luan) | luan | 攣 (ㄌㄩㄢˊ) and 鸞 (ㄌㄨㄢˊ) both map to luán |

Nevertheless, since some characters read lüán also have luán as an alternative pronunciation (such as 孿), and 誒 is sometimes interchangeable with 欸 (èi), which sounds similar, it is no wonder that almost nobody spots the difference.
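To make the loss concrete, here is a minimal Python sketch over a hypothetical excerpt of a toneless Zhuyin-to-pinyin table: the two exceptions above plus two unambiguous syllables. It is my own illustration, not libUnihan’s data or API. Inverting the table shows which pinyin spellings map back to more than one Zhuyin syllable; the same distinct-spellings-over-syllables ratio, applied to the full counts quoted above, gives 414/416.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical excerpt of a toneless Zhuyin -> Hanyu Pinyin table,
# using the spellings people usually write (not the strict ê / lüan forms).
ZHUYIN_TO_PINYIN: Dict[str, str] = {
    "ㄝ": "e",         # strictly ê, usually written e
    "ㄜ": "e",
    "ㄌㄩㄢ": "luan",  # strictly lüan, usually written luan
    "ㄌㄨㄢ": "luan",
    "ㄧㄝ": "ye",      # the medial ㄧ keeps ㄝ recoverable
    "ㄈㄛ": "fo",      # e.g. 佛
}


def invert(table: Dict[str, str]) -> Dict[str, List[str]]:
    """Group Zhuyin syllables by the pinyin spelling they map to."""
    inverse: Dict[str, List[str]] = defaultdict(list)
    for zhuyin, pinyin in table.items():
        inverse[pinyin].append(zhuyin)
    return dict(inverse)


def ambiguous_pinyin(table: Dict[str, str]) -> Dict[str, List[str]]:
    """Pinyin spellings that cannot be converted back to a unique Zhuyin."""
    return {p: zs for p, zs in invert(table).items() if len(zs) > 1}


def coverage(table: Dict[str, str]) -> float:
    """Distinct pinyin spellings divided by distinct Zhuyin syllables."""
    return len(set(table.values())) / len(table)


print(ambiguous_pinyin(ZHUYIN_TO_PINYIN))
print(f"{414 / 416:.3%}")  # → 99.519%
```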

BTW, there are a few errors in [1]:

  1. The number of distinct syllables should be 416, not 417, as nobody actually pronounces 淋 as ㄌㄩㄣˊ (lǘn).
  2. 佛, 否, … do have pinyin equivalents for their Zhuyin readings.
  3. ㄜ and ㄝ cannot be distinguished by the rule he suggests. As explained earlier, ㄝ seems to be eaten alive (written as plain e) unless a medial (ㄧ or ㄩ) protects it.
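A minimal Python sketch of the medial effect described in item 3. This is my own illustration with a hypothetical function name; I don’t reproduce the exact rule proposed in [1]. After a medial i/ü the pinyin final e can only be ㄝ, after an initial consonant it is ㄜ, and only a bare e stays ambiguous because ê is normally written as e.

```python
from typing import List


def zhuyin_for_final_e(syllable: str) -> List[str]:
    """Candidate Zhuyin for the 'e' final of a toneless pinyin syllable."""
    s = syllable.lower().replace("v", "ü")  # 'v' is a common ASCII stand-in for ü
    if s == "ê":
        return ["ㄝ"]          # strict spelling keeps ㄝ explicit
    if s.endswith(("ie", "ye", "ue", "üe")):
        return ["ㄝ"]          # protected by medial ㄧ or ㄩ
    if s == "e":
        return ["ㄜ", "ㄝ"]    # 惡 or 誒: cannot tell them apart
    if s.endswith("e"):
        return ["ㄜ"]          # e.g. de, ke, zhe
    return []                  # no 'e' final (e.g. 'ei', 'en')
```

The only input that yields two candidates is the bare syllable e, which is exactly the ㄜ/ㄝ collision in the table above.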

The author also wonders whether Pinyin-to-Zhuyin conversion is workable. I can tell him now: his program can work 99.519% of the time if he puts some effort into it. 🙂


[1] 注音符號轉漢語拼音、兩個多年前的心願 (“Converting Zhuyin symbols to Hanyu Pinyin: a wish from more than two years ago”)