Definite's Extractor

My findings on Life, Linux, Open Source, and so on.

Monthly Archives: August 2009

On ibus-xkb integration (2)

Today I re-read Hutterer’s blog and found that he also suggested that language toggling can be done by switching the keyboard layout. Unfortunately, input methods are far more complex than that, for the following reasons:

Input methods that don’t really bind to a keyboard layout

Namely, pinyin/phonetic based input methods. Pinyin just maps sequences of Latin letters to Chinese characters, no matter whether the keyboard is QWERTY, DVORAK, or COLEMAK. A Dvorak user expects a Dvorak “pinyin layout”. But what should be shown in the language bar?

Indeed, Peng Huang could copy and paste en-QWERTY and en-DVORAK to make zh-QWERTY-pinyin and zh-DVORAK-pinyin, but COLEMAK, QWERTZ, and AZERTY users wouldn’t be supported until Huang explicitly copied those layouts too. Doesn’t that bloat the layout package?

Input methods that share keyboard layout

Some input methods share a keyboard layout. For example, Cangjie 3, Cangjie 5, Quick 3, and Quick 5 all use the Cangjie layout. Which input method should zh-Cangjie be mapped to?

One key, multiple symbols

It’s not uncommon for multiple symbols to be mapped to one key.
If I understand it correctly, a keyboard layout is essentially a mapping of key position -> keysym.

But think about input methods and keyboard layouts on a mobile phone keypad. What symbol should the top right key map to? Normally it should be 3.
But you still need an IM to interpret “33”, which might be either “de” with T9
or “e” with “standard abc” (multi-tap) input.
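
The keypad ambiguity can be sketched in a few lines of Python. This is only an illustration: the `TOY_WORDS` dictionary is made up, the function names are hypothetical, and real T9 and multi-tap implementations are more involved (multi-tap needs a pause or timeout to type two letters from the same key, which is ignored here).

```python
from itertools import groupby

# Letters printed on a standard phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def multitap(presses):
    """Decode multi-tap ("standard abc") input: n presses pick the n-th letter.

    Simplification: a run of the same key is always read as one letter.
    """
    letters = []
    for key, run in groupby(presses):
        count = len(list(run))
        choices = KEYPAD[key]
        letters.append(choices[(count - 1) % len(choices)])
    return "".join(letters)


def t9_candidates(presses, words):
    """Return the dictionary words whose letters match the keys one-to-one."""
    return [
        word for word in words
        if len(word) == len(presses)
        and all(ch in KEYPAD[key] for ch, key in zip(word, presses))
    ]


TOY_WORDS = ["de", "ed", "e", "fee"]  # hypothetical dictionary

print(multitap("33"))                  # e
print(t9_candidates("33", TOY_WORDS))  # ['de', 'ed']
```

The same two key presses yield different text depending on which IM interprets them, which is exactly what a position -> keysym table alone cannot express.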

Perhaps that’s why MS Windows uses the term “keylayout/input method” for its input support. 🙂

Back to ibus-xkb integration.
Here I don’t want to discuss the technical details, only the UI.

There would be a list of language – input method – keyboard layout (or xkb setting) combinations, like:

en,,,us,
en,,,us,dvorak,
en,,,us,dvorak-classic,
en,,,gb,
en,t9,,keypad,
en,morse-code,,straight-key,
jp,anthy,,us,
jp,anthy,,jp,
jp,anthy,,jp,kana
zh,pinyin,,*,
zh,cangjie3,,us,
zh,cangjie5,,us,
zh,chewing,,us,
zh,chewing,et,us,
zh,chewing,hsu,us,

The 1st field is the language.
The 2nd is the input method (can be empty if you don’t need one).
The 3rd is the input method variant, like a custom Zhuyin layout (can be empty if you don’t need one).
The 4th is the keyboard layout (in the xkb sense). ‘*’ means following the system default layout.
The 5th is the layout variant (in the xkb sense).
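
To make the five fields concrete, here is a minimal parsing sketch in Python. The `Combination` type and `parse_combination` function are hypothetical names for illustration only; nothing like them exists in IBus or xkb as far as this post is concerned. The sketch assumes empty fields mean “not needed” and tolerates the optional trailing comma seen in the list above.

```python
from typing import NamedTuple, Optional


class Combination(NamedTuple):
    language: str
    input_method: Optional[str]  # None: no IM needed
    im_variant: Optional[str]    # None: no IM variant
    xkb_layout: str              # "*" means follow the system default
    xkb_variant: Optional[str]   # None: default variant


def parse_combination(line):
    """Parse one 'language,im,im-variant,layout,variant' entry."""
    fields = [field.strip() for field in line.strip().split(",")]
    # An entry with a variant AND a trailing comma splits into six fields.
    if len(fields) == 6 and fields[-1] == "":
        fields.pop()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    language, im, im_variant, layout, variant = fields
    if not language or not layout:
        raise ValueError(f"language and layout are required: {line!r}")
    return Combination(language, im or None, im_variant or None,
                       layout, variant or None)


for entry in ["en,,,us,dvorak,", "zh,chewing,et,us,", "zh,pinyin,,*,"]:
    combo = parse_combination(entry)
    needs_im = combo.input_method is not None
    follows_system = combo.xkb_layout == "*"
    print(combo, "needs IM:", needs_im, "follows system layout:", follows_system)
```

A switcher could use `input_method is None` to decide whether any IM module has to be loaded at all.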

The benefits of choosing these combinations are:

  • Input method support can be invoked on the fly. Invoke the IM module if you need one; deactivate or free the IM module if you don’t.
  • Compatible with both IBus and xkb.
  • IBus knows which layout is needed for the IM.
  • Keypad and Morse code input are also supported. 🙂

Although I prefer to let IBus handle switching between these combinations,
I would like to hear about a better approach if there is one.

Usefulness of keyboard layouts in input methods

My previous post elaborates on the difficulties and the reasons why most input method developers won’t adopt the xkb key layout framework. To sum up, if your input method relies on exactly one layout (usually en-QWERTY), you can safely stick to that layout, provided IBus is capable of setting that layout for you. That’s why I keep nagging Hutterer about the set/get functions of xkb.

However, Peter Hutterer opened my eyes and led me to look at input method implementation from a new angle. Using key layout terminology, Chinese input methods can have up to 6 levels:

  1. Input symbol level: Word roots or Pinyin/Zhuyin symbols.
  2. Lower case full-width alphanumerics and punctuation marks.
  3. Upper case full-width alphanumerics and symbols.
  4. English lower case alphanumerics. (if the IM supports a temporary English mode)
  5. English upper case alphanumerics. (if the IM supports a temporary English mode)
  6. Special/User-defined symbols.

We might also have a quasi-level: the selection key level. Although “1234567890” are widely used to select candidates, some people find “asdfghjkl;” more effective. Doesn’t that deserve a quasi-level? 😛

Although libchewing does implement Zhuyin layout conversion and supports full-width alphanumeric input, using a key layout representation has its own benefits:

  • Better on-screen keyboard support.
  • Better settings UI: users can bind their own triggers to each level.

So, thank you, Peter Hutterer!

On ibus-xkb integration

A couple of days ago, Peng Huang (the IBus author), Peter Hutterer (an xkb guru) and I were talking about integrating xkb and IBus. Peter Hutterer suggested that each input method should register its own input symbols (e.g. Cangjie / Wubi word roots or Zhuyin symbols) as an xkb layout. However, Huang did not seem too keen on this.

I, on the other hand, was eager to adopt this idea. In chewing, there are 8 Zhuyin layouts to deal with; even after ignoring the QWERTY and DVORAK influence, there are still 6 layouts. By adopting Hutterer’s framework, I could concentrate on converting Zhuyin symbols to characters without worrying about the current system layout.
However, as a Chinese IM developer, I kind of understand why Huang was not interested, and I can see the bumps in the road.

Hutterer’s concerns:

  • Easier input method development. IMs can just concentrate on converting their own symbols.
  • Neither English keysyms nor keycodes are reliable, and neither are the IMs built on top of them.

Reasons why most IMs won’t adopt the proposed framework:

  • Perceived awareness: Most IM developers only know one or two layouts, so they don’t see why this is important.
  • Cross platform: What about systems that don’t support the proposed framework?
  • Bureaucracy: Currently IMs only need to submit their work to the IM framework; with the proposed framework, they would also need to submit the corresponding symbols to the xkb community, which is alien to them.
  • Input symbols are meaningless: That is, unless there is a corresponding IM to process them. Some input symbols might not even be in Unicode, so why bother registering them as an xkb layout?
  • Console mode: Even if IM developers were diligent enough to register with xkb, their work would not be usable in console mode. FYI, ibus-fbterm is usable now.
  • Selection keys: In Zhuyin, for example, the key ‘1’ may mean either ‘ㄅ’ or “select the 1st candidate”. The proposed framework does not quite address this, because IM developers still need to interpret what the key means.
  • Assumption that en-QWERTY is always available: Even if keycodes do change, as long as en-QWERTY is available, you can just hook onto the en-QWERTY layout keysyms and you will be fine. This is probably the main reason. 😛

We cannot really do much about the perceived awareness, cross-platform, and meaningless input symbols issues. But we can join forces with console key layout developers/maintainers, so that “define once, use everywhere” can be achieved.

The “Selection keys” issue, which I confirmed with Hutterer, should be dealt with at the IBus or IM level.
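
As a rough sketch of what handling this at the IM level means, the same key can be interpreted differently depending on whether a candidate list is currently shown. The function name, the return format, and the candidate characters below are all hypothetical; only the ‘1’ -> ‘ㄅ’ mapping and the “asdfghjkl;” alternative come from the discussion itself.

```python
# A few keys from the standard Zhuyin layout (illustrative subset only).
ZHUYIN_KEYS = {"1": "ㄅ", "q": "ㄆ", "a": "ㄇ", "z": "ㄈ"}


def interpret_key(key, candidates, selection_keys="1234567890"):
    """Decide what a key press means given the current IM state.

    candidates: the candidate list currently shown (empty if none).
    selection_keys: the keys used to pick a candidate, in order.
    """
    if candidates and key in selection_keys:
        index = selection_keys.index(key)
        if index < len(candidates):
            return ("commit", candidates[index])
    if key in ZHUYIN_KEYS:
        return ("symbol", ZHUYIN_KEYS[key])
    return ("ignore", key)


shown = ["八", "把"]  # made-up candidate list

print(interpret_key("1", []))                    # ('symbol', 'ㄅ')
print(interpret_key("1", shown))                 # ('commit', '八')
print(interpret_key("a", shown, "asdfghjkl;"))   # ('commit', '八')
```

The layout only says which symbol sits on which key; the decision between “symbol” and “select” depends on state that only the IM (or IBus) has.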

[RHEL5] Tip about setting up wpa_supplicant under xen

I’ve read some forum articles and blogs which suggest removing the MAC address of the wireless device to get wpa_supplicant working.

However, this only works when the wireless device is the only network card you activate.
If you have more than one network card, e.g. both a wired and a wireless card, then you need to keep the MAC addresses; otherwise the kernel might get confused and assign an unexpected name (such as wlan0) to the wireless device. Consequently, wpa_supplicant cannot find the correct device to activate.

[RHEL5] Set up wpa-supplicant with xen

I was looking for a way to tame wpa_supplicant inside xen. After some intensive Googling, I finally found a Red Hat Virtualization Guide that shows how to set up xen networking on a laptop.

So, if you are a RHEL customer, please install Virtualization-(your locale) to get the document. 🙂

Fedora 11 vs Windows 7

My dad has an old computer loaded with WinXP. It had a memory problem. After that was fixed, I downloaded both the Fedora 11 live CD and the Windows 7 RC install DVD and did some quick tests.

Machine Spec

I didn’t bother to dig into the details. Nevertheless, that machine has two special devices:
a multi-purpose built-in card-and-USB reader, and an LCD TV used as the monitor.

Download

Fedora 11: Downloaded via BitTorrent, no drama.

Windows 7: It has a downloader. Unlike other downloaders, which are obviously independent programs, the Windows 7 downloader seems to depend on IE. The download was reasonably fast. However, after I upgraded to IE 8 and rebooted, the downloader refused to continue until I removed some incompatible plug-ins and granted some permissions.

Boot and Install

I tried a live USB, but it did not work on that machine, although it works with two other laptops. Luckily, my dad has plenty of blank CDs and DVDs.

Fedora 11: It kind of worked out of the box. One thing that did bother me was that the screen was shifted toward the upper left. I haven’t tested networking and other stuff, because I didn’t want to mess with the ADSL setup.

Windows 7: After booting, the screen went blank except for some words that read “Not supported”. That LCD TV supports resolutions only up to 1024×768.

Conclusion

This was just a simple, quick test, so there is no need to read too much into it.

Fedora 11: Needs some final fine-tuning to put the screen in the middle.

Windows 7: Unable to perform any further tests. Maybe they should be more conservative about the video mode, or provide an easy-to-spot text mode installation.

Proposal for voice data naming guide

I’ve tried out gcin’s voice data. It’s neat, interesting, and useful.

Since it does not depend on gcin, I wish to package it independently, so that other packages can use it. More generally, though, what should we name it and other voice data?

How about: voicedata-<locale>-<generated_method>-<source>-<variant>
where:

  • locale: Locale string like en_US, zh_CN…
  • generated_method: The algorithm or synthesizer that generates the voice, or “realperson” if the voice was recorded from a real person.
  • source: Name of the project or organization that provides the voice.
  • variant: Optional field for notable info, such as the person who provides the voice, or the parameters of the synthesizer.

Thus, under this naming guide, gcin’s voice data should be named:
voicedata-zh_TW-realperson-gcin-EdwardLiu
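
For packagers who want to generate such names consistently, the scheme can be sketched as a small helper. The function name is hypothetical, and rejecting hyphens inside a field is my own assumption (so the name can be split back into fields unambiguously); it is not part of the proposal above.

```python
def voicedata_name(locale, generated_method, source, variant=None):
    """Build a name of the form
    voicedata-<locale>-<generated_method>-<source>[-<variant>].
    """
    fields = [locale, generated_method, source]
    if variant:
        fields.append(variant)  # the variant is optional
    for field in fields:
        # Assumption: "-" is reserved as the field separator.
        if not field or "-" in field:
            raise ValueError(f"invalid field: {field!r}")
    return "-".join(["voicedata"] + fields)


print(voicedata_name("zh_TW", "realperson", "gcin", "EdwardLiu"))
# voicedata-zh_TW-realperson-gcin-EdwardLiu
```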