Definite's Extractor

My findings on Life, Linux, Open Source, and so on.

Tag Archives: input method

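ibus-chewing 1.5.0 released

Besides fixing a few bugs, this release has several highlights:

  1. Users can choose to show the Chinese/English (“中/英”) and full-width/half-width (“全/半”) status in the system tray, as shown:
    [Screenshot: systray]
    GNOME 3 users may not be able to see it, but it is visible in other desktop environments such as KDE/Plasma, XFCE, LXDE, and LXQT, and in window managers that support a systray, such as fluxbox.
    Left-click the “中” icon to toggle Chinese/English, and right-click to toggle full/half width. You can also toggle Chinese/English with the Shift key and full/half width with Shift-Space.

    To enable or disable it: open the settings dialog and, on the “Keyboard” tab, use the “Show systray icons” option.

  2. Better handling of Caps Lock and Chinese/English switching.
    You can now choose whether Shift or Caps Lock switches between Chinese and English.
    Users who like to use Caps Lock to toggle letter case can disable “Caps Lock toggles Chinese mode”, so they need not worry about being unable to switch case with Caps Lock while typing English.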

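Re-examining the copyright of the IBus-table Chinese input methods

A while ago I took over ibus-table and all of the ibus-table Chinese input methods.
Seeing that even deep-pocketed Microsoft lost to Zhongyi (中易) in the first-instance trial, I figured I should give these Chinese input code tables a thorough review.
The result (?) is here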

Tom Callaway

[Input Style] What are over-the-spot, on-the-spot, etc.?

Last Friday I saw an ibus issue about input style support in ibus-anthy. The maintainer, Fujiwara, insists that editing in the candidate window is not "over-the-spot".

OK, time for a literature review:
According to Sun and Mozilla, in on-the-spot mode the preedit area is INSERTED at the input spot, so the text after the input spot WILL be pushed to the right as the preedit area expands; in over-the-spot mode the preedit area is PUT OVER the input spot, so the text after the input spot WILL NOT be pushed to the right.
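The Sun/Mozilla distinction can be sketched in a few lines of Python. This is only a simplified model, assuming one character per screen cell; the function names are made up for illustration and do not come from any toolkit.

```python
def on_the_spot(text: str, caret: int, preedit: str) -> str:
    """Preedit is inserted at the caret: the following text is pushed right."""
    return text[:caret] + preedit + text[caret:]


def over_the_spot(text: str, caret: int, preedit: str) -> str:
    """Preedit is drawn over the caret position: the following text keeps its
    place and is merely hidden (not deleted) while the preedit is shown."""
    return text[:caret] + preedit + text[caret + len(preedit):]


line = "abcdef"
print(on_the_spot(line, 2, "XY"))    # abXYcdef
print(over_the_spot(line, 2, "XY"))  # abXYef
```

In the second case "cd" is still in the document; it is only covered by the preedit area until the text is committed.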

The reference from IBM tells a different story. Over-the-spot, as the page states, is the mode in which the candidate window closely follows the input spot, but the preedit string is composed in the candidate window.

Java also has its own definition: below-the-spot is Java’s term for what IBM calls over-the-spot.

Summary:

After intensive web searching and discussion, we concluded that we should use something like “Embedded preedit in client application” to avoid confusion.

opera qt4 and ibus

Once upon a time, Opera did not get along well with ibus by default. Normally, you had to set
QT_IM_MODULE=xim
in order to make the input work.

However, that workaround still came with a problem: whenever you restarted ibus, Opera went down with it. It would manage to crawl back, but whatever you had typed was lost forever.

Given that ibus provides a Qt4 interface, I went looking for an Opera Qt4 build.
I tried the Opera 1010 beta Qt4 static build, and it turned out to work well.
The shared build did not, though: according to Peng Huang, it crashed with ibus-qt4.

There is a thread about the differences between the qt3 and qt4 builds here, which says that the main difference is merely the skins. That’s incorrect: at the very least, the qt4 build provides some remedy for input method users.

Upcoming I18n package management

Yesterday, Jens and I were discussing the yum langpack plugin and a recent bugzilla bug ([Bug 518395] add @input-methods to Package Selection screen). During the discussion, we found that at least the following issues need to be considered:

  1. Aspects of language support:
    1. Language pack
    2. Input Method/Keyboard layouts
    3. Fonts
    4. Spell Checking
    5. ….
  2. Types of package i18n support:
    1. Translation files, such as .mo
    2. Language packs. This may involve naming pattern detection.
    3. None

In the first stage, we postpone the technical details of handling internationalized packages and concentrate on obtaining and storing the users’ i18n preferences.

For example, Caius wants a set of fonts that covers as many languages as possible. He understands English, Chinese, and Japanese, so he needs the corresponding input methods and spell checking. He prefers to see menus and items in Japanese, so Japanese langpacks are required. He also needs an English spell checker.

Storing

His preference can be stored as:
/etc/sysconfig/lang-setting/langpack:

ja

/etc/sysconfig/lang-setting/input_method:

ja@anthy
zh_HK@cangjie5

/etc/sysconfig/lang-setting/spell_check:

en_US

/etc/sysconfig/lang-setting/font:

*

Usage

Jens’ langpack plugin benefits from this approach. The plugin determines whether to add or remove langpacks according to the langpack file.

For anaconda and system-config-language, this approach provides further knobs to tune. For example, IBus will be pulled in as a dependency if there is at least one Asian language in input_method. Spell checkers and even voice data can be dealt with in a similar way.
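As a rough sketch of that rule, the following Python reads the contents of an input_method file and decides whether IBus would be needed. The set of "Asian" language codes and all function names are assumptions made for this example; none of this is an existing anaconda or yum API.

```python
# Assumption: which language codes count as needing IBus.
ASIAN_LANGUAGES = {"ja", "ko", "zh"}


def parse_settings(text: str) -> list[str]:
    """Return the non-empty lines of a lang-setting file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def language_of(entry: str) -> str:
    """'zh_HK@cangjie5' -> 'zh', 'ja@anthy' -> 'ja', 'en_US' -> 'en'."""
    locale = entry.split("@", 1)[0]
    return locale.split("_", 1)[0]


def needs_ibus(input_method_entries: list[str]) -> bool:
    """True if at least one entry belongs to an Asian language."""
    return any(language_of(e) in ASIAN_LANGUAGES for e in input_method_entries)


# Caius' input_method file from the example above.
entries = parse_settings("ja@anthy\nzh_HK@cangjie5\n")
print(needs_ibus(entries))  # True
```

The same shape would work for spell checkers or voice data: parse the file, map each entry to a language, and let the installer decide which packages follow from it.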

Fonts are a bit different. By default, we install a set of fonts that covers as many languages as possible. But on resource-limited systems, the system admin can exclude fonts by adding a “-” in front of the languages to be excluded, or alternatively just list the languages wanted. Say, an OLPC bound for Uganda can be loaded with:
/etc/sysconfig/lang-setting/font:

en
sw
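One possible reading of the font file semantics, sketched in Python. The precedence rules here are my assumption, since the proposal does not spell them out: "*" or an exclusion-only file starts from everything available, while an explicit list restricts the result to the listed languages. The function name is hypothetical.

```python
def select_font_languages(entries: list[str], available: list[str]) -> list[str]:
    """Pick the languages whose fonts should be installed.

    entries   -- lines of the font file: '*', 'xx', or '-xx'
    available -- languages for which fonts exist at all
    """
    excluded = {e[1:] for e in entries if e.startswith("-")}
    listed = [e for e in entries if e != "*" and not e.startswith("-")]
    if "*" in entries or not listed:
        # Default: cover as many languages as possible.
        base = list(available)
    else:
        # Explicit list: only what the admin asked for.
        base = [lang for lang in listed if lang in available]
    return [lang for lang in base if lang not in excluded]


available = ["en", "ja", "sw", "zh"]
print(select_font_languages(["*"], available))         # ['en', 'ja', 'sw', 'zh']
print(select_font_languages(["-ja"], available))       # ['en', 'sw', 'zh']
print(select_font_languages(["en", "sw"], available))  # ['en', 'sw']
```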

User interface of the settings dialog

Regarding UI, I suggest something like:

[ Overview | Language Pack | Input Method | Font ]

Language Name
  English (US)
  Chinese (HK)
  Japanese (JP)

[Language Group ↓]  [Language ↓]

[ ] Language pack   [ ] Input method
[ ] Fonts           [ ] Spell check

[Add]  [Delete]  [Ok]  [Cancel]

Please forgive my terrible HTML table drawing skills. Hmm, maybe I should use glade to produce a real dialog. 🙂

At the top of the “dialog(?)” are the pages of a GtkNotebook. Click on the other pages for more advanced settings; this page is for the overall settings.

The overall settings page has an active list showing the currently active languages. The degree of support can be listed either here or in the individual advanced settings.

In the middle of the settings page there is a language group pull-down, which lists the geographic divisions of languages, such as Eastern Asia, Middle East, etc.

The contents of the language pull-down are narrowed down by the language group, so a user does not need to scroll through hundreds of languages.

Below the pull-downs are checkboxes for the aspects of support. They are self-explanatory.

At the bottom there are buttons that add the language to, or delete it from, the active list, and Ok/Cancel buttons to confirm or discard the modification.

On ibus-xkb integration (2)

Today I re-read Hutterer’s blog and found that he also suggested that language toggling can be done by switching keyboard layouts. Unfortunately, input methods are far more complex than that, for the following reasons:

Input methods that don’t really bind to a keyboard layout

Namely, pinyin/phonetic-based input methods. Pinyin just combines Latin letters into Chinese characters, no matter whether on QWERTY, DVORAK, or COLEMAK. A dvorak user expects a dvorak “pinyin-layout”. But what should be shown in the language bar?

Indeed, Peng Huang can copy-paste en-QWERTY and en-DVORAK to make zh-QWERTY-pinyin and zh-DVORAK-pinyin, but COLEMAK, QWERTZ, and AZERTY users won’t be supported until Huang explicitly copies those layouts. Doesn’t that make the layout package bloated?

Input methods that share keyboard layout

Some input methods share a keyboard layout. For example, Cangjie 3, Cangjie 5, Quick 3, and Quick 5 all use the Cangjie layout. Which input method should zh-Cangjie be mapped to?

One key, multiple symbols

It’s not uncommon for multiple symbols to be mapped to one key.
If I understand it correctly, a keyboard layout is essentially a mapping of key position -> keysym.

But think about input methods and keyboard layouts on a mobile phone keypad. What symbol should the top right key map to? Normally it should be 3.
But you still need an IM to interpret “33”, which might be either “de” with T9,
or “e” with “standard abc”.

Perhaps that’s why MS Windows uses the term “keylayout/input method” for its input support. 🙂
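The keypad example can be made concrete with a small Python sketch. It assumes the usual phone keypad lettering (2=abc, 3=def, and so on); the tiny word list passed to the T9-style lookup is made up, as real T9 relies on a full dictionary and ranking.

```python
from itertools import groupby, product

# Usual phone keypad lettering.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def multitap(keys: str) -> str:
    """'Standard abc' input: a run of the same key cycles through its letters.

    A run is treated as one letter; a real phone uses a pause to separate
    two letters on the same key.
    """
    out = []
    for digit, run in groupby(keys):
        letters = KEYPAD[digit]
        presses = len(list(run))
        out.append(letters[(presses - 1) % len(letters)])
    return "".join(out)


def t9_candidates(keys: str, dictionary: set[str]) -> list[str]:
    """T9-style input: one key press per letter, filtered by a word list."""
    combos = ("".join(c) for c in product(*(KEYPAD[d] for d in keys)))
    return [word for word in combos if word in dictionary]


print(multitap("33"))                            # e
print(t9_candidates("33", {"de", "ed", "go"}))   # ['de', 'ed']
```

The same key sequence yields different text depending on which interpreter sits on top, which is exactly why a layout alone is not enough.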

Back to ibus-xkb integration.
Here I don’t want to discuss the technical details, only the UI.

There would be a list of language – input method – keyboard layout (or xkb setting) combinations, like:

en,,,us,
en,,,us,dvorak,
en,,,us,dvorak-classic,
en,,,gb,
en,t9,,keypad,
en,morse-code,,straight-key,
jp,anthy,,us,
jp,anthy,,jp,
jp,anthy,,jp,kana
zh,pinyin,,*,
zh,cangjie3,,us,
zh,cangjie5,,us,
zh,chewing,,us,
zh,chewing,et,us,
zh,chewing,hsu,us,

The 1st field is the language.
The 2nd is the input method (can be null if none is needed).
The 3rd is the input method variant, like a custom Zhuyin layout (can be null if none is needed).
The 4th is the keyboard layout (in the xkb sense). ‘*’ means follow the system default layout.
The 5th is the layout variant (in the xkb sense).
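A minimal sketch of how such a record could be parsed, following the five fields described above. The class and function names are hypothetical; empty fields become None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputCombo:
    language: str
    input_method: Optional[str]   # None: plain keyboard input, no IM needed
    im_variant: Optional[str]     # e.g. a Zhuyin layout such as 'hsu'
    xkb_layout: str               # '*' means follow the system default layout
    xkb_variant: Optional[str]


def parse_combo(line: str) -> InputCombo:
    """Parse one 'language,im,im_variant,xkb_layout,xkb_variant' record."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    language, im, im_variant, layout, variant = fields
    return InputCombo(language, im or None, im_variant or None,
                      layout, variant or None)


combo = parse_combo("zh,chewing,hsu,us,")
print(combo.input_method, combo.im_variant, combo.xkb_layout)  # chewing hsu us
print(parse_combo("en,,,us,dvorak").input_method is None)      # True
```

With records in this form, a switcher can tell from `input_method` whether an IM module has to be loaded at all, and from `xkb_layout` which layout to request.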

The benefits of choosing these combinations are:

  • Input method support can be invoked on the fly: load the IM module if you need one, and deactivate or free it if you don’t.
  • Compatible with both IBus and xkb.
  • IBus knows which layout is needed for the IM.
  • Keypad and Morse code input are also supported. 🙂

Although I prefer to let IBus handle the language combination switching,
I would like to know if there is a better approach.

Usefulness of keyboard layouts in input methods

My previous post elaborated on the difficulties, and on the reasons why most input method developers won’t adopt the xkb key layout framework. To sum up, if your input method needs and relies on exactly one layout (usually en-QWERTY), you can safely stick to that layout, provided ibus is capable of setting it for you. That’s why I keep nagging Hutterer about the set/get functions of xkb.

However, Peter Hutterer opened my eyes and led me to a new angle from which to review input method implementation. Using key layout terminology, Chinese input methods can have up to 6 levels:

  1. Input symbol level: word roots or Pinyin/Zhuyin symbols.
  2. Lower case full-width alphanumerics and punctuation marks.
  3. Upper case full-width alphanumerics and symbols.
  4. English lower case alphanumerics (if the IM supports a temporary English mode).
  5. English upper case alphanumerics (if the IM supports a temporary English mode).
  6. Special/user-defined symbols.

We might also have a quasi-level: the selection key level. Although “1234567890” are widely used to select candidates, some people find “asdfghjkl;” more effective. Doesn’t that deserve a quasi-level? 😛

Although libchewing does implement Zhuyin layout conversion and supports full-width alphanumeric input, using a key layout representation has its own benefits:

  • Better screen keyboard support.
  • Better setting ui: so users can bind their own level triggers to every level.

So, thank you, Peter Hutterer!

On ibus-xkb integration

A couple of days ago, Peng Huang (the IBus author), Peter Hutterer (an xkb guru), and I were talking about integrating xkb and IBus. Peter Hutterer suggested that each input method should register its own input symbols (e.g. Cangjie / Wubi word roots or Zhuyin symbols) as an xkb layout. However, Huang did not seem too keen on this.

I, on the other hand, was eager to adopt this idea. In chewing, there are 8 Zhuyin layouts to deal with; even after ignoring the QWERTY and DVORAK influence, there are still 6. By adopting Hutterer’s framework, I could concentrate on converting Zhuyin symbols to characters without worrying about the current system layout.
However, as a Chinese IM developer, I kind of understand why Huang was not interested, and I can see the bumps in the road.

Hutterer’s concerns:

  • Easier input method development: IMs can just concentrate on converting their own symbols.
  • Neither English keysyms nor keycodes are reliable, and neither are the IMs built on top of them.

Reasons why most IMs won’t adopt the proposed framework:

  • Perceived awareness: Most IM developers only know one or two layouts, so they don’t see why this is important.
  • Cross platform: What about the systems that don’t support the proposed framework?
  • Bureaucracy: Currently IMs only need to submit their work to the IM framework; with the proposed framework, they would also need to submit the corresponding symbols to the xkb community, which is alien to them.
  • Input symbols are meaningless unless there is a corresponding IM to process them. Some input symbols might not even be in Unicode, so why bother registering them as an xkb layout?
  • Console mode: Even if IM developers were diligent enough to register with xkb, their work would not be appreciated in console mode. FYI, ibus-fbterm is usable now.
  • Selection keys: In Zhuyin, for example, key ‘1’ may mean either ‘ㄅ’ or “select the 1st candidate”. The proposed framework does not quite address this, because IM developers still need to interpret what the key means.
  • Assumption that en-QWERTY is always available: Even if keycodes do change, as long as en-QWERTY is available, you just hook onto the en-QWERTY layout keysyms and you will be fine. This is probably the main reason. 😛

We cannot really do much about the perceived awareness, cross-platform, and meaningless input symbols issues. But we can join forces with console key layout developers/maintainers, so that “define once, use everywhere” can be achieved.

The “Selection keys” issue, which I confirmed with Hutterer, should be dealt with at the IBus or IM level.
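To illustrate why the selection-key ambiguity has to be resolved inside the IM, here is a small Python sketch. The key table is a fragment of the standard Zhuyin layout; the function name and the returned tuples are made up for this example and are not libchewing or IBus API.

```python
# Fragment of the standard Zhuyin keyboard layout.
ZHUYIN_KEYS = {"1": "ㄅ", "q": "ㄆ", "a": "ㄇ", "z": "ㄈ"}


def interpret_key(key: str, candidates: list[str],
                  selection_keys: str = "1234567890") -> tuple[str, str]:
    """Decide what a key press means given the IM's current state.

    While a candidate list is showing, selection keys pick a candidate;
    otherwise the same key is an ordinary Zhuyin symbol.
    """
    if candidates and key in selection_keys:
        index = selection_keys.index(key)
        if index < len(candidates):
            return ("select", candidates[index])
    if key in ZHUYIN_KEYS:
        return ("symbol", ZHUYIN_KEYS[key])
    return ("passthrough", key)


print(interpret_key("1", []))                          # ('symbol', 'ㄅ')
print(interpret_key("1", ["八", "吧"]))                 # ('select', '八')
print(interpret_key("a", ["八", "吧"], "asdfghjkl;"))   # ('select', '八')
```

A keyboard layout alone can only say "this key is ㄅ"; whether it should instead select a candidate depends on state that only the IM has.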

Sync between Caps Lock and LED

Caps Lock is used in chewing and other input methods as a mode switch key: for example, Caps Lock off for Chinese mode and on for English mode. However, taming Caps Lock is not so simple.

Paragan kindly gave me a link to keylockx, a program that detects and sets the lock mode. Tagoh also told me that lock-keys-applet, a GNOME applet, can do something similar. Both programs do change and reflect the real Caps Lock mode; unfortunately, the LED on a USB keyboard does not reflect such changes.

Things get even weirder when synergy is involved. The lock state switch happens in the synergy client, which is right and correct. The problem is that the synergy server’s LED does not reflect it.

OK, I know the synergy server’s lock LEDs do not have to synchronize with the client, and forcing them to might cause more trouble. But it still confuses inexperienced end users, as clients do not necessarily have lock LEDs on them.

Hmm, maybe that’s why lock-keys-applet has its place, as it shows the real lock state on the client. But the story does not end here, as input methods usually just process the Caps Lock event and do not care what the current state is or where the event came from… arrrgaghhhh

Opera Chinese/Japanese/Korean input with iBus

iBus is a next-generation input method framework using D-Bus.
The interface is very clean, and it does not have the nasty SCIM C++ ABI transition problem.

Opera does not support it out of the box. However, since Opera supports XIM when the QT_IM_MODULE environment variable is set, inserting:

export QT_IM_MODULE=XIM

in /usr/bin/opera makes iBus work with Opera.