dongyuwei/web-pinyin-ime

web-pinyin-ime

online pinyin input method

online demo

https://dongyuwei.github.io/web-pinyin-ime/

The pinyin dict source

https://android.googlesource.com/platform/packages/inputmethods/PinyinIME/+/refs/heads/master/jni/data/rawdict_utf16_65105_freq.txt

You can download the Android PinyinIME source as an archive from this link: https://android.googlesource.com/platform/packages/inputmethods/PinyinIME/+archive/refs/heads/master.tar.gz

It is licensed under the Apache License, Version 2.0. See: https://android.googlesource.com/platform/packages/inputmethods/PinyinIME/+/refs/heads/master/NOTICE

Both rawdict_utf16_65105_freq.txt and NOTICE are included in the ./ime/src/script directory.

How the pinyin dict is processed:

  • Convert the file to UTF-8: iconv -f UTF-16 -t UTF-8 rawdict_utf16_65105_freq.txt > google_pinyin_rawdict_utf8_65105_freq.txt
  • Transform the dict into ./ime/src/pinyin/google_pinyin_dict_utf8_55320.ts using the Node.js script ./ime/src/script/dict_preprocess.js.
  • The transform step also builds a packed trie, which enables pinyin prefix input.
  • The final pinyin dict, ./ime/src/pinyin/google_pinyin_dict_utf8_55320.ts, contains both the transformed pinyin data and the prepared packed trie.
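
To illustrate why a trie enables prefix input, here is a minimal sketch of a plain (unpacked) trie keyed by pinyin letters. It is not the packed trie that dict_preprocess.js produces. The `DictEntry` shape, the `PinyinTrie` class, and the frequency values are hypothetical and exist only for this example.

```typescript
// Hypothetical entry shape; the real dict file may store different fields.
interface DictEntry {
  word: string;
  pinyin: string; // syllables joined without spaces, e.g. "xihongshi"
  freq: number; // made-up weight, higher means more common
}

class TrieNode {
  children = new Map<string, TrieNode>();
  entries: DictEntry[] = []; // entries whose pinyin ends exactly at this node
}

class PinyinTrie {
  private root = new TrieNode();

  // Walk one node per letter, creating nodes as needed.
  insert(entry: DictEntry): void {
    let node = this.root;
    for (const ch of entry.pinyin) {
      let next = node.children.get(ch);
      if (!next) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.entries.push(entry);
  }

  // Collect every entry whose pinyin starts with `prefix`, most frequent first.
  withPrefix(prefix: string): DictEntry[] {
    let node: TrieNode | undefined = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return [];
    }
    const found: DictEntry[] = [];
    const stack: TrieNode[] = [node];
    while (stack.length > 0) {
      const current = stack.pop()!;
      found.push(...current.entries);
      current.children.forEach((child) => stack.push(child));
    }
    return found.sort((a, b) => b.freq - a.freq);
  }
}

// Sample data with invented frequencies.
const trie = new PinyinTrie();
trie.insert({ word: "西红柿", pinyin: "xihongshi", freq: 3 });
trie.insert({ word: "喜欢", pinyin: "xihuan", freq: 10 });
trie.insert({ word: "西湖", pinyin: "xihu", freq: 5 });
trie.insert({ word: "新华社", pinyin: "xinhuashe", freq: 8 });

const prefixWords = trie.withPrefix("xih").map((e) => e.word);
```

Typing `xih` walks three nodes and then gathers everything below that point, so lookup cost depends on the prefix length and the number of matches rather than on the size of the whole dict.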

Dev prerequisites

  • Node.js (tested with v14.17.0)
  • pnpm (tested with 6.17.2)

Make sure Node.js and pnpm are installed, then install the npm packages: cd ime && pnpm install

For dev

pnpm run dev

The core logic is located in ./ime/src/pinyin/ime_engine.js and ./ime/src/pinyin/IME.tsx. If you make any changes, make sure to run cd ime && pnpm test; see ./ime/src/pinyin/ime_engine.test.ts.

Tests

pnpm run test

See ./ime/src/pinyin/ime_engine.test.js. Supported input modes:

  • Full pinyin
  • First-letter (abbreviation) matching
  • Pinyin prefix matching

import getCandidates from './ime_engine';

it('should get candidates with full pinyin', () => {
  expect(getCandidates('xihongshi')).toEqual(['西红柿']);
});

it('should get sorted candidates with abbr of pinyin(First chars of pinyin)', () => {
  // `xhs` may be an abbreviation of `xin hua she`, `xi hong shi`, etc.
  expect(getCandidates('xhs')).toEqual([
    '新华社',
    '西红柿',
    '小和尚',
    '小护士',
    '巡回赛',
  ]);
});

it('should get sorted candidates with pinyin prefix', () => {
  // `xih` may be a prefix of `xi huan`, `xi huan ni`, `xi hong shi`, etc.
  expect(getCandidates('xih')).toEqual([
    '喜欢',
    '喜欢你',
    '西湖',
    '喜好',
    '细化',
    '西红柿',
    '喜欢吃',
    '稀罕',
    '喜欢听',
    '熄火',
    '西汉',
    '洗好',
    '嘻哈',
    '喜获',
    '喜欢什么',
    '喜欢自己',
    '西海岸',
    '西化',
  ]);
  expect(getCandidates('xiho')).toEqual(['西红柿']);
  expect(getCandidates('xihon')).toEqual(['西红柿']);
  expect(getCandidates('xihong')).toEqual(['西红柿']);
  expect(getCandidates('xihongs')).toEqual(['西红柿']);
  expect(getCandidates('xihongsh')).toEqual(['西红柿']);
  expect(getCandidates('xihongshi')).toEqual(['西红柿']);
});
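
The first-letter matching exercised by these tests can be pictured with a small sketch: index each phrase under the first letters of its syllables, then sort each bucket by frequency. This is an illustration under assumptions, not the engine's actual code. The `Phrase` shape, `buildAbbrIndex`, `candidatesByAbbr`, and the frequency numbers are all hypothetical.

```typescript
// Hypothetical phrase shape with invented frequencies.
interface Phrase {
  word: string;
  syllables: string[]; // e.g. ["xi", "hong", "shi"]
  freq: number;
}

// Group phrases by the first letter of each syllable ("xi hong shi" -> "xhs").
function buildAbbrIndex(phrases: Phrase[]): Map<string, Phrase[]> {
  const index = new Map<string, Phrase[]>();
  for (const phrase of phrases) {
    const abbr = phrase.syllables.map((s) => s[0]).join("");
    const bucket = index.get(abbr) ?? [];
    bucket.push(phrase);
    index.set(abbr, bucket);
  }
  // Sort each bucket once so lookups return the most frequent phrase first.
  index.forEach((bucket) => bucket.sort((a, b) => b.freq - a.freq));
  return index;
}

function candidatesByAbbr(index: Map<string, Phrase[]>, abbr: string): string[] {
  return (index.get(abbr) ?? []).map((p) => p.word);
}

const abbrIndex = buildAbbrIndex([
  { word: "西红柿", syllables: ["xi", "hong", "shi"], freq: 20 },
  { word: "新华社", syllables: ["xin", "hua", "she"], freq: 50 },
  { word: "小和尚", syllables: ["xiao", "he", "shang"], freq: 10 },
  { word: "喜欢", syllables: ["xi", "huan"], freq: 90 },
]);

const xhsWords = candidatesByAbbr(abbrIndex, "xhs");
```

With these invented weights, `xhs` maps to three phrases ordered by frequency, which mirrors the sorted result the abbreviation test expects from the real dict.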

Build

pnpm run build

How to customize?

  • You can build a customized UI on top of the existing pinyin input method engine (see ime_engine.js). It exposes a single, simple API: getCandidates(inputString)

  • For inspiration, see the reference implementation (the IME React component in IME.tsx) and the unit test cases in ime_engine.test.ts
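
As a starting point for a custom UI, the sketch below pages through candidates without depending on any framework. It assumes only that `getCandidates` takes a string and returns an array of strings, as the tests suggest. The `pageCandidates` helper, the `CandidatePage` type, and the `fakeGetCandidates` stand-in are hypothetical; in a real project you would pass in the function imported from the engine instead.

```typescript
type GetCandidates = (input: string) => string[];

interface CandidatePage {
  items: string[]; // candidates to render on this page
  page: number; // zero-based page index after clamping
  pageCount: number; // always at least 1
}

// Split the candidate list into fixed-size pages for a candidate bar.
function pageCandidates(
  getCandidates: GetCandidates,
  input: string,
  page: number,
  pageSize = 5
): CandidatePage {
  const all = input.length > 0 ? getCandidates(input) : [];
  const pageCount = Math.max(1, Math.ceil(all.length / pageSize));
  const clamped = Math.min(Math.max(page, 0), pageCount - 1);
  const start = clamped * pageSize;
  return { items: all.slice(start, start + pageSize), page: clamped, pageCount };
}

// Stand-in for the real engine so this example runs on its own.
const fakeGetCandidates: GetCandidates = (input) =>
  input === "xhs" ? ["新华社", "西红柿", "小和尚", "小护士", "巡回赛"] : [];

const firstPage = pageCandidates(fakeGetCandidates, "xhs", 0, 2);
const lastPage = pageCandidates(fakeGetCandidates, "xhs", 99, 2);
```

Passing the engine function in as a parameter keeps the paging logic testable without loading the full dict, and the UI layer only has to render `items` and wire next/previous keys to the `page` argument.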

About

Online pinyin input method. A web pinyin input method based on the open-source Google Pinyin dictionary.
