andjc
  • Melbourne, Australia

Pinned

  1. enabling-languages/python-i18n (Public)

    Random notes on Python internationalisation

    Jupyter Notebook 19

  2. enabling-languages/library-i18n (Public)

    Exploration of internationalisation issues for libraries.

    Jupyter Notebook 1 1

  3. Grapheme tokenisation in Python

    # Grapheme tokenisation in Python

    When working with tokenisation and break iterators, it is sometimes necessary to work at the character, syllable, line, or sentence level. Character-level tokenisation is an interesting case. By character, I mean a user-perceived unit of text, which the Unicode Standard would refer to as a grapheme. The usual way I see developers handle character-level tokenisation of English is via a list comprehension or by typecasting a string to a list:

    ```py
  4. enabling-languages/dinka (Public)

    Dinka language resources

    JavaScript 2

  5. enabling-languages/nuer (Public)

    Nuer language resources

    Rich Text Format 1

  6. enabling-languages/australian_indigenous (Public)

    Keyboard layouts and web support for Aboriginal and Torres Strait Islander languages

    4
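
The pinned grapheme tokenisation note contrasts user-perceived characters with the common approach of typecasting a string to a list. As a rough sketch (not the note's own code, which is truncated here), the stdlib-only Python below compares the two. `approx_graphemes` is a hypothetical helper that simply attaches combining marks (Unicode general categories Mn, Mc and Me) to the preceding character; this approximates, but does not implement, the full extended grapheme cluster rules of Unicode Standard Annex #29.

```python
import unicodedata


def naive_chars(text: str) -> list[str]:
    # Typecasting to a list splits a string into code points,
    # not user-perceived characters.
    return list(text)


def approx_graphemes(text: str) -> list[str]:
    """Hypothetical helper: group each combining mark (general category
    Mn, Mc or Me) with the character before it.

    This is only an approximation of UAX #29 extended grapheme clusters;
    it ignores Hangul syllable sequences, emoji ZWJ sequences, regional
    indicator pairs, CR LF and other rules.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # Combining mark: attach it to the current cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    # "café" written with a decomposed é: e + U+0301 COMBINING ACUTE ACCENT
    word = "cafe\u0301"
    print(len(naive_chars(word)))       # 5: the accent is split off
    print(len(approx_graphemes(word)))  # 4

    # U+025B LATIN SMALL LETTER OPEN E + U+0308 COMBINING DIAERESIS
    vowel = "\u025b\u0308"
    print(len(naive_chars(vowel)), len(approx_graphemes(vowel)))  # 2 1
```

For full UAX #29 segmentation, the third-party `regex` module's `\X` pattern or ICU's character break iterator (for example via PyICU) are more robust options than this mark-grouping heuristic.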