Satish B. Setty

UTIL: Unified Transliteration of Indic Languages

:NOTE: Download PDF of this article. You need good Unicode fonts to read the article; I recommend the Noto font family from Google, which you can install with apt-get. IPA symbols are within square brackets, like [ʂ], and transliterated symbols are within slashes, like /ṭ/.

UTIL is a romanization scheme for Indic languages. It is designed as a pan-Indian transliteration scheme. It covers 20+ languages: Bengali, Dogra, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Lepcha, Limbu, Manipuri (Meitei), Maithili, Malayalam, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Santali, Sindhi, Sinhala, Tamil, Telugu, Urdu and probably many more.

So, why yet another scheme?

  1. IAST is insufficient. It serves Sanskrit and Pali but is incomplete for pretty much everything else (e.g. Bengali, Gujarati, etc.).
  2. ISO 15919 is also insufficient. It ignores Kashmiri and Sindhi, which are integral Indian languages. It also lacks symbols for newly assigned Unicode code points (e.g. ॹ or ॺ). Finally, कृष्ण /Kṛṣṇa/ is typographically more consistent than /Kr̥ṣṇa/.
  3. ALA-LC is designed as a single-language model, ignoring the inherent similarity of Brahmic scripts. This leads to inconsistencies. For example, Tamil ழ /ḻ/ and Kannada ೞ /l̤/ correspond to the same character and sound (“retroflex approximant”) and yet have different representations. Conversely, the same symbol /ṣ/ represents Hindi ष [ʂ] and Urdu ص [sˤ] even though they are completely different sounds.
  4. Other schemes, such as Hunterian or GRETIL, are as bad as the above and sometimes worse.
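One way to read the typographic point in item 2 is at the code-point level. The sketch below assumes "more consistent" refers to the availability of precomposed letters; that is my reading, not something the list states. It compares the two spellings after NFC normalization, using only Python's standard `unicodedata` module.

```python
import unicodedata

# Both spellings are written with explicit combining marks so the
# comparison does not depend on how this page itself is encoded.
dot_below_form = "Kr\u0323s\u0323n\u0323a"   # r, s, n + COMBINING DOT BELOW
ring_below_form = "Kr\u0325s\u0323n\u0323a"  # r + COMBINING RING BELOW

dot_nfc = unicodedata.normalize("NFC", dot_below_form)
ring_nfc = unicodedata.normalize("NFC", ring_below_form)

# NFC composes r + dot below into one precomposed letter, but Unicode
# has no precomposed "r with ring below", so that mark stays separate.
print(len(dot_nfc), [unicodedata.name(c) for c in dot_nfc])
print(len(ring_nfc), [unicodedata.name(c) for c in ring_nfc])
```

The dot-below spelling normalizes to five code points, all precomposed letters, while the ring-below spelling keeps a separate combining mark that fonts have to position on their own.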

So, how is UTIL better?

Vowels

Primary vowels and diphthongs:

a ā i ī u ū e ē ai o ō au

Additional vowels, all of which have a dot below:

ạ̄ ọ̄ ụ̄

Consonants

Consonants with their Sanskrit names:

Columns, left to right: Plosives (स्पर्श), Nasal (नासिक), Implosives (विस्पर्श), Fricatives (ऊष्मन्), Vibrants (द्रव), Approximants (अन्तस्थ)

कण्ठ्य (Velar): k, kh, g, घ gh, ख़ ḵẖ, ग़ ġ, h
तालव्य (Palatal): c, ch, j, झ jh, झ़ zh, y, य़
मूर्धन्य (Retroflex): ṭh, ढ ḍh, ड़, ढ़ ṙh
दन्त्य (Dental): t, th, d, ध dh, n, स़, l
वर्त्स्य (Alveolar): च़ ċ, छ़ ċh, s, ज़ z, r
ओष्ठ्य (Labial): p, ph, b, भ bh, m, ॿ, फ़ f, v, व़ w

Affricate glide ॺ (‘JJYA’) is transcribed /j̄/.
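As a minimal sketch of how the nukta letters in the consonant table could be looked up in software, the snippet below maps a few of them to their UTIL symbols. The pairs are read off the table and the function name `romanize_nukta_letter` is hypothetical. The point it illustrates is that Devanagari nukta letters can be encoded either as a single code point (e.g. U+095B) or as base letter plus U+093C, so the input is normalized before lookup.

```python
import unicodedata
from typing import Optional

NUKTA = "\u093c"  # DEVANAGARI SIGN NUKTA

# Keys are base letter + nukta, which is the form NFC produces.
# Values are the UTIL symbols paired with these letters in the table.
UTIL_NUKTA = {
    "\u0916" + NUKTA: "ḵẖ",  # ख़
    "\u0917" + NUKTA: "ġ",   # ग़
    "\u091d" + NUKTA: "zh",  # झ़
    "\u091a" + NUKTA: "ċ",   # च़
    "\u091b" + NUKTA: "ċh",  # छ़
    "\u091c" + NUKTA: "z",   # ज़
    "\u092b" + NUKTA: "f",   # फ़
    "\u0935" + NUKTA: "w",   # व़
}

def romanize_nukta_letter(letter: str) -> Optional[str]:
    """Return the UTIL symbol for a nukta letter, or None if unknown."""
    # NFC decomposes the single-code-point nukta letters (U+0958..U+095F)
    # into base + nukta, so both encodings reach the same dictionary key.
    key = unicodedata.normalize("NFC", letter)
    return UTIL_NUKTA.get(key)

print(romanize_nukta_letter("\u095b"))        # single code point ZA
print(romanize_nukta_letter("\u091c\u093c"))  # JA followed by nukta
```

Both calls resolve to the same entry, which is why normalizing first is safer than listing every encoding by hand.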

Other symbols

Anusvāra: ṃ Anunāsika:  ̐ Avagraha: ’
Visarga: ḥ Jihvāmūlīya: x̣ Upadhmānīya: ẋ
Vedic Udātta:  ́ Svarita (independent):  ̀ Anudātta:  ̱
Arabic hamza ء: ʼ Arabic ain ع: ʽ
Rising tone: ˊ Falling tone: ˋ Neutral tone: ˙

Udātta and svarita use the combining acute and grave accents respectively, whereas hamza and ain use the non-combining modifier letters U+02BC and U+02BD respectively. Tone modifiers are used in Maithili, Dogra and other Pahari languages.
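The difference between combining accents and non-combining modifier letters can be inspected with Python's standard `unicodedata` module. In this small sketch U+02BC and U+02BD are the code points named above, while U+0301 and U+0300 are the standard combining acute and grave accents, which I assume are the ones intended.

```python
import unicodedata

CHARS = {
    "combining acute": "\u0301",
    "combining grave": "\u0300",
    "hamza": "\u02bc",
    "ain": "\u02bd",
}

def describe(ch: str):
    """Return (Unicode name, general category, canonical combining class)."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

for label, ch in CHARS.items():
    print(label, describe(ch))
```

The accents have category `Mn` (nonspacing mark) and a non-zero combining class, so they attach to the preceding letter. The modifier letters have category `Lm` and combining class 0, so they stand as characters of their own.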

General Notes

Script Notes

Urdu

Perso-Arabic characters are chosen so that they do not conflict with the Brahmic scripts. Urdu introduces six sounds [f, z, ʒ, q, x, ɣ] on top of Hindi (see Hindustani phonology). Note that [f, z, x, ɣ] are fricatives, just like ष [ʂ], स [s], ह [h]. Apart from these sounds, the IPA values in the table below are indicative only.

Urdu ق ح خ ء ع غ ط ظ ز ذ ض ص ث ش ژ ف و
UTIL q ḵẖ ʼ ʽ ġ z ż zh f w
IPA [q] [ɦ] [x] [ʔ] [ʕ] [ɣ] [tˤ] [zˤ] [z] [ð] [dˤ] [sˤ] [θ] [ʃ] [ʒ] [f] [w]
Devanagari क़ ह़ ख़ ॽ़ ग़ त़ ज़ ज़ ज़ ज़ स़ स़ झ़ फ़ व़

Input methods for IMEs

Of course, a transliteration scheme is not very useful if it cannot be entered into a computer, for which Input Method Editors (IMEs) are used. The key sequences can be thought of as an ASCII transliteration of UTIL.

An Emacs input method can be found in indic-roman-postfix.el, which is a postfix input method (i.e., diacritics are entered after the character).
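To illustrate what "postfix" means here, the following sketch converts ASCII key sequences into letters with diacritics. The two keys used (`-` for a macron, `.` for a dot below) are hypothetical and chosen only for illustration; they are not taken from indic-roman-postfix.el, whose actual keymap may differ.

```python
import unicodedata

# Hypothetical postfix keys, NOT the real indic-roman-postfix.el keymap.
POSTFIX_MARKS = {
    "-": "\u0304",  # COMBINING MACRON,    e.g. a- -> ā
    ".": "\u0323",  # COMBINING DOT BELOW, e.g. t. -> ṭ
}

def postfix_to_diacritics(text: str) -> str:
    """Turn postfix keys typed after a letter into combining marks."""
    out = []
    after_letter = False
    for ch in text:
        if ch in POSTFIX_MARKS and after_letter:
            # The key modifies the preceding letter. after_letter stays
            # True so a second key can stack another mark on that letter.
            out.append(POSTFIX_MARKS[ch])
        else:
            out.append(ch)
            after_letter = ch.isalpha()
    # NFC turns letter + mark into a precomposed letter where one exists.
    return unicodedata.normalize("NFC", "".join(out))

print(postfix_to_diacritics("Kr.s.n.a"))
print(postfix_to_diacritics("ra-ma"))
```

A real input method also needs a way to type a literal period after a letter, for example at the end of a sentence; this sketch leaves that out.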

indic-util.mim is an m17n input method that can be used with many IMEs based on libm17n, such as IBus, uim and fcitx.