Transliterating Sanskrit, and its derivatives such as Pali, remains an annoying problem. The problem isn’t with the language itself; Sanskrit’s wonderfully precise and clear about sounds and letters. Likewise, there’s no issue with scripts or alphabets. You might think that there is some mystical connection between the script that a language is written in and the language itself but that’s really not the case. Sanskrit in India is written in Devanagari but there’s no special reason to use Devanagari for Sanskrit instead of the Latin alphabet or another one. Plus, Sanskrit’s only been written in Devanagari for a comparatively short period of time.
(Surprisingly, the alphabet was only invented once and all alphabets are genetically related to each other, branches from this one root. Devanagari is linked to Latin letters via Brahmi and Aramaic.)
But in order to write Sanskrit correctly, you need some Latin letters not used in English. This is a common-enough situation; think of accent marks, or the French and Portuguese cedilla (ç), or the Spanish eñe (ñ), or even Motörhead’s heavy metal umlaut. So, for example, “Devanagari” ought to be written “Devanāgarī” and “Pali” should be “Pāḷi.” The complete set of diacritics for Pali is: ā, ī, ū, ṁ, ṇ, ñ, ṭ, ḍ, ṅ, ḷ.
There’s another, separate but related, issue about when to use these ‘extra’ letters and marks; for native English readers, the argument goes, these ‘extra’ letters and marks — called diacritics — are distracting and make the words harder to read.
Specialists typically prefer to preserve diacritics, because losing them changes the meaning of the word in its original language. The question comes down to: “when do these foreign words become English words?” There’s an active debate going on now about this very topic on H-Buddhism, an academic Buddhist studies mailing list. Dictionaries are split on the issue, with some words preserving diacritics and others losing them: for more on this, see the list of Buddhist Terms Found in English Print Dictionaries and Gerald Jackson’s series on fonts and diacritics in academic publishing.
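To make the “losing them changes the meaning” point concrete, here is a minimal Python sketch of what dropping diacritics does to a word. The function name `strip_diacritics` is my own invention for illustration; it uses only the standard `unicodedata` module, and it is just one way of producing a plain-ASCII rendering, not anyone’s official method.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (a lossy, ASCII-style rendering)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks, so we drop those.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Strings are written with escapes so the code points are unambiguous.
print(strip_diacritics("Devan\u0101gar\u012b"))  # Devanagari
print(strip_diacritics("P\u0101\u1e37i"))        # Pali

# Four distinct letters (n, n-tilde, n-dot-below, n-dot-above) collapse into one.
nasals = ["n", "\u00f1", "\u1e47", "\u1e45"]
print({strip_diacritics(n) for n in nasals})     # {'n'}
```

The last line is the specialists’ worry in miniature: once the marks are gone, there is no way to tell which of the four letters was originally meant.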
The problem arises when you need to write diacritics in your friendly word processing application. This immediately leads to a technical conversation about Unicode and Unicode fonts. Which is when things start to get hairy.
The best starting point for Unicode issues is Alan Wood’s page. It’s worth reading for the introduction, as an overview of the topic of digital transcription. More specifically, for the topic at hand, the Tibetan & Himalayan Digital Library has a good survey of Unicode fonts for transliterating “Indo-Tibetan” languages.
(By Indo-Tibetan they mean Indian languages for Buddhist studies, including Sanskrit, Pali, Gandhari, and so on, plus Tibetan. “Indo-Tibetan” isn’t a language family like Indo-European but the term points to the very close relationship between Tibet and India. Buddhist Tibetan is a specialized language unreadable to a native Tibetan, optimized a thousand years ago to translate Buddhist Sanskrit into Tibetan. Smart people have been dealing with these issues for a long time.)
They make the point that not all Unicode fonts contain the necessary characters, so simply choosing a Unicode font isn’t enough: “To properly display all the diacritic marks used in Indo-Tibetan studies, a Unicode font must contain the following character ranges:
- Basic Latin: U+0000 – U+007F
- Latin-1 Supplement: U+0080 – U+00FF
- Latin Extended-A: U+0100 – U+017F
- Latin Extended-B: U+0180 – U+024F
- Latin Extended Additional: U+1E00 – U+1EFF”
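As a rough illustration of why those ranges matter, the following Python sketch reports which of the quoted ranges a piece of text actually touches. The range table is copied from the list above; the names `BLOCKS`, `block_of` and `blocks_needed` are mine, made up for this example.

```python
import unicodedata

# The five ranges quoted above, as (name, first code point, last code point).
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
    ("Latin Extended Additional", 0x1E00, 0x1EFF),
]

# The ten lowercase Pali diacritics listed earlier, written as escapes.
PALI_DIACRITICS = "\u0101\u012b\u016b\u1e41\u1e47\u00f1\u1e6d\u1e0d\u1e45\u1e37"


def block_of(ch):
    """Return the name of the quoted range containing ch, or None if it is outside all five."""
    cp = ord(ch)
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return None


def blocks_needed(text):
    """Return the set of range names used by text (after NFC normalization)."""
    return {block_of(ch) for ch in unicodedata.normalize("NFC", text)}


for ch in PALI_DIACRITICS:
    print(f"{ch}  U+{ord(ch):04X}  {block_of(ch)}")
```

Note that this only tells you which ranges your text needs. A font can cover a range partially, so to be sure about a particular font you would still have to inspect its character map.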
|character||ASCII rendering||character name||Unicode number||key combination||HTML code|
From Wikipedia’s “Pali transliteration on computers.”
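Some of those columns can be derived mechanically. The sketch below is not the Wikipedia table, just a small Python example that computes the character name, Unicode number and numeric HTML code for the lowercase Pali diacritics using the standard `unicodedata` module. I’ve left out the “ASCII rendering” and “key combination” columns, since those depend on convention and on your keyboard setup, and the name `table_row` is made up for the example.

```python
import unicodedata

# The ten lowercase Pali diacritics, written as escapes to avoid any ambiguity.
PALI_DIACRITICS = "\u0101\u012b\u016b\u1e41\u1e47\u00f1\u1e6d\u1e0d\u1e45\u1e37"


def table_row(ch):
    """Return (character, character name, Unicode number, numeric HTML code) for ch."""
    cp = ord(ch)
    return (ch, unicodedata.name(ch), f"U+{cp:04X}", f"&#{cp};")


for ch in PALI_DIACRITICS:
    print(" | ".join(table_row(ch)))
```

The uppercase forms (Ā, Ī and so on) can be listed the same way by passing `PALI_DIACRITICS.upper()` through the same loop.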
The Tibetan Digital Library people also have a good chart of relevant diacritics.
(In the distant past, like five years ago, there were various gnarly work-arounds, including the now-deprecated Times Norman / Normyn font. Nobumi Iyanaga has written a useful library of scripts to convert from Times Norman / Normyn to ‘good’ Unicode.)
So, what are the practical options for a good font for transliterating Sanskrit and Pali today? It seems to me that there are at least five good choices:
Times Ext Roman
The Tibetan Digital Library people really like Times Ext Roman. But the only source for it is the Bahá’i Computer & Communication Association, and it’s not clear to me what license it’s published under, so I would be reluctant to recommend it even though I trust that it’s technically valid.
Gentium
If you can get past SIL’s Christian missionary agenda, they do outstanding linguistics work and their Gentium font is well regarded, seems complete for the purposes of transliterating Sanskrit and Pali, is widely accepted, is under active development and is licensed under a good, if idiosyncratic, open source license. It’s a nice-looking typeface, in my opinion.
John Smith’s fonts
John Smith has recently updated this font family. It’s designed for exactly the topic under discussion, “the representation of Indian-language (and similar) material in Roman script using the Unicode character set.” But he’s just one, albeit committed, guy and I don’t know what license he’s publishing these fonts under, so I worry about their long-term supportability. Still worth mentioning; sort of in the same category to me as Times Ext Roman.
TransIndic Transliterator
There’s also a commercial product, TransIndic Transliterator in Unicode, from Linguist’s Software that seems like it does the job, although I don’t know much about it. It costs $100 per typeface (they have Times, Palatino, Arial, etc.) or $250 for the whole thing. Commercial license. Paying for it has the advantage of having someone on the hook to help you with it, not a small thing.
Gandhari Unicode
Gandhari Unicode is a nice-looking typeface that was originally designed to transcribe the newly discovered Buddhist manuscripts from Afghanistan. (Gandhari is another Middle Indic prakrit like Pali.) It is under active development, which is good, and seems widely accepted.
The license status of Gandhari Unicode is a little bit troubling; it’s based on work licensed under the “Aladdin Free Public License” which isn’t, despite the name, a free public license. The Free Software Foundation considers it a non-free license. Other parts of Gandhari Unicode are GPL-derived but I don’t understand which takes precedence. Note that the link in Andrew Glass’s documentation to the Aladdin license (at Wisconsin) is out of date.
[26 August 2009 update: According to reliable reports, there are issues with Gandhari Unicode’s spacing, especially italics, when printed.]