MS-Word / Unicode "work around"
This is more of a "work around" than it is a solution.
It isn't perfect, but it is a lot easier than remembering the codes for
characters. It might be a good idea for people to write Microsoft an email
asking that:
a) they make it possible for users to create custom transliteration tables
for any font, and
b) they make it possible for people to share their transliteration
tables by email.
This would solve the "user entry" side of data entry for multi-lingual
people (and multi-language documents) once and for all. Pre-defined "Input
Method Editors" (IMEs) are *not* the solution (for a lot of reasons - both
because of weird little problems in the "internal machine representation"
with Unicode/fonts *and* the many differing needs of users entering data).
The alternative to the MS-Word entry method below is creating "macros" to
generate the special characters. The macros would work better, but:
a) you would have to remember the macro names
b) you need to find some keystroke combinations that Word (and your other
macros) does not already use.
c) you would need to know how to create and use macros.
It should take you about 10 minutes to set up the characters using the
System requirements: The font 'Arial Unicode MS' and 'MS-Word 97' (or
better). If you do not have the font, you can download it from the Microsoft
web site. It is a huge font (roughly 8 MB as a download and 40 MB installed).
The font has come 'standard' with many Windows computers since the year 2000.
Here is what I discovered:
a) open a new document in MS-Word
b) main, top-level menu "Insert"->"Symbol..."
c) Choose 'Font: ' Arial Unicode MS
d) Choose 'Subset: ' Basic Latin
e) find the '~n' character in the table and click on it
f) press the button 'Insert'
g) press the button 'Close'
h) in the document, select the ~n character with your mouse
i) main, top-level menu "Insert"->"AutoText"->"AutoText..."
j) press on the tab at the top called 'AutoCorrect'
k) now you see the character in a box called 'With: '.
l) to the left of that is a box called 'Replace: '
m) enter ~n in the box called 'Replace: '
n) press the button 'OK'
(Note: to get the character, keep in mind that MS-Word wants to
"auto-correct" a *word* for you. This means that you need a blank
space before the ~n and after the ~n.)
So, now type:
a) a blank space
b) ~n (nothing happens)
c) a blank space (and the ~n character is there).
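The behaviour described above can be imitated in a few lines of Python. This is only a rough sketch of the "whole word" rule as the message describes it, not Word's actual algorithm; the table `AUTOCORRECT` and the function `autocorrect` are made-up names for illustration.

```python
# Illustrative table only: the codes follow the message, the dict is hypothetical.
AUTOCORRECT = {
    "~n": "\u00f1",   # n with tilde
    ".t": "\u1e6d",   # t with dot below
}

def autocorrect(text, table=AUTOCORRECT):
    """Replace a code only when it stands alone between blank spaces."""
    return " ".join(table.get(token, token) for token in text.split(" "))

standalone = autocorrect("pa ~n ca")   # the code is its own "word": replaced
embedded = autocorrect("pa~nca")       # the code sits inside a word: left alone
```

In this sketch the embedded code is never replaced, which mirrors why the blanks are needed while typing.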
Notes:
1) You will need to put a blank before *and* after the Pali special
characters, and then go back and remove the extra blanks after you are
finished typing your word.
2) The Unicode fonts are a total marketing fake-out: a) they aren't really
standardized at all, and b) not all Unicode fonts have all of the characters!
Oh well... I know that 'Arial Unicode MS' works for Pali.
3) You may wish to use !n for the 'overdot n' (instead of "n). Otherwise,
every time you type a real quotation that begins with the letter n, the
text will be transliterated.
4) You only need to enter the characters once. This sets the transliteration
characters for all MS-Word documents, not just the document that you are
currently working on.
5) This works for any characters in any language in the Unicode fonts (in
case you need other characters from other languages and you want to use
'transliteration' codes for them).
6) You will *not* be able to get capitalized Pali characters using this
Here is where you can find the Pali special characters in the font 'Subset:
' groups:
~n Basic Latin
aa Latin Extended-A
ii Latin Extended-A
uu Latin Extended-A
(Subset: Latin Extended Additional is way down the list of subsets)
.d Latin Extended Additional
.l Latin Extended Additional
.m Latin Extended Additional
.n Latin Extended Additional
"n Latin Extended Additional
.t Latin Extended Additional
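For anyone who wants to double-check the table, here is a small Python sketch that looks up each lowercase character by its official Unicode name and reports its code point and block. The dictionary `PALI_CODES`, the `BLOCKS` list and the helper `block_of` are my own illustrative names; the block ranges are taken from the Unicode standard. One caveat: the standard itself places the n with tilde (U+00F1) in the Latin-1 Supplement block, so the label printed here may differ from the subset name shown in Word's dialog.

```python
import unicodedata

# Transliteration code -> official Unicode character name (lowercase only).
PALI_CODES = {
    "~n": "LATIN SMALL LETTER N WITH TILDE",
    "aa": "LATIN SMALL LETTER A WITH MACRON",
    "ii": "LATIN SMALL LETTER I WITH MACRON",
    "uu": "LATIN SMALL LETTER U WITH MACRON",
    ".d": "LATIN SMALL LETTER D WITH DOT BELOW",
    ".l": "LATIN SMALL LETTER L WITH DOT BELOW",
    ".m": "LATIN SMALL LETTER M WITH DOT BELOW",
    ".n": "LATIN SMALL LETTER N WITH DOT BELOW",
    '"n': "LATIN SMALL LETTER N WITH DOT ABOVE",
    ".t": "LATIN SMALL LETTER T WITH DOT BELOW",
}

# Unicode block ranges relevant to the characters above.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]

def block_of(ch):
    """Return the name of the Unicode block containing ch, or 'other'."""
    cp = ord(ch)
    for low, high, name in BLOCKS:
        if low <= cp <= high:
            return name
    return "other"

for code, name in PALI_CODES.items():
    ch = unicodedata.lookup(name)
    print(f"{code}  U+{ord(ch):04X}  {block_of(ch)}")
```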
Last but not least: Palitrans 2.0 supports easy, natural, direct
transliteration and the following Pali fonts:
a) Times CSX+
b) VriRomanPali (used on CSCD)
c) Leedsbit PaliTranslit
d) Times Norman
e) Skt Times
So, if Unicode isn't that important to you, it is probably easier just to
use Palitrans 2.0.
So, it would help everybody in the world if people would at least email
Microsoft and ask for customizable and shareable transliteration tables in
From: Piya Tan [mailto:libris@...]
Sent: Sunday, April 28, 2002 8:59 AM
Subject: Re: [Pali] Unicode / transliteration / MS-Word
> Has everyone figured out how to set up MS-Word so that you can get the
> special characters using Pali transliteration? If not, I can post the
> instructions on this list. <snip>
- Dear Andy & others,
Below is a message which might interest you. Charles Muller has built
up an "autocorrect" file which can be used within MS Word. It *might*
save you some time!
Date: Wed, 1 May 2002 13:51:18 +0900
From: Charles Muller <acmuller@...>
Reply-To: H-NET Buddhist Scholars Information Network
Subject: TECHNICAL>Adding Sanskrit Diacritical Function to AutoCorrect
From: Charles Muller
Subject: Adding Sanskrit Diacritical Function to AutoCorrect
For users of MS-Word:
As you may be aware, Microsoft Word includes the function of correcting
misspelled words automatically as you type. If you have not yet
experienced this, the setting may not yet be turned on. In that case
please open up Word, go to the menu Tools >> AutoCorrect, and check the
box that says "Replace text as you type."
If you look at the table of words contained below in that same settings
box, you will see that you can edit, add, and delete the entries contained
for this purpose. I have been adding Sanskrit and other
diacritically-marked terms to this function for a number of years now, so
that when I am working in Word, and I type a term such as "prajna",
"samsara", etc., as soon as I hit the space bar after the word the
appropriate diacritical marks will be attached.
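As a rough illustration of how such "Replace / With" pairs could be prepared, the Python sketch below derives the plain spelling from the correctly marked term by stripping the combining marks. This is only one possible approach and is not necessarily how the list described here was built; `plain_key` and `build_entries` are hypothetical helper names.

```python
import unicodedata

def plain_key(term):
    """Strip diacritics, so that e.g. 'praj\u00f1\u0101' yields 'prajna'."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def build_entries(terms):
    """Map each plain spelling (Replace) to its marked form (With)."""
    return {plain_key(term): term for term in terms}

# "prajna" with tilde and macron, "samsara" with underdot m and macron
entries = build_entries(["praj\u00f1\u0101", "sa\u1e43s\u0101ra"])
```

The resulting dictionary has the keys "prajna" and "samsara", which is what one would type before hitting the space bar.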
When one makes changes in this function, these changes are saved (for
English-US users) in a file named "mso1033.acl" (for users of other
variants of English, or other European languages, the file is named
differently, perhaps something like "mso2156.acl" etc.).
You can locate this file by searching for it from the Windows desktop:
Start >> Search
Once you have located it, you can delete it and replace it with the ACL
file that I have created. You can find it manually as well.
For Windows 2000 users, use the desktop Explorer or My Computer to go to
c:\Documents and Settings\<your username>\Application
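For those who would rather script the search than use the Start >> Search dialog, a minimal Python sketch follows. It assumes only what is stated above, namely that the file name begins with "mso" and ends in ".acl"; the function name `find_acl_files` is made up, and the starting folder is up to you.

```python
from pathlib import Path

def find_acl_files(root):
    """Return every file under root named like mso*.acl (case-insensitive)."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file()
        and path.name.lower().startswith("mso")
        and path.suffix.lower() == ".acl"
    )

# Example use (can be slow on a large folder tree):
# for path in find_acl_files(Path.home()):
#     print(path)
```

It may be wise to copy the file you find somewhere safe before deleting or replacing it.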
For those who don't use US-English, you can edit this file and import it into
your own language's Autocorrect file by using the Autocorrect Backup macro
that Microsoft supplies on its Office web site. But you need to have some
familiarity with Word VBA macros in order to do this.
My own mso1033.acl file can be downloaded from:
1) Many of the diacritics contained here, such as underdot characters, are
only available in a Unicode font, such as TITUS or Arial Unicode. If you
don't have one of these fonts applied, you will see square boxes where you
should see characters.
2) This file contains tags for automatic XML markup that I use in my work.
Thus, if you type in Lotus Sutra, you will get <title>Lotus Sutra</title>.
You can edit/eliminate the entries that contain these tags by opening up
the file while in Word >> Tools >> Autocorrect. Those who have experience
with XML and understand the purpose of such tags may want to keep them.
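If the entries were available as plain "Replace / With" pairs (for example after exporting them to text), the tagged ones could be cleaned up in bulk along these lines. This is a sketch under that assumption only: it does not read the .acl file itself, and the dictionary `entries` and the function `strip_markup` are hypothetical.

```python
import re

# Matches an opening or closing XML tag such as <title> or </title>.
TAG = re.compile(r"</?[A-Za-z][^>]*>")

def strip_markup(entries):
    """Remove XML tags from each replacement; drop entries left unchanged."""
    cleaned = {}
    for key, value in entries.items():
        plain = TAG.sub("", value)
        if plain != key:   # without its tags the entry would do nothing
            cleaned[key] = plain
    return cleaned

entries = {
    "Lotus Sutra": "<title>Lotus Sutra</title>",
    "prajna": "praj\u00f1\u0101",
}
cleaned = strip_markup(entries)
```

Here the "Lotus Sutra" entry disappears because, once the tags are gone, it would replace the text with itself, while the diacritic entry is kept.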
Toyo Gakuen University
Digital Dictionary of Buddhism and CJK-English Dictionary
- Hello Andy and everybody,
A> The alternative to the MS-Word entry method below is creating "macros" to
A> generate the special characters. The macros would work better,
Below is an explanation of how to set up macros.
A> System requirements: The font 'Arial Unicode MS' and 'MS-Word 97' (or
There are also compatible Unicode fonts at
They are compact; for example, CN-Times is only 0.9 MB in size. Few
people will be willing to download the huge Arial Unicode MS just to read
a single document.
How to create keyboard shortcuts for Unicode Pali letters:
A> a) open a new document in MS-Word
A> b) main, top-level menu "Insert"->"Symbol..."
A> c) Choose 'Font: ' Arial Unicode MS (or another Unicode font)
A> d) Choose 'Subset: ' Basic Latin
A> e) find the '~n' character in the table and click on it
A> f) press the button 'Insert'
A> g) press the button 'Close'
A> h) in the document, select the ~n character with your mouse
A> i) main, top-level menu "Insert"->"AutoText"->"AutoText..."
3. "Tools"->"Customize..."->button "Keyboard..."
4. in "Categories:" list - select "AutoText"
5. in "AutoText" list select the character [it may show as a small, empty
square - or even as a completely different character in the list] to which
no 'current shortcut keys' are assigned.
6. position cursor in "Press new shortcut key" area
7. press desired keystroke (<Alt>-a, <Alt>-t, or whatever), button "Assign",
button "Close", button "Close"
Notes: with the AutoText method, you will get the character only in the font
A> Here is where you can find the Pali special characters in the font 'Subset:
A> ' groups:
A> ~n Basic Latin
A> aa Latin Extended-A
A> ii Latin Extended-A
A> uu Latin Extended-A
A> (Subset: Latin Extended Additional is way down the list of subsets)
A> .d Latin Extended Additional
A> .l Latin Extended Additional
A> .m Latin Extended Additional
A> .n Latin Extended Additional
A> "n Latin Extended Additional
A> .t Latin Extended Additional
Why Unicode? I use it because such fonts:
- fully support multilingual documents, without overlapping of
- one can search and replace any character (solving the problem with
.t retroflex in other fonts);
- one can easily convert Word documents to HTML format without losing
- most Pali characters in such HTML documents (except the retroflex ones)
will be displayed correctly even without the Unicode font.
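On the HTML point, one way to keep the characters safe in a plain ASCII HTML file is to write them as numeric character references. The Python sketch below shows the idea; it is not a description of what Word does when it saves as HTML, and the function name `to_html_ascii` is my own.

```python
def to_html_ascii(text):
    """Encode every non-ASCII character as a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# "panna" with two tilde n and a long a, "attha" with two underdot t
html = to_html_ascii("pa\u00f1\u00f1\u0101, a\u1e6d\u1e6dha")
# html == "pa&#241;&#241;&#257;, a&#7789;&#7789;ha"
```

Whether the browser can then display each character still depends on the fonts installed on the reader's machine.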