The Romanization of Toponyms in the Countries of South Asia

UNITED NATIONS GROUP OF EXPERTS ON GEOGRAPHICAL NAMES
Meeting of the Working Group on Romanization Systems
Tallinn, 9-11 October 2006

The Romanization of Toponyms in the Countries of South Asia¹

A: Background to the Hunterian System of Romanization

(1) The Hunterian System for the writing of proper names was developed in the 1860s by William Wilson Hunter, Director-General of Statistics for India, and published in Hunter’s Guide to the Orthography of Indian Proper Names (Calcutta, 1871). The Government of India accepted the system with some modifications in 1872, and it was used in the official Imperial Gazetteer of India (1881 onwards; 24 volumes), a work initiated by Hunter.

(2) The Hunterian System is tabulated in several works of the late 19th century², which note that it was “not intended for Burmese, Arabic, Tibetan or Chinese names”. In 1873 the British India Department of Agriculture, Revenue & Commerce specifically stipulated that the system was not to be used in “British Burma”. The Hunterian System was thus designed principally for the languages of present-day India, Pakistan, Nepal, Bhutan, Bangladesh, and Sri Lanka, and it was used by the Survey of India over these areas for the romanization of geographical names.

(3) In the late 19th-century sources, the system marks long vowels with an acute accent and renders the letters k and q both as k. However, when the system was published again in 1954³, alterations had been made. Long vowels were now marked with a macron⁴ and the q-k distinction was maintained. Use of the macron was in fact evident in Survey of India sources dating from the turn of the twentieth century. The 1954 rules also required that any instances of ‘ayn and hamza should be recorded – both by means of an inverted apostrophe (‘) – but that this symbol should not be reproduced on the final cartographic product.
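
The change of notation described in paragraph 3 can be illustrated with a short Python sketch. It is not taken from any Survey of India source: the function names are hypothetical, the input is assumed to be a Roman-script string in which long vowels carry an acute accent, and the inverted apostrophe is assumed to be encoded as U+2018. The q-k distinction is not handled, since it cannot be recovered from a form in which both letters are already written k.

```python
import unicodedata

ACUTE = "\u0301"                # combining acute accent (19th-century long-vowel mark)
MACRON = "\u0304"               # combining macron (long-vowel mark in the 1954 rules)
INVERTED_APOSTROPHE = "\u2018"  # assumed encoding of the sign recorded for 'ayn and hamza


def acute_to_macron(name: str) -> str:
    """Re-mark long vowels by replacing every acute accent with a macron."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, MACRON))


def cartographic_form(name: str) -> str:
    """Omit the recorded 'ayn/hamza sign, as required for the final map product."""
    return name.replace(INVERTED_APOSTROPHE, "")
```

Decomposing to NFD first means the same code works whether the accented vowel arrives as one precomposed character or as a base letter followed by a combining mark.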

(4) The Hunterian System was unashamedly geared towards an English-language receiver audience, principally of course in Britain. It was based on a principle of uniform transliteration, irrespective of the language of origin. All the various types of consonant occurring in the languages of this region were reduced to a single form, conforming to the standard straightforward consonants present in the alphabet of the English language. The only sign alien to this alphabet was the macron, utilised to indicate vowel length. This sign could potentially be present on any vowel, though in practice it was usually limited to a, i, and u where these were long (and it was not employed on a long vowel occurring in word-final position). The resulting transliterated name forms are very easy to read, and very easy to accommodate within contemporary digital structures, though of course they normally lack the facility of reversibility.
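
The convention that the macron is not written on a word-final long vowel can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the rule is applied to any of the five vowel letters, and “word-final” is taken to mean that no further letter or combining mark follows the vowel.

```python
import re
import unicodedata

# A vowel letter plus combining macron (U+0304) that is not followed by
# another word character or combining mark, i.e. a word-final long vowel.
_FINAL_LONG_VOWEL = re.compile(r"([AEIOUaeiou])\u0304(?![\w\u0300-\u036F])")


def drop_word_final_macron(name: str) -> str:
    """Remove the macron from a long vowel in word-final position."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = _FINAL_LONG_VOWEL.sub(r"\1", decomposed)
    return unicodedata.normalize("NFC", stripped)
```

Macrons elsewhere in the word are left untouched, so a form such as Mangalūru passes through unchanged.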

B: Procedures in India

(5) The Central Institute of Indian Languages (http://www.ciil.org) identifies 18 “scheduled languages” of India, taken from the 1991 Census. These are languages which correspond to one of the states or union territories of India or (like Sanskrit) make an important literary contribution. Fifteen of these languages have a significant toponymic impact; these are listed below⁵.

Language   Speakers  Language Family             Script
Assamese   13.1m     Indo-European → Indo-Aryan  Devanagari
Bengali    69.6m     Indo-European → Indo-Aryan  Devanagari
Gujarati   40.7m     Indo-European → Indo-Aryan  Devanagari
Hindi⁶     337.3m    Indo-European → Indo-Aryan  Devanagari
Kannada    32.8m     Dravidian                   Dravidian
Kashmiri   4.4m      Indo-European → Indo-Aryan  Devanagari⁷
Malayalam  30.4m     Dravidian                   Dravidian
Marathi    62.5m     Indo-European → Indo-Aryan  Devanagari
Nepali     2.1m      Indo-European → Indo-Aryan  Devanagari
Oriya      28.1m     Indo-European → Indo-Aryan  Dravidian [sic]
Punjabi    23.4m     Indo-European → Indo-Aryan  Devanagari
Sindhi     2.1m      Indo-European → Indo-Aryan  Perso-Arabic
Tamil      53.0m     Dravidian                   Dravidian
Telugu     66.0m     Dravidian                   Dravidian
Urdu       43.4m     Indo-European → Indo-Aryan  Perso-Arabic

(6) Names are field collected in the original language, eg Bengali, Marathi, Telugu, Urdu. This procedure is usually carried out by the authorities of the relevant state or union territory. These names are then transcribed by the central Survey of India authority into a standard Devanagari script form. This form basically equates to Hindi, but that language label is not applied since – although Hindi is constitutionally the official language of India as a whole – it does not find favour in all the states and union territories. This Devanagari form is then romanized by the Survey of India via the Hunterian System, to produce the final official form, labelled “English”. There is a small and decreasing number of names still spelt in an anglicised manner (eg Mangalore) rather than in a Hunterian romanized manner (which in this example would be Mangalūru).

(7) Note that the Survey of India is the final authority for Devanagari spellings, for romanized spellings, and for anglicised spellings. It is also the final authority for name changes (eg Madras to Chennai), though the request for such changes will usually have originated in the state or union territory where the place or feature is located.

(8) As recently as May 2006, the Survey of India has confirmed to the United Kingdom its continued use of the practices above, including its continued use of the Hunterian System.

C: Procedures in Pakistan

(9) Pakistan has seven languages of significance, as listed below⁸:

Language   Speakers  Language Family             Script
Baluchi    5.7m      Indo-European → Iranian     Perso-Arabic
Kashmiri   under 1m  Indo-European → Indo-Aryan  Perso-Arabic⁹
Punjabi    60.7m     Indo-European → Indo-Aryan  Perso-Arabic
Pashto     18.9m     Indo-European → Iranian     Perso-Arabic
Seraiki    13.8m     Indo-European → Indo-Aryan  Perso-Arabic
Sindhi     18.5m     Indo-European → Indo-Aryan  Perso-Arabic
Urdu       10.7m     Indo-European → Indo-Aryan  Perso-Arabic

(10) Pakistan inherited the Hunterian System and continues to use it today. Toponyms are field collected in the language of origin and then transcribed by the central Survey of Pakistan authority into a standard Urdu form, Urdu being the national language of Pakistan. Note that the Survey of Pakistan uses the label “Urdu” to apply to the script of all transcribed toponyms, not just toponyms from the Urdu language. The Survey of Pakistan then uses the Hunterian System to romanize the standard Urdu form into a final official form, which is labelled “English”.

(11) As recently as the 23rd Session of UNGEGN (April 2006), Pakistan has confirmed that these procedures remain in official government use, with the Hunterian System of romanization still considered ideal for Pakistan’s national requirements.

D: Procedures in Nepal

(12) Originally, spellings derived by means of the Hunterian System were used in Nepal, most notably on the first basic two-sheet map, the 8 miles to the inch Series U462 dating from the 1960s. In 1985, the Nepal Survey Department (NSD) produced a three-sheet 1:500,000-scale map, in Roman script with diacritics but with names differing from those produced by the Hunterian System. These sheets are published by the Topographical Survey Branch, Survey Department, Government of Nepal, and remain the most convenient general source for toponyms.

(13) The NSD also published a 1:50,000-scale series during the 1990s, jointly with the government of Finland. A 1:25,000-scale series is also in preparation. These two series use the same Roman spellings as the 1985 1:500,000-scale map noted in paragraph 12 above.

(14) At the 8th UN Conference on the Standardization of Geographical Names (UNCSGN) in Berlin in 2002, Nepal outlined the processes involved in determining the geographical names for the 1:50,000 and 1:25,000 series noted in paragraph 13 above. The stages, beginning (perhaps surprisingly) with Roman-script Hunterian material, were:

Collection of geographical names from the existing romanized topographical maps at the scale of 1 inch to a mile.
“Translation” [sic] of the geographical names into Nepali.
Field verification of the geographical names by surveyors, with the help of local people.
Correction and approval of the geographical names by the local authorities.
Office transliteration of the geographical names by survey officers [not seen].
Approval of the base map with these names by the Mapping Sub-Committee (there is no authorised body specifically for the standardization of geographical names).

(15) The NSD is also in the process of producing nationwide coverage at 1:100,000 scale, but this is believed simply to consist of unaltered reductions from the existing 1:50,000 scale series. Apart from an occasional small-scale map, all NSD products are published in Roman script.

E: Procedures in Bhutan

(16) The language of Bhutan, Dzongkha, is closely related to Tibetan, and is written in the same script, but Bhutan is keen to maintain that the two languages are distinct. Bhutan is trying to use Dzongkha (rather than English) in official correspondence; both Dzongkha and English are being taught in schools. Establishing Bhutan’s procedures for deriving its toponyms is problematic. For many years it was necessary to rely on British colonial sources for spellings, but in 1994 Bhutan’s Ministry of Agriculture published a Roman-script 1:250,000 map, showing land cover. This map claims to be based on a set of Survey of Bhutan 1:50,000s, which are indexed in a coverage diagram shown in the margin of the 1:250,000 map. The United Kingdom has not seen this 1:50,000-scale map series.

(17) The spellings on the 1:250,000 map seem to be loosely based on a romanization system, but according to the Bhutanese representative at the 8th UNCSGN in 2002 there may have been a newer 1997 “official” romanization system: see http://www.eki.ee/wgrs/rom2_dz.pdf. This same representative indicated that the 1997 system might be officially authorised for the romanization of geographical names in 2005, but there has been no indication of this. Meanwhile, there seems little alternative to taking the Roman-script spellings on the 1:250,000 map of 1994 as official.

F: Procedures in Bangladesh

(18) Most of the toponyms of Bangladesh are in the Bengali language. In July 2004, Bangladesh confirmed to the United Kingdom its continuing use of the Hunterian System for the romanization of geographical names, though since the 1980s the Survey of Bangladesh has no longer incorporated the macron to indicate vowel length.
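
The effect of omitting the macron, as in the Survey of Bangladesh practice described in paragraph 18, can be sketched in Python. The function name is hypothetical, and the test form Mangalūru is borrowed from paragraph 6 purely because it is the one macron-bearing spelling quoted in this paper; it is not a Bangladeshi toponym.

```python
import unicodedata

MACRON = "\u0304"  # combining macron used to mark vowel length


def strip_macrons(name: str) -> str:
    """Return the name with every macron removed and all else unchanged."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ""))


print(strip_macrons("Mangal\u016bru"))  # prints: Mangaluru
```

Only the macron is removed; any other diacritic in the input would survive the round trip through NFD and NFC.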

G: Procedures in Sri Lanka

(19) There are two official languages in Sri Lanka: Sinhalese and Tamil. At the 22nd Session of UNGEGN (April 2004), Sri Lanka confirmed that it was happy to continue with its long-standing practice of publishing official mapping in three separate editions: Sinhalese, Tamil and English. The sets of spellings in the three languages are parallel, and no transcription or romanization from any one language to another is involved¹⁰.

H: Summary & Conclusion

(20) India and Pakistan continue to use the same romanization system – the Hunterian System – as they used before their independence. The two countries even still use the same toponymic record form – “Form 22 Topo” – inherited from the colonial era. And as of 2006, both countries have reaffirmed their ongoing commitment to these practices and procedures. Bangladesh also uses the Hunterian System, though without the macron. Nepal is revising its toponyms as a result of more recent field collection; the resulting romanizations still use the macron to indicate vowel length but are no longer necessarily Hunterian spellings. Procedures in Bhutan probably involve romanization, but the precise situation is unclear. Sri Lanka does not romanize at all; it produces mapping in parallel editions in three languages.

(21) The separate romanization tables for the individual languages of the region, known as the Sharma tables¹¹ and adopted by resolution at the 2nd UNCSGN in 1972¹² as the UN romanization system for geographical names, can be seen at the website of the UNGEGN Working Group on Romanization Systems at http://www.eki.ee/wgrs. But, as this present paper has demonstrated, these systems are not implemented in practice in any of the relevant countries; they are wholly spurious. The UNGEGN must find a means of extricating the UN from its commitment to these UNCSGN resolutions.

PCGN, United Kingdom
June 2006

Notes
¹ India, Pakistan, Nepal, Bhutan, Bangladesh, Sri Lanka.
² eg: A Manual of Surveying for India, H L Thuillier & R Smyth, Calcutta 1875; Handbook of Professional Instructions for the Topographical Branch, Survey of India Department, 2nd edition, Dehra Dun, 1896.
³ Handbook of Topography, 8th edition, Survey of India, Dehra Dun, 1954.
⁴ A long bar above the relevant vowel letter.
⁵ Information from http://www.ciil.org.
⁶ Hindi here incorporates “languages” sometimes considered as partly separate: eg Bhojpuri, Chhattisgarhi, Magahi, Maithili, Rajasthani.
⁷ Written in Devanagari script in India; written in Perso-Arabic script in Pakistan.
⁸ Information from http://www.ethnologue.com.
⁹ Written in Perso-Arabic script in Pakistan; written in Devanagari script in India.
¹⁰ A similar situation exists in Myanmar, where official toponymic sources are produced in parallel Burmese and English editions; there is no romanization from the former language to the latter.
¹¹ The tables were presented to the 2nd UNCSGN by Colonel Sharma of the Survey of India.
¹² Resolution II/11 of the 2nd UNCSGN: Transliteration into Roman and Devanagari of the languages of the Indian group. Also relevant is Resolution III/12 of the 3rd UNCSGN: Transliteration into Roman and Devanagari scripts of the Indian Division, which acted as a supplement to Resolution II/11.
