Pāṇini and Information Theory – Back to the Future

Professor Gérard Huet is a French computer scientist, mathematician and computational linguist. CSP connected with Professor Huet to ask him about Sanskrit and computational linguistics.

Recipient of the prestigious EATCS Award in 2009, Professor Huet is Emeritus at Inria (the French National Institute for Research in Computer Science and Automation) and was Directeur de Recherche de Classe Exceptionnelle from 1989 to 2013. He is a member of the French Academy of Sciences and of Academia Europaea.

From the year 2000 onwards, he has worked on and contributed immensely to computational linguistics. Author of a Sanskrit-French hypertext dictionary, he has developed various tools for the phonetic, morphological and lexical analysis of Sanskrit, such as the Zen toolkit. From this research evolved a new paradigm for relational programming, inspired by Samuel Eilenberg’s X-machines.

Professor Huet was Program Chair and local organizer of the first International Sanskrit Computational Symposium in Paris in October 2007, a member of the Program Committee of the second at Brown University in 2008, and co-Program Chair of the third in Hyderabad in January 2009, the fourth at JNU in Delhi in 2010, the fifth at IIT Bombay in January 2013, and the sixth at IIT Kharagpur in October 2019. He is a founding member of the Steering Committee of this series of symposia.

Pro-Vice-Chancellor of the University of Hyderabad, Professor Rajasekhar, with Professor Huet

He has been the principal investigator on the French side of a joint team on Sanskrit computational linguistics between Inria and the University of Hyderabad since 2007. Professor Huet’s talk at the University of Hyderabad this week, titled Pāṇini’s Machine, was about how Pāṇini’s grammar may be thought of as the operational manual of an abstract machine. A note on the lecture by the university says: “this machine performs the grammatical operations prescribed or permitted in the Aṣṭādhyāyī sutras. It produces recursively a correct Sanskrit enunciation as a sign pairing the phonetic signifier and its signified sense. Its proper operation yields both the utterance as a phonetic stream and the intended meaning of a correct Sanskrit sentence. This view places Pāṇini as a precursor in a long list of automata inventors such as Turing, Babbage, Pascal, thus adding to his fame as a renowned linguist.”

In his talk, Professor Huet briefly explained how ‘formal methods used in the Aṣṭādhyāyī anticipate computer science’s control and data structures and show a keen understanding of information theory’.

What interests you most about Sanskrit? What was your first introduction to the language?

Professor Huet: I was interested in Sanskrit as a key to understanding the traditional culture of ancient India, and was fascinated by the fact that this culture is still alive, as opposed to, say, Greek culture, where all that remains are frozen artefacts like the ruins of ancient monuments, and Homeric literature that has lost its connection to the present.

How can the design and implementation of computer-aided processing tools help in analysing the enormous store of knowledge and literature available as Sanskrit text?

Professor Huet: These tools may help in several ways.

Firstly, they allow texts to be preserved better than by simply letting physical documents deteriorate with time. Many manuscripts are still only available in fragile forms such as palm leaves or birch bark; documents that have been digitized photographically are less useful than searchable character-level representations, which are in turn less useful than word-level segmented documents, and so on.

Our tools allow the representation of marked-up documents, in which words are indicated with their lemmatization, giving their morphological parameters (case, number, gender, person, tense, voice, etc.) or even their semantic parameters (dependency graphs, anaphora antecedents, word-sense disambiguation, named-entity links, etc.). These can be considered a kind of first-level interpretation of the texts. For instance, सेनाभाव may be segmented as senā-bhāva (existence of an army) or as senā-abhāva (absence of an army). Choosing one or the other gives opposite meanings. Even a text such as the Bhagavadgītā is not segmented in the same way by Śaṅkara and Madhva.
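The ambiguity above can be made concrete in a few lines of code. This is only a toy illustration (hypothetical mini-lexicon, simplified romanization with ‘A’ standing for long ā, a single sandhi rule — not Professor Huet’s actual analyser): undoing the vowel sandhi ā + a → ā at the word boundary yields both readings.

```python
# Toy illustration of segmentation ambiguity (hypothetical lexicon,
# simplified romanization: 'A' stands for long ā).
LEXICON = {"senA", "bhAva", "abhAva"}

# One external sandhi rule: A + a -> A. Reversing it, a surface 'A'
# at a word boundary may hide an underlying "A + a".
def segmentations(text):
    results = []
    for i in range(1, len(text)):
        left, right = text[:i], text[i:]
        # plain split: no sandhi change at the boundary
        if left in LEXICON and right in LEXICON:
            results.append((left, right))
        # undo A + a -> A: restore the hidden initial short 'a'
        if left.endswith("A") and left in LEXICON and ("a" + right) in LEXICON:
            results.append((left, "a" + right))
    return results

print(segmentations("senAbhAva"))
# [('senA', 'bhAva'), ('senA', 'abhAva')] -- the two opposite readings
```

Both segmentations are phonologically valid; only context (or a commentator’s choice, as with Śaṅkara and Madhva) decides between them.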

This allows the progressive establishment of data banks of marked-up texts, which may be subjected to error correction, alignment of versions, the establishment of phylogenetic trees for use by philologists in dating versions, the detection of inter-textuality relations, and the preparation of critical editions.

Our grammar-informed tools are thus preparing the ground for the use of more automated statistical or neural analysers, trained on our tagged corpus, which will be able to scan and analyse massive quantities of text.

Another use of our tools is to provide new methods for teaching the language, alleviating the burdensome initial investment of learning the script, complex phonology rules, complex sandhi analysis and complex morphology: the student may dive directly into the text and concentrate on its meaning with the help of dictionaries linked to the analysed texts. This is very important, since it is next to impossible to translate Sanskrit into non-Indian languages. Not only are terms like dharma, karma, moksha, etc. very hard to translate without their context, but poetry uses complex figures of speech (alaṃkāra) such as upamā, yamaka, rūpaka, sasaṃdeha, paryāyokta, śleṣa, virodha, etc., which are totally untranslatable and must be enjoyed in the original text.

Could you please briefly explain your segmenter for Sanskrit?

Professor Huet: The segmenter is lexicon-directed and uses finite-state transducer technology. That is, I build a database of inflected forms by expanding morphology-generation processes over a lexicon of elementary word stems and roots, and then I build specialized transducers that segment the text by guessing sandhi transitions between padas. The full technical explanation and justification is given in http://gallium.inria.fr/~huet/PUBLIC/SALA.pdf.
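A toy version of this lexicon-directed approach can be sketched as follows. Everything here is a hypothetical illustration (a five-word lexicon, one sandhi rule, simplified romanization with ‘A’ for long ā); the real system compiles the inflected-form database into finite-state transducers rather than searching recursively, and handles the full set of sandhi rules.

```python
# Toy lexicon-directed segmenter (illustrative sketch, not the Zen toolkit).
# 'A' stands for long ā in this simplified romanization.
FORMS = {"tat", "tvam", "asi", "ca", "api"}   # a tiny bank of inflected forms (padas)
SANDHI = [("a", "a", "A")]                    # (final, initial, surface junction)

def segment(s):
    """Enumerate pada sequences whose sandhied concatenation is `s`."""
    if s == "":
        return [[]]
    parses = []
    for w in FORMS:
        # case 1: the pada appears unchanged at the front of the stream
        if s.startswith(w):
            parses += [[w] + rest for rest in segment(s[len(w):])]
        # case 2: a sandhi rule merged its final with the next pada's initial
        for fin, ini, junction in SANDHI:
            if w.endswith(fin):
                merged = w[:-len(fin)] + junction
                if s.startswith(merged):
                    # restore the hidden initial and keep segmenting
                    parses += [[w] + rest
                               for rest in segment(ini + s[len(merged):])]
    return parses

print(segment("tattvamasi"))  # [['tat', 'tvam', 'asi']]
print(segment("cApi"))        # [['ca', 'api']]  (ca + api, with a + a -> A)
```

The recursion mirrors what the transducers do in compiled form: at each position, either a form matches the surface directly, or a sandhi transition is undone to expose the next pada.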

How does Pāṇinian grammar anticipate and show an understanding of information theory?

Professor Huet: This is not easy to explain succinctly. You have to look into Pāṇinian encodings and see how they can be placed in the context of coding theory in the sense of Shannon, and of entropy minimization. In a nutshell, Pāṇini used encodings that permitted optimal compression of his notations and allowed him to express the grammar in about 4,000 terse sutras, whereas a more naive organisation would have required a much larger repertory of rules, and thus precluded the complete memorization of the grammar.

Another remark of this nature is that the Shivasutras are a way of expressing very concisely all the subsets of phonemes needed to express regularities in the grammar, such as “for all nasals, do this”, where “nasals” is expressed by the condensed definition (pratyāhāra) ñam.
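The pratyāhāra mechanism can be made concrete in a short sketch. The Shivasutra list and the ñam (nasals) and ac (vowels) examples are standard; the code itself is only an illustration in romanized form, and it glosses over Pāṇini’s conventions for disambiguating repeated markers.

```python
# The fourteen Shivasutras: each is (phonemes, final marker letter).
SHIVASUTRAS = [
    (["a", "i", "u"], "ṇ"),
    (["ṛ", "ḷ"], "k"),
    (["e", "o"], "ṅ"),
    (["ai", "au"], "c"),
    (["h", "y", "v", "r"], "ṭ"),
    (["l"], "ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "m"),
    (["jh", "bh"], "ñ"),
    (["gh", "ḍh", "dh"], "ṣ"),
    (["j", "b", "g", "ḍ", "d"], "ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "v"),
    (["k", "p"], "y"),
    (["ś", "ṣ", "s"], "r"),
    (["h"], "l"),
]

def pratyahara(start, marker):
    """Expand a pratyāhāra: phonemes from `start` up to the sutra
    whose marker is `marker`, markers themselves excluded."""
    phonemes, collecting = [], False
    for sounds, it in SHIVASUTRAS:
        for s in sounds:
            if s == start:
                collecting = True
            if collecting:
                phonemes.append(s)
        if collecting and it == marker:
            return phonemes
    raise ValueError("no such pratyāhāra")

print(pratyahara("ñ", "m"))  # the nasals: ['ñ', 'm', 'ṅ', 'ṇ', 'n']
print(pratyahara("a", "c"))  # the vowels: a i u ṛ ḷ e o ai au
```

Two letters thus name an arbitrary contiguous class of sounds, which is exactly the kind of compressed coding discussed above: the ordering of the Shivasutras is what makes every grammatically useful subset expressible as an interval.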

The optimality of the representation of the Shivasutras has been recently demonstrated by the German scholar Wiebke Petersen.

Sanskrit cannot be reduced to a universal system of signs; it is also co-extensive with Indian culture. How can structural semantics take this into account?

Professor Huet: Structural semantics is universal, and in this sense is not sufficient to represent cultural context. Pāṇinian methods are also to an extent universal, and have been used to describe other languages by the Akshar Bharati group of IIIT Hyderabad (Rajeev Sangal, Vineet Chaitanya, Amba Kulkarni, Dipti Sharma); thus Pāṇinian methods are not specific to Indian cultural aspects. Cultural aspects go beyond the grammar. They are of a semiotic nature, beyond linguistics. You must go into literary theory (Ānandavardhana, Abhinavagupta, Daṇḍin, etc.) and aesthetics in order to account for cultural aspects.
