Machine Recognition and Morphological Analysis of Subanta-padas
Abstract Category: Other Categories
Course / Degree: M.Phil.
Institution / University: Jawaharlal Nehru University, New Delhi, India
Published in: 2006
The Indian Heritage Group of the Centre for Development of Advanced Computing (C-DAC) has developed a system called DESIKA, which claims to process all the words of Sanskrit and to cover both generation and analysis (parsing). It claims to have an exhaustive database based on the Amarakosha, a rule base using the grammar rules of Panini's AshTAdhyAyI, and heuristics based on the nyAya and mImAMsA shAstras for semantic and contextual processing. However, the system (as available at the TDIL site) offers subanta generation only, and even that does not work properly.
Huet has developed a Grammatical Analyzer System, which tags subanta-padas by analyzing sandhi, samAsa and sup affixation. This system is available online at: http://pauillac.inria.fr/~huet/SKT/sanskrit.html. The system is not rooted in Panini's system, and as a result it produces so many errors that it is practically unusable. Secondly, Huet's system accepts phrases, not full sentences or texts.
The Rashtriya Sanskrit Vidyapeetha (RSVP), Tirupati, under the leadership of Prof. K. V. Ramakrishnamacharyulu (currently Vice Chancellor of Rajasthan Sanskrit University), has done commendable work on the Sansk-net project. This project was proposed by the Indian Heritage Group (IHG), the Real-Time Systems Group (RTSG), and the Center for Development of Advanced Computing (C-DAC), Bangalore, as a joint initiative with RSVP. The objectives of the project are: to present the databases available in different institutions in a computer framework; to develop the hardware, software and technical capability to place the information in a modern technical framework; to link the institutions by computer so that each can access the databases available at the others; to make use of the principles and techniques available in nyAya, vyAkaraNa, vedAnta and vedAnga for developing new paradigms for the computer; to provide training packages for the faculties in the scientific and shAstraic worlds so that they can make the best use of the infrastructural facilities; and to facilitate preservation of the information in rare manuscripts, Vedic literature and shAstras. Prof. Vineet Chaitanya and Amba Kulkarni are visiting the institution and are currently guiding several Sanskrit R&D initiatives with far-reaching consequences.
Vanasthali Vidyapeeth, Vanasthali, Rajasthan, has also been working on Sanskrit. Jawaharlal Nehru University has completed the CASTLE (Computer Assisted Sanskrit Teaching and Learning Environment) project and some related work in this area, such as Sanskrit processing tools and a Sanskrit authoring system. Some of these may be available on the TDIL website, http://tdil.mit.gov.in.
The Academy of Sanskrit Research, Melkote, Mysore, has been actively involved in bringing together, on a single platform, scholars doing technology R&D for Sanskrit and the shAstras. In 1993 it organized a seminar on Sanskrit and computer-based linguistics, and in 1994 a seminar on Interface Mechanisms in shAstras and Computer Science. The latter, among other things, brought out similarities between traditional Indian theories and the principles of Artificial Intelligence.
The Special Centre for Sanskrit Studies, Jawaharlal Nehru University, New Delhi, is currently engaged in the following R&D: a kAraka analyzer, a sandhi splitter and analyzer, a verb analyzer, NP gender agreement, POS tagging of Sanskrit, an online multilingual Amarakosha, a search engine for Panini's AshTAdhyAyI, and online MahAbhArata indexing. Jha (2006) presented a model of a Sanskrit Analysis System (SAS). The RCILTS project under Prof. G.V. Singh at the School of Computer and Systems Sciences has prepared useful linguistic resources for Sanskrit.
Morphological analyzers for Sanskrit, Telugu, Hindi, Marathi, Kannada and Punjabi have been developed by the Akshar Bharati group at the Indian Institute of Technology, Kanpur, and the University of Hyderabad, funded by the Ministry of Information Technology. The project claims 95% coverage for Telugu (arbitrary text in modern standard Telugu) and 88% coverage for Hindi. The system is available for download as well as online at: http://www.iiit.net/ltrc/morph/index.htm
Anusaaraka (developed by the Akshar Bharati group, IIIT Hyderabad) is software that renders text from one Indian language into another, a form of machine translation. It produces output that is comprehensible to the reader, although at times it may not be grammatical. The system is available at the IIIT Hyderabad site.
How is this work different?
The work is different from existing research in the following ways:
1. no online RDBMS-based recognizer-analyzer that accepts input and displays results in Unicode Devanagari script has been available to date; this system takes Unicode Devanagari text and displays its results in Devanagari,
2. this system takes Devanagari UTF-8 text as input and delivers Devanagari UTF-8 text as output, using Java servlet, Apache Tomcat, JDBC and RDBMS technology,
3. gives a comprehensive computational analysis of subanta-padas in a Sanskrit text, and does basic tagging of verbs and avyayas too,
4. uses a hybrid approach of Paninian formalism and example-based techniques to process input text. It works on the morphological nature of bases and applies the vibhakti information for processing,
5. the system can be used for larger processing of Sanskrit for text simplification and machine translation.
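The hybrid approach in item 4 can be illustrated with a minimal Python sketch (the system itself is described as a Java servlet; Python is used here only for brevity). The suffix table, the example base and the function name `analyze` are assumptions made for illustration, covering a few endings of a masculine a-ending base such as राम; they are not the actual rule base or database schema of the thesis.

```python
# Illustrative rule table: a few sup endings for masculine a-ending bases.
# Assumption: endings are matched as plain Unicode suffixes of the input.
SUP_RULES = {
    "ः": [("nominative", "singular")],
    "म्": [("accusative", "singular")],
    "ेण": [("instrumental", "singular")],
    "स्य": [("genitive", "singular")],
    "ाभ्याम्": [
        ("instrumental", "dual"),
        ("dative", "dual"),
        ("ablative", "dual"),
    ],
}

# Hypothetical example base: whole forms that the suffix rules would
# misanalyse, stored with their analysis and looked up first.
EXAMPLE_BASE = {
    "सः": [("तद्", "nominative", "singular")],
}


def analyze(pada):
    """Return a list of (base, case, number) readings for one word form."""
    # Example-based step: exact lookup takes priority over the rules.
    if pada in EXAMPLE_BASE:
        return list(EXAMPLE_BASE[pada])
    # Rule-based step: try the longest ending first (a heuristic).
    for suffix in sorted(SUP_RULES, key=len, reverse=True):
        if pada.endswith(suffix) and len(pada) > len(suffix):
            base = pada[: -len(suffix)]
            return [(base, case, number) for case, number in SUP_RULES[suffix]]
    return []
```

Under these assumptions, `analyze("रामाभ्याम्")` returns three readings (instrumental, dative and ablative dual) for the base राम, while `analyze("सः")` is resolved from the example base to तद् instead of being stripped by the ः rule.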
Summary of chapters
Chapter I discusses morphological analyzers, the current status of R&D in this field, the structure and organization of the AshTAdhyAyI (AD), and the subanta of Panini.
Chapter II discusses subanta formalism of Panini and mechanisms to recognize verb, avyaya and subanta in Sanskrit text.
Chapter III discusses the analysis of subanta-padas.
Chapter IV discusses the implementation aspects: the front end, the Java objects, the databases and the linguistic resources (corpus, rule bases and example bases), how they work, the basic requirements of the system, and how sandhi and subanta rules are applied wherever necessary.
Conclusion discusses future R&D, limitations of the system and result analysis.
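The recognition step summarized for Chapter II, together with the basic tagging of verbs and avyayas, could work by exclusion along the following lines. This is a hedged Python sketch: the word lists, the tag names and the function `tag_tokens` are hypothetical stand-ins, not the databases or code of the thesis.

```python
# Hypothetical mini-lexicons standing in for the verb and avyaya databases.
VERB_FORMS = {"पठति", "गच्छति"}
AVYAYAS = {"च", "अपि", "एव"}


def tag_tokens(text):
    """Tag each whitespace-separated token as verb, avyaya or subanta."""
    tags = []
    for token in text.split():
        if token in VERB_FORMS:
            tags.append((token, "verb"))
        elif token in AVYAYAS:
            tags.append((token, "avyaya"))
        else:
            # Anything not recognized above is passed on as a subanta
            # candidate for morphological analysis.
            tags.append((token, "subanta"))
    return tags
```

A form such as पठति is tagged as a verb here even though it could also be read as a subanta. Punctuation handling and sandhi are left out of the sketch.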
Limitations
1. Some verbs have the same form as subantas, for example paThati; the system treats such forms as verbs and excludes them from subanta analysis.
2. The morphological ambiguity of forms such as raamaabhyaam, which are shared by several vibhaktis, persists when they are processed in isolation. Such ambiguity can be resolved at a larger level of processing.
3. The system does not split samAsas into their constituent subantas by way of reverse sandhi so that reverse subanta analysis can be done. This will be implemented with the samAsa component.
4. In some cases, the recognition of the base form is ambiguous. For example, for the 'h'-ending and 'sh'-ending prAtipadikas, the last characters change to 'T' and 'D' respectively, so the system cannot recognize the correct prAtipadika. In this condition, the system will give the other possible results. For example: liT, liD, tAdRUsh etc.
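The candidate generation described in item 4 might look like the following minimal Python sketch. The mapping table and the function name `candidate_bases` are assumptions for illustration, using the transliteration of this abstract; the sketch proposes both 'h'-final and 'sh'-final bases for either surface ending instead of choosing one.

```python
# Assumption: a surface form ending in 'T' or 'D' may go back to a
# prAtipadika ending in 'h' or 'sh', so both are offered as candidates.
AMBIGUOUS_FINALS = {"T": ["h", "sh"], "D": ["h", "sh"]}


def candidate_bases(surface):
    """Return the possible base forms for a transliterated surface form."""
    finals = AMBIGUOUS_FINALS.get(surface[-1:])
    if finals is None:
        # The final character is not ambiguous: keep the form as it is.
        return [surface]
    stem = surface[:-1]
    return [stem + final for final in finals]
```

For the surface form liT this yields the candidates lih and lish, which a later stage (or the user) would have to choose between.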
Thesis Keywords/Search Tags:
Computational Linguistics, Morphological analyser, Language technology
Submission Details: Thesis Abstract submitted by Subhash Chandra from India on 20-Mar-2007 11:04.
Subhash Chandra Contact Details: Email: subhash.jnu@gmail.com