A Deep syntactic parser for the Tamil language
Date
2022
Abstract
Natural Language Processing (NLP) applications have become integral to human life.
A syntactic parser is a vital linguistic tool that shows syntactic relations between the
words in a sentence. These may then be mapped to a tree, a graph, or a formal structure.
Syntactic parsers are helpful for building other NLP applications. In addition,
they help linguists to understand a language better and perform cross-lingual linguistic
analysis. A syntactic parser that performs a deeper analysis and captures argumentative,
attributive and coordinative relations between the words of a given sentence is
called a deep syntactic parser. Tamil is considered a low-resourced language in terms of
tools, applications, and resources available for others to use and build NLP applications
or carry out linguistic analyses. Not many resources, such as treebanks and annotated
corpora, or linguistic analysis tools such as POS taggers or morphological analysers, are
publicly available for Tamil. Available off-the-shelf language-agnostic syntactic parsers
show comparatively low performance because of the rich morphosyntactic properties of
Tamil. This study elaborates on how I developed the first grammar-driven parser for
Tamil, which uses the Lexical-Functional Grammar formalism, and a state-of-the-art
data-driven parser using the Universal Dependencies framework. I have also proposed
an approach to evaluate a syntactic parser’s syntactic coverage, experimented with
transition-based and graph-based approaches, and, for the first time, tried multilingual
training to develop a data-driven parser for Tamil. A part-of-speech tagger, a morphological
analyser cum generator, pre-processing tools, and treebanks are the other tools
and resources I have developed to facilitate the development of the parsers. All
these tools give the current best score for their respective tasks, and the resources are
also available online for others to build upon. The study also documents my
contributions toward understanding different linguistic aspects of the Tamil language.
Keywords
DEEP SYNTACTIC PARSER, MORPHOLOGICAL ANALYSER, GRAMMAR-DRIVEN PARSER, DATA-DRIVEN PARSER, PART OF SPEECH TAGGER, INFORMATION TECHNOLOGY -Dissertation, COMPUTER SCIENCE -Dissertation
Citation
Sarveswaran, K. (2022). A Deep syntactic parser for the Tamil language [Doctoral dissertation, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21176