A Deep syntactic parser for the Tamil language

dc.contributor.advisorDias G
dc.contributor.advisorButt M
dc.contributor.authorSarveswaran K
dc.date.accept2022
dc.date.accessioned2022
dc.date.available2022
dc.date.issued2022
dc.description.abstractNatural Language Processing (NLP) applications have become integral to human life. A syntactic parser is a vital linguistic tool that shows the syntactic relations between the words in a sentence; these relations may then be mapped to a tree, a graph, or another formal structure. Syntactic parsers are helpful for building other NLP applications. They also help linguists understand a language better and perform cross-lingual linguistic analysis. A syntactic parser that performs a deeper analysis and captures argumentative, attributive, and coordinative relations between the words of a given sentence is called a deep syntactic parser. Tamil is considered a low-resourced language in terms of the tools, applications, and resources available for building NLP applications or carrying out linguistic analyses. Few resources, such as treebanks and annotated corpora, or linguistic analysis tools, such as POS taggers and morphological analysers, are publicly available for Tamil. Available off-the-shelf language-agnostic syntactic parsers show comparatively low performance on Tamil because of its rich morphosyntactic properties. This study describes how I developed the first grammar-driven parser for Tamil, which uses the Lexical-Functional Grammar formalism, and a state-of-the-art data-driven parser that uses the Universal Dependencies framework. I have also proposed an approach to evaluating a syntactic parser’s syntactic coverage, experimented with transition-based and graph-based approaches, and, for the first time, tried multilingual training to develop a data-driven parser for Tamil. To facilitate the development of the parsers, I have also developed a part-of-speech tagger, a morphological analyser cum generator, pre-processing tools, and treebanks. All these tools give the current best score for their respective tasks, and the resources are available online for others to build upon.
Moreover, the study documents my contributions toward understanding different linguistic aspects of the Tamil language.en_US
dc.identifier.accnoTH5064en_US
dc.identifier.citationSarveswaran, K. (2022). A Deep syntactic parser for the Tamil language [Doctoral dissertation, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21176
dc.identifier.degreeDoctor of Philosophyen_US
dc.identifier.departmentDepartment of Computer Science and Engineeringen_US
dc.identifier.facultyEngineeringen_US
dc.identifier.urihttp://dl.lib.uom.lk/handle/123/21176
dc.language.isoenen_US
dc.subjectDEEP SYNTACTIC PARSERen_US
dc.subjectMORPHOLOGICAL ANALYSEen_US
dc.subjectGRAMMAR-DRIVEN PARSERen_US
dc.subjectDATA-DRIVEN PARSERen_US
dc.subjectPART OF SPEECH TAGGERen_US
dc.subjectINFORMATION TECHNOLOGY -Dissertationen_US
dc.subjectCOMPUTER SCIENCE -Dissertationen_US
dc.titleA Deep syntactic parser for the Tamil languageen_US
dc.typeThesis-Abstracten_US

Files

Original bundle

Name:
TH5064-1.pdf
Size:
314.5 KB
Format:
Adobe Portable Document Format
Description:
Pre-Text
Name:
TH5064-2.pdf
Size:
161.46 KB
Format:
Adobe Portable Document Format
Description:
Post-Text
Name:
TH5064.pdf
Size:
2.44 MB
Format:
Adobe Portable Document Format
Description:
Full-theses