Evaluation of Different Classifiers for Sinhala POS Tagging

dc.contributor.author: Fernando, S
dc.contributor.author: Ranathunga, S
dc.contributor.editor: Chathuranga, D
dc.date.accessioned: 2022-09-01T09:37:38Z
dc.date.available: 2022-09-01T09:37:38Z
dc.date.issued: 2018-05
dc.description.abstract: This paper presents a comparative evaluation of three state-of-the-art classifiers for Sinhala Parts-of-Speech (POS) tagging. POS tagger models based on Support Vector Machines (SVM), Hidden Markov Models (HMM) and Conditional Random Fields (CRF) are generated and tested using different combinations of a corpus of news articles and a corpus of official government documents. CRF is used here for the first time in Sinhala POS tagging, so its best feature set is derived experimentally. To further improve tagging accuracy, a majority-voting ensemble tagger is built from the three individual taggers. This ensemble tagger achieved higher accuracy than any individual tagger. The two domains used in this study (news and official government documents) differ noticeably in writing style and vocabulary. Generating domain-specific POS taggers is time-consuming and costly because of the overhead of creating and manually tagging domain-specific corpora, particularly for low-resource languages. Therefore, this study also evaluates the feasibility and effectiveness of using corpora from different domains in the training and testing phases of these machine learning techniques. [en_US]
dc.identifier.citation: S. Fernando and S. Ranathunga, "Evaluation of Different Classifiers for Sinhala POS Tagging," 2018 Moratuwa Engineering Research Conference (MERCon), 2018, pp. 96-101, doi: 10.1109/MERCon.2018.8421997. [en_US]
dc.identifier.conference: 2018 Moratuwa Engineering Research Conference (MERCon) [en_US]
dc.identifier.department: Engineering Research Unit, University of Moratuwa [en_US]
dc.identifier.doi: 10.1109/MERCon.2018.8421997 [en_US]
dc.identifier.faculty: Engineering
dc.identifier.pgnos: pp. 96-101 [en_US]
dc.identifier.place: Moratuwa, Sri Lanka [en_US]
dc.identifier.proceeding: Proceedings of 2018 Moratuwa Engineering Research Conference (MERCon) [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/18833
dc.identifier.year: 2018 [en_US]
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.relation.uri: https://ieeexplore.ieee.org/document/8421997 [en_US]
dc.subject: Sinhala [en_US]
dc.subject: Parts-of-Speech (POS) [en_US]
dc.subject: HMM [en_US]
dc.subject: CRF [en_US]
dc.subject: Ensemble [en_US]
dc.title: Evaluation of Different Classifiers for Sinhala POS Tagging [en_US]
dc.type: Conference-Full-text [en_US]
