ACTSEA : annotated corpus for Tamil & Sinhala emotion analysis

dc.contributor.author: Jenarthanan, R
dc.contributor.author: Senarath, Y
dc.contributor.author: Thayasivam, U
dc.date.accessioned: 2019-10-22T05:41:43Z
dc.date.available: 2019-10-22T05:41:43Z
dc.description.abstract: The purpose of text emotion analysis is to detect and classify the emotions expressed in text. In recent years, text emotion analysis studies have increased for English, where data are abundant. With the growth of social media, large amounts of data are now available for regional languages such as Tamil and Sinhala as well. However, these languages lack the annotated corpora needed for many NLP tasks, including emotion analysis. In this paper, we present our scalable semi-automatic approach to creating an annotated corpus named ACTSEA for Tamil and Sinhala to support emotion analysis. We also present our analysis of a sample of the produced data, along with useful findings, for the benefit of the low-resource NLP community. For ACTSEA, data were gathered from Twitter and annotated manually after cleaning. We collected 600,280 (Tamil) and 318,308 (Sinhala) tweets in total, which makes our corpus the largest data collection currently available for these languages. [en_US]
dc.identifier.conference: Moratuwa Engineering Research Conference - MERCon 2019 [en_US]
dc.identifier.department: Department of Computer Science and Engineering [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.place: Moratuwa, Sri Lanka [en_US]
dc.identifier.uri: http://dl.lib.mrt.ac.lk/handle/123/15169
dc.identifier.year: 2019 [en_US]
dc.language.iso: en [en_US]
dc.subject: NLP [en_US]
dc.subject: Emotion Analysis [en_US]
dc.subject: Sentiment Analysis [en_US]
dc.subject: Emotion Corpus [en_US]
dc.subject: Morphological Generator [en_US]
dc.subject: Corpus Generator [en_US]
dc.title: ACTSEA : annotated corpus for Tamil & Sinhala emotion analysis [en_US]
dc.type: Conference-Abstract [en_US]
