Automatic creation of a word aligned Sinhala-Tamil parallel corpus

dc.contributor.author: Mohamed, MZ
dc.contributor.author: Ihalapathirana, A
dc.contributor.author: Hameed, RA
dc.contributor.author: Pathirennehelage, N
dc.contributor.author: Ranathunga, S
dc.contributor.author: Jayasena, S
dc.contributor.author: Dias, G
dc.date.accessioned: 2018-07-31T18:48:24Z
dc.date.available: 2018-07-31T18:48:24Z
dc.date.issued: 2017
dc.description.abstract: A parallel corpus aligned at both sentence and word level is an important prerequisite in statistical machine translation. However, manual creation of such a parallel corpus is time consuming, and requires experts fluent in both languages. This paper presents the first ever empirical evaluation carried out to identify the best unsupervised word alignment technique for Sinhala and Tamil. It also presents a novel approach that combines the output of individual aligners, which outperforms the solitary use of these aligners. Sentence aligned parallel text from annual reports and letters of Sri Lankan Government institutions, and order papers from the Parliament of Sri Lanka were used in the evaluation. [en_US]
dc.identifier.conference: Moratuwa Engineering Research Conference - MERCon 2017 [en_US]
dc.identifier.department: Department of Computer Science and Engineering [en_US]
dc.identifier.email: maryamzi.12@cse.mrt.ac.lk [en_US]
dc.identifier.email: anusha.12@cse.mrt.ac.lk [en_US]
dc.identifier.email: riyafa.12@cse.mrt.ac.lk [en_US]
dc.identifier.email: pnadeeshani.12@cse.mrt.ac.lk [en_US]
dc.identifier.email: surangika@cse.mrt.ac.lk [en_US]
dc.identifier.email: sanath@cse.mrt.ac.lk [en_US]
dc.identifier.email: gihan@cse.mrt.ac.lk [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.place: Moratuwa, Sri Lanka [en_US]
dc.identifier.uri: http://dl.lib.mrt.ac.lk/handle/123/13337
dc.identifier.year: 2017 [en_US]
dc.subject: word alignment; parallel corpus; sinhala; tamil [en_US]
dc.title: Automatic creation of a word aligned Sinhala-Tamil parallel corpus [en_US]
dc.type: Conference-Abstract [en_US]
