Automatic creation of a word aligned Sinhala-Tamil parallel corpus

Mohamed, MZ; Ihalapathirana, A; Hameed, RA; Pathirennehelage, N; Ranathunga, S; Jayasena, S; Dias, G

dc.contributor.author	Mohamed, MZ
dc.contributor.author	Ihalapathirana, A
dc.contributor.author	Hameed, RA
dc.contributor.author	Pathirennehelage, N
dc.contributor.author	Ranathunga, S
dc.contributor.author	Jayasena, S
dc.contributor.author	Dias, G
dc.date.accessioned	2018-07-31T18:48:24Z
dc.date.available	2018-07-31T18:48:24Z
dc.date.issued	2017
dc.description.abstract	A parallel corpus aligned at both the sentence and word level is an important prerequisite for statistical machine translation. However, manual creation of such a corpus is time-consuming and requires experts fluent in both languages. This paper presents the first empirical evaluation carried out to identify the best unsupervised word alignment technique for Sinhala and Tamil. It also presents a novel approach that combines the output of individual aligners and outperforms any of these aligners used alone. Sentence-aligned parallel text from annual reports and letters of Sri Lankan government institutions, and order papers from the Parliament of Sri Lanka, were used in the evaluation.	en_US
dc.identifier.conference	Moratuwa Engineering Research Conference - MERCon 2017	en_US
dc.identifier.department	Department of Computer Science and Engineering	en_US
dc.identifier.email	maryamzi.12@cse.mrt.ac.lk	en_US
dc.identifier.email	anusha.12@cse.mrt.ac.lk	en_US
dc.identifier.email	riyafa.12@cse.mrt.ac.lk	en_US
dc.identifier.email	pnadeeshani.12@cse.mrt.ac.lk	en_US
dc.identifier.email	surangika@cse.mrt.ac.lk	en_US
dc.identifier.email	sanath@cse.mrt.ac.lk	en_US
dc.identifier.email	gihan@cse.mrt.ac.lk	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.place	Moratuwa, Sri Lanka	en_US
dc.identifier.uri	http://dl.lib.mrt.ac.lk/handle/123/13337
dc.identifier.year	2017	en_US
dc.subject	word alignment; parallel corpus; sinhala; tamil	en_US
dc.title	Automatic creation of a word aligned Sinhala-Tamil parallel corpus	en_US
dc.type	Conference-Abstract	en_US

Collections

MERCon - 2017
