Browsing by Author "Hameed, RA"
- item: Article-Abstract
  Title: Automatic Creation of a Sentence Aligned Sinhala-Tamil Parallel Corpus
  Authors: Hameed, RA; Pathirennehelage, N; Ihalapathirana, A; Mohamed, MZ; Ranathunga, VSD; Jayasena, S; Dias, G; Fernando, S
  Abstract: A sentence aligned parallel corpus is an important prerequisite in statistical machine translation. However, manual creation of such a parallel corpus is time consuming, and requires experts fluent in both languages. Automatic creation of a sentence aligned parallel corpus using parallel text is the solution to this problem. In this paper, we present the first ever empirical evaluation carried out to identify the best method to automatically create a sentence aligned Sinhala-Tamil parallel corpus. Annual reports from Sri Lankan government institutions were used as the parallel text for aligning. Despite both Sinhala and Tamil being under-resourced languages, we were able to achieve an F-score value of 0.791 using a hybrid approach that makes use of a bilingual dictionary.
- item: Conference-Abstract
  Title: Automatic creation of a word aligned Sinhala-Tamil parallel corpus
  Year: 2017
  Authors: Mohamed, MZ; Ihalapathirana, A; Hameed, RA; Pathirennehelage, N; Ranathunga, S; Jayasena, S; Dias, G
  Abstract: A parallel corpus aligned at both sentence and word level is an important prerequisite in statistical machine translation. However, manual creation of such a parallel corpus is time consuming, and requires experts fluent in both languages. This paper presents the first ever empirical evaluation carried out to identify the best unsupervised word alignment technique for Sinhala and Tamil. It also presents a novel approach that combines the output of individual aligners, which outperforms the solitary use of these aligners. Sentence aligned parallel text from annual reports and letters of Sri Lankan Government institutions, and order papers from the Parliament of Sri Lanka were used in the evaluation.
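
Note on the F-score reported in the first abstract: the listing does not say how the score was computed, so the following is only a minimal sketch of how an F-score for alignment output is commonly calculated, assuming alignments are represented as sets of (source index, target index) pairs and compared against a hand-made gold standard. The function name `alignment_f_score` and the sample data are hypothetical and are not taken from either paper.

```python
def alignment_f_score(predicted, gold):
    """Balanced F-score (F1) of predicted alignment pairs against gold pairs.

    Both arguments are iterables of (source_index, target_index) tuples.
    This representation is an assumption made for illustration only.
    """
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0

    correct = len(predicted & gold)        # pairs found in both sets
    precision = correct / len(predicted)   # share of predictions that are right
    recall = correct / len(gold)           # share of gold pairs that were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four gold sentence pairs, three predicted pairs,
# two of which match the gold standard.
gold = {(0, 0), (1, 1), (2, 2), (3, 3)}
predicted = {(0, 0), (1, 1), (2, 3)}
score = alignment_f_score(predicted, gold)
print(round(score, 3))
```

With this toy data, precision is 2/3 and recall is 2/4, so the balanced F-score is 4/7. The same function applies to word-level alignment links if each pair holds word positions instead of sentence positions.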