Short Tamil sentence similarity calculation using knowledge-based and corpus-based similarity measures

Selvarasa, A; Thirunavukkarasu, N; Rajendran, N; Yogalingam, C; Ranathunga, S; Dias, G

dc.contributor.author	Selvarasa, A
dc.contributor.author	Thirunavukkarasu, N
dc.contributor.author	Rajendran, N
dc.contributor.author	Yogalingam, C
dc.contributor.author	Ranathunga, S
dc.contributor.author	Dias, G
dc.date.accessioned	2018-08-20T21:00:28Z
dc.date.available	2018-08-20T21:00:28Z
dc.description.abstract	Sentence similarity calculation plays an important role in text processing-related research. Many unsupervised techniques, such as knowledge-based, corpus-based, string similarity-based, and graph alignment techniques, are available to measure sentence similarity. However, none of these techniques has been applied to Tamil. In this paper, we present the first-ever system to measure semantic similarity for short Tamil phrases using a hybrid approach that combines knowledge-based and corpus-based techniques. We tested this system with 2000 general sentence pairs and 100 mathematical sentence pairs. For the dataset of 2000 sentence pairs, this approach achieved a Mean Squared Error of 0.195 and a Pearson correlation coefficient of 0.815. For the 100 mathematical sentence pairs, this approach achieved an accuracy of 85%.	en_US
dc.identifier.conference	Moratuwa Engineering Research Conference - MERCon 2017	en_US
dc.identifier.department	Department of Computer Science and Engineering	en_US
dc.identifier.email	anutharsha.12@cse.mrt.ac.lk	en_US
dc.identifier.email	nilathiru.12@cse.mrt.ac.lk	en_US
dc.identifier.email	niveathika.12@cse.mrt.ac.lk	en_US
dc.identifier.email	chinthoorie.12@cse.mrt.ac.lk	en_US
dc.identifier.email	surangika@cse.mrt.ac.lk	en_US
dc.identifier.email	gihan@uom.lk	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.place	Moratuwa, Sri Lanka	en_US
dc.identifier.uri	http://dl.lib.mrt.ac.lk/handle/123/13405
dc.identifier.year	2017	en_US
dc.language.iso	en	en_US
dc.subject	Sentence similarity, Tamil, Knowledge-based, corpus-based	en_US
dc.title	Short Tamil sentence similarity calculation using knowledge-based and corpus-based similarity measures	en_US
dc.type	Conference-Abstract	en_US

Collections

2014-9th
