Short Tamil sentence similarity calculation using knowledge-based and corpus-based similarity measures

dc.contributor.author: Selvarasa, A
dc.contributor.author: Thirunavukkarasu, N
dc.contributor.author: Rajendran, N
dc.contributor.author: Yogalingam, C
dc.contributor.author: Ranathunga, S
dc.contributor.author: Dias, G
dc.date.accessioned: 2018-08-20T21:00:28Z
dc.date.available: 2018-08-20T21:00:28Z
dc.description.abstract: Sentence similarity calculation plays an important role in text-processing research. Many unsupervised techniques, such as knowledge-based, corpus-based, string-similarity-based, and graph-alignment techniques, are available to measure sentence similarity. However, none of these techniques had previously been tested on Tamil. In this paper, we present the first system to measure semantic similarity for short Tamil phrases, using a hybrid approach that combines knowledge-based and corpus-based techniques. We tested this system on 2,000 general sentence pairs and 100 mathematical sentence pairs. On the 2,000 general sentence pairs, the approach achieved a Mean Squared Error of 0.195 and a Pearson correlation coefficient of 0.815. On the 100 mathematical sentence pairs, it achieved an accuracy of 85%. [en_US]
dc.identifier.conference: Moratuwa Engineering Research Conference - MERCon 2017 [en_US]
dc.identifier.department: Department of Computer Science and Engineering [en_US]
dc.identifier.email: anutharsha.12@cse.mrt.ac.lk [en_US]
dc.identifier.email: nilathiru.12@cse.mrt.ac.lk [en_US]
dc.identifier.email: niveathika.12@cse.mrt.ac.lk [en_US]
dc.identifier.email: chinthoorie.12@cse.mrt.ac.lk [en_US]
dc.identifier.email: surangika@cse.mrt.ac.lk [en_US]
dc.identifier.email: gihan@uom.lk [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.place: Moratuwa, Sri Lanka [en_US]
dc.identifier.uri: http://dl.lib.mrt.ac.lk/handle/123/13405
dc.identifier.year: 2017 [en_US]
dc.language.iso: en [en_US]
dc.subject: Sentence similarity, Tamil, Knowledge-based, Corpus-based [en_US]
dc.title: Short Tamil sentence similarity calculation using knowledge-based and corpus-based similarity measures [en_US]
dc.type: Conference-Abstract [en_US]
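
The abstract describes a hybrid of knowledge-based and corpus-based similarity and reports Mean Squared Error and Pearson correlation between system scores and reference scores. The Python sketch below illustrates both ideas under stated assumptions: each measure is assumed to return a score on a common numeric scale, and the weighted combination `hybrid_score` with its `alpha` weight is a hypothetical illustration, not the authors' published formula. The two metric functions follow the standard definitions of MSE and the Pearson correlation coefficient.

```python
import math
from typing import Sequence


def hybrid_score(kb: float, cb: float, alpha: float = 0.5) -> float:
    """HYPOTHETICAL linear blend of a knowledge-based score `kb` and a
    corpus-based score `cb`. The weight `alpha` is an assumption for
    illustration; the paper's actual combination rule is not given here."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * kb + (1.0 - alpha) * cb


def mean_squared_error(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Average of squared differences between predicted and reference scores."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred)


def pearson(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation coefficient: covariance divided by the product
    of the two standard deviations (the 1/n factors cancel)."""
    n = len(pred)
    if n != len(gold) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_g = math.sqrt(sum((g - mean_g) ** 2 for g in gold))
    if sd_p == 0.0 or sd_g == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_p * sd_g)


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT data from the paper.
    predicted = [hybrid_score(0.3, 0.1), hybrid_score(0.6, 0.4), hybrid_score(0.95, 0.85)]
    reference = [0.0, 0.5, 1.0]
    print("MSE:", mean_squared_error(predicted, reference))
    print("Pearson r:", pearson(predicted, reference))
```

Lower MSE means the system's scores sit closer to the reference scores in absolute terms, while Pearson correlation measures only how well the two sets of scores move together. This is why similarity evaluations commonly report both.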
