Identifying harmful comments for Tamil language on social media

Date

2022

Abstract

In the era of social media platforms such as YouTube, Facebook, and Twitter, commenting on posts has become an everyday pastime. However, comments are also increasingly used to spread hate speech and organize hate-based activities. Identifying harmful and offensive text on social media platforms has been an active research area over the last few years. In a country like Sri Lanka, which has multiple native languages, people mostly comment on social media in their native language. Tamil is one of the languages commonly spoken in the North and East of Sri Lanka. In recent years, people have commented not only in their native language but also in a mix of more than one language: in Sri Lanka, people use Singlish (Sinhala + English) or Tanglish (Tamil + English). Because of the rapid growth of hateful content on social media, there is an immediate need for an efficient and effective method to identify harmful content. A great deal of research has been done, and is ongoing, on the automated detection of harmful content online. The complexity of natural language constructs makes this task very challenging. Most of this research has been carried out on the English language. This research work aims to classify code-mixed Tamil comments on social media as harmful or non-harmful using machine learning models.

Keywords

HARMFUL CONTENT, TEXT MINING, SOCIAL MEDIA, TAMIL LANGUAGE - Tools, INFORMATION TECHNOLOGY - Dissertation, COMPUTER SCIENCE - Dissertation

Citation

Sivalingam, D. (2022). Identifying harmful comments for Tamil language on social media [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/20325
