Classification of cyberbullying Sinhala language comments on social media

Date

2020-07

Publisher

IEEE

Abstract

Owing to technological advances over the years, bullying that was once confined to physical boundaries has moved online. Denigration, or insult, is one form of cyberbullying. According to the Sri Lanka Computer Emergency Readiness Team, cyberbullying incidents on social media are escalating. Insulting words are dynamic, and the same word can have several meanings depending on the context. A comment cannot be classified as bullying simply because it contains such a word; hence, simple keyword-spotting techniques are inadequate for labeling comments. Work on other languages has addressed this issue using lexical databases such as WordNet, which provides synonyms and homonyms of words. Since no proper lexical database has been developed for the Sinhala language, detecting whether a word is used as bullying is a challenge. Therefore, we used rules to overcome this issue. Twitter comments containing profane words were collected, outliers were removed, and the remaining tweets were pre-processed. To determine insult in the text, five rules were used for feature extraction. Afterward, we applied Support Vector Machine (SVM), K-nearest neighbor (KNN), and Naïve Bayes algorithms. The results show that SVM with an RBF kernel performs best, with an F1-score of 91%. The novelty of this research is its focus on Sinhala-language cyberbullying detection, which has not been addressed before.
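
The contrast between keyword spotting and rule-based features can be illustrated with a minimal Python sketch. The abstract does not list the five rules, so the three rules below, the word lists, and the function names are hypothetical stand-ins, and English tokens are used in place of Sinhala words. The classifier stage (SVM, KNN, or Naïve Bayes) is omitted; the sketch only shows how rule outputs become a feature vector and how an F1-score is computed.

```python
from typing import List

# Hypothetical placeholder lexicons (assumptions, not taken from the paper).
PROFANE = {"dog", "pig"}
SECOND_PERSON = {"you", "your"}
NEGATORS = {"not", "never"}


def keyword_spot(tokens: List[str]) -> bool:
    """Baseline: flag a comment if it contains any word from the lexicon."""
    return any(t in PROFANE for t in tokens)


def rule_features(tokens: List[str]) -> List[int]:
    """Turn three illustrative rules into a binary feature vector.

    Returns [has_profane, targeted, negated].
    """
    profane_idx = [i for i, t in enumerate(tokens) if t in PROFANE]
    person_idx = [i for i, t in enumerate(tokens) if t in SECOND_PERSON]
    # Rule 1 (assumed): the comment contains a lexicon word.
    has_profane = int(bool(profane_idx))
    # Rule 2 (assumed): a lexicon word occurs within 3 tokens of a
    # second-person pronoun, suggesting it is aimed at someone.
    targeted = int(any(abs(i - j) <= 3 for i in profane_idx for j in person_idx))
    # Rule 3 (assumed): the lexicon word is directly preceded by a negator.
    negated = int(any(i > 0 and tokens[i - 1] in NEGATORS for i in profane_idx))
    return [has_profane, targeted, negated]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


comments = ["you are a pig", "my dog is sick", "i saw a pig today"]
labels = [1, 0, 0]  # toy labels: only the first comment is an insult

tokenized = [c.split() for c in comments]
keyword_pred = [int(keyword_spot(t)) for t in tokenized]
# Stand-in for a trained classifier: predict "insult" when rule 2 fires.
rule_pred = [rule_features(t)[1] for t in tokenized]

print("keyword spotting F1:", f1_score(labels, keyword_pred))
print("rule-based F1:", f1_score(labels, rule_pred))
```

On this toy data, keyword spotting flags all three comments because each contains a lexicon word, while the targeting rule separates the insult from the neutral uses. In a full pipeline the feature vectors would be passed to a trained classifier rather than thresholded on a single rule.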

Keywords

cyberbullying, social media, text mining, sentiment analysis, machine learning

Citation

H. M. A. Ishara Amali and S. Jayalal, "Classification of Cyberbullying Sinhala Language Comments on Social Media," 2020 Moratuwa Engineering Research Conference (MERCon), 2020, pp. 266-271, doi: 10.1109/MERCon50084.2020.9185209.
