Analysing information quality of Wikipedia articles

dc.contributor.advisor: Ahangama Supunmali
dc.contributor.advisor: Ahangama Sapumal
dc.contributor.author: Sirisoma WCS
dc.date.accept: 2022
dc.date.accessioned: 2022
dc.date.available: 2022
dc.date.issued: 2022
dc.description.abstract: User Generated Content (UGC) has grown in significance for information sharing since the introduction of Web 2.0. Wikipedia is one of the largest UGC databases in the world and the largest community-based collaborative encyclopedia ever created. However, Wikipedia's open-source and collaborative structure presents a serious information quality (IQ) concern. Malicious users exploit Wikipedia's popularity on the World Wide Web (WWW) to conduct activities such as link spamming. Wikipedia is therefore often discouraged for use in academic activities and research. However, some articles are of high quality and rich in information. Existing methods for determining Wikipedia's IQ have used statistical models and machine learning algorithms, but the outcomes of these models are not satisfactory. Therefore, this study presents a novel theoretical model for evaluating IQ, based on Google's E-A-T framework. The model comprises three IQ constructs: Expertise, Authority and Trustworthiness. A collection of IQ dimensions that affect these three constructs, as well as 45 IQ attributes to assess the IQ dimensions, were identified and presented based on empirical findings and study results. A Selenium 3.14 web automation script was used to automatically and inexpensively extract the IQ attributes from Wikipedia articles' content and metadata statistics. The study employed a sample of 2000 articles from six WikiProjects, comprising 1000 Featured Articles (FA) and 1000 non-FA articles. The proposed model was compared with three previously published models in terms of classification and clustering accuracy. It achieved classification and clustering accuracies of 95% and 93% respectively, a drastic improvement over the existing models. Furthermore, an average inter-rater agreement of 84% was observed. Accordingly, this comprehensive experiment fairly validates the effectiveness of the proposed model. This study contributes to the related knowledge area by introducing a novel framework to assess Wikipedia articles' IQ. (en_US)
dc.identifier.accno: TH5079 (en_US)
dc.identifier.citation: Sirisoma, W.C.S. (2022). Analysing information quality of Wikipedia articles [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22436
dc.identifier.degree: MSc in Information Technology By research (en_US)
dc.identifier.department: Department of Information Technology (en_US)
dc.identifier.faculty: IT (en_US)
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/22436
dc.language.iso: en (en_US)
dc.subject: INFORMATION QUALITY (en_US)
dc.subject: WIKIPEDIA (en_US)
dc.subject: ARTICLES (en_US)
dc.subject: COMPUTER SCIENCE - Dissertation (en_US)
dc.subject: INFORMATION TECHNOLOGY - Dissertation (en_US)
dc.subject: COMPUTER SCIENCE & ENGINEERING - Dissertation (en_US)
dc.title: Analysing information quality of Wikipedia articles (en_US)
dc.type: Thesis-Abstract (en_US)
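
Note on the reported inter-rater agreement: the abstract gives an average inter-rater agreement of 84% but does not say how it was computed. One common reading is average pairwise percent agreement, which the following minimal Python sketch illustrates. The rater names, the labels, and the choice of statistic are assumptions made for illustration only; they are not taken from the thesis.

```python
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of items on which two raters gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def average_pairwise_agreement(ratings):
    """Mean percent agreement over every unordered pair of raters."""
    pairs = list(combinations(ratings.values(), 2))
    if not pairs:
        raise ValueError("at least two raters are required")
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical labels for five articles from three raters (illustrative only).
ratings = {
    "rater_a": ["FA", "FA", "non-FA", "non-FA", "FA"],
    "rater_b": ["FA", "non-FA", "non-FA", "non-FA", "FA"],
    "rater_c": ["FA", "FA", "non-FA", "FA", "FA"],
}

average = average_pairwise_agreement(ratings)
print(f"Average pairwise agreement: {average:.2%}")
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's or Fleiss' kappa would be an alternative if the thesis used one.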

Files

Original bundle

Name:
TH5079-1.pdf
Size:
156.85 KB
Format:
Adobe Portable Document Format
Description:
Pre-Text
Name:
TH5079-2.pdf
Size:
162.11 KB
Format:
Adobe Portable Document Format
Description:
Post-Text
Name:
TH5079.pdf
Size:
1.93 MB
Format:
Adobe Portable Document Format
Description:
Full thesis

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description: