Doctor of Philosophy (Ph.D.)

Permanent URI for this collection: http://192.248.9.226/handle/123/12348


Recent Submissions

  • item: Thesis-Full-text
    A cross platform framework for social media information diffusion analysis
    (2023) Caldera, HMM; Perera, GIUS
    In the current digital era, social media platforms have emerged as one of the most effective channels for the diffusion of information. Because of increasing social media usage, people can readily access and exchange information, news, and opinions from anywhere in the world. Information diffusion across multiplex social media platforms is one of the most prominent research problems today. Social media content generators diffuse information on multiplex social media platforms in pursuit of many objectives, such as popularity, online presence, hate targets, and customer engagement. Regardless of the "content" posted on social media platforms, evaluating the dissemination velocity of each piece of content published on those platforms is essential, because it gives an overall picture of "how it flows" throughout the social media platforms. Most social media platforms have a platform-specific algorithm for calculating the degree of information diffusion on that platform. The main objective of this research was to develop a method to calculate the velocity of information diffusion across multiplex social media platforms. Existing literature on information diffusion strategies, effects, and measurements was used to develop the proposed algorithm. The information diffusion velocity of social media influencers varies according to the content. The platform-specific algorithms for diffusion strength detection vary from platform to platform. These platform-specific algorithms also influence the community to engage with trending content; i.e., the platforms themselves help increase the strength of information diffusion. Conventional information diffusion algorithms were designed to measure content diffusion speed on a simplex (single) social media platform, and might be content-specific. The missing dimension is ubiquity. Hence, it is necessary to calculate a ubiquitous, platform-independent information diffusion velocity over multiplex social media platforms.
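The abstract does not give the velocity formula itself. As a minimal sketch only, assuming that "velocity" means engagement events per hour in fixed time windows, pooled regardless of platform, the Python below shows one platform-independent reading. `EngagementEvent`, `diffusion_velocity`, `platform_share`, and the platform labels are hypothetical names and are not taken from the thesis.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EngagementEvent:
    platform: str     # illustrative label such as "platform_a"; not from the thesis
    timestamp: float  # seconds since the content was first posted (assumed >= 0)


def diffusion_velocity(events: List[EngagementEvent], window_seconds: float = 3600.0) -> List[float]:
    """Engagement events per hour in consecutive time windows, pooled across all platforms."""
    if not events:
        return []
    # Bucket every event into a fixed-width window, ignoring which platform it came from.
    counts = Counter(int(e.timestamp // window_seconds) for e in events)
    scale = 3600.0 / window_seconds  # converts per-window counts into per-hour rates
    return [counts.get(w, 0) * scale for w in range(max(counts) + 1)]


def platform_share(events: List[EngagementEvent]) -> Dict[str, float]:
    """Fraction of all engagement events contributed by each platform."""
    totals = Counter(e.platform for e in events)
    n = sum(totals.values())
    return {p: c / n for p, c in totals.items()}


events = [
    EngagementEvent("platform_a", 10),
    EngagementEvent("platform_b", 20),
    EngagementEvent("platform_a", 3700),
    EngagementEvent("platform_b", 7300),
    EngagementEvent("platform_b", 7400),
]
print(diffusion_velocity(events))  # [2.0, 1.0, 2.0]
print(platform_share(events))      # {'platform_a': 0.4, 'platform_b': 0.6}
```

Pooling before bucketing is the design choice that makes the series platform-independent. `platform_share` keeps the per-platform breakdown available so it can be inspected separately.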
Both structured information diffusion, represented as a graph for diffusion in a closed network, and unstructured patterns, captured in an open-ended coarse-grained information diffusion model, are used to examine the importance of information diffusion on multiplex social media platforms. Time is another critical factor in defining velocity; a time series of information diffusion provides a rich picture of how information spreads. Event-driven architecture is a well-known software architectural approach that facilitates the implementation of microservice-based solutions. The suggested algorithm uses an event-driven architecture to manage the information flow by processing social media events. This research therefore uses the event-triggering process to understand how information propagates through an event-driven microservice architecture. Data science and artificial intelligence are being employed in information diffusion studies. Understanding how information spreads, and the variables and features that influence it, is another crucial area of this research. There are several techniques for studying information dissemination using artificial intelligence. Applying artificial intelligence to information diffusion studies might improve our knowledge of "how information travels" and "how to disseminate information efficiently" in various circumstances. The research used natural language processing to evaluate the textual content of social media posts, in order to find the general textual meaning conveyed by end-user reactions. Event-driven architecture is one of the best possible approaches for information diffusion analytics. Using an event-driven architecture, data can be delivered in real time to various analytics services, allowing large amounts of data to be processed quickly and effectively. This is especially valuable in today's data-driven world, where businesses and organizations must make quick, well-informed decisions based on real-time data.
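To illustrate the event-driven idea, here is a minimal in-process publish/subscribe sketch in Python. It assumes a single process. A real event-driven microservice deployment would place a message broker between independently deployed services. The topic names (`post.shared`, `post.liked`), `EventBus`, and `ShareCounter` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]
Handler = Callable[[Event], None]


class EventBus:
    """In-process stand-in for a message broker: each topic maps to subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Deliver the event to every subscribed service; topics with no subscribers are dropped.
        for handler in self._subscribers.get(topic, []):
            handler(event)


class ShareCounter:
    """Hypothetical analytics microservice: counts shares per post, whatever the platform."""

    def __init__(self) -> None:
        self.shares: DefaultDict[str, int] = defaultdict(int)

    def __call__(self, event: Event) -> None:
        self.shares[event["post_id"]] += 1


bus = EventBus()
counter = ShareCounter()
audit_log: List[Event] = []
bus.subscribe("post.shared", counter)           # analytics service
bus.subscribe("post.shared", audit_log.append)  # a storage or NLP service would subscribe the same way
bus.publish("post.shared", {"post_id": "p1", "platform": "platform_a"})
bus.publish("post.shared", {"post_id": "p1", "platform": "platform_b"})
bus.publish("post.shared", {"post_id": "p2", "platform": "platform_a"})
bus.publish("post.liked", {"post_id": "p1", "platform": "platform_a"})  # no subscribers: ignored
print(dict(counter.shares))  # {'p1': 2, 'p2': 1}
```

The publisher never references its consumers. That decoupling is what lets new analytics services be added without changing the code that ingests social media events.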
Because of the event-driven nature of this architecture, it is also simple to interface with other systems and services, making it a highly adaptable and versatile option for information distribution analytics. Since the diffusion of information starts with the occurrence of an event and follows numerous steps as it flows through the community, an event-driven microservices architecture that uses artificial intelligence methods (such as natural language processing to evaluate textual information) was used experimentally to propose a simple solution to this complex problem. The key findings of the research can be summarized as follows. I have proposed a diffusion tree that can explain how information flows through multiplex social networks. In this multiplex context, I experimented with multiple trees and a more robust graph focused on the diffusion of information. The diffusion strength was based on the SIR model, and the time series analysis focused on how quickly information spread throughout the network. The proposed solution was tested on several real-world cases. Technique-specific tests such as seasonality and autocorrelation were conducted to evaluate how the time-series model works in a graph context. Cohesiveness and robustness were also tested, and the proposed algorithm achieved good robustness (an average of 75%) and cohesiveness (an average of 70%) in each case. The best experimental results show an average accuracy of more than 80% in any given instance, and the tree is constructed in less than a second. Most of the predicted values had an average accuracy of around 70%. In summary, social media platforms have emerged as prominent channels for information propagation in the contemporary digital landscape. Quantifying the velocity at which information propagates across diverse social networks presents a notable research challenge.
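The abstract bases diffusion strength on the SIR model but does not give its parameters. The minimal discrete-time SIR sketch below assumes a fixed population `n`, an illustrative spreading rate `beta` and an illustrative loss-of-interest rate `gamma`, none of them taken from the thesis. It shows how such a model yields a time series of active spreaders and a final reach. The ratio `beta / gamma` is the usual basic reproduction number R0.

```python
from typing import List, Tuple

State = Tuple[float, float, float]  # (susceptible, infected, recovered)


def simulate_sir(n: int, initial_infected: int, beta: float, gamma: float, steps: int) -> List[State]:
    """Discrete-time SIR model.

    S: users who have not yet engaged with the content,
    I: users actively spreading it,
    R: users who have stopped spreading it.
    """
    s, i, r = float(n - initial_infected), float(initial_infected), 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n  # contacts between spreaders and unexposed users
        new_recoveries = gamma * i         # spreaders losing interest
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(n=1000, initial_infected=1, beta=0.5, gamma=0.1, steps=200)
peak_step = max(range(len(history)), key=lambda t: history[t][1])
final_reach = 1000 - history[-1][0]  # everyone who ever left the S compartment
print(f"R0 = {0.5 / 0.1:.1f}, peak at step {peak_step}, final reach ≈ {final_reach:.0f} users")
```

The infected column of `history` is itself a time series. Tests such as autocorrelation or seasonality, of the kind the abstract mentions, could be applied to it.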
While platform-specific algorithms influence community engagement, a "universal metric for information dissemination strength" is needed across multiple social media platforms. The envisioned algorithm considers time series data and integrates structured and unstructured patterns during construction. Keywords: Information diffusion analysis, Social Media Data Analytics, Graph Learning, Time series analysis, Event-driven micro-services, Artificial Intelligence, Natural Language Processing.
  • item: Thesis-Full-text
    Measuring trustworthiness of workers in the crowdsourced collection of subjective judgements
    (2023) Meedin, GSN; Perera, GIUS
    Social media platforms have become integral parts of our lives, enabling people to connect, share, and express themselves on a global scale. Alongside the benefits, there are also substantial challenges that arise from the unfiltered and unrestricted nature of these platforms. One such challenge is the presence of inappropriate and hateful content on social media. While platforms employ algorithms and human moderators to identify and remove inappropriate content, they often struggle to keep up with the constant flood of new posts. Social media posts are written in a variety of languages and multimedia formats. As a result, platforms find it harder to filter such posts before they reach a diverse audience, because moderating them requires greater contextual, social, and cultural insight, as well as language skills. Social media platforms use a variety of techniques to capture these insights and the linguistic expertise needed to moderate posts effectively. These techniques help platforms better understand content and ensure that inappropriate or harmful posts are accurately identified and addressed. They include Natural Language Processing (NLP) algorithms, keyword and phrase detection, image and video recognition, contextual analysis, cultural sensitivity training, machine learning, and AI improvement. Data annotation forms the foundation for training these algorithms and for identifying and classifying various types of content accurately. Crowdsourcing platforms such as Mechanical Turk and CrowdFlower are often used to annotate the datasets for these techniques. The accuracy of the annotation process is crucial for effective content moderation on social media platforms. Crowdsourcing platforms take several trust measures to maintain the quality of annotations and to minimize errors.
In addition to these procedures, determining the trustworthiness of workers on crowdsourcing platforms is critical for ensuring the quality and reliability of their contributions. Accuracy metrics, majority voting, completion rate, inter-rater agreement, and reputation scores are some of the measurements used by existing researchers. Even though majority voting is used to ensure consensus, existing research shows that the annotated results do not reflect actual user perception, and hence the annotations are less trustworthy. In this research, a crowdsourcing platform was designed and developed to support the annotation process while overcoming the limitations of existing trustworthiness measures, facilitating the identification of inappropriate social media content using crowd responses. The research focus was limited to social media content written in Sinhala and to Sinhala words written in English letters (Singlish), as the most popular platforms, Mechanical Turk and CrowdFlower, do not allow workers from Sri Lanka. As outcomes of this research, a few novel approaches were proposed, implemented, and evaluated for hate speech annotation, hate speech corpus generation, measuring user experience, identifying worker types and personality traits, and identifying hate speech posts. In addition, the implemented crowdsourcing platform can extend its task designs to other annotation tasks: language and inappropriate content identification, text identification from images, hate speech propagator ranking, and sentiment analysis. When evaluating the quality of the results for accuracy and performance, it was found that the consensus-based approach to ensuring the trustworthiness of crowdsourcing participants is highly affected by the crowd's biases and the Hawthorne effect.
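A small sketch can illustrate how a consensus label can diverge from a reference judgement. It compares majority-vote labels with gold-standard labels. The data, the labels (`hate`, `not_hate`), and the function names are hypothetical. The sketch only shows how per-worker accuracy on gold items differs from simple agreement with the crowd.

```python
from collections import Counter
from typing import Dict, List, Tuple

Votes = Dict[str, List[Tuple[str, str]]]  # item id -> [(worker id, label), ...]


def majority_vote(labels: List[str]) -> str:
    """Most frequent label; ties go to the label seen first (Counter.most_common keeps insertion order)."""
    return Counter(labels).most_common(1)[0][0]


def aggregate(annotations: Votes) -> Dict[str, str]:
    return {item: majority_vote([label for _, label in votes]) for item, votes in annotations.items()}


def gold_accuracy(annotations: Votes, gold: Dict[str, str]) -> Dict[str, float]:
    """Per-worker accuracy on the subset of items that carry a gold-standard label."""
    correct: Counter = Counter()
    seen: Counter = Counter()
    for item, votes in annotations.items():
        if item not in gold:
            continue
        for worker, label in votes:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {worker: correct[worker] / seen[worker] for worker in seen}


annotations = {
    "post1": [("w1", "hate"), ("w2", "hate"), ("w3", "not_hate")],
    "post2": [("w1", "not_hate"), ("w2", "hate"), ("w3", "not_hate")],
}
gold = {"post1": "not_hate", "post2": "not_hate"}  # expert labels
print(aggregate(annotations))            # {'post1': 'hate', 'post2': 'not_hate'}
print(gold_accuracy(annotations, gold))  # {'w1': 0.5, 'w2': 0.0, 'w3': 1.0}
```

Here the crowd outvotes the expert on `post1`. The only worker who agreed with the gold label, `w3`, looks like an outlier under consensus but scores best against gold items.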
Therefore, the annotation quality of the crowdsourcing platform was compared and analysed under consensus-, reputation-, and gold standard-based approaches, and a model to measure the trustworthiness of crowd responses was developed. The major outcome of this research is a crowdsourcing platform that can be used for local annotation processes with assurance of worker reliability. The following were identified as the quantitative measurements for assessing the trustworthiness of workers: the number of tasks completed by each worker within a given period, the number of tasks attempted by each worker within a given period, the percentage of tasks completed compared to tasks attempted, the time taken to complete tasks, the accuracy of responses against golden rules, the time taken to submit responses after each task assignment, and the consistency of response times. After this identification, the relationship between reputation score, performance score, and bias score was formulated by analysing the worker responses. The worker behaviour model and the trust measurement model showed accuracies of 87% and 91%, respectively, when compared with the expert response scores; these can be further improved by incorporating contextual analysis and worker belief and opinion analysis. The proposed methodology would accelerate data collection, enhance data quality, and promote the development of high-quality labelled datasets. Keywords: Annotation, Collaboration, Crowdsourcing, Human-Computer Interaction, Trustworthiness
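The abstract lists the quantitative measurements but not the formula that links them. The minimal sketch below assumes a simple weighted sum of three of those measurements: completion rate, accuracy on gold-standard items, and response-time consistency. The weights are made up. `WorkerStats`, `trust_score`, and `WEIGHTS` are hypothetical, and the sketch is not the thesis's reputation, performance, or bias model.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class WorkerStats:
    attempted: int               # tasks attempted in the period
    completed: int               # tasks completed in the period
    gold_correct: int            # answers matching gold-standard ("golden") items
    gold_total: int              # gold-standard items answered
    response_times: List[float]  # seconds from task assignment to submission


def completion_rate(w: WorkerStats) -> float:
    return w.completed / w.attempted if w.attempted else 0.0


def gold_rate(w: WorkerStats) -> float:
    return w.gold_correct / w.gold_total if w.gold_total else 0.0


def time_consistency(w: WorkerStats) -> float:
    """1 minus the coefficient of variation of response times, clipped to [0, 1]."""
    if len(w.response_times) < 2 or mean(w.response_times) <= 0:
        return 0.0  # not enough evidence to judge consistency
    cv = pstdev(w.response_times) / mean(w.response_times)
    return max(0.0, min(1.0, 1.0 - cv))


# Hypothetical weights; the thesis formulates its own relationship between scores.
WEIGHTS: Dict[str, float] = {"completion": 0.3, "accuracy": 0.5, "consistency": 0.2}


def trust_score(w: WorkerStats, weights: Dict[str, float] = WEIGHTS) -> float:
    parts = {
        "completion": completion_rate(w),
        "accuracy": gold_rate(w),
        "consistency": time_consistency(w),
    }
    return sum(weights[name] * value for name, value in parts.items())


steady = WorkerStats(attempted=10, completed=8, gold_correct=9, gold_total=10, response_times=[30, 30, 30, 30])
erratic = WorkerStats(attempted=10, completed=8, gold_correct=9, gold_total=10, response_times=[5, 60, 5, 60])
print(round(trust_score(steady), 2), round(trust_score(erratic), 2))  # 0.89 0.72
```

The two workers differ only in how consistent their submission times are. A score of this shape therefore penalises erratic timing even when accuracy is identical.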
  • item: Thesis-Abstract
    A Deep syntactic parser for the Tamil language
    (2022) Sarveswaran K; Dias G; Butt M
    Natural Language Processing (NLP) applications have become integral to human life. A syntactic parser is a vital linguistic tool that shows the syntactic relations between the words in a sentence. These may then be mapped to a tree, a graph, or a formal structure. Syntactic parsers are helpful for building other NLP applications. In addition, they help linguists understand a language better and perform cross-lingual linguistic analysis. A syntactic parser that performs a deeper analysis and captures argumentative, attributive and coordinative relations between the words of a given sentence is called a deep syntactic parser. Tamil is considered a low-resourced language in terms of the tools, applications, and resources available for others to use to build NLP applications or carry out linguistic analyses. Few resources, such as treebanks and annotated corpora, or linguistic analysis tools, such as POS taggers or morphological analysers, are publicly available for Tamil. Available off-the-shelf language-agnostic syntactic parsers show comparatively low performance because of the rich morphosyntactic properties of Tamil. This study describes how I developed the first grammar-driven parser for Tamil, which uses the Lexical-Functional Grammar formalism, and a state-of-the-art data-driven parser using the Universal Dependencies framework. I have also proposed an approach to evaluate a syntactic parser's syntactic coverage, experimented with transition-based and graph-based approaches, and, for the first time, tried multilingual training to develop a data-driven parser for Tamil. A part-of-speech tagger, a morphological analyser cum generator, pre-processing tools, and treebanks are the other tools and resources I have developed to facilitate the development of the parsers. All these tools give the current best scores for their respective tasks, and these resources are also available online for others to build upon.
Moreover, the study also documents my contributions toward understanding different linguistic aspects of the Tamil language.
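Data-driven dependency parsers in the Universal Dependencies framework are commonly evaluated with unlabelled and labelled attachment scores (UAS and LAS). The abstract does not say which metrics were reported. The minimal sketch below computes both for one sentence, using a made-up four-token subject-object-verb sentence rather than real Tamil data.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head token index, dependency relation); head 0 marks the root, as in CoNLL-U


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence.

    UAS: share of tokens whose predicted head is correct.
    LAS: share of tokens whose predicted head and relation label are both correct.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and aligned token by token")
    uas_hits = sum(g_head == p_head for (g_head, _), (p_head, _) in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return uas_hits / len(gold), las_hits / len(gold)


# Hypothetical sentence: subject, object, verb (root), final punctuation.
gold = [(3, "nsubj"), (3, "obj"), (0, "root"), (3, "punct")]
predicted = [(3, "nsubj"), (3, "obl"), (0, "root"), (1, "punct")]
print(attachment_scores(gold, predicted))  # (0.75, 0.5)
```

This simple version compares full relation labels and counts punctuation. Shared-task evaluation scripts may apply different conventions, such as comparing only the universal part of a relation label.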
  • item: Thesis-Abstract
    RABAN - a software implementation process for robotic process automation (RPA) projects
    (2022) Padmini KVJ; Perera GIUS; Bandara HMND
    Robotic Process Automation (RPA), the next level of business process automation, provides adaptive and transformative solutions to replace time-consuming, non-value-adding, and repetitive human tasks in a Business Process (BP). RPA-based BP transformation projects differ from typical software development projects because RPA bots are developed on stable code. It is counterproductive to use existing software processes in RPA projects. A process template (i.e., a software implementation process and metrics to track the project) is yet to be derived for RPA projects. The estimated initial RPA project failure rates are 30-50%, and the lack of a fitting implementation process is considered one of the key contributors to failure. We addressed this gap and derived a novel process for RPA projects, named Raban, together with metrics to track RPA projects. Scrum was used as the basis for formulating Raban. Focus group discussions conducted with Scrum teams identified 80 challenges. These were analyzed using Straussian grounded theory and grouped into six categories (i.e., lack of agile mindset, inconsistency in story estimation, client management issues, lack of adherence to agile practices, scope change in requirement freeze, and lack of quantitative measurement). Fifteen high-priority ("burning") challenges were classified based on significance, and a taxonomy was developed. We also derived steps to estimate RPA use cases and a framework to achieve customer satisfaction by adopting design thinking practices in agile projects. Moreover, 17 software metrics and three artifacts were derived and validated in five Scrum projects. Raban was derived from the solutions identified and further fine-tuned based on feedback from follow-up interviews with stakeholders and two workshops conducted with other RPA project teams. After that, 14 metrics and two artifacts were derived for Raban and validated in an RPA project.
Moreover, to select the right candidate BP for RPA transformation, a predictive machine learning model was developed that makes a yes/no decision on RPA suitability. We used 16 factors and a two-class decision forest classification model to develop it.
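The abstract names a two-class decision forest over 16 factors but does not list the factors or the model settings. The sketch below is a simplified stand-in that assumes numeric factor values and a 0/1 label for RPA suitability. It bags decision stumps (one-split trees) trained on bootstrap samples and predicts by majority vote. It omits the deeper trees and per-split feature sampling of a full decision forest, and the three-factor data set is synthetic.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, int, int]  # (feature index, threshold, label if <= threshold, label if > threshold)


def fit_stump(X: List[Row], y: List[int]) -> Stump:
    """Single-split decision tree that minimises misclassifications on (X, y)."""
    majority = Counter(y).most_common(1)[0][0]
    best: Stump = (0, float("inf"), majority, majority)  # fallback: constant prediction
    best_err = sum(label != majority for label in y)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            err = sum(v != left_label for v in left) + sum(v != right_label for v in right)
            if err < best_err:
                best, best_err = (f, t, left_label, right_label), err
    return best


def predict_stump(stump: Stump, row: Row) -> int:
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(X: List[Row], y: List[int], n_trees: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample (with replacement)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest: List[Stump], row: Row) -> int:
    """Majority vote over the ensemble: 1 = suitable for RPA, 0 = not suitable (hypothetical encoding)."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


# Synthetic data: factor 0 separates the classes; factors 1 and 2 are noise.
X = [[i, i % 5, i % 4] for i in range(20)] + [[100 + i, i % 5, i % 4] for i in range(20)]
y = [0] * 20 + [1] * 20
forest = fit_forest(X, y)
print(predict_forest(forest, [5, 0, 0]), predict_forest(forest, [150, 0, 0]))  # 0 1
```

Averaging many models fitted to resampled data is what makes a forest less sensitive to individual noisy process records than a single tree.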
  • item: Thesis-Abstract
    Improving the effectiveness of MOOCs to meet the 21st century challenges
    (2021) Gamage SD; Fernando MSD; Perera GIUS
    Massive Open Online Courses (MOOCs) are a type of online course designed using principles of education technology. They enable a massive number of participants to learn any course online at any time. This affordance of scale and open access to education is considered a globalized solution for acquiring 21st century skills. However, in practice, MOOCs fall short of this vision and face challenges. Mainly, the content-driven pedagogical structure with limited system design implications has caused fewer interactions and learner isolation, resulting in higher dropout rates. Since MOOCs were introduced only recently, the problems faced by participants and the effectiveness of MOOCs are not well understood. Thus, a systematic understanding of the arising problems and of solutions to this newly emerged phenomenon is needed. In this thesis, I explored MOOCs with a holistic view, seeking to understand emerging problems with empirical evidence: whether MOOCs meet 21st century skill requirements, what factors affect the effectiveness of a MOOC, and how we can improve the effectiveness of MOOCs. By exploring these questions, this thesis mainly contributes to 1) providing empirical evidence of the challenges that MOOCs face, 2) proposing a framework to identify the effectiveness of MOOCs, 3) designing a novel peer review mechanism, and 4) developing the novel system PeerCollab to improve the effectiveness of MOOCs. The research began with exploratory research methods, with active data collection from MOOC users. The analysis was conducted using a combined approach of qualitative and quantitative methods to understand the challenges and explore the factors affecting the effectiveness of MOOCs. Initially, surveys were used to identify whether MOOC platforms provide necessary 21st century skills such as collaborative skills, creativity skills, communication skills, and critical thinking skills.
Next, a longitudinal qualitative study was used to gather participants' MOOC experiences over a 24-month period. Results of the qualitative study were incorporated to build an instrument to evaluate MOOCs' effectiveness. The instrument was empirically verified and validated using 121 MOOC participants. The initial survey exploring 21st century skills yielded results from 391 MOOC participants across six platforms. Descriptive statistics showed that the majority of participants perceived a gap in MOOCs' provision of 21st century skills. Next, qualitative analysis using Grounded Theory (GT) and quantitative analysis using Factor Analysis (FA) resulted in a detailed 10-dimensional framework to evaluate MOOC effectiveness. Based on the highly ranked dimensions in the framework, such as Technology, Collaborativeness, Interactivity, and Assessment, two systems were designed and developed to demonstrate improved effectiveness in MOOCs. First, the “Identified Peer Review” (IPR) system demonstrated how peer identity, an incentive algorithm, and effective communication in peer review enhance a MOOC's effectiveness. Next, the PeerCollab system demonstrated how social presence can be integrated into MOOCs using theories of communities of practice (CoP), thereby improving effectiveness. This system also demonstrated an articulation of CoP in MOOCs through a novel process named Rapid Communities on MOOCs (RCoM), designed with four phases: Cluster, Orient, Focus, and Network. Evaluations of the systems demonstrated the challenges and possibilities of integrating such systems into MOOCs and provided a direction for building effective interventions. These systems collectively empower interactions among isolated, distributed individuals and help them form communities that work collectively, bridging the gap to meet 21st century skills.
The work of this thesis contributes to the understanding of technologies that can be used in society, specifically in large-scale open and distributed learning contexts.
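The abstract says the effectiveness instrument was empirically verified and validated but does not name the statistics used. One common internal-consistency check for such questionnaires is Cronbach's alpha. The sketch below uses made-up Likert responses and is an assumption about the kind of check involved, not a report of the thesis's analysis.

```python
from statistics import pvariance
from typing import List, Sequence


def cronbach_alpha(responses: List[Sequence[float]]) -> float:
    """Internal-consistency reliability of a questionnaire.

    responses: one row per participant, one column per item (e.g. Likert scores).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variance = sum(pvariance(column) for column in zip(*responses))
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance / total_variance)


# Hypothetical answers from four participants to three items of one dimension.
responses = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [3, 3, 2]]
print(round(cronbach_alpha(responses), 2))  # 0.89
```

Population variances are used for both the items and the totals. The ratio, and therefore alpha, is the same as with sample variances as long as the choice is consistent.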
  • item: Thesis-Full-text
    Routing and control mechanisms for dense mobile adhoc networks
    (2016-09) Sooriyaarachchi, SJ; Gamage, CD
    It is not an exaggeration to say that mobile devices have become ubiquitous, and they are used for a variety of purposes ranging from personal communication to disaster management and more. These devices are capable of establishing mobile ad hoc networks (MANETs) for multihop communication without the support of infrastructure. This enables more interesting and useful applications of mobile devices, for example for collaborative learners in large classrooms, shoppers in crowded shopping malls, spectators in sports stadiums, online gamers, and more. MANETs have not yet developed sufficiently to reach a deployable level. Routing in MANETs is a major problem. It is challenging to devise routing protocols for MANETs due to the dynamic topology resulting from mobility, limited battery life, and impairments inherent in wireless links. The traditional routing approach is to tweak existing routing protocols designed for wired networks. Therefore, it is common to appoint special nodes to perform routing control and gather global state information such as routing tables. We identify this approach as the fixed-stateful routing paradigm. Fixed-stateful routing does not scale with the density of MANETs, because routes quickly become obsolete due to the dynamic topology, causing frequent routing updates. The overhead of these frequent updates becomes unacceptable when MANETs become dense. For example, the control overhead of routing updates in most traditional routing protocols is of order O(N) or O(N²), where N is the number of nodes in the network. We name the routing approach that neither maintains global network state nor appoints key nodes for routing and control the mobile-stateless routing paradigm. We propose a novel concept called endcast that leverages message flooding for end-to-end communication in MANETs in a mobile-stateless manner.
However, flooding causes large numbers of redundant messages, contention, and collisions, resulting in a situation known as the broadcast storm problem. When flooding is used for end-to-end communication, messages flood beyond the destination. We call this situation the broadcast flood problem. Repetitive rebroadcasting in simple flooding is analogous to biological cell division in the growth of human organs. The chalone mechanism is a regulatory system that controls the growth of organs. In this mechanism, each biological cell secretes a molecule called chalone, and the concentration of chalones in the environment increases as the number of cells increases. When the chalone concentration exceeds a threshold, the cells stop dividing. Counter-based flooding is one of the efficient flooding schemes: a node decides not to rebroadcast a received message if it subsequently hears the message more times than a predefined threshold during a random wait period. Inspired by the chalone mechanism in organ growth, we selected counter-based flooding to unicast messages in a MANET. To mitigate the broadcast flood problem, we proposed an inhibition scheme that stops messages from propagating beyond the destination. In this scheme, the destination transmits a smaller control message, which we call an inhibitor, that also propagates using counter-based flooding but with a shorter random wait period than the data message. Furthermore, inhibitors are limited to the region of the MANET covered by data flooding. The proposed endcast scheme outperforms simple flooding, saving over 45% of redundant messages in all network configurations starting from a 100-node network in ideal wireless conditions, when the nodes were placed on a 600 m × 400 m playground and each node was configured with a 200 m transmission radius.
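The counter-based flooding rule can be sketched as a small discrete-event simulation in Python. This is an illustrative sketch under simplifying assumptions: lossless links with a fixed per-hop delay, no MAC contention or collisions, static nodes, and no inhibitor messages. The parameter values (`threshold`, `max_wait`, `hop_delay`) are arbitrary. Each node starts a counter when it first hears a message and increments it for every duplicate overheard during its random wait. It rebroadcasts only if the counter is still below the threshold.

```python
import heapq
import random
from typing import Dict, List, Set, Tuple


def counter_based_flood(
    adjacency: Dict[int, List[int]],
    source: int,
    threshold: int = 2,
    max_wait: float = 1.0,
    hop_delay: float = 0.01,
    seed: int = 0,
) -> Tuple[Set[int], int]:
    """Simulate one counter-based flood; return (nodes reached, number of transmissions)."""
    rng = random.Random(seed)
    counters: Dict[int, int] = {}
    decided: Set[int] = {source}  # the source transmits unconditionally
    events = [(0.0, 0, "tx", source)]  # (time, sequence number for stable ordering, kind, node)
    seq = 1
    transmissions = 0
    while events:
        now, _, kind, node = heapq.heappop(events)
        if kind == "tx":
            transmissions += 1
            for neighbour in adjacency[node]:
                heapq.heappush(events, (now + hop_delay, seq, "rx", neighbour))
                seq += 1
        elif kind == "rx":
            if node in decided:
                continue  # copies arriving after the decision are simply discarded
            if node not in counters:
                counters[node] = 1  # first copy: start the random assessment delay
                heapq.heappush(events, (now + rng.uniform(0, max_wait), seq, "decide", node))
                seq += 1
            else:
                counters[node] += 1  # duplicate overheard while still waiting
        else:  # "decide": rebroadcast only if few enough copies were heard during the wait
            decided.add(node)
            if counters[node] < threshold:
                heapq.heappush(events, (now, seq, "tx", node))
                seq += 1
    return decided, transmissions  # every receiving node eventually reaches a decision


line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clique = {i: [j for j in range(5) if j != i] for i in range(5)}
print(counter_based_flood(line, 0))                     # ({0, 1, 2, 3}, 4)
print(counter_based_flood(clique, 0, threshold=10)[1])  # 5: behaves like simple flooding
print(counter_based_flood(clique, 0)[1])                # between 2 and 5, depending on the random waits
```

On the clique, a high threshold makes every node rebroadcast, which reproduces simple flooding. With a threshold of 2, nodes that overhear an earlier rebroadcast during their wait stay silent, and this suppression is where the savings in redundant messages come from.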
In realistic wireless conditions, simulated with an IEEE 802.11g wireless MAC implementation and a power-saving transmission radius of 40 m, the protocol likewise saves over 45% of redundant messages for all node densities ranging from 10 to 300. This saving increases rapidly as networks grow in size, in both ideal and realistic wireless network conditions. The inhibition scheme of the protocol was also found to be effective. For example, in the simulated network scenario, redundant messages grow at a rate of about 8 frames per 25 nodes added to the network when inhibition is in operation, whereas the growth rate is about 170 frames per 25 nodes when the protocol operates without inhibition. The major contribution of this research is the analytical model that we developed to design and evaluate endcast schemes. We developed a graph-theoretic model to evaluate the propagation of messages in endcast, based on a preliminary model developed by Viswanath and Obraczka [2]. We modified the model by (i) improving its method of estimating the number of new nodes reached by each level of rebroadcast, (ii) modeling the impact of node mobility, (iii) incorporating a time-domain representation to model flooding schemes that involve random assessment delays, and (iv) enabling it to represent efficient flooding schemes such as counter-based flooding. We present the process of estimating the area covered by the propagation of flooding messages using a geometric method. The time domain is represented by indexing the edges of the flooding graph by time. The counter value and the threshold in counter-based flooding are converted into a rebroadcasting probability, which is estimated using a probability mass function that we constructed by considering the overlap of the nodes' radio range circles.
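The abstract's probability mass function is built from overlapping radio-range circles. The underlying geometry, which is standard in broadcast-storm analysis, can be sketched as follows; this is not the thesis's PMF. When a node at distance d from the sender rebroadcasts, it newly covers only the part of its own range that lies outside the sender's range. For equal radii r, this extra area peaks at about 61% of πr² (when d = r). It averages 3√3/(4π) ≈ 41% of πr² when the rebroadcasting node is placed uniformly within the sender's range.

```python
import math


def circle_overlap_area(r: float, d: float) -> float:
    """Intersection area of two radio ranges of equal radius r whose centres are d apart."""
    if d >= 2 * r:
        return 0.0
    if d <= 0:
        return math.pi * r * r
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)


def additional_coverage(r: float, d: float) -> float:
    """Area newly covered when a receiver at distance d from the sender rebroadcasts."""
    return math.pi * r * r - circle_overlap_area(r, d)


def expected_additional_coverage(r: float, steps: int = 10_000) -> float:
    """Mean extra area when the rebroadcasting node is uniformly placed within the sender's range."""
    # The distance x of a uniformly placed point in a disc of radius r has density 2x / r^2.
    h = r / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h  # midpoint rule
        total += (2 * x / r ** 2) * additional_coverage(r, x) * h
    return total


r = 200.0  # metres, the transmission radius quoted in the abstract's ideal-conditions setup
full_disc = math.pi * r * r
print(f"max extra coverage: {additional_coverage(r, r) / full_disc:.0%}")          # 61%
print(f"mean extra coverage: {expected_additional_coverage(r) / full_disc:.0%}")  # 41%
```

The diminishing extra coverage shows why rebroadcasting after several duplicates adds little. A rebroadcast probability derived from a counter threshold exploits exactly this effect.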