Browsing by Author "Ranathunga, S"
Now showing 1 - 20 of 48
- item: Article-Full-text
  Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification (Springer, 2022)
  Rathnayake, H; Sumanapala, J; Rukshani, R; Ranathunga, S
  Code-mixing and code-switching are frequent features in online conversations. Classifying such text is challenging if one of the languages is low-resourced. Fine-tuning pre-trained multilingual language models (PMLMs) is a promising avenue for code-mixed text classification. In this paper, we explore adapter-based fine-tuning of PMLMs for code-mixed and code-switched (CMCS) text classification. We introduce four techniques that are novel for single-task CMCS text classification: sequential stacking of adapters, parallel stacking of adapters, continuous fine-tuning of adapters, and training adapters without freezing the original model. We also present a newly annotated dataset for classifying Sinhala–English code-mixed and code-switched text, where Sinhala is a low-resourced language. Our dataset of 10,000 user comments has been manually annotated for five classification tasks: sentiment analysis, humor detection, hate speech detection, language identification, and aspect identification. This makes it the first publicly available Sinhala–English CMCS dataset with the largest number of task annotation types. In addition to this dataset, we tested our proposed techniques on Kannada–English and Hindi–English datasets. These experiments confirm that our adapter-based PMLM fine-tuning techniques outperform or are on par with basic fine-tuning of PMLMs.
- item: Conference-Full-text
  Air-conditioners condensate recovery system for buildings (2013-11-09)
  Siriwardhena, K; Ranathunga, S
  Most conventional cooling systems produce water as a byproduct, and this water can be recovered and put to good use. To produce cool air from a compressed refrigerant, a set of coils lets the hot, high-pressure refrigerant dissipate its heat and condense into a liquid. An expansion valve is then typically used to evaporate and cool the refrigerant. This cool gas then runs through a second set of coils, where it absorbs heat from the air that is blown over the coils and into the building. Because these coils are cold, the warm air blowing past them can reach its dew point. When it does, moisture in the air condenses onto the coils, producing what is essentially distilled water. This byproduct of air-conditioning units is called condensate drain water. The condensate produced by air conditioners is typically of very high quality, with low amounts of suspended solids, a neutral to slightly acidic pH, and a low temperature. These characteristics make the condensate suitable for several non-potable uses, such as irrigation, cooling tower make-up and toilet flushing. In addition to water quality, high recovery capacity is a major benefit of these systems. However, the amount of condensate produced can vary greatly. It depends on the size and operational load of the air-conditioning system and on the ambient temperature and humidity of the region. A rule of thumb created by Karen Guz (Director of the Conservation Department for the San Antonio Water System, USA) is that a system produces 0.1 to 0.3 gallons of condensate per ton of air-conditioning capacity for every hour it operates. Replacing or supplementing potable water with the recovered condensate can therefore considerably reduce a building’s demand for potable water.
  With a condensate recovery system, free, clean and otherwise unused water can replace costly, treated, high-demand potable water. Decreasing the use of potable water within buildings plays a major role in conserving municipal sources. Moreover, the following potential LEED credits can be achieved under Water Efficient Landscaping: (a) WE Credit 1.1 and (b) WE Credit 1.2.
- item: Conference-Full-text
  Ananya - a named-entity-recognition (NER) system for Sinhala language (IEEE, 2016-04)
  Manamini, SAPM; Ahamed, AF; Rajapakshe, RAEC; Reemal, GHA; Jayasena, S; Dias, GV; Ranathunga, S; Jayasekara, AGBP; Bandara, HMND; Amarasinghe, YWR
  Named-Entity Recognition (NER) is one of the major tasks in Natural Language Processing and is widely used in Computer Science and Computational Linguistics. However, very little prior research has been done on NER for Sinhala. In this paper, we present data-driven techniques to detect named entities in Sinhala text using two statistical modeling methods: Conditional Random Fields (CRF) and Maximum Entropy (ME). CRF has provided the highest accuracy for this task in other languages, and our experimental results indicate that it outperforms ME in Sinhala NER as well. Furthermore, we identify linguistic features that are effective with both the CRF and ME algorithms, such as orthographic, word-level and contextual information.
- item: Conference-Abstract
  Anomaly detection in complex trading systems (2017)
  Ranaweera, L; Vithanage, R; Dissanayake, A; Prabodha, C; Ranathunga, S
  Keywords: classification; feature selection; trading systems
  System availability is one of the major requirements for systems in the trading domain. System outages degrade availability. To prevent them, anomaly detection must be able to assess the status of the system in real time and detect anomalies that can lead to failures. This paper presents a framework for anomaly detection in complex trading systems based on supervised learning approaches. We experimented with multiple feature reduction techniques to eliminate the noisy features initially derived from the system parameters. The most promising results came from a classification technique based on a Radial Basis Function (RBF) kernel Support Vector Machine (SVM), combined with a feature selection technique built on a tree-based ensemble.
- item: Conference-Full-text
  Aspect detection in sportswear apparel reviews for opinion mining (IEEE, 2022-07)
  Rajapaksha, S; Ranathunga, S; Rathnayake, M; Adhikariwatte, V; Hemachandra, K
  Manufacturers and brand owners apply sentiment analysis techniques to customer reviews to identify customer opinions on their products and services. Sentiment analysis at the document or sentence level does not give a complete view of customer opinion, because a single review may express opinions on several different aspects of the product or service. This issue has inspired aspect-level opinion mining, which involves two core tasks: aspect detection and aspect-based sentiment analysis. This research addresses the first task, aspect detection. The focus domain is sportswear apparel, which has been largely overlooked in opinion mining. Accordingly, this paper presents a new dataset manually annotated by domain experts according to a newly defined aspect taxonomy. The research compares the performance of a set of pre-trained language models on this task. It also achieves state-of-the-art performance for sportswear apparel reviews using a novel ensemble method.
- item: Thesis-Full-text
  Assessment and error identification of answers to mathematical word problems
  Kadupitiya, JCS; Ranathunga, S; Dias, G
  In Mathematics, the term “word problem” often refers to any mathematical exercise in which significant background information is presented as text rather than in mathematical notation. This research focuses on word problems that have simple numerical and/or algebraic answers. Such word problems can be further categorized by domain, for example interest calculation, percentages, shares and mensuration. They appear in many international examinations. Existing research has produced solutions for only some of these categories, and it has not addressed assessment based on a marking rubric. This thesis presents a system that uses a teacher-provided marking rubric to assess answers to both numerical and algebraic word problems. The system also uses the rubric to automatically identify the exact errors (if any) made by students. It is modular and can be extended to support other types of word problems. If an answer contains a short phrase or sentence along with the numerical or algebraic expression, that text is also evaluated to check whether the student has actually understood the question. Our main focus is questions from the GCE Ordinary Level (O/L) Mathematics syllabus in Sri Lanka. Many students take this examination in Sinhala (an official language of Sri Lanka), so short sentence evaluation had to be done for Sinhala. This requirement led us to conduct the first research on short sentence similarity measurement for Sinhala. The unsupervised similarity measurement technique we used showed results comparable to those for English. The system was thoroughly evaluated with student answers to questions from the GCE O/L examination.
  It was further tested on answers to word problems from the Cambridge Ordinary Level and the Australian Year 10 international examinations. These tests demonstrated that the system can handle variations in questions across different examinations.
- item: Article-Abstract
  Automated Assessment of Multi-Step Answers for Mathematical Word Problems
  Dias, G; Ranathunga, S; Kadupitiya, JCS
  We present a system that automatically grades answers to mathematical word problems. The questions we currently consider are at the standard of the GCE (General Certificate of Education) Ordinary Level (O/L) Mathematics paper in Sri Lanka. The solutions to these questions are open-ended, multi-step answers. The system uses a regular-expression-based information retrieval approach to validate the expressions in the answers. It evaluates student answers using a marking rubric and awards full or partial marks. We tested the system's performance using 500 answer scripts for five different questions from 50 students. The grades given by the system were compared against manual grades, and only one answer was graded incorrectly, giving an accuracy of 99.8%.
- item: Conference-Extended-Abstract
  Automatic assessment and error identification of multi-step answers for matrix questions (2017)
  Thirunavukkarasu, N; Selvarasa, A; Rajendran, N; Yogalingam, C; Ranathunga, S; Dias, G
  This paper presents an automatic assessment and error identification system for student answers that contain matrix expressions and may span multiple steps. A teacher is needed only during the question set-up stage, to provide the marking rubric. The system currently supports four types of matrix questions: multiplying a matrix by a constant, matrix addition and subtraction, finding unknown elements within a matrix, and finding an unknown matrix from an equation. A CAS (Computer Algebra System) evaluates each step of the student's answer, and the system awards full or partial marks according to the marking rubric. Errors commonly made by students were identified and categorized by analyzing sample student answers. Using this categorization, the system can identify the exact error(s) a student has made.
- item: Conference-Abstract
  Automatic assessment of student answers for geometric theorem proving questions (2017)
  Mendis, C; Lahiru, D; Pamudika, N; Madushanka, S; Ranathunga, S; Dias, G
  In this paper, we present a system that automatically assesses multi-step answers to geometric theorem proving questions in high school Mathematics. The system can allocate partial marks for individual steps according to a marking rubric. It also evaluates the natural language reasoning given in each step. Currently, 30 theorems related to straight lines have been implemented as inference rules. The system has been tested with 100 student answers to two geometric theorem proving questions.
- item: Conference-Abstract
  Automatic creation of a word aligned Sinhala-Tamil parallel corpus (2017)
  Mohamed, MZ; Ihalapathirana, A; Hameed, RA; Pathirennehelage, N; Ranathunga, S; Jayasena, S; Dias, G
  A parallel corpus aligned at both the sentence and the word level is an important prerequisite for statistical machine translation. However, creating such a corpus manually is time-consuming and requires experts fluent in both languages. This paper presents the first-ever empirical evaluation to identify the best unsupervised word alignment technique for Sinhala and Tamil. It also presents a novel approach that combines the output of individual aligners and outperforms each of those aligners used on its own. The evaluation used sentence-aligned parallel text from three sources: annual reports of Sri Lankan Government institutions, letters from those institutions, and order papers from the Parliament of Sri Lanka.
- item: Thesis-Full-text
  Automatic evaluation and error identification of solutions to single-variable algebraic questions
  Erabadda, ELBH; Ranathunga, S; Dias, G
  The Ordinary Level mathematics curriculum in Sri Lanka contains two types of single-variable equation solving questions: linear equations with fractions and quadratic equations. Answers to these questions are open-ended and multi-step. This thesis describes a mechanism that evaluates answers to these two types of questions and awards full/partial credit. Students commonly make mistakes in their answers, which results in partial credit, and they may repeat the same errors if they do not receive feedback on them. Feedback on student errors is therefore important in any subject. This thesis introduces a method to automatically identify the errors that students make in their answers to these two types of questions. To the best of our knowledge, this is the first work on automatically identifying student errors in complex multi-step solutions to single-variable equation solving questions. We evaluated the implemented system using student answers from different sources. The evaluations show that it can award full/partial credit according to a marking scheme and identify errors in student answers with minimal teacher intervention.
- item: Thesis-Full-text
  Automatic model answer generation for simple linear algebra-based mathematics questions (2018)
  Sakthithasan, R; Ranathunga, S
  This research automates the generation of answers to simple mathematical problems based on linear equations. Simple linear-algebra-based questions are part of most Mathematics examinations. They can appear as word problems, where the question is described in text. Known categories of linear-equation-based word problems include addition, subtraction, multiplication, division and ratio calculation. Addition and subtraction problems can be further divided by their textual information into three types: change (join-separate), compare, and whole-part. This research focuses on linear equation questions of these three types. Existing research mainly follows four approaches to answer generation for linear algebra questions: rule/inference-based, ontology-based, statistical, and hybrid approaches. In this research, a statistical approach was selected to automatically generate answers to simple linear-algebra-based model questions. The implemented system shows better accuracy than other statistical systems reported in previous research for the same types of questions. This result was achieved using ensemble classifiers and smart feature selection. A new dataset was also created for training and evaluation.
- item: Conference-Full-text
  Bilingual lexical induction for Sinhala-English using cross lingual embedding spaces (IEEE, 2021-07)
  Liyanage, A; Ranathunga, S; Jayasena, S; Adhikariwatte, W; Rathnayake, M; Hemachandra, K
  Bilingual lexicons are an important resource in Natural Language Processing (NLP). Such resources are scarce for low-resource languages (LRLs) such as Sinhala, and research on Bilingual Lexical Induction (BLI) in low-resource settings is limited. This paper presents the first-ever implementation of BLI for the Sinhala-English language pair. Following the recently introduced VecMap model, we map the vectors of Sinhala and English words into a shared vector space and measure the cross-lingual (CL) similarity between the words. For a given Sinhala word, the closest English word in this CL vector space is taken as its corresponding word. Currently, there is no detailed evaluation of how three factors affect BLI: the size and nature of the dataset used to create the word vectors, the type of evaluation dictionary, and the technique used to create the word vectors. This paper presents a comprehensive analysis of how these factors affect BLI for Sinhala and English, and shows that BLI results depend heavily on them.
- item: Conference-Full-text
  Categorizing food names in restaurant reviews (2016-04)
  Prakhash, S; Nazick, A; Panchendrarajan, R; Brunthavan, M; Ranathunga, S; Pemasiri, A; Jayasekara, AGBP; Bandara, HMND; Amarasinghe, YWR
  When deciding where to dine, a customer looks at many aspects of a restaurant, such as food, service and ambience. Among these aspects, the type of food served and its quality are the most important. The food aspect therefore plays a major role when restaurants are rated automatically from customer reviews. Some research exists on rating individual food items in a restaurant. However, a potential customer needs a ranking of a food category in general rather than of an individual food item, and producing such a ranking requires categorizing food names. This paper presents two techniques for food name categorization using document similarity measurements.
- item: Conference-Full-text
  Cheap food or friendly staff? Weighting hierarchical aspects in the restaurant domain (IEEE, 2016-05)
  Panchendrarajan, R; Murugaiah, B; Prakhash, S; Ahamed, MNN; Ranathunga, S; Pemasiri, A; Jayasekara, AGBP; Bandara, HMND; Amarasinghe, YWR
  In aspect-level opinion mining, each aspect is assigned a rating based on customer reviews. More often than not, these aspects exhibit a hierarchical relationship, and the restaurant domain is no different. In such a hierarchy, the rating of an aspect is a composite score of its sub-aspects. However, the sub-aspects do not influence the score of a parent aspect uniformly, since some are perceived as more important than others. When calculating the composite score of an aspect, each sub-aspect's influence should therefore be weighted according to its perceived importance. Identifying weights for different aspects is known as the multi-attribute weighting problem. However, existing approaches do not use the relationships between aspects to find weights. This paper presents an approach to find weights for hierarchically related aspects in the restaurant domain. It uses an improved version of the Analytic Hierarchy Process (AHP), one of the Multi Attribute Decision Making Techniques (MADTs). The aspects of the restaurant domain are modeled as a hierarchy, and their weights are calculated using AHP. Occurrence counts of aspects in restaurant reviews are used to obtain their relative importance. This approach yields acceptable consistency ratios for the pairwise comparison matrices obtained at each level of the aspect hierarchy.
- item: Conference-Full-text
  Clustering Sinhala news articles using corpus-based similarity measures (IEEE, 2018-05)
  Nanayakkara, P; Ranathunga, S; Chathuranga, D
  News aggregators collect news items in a single place with meaningful groupings, which helps readers handle large numbers of them conveniently. Such news aggregators/clusters are available for English and some other popular languages, but no such tools exist for Sinhala. To fill this gap, this paper presents a system that collects news articles published across the web and groups related articles using corpus-based similarity measures. The technique is simple and Sinhala is morphologically rich, yet we achieved very promising results that demonstrate the viability of the technique.
- item: Comprehensive Part-Of-Speech Tag Set and SVM Based POS Tagger for Sinhala
  Fernando, S; Ranathunga, S; Jayasena, S; Dias, G
  This paper presents a new comprehensive multi-level Part-Of-Speech (POS) tag set and a Support Vector Machine (SVM) based POS tagger for the Sinhala language. The currently available tag set for Sinhala has two limitations: some word classes have no tags to represent them, and there are no tags to capture inflection-based grammatical variations of words. The new tag set presented in this paper overcomes both limitations. The accuracy of available Sinhala POS taggers, which are based on Hidden Markov Models, still falls far behind the state of the art. Our SVM-based tagger achieved an overall accuracy of 84.68% when the test set contains 10% unknown words. Its accuracy was 59.86% for unknown words and 87.12% for known words.
- item: Conference-Abstract
  Computer Aided Evaluation of Multi-Step Answers to Algebra Questions
  Erabadda, B; Ranathunga, S; Dias, G
  This paper presents a system that automatically assesses multi-step answers to algebra questions. A teacher is needed only during the question set-up stage. Two types of algebra questions are currently supported: linear equations containing fractions, and quadratic equations. The system evaluates each step of a student's answer and awards full/partial marks according to a marking scheme. Its performance was evaluated using a set of student answer scripts from a government school in Sri Lanka and also by undergraduate students. The system's accuracy was over 95.4% and over 97.5%, respectively, for these two data sets.
- item: Conference-Full-text
  An Episode-based approach to Identify Website user access patterns (2016-03-08)
  Udantha, M; Ranathunga, S; Dias, G
  Mining web access log data is a popular technique for identifying the frequent access patterns of website users. Many mining techniques can identify these patterns, such as clustering, sequential pattern mining and association rule mining. Each can find interesting access patterns and group the users, but none can identify the slight differences between the access patterns within individual clusters. Yet in practice, these differences could carry important information about attacks. This paper introduces a methodology that identifies access patterns at a much finer level than traditional techniques such as nearest-neighbour-based clustering and classification. The technique uses episodes, expressed as regular expressions, to represent web sessions. To the best of our knowledge, this is the first application of regular expressions to identifying user access patterns in web server log data. We also demonstrate that, besides frequent patterns, this technique can identify rarely occurring access patterns that traditional clustering mechanisms would simply treat as noise.
- item: Conference-Full-text
  Evaluation of different classifiers for Sinhala POS tagging (IEEE, 2018-05)
  Fernando, S; Ranathunga, S; Chathuranga, D
  This paper presents a comparative evaluation of three state-of-the-art classifiers for Sinhala Parts-of-Speech (POS) tagging: Support Vector Machines (SVM), Hidden Markov Models (HMM) and Conditional Random Fields (CRF). Tagger models based on each classifier are generated and tested using different combinations of a corpus of news articles and a corpus of official government documents. This is the first use of CRF for Sinhala POS tagging, so its best feature set is derived experimentally. To further improve POS tagging accuracy, a majority-voting ensemble tagger is built from the three individual taggers, and it achieved higher accuracy than any of them. The two domains used in this study (news and official government documents) differ noticeably in writing style and vocabulary. Building domain-specific POS taggers is time-consuming and costly because domain-specific corpora must be created and manually tagged, a particular burden for low-resourced languages. This study therefore also evaluates whether corpora from different domains can be used effectively in the training and testing phases of these machine learning techniques.
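
The abstract of "Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification" does not spell out the adapter architecture. The sketch below is a rough illustration only. It assumes the common bottleneck design: a down-projection, a ReLU, an up-projection and a residual connection. It also shows what sequential stacking means in that setting, namely that each adapter consumes the previous adapter's output. Plain Python lists stand in for tensors, and the function names and toy weights are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def adapter_forward(h: Vector, w_down: Matrix, w_up: Matrix) -> Vector:
    """Assumed bottleneck adapter: h + W_up . ReLU(W_down . h)."""
    z = [max(0.0, x) for x in matvec(w_down, h)]  # project d -> r (r << d), then ReLU
    delta = matvec(w_up, z)                        # project back r -> d
    return [a + b for a, b in zip(h, delta)]       # residual keeps the frozen model's signal


def sequential_stack(h: Vector, adapters: Sequence[Tuple[Matrix, Matrix]]) -> Vector:
    """Sequential stacking: feed each adapter the previous adapter's output."""
    for w_down, w_up in adapters:
        h = adapter_forward(h, w_down, w_up)
    return h


# Toy example: hidden size d = 4, bottleneck size r = 2.
h = [1.0, -2.0, 0.5, 3.0]
w_down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
w_up = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
print(adapter_forward(h, w_down, w_up))  # [2.0, -2.0, 0.5, 3.0]
```

In the usual adapter setup, only `w_down` and `w_up` are trained while the pre-trained model stays frozen. The abstract also lists training without freezing the model as one of its variants.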
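
The rule of thumb quoted in "Air-conditioners condensate recovery system for buildings" (0.1 to 0.3 gallons per ton for every operating hour) gives a quick volume estimate: multiply by system size and run time. The sketch assumes that "ton" means a ton of cooling capacity and that the gallons are US gallons. The 100-ton, 10-hour building is hypothetical.

```python
from typing import Tuple

LITRES_PER_US_GALLON = 3.785411784


def condensate_estimate(tons: float, hours: float,
                        low_rate: float = 0.1, high_rate: float = 0.3) -> Tuple[float, float]:
    """Return the (low, high) condensate volume in US gallons.

    Rates are gallons per ton per operating hour, following the quoted rule
    of thumb; real yields depend on load and local humidity.
    """
    return tons * hours * low_rate, tons * hours * high_rate


# Hypothetical building: a 100-ton system running 10 hours a day.
low, high = condensate_estimate(tons=100, hours=10)
print(f"{low:.0f}-{high:.0f} US gallons/day, "
      f"about {low * LITRES_PER_US_GALLON:.0f}-{high * LITRES_PER_US_GALLON:.0f} litres/day")
```

For this example, the estimate is roughly 100 to 300 US gallons per day. The wide range reflects the abstract's point that yield varies with load, temperature and humidity.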
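
The Ananya NER entry reports that orthographic, word-level and contextual features work with both CRF and ME. Below is a minimal sketch of per-token feature extraction of that kind. The specific features (two-character affixes, a digit flag, a hyphen flag, a ±1 word window) are illustrative assumptions, not the paper's feature set. One dictionary per token is the input shape that CRF wrappers such as sklearn-crfsuite accept.

```python
from typing import Dict, List, Union

Feature = Union[str, int, bool]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Illustrative orthographic, word-level and contextual features for tokens[i]."""
    word = tokens[i]
    return {
        # word-level features
        "word": word,
        "prefix2": word[:2],
        "suffix2": word[-2:],  # suffixes often carry inflection in Sinhala
        "length": len(word),
        # orthographic features (Sinhala script offers no capitalisation cue)
        "is_digit": word.isdigit(),
        "has_hyphen": "-" in word,
        # contextual features from a +/-1 window
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, Feature]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


print(sentence_features(["in", "2016", "Colombo"])[1])
```

A CRF or ME model would then learn which of these features predict entity labels.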
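
"Anomaly detection in complex trading systems" selects an RBF-kernel SVM. The RBF kernel scores two feature vectors as k(x, y) = exp(-γ‖x - y‖²), so identical system snapshots score 1 and distant ones approach 0. The sketch below computes it directly. The metric vectors are hypothetical, and γ = 0.5 is an arbitrary example value; in practice γ is tuned.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.5) -> float:
    """RBF kernel: exp(-gamma * ||x - y||^2); equals 1.0 for identical inputs."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical normalised system metrics, e.g. (CPU load, memory use, latency).
normal = [0.20, 0.35, 0.10]
nearby = [0.22, 0.33, 0.12]
spike = [0.95, 0.90, 0.80]
print(rbf_kernel(normal, nearby), rbf_kernel(normal, spike))
```

scikit-learn's `SVC(kernel="rbf", gamma=...)` uses this same kernel form internally.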
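
"Automated Assessment of Multi-Step Answers for Mathematical Word Problems" describes matching answer expressions with regular expressions against a marking rubric and awarding full or partial marks. The sketch below makes three assumptions. Each rubric step is a (pattern, marks) pair, spaces are ignored, and the simple-interest question and its rubric are hypothetical. The last lines reproduce the reported accuracy, where 499 of 500 correct gives 99.8%.

```python
import re
from typing import List, Tuple

Rubric = List[Tuple[str, int]]  # (regular expression for an expected step, marks for it)


def grade(answer_lines: List[str], rubric: Rubric) -> int:
    """Award the marks of every rubric step that some answer line matches."""
    normalised = [line.replace(" ", "") for line in answer_lines]
    total = 0
    for pattern, marks in rubric:
        if any(re.fullmatch(pattern, line) for line in normalised):
            total += marks
    return total


# Hypothetical question: simple interest on Rs. 5000 at 10% per year for 2 years.
rubric: Rubric = [
    (r"=?5000\*10/100\*2", 1),  # method mark: correct expression
    (r"=?(Rs\.?)?1000", 2),     # answer marks: correct final value
]
full = ["5000 * 10/100 * 2", "= Rs. 1000"]
partial = ["5000 * 10/100 * 2", "= 900"]
print(grade(full, rubric), grade(partial, rubric))  # 3 1

# Reported accuracy: one of 500 answers graded wrongly.
accuracy = (500 - 1) / 500
print(f"{accuracy:.1%}")  # 99.8%
```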
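
"Automatic assessment and error identification of multi-step answers for matrix questions" evaluates each step with a computer algebra system. The sketch below is a CAS-free stand-in and an assumption, not the paper's implementation. For a hypothetical question, 2A + B, it recomputes the expected intermediate matrices and compares them step by step with a student's working. It awards marks per correct step and reports the first incorrect step.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def scale(k: int, m: Matrix) -> Matrix:
    """Multiply every element of m by the constant k."""
    return [[k * x for x in row] for row in m]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise matrix addition."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mark_steps(student: List[Matrix], expected: List[Matrix],
               marks: List[int]) -> Tuple[int, Optional[int]]:
    """Return (total marks, index of the first wrong step, or None).

    Steps are compared position by position; missing steps earn no marks.
    """
    total, first_error = 0, None
    for i, (got, want) in enumerate(zip(student, expected)):
        if got == want:
            total += marks[i]
        elif first_error is None:
            first_error = i
    return total, first_error


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
expected = [scale(2, A), add(scale(2, A), B)]   # [[2, 4], [6, 8]], then [[2, 5], [7, 8]]
student = [[[2, 4], [6, 8]], [[2, 5], [7, 9]]]  # arithmetic slip in the last element
print(mark_steps(student, expected, marks=[1, 1]))  # (1, 1)
```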
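
"Automatic creation of a word aligned Sinhala-Tamil parallel corpus" combines the output of several unsupervised aligners, but the abstract does not say how. One simple option is voting: keep a link (source index, target index) if at least `min_votes` aligners propose it. This is assumed here purely for illustration, and the three aligner outputs are made up.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Link = Tuple[int, int]  # (Sinhala token index, Tamil token index)


def combine_alignments(outputs: Iterable[Set[Link]], min_votes: int = 2) -> Set[Link]:
    """Keep the links proposed by at least `min_votes` aligners."""
    votes = Counter(link for links in outputs for link in links)
    return {link for link, n in votes.items() if n >= min_votes}


aligner_a = {(0, 0), (1, 2), (2, 1)}
aligner_b = {(0, 0), (1, 2), (2, 2)}
aligner_c = {(0, 1), (1, 2), (2, 1)}
print(sorted(combine_alignments([aligner_a, aligner_b, aligner_c])))
# [(0, 0), (1, 2), (2, 1)]
```

Setting `min_votes` to the number of aligners gives their intersection, which favours precision. Setting it to 1 gives their union, which favours recall.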
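
In "Bilingual lexical induction for Sinhala-English using cross lingual embedding spaces", VecMap first places Sinhala and English vectors in one shared space. Retrieval is then a nearest-neighbour search by cosine similarity. The sketch below shows only that retrieval step. The three-dimensional vectors and the romanised word keys are invented for illustration, and the mapping itself is not reproduced.

```python
import math
from typing import Dict, List

Vectors = Dict[str, List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_translation(word: str, source: Vectors, target: Vectors) -> str:
    """Return the target-language word closest to `word` in the shared space."""
    query = source[word]
    return max(target, key=lambda candidate: cosine(query, target[candidate]))


# Invented 3-d vectors, assumed to be already mapped into one shared space.
sinhala_mapped: Vectors = {
    "gedara": [0.85, 0.15, 0.05],  # romanised Sinhala for "home"
    "watura": [0.05, 0.25, 0.90],  # romanised Sinhala for "water"
}
english: Vectors = {
    "house": [0.90, 0.10, 0.00],
    "water": [0.00, 0.20, 0.95],
    "tree": [0.10, 0.90, 0.10],
}
for w in sinhala_mapped:
    print(w, "->", nearest_translation(w, sinhala_mapped, english))
```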
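
"Cheap food or friendly staff? Weighting hierarchical aspects in the restaurant domain" uses an improved AHP whose details are not in the abstract. The sketch below shows standard AHP only. It computes weights from the row geometric means of a pairwise comparison matrix. It then computes the consistency ratio CR = CI / RI, where CI = (λmax - n) / (n - 1) and RI is Saaty's random index. Building the matrix from aspect occurrence counts as a_ij = c_i / c_j is one plausible reading of the abstract, not necessarily the paper's method, and the counts are invented.

```python
import math
from typing import Dict, List, Tuple

# Saaty's random consistency index for matrix sizes 1..10.
RANDOM_INDEX: Dict[int, float] = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                                  6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix: List[List[float]]) -> Tuple[List[float], float]:
    """Return (priority weights, consistency ratio) for a pairwise comparison matrix."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]
    # Estimate the principal eigenvalue from A.w ~= lambda_max * w.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    return weights, (ci / ri if ri > 0 else 0.0)


def matrix_from_counts(counts: List[float]) -> List[List[float]]:
    """Hypothetical: compare aspects i and j by the ratio of their review mentions."""
    return [[ci / cj for cj in counts] for ci in counts]


# Invented mention counts for three sub-aspects of "food".
weights, cr = ahp_weights(matrix_from_counts([120, 60, 20]))
print([round(w, 3) for w in weights], "consistent" if cr < 0.1 else "revise judgements")
```

A count-ratio matrix is perfectly consistent, so its CR is zero. Expert judgement matrices usually give CR > 0, and Saaty's conventional acceptance threshold is CR < 0.1.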
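
"Clustering Sinhala news articles using corpus-based similarity measures" does not detail its measures. As a simple stand-in, the sketch below weights terms by TF-IDF computed over the article collection itself and compares articles by cosine similarity. Each article then joins the first existing cluster whose leader (first) article is similar enough. The tokenised English snippets and the 0.3 threshold are placeholders; real input would be tokenised Sinhala text.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """TF-IDF weights, with IDF computed over the given collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def group_articles(docs: List[List[str]], threshold: float = 0.3) -> List[List[int]]:
    """Leader clustering: join the first cluster whose first article is similar enough."""
    vecs = tfidf_vectors(docs)
    clusters: List[List[int]] = []
    for i, vec in enumerate(vecs):
        for cluster in clusters:
            if cosine(vec, vecs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:  # no sufficiently similar cluster: start a new one
            clusters.append([i])
    return clusters


articles = [
    "election results parliament colombo".split(),
    "parliament election results announced".split(),
    "cricket match won by sri lanka".split(),
    "sri lanka cricket team match".split(),
]
print(group_articles(articles))  # [[0, 1], [2, 3]]
```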
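
"An Episode-based approach to Identify Website user access patterns" represents web sessions as episodes expressed as regular expressions. The sketch below illustrates that idea under assumed conventions. Each page maps to one symbol, so a session becomes a string, and named regex patterns count how many sessions fit each episode. The page-to-symbol table, the sessions and both patterns are hypothetical. The rare `admin_probe` episode is still counted, not discarded as noise.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical mapping from requested pages to single-character symbols.
PAGE_SYMBOLS: Dict[str, str] = {"/login": "L", "/product": "P", "/cart": "C", "/admin": "A"}


def encode_session(urls: List[str]) -> str:
    """Turn a session's page sequence into a string; unknown pages become '?'."""
    return "".join(PAGE_SYMBOLS.get(url, "?") for url in urls)


def count_episodes(sessions: List[List[str]], episodes: Dict[str, str]) -> Counter:
    """Count the sessions that fully match each named episode pattern."""
    counts: Counter = Counter()
    for urls in sessions:
        encoded = encode_session(urls)
        for name, pattern in episodes.items():
            if re.fullmatch(pattern, encoded):
                counts[name] += 1
    return counts


episodes = {
    "browse_then_buy": r"LP+C",  # log in, view one or more products, open the cart
    "admin_probe": r"A{3,}",     # three or more consecutive admin requests
}
sessions = [
    ["/login", "/product", "/product", "/cart"],
    ["/login", "/product", "/cart"],
    ["/admin", "/admin", "/admin"],
    ["/product"],
]
print(count_episodes(sessions, episodes))
```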
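
"Evaluation of different classifiers for Sinhala POS tagging" builds a majority-voting ensemble from SVM, HMM and CRF taggers. Below is a sketch of token-level majority voting. The tie rule is an assumption for illustration: when no tag wins a strict majority, the ensemble uses one designated tagger's tag. The tag names are placeholders, and neither the tie rule nor the tags come from the paper.

```python
from collections import Counter
from typing import List


def majority_vote(tag_sequences: List[List[str]], fallback: int = 0) -> List[str]:
    """Combine per-token tags from several taggers by strict majority.

    When no tag gets more than half the votes, use the tag proposed by the
    tagger at index `fallback` (an assumed tie-breaking rule).
    """
    combined = []
    for votes in zip(*tag_sequences):  # the tags proposed for one token
        tag, freq = Counter(votes).most_common(1)[0]
        combined.append(tag if 2 * freq > len(votes) else votes[fallback])
    return combined


svm = ["NNP", "VB", "NN"]
hmm = ["NNP", "NN", "JJ"]
crf = ["NN", "VB", "RB"]
print(majority_vote([svm, hmm, crf]))  # ['NNP', 'VB', 'NN']
```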