Browsing by Author "Wimalasuriya, DC"

item: Article-Abstract
Building a WordNet for Sinhala
Wijesiri, I; Gallage, M; Gunathilaka, B; Lakjeewa, M; Wimalasuriya, DC; Dias, G; Paranavithana, R; De Silva, N
Sinhala is one of the official languages of Sri Lanka and is used by over 19 million people. It belongs to the Indo-Aryan branch of the Indo-European languages and its origins date back at least 2000 years. It has developed into its current form over a long period of time with influences from a wide variety of languages including Tamil, Portuguese and English. As for any other language, a WordNet is extremely important for Sinhala to take it into the digital era. This paper is based on the project to develop a WordNet for Sinhala based on the English (Princeton) WordNet. It describes how we overcame the challenges in adding Sinhala-specific characteristics, which were deemed important by Sinhala language experts, to the WordNet while keeping the structure of the original English WordNet. It also presents the details of the crowdsourcing system we developed as a part of the project, consisting of a NoSQL database in the backend and a web-based frontend. We conclude by discussing the possibility of adapting this architecture for other languages and the road ahead for the Sinhala WordNet and Sinhala NLP.
item:
Development of an ontology construction component for the OBCIE (ontology-based components for information extraction) approach
(2015-06-19) Wimalasuriya, DC; Lewke Bandara, MS
Information extraction systems identify and retrieve certain types of information from natural language text. A recent development in the field of information extraction is the emergence of ontology-based information extraction as a subfield, where ontologies are used to guide the information extraction process and to present the extracted information. One of the challenges faced by the fields of ontology-based information extraction and information extraction is the difficulty of reusing prior work when developing new systems. A component-based approach for information extraction named OBCIE (Ontology-Based Components for Information Extraction) has been previously developed to address this issue. This paper presents the progress in developing an ontology construction component for the OBCIE approach, which identifies classes and relationships for a given domain. It is centered on discovering the information contained within the loose structure of Wikipedia pages.
item: Conference-Full-text
Generic log file data extraction
(2013) Bandara, TPSH; Chandrasekara, WKMSP; Chathunga, JAR; Chiranjeewa, KAL; Wimalasuriya, DC; Fernando, MBTL; Jayathilake, PWDC
Automated software log file analysis holds an important position in software maintenance. Currently available analysis tools are not generic. They tend to focus on specific software or servers and their flexibility is minimal. Furthermore, commercially available log analysis tools are not affordable for small and medium scale firms. This has left a void in the market for generic, customizable and open source log file analysis tools. The impediment to such a tool emerging is the unavailability of a generic log file data extraction mechanism. A generic log file format definition language and an underlying persistent data storage system are a solution to this problem. Log file structures could be defined by the aforementioned language and the data extracted would be stored in the persistent storage. This methodology enables generic log file analysis on top of the extracted data. Through the research and implementations carried out, it was identified that a modified version of simple declarative language is suitable for the log file format definition language. It would have the capability of handling and defining all patterns of text-based log files. Additionally, the results revealed that the appropriate storage mechanism would be an Extensible Markup Language (XML) database, mainly because of the similarities between the hierarchical nature of XML and common log file structures.
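The idea of declaring a log format once and storing the extracted fields hierarchically can be illustrated with a minimal Python sketch. It is not the paper's format definition language or its XML database: the `LOG_FORMAT` list, the field names, the regular expressions and the helper functions are all hypothetical, and an in-memory `xml.etree.ElementTree` tree stands in for persistent XML storage.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declarative format definition: ordered (field name, regex) pairs.
# A different log format would only need a different list, not different code.
LOG_FORMAT = [
    ("timestamp", r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"),
    ("level", r"[A-Z]+"),
    ("message", r".*"),
]


def compile_format(fields, separator=r"\s+"):
    """Turn the field list into one regex with a named group per field."""
    parts = [f"(?P<{name}>{pattern})" for name, pattern in fields]
    return re.compile("^" + separator.join(parts) + "$")


def extract_to_xml(lines, fields):
    """Build an XML tree with one <entry> per line that matches the format."""
    matcher = compile_format(fields)
    root = ET.Element("log")
    for line in lines:
        match = matcher.match(line.strip())
        if match is None:
            continue  # lines that do not fit the declared format are skipped
        entry = ET.SubElement(root, "entry")
        for name, _ in fields:
            ET.SubElement(entry, name).text = match.group(name)
    return root


sample = [
    "2013-05-01 10:15:00 ERROR Disk quota exceeded",
    "not a log line",
    "2013-05-01 10:15:02 INFO Retry scheduled",
]
root = extract_to_xml(sample, LOG_FORMAT)
xml_text = ET.tostring(root, encoding="unicode")
```

Because each entry becomes a small subtree, later analysis can query the extracted data by element name (for example `root.findall("entry")`) without knowing the original line layout.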
item: Thesis-Abstract
Introducing adaptability to natural-language systems through user modeling
Wijesooriya, VJ; Wimalasuriya, DC
A primary way of improving spoken dialogue technology towards more natural human-computer interaction is to make the systems user-adaptive, that is, able to conduct user-tailored interaction. Such an adaptation of the system to an individual user can be achieved by building a system’s model of the user and exploiting this model to provide responses customized for the individual user. In this case, the system will adapt its part of the dialogue interaction according to the user’s goals, plans and beliefs inferred by its user model, enabling communication with the user in a manner more convenient for addressing his/her requirements. The system is also required to change its user model dynamically, in order to reflect the specific characteristics of the particular user. A prototype system has been built as an attempt at exploring how dialogue systems can be made user-adaptive through the above concept of building and exploiting dynamic models of users. The system specifically demonstrates how this technology can be harnessed to cater for large-scale business domains such as the insurance and banking industries, where call centers can make use of such a system to attend to the typically overwhelming volume of customer queries, saving considerable employee time and effort.
item: Conference-Full-text
Lexical Enrichment and Sense Disambiguation of Ontology Concepts
Premarathna, PHSR; Indrajee, KHH; Mahawithana, P; De Silva, LYSG; Wimalasuriya, DC
This paper presents a model to measure semantic similarity between custom ontology concepts and the taxonomy of WordNet and introduces a new ontology concept similarity measure. The similarity measure is based on a measure of weighted overlap of semantic cotopy of a concept in two taxonomies. The model can be applied to automatically enhance the vocabulary of terms in ontologies embedding equivalence classes of terms and other linguistic information directly in the ontology. This model is applied to the products and services domain where a Product Ontology is lexically enhanced and the effectiveness of the model is evaluated.
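A rough Python sketch of a cotopy-based overlap score follows. The abstract does not give the paper's formula, so everything here is an assumption: the semantic cotopy of a concept is taken to be the concept plus its ancestors and descendants, weights decay with distance (the `decay` parameter is hypothetical), concepts are matched by identical labels, and the score is a weighted Jaccard ratio. Both taxonomies are toy single-parent trees, not WordNet or the Product Ontology.

```python
def semantic_cotopy(concept, parents):
    """Return {label: distance} for a concept, its ancestors and descendants.

    `parents` maps each child label to its single parent label (acyclic tree).
    """
    children = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)

    cotopy = {concept: 0}
    # Walk up to the root, recording the distance of every ancestor.
    node, dist = concept, 0
    while node in parents:
        node, dist = parents[node], dist + 1
        cotopy[node] = dist
    # Walk down through all descendants.
    frontier = [(concept, 0)]
    while frontier:
        node, dist = frontier.pop()
        for child in children.get(node, []):
            cotopy[child] = dist + 1
            frontier.append((child, dist + 1))
    return cotopy


def weighted_overlap(cotopy_a, cotopy_b, decay=0.5):
    """Weighted Jaccard overlap; nearer concepts weigh more (assumed scheme)."""
    weights_a = {label: decay ** d for label, d in cotopy_a.items()}
    weights_b = {label: decay ** d for label, d in cotopy_b.items()}
    labels = set(weights_a) | set(weights_b)
    shared = sum(min(weights_a.get(l, 0.0), weights_b.get(l, 0.0)) for l in labels)
    total = sum(max(weights_a.get(l, 0.0), weights_b.get(l, 0.0)) for l in labels)
    return shared / total if total else 0.0


# Toy taxonomies (child -> parent), purely illustrative.
product_taxonomy = {"laptop": "computer", "computer": "device", "ultrabook": "laptop"}
reference_taxonomy = {"laptop": "computer", "computer": "machine", "machine": "device"}

score = weighted_overlap(
    semantic_cotopy("laptop", product_taxonomy),
    semantic_cotopy("laptop", reference_taxonomy),
)
```

Under this scheme a score of 1.0 means the two cotopies coincide, and 0.0 means they share no labels. A candidate WordNet sense for an ontology concept could then be chosen as the one with the highest score.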
item: Conference-Extended-Abstract
Lexical enrichment and sense disambiguation of ontology concepts
(2014-01-16) Premarathna, PHSR; Indrajee, KHH; Mahawithana, P; De Silva, LYSG; Wimalasuriya, DC
This paper presents a model to measure semantic similarity between custom ontology concepts and the taxonomy of WordNet and introduces a new ontology concept similarity measure. The similarity measure is based on a measure of weighted overlap of semantic cotopy of a concept in two taxonomies. The model can be applied to automatically enhance the vocabulary of terms in ontologies embedding equivalence classes of terms and other linguistic information directly in the ontology. This model is applied to the products and services domain where a Product Ontology is lexically enhanced and the effectiveness of the model is evaluated.
item: Thesis-Abstract
Web services for ontology based information extraction
Silva, LCT; Wimalasuriya, DC
The amount of data contained in a textual format has increased rapidly in the recent past. Such data includes web sites, documents of business organizations, etc., and contains a great deal of information. Information Retrieval (IR) is a field that allows identifying the relevant documents for a given query out of all these available documents. Information Extraction (IE) takes another step in this direction. Instead of returning the set of documents that contain the relevant information, IE recognizes and returns the information within the natural language text of these documents. An ontology is defined as the “formal, explicit specification of a shared conceptualization”. It contains classes, properties, individuals and values to represent data in a certain domain. Most of the time in Ontology-Based Information Extraction, an IE technique is used to discover individuals for classes and values for properties in order to build an ontology for a given domain. However, sometimes these classes and properties are also identified as part of the IE technique rather than by using a template with the pre-identified classes and properties in the ontology. A traditional Ontology-Based Information Extraction system contains two main operations, ontology construction and ontology population. In the component-based approach defined in the “Ontology-Based Components for Information Extraction (OBCIE)”, the operation of constructing the ontology is not changed. However, the operation of populating the ontology is refined into a pipeline of three separate components: pre-processors, information extractors and aggregators. By developing these components as web services, we have provided the ability for other applications to use them to extract information from any text-based document. To demonstrate this concept, we have developed an application that accepts a set of text documents and extracts useful information. It uses “metadata files”, which are dependent on the domain in which the ontology is created, and populates the given ontology.
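The three-stage population pipeline (pre-processors, information extractors, aggregators) can be sketched in Python as below. This is only an illustration under stated assumptions: the stages are plain functions rather than web services, the property names and regular expressions are hypothetical stand-ins for domain-dependent metadata, and the resulting dictionary is a simplified substitute for a populated ontology.

```python
import re
from collections import defaultdict


def preprocess(document):
    """Pre-processor: normalise whitespace and split the text into sentences."""
    text = " ".join(document.split())
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_extractor(property_name, pattern):
    """Information extractor: returns (property, value) pairs found by a regex.

    The pattern plays the role of domain-specific metadata (an assumption).
    """
    regex = re.compile(pattern)

    def extract(sentences):
        return [
            (property_name, match.group(1))
            for sentence in sentences
            for match in regex.finditer(sentence)
        ]

    return extract


def aggregate(results):
    """Aggregator: merge extractor outputs and drop duplicate values."""
    merged = defaultdict(set)
    for property_name, value in results:
        merged[property_name].add(value)
    return {name: sorted(values) for name, values in merged.items()}


def run_pipeline(documents, extractors):
    """Run every document through pre-processing, extraction and aggregation."""
    results = []
    for document in documents:
        sentences = preprocess(document)
        for extractor in extractors:
            results.extend(extractor(sentences))
    return aggregate(results)


extractors = [
    make_extractor("hasCapital", r"capital of \w+ is (\w+)"),
    make_extractor("hasPopulation", r"population of (\d+) million"),
]
documents = [
    "The capital of France is Paris.  It has a population of 67 million.",
    "Everyone knows the capital of France is Paris.",
]
populated = run_pipeline(documents, extractors)
```

Keeping each stage behind a narrow interface (text in, sentences out; sentences in, pairs out) is what would let the stages be deployed and reused independently, for instance as separate services.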