Browsing by Author "Wickramarachchi, A"


item: Conference-Abstract
An Adapter Architecture for heterogeneous data processing in bioinformatics pipelines
Lenadora, D; Wickramarachchi, A; Meedeniya, D; Mallawaarachchi, V; Perera, L
Bioinformatics is a growing field that draws on both computer science and biology. Many bioinformatics data processing tools exist at present, and they take inputs and produce outputs in formats that vary with the algorithms and processes being used. Because the output of one process may not be accepted as input by the next, pipelining these processes calls for a generic bioinformatics data format converter. Though such converters currently exist, most of them are limited to text conversions and provide limited functionality. In addition, such conversion functions could exploit parallelism to increase overall throughput. This paper proposes a solution that provides these conversion functions, as well as utility functions, while achieving high throughput through parallelism. One utility function of the system requires storing bioinformatics data locally; in supporting this, the system achieved an average compression rate of 26% in data storage. Evaluation of the system on a set of 7,000,000 gene data records showed a maximum retrieval time of 400 ms.
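The adapter idea described above can be illustrated with a minimal sketch. It assumes a common intermediate representation of (identifier, sequence) pairs, with one reader or writer per format, so that adding a format does not require a converter for every format pair. The formats (FASTA, 4-line FASTQ, TSV) and all function names (`read_fasta`, `convert`, `convert_many`, and so on) are hypothetical choices for illustration, not the paper's actual design or API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical intermediate representation: (identifier, sequence) pairs.
Record = Tuple[str, str]

def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text (sequences may span several lines) into records."""
    records: List[Record] = []
    ident: Optional[str] = None
    seq: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                records.append((ident, "".join(seq)))
            ident, seq = line[1:], []
        else:
            seq.append(line)
    if ident is not None:
        records.append((ident, "".join(seq)))
    return records

def read_fastq(text: str) -> List[Record]:
    """Parse strict 4-line FASTQ records; quality scores are dropped."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    return [(lines[i][1:], lines[i + 1]) for i in range(0, len(lines), 4)]

def write_fasta(records: List[Record]) -> str:
    return "".join(f">{ident}\n{seq}\n" for ident, seq in records)

def write_tsv(records: List[Record]) -> str:
    return "".join(f"{ident}\t{seq}\n" for ident, seq in records)

# Adapter registries: each format plugs in a reader and/or a writer.
READERS: Dict[str, Callable[[str], List[Record]]] = {
    "fasta": read_fasta,
    "fastq": read_fastq,
}
WRITERS: Dict[str, Callable[[List[Record]], str]] = {
    "fasta": write_fasta,
    "tsv": write_tsv,
}

def convert(text: str, src: str, dst: str) -> str:
    """Convert via the intermediate representation: reader, then writer."""
    return WRITERS[dst](READERS[src](text))

def convert_many(texts: List[str], src: str, dst: str, workers: int = 4) -> List[str]:
    """Convert independent inputs concurrently; output order matches input."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: convert(t, src, dst), texts))
```

For example, `convert("@r1\nACGT\n+\nIIII\n", "fastq", "fasta")` returns `">r1\nACGT\n"`. A thread pool keeps the sketch simple, but in CPython the global interpreter lock limits CPU-bound parallelism with threads, so a `ProcessPoolExecutor` would be the natural substitute for heavy parsing workloads.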
item: Article-Full-text
Managing complex workflows in bioinformatics: an interactive toolkit with GPU acceleration
(2018) Welivita, A; Perera, I; Meedeniya, D; Wickramarachchi, A; Mallawaarachchi, V
Bioinformatics research continues to advance at an increasing scale with the help of techniques such as next-generation sequencing and the availability of tool support to automate bioinformatics processes. With this growth, a large amount of biological data is accumulating at an unprecedented rate, demanding high-performance and high-throughput computing technologies for processing such datasets. Hardware accelerators such as graphics processing units (GPUs), together with distributed computing, accelerate the processing of big data in high-performance computing environments. They enable higher degrees of parallelism, thereby increasing throughput. In this paper, we introduce BioWorkflow, an interactive workflow management system that automates bioinformatics analyses and can schedule parallel tasks using GPU-accelerated and distributed computing. This paper describes a case study carried out to evaluate the performance of a complex workflow with branching executed by BioWorkflow. The results indicate a ×2.89 speed-up from using GPUs and an average ×2.832 speed-up (over n = 5 scenarios) from parallel execution of graph nodes during multiple sequence alignment calculations. A combined speed-up of ×1.71 was achieved for complex workflows. This confirms that parallelism through GPU acceleration and concurrent execution of workflow nodes yields the expected speed-ups over mainstream sequential workflow execution. The tool also provides a comprehensive user interface with better interactivity for managing complex workflows; a System Usability Scale score of 82.9 confirmed the system's high usability.
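Concurrent execution of independent workflow nodes, as described above, can be sketched with Python's standard `graphlib.TopologicalSorter` (Python 3.9+): every node whose dependencies have finished is submitted at once, so parallel branches run concurrently. This is a minimal illustration assuming a thread pool and tasks that receive their upstream results as a dictionary; `run_workflow` and the task signature are hypothetical and do not represent BioWorkflow's scheduler, and GPU offloading is not shown.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set

# Hypothetical task signature: receives {dependency name: its result}.
Task = Callable[[Dict[str, Any]], Any]

def run_workflow(deps: Dict[str, Set[str]], tasks: Dict[str, Task],
                 workers: int = 4) -> Dict[str, Any]:
    """Run a DAG of tasks, executing all ready nodes concurrently.

    deps maps each node to the set of nodes it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    results: Dict[str, Any] = {}
    running = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every node whose dependencies have all completed.
            for node in sorter.get_ready():
                upstream = {d: results[d] for d in deps.get(node, set())}
                running[pool.submit(tasks[node], upstream)] = node
            # Block until at least one running node finishes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                node = running.pop(fut)
                results[node] = fut.result()
                sorter.done(node)  # may unlock downstream nodes
    return results
```

In a branching workflow such as `load -> {align_a, align_b} -> merge`, the two alignment nodes are submitted together once `load` finishes, which is the kind of node-level parallelism the case study measures.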
item: Conference-Full-text
Metagraph: plasmid/chromosome classification enhancement using graph neural networks
(IEEE, 2022-07) Alahakoon, S; Dassanayake, G; Nandasiri, C; Wickramarachchi, A; Perera, I; Rathnayake, M; Adhikariwatte, V; Hemachandra, K
Chromosomes and plasmids are the main sites of genetic information in microorganisms. Separately identifying plasmids and chromosomes is essential for further metagenomic analysis. Computational tools have achieved significant results in classifying DNA into plasmids and chromosomes. However, there is often a trade-off between recall and precision in the currently available tools. Several graph-based tools have been proposed to improve precision and recall simultaneously by refining the results produced by existing tools. We propose MetaGraph, a Graph Neural Network (GNN) based tool for plasmid/chromosome classification enhancement. It keeps the high-confidence predictions of existing plasmid/chromosome prediction tools and improves the accuracy of low-confidence predictions, using plasmid probabilities as features for the GNN. We evaluated MetaGraph on a set of simulated DNA sequences. The results significantly improved on state-of-the-art tools such as PlasFlow and PlasClass, with improvements of up to 20% over the initial PlasClass predictions. The source code for MetaGraph is freely available at: https://github.com/MetaGSC/MetaGraph
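The refinement idea above (keep confident predictions, revise uncertain ones using neighbouring contigs in an assembly graph) can be illustrated without a trained network. The sketch below deliberately swaps in a simpler technique, iterative neighbourhood averaging (label propagation), in place of MetaGraph's GNN. The confidence thresholds (0.3 and 0.7), iteration count, and function names are hypothetical, and the adjacency map is assumed to be symmetric.

```python
from typing import Dict, Set

def refine(probs: Dict[str, float], adj: Dict[str, Set[str]],
           low: float = 0.3, high: float = 0.7, iters: int = 20) -> Dict[str, float]:
    """Label propagation stand-in for a GNN (not MetaGraph's model).

    Contigs with plasmid probability <= low or >= high are treated as
    confident and kept fixed; every other contig is repeatedly set to the
    mean probability of its neighbours in the assembly graph.
    """
    confident = {c for c, p in probs.items() if p <= low or p >= high}
    current = dict(probs)
    for _ in range(iters):
        updated = dict(current)
        for contig in probs:
            if contig in confident:
                continue  # high-confidence predictions are not changed
            neighbours = adj.get(contig, set())
            if neighbours:
                updated[contig] = sum(current[n] for n in neighbours) / len(neighbours)
        current = updated  # synchronous update of all uncertain contigs
    return current

def classify(probs: Dict[str, float], threshold: float = 0.5) -> Dict[str, str]:
    """Map plasmid probabilities to labels."""
    return {c: "plasmid" if p >= threshold else "chromosome" for c, p in probs.items()}
```

For instance, an uncertain contig (probability 0.45) connected only to two confident plasmid contigs (0.9 and 0.95) is pulled to 0.925 and relabelled as a plasmid. A GNN learns how to weight and transform neighbour features instead of simply averaging them, which is why this is only an intuition-level analogue.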
item: Conference-Full-text
Metapcbin: plasmid/chromosome classification for metagenomic contigs using machine learning techniques
(IEEE, 2022-07) Nandasiri, C; Alahakoon, S; Dassanayake, G; Wickramarachchi, A; Perera, I; Rathnayake, M; Adhikariwatte, V; Hemachandra, K
Chromosomes and plasmids are the major carriers of genetic material in microorganisms such as bacteria. Separating chromosomal and plasmid DNA in large datasets is important because plasmids and chromosomes influence an organism's functions and its adaptation to the environment. With advances in sequencing technologies, bioinformatics methods have been developed for plasmid classification. Machine learning models using normalized short k-mer counts have been popular for characterizing plasmids and chromosomes. Biomarkers derived from DNA sequences have also been studied as features for plasmid classification. However, both approaches suffer from a trade-off between precision and recall. MetaPCbin is a plasmid detection tool that combines computational and genetic approaches into a hybrid method of plasmid prediction. MetaPCbin uses an artificial neural network that takes k-mer counts as features and a random forest model that takes biomarkers as features. MetaPCbin was evaluated for the precision and recall of its classification of real-world DNA sequences from the RefSeq database and of simulated sequences. The results show that it can perform plasmid classification while maintaining high precision and recall compared with state-of-the-art tools. MetaPCbin is freely available at: https://github.com/MetaGSC/MetaPCbin
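The normalized short k-mer counts mentioned above are typically computed as a fixed-length frequency vector over all 4^k DNA words. The sketch below shows one common way to build such a feature vector; the choice of k = 3, the lexicographic ordering of k-mers, and skipping windows that contain non-ACGT symbols are assumptions for illustration, not details taken from MetaPCbin.

```python
from itertools import product
from typing import Dict, List

def kmer_frequencies(seq: str, k: int = 3) -> List[float]:
    """Return normalized counts of all 4**k k-mers, in lexicographic order.

    Windows containing symbols other than A, C, G, T (e.g. N) are skipped.
    The vector sums to 1 unless no valid window exists (then all zeros).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index: Dict[str, int] = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

Normalizing by the number of valid windows makes vectors from contigs of different lengths comparable, which is what allows a single model (for example, a neural network) to be applied to contigs of any size.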