Schema-independent scientific data cataloging framework

Nakandala, S; Withana, SD; Kumarasiri, D; Jayawardena, H; Bandara, HMND; Perera, S; Marru, S; Pamidighantam, S

dc.contributor.author	Nakandala, S
dc.contributor.author	Withana, SD
dc.contributor.author	Kumarasiri, D
dc.contributor.author	Jayawardena, H
dc.contributor.author	Bandara, HMND
dc.contributor.author	Perera, S
dc.contributor.author	Marru, S
dc.contributor.author	Pamidighantam, S
dc.date.accessioned	2018-09-21T20:58:02Z
dc.date.available	2018-09-21T20:58:02Z
dc.description.abstract	Modern scientific experiments generate vast volumes of data that are hard to keep track of. Consequently, scientists find it difficult to reuse and share these data sets. We address this problem by developing a schema-independent data cataloging framework for efficient management of scientific data. The proposed solution consists of an agent that automatically identifies new data products and extracts metadata from them, as well as a server that indexes the metadata using a NoSQL database and provides a REST API for querying, sharing, and reusing the data sets. The novelty of our solution lies in the pluggable metadata extraction logic, extensible data product generation monitors, use of a NoSQL database, and the ability to dynamically add new metadata fields. The use of Apache Solr as the backend database enables the proposed solution to index and search data products much faster than a solution based on relational databases. For example, our Apache Solr-based implementation can resolve full-text, sub-string, prefix, and suffix queries 91% to 99% faster than a MySQL-based implementation.	en_US
dc.identifier.department	Department of Computer Science and Engineering	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.place	Moratuwa, Sri Lanka	en_US
dc.identifier.uri	http://dl.lib.mrt.ac.lk/handle/123/13583
dc.identifier.year	2015	en_US
dc.language.iso	en	en_US
dc.subject	indexing; metadata catalog; scientific data management	en_US
dc.title	Schema-independent scientific data cataloging framework	en_US
dc.type	Conference-Abstract	en_US

Collections

MERCon - 2015