Efficient depiction of video for semantic retrieval applications by dimensionality reduction of visual feature space

dc.contributor.advisorRanathunga L
dc.contributor.advisorAbdullah N A
dc.contributor.authorBandara AMRR
dc.date.accept2021
dc.date.accessioned2021
dc.date.available2021
dc.date.issued2021
dc.description.abstractThe retrieval of temporal digital visual data, either by a text or visual query, requires automatic interpretation, which includes high-level annotation by object detection and recognition for text query-based retrieval and low-level abstraction for visual query-based retrieval. Both the accuracy and the speed of the interpretation become crucial factors in real-world applications, due to the high density of visual data. This study focuses on efficiently reducing the complexity of visual data through dimensionality reduction techniques, for the detection and recognition of objects in videos, supporting both textual annotation and visual query-based video frame retrieval. The study contributes three approaches: a novel visual feature descriptor based on colour dithering, the Salient Dither Pattern Feature (SDPF); a novel object segmentation method based on the proposed feature descriptor, Refining Superpixel and Histogram of oriented optical flow Clustering (RSHC); and a novel self-supervised local descriptor, Network-in-Network with Restricted Boltzmann Machine (NIN-RBM). The experimental results show that SDPF is rotation- and scale-invariant and computationally efficient, yet achieves object recognition accuracy similar to state-of-the-art methods with minimal supervision. The results further show that RSHC successfully uses SDPF to segment individual objects accurately from a very shallow history of motion. Furthermore, according to the results, NIN-RBM achieves state-of-the-art correspondence matching performance compared with existing deep-learned self-supervised binary descriptors, while keeping computation time to a minimum. The overall results support the conclusions that RSHC can accurately segment objects in a video and that SDPF can then be used to recognize the segmented objects.
Moreover, NIN-RBM can be used to reliably and rapidly retrieve video frames related to any visual query. Since NIN-RBM is a local descriptor, it can further be used to locate high-level objects and estimate their poses precisely, improving the detail of the semantics retrieved from video data.en_US
dc.identifier.accnoTH5063en_US
dc.identifier.citationBandara, A.M.R.R. (2021). Efficient depiction of video for semantic retrieval applications by dimensionality reduction of visual feature space [Doctoral dissertation, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21175
dc.identifier.degreeDoctor of Philosophyen_US
dc.identifier.departmentDepartment of Information Technologyen_US
dc.identifier.facultyITen_US
dc.identifier.urihttp://dl.lib.uom.lk/handle/123/21175
dc.language.isoenen_US
dc.subjectDIMENSIONALITY REDUCTIONen_US
dc.subjectBINARY DESCRIPTORen_US
dc.subjectCORRESPONDENCE MATCHINGen_US
dc.subjectOBJECT RECOGNITIONen_US
dc.subjectVIDEO SEGMENTATIONen_US
dc.subjectCOLOUR DITHERINGen_US
dc.subjectDEEP LEARNINGen_US
dc.subjectINFORMATION TECHNOLOGY -Dissertationen_US
dc.subjectCOMPUTER SCIENCE -Dissertationen_US
dc.titleEfficient depiction of video for semantic retrieval applications by dimensionality reduction of visual feature spaceen_US
dc.typeThesis-Abstracten_US
