Browsing by Author "Pasqual, AA"
- item: Conference-Extended-Abstract. Augmented reality for mobile devices to show information of exhibits at a museum (2011). Abeysinghe, C; Jayasiri, V; Sandaruwan, K; Wijerathna, B; Pasqual, AA. Augmented Reality (AR) is one of the emerging technologies used to enhance the user experience in many applications. Devices dedicated to specific tasks are being replaced by smartphones, making them a unique platform for implementing all functionalities in one device. In this paper we present a real-time AR framework for mobile devices that takes into account key technical challenges such as limited processing power and battery life. The framework has four separate modules: marker detection, identification, camera pose calculation, and embedding of visual information. A 2D pattern called a marker is used to uniquely identify objects. The marker in the camera view is detected and tracked so that the information contained in the marker can be exploited. The four tracked corners of the marker are used to calculate the 6 DOF camera pose, which is further processed to place 3D graphics on the real scene with accurate rotation and translation. To demonstrate the capabilities of our AR framework we have developed an application for iPhones which highlights significant information in exhibits.
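The pose step described in this abstract, recovering the six degrees of freedom of the camera from the four tracked marker corners, can be illustrated in a few lines. The sketch below is not the authors' implementation; the marker size, corner coordinates, and camera intrinsics are placeholder values.

```python
import numpy as np
import cv2

# Marker corners in marker-local coordinates (metres). The square marker lies on
# the Z = 0 plane, so a plain PnP solve recovers rotation and translation.
marker_size = 0.08  # hypothetical 8 cm marker
object_pts = np.array([[0, 0, 0],
                       [marker_size, 0, 0],
                       [marker_size, marker_size, 0],
                       [0, marker_size, 0]], dtype=np.float32)

# Four tracked corner positions in the camera image (pixels), placeholder values.
image_pts = np.array([[412, 300], [505, 298], [509, 391], [414, 395]], dtype=np.float32)

# Hypothetical pinhole intrinsics of the phone camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation used to place the 3D overlay
print("rotation:\n", R, "\ntranslation:", tvec.ravel())
```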
- item: Article-Full-text. Combined static and motion features for deep-networks-based activity recognition in videos (IEEE, 2019). Ramasinghe, S; Rajasegaran, J; Jayasundara, V; Ranasinghe, K; Rodrigo, R; Pasqual, AA. Activity recognition in videos, in a deep-learning setting or otherwise, uses both static and pre-computed motion components. How to combine the two components while keeping the burden on the deep network low remains uninvestigated. Moreover, it is not clear what the contribution of each individual component is, and how to control that contribution. In this work, we use a combination of CNN-generated static features and motion features in the form of motion tubes. We propose three schemas for combining static and motion components: based on a variance ratio, principal components, and Cholesky decomposition. The Cholesky-decomposition-based method allows the contributions to be controlled. The ratio given by variance analysis of static and motion features matches well with the experimentally optimal ratio used in the Cholesky-decomposition-based method. The resulting activity recognition system is better than or on par with the existing state of the art when tested on three popular datasets. The findings also enable us to characterize a dataset with respect to its richness in motion information.
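One plausible reading of the controllable fusion idea is sketched below: each feature block is whitened with the Cholesky factor of its own covariance and then blended with a mixing ratio. This is an illustrative sketch only; the feature dimensions, the ratio, and the whitening choice are assumptions, not the paper's exact formulation.

```python
import numpy as np

def whiten(X, eps=1e-6):
    """Whiten features X (n_samples x dim) with the Cholesky factor of their covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False) + eps * np.eye(X.shape[1])
    L = np.linalg.cholesky(cov)            # cov = L @ L.T
    return np.linalg.solve(L, Xc.T).T      # decorrelated, unit-variance features

rng = np.random.default_rng(0)
static_feats = rng.normal(size=(256, 128))   # e.g. CNN features per clip (placeholder)
motion_feats = rng.normal(size=(256, 64))    # e.g. motion-tube features (placeholder)

alpha = 0.7  # contribution of the static stream; (1 - alpha) goes to motion
fused = np.hstack([alpha * whiten(static_feats),
                   (1.0 - alpha) * whiten(motion_feats)])
print(fused.shape)  # (256, 192) fused descriptor fed to the classifier
```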
- item: Thesis-Abstract. Content-based image retrieval using large centre regions. Senaratne, RS; Pasqual, AA. Among all the visual features used for content-based image retrieval, colour is perhaps the most dominant and distinguishing one in many applications. This research project therefore focused on the colour property of images. In this work, a new histogram refinement technique, Large Centre Regions (LCR) Refinement, and a new region representation technique, LCR Sets, based on colour regions are presented. These methods extract a selected number of the largest regions around the centre of the image and match other images with emphasis on this property. Two assumptions are made. The first is that the significant objects or items of an image are often located at the centre and can often be characterized by their colour; hence an image retrieval technique which extracts the colours of large centre regions of an image would improve the retrieval performance for images with significant objects at the centre. The second is that, although the techniques were tested on an image database predominantly consisting of red images, they perform similarly for other colours as well. The presented histogram refinement descriptor, the Large-Centre-Regions Vector, effectively represents the large centre regions of an image. In addition, LCR Sets represent basic information about the shape of a region. In the prototype, all the regions in an image were first extracted based on the colour similarity of the pixels. A centre zone was defined on the image, and a selected number of the largest regions which overlap this centre zone by at least 50% of the region area were selected as the Large Centre Regions forming the basis for histogram refinement. In addition to the large centre regions, LCR Sets represent the areas of a selected number of the largest regions lying outside the centre zone and the width-to-height ratio of the minimum bounding rectangle of each region. Since the largest regions at the centre are emphasized for matching, the effect of the background can also be minimized, because most of the background often lies outside the centre zone. Extra capability to distinguish among different images can be achieved with LCR Sets. Experimental results of LCR Refinement show much improved retrieval performance, especially for images with significant regions at the centre. Results show a 20% average improvement in ranks with LCR Refinement compared to the histogram. By combining LCR Sets with either the histogram or LCR Refinement, this can be further improved to 26% or 22%, respectively.
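The region-selection rule described above, keeping the largest colour regions whose area overlaps the centre zone by at least 50%, can be sketched as follows. The label map is assumed to come from a colour-based segmentation step that is not shown; the zone size and the number of regions kept are placeholder parameters.

```python
import numpy as np

def large_centre_regions(labels, num_regions=5, zone_frac=0.5):
    """Return labels of the largest regions with >= 50% of their area inside the centre zone.

    `labels` is an integer map from a colour-based segmentation (0 = unlabelled).
    """
    h, w = labels.shape
    zh, zw = int(h * zone_frac), int(w * zone_frac)
    top, left = (h - zh) // 2, (w - zw) // 2
    zone = np.zeros_like(labels, dtype=bool)
    zone[top:top + zh, left:left + zw] = True

    selected = []
    for r in np.unique(labels):
        if r == 0:
            continue
        region = labels == r
        area = region.sum()
        if (region & zone).sum() >= 0.5 * area:       # the 50% centre-overlap test
            selected.append((area, r))
    return [r for _, r in sorted(selected, reverse=True)[:num_regions]]

# Toy 8x8 label map with two colour regions; region 1 sits at the centre.
toy = np.zeros((8, 8), dtype=int)
toy[2:6, 2:6] = 1
toy[0:2, 6:8] = 2
print(large_centre_regions(toy, num_regions=3))   # -> [1]
```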
- item: Thesis-Abstract. Enhancing the use of School Net for learning and teaching (2014-08-08). Priyashantha, WC; Pasqual, AA. School Net is a network connecting secondary schools in Sri Lanka, which has been set up to support the effective use of information and communication technologies (ICT), particularly the Internet, for enhancing primary and secondary school education and for encouraging greater communication and co-operation among school communities. In the school learning environment, teachers play a central role, and the objective of this research is to assess whether School Net has made a positive impact on this role. Evaluating the impact of School Net on teachers, and the strategies they employ to facilitate the environment, is critical. The research made use of a questionnaire to identify areas where School Net could have made a difference and includes an in-depth analysis of the level of significance of such impact. The analysis shows a few relationships that need to be highlighted, as improvements in these areas could lead to a more efficient and effective use of School Net for teaching and learning activities. A number of hypotheses, which led to the development of the questionnaire, were evaluated based on the collected data. These evaluations, a detailed analysis of the results, and a set of recommendations are presented in this thesis. Three of the relationships that came to light are highlighted due to their significance: the medium of information used in School Net, School Net utilization across provinces, and the use of School Net by teacher gender. There is an identifiable relationship between the use of School Net and the medium or media used in it. All three languages (i.e. Sinhala, Tamil and English) should be used for the contents of School Net while localizing its operating environment; this indicates the importance of local language use in ICT. Teachers who have undergone ICT training use School Net more than others, indicating the need for well-structured training programmes. Even though the study provided greater insight into the ways School Net is currently being used in teaching and learning activities in schools, it should be noted that the sample size does not permit generalized conclusions. A detailed survey which takes provincial aspects into account needs to be undertaken as a continuation of this work.
- item: Conference-Abstract. FPGA based custom accelerator architecture framework for complex event processing. Ekanayaka, KUB; Pasqual, AA. Complex Event Processing (CEP) is an emerging field in the high-performance computing paradigm where real-time (low-latency) computing capability is expected together with big-data processing (high throughput). A significant number of software architectures have been developed to improve throughput while reducing latency, but maintaining both aspects reaches the limits of software platforms. This paper proposes a novel custom hardware accelerator architecture framework for CEP in the big data domain. The proposed design improves throughput by more than 10 times over the software counterpart while keeping the latency below 100 nanoseconds. The same Structured Query Language (SQL)-type queries used in the reference software architecture were used to improve flexibility. A query compiler based on the same query language grammar was designed to convert the queries into Hardware Description Language (HDL) modules. All modules were parameterized to improve the scalability of the design. The generated modules were synthesized with vendor tools and programmed onto a Field Programmable Gate Array (FPGA) platform to implement the system. The proposed hardware architecture framework was verified using a sensor network dataset from a football field, and the results were compared with the software counterpart to show the performance improvement.
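A software reference of the kind of SQL-style filter query the compiler turns into HDL might look like the sketch below. The event schema, field names, and threshold are hypothetical; the real system generates parameterized hardware modules rather than running Python.

```python
# Software reference model of a simple CEP filter query, roughly
#   SELECT player, speed FROM SensorStream WHERE speed > 25.0
# The hardware framework compiles such queries into HDL modules; this sketch only
# mirrors the query semantics so hardware output can be checked against it.
def cep_filter(stream, threshold=25.0):
    for event in stream:                      # event: one dict per sensor reading
        if event["speed"] > threshold:        # the WHERE predicate
            yield {"player": event["player"], "speed": event["speed"]}

sensor_stream = [
    {"player": "A7", "speed": 18.2},
    {"player": "B3", "speed": 27.9},
    {"player": "A7", "speed": 31.4},
]
print(list(cep_filter(sensor_stream)))
```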
- item: Conference-Abstract. FPGA implementation of normalized cross-correlation for real-time template matching in dynamic search windows (2011). Cabral, A; Pasqual, AA. In computer vision, cross-correlation is a standard approach to local feature matching in feature tracking applications. Normalized cross-correlation (NCC) is implemented in the spatial domain because it does not have a simple and efficient frequency-domain expression. Therefore, in applications which demand real-time processing, a dedicated hardware implementation of NCC is essential to meet the computational cost in the spatial domain. In this paper we present a Field Programmable Gate Array (FPGA) implementation of NCC based on a multi-port memory controller together with Fast Normalized Cross-Correlation. Practical experimentation shows that our system can achieve frame rates close to 100 for a search window size of 100x100 and a template size of 15x15, using only two dual-port memories.
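A direct spatial-domain reference model of the normalized cross-correlation being accelerated is shown below; on the FPGA the window sums are computed in parallel, but the arithmetic is the same. The sizes match the paper's 100x100 window and 15x15 template, while the data itself is synthetic.

```python
import numpy as np

def ncc(window, template):
    """Zero-mean normalized cross-correlation of a template over a search window."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    H, W = window.shape
    scores = np.full((H - th + 1, W - tw + 1), -1.0)
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = window[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum()) * t_norm
            if denom > 0:
                scores[y, x] = (p * t).sum() / denom
    return scores

rng = np.random.default_rng(1)
search = rng.random((100, 100))          # 100x100 search window, as in the paper
tmpl = search[40:55, 30:45].copy()       # 15x15 template cut from the window
scores = ncc(search, tmpl)
print(np.unravel_index(np.argmax(scores), scores.shape))   # -> (40, 30)
```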
- item: Conference-Abstract. FPGA implementation of sound signal localization (2010). Shazni, MNM; Perera, DYC; Maduranga, DAK; Dhileeban, R; Pasqual, AA. This paper proposes an algorithm which can be used to compute the direction of arrival of a sound source in real time. Compared to existing algorithms, this algorithm takes less time to compute and is therefore well suited to real-time applications. The algorithm essentially computes the Time Delay of Arrival (TDOA), which is subsequently used to find the direction of the sound source. The algorithm results in good accuracy; our experimentation showed the accuracy to be around 8 degrees. Statistical analysis further revealed that the error is most often distributed around 0 degrees.
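The TDOA-to-angle step can be sketched as follows for one microphone pair; the sampling rate, microphone spacing, and test delay are hypothetical values, and the paper's version computes the correlation in FPGA hardware.

```python
import numpy as np

fs = 48_000          # sampling rate (Hz), placeholder
d = 0.20             # microphone spacing (m), placeholder
c = 343.0            # speed of sound (m/s)

# Synthetic test: the same burst arrives 5 samples later at microphone 2.
rng = np.random.default_rng(2)
src = rng.normal(size=1024)
delay = 5
mic1 = src
mic2 = np.concatenate([np.zeros(delay), src[:-delay]])

# Cross-correlate, convert the peak lag to a time delay, then to an angle.
xcorr = np.correlate(mic1, mic2, mode="full")
lag = (len(mic1) - 1) - np.argmax(xcorr)      # in samples; positive => sound hit mic1 first
tau = lag / fs                                # time delay of arrival (s)
angle = np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0)))
print(f"lag = {lag} samples, estimated DOA = {angle:.1f} degrees")
```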
- item: Conference-Abstract. FPGA-based compact and flexible architecture for real-time embedded vision systems. Samarawickrama, M; Pasqual, AA; Rodrigo, BKRP. A single-chip FPGA implementation of a vision core is an efficient way to design fast and compact embedded vision systems from the PCB design level. The scope of the research is to design a novel FPGA-based parallel architecture for embedded vision entirely with on-chip FPGA resources. We designed it by utilizing block RAMs and IO interfaces on the FPGA. As a result, the system is compact, fast and flexible. We evaluated this architecture for several mid-level neighbourhood algorithms using a Xilinx Virtex-2 Pro (XC2VP30) FPGA. The vision core runs with a 100 MHz system clock and supports image processing on a low-resolution image of 128x128 pixels at up to 200 images per second. The results are accurate, and we have compared them with existing FPGA implementations. The performance of the algorithms could be substantially improved by applying sufficient parallelism.
- item: Conference-Extended-Abstract. FPGA-based system-on-chip architecture for real-time embedded vision systems (2009). Samarawickrama, M; Pasqual, AA; Rodrigo, R. A single-chip FPGA implementation of a vision core is an efficient way to design fast and compact embedded vision systems from the PCB design level. The scope of the research is to design a novel FPGA-based parallel architecture entirely with on-chip FPGA resources. We designed it by utilizing block RAMs and IO interfaces on the FPGA. As a result, the system is compact, fast and flexible. We tested this architecture on spatial-domain filtering algorithms using a Xilinx Virtex-2 Pro (XC2VP30) FPGA. The vision core runs with a 100 MHz system clock and supports image processing on a low-resolution image of 128x128 pixels at up to 200 images per second. The results are accurate, and the implementation is as fast as the fastest FPGA implementations reported to date. The performance of the algorithms could be substantially improved by applying sufficient parallelism.
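The spatial-domain filtering performed by the streaming vision core in these two items can be modelled in software as below. The sketch assumes a line-buffered design in which the two previous image rows are held in block RAM so that each arriving pixel completes one 3x3 neighbourhood operation; the averaging kernel and frame size are placeholders, not the papers' exact configuration.

```python
import numpy as np

def stream_filter_3x3(frame, kernel):
    """Software model of a line-buffered 3x3 neighbourhood filter.

    Pixels are consumed in raster order; only two previous rows are kept,
    mimicking on-chip block-RAM line buffers.
    """
    h, w = frame.shape
    out = np.zeros((h - 2, w - 2))
    line_buf = [np.zeros(w), np.zeros(w)]          # the two previous rows
    for y in range(h):
        row = frame[y].astype(float)
        if y >= 2:
            window_rows = np.vstack([line_buf[0], line_buf[1], row])
            for x in range(1, w - 1):
                out[y - 2, x - 1] = (window_rows[:, x - 1:x + 2] * kernel).sum()
        line_buf = [line_buf[1], row]              # shift the line buffers
    return out

frame = np.random.default_rng(3).integers(0, 256, size=(128, 128))
mean_kernel = np.ones((3, 3)) / 9.0
print(stream_filter_3x3(frame, mean_kernel).shape)   # (126, 126)
```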
- item: Conference-Abstract. A generalized preprocessing and feature extraction platform for scalp EEG signals on FPGA. Wijesinghe, LP; Wickramasuriya, DS; Pasqual, AA. Brain-computer interfaces (BCIs) require real-time feature extraction to translate input EEG signals recorded from a subject into an output command or decision. Owing to the inherent difficulties in EEG signal processing and neural decoding, many of the feature extraction algorithms are complex and computationally demanding. Software does exist to perform real-time feature extraction and classification of EEG signals; however, the requirement of a personal computer is a major obstacle in bringing these technologies to home and mobile users with ease of use. We present the FPGA design and novel architecture of a generalized platform that provides a set of predefined features and preprocessing steps that can be configured by a user for BCI applications. The preprocessing steps include power-line noise cancellation and baseline removal, while the feature set includes a combination of linear and nonlinear, univariate and bivariate measures commonly utilized in BCIs. We compare our results with software and also validate the platform by implementing a seizure detection algorithm on a standard dataset, obtaining a classification accuracy of over 96%. A gradual transition of BCI systems to hardware would prove beneficial in terms of compactness, power consumption and much faster response to stimuli.
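The preprocessing chain described, power-line noise cancellation followed by baseline removal and a simple feature, might be modelled in software as below. The 50 Hz notch frequency, sampling rate, and band edges are assumptions; the platform itself implements configurable fixed-point versions of such steps in FPGA logic.

```python
import numpy as np
from scipy import signal

fs = 256.0                                   # sampling rate (Hz), placeholder
rng = np.random.default_rng(4)
eeg = rng.normal(size=int(10 * fs))          # 10 s of synthetic single-channel EEG
eeg += 0.5 * np.sin(2 * np.pi * 50 * np.arange(eeg.size) / fs)   # 50 Hz mains hum

# Power-line noise cancellation with a notch filter at 50 Hz.
b_notch, a_notch = signal.iirnotch(w0=50.0, Q=30.0, fs=fs)
clean = signal.filtfilt(b_notch, a_notch, eeg)

# Baseline (slow drift) removal with a first-order high-pass at 0.5 Hz.
b_hp, a_hp = signal.butter(1, 0.5, btype="highpass", fs=fs)
clean = signal.filtfilt(b_hp, a_hp, clean)

# Example univariate feature: power in the 8-13 Hz band via Welch's method.
freqs, psd = signal.welch(clean, fs=fs, nperseg=512)
band = (freqs >= 8) & (freqs <= 13)
alpha_power = np.trapz(psd[band], freqs[band])
print(f"alpha band power: {alpha_power:.4f}")
```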
- item: Conference-Abstract. GPU based non-overlapping multi-camera vehicle tracking. Gamage, TD; Samarawickrama, JG; Pasqual, AA. Vehicle tracking and surveillance is an area that has received considerable attention in the context of security and safety. The detection and tracking of moving vehicles through multiple cameras is considered a method of vehicle surveillance. This work addresses the problem of detecting and matching vehicles across multiple cameras. The power of GPUs is used to increase the number of video streams which can be processed using a single computer. In the detection process the Gabor filter is used as a directional filter, and SURF is used by the matcher to uniquely represent each vehicle.
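The directional-filtering step can be illustrated with a plain Gabor filter bank. This sketch builds the kernels in NumPy (the paper runs the filters on the GPU) and leaves out the SURF-based matcher; the orientation count and kernel parameters are placeholders.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta, ksize=21, sigma=4.0, lambd=10.0, gamma=0.5):
    """Real part of a Gabor kernel oriented at angle theta (radians)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lambd)

rng = np.random.default_rng(5)
frame = rng.random((120, 160))                       # placeholder grey-scale frame

# Bank of four orientations; the strongest response per pixel hints at edge direction.
responses = [convolve2d(frame, gabor_kernel(t), mode="same")
             for t in np.deg2rad([0, 45, 90, 135])]
dominant = np.argmax(np.abs(np.stack(responses)), axis=0)
print(dominant.shape)                                # (120, 160) orientation index map
```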
- item: Conference-Abstract. Hardware assisted IP stack (2011). Ellawala, NM; Kehelwala, KGDC; Koggalahewa, NA; Kotagodahetti, RPK; Pasqual, AA. TCP/IP performance improvement is a major concern in low-latency network applications. The network interface speed is a significant factor when working with applications that require high speeds, such as datacenter applications in cloud computing environments. This paper presents the processing speed improvement of the IP layer of the TCP/IP stack, realized mainly through a hardware implementation of the IP layer functions. The system is implemented on an FPGA-based hardware platform. Our main objective is to study the achievable performance gains of implementing a scalable hardware-based IP layer.
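The abstract does not list which IP-layer functions were moved to hardware, but one function that lends itself naturally to offload is the header checksum. A software reference of the ones'-complement sum, useful for checking a hardware implementation against, is sketched below with an illustrative header.

```python
def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement sum over 16-bit words, as required for the IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)      # fold the carry back in
    return (~total) & 0xFFFF

# 20-byte header with its checksum field (bytes 10-11) zeroed before computing.
sample = bytes.fromhex("4500003c1c4640004006" + "0000" + "ac100a63ac100a0c")
print(hex(ipv4_checksum(sample)))   # -> 0xb1e6
```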
- item: Conference-Abstract. HEVC inverse transform architecture utilizing coefficient sparsity. Abeydeera, M; Pasqual, AA. The inverse transform function of the HEVC decoder has grown greatly in complexity with the addition of larger transform sizes, and recent works have focused on efficient architectures that can achieve the required throughput. In this work we make the observation that a majority of the coefficients in a typical transform operation are zero and therefore have no impact on the final outcome. We propose an architecture that can efficiently operate on such sparse matrices and introduce a scheduling strategy which completes a 2D IDCT with the bare minimum of iterations, with the added advantage of being able to integrate seamlessly with the entropy decoder without a coefficient reordering buffer. Experiments show that although the performance of this approach scales with the bit rate, a 120 MHz operating frequency is sufficient to handle QHD at 48 Mbps, which is less than one third of the frequency required by prior work.
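The sparsity observation can be illustrated with a separable 2D inverse transform that simply skips all-zero coefficient rows. The orthonormal DCT basis below is a stand-in for the HEVC integer transform, and the block size and coefficient pattern are placeholders.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis (stand-in for the HEVC integer transform)."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    T = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    T[0, :] = np.sqrt(1.0 / n)
    return T

def sparse_idct2(coeffs, T):
    """2D inverse transform that only processes non-zero coefficient rows."""
    nz = np.flatnonzero(np.any(coeffs != 0, axis=1))     # rows worth computing
    partial = T[nz, :].T @ coeffs[nz, :]                  # column pass over nz rows only
    return partial @ T                                    # row pass

n = 32
T = dct_matrix(n)
coeffs = np.zeros((n, n))
coeffs[:4, :4] = np.random.default_rng(6).normal(size=(4, 4))   # typical sparse block

full = T.T @ coeffs @ T
print(np.allclose(sparse_idct2(coeffs, T), full))   # True: skipping zero rows is lossless
```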
- item: Thesis-Abstract. High performance parallel packet classification architecture with popular rule caching (2015-07-09). Lakshitha, OGS; Pasqual, AA. As the Internet evolves, novel services and applications are being introduced which require different levels of Quality of Service, or non-best-effort service, for proper functionality. Thus, there is a requirement for future networking equipment to distinguish between traffic flows belonging to different applications. The enabling function for such differentiation is multi-field packet classification. Traditional software and hardware approaches to multi-field packet classification are being challenged by the exponential growth of Internet traffic and data rates. Current growth rates of silicon technologies will not be able to handle the growth rates of Internet traffic, data rates, and rule database storage requirements in the future. Even though much research has been done in this area, packet classification technologies that support scalability in both line rates and rule sets are scarce. We try to address these issues through a hardware architectural approach to packet classification. We identify that classifying multiple packet streams simultaneously, by utilizing the immense parallelism offered by modern hardware technologies while sharing a common rule database among several packet classification modules, is the solution to the ever-widening gap between Internet data rates and silicon speeds. The main contribution of this work is the design and implementation of a packet classification architecture with the following characteristics: scalability in terms of both throughput and number of rules; the capability of classifying parallel packet streams simultaneously; and the capability of using the temporal locality of Internet traffic to increase classification throughput by identifying classification rules which are popular among incoming packets and caching them in private caching entities in the classification modules, to avoid contention at the shared rule database. Simulation results revealed that the proposed architecture is capable of achieving a throughput of more than 200 Gbps for the worst-case packet size of 40 bytes. The proposed architecture was implemented on the NetFPGA platform and classification was done at full line rate.
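The popular-rule caching idea can be modelled in a few lines: each classification engine keeps a small private cache of recently matched flows and only falls back to the shared rule table on a miss. The rule format, cache size, and flow key below are illustrative; the thesis implements this in hardware on NetFPGA.

```python
from collections import OrderedDict

# Shared rule table scanned linearly on a cache miss; a rule matches a flow if every
# listed field matches. Real rules would carry prefixes, port ranges and priorities.
RULES = [
    ({"src": "10.0.0.1", "dst": "192.168.1.5", "proto": 6}, "allow"),
    ({"proto": 17}, "rate-limit"),
    ({}, "drop"),                                       # default (matches everything)
]

class Engine:
    """One classification engine with a private cache of popular rules."""
    def __init__(self, cache_size=64):
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def _shared_lookup(self, flow):
        for fields, action in RULES:                    # contended shared-table access
            if all(flow.get(k) == v for k, v in fields.items()):
                return action

    def classify(self, flow):
        key = tuple(sorted(flow.items()))
        if key in self.cache:                           # hit: stays inside the engine
            self.cache.move_to_end(key)
            return self.cache[key]
        action = self._shared_lookup(flow)              # miss: go to the shared table
        self.cache[key] = action
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)              # evict least-recently-used entry
        return action

e = Engine()
pkt = {"src": "10.0.0.1", "dst": "192.168.1.5", "proto": 6}
print(e.classify(pkt), e.classify(pkt))                 # second call served from cache
```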
- item: Conference-Full-text. High performance software application acceleration using field programmable gate arrays (2010). Lakshan, HMRD; Liyanage, CM; Perera, TTD; Wijesundera, DSS; Pasqual, AA. From the early forms of foreign trade, which consisted of the direct exchange of commodities, financial exchanges have evolved to process transactions using high-end computer systems. In highly competitive financial markets, low latency and high throughput have become the utmost concern for all solution providers for financial exchanges. The Financial Information eXchange (FIX) protocol is one of the most commonly used protocols in financial trading systems. The processing of this protocol is currently done in software, and the advancements have been such that data processing in software has reached saturation; solution providers for stock exchanges are now researching possibilities for improving latency and throughput. The high level of parallelism in hardware implementations compared to software has made hardware the only feasible solution for this increasingly high demand. Thus, our solution is an implementation of the FIX protocol on Field Programmable Gate Arrays (FPGAs), which offloads the processing of the FIX protocol to an FPGA board interfaced through PCI Express. This processing core, successfully implemented on a Xilinx Virtex 5 FPGA, consists of a decoder and an encoder for version 4.2 of the FIX protocol. It processes 5 million messages per second for encoding and 3.8 million messages per second for decoding, with latencies of only 170-330 nanoseconds for encoding and 180-360 nanoseconds for decoding, whereas the best figures obtained so far with the software approach are a throughput of 20,000 messages per second and a latency of 50 microseconds.
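A software reference for the FIX field parsing that the FPGA decoder performs might look like this; the sample message and the subset of tags shown are illustrative only, and the checksum is not recomputed.

```python
SOH = "\x01"   # FIX field delimiter

def decode_fix(message: str) -> dict:
    """Split a FIX message into a tag -> value dictionary."""
    fields = {}
    for field in message.strip(SOH).split(SOH):
        tag, _, value = field.partition("=")
        fields[int(tag)] = value
    return fields

# Illustrative FIX 4.2 New Order Single.
msg = SOH.join([
    "8=FIX.4.2", "35=D", "49=SENDER", "56=TARGET",
    "55=ABC", "54=1", "38=100", "44=25.50", "10=000",
]) + SOH

decoded = decode_fix(msg)
print(decoded[35], decoded[55], decoded[38])   # message type, symbol, quantity
```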
- item: Conference-Abstract. Image filtering with MapReduce in pseudo-distribution mode (2015-08-14). Gamage, TD; Samarawickrama, JG; Rodrigo, R; Pasqual, AA. The massive volume of video and image data compels them to be stored in a distributed file system. To process the data stored in the distributed file system, Google proposed a programming model named MapReduce. Existing methods of processing images held in such a distributed file system require the whole image, or a substantial portion of it, to be streamed every time a filter is applied. In this work an image filtering technique using the MapReduce programming model is proposed which requires the image to be streamed only once. The proposed technique extends to a cascade of image filters under the constraint of a fixed kernel size. To verify the proposed technique for a single filter, a median filter is applied to an image with salt-and-pepper noise. In addition, a corner detection algorithm is implemented with the use of a filter cascade. Comparison of the results of noise filtering and corner detection with the corresponding CPU version shows the accuracy of the method.
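The single-pass idea, splitting the image into row blocks that carry a small halo of overlapping rows so that each mapper can filter its block independently, can be sketched outside Hadoop as follows. The block size, the fixed 3x3 kernel, and the use of plain functions in place of actual map and reduce tasks are assumptions of this sketch.

```python
import numpy as np

def median3x3(block):
    """Plain 3x3 median filter over one block (valid region only)."""
    h, w = block.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = np.median(block[y:y + 3, x:x + 3])
    return out

def map_phase(image, block_rows=32, halo=1):
    """Emit (block_index, filtered_block); the 1-row halo means the image only
    needs to be streamed once for a fixed 3x3 kernel."""
    h = image.shape[0]
    for i, start in enumerate(range(0, h, block_rows)):
        lo = max(start - halo, 0)
        hi = min(start + block_rows + halo, h)
        yield i, median3x3(image[lo:hi])

def reduce_phase(pairs):
    """Concatenate the filtered blocks back into one image, in block order."""
    return np.vstack([block for _, block in sorted(pairs)])

noisy = np.random.default_rng(7).integers(0, 256, size=(128, 128))
print(reduce_phase(map_phase(noisy)).shape)   # (126, 126), same as a single-pass filter
```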
- item: Thesis-Abstract. Intuitive reasoning for epistemic uncertainty (2016-05-25). Senaratne, DN; Pasqual, AA; Jayasinghe, JAKS; Kulasekera, EC. Epistemic uncertainty, characterized by subjectiveness and partial availability of information, is associated with the domains of multi-sensor fusion, evidence processing, etc. The Mathematical Theory of Evidence, pioneered by Glenn Shafer [1], is a branch of study seeking to analytically model and manipulate the epistemic uncertainty entertained by an agent. The field is relatively young and cluttered with schemes that are used non-cohesively and often counter-intuitively. The issue is made worse because none of the schemes seems capable of fully modelling the uncertainty met in practice. Interestingly, humans are capable of intuitively handling such uncertainty in statements in their day-to-day activities. It is apparent that the reason for the deficiencies in existing models is the unjustified nature of their application approaches. This research seeks to enhance the intuitiveness and flexibility of the mathematical modelling of epistemic uncertainty. It identifies three aspects which any model should address cautiously: the manner in which real-world propositions are mathematically represented, the manner in which the uncertainty entertained by an agent is conveyed as a number assignment, and the manner in which expressed uncertainty is combined and conditioned. Novel strategies that parallel the way humans reason, and hence enhance intuitiveness, are introduced to overcome shortcomings in the existing mathematical representation. The research further proposes representing support as functions of propositions which are formed using a Boolean algebra on a set of hypotheses representing the context. The handling of some complicated propositions is simplified by introducing N-of, a logical operator formalizing the human notion 'N of n'. The research also proposes methods that can be used to select an appropriate combination strategy, based on the contextual relationships between the frames, for a given body of evidence. It is further noted that the counter-intuitive results obtained using the existing combination functions are a direct consequence of being unaware of this relationship. The latter part of the research focuses on the concept of pre-conditioning, where evidence is conditioned based on external information one deems certain. It also examines how comparable evidence may be averaged based on pre-determined weights. Although a couple of novel concepts are introduced, they retain backward compatibility with what is already established in this domain. Being modular, the proposed schemes can be selectively integrated with existing techniques.
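For readers unfamiliar with the baseline being extended, the classical combination rule from the Mathematical Theory of Evidence can be written in a few lines. The frame of discernment and the two mass functions below are toy values, and the thesis's own combination strategies are not reproduced here.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions given as {frozenset: mass}."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                       # mass falling on the empty set
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Toy frame {rain, sun}: one source leans towards rain, the other is undecided.
m1 = {frozenset({"rain"}): 0.7, frozenset({"rain", "sun"}): 0.3}
m2 = {frozenset({"sun"}): 0.4, frozenset({"rain", "sun"}): 0.6}
print(dempster_combine(m1, m2))
```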
- item: Conference-Abstract. Layered depth image based HEVC multi-view codec. Kirshanthan, S; Lajanugen, L; Panagoda, PND; Wijesinghe, LP; De Silva, DVSX; Pasqual, AA. Multi-view video has gained widespread popularity in recent years. 3DTV, surveillance, immersive teleconferencing and free-viewpoint television are a few notable applications of multi-view video. Excessive storage and transmission bandwidth requirements are the major challenges faced by the industry in facilitating multi-view video applications. This paper presents efficient tools for coding multi-view video based on the state-of-the-art single-view video coding standard H.265/HEVC (High Efficiency Video Coding). Our approach employs the LDI (Layered Depth Image) representation technique, which is capable of compactly representing 3D scene content. We propose techniques and algorithms for LDI construction, view synthesis, and efficient coding of LDI layers and the associated auxiliary information. Subjective assessments indicate that our approach offers more than a 50% reduction in bitrate compared to HEVC simulcast for the same subjective quality at practical operating bitrates.
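The core of the LDI representation is that each pixel of the reference view stores a small stack of depth layers rather than a single sample; a minimal data-structure sketch follows. The depth-merging threshold and the assumption that side-view samples have already been projected into the reference view are simplifications of the paper's pipeline, not its actual algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class LayeredDepthPixel:
    """All samples seen along one ray of the reference view, ordered near-to-far."""
    layers: list = field(default_factory=list)   # each entry: (depth, colour)

    def insert(self, depth, colour, depth_threshold=0.05):
        # Merge samples that land on (almost) the same surface; keep distinct layers
        # for surfaces occluded in the reference view but visible from side views.
        for d, _ in self.layers:
            if abs(d - depth) < depth_threshold:
                return                            # surface already represented
        self.layers.append((depth, colour))
        self.layers.sort(key=lambda s: s[0])

ldp = LayeredDepthPixel()
ldp.insert(1.20, (200, 180, 160))     # sample visible in the reference view
ldp.insert(1.22, (201, 181, 159))     # same surface seen from a side view: merged
ldp.insert(2.50, (40, 40, 45))        # occluded background revealed by a side view
print(ldp.layers)                     # two layers survive
```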
- item: Conference-Abstract. Line rate parallel packet classification module for NetFPGA platform. Gamage, S; Pasqual, AA. Multi-field packet classification is the enabling function for many novel and emerging network applications. The exponential growth of Internet traffic and classification rule sets demands novel hardware-based architectural approaches to packet classification. Even though this is an intensely studied area, packet classification that supports scalability in both line rates and rule sets is scarce. In this paper we present the experience gained while implementing a parallel packet classification engine architecture on a popular reconfigurable router platform. The architecture exploits the parallelism offered by modern hardware technologies to classify multiple packets simultaneously and increase throughput. The architecture is also capable of utilizing the temporal locality present in Internet traffic to increase throughput. The architecture was implemented on the NetFPGA platform and packet classification was done at full line rate without degrading the data rate or the round-trip time.
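A software reference of the multi-field matching that each parallel engine performs is sketched below; the rule fields (source and destination prefixes, destination port range, protocol) and the example rules are illustrative, and the hardware classifies several packets against the shared rule set concurrently rather than one at a time as here.

```python
import ipaddress

# Each rule: (priority, src_prefix, dst_prefix, dst_port_range, proto, action)
RULES = [
    (1, "10.0.0.0/8",  "0.0.0.0/0",      (80, 80),    6,    "web-queue"),
    (2, "0.0.0.0/0",   "192.168.0.0/16", (0, 65535),  17,   "voip-queue"),
    (3, "0.0.0.0/0",   "0.0.0.0/0",      (0, 65535),  None, "best-effort"),
]

def classify(pkt):
    """Return the action of the highest-priority rule matching all fields."""
    src = ipaddress.ip_address(pkt["src"])
    dst = ipaddress.ip_address(pkt["dst"])
    for _, src_p, dst_p, (lo, hi), proto, action in sorted(RULES):
        if (src in ipaddress.ip_network(src_p)
                and dst in ipaddress.ip_network(dst_p)
                and lo <= pkt["dport"] <= hi
                and (proto is None or proto == pkt["proto"])):
            return action

pkt = {"src": "10.1.2.3", "dst": "172.16.0.9", "dport": 80, "proto": 6}
print(classify(pkt))    # -> "web-queue"
```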