Browsing by Author "Ambegoda, TD"
Now showing 1 - 3 of 3
- item (Conference-Full-text): Adversarial learning to improve question image embedding in medical visual question answering (IEEE, 2022-07)
  Silva, K; Maheepala, T; Tharaka, K; Ambegoda, TD; Rathnayake, M; Adhikariwatte, V; Hemachandra, K
  Visual Question Answering (VQA) is a computer vision task in which a system produces an accurate answer to a given image and a question relevant to that image. Medical VQA can be considered a subfield of general VQA that focuses on images and questions in the medical domain. The most crucial task of a VQA model is to learn a question-image joint representation that reflects the information related to the correct answer. Despite significant recent progress on general VQA models, medical VQA remains difficult due to the ineffectiveness of question-image embeddings. To address this problem, we propose a new method for training VQA models that uses adversarial learning to improve the question-image embedding, and we illustrate how this embedding can serve as the ideal embedding for answer inference. For adversarial learning, we use two embedding generators (a question-image embedding generator and a question-answer embedding generator) and a discriminator that differentiates the two embeddings. The question-answer embedding is treated as the ideal embedding, and the question-image embedding is improved with reference to it. The experimental results indicate that pre-training the question-image embedding generation module using adversarial learning improves overall performance, demonstrating the effectiveness of the proposed method.
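The adversarial setup described in this abstract (two embedding generators plus a discriminator that tries to tell their outputs apart) can be illustrated with a minimal PyTorch sketch. All module names, dimensions, and the GAN-style loss below are assumptions for illustration, not the authors' implementation:

```python
# Minimal sketch of the adversarial pre-training idea (hypothetical module
# names and dimensions; not the authors' code).
import torch
import torch.nn as nn

EMB = 256  # assumed joint-embedding size

class QIEmbedder(nn.Module):
    """Question-image embedding generator (the one being improved)."""
    def __init__(self, q_dim=300, i_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(q_dim + i_dim, EMB), nn.ReLU(),
                                 nn.Linear(EMB, EMB))
    def forward(self, q, img):
        return self.net(torch.cat([q, img], dim=-1))

class QAEmbedder(nn.Module):
    """Question-answer embedding generator (treated as the 'ideal' embedding)."""
    def __init__(self, q_dim=300, a_dim=300):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(q_dim + a_dim, EMB), nn.ReLU(),
                                 nn.Linear(EMB, EMB))
    def forward(self, q, ans):
        return self.net(torch.cat([q, ans], dim=-1))

class Discriminator(nn.Module):
    """Tries to tell question-answer embeddings from question-image ones."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(EMB, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, emb):
        return self.net(emb)  # raw logit: high means 'question-answer'

qi, qa, disc = QIEmbedder(), QAEmbedder(), Discriminator()
opt_g = torch.optim.Adam(qi.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

# Toy batch of pre-extracted question, image, and answer features.
q, img, ans = torch.randn(8, 300), torch.randn(8, 512), torch.randn(8, 300)

# Discriminator step: label QA embeddings 1 ('ideal') and QI embeddings 0.
d_loss = (bce(disc(qa(q, ans)), torch.ones(8, 1)) +
          bce(disc(qi(q, img).detach()), torch.zeros(8, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: push QI embeddings to become indistinguishable from QA ones.
g_loss = bce(disc(qi(q, img)), torch.ones(8, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```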
- item (Conference-Full-text): Chest X-Ray Caption Generation with CheXNet (IEEE, 2022-07)
  Wijerathna, V; Raveen, H; Abeygunawardhana, S; Ambegoda, TD; Rathnayake, M; Adhikariwatte, V; Hemachandra, K
  Chest X-rays are accompanied by descriptive captions that summarize their crucial radiology findings in natural language. Although chest X-ray captioning is currently done manually by radiologists, automating it has received growing research interest in the medical domain, both because it is a tedious task and because of the high number of medical reports that must be generated daily. In this paper, we propose an automatic chest X-ray captioning system consisting of two main components: an image feature extractor and a sentence generator. We experimented with two approaches. First, we used LXMERT, which was originally designed for question answering, as the sentence generator, combined with a Faster R-CNN feature extractor. Second, we used CheXNet as the feature extractor and a memory-driven transformer as the sentence generator. We trained and tested our models on the IU chest X-ray dataset and evaluated them using the BLEU, ROUGE-L, and METEOR metrics, which show that the CheXNet-based approach outperforms the LXMERT-based one.
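For orientation, here is a rough sketch of the shape of the second pipeline: a DenseNet-121 backbone (the architecture underlying CheXNet, loaded here without CheXNet weights) as the feature extractor, with a plain transformer decoder standing in for the paper's memory-driven transformer. The XRayCaptioner class, all dimensions, and the vocabulary size are assumptions:

```python
# Sketch of a two-stage captioner: DenseNet-121 features -> transformer decoder.
# (A plain decoder substitutes for the memory-driven transformer in the paper.)
import torch
import torch.nn as nn
from torchvision.models import densenet121

class XRayCaptioner(nn.Module):
    def __init__(self, vocab_size=5000, d_model=512):
        super().__init__()
        backbone = densenet121(weights=None)   # load CheXNet weights in practice
        self.features = backbone.features      # conv feature maps, 1024 channels
        self.proj = nn.Linear(1024, d_model)   # project features to decoder width
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        # images: (B, 3, 224, 224) -> 7x7 feature grid -> (B, 49, d_model) memory
        f = self.features(images)                      # (B, 1024, 7, 7)
        mem = self.proj(f.flatten(2).transpose(1, 2))  # (B, 49, d_model)
        tgt = self.embed(tokens)                       # (B, T, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.decoder(tgt, mem, tgt_mask=mask)      # causal decoding
        return self.out(h)                             # per-token vocab logits

model = XRayCaptioner()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 5000])
```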
- item (Conference-Full-text): Sinhala fingerspelling sign language recognition with computer vision (IEEE, 2022-07)
  Weerasooriya, AA; Ambegoda, TD; Rathnayake, M; Adhikariwatte, V; Hemachandra, K
  Computer-vision-based sign language translation usually relies on thousands of images or video sequences for model training. This is not an issue for widely used languages such as American Sign Language; however, for low-resource languages such as Sinhala Sign Language, it is challenging to apply similar methods to develop translators because no datasets are known to be available for such studies. In this study, we contribute a new dataset and develop a sign language translation method for the Sinhala fingerspelling alphabet. Our approach to recognizing fingerspelling signs involves decoupling pose classification from pose estimation and using postural synergies to reduce the dimensionality of the features. As our experiments show, our method achieves an average accuracy of over 87% with a dataset less than 12% of the size of the datasets used by methods of comparable accuracy. We have made the source code and the dataset publicly available.
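The decoupling this abstract describes can be sketched as a three-stage pipeline: an off-the-shelf hand-pose estimator supplies keypoints, PCA plays the role of postural synergies for dimensionality reduction, and a light classifier labels the sign. The estimator format (MediaPipe-style 21 keypoints), the component count, and the SVM classifier are assumptions, not necessarily the authors' exact setup:

```python
# Sketch of decoupled recognition: pose estimation is done elsewhere; this
# stage only reduces pose vectors (postural synergies via PCA) and classifies.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 21 hand keypoints x (x, y, z) = 63 features per frame, as
# produced by estimators such as MediaPipe Hands (replace with real estimates).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 63))     # pose vectors from the estimator
y = rng.integers(0, 20, size=200)  # fingerspelling class labels

# Postural synergies: a handful of principal components captures most of the
# hand-pose variance, so classification runs in a much smaller feature space.
clf = make_pipeline(StandardScaler(), PCA(n_components=8), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X[:5]))
```

Keeping classification separate from pose estimation means the classifier needs far fewer labeled examples, which is consistent with the small-dataset result the abstract reports.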