Cross-ViT: cross-attention vision transformer for image duplicate detection
dc.contributor.author | Chandrasiri, MDN | |
dc.contributor.author | Talagala, PD | |
dc.contributor.editor | Piyatilake, ITS | |
dc.contributor.editor | Thalagala, PD | |
dc.contributor.editor | Ganegoda, GU | |
dc.contributor.editor | Thanuja, ALARR | |
dc.contributor.editor | Dharmarathna, P | |
dc.date.accessioned | 2024-02-06T08:36:41Z | |
dc.date.available | 2024-02-06T08:36:41Z | |
dc.date.issued | 2023-12-07 | |
dc.description.abstract | Duplicate detection in image databases is important across diverse domains, serving either as a standalone process or as a component within broader workflows. This study explores the vision transformer architecture for feature extraction in duplicate image identification. The proposed framework combines the conventional transformer architecture with a cross-attention layer developed specifically for this study. This cross-attention transformer takes pairs of images as input, enabling cross-attention operations that capture the relationships between the distinct features of the two images. Through successive iterations of Cross-ViT, we assess the ranking capability of each version, highlighting the role of the cross-attention layer integrated between transformer blocks. The final recommended model combines higher-dimensional hidden embeddings with mid-size ViT variants to optimize image-pair ranking. The performance of the proposed framework was assessed through a comparative evaluation against baseline CNN models on several benchmark datasets. Notably, the contribution of this study lies not in new feature extraction methods but in a novel cross-attention layer between transformer blocks, grounded in the scaled dot-product attention mechanism. | en_US |
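The record contains no code; as a rough, hypothetical illustration of the scaled dot-product cross-attention the abstract grounds its layer in, the sketch below lets patch embeddings from one image attend to those of a second image. The function name, shapes, and the omission of learned query/key/value projections and multiple heads are simplifications, not the authors' implementation.

```python
import numpy as np

def scaled_dot_product_cross_attention(tokens_a, tokens_b):
    """Cross-attention: tokens of image A attend to tokens of image B.

    tokens_a: (n_a, d) patch embeddings of image A (query side)
    tokens_b: (n_b, d) patch embeddings of image B (key/value side)
    Returns an (n_a, d) array of image-A features re-expressed as
    attention-weighted combinations of image-B tokens.
    """
    d = tokens_a.shape[-1]
    # Scaled dot-product similarity between every A-token and every B-token.
    scores = tokens_a @ tokens_b.T / np.sqrt(d)          # (n_a, n_b)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over B-tokens
    return weights @ tokens_b                            # (n_a, d)

# Usage sketch: two images' patch embeddings of the same width d.
rng = np.random.default_rng(0)
img_a = rng.normal(size=(4, 8))   # 4 patches, 8-dim embeddings
img_b = rng.normal(size=(5, 8))   # 5 patches, 8-dim embeddings
attended = scaled_dot_product_cross_attention(img_a, img_b)
```

In a full Cross-ViT-style block this operation would sit between standard transformer blocks, with each image's token stream alternately serving as the query side so that pairwise feature relationships flow in both directions.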
dc.identifier.conference | 8th International Conference in Information Technology Research 2023 | en_US |
dc.identifier.department | Information Technology Research Unit, Faculty of Information Technology, University of Moratuwa. | en_US |
dc.identifier.email | dncnawodya@gmail.com | en_US |
dc.identifier.email | priyangad@uom.lk | en_US |
dc.identifier.faculty | IT | en_US |
dc.identifier.pgnos | pp. 1-6 | en_US |
dc.identifier.place | Moratuwa, Sri Lanka | en_US |
dc.identifier.proceeding | Proceedings of the 8th International Conference in Information Technology Research 2023 | en_US |
dc.identifier.uri | http://dl.lib.uom.lk/handle/123/22194 | |
dc.identifier.year | 2023 | en_US |
dc.language.iso | en | en_US |
dc.publisher | Information Technology Research Unit, Faculty of Information Technology, University of Moratuwa. | en_US |
dc.subject | Duplicate image detection | en_US |
dc.subject | Vision transformers | en_US |
dc.subject | Attention | en_US |
dc.title | Cross-ViT: cross-attention vision transformer for image duplicate detection | en_US |
dc.type | Conference-Full-text | en_US |