Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/29480
Full metadata record
DC Field | Value | Language
dc.contributor.author | Sun, R | -
dc.contributor.author | Lei, T | -
dc.contributor.author | Wan, Y | -
dc.contributor.author | Xia, Y | -
dc.contributor.author | Nandi, AK | -
dc.date.accessioned | 2024-08-02T13:09:58Z | -
dc.date.available | 2024-08-02T13:09:58Z | -
dc.date.issued | 2023-12-20 | -
dc.identifier | ORCiD: Tao Lei https://orcid.org/0000-0002-2104-9298 | -
dc.identifier | ORCiD: Asoke K. Nandi https://orcid.org/0000-0001-6248-2875 | -
dc.identifier | arXiv:2306.04086v3 [eess.IV] | -
dc.identifier.citation | Sun, R. et al. (2023) 'TEC-Net: Vision Transformer Embrace Convolutional Neural Networks for Medical Image Segmentation', arXiv:2306.04086v3 [eess.IV] (preprint), pp. 1-12. doi: 10.48550/arXiv.2306.04086. | en_US
dc.identifier.uri | https://bura.brunel.ac.uk/handle/2438/29480 | -
dc.description | The code is publicly available at https://github.com/SR0920/TEC-Net. | en_US
dc.description | Comments: arXiv admin note: substantial text overlap with arXiv:2306.03373, https://arxiv.org/abs/2306.03373. | -
dc.description.abstract | The hybrid architecture of convolutional neural networks (CNNs) and Transformers has become the most popular approach to medical image segmentation. However, existing networks based on this hybrid architecture suffer from two problems. First, although the CNN branch can capture local image features through convolution, vanilla convolution cannot extract image features adaptively. Second, although the Transformer branch can model the global information of images, conventional self-attention attends only to the spatial dimension and ignores channel and cross-dimensional self-attention, leading to low segmentation accuracy on medical images with complex backgrounds. To solve these problems, we propose vision Transformer embrace convolutional neural networks for medical image segmentation (TEC-Net). Our network has two advantages. First, a dynamic deformable convolution (DDConv) is designed in the CNN branch; it not only overcomes the difficulty of adaptive feature extraction with fixed-size convolution kernels but also remedies the defect that different inputs share the same convolution kernel parameters, effectively improving the feature expression ability of the CNN branch. Second, in the Transformer branch, a (shifted)-window adaptive complementary attention module ((S)W-ACAM) and a compact convolutional projection are designed to enable the network to fully learn the cross-dimensional long-range dependencies of medical images with few parameters and computations. Experimental results show that the proposed TEC-Net provides better medical image segmentation results than state-of-the-art (SOTA) methods, including pure CNN and Transformer networks. In addition, TEC-Net requires fewer parameters and lower computational costs and does not rely on pre-training. | en_US
dc.format.extent | 1-12 | -
dc.format.medium | Electronic | -
dc.language | English | -
dc.language.iso | en_US | en_US
dc.publisher | Cornell University | en_US
dc.relation.uri | https://github.com/SR0920/TEC-Net | -
dc.relation.uri | https://arxiv.org/abs/2306.04086 | -
dc.rights | Copyright © 2023 The Authors. This is a preprint made available under the arXiv.org non-exclusive license to distribute, see: https://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html. This arXiv preprint (version 3) has been prepared for publication in a future issue of a journal, but has not been fully edited. Content may change prior to final publication. It has not been certified by peer review. | -
dc.rights.uri | https://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html | -
dc.subject | image and video processing (eess.IV) | -
dc.subject | computer vision and pattern recognition (cs.CV) | -
dc.title | TEC-Net: Vision Transformer Embrace Convolutional Neural Networks for Medical Image Segmentation | en_US
dc.type | Article | en_US
dc.date.dateAccepted | 2023-06-07 | -
dc.identifier.doi | https://doi.org/10.48550/arXiv.2306.04086 | -
dc.relation.isPartOf | arXiv | -
dc.identifier.eissn | 2331-8422 | -
dc.rights.license | https://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html | -
dc.rights.holder | The Authors | -
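
Note on the abstract above: the core mechanism DDConv builds on — convolution kernels that adapt to each input rather than being shared across all inputs — can be illustrated with a short dynamic-convolution sketch in PyTorch. This is an illustrative assumption, not the authors' implementation (their code is at https://github.com/SR0920/TEC-Net); the class name DynamicConv2d, the number of candidate kernels, and the gating network are hypothetical, and DDConv's deformable-sampling component is omitted here.

# Minimal sketch of input-adaptive ("dynamic") convolution: each input
# gets its own kernel, formed as a gated mixture of K candidate kernels.
# NOT the authors' DDConv (see https://github.com/SR0920/TEC-Net); names
# and hyperparameters below are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Convolution whose kernel is an input-dependent mixture of K candidates."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_kernels: int = 4):
        super().__init__()
        self.k = k
        # K candidate kernels, each of shape (out_ch, in_ch, k, k).
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        # Gating network: global average pool -> linear -> softmax over K.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_kernels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        alpha = F.softmax(self.gate(x), dim=1)            # (B, K) mixing weights
        # Per-sample kernel: attention-weighted sum of the K candidates.
        w_mix = torch.einsum('bk,koiyx->boiyx', alpha, self.weight)
        # Grouped-convolution trick: fold the batch into groups so each
        # sample is convolved with its own aggregated kernel.
        out = F.conv2d(x.reshape(1, b * c, h, w),
                       w_mix.reshape(-1, c, self.k, self.k),
                       padding=self.k // 2, groups=b)
        return out.reshape(b, -1, h, w)

# Example: a batch of two feature maps, each convolved with its own kernel.
layer = DynamicConv2d(16, 32)
y = layer(torch.randn(2, 16, 64, 64))   # y.shape == (2, 32, 64, 64)

Mixing whole kernels keeps the cost of adaptivity low: the gating path adds only a global pool and one linear layer per convolution, which is consistent with the abstract's claim that per-input kernels need not inflate parameter counts.
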
Appears in Collections: Dept of Electronic and Electrical Engineering Research Papers

Files in This Item:
File | Description | Size | Format
Preprint.pdf | Copyright © 2023 The Authors; preprint distributed under the arXiv.org non-exclusive license (see rights statement above). | 6.63 MB | Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.