Abstract:
Defect detection is essential in industrial manufacturing. Since the vision transformer (ViT) was introduced in 2020, it has been increasingly used for defect detection tasks in the vision domain. The advantage of ViT over convolutional neural networks (CNNs) is its ability to capture global, long-range dependencies and thereby learn better features. In addition, self-supervised contrastive learning has been widely used in defect detection tasks. In this study, we propose a fabric defect detection strategy that combines a transformer with contrastive learning. First, we propose a new backbone network, CViT (convolutional vision transformer), which improves on ViT by adding a convolutional attention module to the ordinary transformer block and by using depthwise separable convolution instead of linear projection to obtain q, k, and v for the attention computation. Second, to compensate for the potential instability of CViT, we replace the single large 16 × 16 convolution used in ViT with several stacked small 3 × 3 convolutions to divide each enhanced sample into a series of patches. Third, we incorporate conditional position encoding (CPE) and explore the impact of different position encodings on model performance. Finally, the effectiveness of our model is demonstrated on three classical public datasets for fabric defect detection. © 2024 SPIE. All rights reserved.
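The patch-division step in the abstract can be illustrated with simple shape arithmetic. The sketch below is not the authors' implementation: the abstract gives no layer count, stride, padding, or input resolution, so the four 3 × 3 layers with stride 2 and padding 1 and the 224-pixel input are assumptions, chosen only so that the stacked stem reaches the same 16× downsampling as a single 16 × 16, stride-16 convolution. The helper names `conv_out` and `stem_summary` are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_summary(size, layers):
    """Return (output size, receptive field) for a stack of
    (kernel, stride, padding) convolution layers."""
    receptive_field, jump = 1, 1
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        # Each layer widens the receptive field by (kernel - 1) input-space steps.
        receptive_field += (kernel - 1) * jump
        jump *= stride
    return size, receptive_field


# ViT-style patch embedding: one 16 x 16 convolution with stride 16.
vit_stem = [(16, 16, 0)]
# Assumed stacked stem: four 3 x 3 convolutions, stride 2, padding 1.
stacked_stem = [(3, 2, 1)] * 4

vit_grid, vit_rf = stem_summary(224, vit_stem)
stacked_grid, stacked_rf = stem_summary(224, stacked_stem)
print(vit_grid, vit_rf)          # 14 16
print(stacked_grid, stacked_rf)  # 14 31
```

Under these assumptions both stems yield a 14 × 14 grid of patch tokens, but each token from the stacked stem covers a 31-pixel window, so neighbouring patches overlap instead of being cut into disjoint 16 × 16 tiles.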
ISSN: 0277-786X
Year: 2024
Volume: 13089
Language: English