1. School of Electronics and Information Engineering, Anhui Jianzhu University, Hefei 230601, Anhui, China
2. Anhui International Joint Research Center for Intelligent Perception and High-Dimensional Modeling of Ancient Architecture, Hefei 230601, Anhui, China
WANG Kunxia, female, Ph.D., associate professor; research interest: affective computing in artificial intelligence; kxwang@ahjzu.edu.cn.
Print publication date: 2024-04-25
Received: 2023-10-18
王坤侠, 余万成, 胡玉霞. 嵌入混合注意力机制的Swin Transformer人脸表情识别[J]. 西北大学学报(自然科学版), 2024,54(2):168-176. DOI: 10.16152/j.cnki.xdxbzr.2024-02-003.
WANG Kunxia, YU Wancheng, HU Yuxia. Facial expression recognition in Swin Transformer by embedding hybrid attention mechanism[J]. Journal of Northwest University (Natural Science Edition), 2024,54(2):168-176. DOI: 10.16152/j.cnki.xdxbzr.2024-02-003.
Facial expression recognition is an important research direction in psychology, with applications in fields such as transportation, medical care, security, and criminal investigation. To address the limitations of convolutional neural networks (CNNs) in extracting global features of facial expressions, this paper proposes a facial expression recognition method based on a Swin Transformer embedded with a hybrid attention mechanism. With the Swin Transformer as the backbone network, a hybrid attention module is embedded in the fusion layer (Patch Merging) of Stage 3 of the model, enabling effective extraction of both global and local features of facial expressions. First, the hierarchical Swin Transformer model effectively captures deep global feature information. Second, the embedded hybrid attention module combines channel and spatial attention mechanisms, extracting features along both the channel and spatial dimensions so that the model better captures local feature information. In addition, transfer learning is used to initialize the network weights, improving the model's accuracy and generalization ability. The proposed method achieves recognition accuracies of 73.63%, 87.01%, and 98.28% on the three public datasets FER2013, RAF-DB, and JAFFE, respectively, demonstrating strong recognition performance.
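The abstract does not include implementation details, but the hybrid attention module it describes (channel attention plus spatial attention, applied to feature maps before Patch Merging) can be sketched along the lines of a CBAM-style design. The class names, the reduction ratio, and the channel-then-spatial ordering below are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Weight each channel by a score pooled over the spatial dims (SE/CBAM-style)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        avg = x.mean(dim=(2, 3))                         # (B, C) average pooling
        mx = x.amax(dim=(2, 3))                          # (B, C) max pooling
        w = torch.sigmoid(self.mlp(avg) + self.mlp(mx))  # shared MLP, fused scores
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    """Weight each spatial position using a conv over channel-pooled maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)                # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)                 # (B, 1, H, W)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class HybridAttention(nn.Module):
    """Channel attention followed by spatial attention; output shape equals input shape."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(x))
```

Because Swin Transformer stages operate on token sequences of shape (B, H×W, C), embedding such a module in the Stage 3 Patch Merging layer would require reshaping tokens to (B, C, H, W) before applying it and flattening back afterward; since the module preserves the input shape, it can be inserted without altering the rest of the network.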
Keywords: expression recognition; Transformer; attention mechanism; transfer learning