research-article

Authors:
- Yangbangyan Jiang, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
- Xiaodan Li, Security Department of Alibaba Group, Hangzhou, China
- Yuefeng Chen, Security Department of Alibaba Group, Hangzhou, China
- Yuan He, Security Department of Alibaba Group, Hangzhou, China
- Qianqian Xu, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Zhiyong Yang, School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
- Xiaochun Cao, School of Cyber Science and Technology, Shenzhen Campus, Sun Yat-sen University, Shenzhen, China
- Qingming Huang, School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 5, May 2023, pp. 5970–5987. https://doi.org/10.1109/TPAMI.2022.3208419
Published: 21 September 2022
Abstract
In recent years, great progress has been made in incorporating unlabeled data to overcome the scarcity of supervision via semi-supervised learning (SSL). Most state-of-the-art models build on the idea of pursuing consistent model predictions over unlabeled data under input noise, which is called *consistency regularization*. Nonetheless, there is a lack of theoretical insight into the reason behind its success. To bridge the gap between theory and practice, we propose a worst-case consistency regularization technique for SSL in this article. Specifically, we first present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately. Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants. We then provide a simple but effective algorithm to solve the proposed minimax problem, and theoretically prove that it converges to a stationary point. Experiments on five popular benchmark datasets validate the effectiveness of our proposed method.
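The inner maximization described in the abstract can be illustrated with a small sketch: for each unlabeled sample, score the model's prediction on the original input against its predictions on several augmented variants, and keep only the largest inconsistency. This is a toy NumPy illustration under stated assumptions, not the authors' implementation; in particular, the choice of KL divergence as the inconsistency measure and the hand-written softmax outputs are assumptions for the example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def worst_case_consistency(pred_original, preds_augmented):
    """Return the largest divergence between the prediction on the
    original unlabeled sample and the predictions on its augmented
    variants -- the inner 'max' of the minimax SSL objective."""
    return max(kl_divergence(pred_original, q) for q in preds_augmented)

# Toy example: softmax outputs over 3 classes for one unlabeled image.
p = np.array([0.7, 0.2, 0.1])            # prediction on the original input
augs = [np.array([0.6, 0.3, 0.1]),       # a mild augmentation
        np.array([0.2, 0.5, 0.3])]       # a strong augmentation
loss = worst_case_consistency(p, augs)   # driven by the strong augmentation
```

Training would then minimize this worst-case term (summed over unlabeled samples, plus the supervised loss on labeled data) with respect to the model parameters, so the outer minimization confronts only the most inconsistent augmentation rather than an average over all of them.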
Published in
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 5, May 2023 (1242 pages). ISSN: 0162-8828
0162-8828 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Publisher: IEEE Computer Society, United States