research-article

Authors:
- Yangbangyan Jiang, State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
- Xiaodan Li, Security Department of Alibaba Group, Hangzhou, China
- Yuefeng Chen, Security Department of Alibaba Group, Hangzhou, China
- Yuan He, Security Department of Alibaba Group, Hangzhou, China
- Qianqian Xu, Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- Zhiyong Yang, School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
- Xiaochun Cao, School of Cyber Science and Technology, Shenzhen Campus, Sun Yat-sen University, Shenzhen, China
- Qingming Huang, School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 5, May 2023, pp. 5970–5987. https://doi.org/10.1109/TPAMI.2022.3208419
Published: 21 September 2022
Abstract
In recent years, great progress has been made in incorporating unlabeled data to overcome the scarcity of supervision via semi-supervised learning (SSL). Most state-of-the-art models build on the idea of pursuing consistent model predictions over unlabeled data under input noise, which is called *consistency regularization*. Nonetheless, there is a lack of theoretical insight into the reason behind its success. To bridge the gap between theory and practice, we propose a worst-case consistency regularization technique for SSL in this article. Specifically, we first present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately. Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants. We then provide a simple but effective algorithm to solve the proposed minimax problem, and theoretically prove that it converges to a stationary point. Experiments on five popular benchmark datasets validate the effectiveness of our proposed method.
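The inner maximization described in the abstract can be illustrated with a small sketch: for each unlabeled sample, score the model's prediction on the original input against its predictions on several augmented variants, and keep only the largest inconsistency. This is a toy NumPy illustration under stated assumptions, not the authors' implementation; in particular, the choice of KL divergence as the inconsistency measure and the hand-written softmax outputs are assumptions for the example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def worst_case_consistency(pred_original, preds_augmented):
    """Return the largest divergence between the prediction on the
    original unlabeled sample and the predictions on its augmented
    variants -- the inner 'max' of the minimax SSL objective."""
    return max(kl_divergence(pred_original, q) for q in preds_augmented)

# Toy example: softmax outputs over 3 classes for one unlabeled image.
p = np.array([0.7, 0.2, 0.1])            # prediction on the original input
augs = [np.array([0.6, 0.3, 0.1]),       # a mild augmentation
        np.array([0.2, 0.5, 0.3])]       # a strong augmentation
loss = worst_case_consistency(p, augs)   # driven by the strong augmentation
```

Training would then minimize this worst-case term (summed over unlabeled samples, plus the supervised loss on labeled data) with respect to the model parameters, so the outer minimization confronts only the most inconsistent augmentation rather than an average over all of them.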
Published in
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 45, Issue 5, May 2023 (1242 pages). ISSN: 0162-8828
0162-8828 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Publisher: IEEE Computer Society, United States