Hi, my name is Haoyue Bai. I am a Ph.D. candidate in the Department of Computer Sciences at the University of Wisconsin-Madison, where I am fortunate to be advised by Prof. Robert Nowak. I am also a graduate visiting researcher at UC Berkeley, working with Prof. Dawn Song and Dr. Yiyou Sun.
I am grateful to have worked with Prof. Sharon Li (UW-Madison), Dr. Wei Cheng (NEC Labs America), and Prof. Bolei Zhou (UCLA) during my graduate studies. I obtained my bachelor's degree from Zhejiang University and my master's degree from The Hong Kong University of Science and Technology, where I was supervised by Prof. S.-H. Gary Chan.
My current research develops the theoretical and algorithmic foundations of reliable and trustworthy AI, with a focus on open-world robustness, data-efficient reliability, and safe reasoning in foundation models. A central theme of my work is leveraging post-deployment wild data, principled uncertainty, and provable objectives to ensure that AI systems can generalize, detect unknowns, and reason safely under distribution shift.
Out-of-distribution (OOD) learning and open-world robustness: Designing adaptive and interpretable learning principles that help ML models detect out-of-distribution inputs and generalize under distribution shifts, such as semantic, covariate, correlation, and diversity shifts.
Reliable algorithms with provable guarantees: Developing machine learning techniques with statistical and algorithmic guarantees to ensure reliable deployment of AI systems under real-world distribution shifts.
Safety and reliability of foundation models: Understanding the failure modes and boundaries of large language models (LLMs) and vision-language models (VLMs) through systematic diagnostics, such as failure analysis in open-world reasoning, robustness under distribution shift, and generalization. Developing methods to strengthen their reliability and improve the quality of learned representations and embeddings.
Data-efficient machine learning: Selecting the most informative data and signals (e.g., human or model feedback) to improve robustness, calibration, and coverage under tight annotation and compute budgets.
I am on the job market for 2025–2026. If you are interested in my research or background, please feel free to contact me.
Junior PhD/master/undergraduate students: If you would like to chat about life, career plans, graduate school applications, or research ideas related to AI/ML, feel free to email me to schedule a meeting. I dedicate 30 minutes every week to these conversations, with priority for students from underrepresented groups or anyone in need.
Can reinforcement learning (RL) actually teach large language models new algorithms or just “sharpen” what’s already latent in the base model? We set out to test this directly, and the finding is clear: RL can discover new capabilities, but only when trained wisely.
Our approach strategically labels examples within a novel maximum disambiguation region, where the numbers of semantic and covariate OOD examples are roughly equal. By labeling within this region, we can maximally disambiguate the two types of OOD data, thereby maximizing the utility of the fixed labeling budget.
We introduce ALOE, a novel active learning algorithm for open-world environments designed to enhance model adaptation by incorporating new OOD classes via a two-stage approach.
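For intuition, here is a minimal sketch of the selection idea described above. It is not the actual ALOE implementation: the OOD scores, the estimator of "semantic vs. covariate" probability, and all names below are illustrative assumptions. The sketch simply spends the labeling budget on the wild examples whose odds of being semantic versus covariate OOD are closest to even, i.e., the points where a label resolves the most ambiguity.

```python
import numpy as np

def select_disambiguation_batch(p_semantic, budget):
    """Pick unlabeled wild examples to label, preferring those whose estimated
    probability of being semantic (vs. covariate) OOD is closest to 0.5.
    These are the points where the two OOD types roughly equalize."""
    p_semantic = np.asarray(p_semantic)
    ambiguity = -np.abs(p_semantic - 0.5)      # peaks where the two types are hardest to tell apart
    return np.argsort(ambiguity)[::-1][:budget]  # most ambiguous examples first

# Toy usage with synthetic scores (illustrative only).
rng = np.random.default_rng(0)
ood_scores = rng.normal(size=1000)               # placeholder per-example OOD scores
p_sem = 1.0 / (1.0 + np.exp(-ood_scores))        # placeholder probability estimator
labeled_idx = select_disambiguation_batch(p_sem, budget=50)
print(labeled_idx[:10])                          # indices to send to annotators
```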
Unknown Aware AI-Generated Content Detection
Ellie Thieu, Jifan Zhang, Haoyue Bai †
under review, CVPR, 2025
We propose a learning framework for specific generator attribution that remains robust in the presence of unknowns or newly released generators.
Towards Text-Guided Attribute-Disentangled Multimodal Representation Learning
Yibing Wei, Sudeep Katakol, Manuel Brack, Jinhong Lin, Haoyue Bai †, Yu-Teng Li, Richard Zhang, Eli Shechtman, Hareesh Ravi, Ajinkya Kale
under review, CVPR, 2025
This work identifies a core limitation of current multimodal embeddings and formulates Queryable Attribute Representation Extraction (QARE) to explicitly evaluate query sensitivity and attribute invariance.
We introduce CounTr, a novel end-to-end transformer approach for crowd counting and density estimation that captures global context in every layer of the Transformer.
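For intuition only, here is a minimal single-head self-attention sketch, not the CounTr architecture; all shapes and names are illustrative assumptions. It shows why a transformer layer sees global context: every patch token attends to every other patch in a single step.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: each token attends to all tokens,
    so global context is available within one layer."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (tokens, tokens) pairwise interactions
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)           # softmax over all tokens
    return attn @ v                                    # each output mixes information from every token

# Toy usage: 64 image patches ("tokens") with 32-dim features (illustrative shapes only).
rng = np.random.default_rng(0)
tokens, dim = 64, 32
x = rng.normal(size=(tokens, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (64, 32)
```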
My name is Haoyue Bai.
Haoyue is pronounced “How-yweh”.
Bai is pronounced “bye”.
My first name 皓月 means “bright moon”: 皓 conveys brightness and whiteness, and 月 means “moon”.
My last name 白 means “white”. Together, my name evokes the image of a white, luminous moon shining in the night sky.
In Chinese culture, the moon symbolizes connection, shared emotions, and reunion across distance.
A well-known line from Tang poetry says: “a bright moon shines across a thousand miles; we share this very moment.”
I like to think of my name as carrying that same spirit — gentle light and warmth that bring best wishes to those we care about, no matter how far apart we may be.