
I am Weihao Zeng, an incoming PhD student at the Hong Kong University of Science and Technology, where I will be supervised by Prof. Junxian He starting in fall 2025.

My research focuses on the post-training of LLMs, including:

  • Improving model reasoning capabilities using reinforcement learning (RL) / self-evolution techniques (SimpleRL, B-STaR)
  • Exploring efficient data engineering methods for post-training (Deita, Auto Evol-Instruct)
  • The application of LLMs in task-oriented dialogue systems (FutureTOD, Seen2UnSeen)

Feel free to email me about any form of academic collaboration: [email protected]

πŸ”₯ News

  • 2025-03: We introduce SimpleRL-Zoo, a deep investigation of zero RL training across diverse model families and sizes! SimpleRL-Zoo Twitter
  • 2025-01: Announcing our latest effort on o1/R1-style models and scalable reinforcement learning for LLM reasoning! SimpleRL Twitter
  • 2025-01: πŸŽ‰πŸŽ‰ Our B-STaR has been accepted by ICLR 2025!
  • 2024-09: πŸŽ‰πŸŽ‰ Our Auto Evol-Instruct has been accepted by EMNLP 2024!
  • 2024-01: πŸŽ‰πŸŽ‰ Our Deita has been accepted by ICLR 2024!
  • 2023-05: πŸŽ‰πŸŽ‰ Two papers have been accepted by ACL 2023!

πŸ“ Publications

  1. SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

    Weihao Zeng*, Yuzhen Huang*, Qian Liu, Wei Liu, Keqing He, Zejun Ma, Junxian He

    Preprint SimpleRL-Zoo Github

  2. 7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient

    Weihao Zeng*, Yuzhen Huang*, Wei Liu, Keqing He, Qian Liu, Zejun Ma, Junxian He

    Preprint SimpleRL Twitter Github

  3. B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

    Weihao Zeng*, Yuzhen Huang*, Lulu Zhao, Yijun Wang, Zifei Shan, Junxian He

    ICLR 2025 paper

  4. FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue

    Weihao Zeng, Keqing He, Yejie Wang, Chen Zeng, Jingang Wang, Yunsen Xian, Weiran Xu

    ACL 2023 Main Conference paper

  5. Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation

    Weihao Zeng, Lulu Zhao, Keqing He, Ruotong Geng, Jingang Wang, Wei Wu, Weiran Xu

    ACL 2023 Main Conference paper

  6. What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

    Wei Liu*, Weihao Zeng*, Keqing He, Yong Jiang, Junxian He

    ICLR 2024 paper

  7. Automatic Instruction Evolving for Large Language Models

    Weihao Zeng, Can Xu, Yingxiu Zhao, Jian-Guang Lou, Weizhu Chen

    EMNLP 2024 paper

Full Publications on Google Scholar

πŸ”₯ Invited Talks

  • April 2025, Qingke Talk, SimpleRL-Zoo and B-STaR: Improving reasoning performance and efficiency through reinforcement learning.
  • March 2025, Westlake University, SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild.
  • February 2025, Northwestern University, SimpleRL: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
  • February 2025, TikTok, SimpleRL: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.
  • February 2025, Huawei Noah's Ark Lab, SimpleRL: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient.

πŸŽ– Competitions and Awards

  • National Scholarship in China (2019/2023)
  • 2022-09: πŸ†πŸ† Achieved the 1st Award on SereTOD Challenge 2022 track 2, EMNLP 2022!
  • 2021-08: πŸ†πŸ† Achieved the 4th Award on SMP 2021 Conversational AI Challenge!
  • 2021-09: πŸ†πŸ† Achieved the 8th Place on CCIR 2021 Intelligent NLU Challenge!

πŸ“Œ Pinned Repositories

  1. hkust-nlp/simpleRL-reason — Simple RL training for reasoning (Python, 3.7k stars)

  2. hkust-nlp/deita — Deita: Data-Efficient Instruction Tuning for Alignment [ICLR 2024] (Python, 563 stars)

  3. hkust-nlp/B-STaR — B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners (Python, 82 stars)

  4. Prompt-Tuning — Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning" (Python, 58 stars)