MohammadHossein Rezaei

Email: mhrezaei@arizona.edu

I am a Machine Learning Research Engineer at Scale AI, where I work on post-training and evaluation of LLMs. I worked on OnlineRubrics, an approach for post-training LLMs with evolving rubrics to improve alignment in tasks without verifiable ground truth.

I earned a B.S. in Computer Science from the University of Arizona. I was a member of the Computational Language Understanding (CLU) Lab, advised by Eduardo Blanco, where I worked on making SLMs more robust against negation by further pre-training and paraphrasing in affirmative terms.

Previously, I was a research intern at Stanford University in the SALT Lab, advised by Diyi Yang. There, I co-created EgoNormia, a benchmark for evaluating physical-social norm understanding in vision-language models.

news

Jun 12, 2026	New paper out: Rubric-Guided Self-Distillation: Post-Training Without Rubric Verifiers, a verifier-free approach to rubric post-training with no training-time judge calls.
May 13, 2026	New paper out: Reward Hacking in Rubric-Based Reinforcement Learning.
Jan 05, 2026	I moved to New York City to join Scale AI as a Machine Learning Research Engineer, Post-training.
Dec 19, 2025	I graduated Summa Cum Laude with a B.S. in Computer Science and a Minor in Mathematics. I delivered the keynote address at the College of Science Convocation Ceremony.
Dec 17, 2025	I was selected as the Overall Outstanding Senior for both the Computer Science Department and the College of Science at the University of Arizona.
Oct 09, 2025	Check out my internship project at Scale AI: Online Rubrics Elicitation from Pairwise Comparisons.
May 27, 2025	I joined Scale AI as a Research Intern, Post-training.

selected publications

Scale AI

Rubric-Guided Self-Distillation: Post-Training Without Rubric Verifiers

MohammadHossein Rezaei, Anas Mahmoud, Zihao Wang, Utkarsh Tyagi, Advait Gosai, Razvan-Gabriel Dumitru, Aakash Sabharwal, Bing Liu, and Yunzhong He

Jun 2026

@misc{rezaei2026rgsd,
  title = {Rubric-Guided Self-Distillation: Post-Training Without Rubric Verifiers},
  author = {Rezaei, MohammadHossein and Mahmoud, Anas and Wang, Zihao and Tyagi, Utkarsh and Gosai, Advait and Dumitru, Razvan-Gabriel and Sabharwal, Aakash and Liu, Bing and He, Yunzhong},
  month = jun,
  year = {2026},
  eprint = {2606.12507},
  archiveprefix = {arXiv},
  primaryclass = {cs.LG},
  url = {https://arxiv.org/abs/2606.12507},
}

Scale AI

Reward Hacking in Rubric-Based Reinforcement Learning

Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang, Anisha Gunjal, Bing Liu, and Yunzhong He

May 2026

@misc{mahmoud2026rewardhacking,
  title = {Reward Hacking in Rubric-Based Reinforcement Learning},
  author = {Mahmoud, Anas and Rezaei, MohammadHossein and Wang, Zihao and Gunjal, Anisha and Liu, Bing and He, Yunzhong},
  month = may,
  year = {2026},
  eprint = {2605.12474},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2605.12474},
}

ICML

Online Rubrics Elicitation from Pairwise Comparisons

MohammadHossein Rezaei, Robert Vacareanu, Zihao Wang, Clinton Wang, Bing Liu, Yunzhong He, and Afra Feyza Akyürek

In Proceedings of the 43rd International Conference on Machine Learning (ICML), Jul 2026

To appear

@inproceedings{rezaei2026onlinerubricselicitationpairwise,
  title = {Online Rubrics Elicitation from Pairwise Comparisons},
  author = {Rezaei, MohammadHossein and Vacareanu, Robert and Wang, Zihao and Wang, Clinton and Liu, Bing and He, Yunzhong and Akyürek, Afra Feyza},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  month = jul,
  year = {2026},
  address = {Seoul, South Korea},
  note = {To appear},
  eprint = {2510.07284},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
  url = {https://arxiv.org/abs/2510.07284},
}

ACL (Findings)

EgoNormia: Benchmarking Physical-Social Norm Understanding

MohammadHossein Rezaei^*, Yicheng Fu^*, Phil Cuvin^*, Caleb Ziems, Yanzhe Zhang, Hao Zhu, and Diyi Yang

In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025

@inproceedings{rezaei-etal-2025-egonormia,
  title = {EgoNormia: Benchmarking Physical-Social Norm Understanding},
  author = {Rezaei, MohammadHossein and Fu, Yicheng and Cuvin, Phil and Ziems, Caleb and Zhang, Yanzhe and Zhu, Hao and Yang, Diyi},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  url = {https://aclanthology.org/2025.findings-acl.985/},
  publisher = {Association for Computational Linguistics},
  doi = {10.18653/v1/2025.findings-acl.985},
  pages = {19256--19283},
  isbn = {979-8-89176-256-5},
}