Selected Publications
|
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Usman Anwar*, Julianna Piskorz*, David D. Baek, David Africa, Jim Weatherall, Max Tegmark, Christian Schroeder de Witt, Mihaela van der Schaar, David Krueger
Under review at ICML, 2026
arxiv /
Current approaches to detecting hidden communication in AI systems rely on ad-hoc methods such as inspecting messages for anomalies. We introduce a new framework that instead detects steganography through its behavioral effects: measuring whether a signal helps intended recipients more than outside monitors on real tasks.
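A minimal sketch of the behavioral idea described above, with all function names and numbers invented for illustration (this is not the paper's formalism): a covert channel shows up as an asymmetry where a signal boosts the intended recipient's task performance much more than the monitor's.

```python
# Hypothetical illustration: detect a covert channel by its behavioral
# effect rather than by inspecting message content. Inputs are task
# success rates measured with and without access to the signal.

def advantage(success_with_signal: float, success_without_signal: float) -> float:
    """How much the signal improves an agent's task success."""
    return success_with_signal - success_without_signal

def steganography_score(recipient_with: float, recipient_without: float,
                        monitor_with: float, monitor_without: float) -> float:
    """Positive when the signal helps the intended recipient more than
    the monitor -- evidence of hidden communication."""
    return (advantage(recipient_with, recipient_without)
            - advantage(monitor_with, monitor_without))
```

For example, a recipient jumping from 50% to 90% success while the monitor only moves from 50% to 55% yields a large positive score, flagging a likely covert signal.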
|
|
Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory
Usman Anwar*, Tim Baker*, Dana Kianfar, Cristina Pinneri, Christos Louizos
Under review at ICML, 2026
arxiv /
tweetprint /
We propose a simple training objective based on mutual information that prevents CoT obfuscation and maintains CoT monitorability when models are optimized against monitors. Through our theoretical analysis, we also characterize two possible failure modes for practical monitors: information gap, where the monitor cannot interpret the model’s reasoning, and elicitation error, where the monitor fails to correctly evaluate outputs for the target attribute.
|
|
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Usman Anwar*, Thomas Bush*, Stephen Chung, Adria Garriga-Alonso, David Krueger
International Conference on Learning Representations (ICLR 2025), Oral, 2025
arxiv /
We bring concept-based interpretability tools to a model-free reinforcement learning agent (DRC) and uncover that it learns internal plans reminiscent of bidirectional search. This planning circuitry both forecasts the downstream environment state and causally controls action selection, revealing how model-free policies can still leverage structured reasoning when given additional test-time compute.
|
|
Understanding In-Context Learning of Linear Models in Transformers Through an Adversarial Lens
Usman Anwar, Johannes Von Oswald, Louis Kirsch, David Krueger, Spencer Frei
Transactions on Machine Learning Research (Featured Certification, 2025), 2024
arxiv /
We study how in-context learners built from transformers behave when faced with adversarial hijacking attacks. We find both linear and GPT-2 style transformers are surprisingly brittle, but adversarial training during pretraining or finetuning substantially improves robustness and generalizes to stronger attacks. By comparing across architectures and to classical algorithms for fitting linear models, we show that the learned in-context learning procedures differ qualitatively, as evidenced by poor attack transfer even between large models trained with identical recipes.
|
|
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar and 41 other authors
Transactions on Machine Learning Research (Survey Certification), 2024
arxiv /
tweetprint /
This 150+ page agenda identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.
|
|
Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste, Usman Anwar, Robert Kirk, David Krueger
International Conference on Learning Representations, 2024
arxiv /
code /
We show that using an ensemble of reward models is effective in mitigating reward overoptimization.
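A toy sketch of conservative ensemble aggregation in the spirit of the paper, with all names and the linear reward models invented for illustration (in practice the ensemble members would be independently trained reward networks): by penalizing disagreement or taking the worst case, the policy cannot exploit idiosyncratic errors of any single reward model.

```python
import numpy as np

# Hypothetical stand-ins for trained reward models: random linear scorers.
def make_reward_model(seed: int):
    w = np.random.default_rng(seed).normal(size=4)
    return lambda features: float(features @ w)

ensemble = [make_reward_model(s) for s in range(5)]

def conservative_reward(features, beta: float = 1.0) -> float:
    """Mean-minus-std aggregation: reward is discounted where the
    ensemble members disagree, discouraging overoptimization."""
    scores = np.array([rm(features) for rm in ensemble])
    return float(scores.mean() - beta * scores.std())

def worst_case_reward(features) -> float:
    """Worst-case aggregation: optimize against the most pessimistic member."""
    return min(rm(features) for rm in ensemble)
```

Both aggregates are upper-bounded by the ensemble mean, so a policy optimized against them is rewarded only where the models agree.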
|
|
Bayesian Methods for Constraint Inference in Reinforcement Learning
Dimitris Papadimitriou, Usman Anwar, Daniel Brown
Transactions on Machine Learning Research, 2022
paper /
poster /
We develop a Bayesian approach to constraint learning that offers several advantages: it works with partial trajectories, applies in both stochastic and deterministic environments, and, by yielding a posterior distribution, enables active learning for accurate constraint inference.
|
|
Inverse Constrained Reinforcement Learning
Usman Anwar*, Shehryar Malik*, Alireza Aghasi, Ali Ahmed
International Conference on Machine Learning, 2021
arxiv /
video /
code /
poster /
slides /
We propose a framework for learning Markovian constraints from user demonstrations in high dimensional, continuous settings. We empirically show that constraints thus learned are general and transfer well to agents with different dynamics and morphologies.
|
|