I am a cofounder of Fulcrum Research, a lab studying how to oversee large systems of agents operating with limited human feedback. See here for our open source releases and here for our writing.
More coming soon!
Research
I am interested in the principles and conditions under which intelligence emerges. I studied this question within the science of deep learning, both by developing new measurements of AI systems' capabilities and by analyzing and synthesizing phenomena within them.
To me, humans are the most intelligent systems in the world. Even if or when AIs surpass human intelligence, by default this will route through massive bootstrapping and reconstruction of the human civilizational corpus. I am interested in how AI systems can learn rich notions of value from human supervision.
As an undergrad, I was fortunate to work with many researchers at MIT CSAIL. In the Jacob Andreas lab, I studied how to optimize language models for direct collaboration with humans via RL, and also did research on long-horizon software evals. I also worked in the Isola lab on in-context learning and inner optimization in transformers, and before that in the Tegmark and Solar-Lezama labs.
Papers:
- Breakpoint: Scalable evaluation of system-level reasoning in LLM code agents. Kaivalya Hariharan*, Uzay Girit*, Atticus Wang, Jacob Andreas. CoLM, 2025. A methodology to synthetically generate software tasks of arbitrary difficulty, with a detailed analysis of agent behavior and static measures of difficulty.
- The Quantization Model of Neural Scaling. Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark. NeurIPS, 2023. Modeling capability emergence in terms of structure in the task distribution of language.
- Agent psychometrics: Task-level performance prediction in agentic coding benchmarks. Chris Ge, Daria Kryvosheieva, Daniel Fried, Uzay Girit, Kaivalya Hariharan. ICLR 2026 Workshop on Agents in the Wild. Enriching psychometric modeling with task features to predict the difficulty of new evaluation tasks.
- Lower Data Diversity Accelerates Training: Case Studies in Synthetic Tasks. Suhas Kotha*, Uzay Girit*, Tanishq Kumar*, Gaurav Ghosal, Aditi Raghunathan. In submission. Investigating an ICL generalization phenomenon where memorization acts as a pathway to generalization.
- Between the Bars: Gradient-based Jailbreaks are Bugs that Induce Features. Kaivalya Hariharan*, Uzay Girit*. Accepted to the NeurIPS 2024 ATTRIB and Red Teaming workshops. An analysis of structure and patterns in language model adversaries.
Past
In high school, I was already very interested in building tools to augment human cognition, and made several popular open source tools, like Archivy, a self-hosted knowledge management system, and Espial, a tool that used embeddings and heuristics to automatically discover connections between ideas.
In summer 2023, I worked at Dust under Stanislas Polu, back when the company had only four people.
I was also one of the founding members and the main developer of AdiosCorona, a general resource on COVID guidelines which delivered information to millions of people during the pandemic.
See GitHub for more.
Copyright © 2019-2026 uzpg