Daniel Jenkins
AI Evaluation Specialist
Evaluating frontier AI systems at the intersection of human judgment and model alignment.
I specialize in evaluating large language models and AI agents for alignment, safety, and real-world reliability. My work focuses on RLHF evaluation, outcome-based rubric design, and agentic workflow testing — building the structured criteria and realistic scenarios that determine whether AI systems genuinely complete tasks or merely imitate completion.
My background in philosophy, formal logic, and physics shapes how I approach this work: making precise distinctions, constructing sound evaluation frameworks, and identifying the failure modes that matter in deployment.