About the Role
Build the evaluations and datasets that tell labs and institutions exactly how well a model performs on real financial reasoning.
A full-time role based at Qofi's New York office, working alongside operators from top investment banks and private equity firms and engineers from leading AI labs.
Qofi builds the highest-precision financial-reasoning data in the world, and the platform and deployments that put it to work inside institutions.
High ownership from day one, in a flat structure where ideas move from whiteboard to production without committees.
Base salary plus meaningful equity and full benefits. The salary range is pending confirmation.
On-site at Qofi's New York office, alongside the team building the data engine.
Full-time. Rolling start; the team moves quickly once there's a fit.
What You'll Do
Design and maintain rigorous evaluations for financial reasoning, tool use, and agentic tasks.
Develop scoring rubrics and statistical methods that make model performance legible and trustworthy.
Partner with engineering to productionize evals and with experts to ground them in real workflows.
Investigate failure modes and report findings that shape both data and model strategy.
What We're Looking For
Graduate degree or equivalent experience in ML, statistics, CS, or a quantitative field.
Hands-on experience designing or running model evaluations and analyzing results rigorously.
Strong Python and data-analysis skills.
Care for measurement integrity, and healthy skepticism about metrics that look too good.
Hiring Process
Send a resume and a short note on why this role interests you. It takes about five minutes.
A conversation about your background, the role, and what you're looking for.
Technical and team-fit conversations with the people you'd work with directly.
A clear, quick decision, with an offer and a start date that works for you.