Research · Full-time · New York

Research Scientist, Evaluations

Build the evaluations and datasets that tell labs and institutions exactly how well a model performs on real financial reasoning.

Evals · Benchmarking · Statistics · Python
Apply for This Role

About the Role

A full-time role based at Qofi's New York office, working alongside operators from top investment banks and private equity firms as well as engineers from leading AI labs.

Qofi builds the highest-precision financial-reasoning data in the world, and the platform and deployments that put it to work inside institutions.

High ownership from day one, in a flat structure where ideas move from whiteboard to production without committees.

Compensation
$200K-$320K + equity

Base salary plus meaningful equity and full benefits. Range is a placeholder pending confirmation.

Location
New York

On-site at Qofi's New York office, alongside the team building the data engine.

Team
Research

Full-time. Rolling start; the team moves quickly once there's a fit.

What You'll Do

Design and maintain rigorous evaluations for financial reasoning, tool use, and agentic tasks.

Develop scoring rubrics and statistical methods that make model performance legible and trustworthy.

Partner with engineering to productionize evals and with experts to ground them in real workflows.

Investigate failure modes and report findings that shape both data and model strategy.

What We're Looking For

Graduate degree or equivalent experience in ML, statistics, CS, or a quantitative field.

Hands-on experience designing or running model evaluations and analyzing results rigorously.

Strong Python and data-analysis skills.

Care for measurement integrity, and healthy skepticism about metrics that look too good.

Hiring Process

01 Apply

Send a resume and a short note on why this role interests you. It takes about five minutes.

02 Intro Call

A conversation about your background, the role, and what you're looking for.

03 Team Interviews

Technical and team-fit conversations with the people you'd work with directly.

04 Offer

A clear, quick decision, with an offer and a start date that works.

Build the Standard for Finance AI

A small team of operators and engineers building what hasn't been built, from New York.
