Mercor
Data Scientist
Posted 1 month ago

This description summarizes our understanding of the original job posting.

Role Description

We’re seeking a data-driven analyst to conduct comprehensive failure analysis on AI agent performance across finance-sector tasks. You’ll identify patterns, root causes, and systemic issues in our evaluation framework by analyzing task performance across multiple dimensions (task types, file types, criteria, etc.).


  • Statistical Failure Analysis: Identify patterns in AI agent failures across task components (prompts, rubrics, templates, file types, tags)
  • Root Cause Analysis: Determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations
  • Dimension Analysis: Analyze performance variations across finance sub-domains, file types, and task categories
  • Reporting & Visualization: Create dashboards and reports highlighting failure clusters, edge cases, and improvement opportunities
  • Quality Framework: Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings
  • Stakeholder Communication: Present insights to data labeling experts and technical teams

Qualifications


  • Statistical Expertise: Strong foundation in statistical analysis, hypothesis testing, and pattern recognition
  • Programming: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis
  • Data Analysis: Experience with exploratory data analysis and creating actionable insights from complex datasets
  • AI/ML Familiarity: Understanding of LLM evaluation methods and quality metrics
  • Tools: Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL

Requirements

  • Experience with AI/ML model evaluation or quality assurance
  • Background in finance or willingness to learn finance domain concepts
  • Experience with multi-dimensional failure analysis
  • Familiarity with benchmark datasets and evaluation frameworks
  • 2-4 years of relevant experience

Status

Accepting Applications
