
Research Engineer - Model Evaluations Tools

Last Updated: Oct 11, 2023

AI Auditing and Evaluation Roles


A software engineer specializing in AI model evaluations tools understands both the requirements of model evaluation and the needs of ML researchers. The position often involves close collaboration with ML researchers to define the scope and vision of the tools needed to evaluate large AI models, including tools for detecting capabilities that could contribute to catastrophic risks. The role entails identifying ways to account for the issues raised by the latest large models in evaluation tools, running hands-on evaluations to pinpoint where evals can be improved, designing user-friendly interfaces for complex evaluations, and applying data visualization techniques to make intricate data easier to understand. It plays a pivotal part in enabling researchers to excel in their work and in accelerating progress in the field of AI evaluation.

Example tasks

  • Develop effective visualizations that enable researchers to efficiently assess the progress of a model and its significant capabilities.

  • Build effective interfaces for tweaking complex language model prompts and composition rules.

  • Understand what model evaluation and machine learning researchers require and outline the tools needed to help them work faster and better.

  • Conduct assessments firsthand or closely observe researchers to pinpoint areas within their workflow where improved tools could accelerate their progress.

  • Apply data visualization techniques to make intricate evaluation data easier to understand.

Why we think this job is impactful

The role of software engineer (model evaluations tools) is crucial for mitigating AI risks, primarily because current AI evaluation standards are inadequate for detecting and addressing these risks. “Making AI evals work” is therefore of the utmost importance, and it grows more urgent as AI capabilities continue to advance. The role revolves around pioneering research efforts aimed at establishing robust evaluation standards capable of identifying and addressing AI risks, including existential risks. A software engineer working on model evaluations tools often collaborates with ML researchers to define the essential scope of evaluation tools, and actively seeks innovative methods to safely incorporate powerful ML models into these tools.

How Successif can help

We have developed a way to assess potential candidates’ fitness for this role and have collected sample interview questions that may be asked for this job. If you are passionate about mitigating the risks of transformative AI systems and believe you would be a good fit for this role, apply for our career services.
