Job Summary: In this role, you'll apply your expertise to help train next-generation AI systems. Your work will shape how models learn, reason, and perform through high-quality, real-world input.
Key Responsibilities:
- Design and implement self-contained evaluation tasks, including prompts, supporting files, and detailed grading rubrics, to assess AI performance on practical computer-based workflows.
- Define clear, unambiguous written criteria for what constitutes successful and unsuccessful task completion across diverse administrative and workflow scenarios.
- Meticulously observe and document AI agent behaviors, producing crisp, precise summaries and reports in high-quality English.
- Iterate and refine evaluation tasks and rubrics based on feedback and team collaboration to ensure robust benchmarking methodologies.
- Work cross-functionally across a wide range of domains, adapting evaluation frameworks as project requirements evolve.
- Collaborate with the customer's team to share insights and help drive continuous improvement in AI evaluation techniques.
- Champion meticulousness, structured observation, and clear written communication throughout all project deliverables.
Required Skills and Qualifications:
- Minimum 3 years of experience in roles emphasizing written precision and structured thinking, such as paralegal, executive assistant, junior analyst, librarian, document archival specialist, research assistant, technical writer, or QA analyst.
- Native or fluent written English, with a demonstrated ability to produce observations that are succinct, specific, and unambiguous.
- Proven skill in designing or applying rubric-based evaluation, grading against set criteria, or building structured scoring frameworks.
- High attention to detail and ability to notice subtle patterns or inconsistencies others might miss.
- Exceptional written and verbal communication skills, especially for documenting nuanced observations and feedback.
- Proficiency with computers, including common SaaS tools, web browsers, file management, and document editing platforms.
- Strong self-direction, with the ability to independently take ownership of ambiguous or loosely defined projects.
Preferred Qualifications:
- Prior experience evaluating AI outputs or participating in technology-driven process improvement projects.
- Background in developing or refining evaluation rubrics or scoring methodologies.
- Comfort working across multiple domains and adapting quickly to new types of workflow challenges.