1. Role Overview
Mercor is collaborating with a leading AI research team to advance DeepResearch-2-App pipelines that simulate real-world code generation tasks. We’re seeking senior-level software engineers to serve as independent evaluators and supervisors in this process. You’ll help assess and refine AI-generated code across a wide range of domain-specific scenarios, with a focus on feasibility, functionality, and test coverage. This is a part-time, project-based contract ideal for highly experienced engineers looking to contribute to cutting-edge AI evaluation.
2. Key Responsibilities
• Review domain-generated prompts and assess their feasibility from a coding perspective
• Supervise model outputs and validate Dockerfile execution
• Design and implement 40–60 unit tests per evaluation set
• Review peer-generated unit tests for completeness and robustness
• Execute unit tests and confirm code performance and reliability
3. Ideal Qualifications
• 6+ years of professional software engineering experience
• Deep specialization in backend or full-stack development, with testing and evaluation experience
• Strong ability to assess technical feasibility and debug complex systems
• Experience with Docker and automated testing frameworks
• Detail-oriented mindset and ability to provide structured technical feedback
4. More About the Opportunity
• Remote and asynchronous: set your own schedule
• Estimated workload: ~20 hours per week
• Project-based contract, with ongoing need for evaluations
5. Compensation & Contract Terms
• $120/hour for all services rendered
• Paid weekly via Stripe Connect
• You’ll be classified as an independent contractor
6. Application Process
• Submit your resume to get started
• Complete a brief form to detail your technical expertise
• If selected, you’ll receive onboarding materials and sample tasks
We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.
Mercor partners with leading AI labs and enterprises to train frontier models using human expertise. You will work on projects focused on training and enhancing AI systems, be paid competitively, collaborate with leading researchers, and help shape the next generation of AI systems in your area of expertise.