
AI Engineer for LLM Ops & Evaluation (m/f/d)

CompraTica Empleos

EMP:Technology
Munich
Full-time
Remote

Description

You'll join an early-stage, AI-native startup with a product that has already proven market fit.

We build cutting-edge AI solutions for Governance, Risk and Compliance (GRC) for enterprises around the world.

Our customers are auditors, risk managers, and compliance teams, which means evaluation rigor, auditability, and EU AI Act readiness aren't afterthoughts for us.

They're product.

Tasks

As our AI Engineer for LLMOps & Evaluation, you'll own the LLMOps pipeline end-to-end and work directly alongside our founding team.

You will:

- Own the LLMOps pipeline: evaluation infrastructure, the prompt optimization loop, and the production integration that turns experiments into reliable customer-facing features
- Design evaluation strategy per output type: decide when to use deterministic evals (exact match, schema validation, embeddings) vs. LLM-as-judge, and build the rubrics, test datasets, and human-review loops that make the system trustworthy
- Drive prompt engineering and optimization across all LLM operations in the product: move from hand-tuned prompts to a measurable, iterative process
- Pick the right tool for each problem: some things are LLM problems, some are embedding + classical NLP problems, some are deterministic logic
- Run the production side of AI features: observability (Langfuse / LangSmith / similar), cost and latency engineering, incident response when an LLM feature degrades
- Build human-in-the-loop workflows: review queues, feedback ingestion, and labeling, so production signal feeds back into evals and prompt iteration
- Mentor our AI & Analytics Intern and contribute to how we build the AI team over time

Requirements

- 3+ years of hands-on experience building and shipping ML/AI systems in production (we care more about what you've shipped than years on a CV)
- Have shipped an LLM evaluation or prompt optimization pipeline: not just used LLMs in a project, but owned the loop
- Strong hands-on experience with LLM-as-judge, including its variance problems and ...
