You live in the model layer. You know which open-weight model to reach for, how to fine-tune on a small corpus, how to build evals that catch regressions, and how to wire agents into production systems.
What you'll do:
- Build agentic systems that actually work - not demos, production
- Run evals on prompts, models, and pipelines
- Pick the right model for the job (open-weight or API)
- Optimize for cost, latency, and quality at the same time
What we're looking for:
- Deep familiarity with at least one major model family (Claude, GPT, Mistral, Llama)
- Hands-on with the agent stack: tool use, RAG, evals, vector search
- Comfort with Python and the ML/data tooling around it
- Public writing or open-source work in the AI space is a strong plus