You build the infrastructure that other engineers use to ship agents. Tool registries, evals, observability, rate limiting, prompt versioning, model routing. The unglamorous work that makes the glamorous work possible.
What you'll do:
- Build and operate the agent runtime: tool framework, context management, session log capture
- Design the eval pipeline that gates every model/prompt change
- Wire up observability so we know what our agents are actually doing in production
- Make the developer experience good: fast iteration, clear errors, useful traces
What we're looking for:
- Strong backend fundamentals: Postgres, queues, gRPC/HTTP, observability stacks
- Production experience with at least one LLM SDK
- Opinionated about API design and developer experience
- Bonus: you've shipped an eval or agent platform that's used by other engineers