About the Role
We are hiring a Senior AI Engineer to build production-grade AI products end-to-end. You will design and ship AI agents, Retrieval-Augmented Generation (RAG) systems, and fine-tuned small language models, while also owning full-stack delivery, from React/Vue/Angular frontends through Python/Node backends to AWS, GCP, and Azure deployments.
Equally important, you are an AI-native engineer. You use Claude Code, Cursor, Codex, and other AI coding assistants as a daily force multiplier, and you know how to use them well: managing context, controlling token spend, writing CLAUDE.md / AGENTS.md files, using subagents and MCP servers, and applying evaluation-driven workflows so that AI-generated code ships responsibly.
What You Will Do
- Design, build and deploy AI agents using LangChain, LangGraph, LlamaIndex, CrewAI or equivalent frameworks — including multi-agent orchestration, tool use, memory, and planning loops.
- Architect RAG pipelines end-to-end: ingestion, chunking, embedding selection, vector stores (Pinecone / Weaviate / Qdrant / pgvector), hybrid search, re-ranking, query rewriting, and evaluation.
- Fine-tune small and open-source language models (Llama, Mistral, Phi, Gemma, Qwen) using LoRA, QLoRA and other PEFT methods, instruction tuning, and DPO, and decide when fine-tuning is the right answer versus prompting or RAG.
- Build full-stack AI applications: React/Next.js frontends with streaming UIs (Vercel AI SDK / SSE / WebSockets), FastAPI or Node backends, and well-designed APIs.
- Own deployment, scaling and observability on AWS (Bedrock, SageMaker, Lambda, ECS/EKS) and GCP (Vertex AI, Cloud Run, GKE), with Docker, Kubernetes, Terraform and CI/CD.
- Implement LLM observability and evals using LangSmith, Langfuse, RAGAS, DeepEval — and treat evaluation as a first-class engineering artifact, not an afterthought.
- Apply AI coding assistants (Claude Code, Cursor, Codex, Windsurf, Copilot) as a daily tool with strong discipline around context management, token efficiency, subagents, hooks, slash commands, and MCP servers.
- Address non-functional requirements: latency budgets, cost/token economics, prompt injection defense, PII handling, OWASP LLM Top 10, rate limiting, semantic caching, and graceful degradation.
- Collaborate with product, design and business stakeholders to translate ambiguous problems into shippable AI solutions, and mentor mid-level engineers on AI engineering practices.
Must-Have Skills
AI / GenAI Engineering
- 4+ years of software engineering experience, including at least 2 years of hands-on production work with LLMs (OpenAI, Anthropic Claude, Gemini, or open-source models).
- Strong RAG experience: chunking strategies, embedding models, vector databases, hybrid search, re-ranking, evaluation, and avoiding common failure modes.
- Production experience building AI agents with LangChain and LangGraph (or LlamaIndex, CrewAI, AutoGen, Pydantic AI). Comfortable with tool/function calling, structured outputs, agent memory and multi-agent patterns.
- Experience fine-tuning small/open-source models with LoRA and QLoRA, using Hugging Face Transformers, PEFT, Datasets, Accelerate, and the Hub.
- Strong prompt engineering: system prompt design, few-shot prompting, chain-of-thought, prompt caching, structured output schemas, and evaluating prompts as code.
AI-Augmented Development
- Daily, production-grade use of Claude Code, Cursor, or Codex. Understands CLAUDE.md / AGENTS.md, project memory files, slash commands, subagents, hooks, MCP servers, and plan-vs-execute workflows.
- Deliberate token and context management: knows when to use Haiku vs Sonnet vs Opus (and equivalents on other providers), uses prompt caching, batches work, prunes context aggressively.
- Disciplined review of AI-generated code, with tests and evals — never ships unread output.
Full-Stack Engineering
- Backend: Python (FastAPI / Flask) and/or Node.js (TypeScript). Solid grasp of async patterns and streaming responses (SSE / WebSockets / streaming APIs).
- Frontend: React, Next.js, TypeScript, Tailwind CSS. Comfortable building streaming chat UIs and agentic interfaces.
- Databases: PostgreSQL, Redis, at least one vector DB. Familiar with schema design, indexing, and query optimization.
Non-Functional Engineering
- Latency: streaming, parallel tool calls, model routing, semantic caching, request batching.
- Cost: token accounting, model tiering (cheap-first), prompt caching, context compression, batch APIs.
- Security: prompt injection defense, output filtering, PII redaction, OWASP LLM Top 10, secrets management, least-privilege IAM.
- Reliability: retries, fallbacks across providers, rate limiting, queue-based decoupling, structured error handling.
Good to Have
- Anthropic Claude Code certification or Anthropic skill-based credentials.
- NVIDIA Generative AI / LLM certifications, DeepLearning.AI Specializations (LangChain, RAG, Agentic AI), or Hugging Face certifications.
- Experience with MCP (Model Context Protocol) — building or consuming MCP servers.
- Experience with GraphRAG, knowledge graphs (Neo4j), or hybrid symbolic/neural systems.
- On-device or edge inference (Ollama, llama.cpp, ONNX, TensorRT).
- Production deployments on AWS (Bedrock, SageMaker, Lambda, ECS/EKS, S3, IAM) and/or GCP (Vertex AI, Cloud Run, GKE).
- Docker, Kubernetes, CI/CD (GitHub Actions or GitLab CI), and IaC (Terraform or Pulumi).
- LLM observability and tracing: LangSmith, Langfuse, Weights & Biases, Arize, or equivalent. Evaluation harnesses in CI.