We're looking for someone who goes beyond prompt tuning: an engineer who has built, deployed, and optimized real RAG systems and LLM pipelines at scale.
Meruya, West Jakarta
Full-time
Apply / refer
What You'll Do:
- Lead the design and deployment of production-grade RAG and LLM systems
- Optimize retrieval accuracy, context quality, and model performance at scale
- Build robust data pipelines for ingestion, chunking, embedding, and indexing
- Collaborate with data engineers, software teams, and product leads to integrate AI into core features
- Set best practices for prompt engineering, evaluation, and monitoring
- Mentor junior engineers and drive technical direction for GenAI projects
Who You Are:
- 4+ years in ML engineering, with 1.5+ years focused on LLMs and RAG systems
- Deep hands-on experience with LangChain, LlamaIndex, and vector databases (Pinecone, Weaviate, FAISS)
- Strong in Python, PyTorch/TensorFlow, and MLOps tools (MLflow, Kubernetes, Docker)
- Experience deploying models on AWS, GCP, Azure, or hybrid environments
- Familiar with evaluation frameworks, A/B testing, and latency optimization
- Bonus: experience with fine-tuning, LoRA, or distillation for domain-specific performance
- Practical mindset: you care about reliability, cost, and impact, not just novelty
Perks & Benefits:
- Hybrid work mode (2 days WFH, 3 days WFO)
- Daily meals provided
- Health insurance coverage
- Clear career development path
- A collaborative and growth-driven team culture