About BukuWarung
BukuWarung's vision is to empower 60 million MSMEs in Indonesia to become financially aware, and to help them manage and grow their businesses on our technology platform, which spans everything from bookkeeping and digital payments to AI-driven merchant operations.
As part of our next growth phase, we are expanding our AI Platform and Operations function to build scalable, intelligent systems that accelerate product development, automate operations, and make infrastructure self-optimizing.
We are looking for an AI Engineer (AI Operations Engineer) who can bridge the gap between AI product development and infrastructure management: designing, deploying, and maintaining AI systems that power both internal tools and production workloads.
Key Responsibilities
- AI Product Development
  - Design, train, and deploy machine learning or LLM-based models that solve core operational and product problems (e.g., anomaly detection, classification, forecasting, and conversational AI).
  - Build modular APIs and microservices for inference, data processing, and automation.
  - Collaborate with product teams to prototype, test, and iterate on AI-first user experiences.
  - Convert experimental notebooks into production-grade pipelines and scalable services.
- AI Infrastructure & Reliability
  - Design and maintain scalable ML infrastructure across training, deployment, and monitoring workflows.
  - Build CI/CD pipelines for model delivery, manage containerized inference systems, and ensure production reliability.
  - Implement observability for AI models, tracking drift, latency, performance, and cost.
  - Collaborate with DevOps and platform engineering to optimize compute utilization, GPU scheduling, and cost management.
- Automation & AIOps
  - Automate workflows for model retraining, deployment, and validation.
  - Build systems for intelligent alerting, anomaly detection, and auto-remediation of AI services.
  - Integrate AI pipelines into existing DevOps and monitoring tools for proactive issue management.
- Data Pipeline & Tooling
  - Develop robust data ingestion and processing pipelines (structured/unstructured).
  - Manage feature stores, vector databases, and embeddings pipelines for retrieval-augmented generation (RAG) systems.
  - Build internal developer tools and utilities for faster experimentation and monitoring.
- Collaboration & Governance
  - Partner closely with AI researchers, backend engineers, and product managers to translate business needs into reliable AI systems.
  - Contribute to MLOps best practices, documentation, and standardization.
  - Ensure compliance with BukuWarung's data security, audit, and ethical AI frameworks.
Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 5+ years of hands-on experience in backend or ML engineering roles
- Strong programming skills in Python (FastAPI, Flask) and familiarity with microservice design
- Experience deploying and monitoring ML/LLM workloads in production (batch and real-time)
- Proficiency with:
  - ML/AI frameworks (PyTorch, TensorFlow, Hugging Face, LangChain)
  - Infrastructure tools (Docker, Kubernetes, Terraform, Airflow)
  - Cloud platforms (GCP, AWS, or Azure)
  - Observability stack (Prometheus, Grafana, ELK, OpenTelemetry)
- Experience managing GPU-based workloads and cost optimization
- Excellent problem-solving, debugging, and automation skills
- Familiarity with vector databases (Pinecone, Weaviate, FAISS) and RAG pipeline architecture
Preferred Experience
- Built and deployed AI-powered automation systems or developer tools
- Experience with LLM fine-tuning, embedding generation, or prompt engineering
- Exposure to distributed systems and scalable API design
- Understanding of data governance, security, and compliance in AI workflows
- Previous experience in fintech, SaaS, or infrastructure-heavy products