The Machine Learning Engineer – Ops (MLE Ops) ensures the continuous operation, performance, and reliability of AI and machine learning models deployed in production. The role manages the daily execution of ML pipelines, monitors model performance in the experiment and production layers, and supports retraining, version control, and issue resolution to maintain model health and value realization. As part of the AI CoE’s Run Layer, the MLE Ops is responsible for keeping all AI systems deployed through Vertex AI stable, auditable, and continuously optimized. The role works closely with MLEs, Data Scientists, and FinOps Analysts to monitor costs, model drift, and performance degradation, so that AI remains an operationally reliable capability across all domains.
Responsibilities
- Monitor and manage AI pipelines in Vertex AI; ensure job completion, version tracking, and failure recovery.
- Monitor deployed models for data drift, accuracy decay, and performance anomalies.
- Execute scheduled or trigger-based model retraining to sustain performance levels.
- Ensure continuous integration pipelines run smoothly; apply updates and patches as needed.
- Diagnose and resolve pipeline, model, or data‑related incidents in coordination with MLEs and Data Engineers.
- Maintain logs, change records, and performance reports for governance and transparency.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, or Data Engineering.
- 3–6 years of experience in machine learning operations (MLOps), DevOps, or data infrastructure management.
- Hands‑on experience with Vertex AI, Airflow, Docker, and CI/CD frameworks (GitLab CI, Jenkins, Cloud Build).
- Understanding of model monitoring, retraining workflows, and performance metrics tracking.
- Strong analytical and troubleshooting skills for debugging complex pipeline or environment issues.
- Detail‑oriented with good documentation habits and a focus on operational discipline.
- Seniority level: Not Applicable
- Employment type: Full‑time
- Job function: Analyst and Information Technology
- Industries: Telecommunications, Technology, Information and Media, and Internet Marketplace Platforms
- Location: South Jakarta, Jakarta, Indonesia
- Posted: 2 weeks ago