About the job
VIDA Digital Identity is Indonesia's leading provider of digital identity verification, digital signature, and trust services, serving enterprises and government institutions with high standards of security, compliance, and reliability.
We are seeking an experienced Site Reliability Engineering (SRE) Lead to drive the reliability, scalability, and operational excellence of VIDA's core infrastructure across both data centers and cloud environments.
The ideal candidate will have deep expertise in data center operations, infrastructure reliability, and automation, with strong experience in regulated SaaS environments.
Responsibilities
1. Site Reliability & Infrastructure Management
- Lead the SRE function to maintain high availability and performance across all environments.
- Manage robust, scalable, and secure infrastructure supporting VIDA's digital identity and trust platforms.
- Establish monitoring, alerting, and incident response systems to proactively detect and mitigate service disruptions.
- Drive automation in deployment, scaling, and recovery processes to reduce manual effort.
2. Data Center Operations
- Oversee VIDA's physical and hybrid data center operations, ensuring performance, security, and adherence to uptime SLAs.
- Collaborate with network engineers, cloud architects, and system admins to maintain seamless connectivity and integration.
- Establish and maintain Disaster Recovery (DR) and Business Continuity Plans (BCP) aligned with service obligations.
3. Reliability Engineering & Continuous Improvement
- Build and maintain observability frameworks for system health and performance monitoring.
- Conduct root cause analyses (RCA) for incidents and implement corrective actions.
- Partner with development teams to embed reliability and performance improvements into the software delivery process.
4. Leadership & Team Development
- Lead and mentor a team of SREs and infrastructure engineers.
- Collaborate cross-functionally with Engineering, Security, Compliance, and Product teams.
- Establish and maintain documentation and standard operating procedures (SOPs) for infrastructure management.
Qualifications & Experience
Must Have:
- Bachelor's degree in Computer Science, Information Systems, or a related technical field.
- 8+ years of experience in SRE, Infrastructure, or DevOps roles, with at least 3 years in a leadership position.
- Strong technical expertise in data center operations, networking, load balancing, storage systems, and server infrastructure.
- Strong knowledge of networking (TCP/IP, BGP routing, switching, VLANs, firewalls, VPNs, IP transit).
- Experience managing hybrid infrastructure environments (on-premise and cloud).
- Experience with Linux systems administration, containerization (Docker/Kubernetes), and Infrastructure as Code (Terraform, Ansible).
Preferred:
- Experience in SaaS or regulated industries.
- Familiarity with cryptographic systems, PKI, and HSM management.