About the job: Site Reliability Engineer (SRE) Lead – Data Center Operations
VIDA Digital Identity is Indonesia's leading provider of digital identity verification, digital signature, and trust services, serving enterprises and government institutions with high standards of security, compliance, and reliability.
We are seeking an experienced Site Reliability Engineering (SRE) Lead to drive the reliability, scalability, and operational excellence of VIDA's core infrastructure across both data centers and cloud environments.
The ideal candidate will have deep expertise in data center operations, infrastructure reliability, and automation, with strong experience in regulated SaaS environments.
Responsibilities
1. Site Reliability & Infrastructure Management
- Lead the SRE function to maintain high availability and performance across all environments.
- Manage robust, scalable, and secure infrastructure supporting VIDA's digital identity and trust platforms.
- Establish monitoring, alerting, and incident response systems to proactively detect and mitigate service disruptions.
- Drive automation in deployment, scaling, and recovery processes to reduce manual effort.
2. Data Center Operations
- Oversee VIDA's physical and hybrid data center operations, ensuring performance, security, and uptime SLAs.
- Collaborate with network engineers, cloud architects, and system admins to maintain seamless connectivity and integration.
- Establish and maintain Disaster Recovery (DR) and Business Continuity Plans (BCP) aligned with service obligations.
3. Reliability Engineering & Continuous Improvement
- Build and maintain observability frameworks for system health and performance monitoring.
- Conduct root cause analyses (RCA) for incidents and implement corrective actions.
- Partner with development teams to embed reliability and performance improvements into the software delivery process.
4. Leadership & Team Development
- Lead and mentor a team of SREs and infrastructure engineers.
- Collaborate cross-functionally with Engineering, Security, Compliance, and Product teams.
- Establish and maintain documentation and standard operating procedures (SOPs) for infrastructure management.
Qualifications & Experience
Must Have:
- Bachelor's degree in Computer Science, Information Systems, or a related technical field.
- 8+ years of experience in SRE, Infrastructure, or DevOps roles — with at least 3 years in a leadership position.
- Strong technical expertise in data center operations, networking, load balancing, storage systems, and server infrastructure.
- Strong knowledge of networking (TCP/IP, BGP routing, switching, VLANs, firewalls, VPNs, Transit IP).
- Experience managing hybrid infrastructure environments (on-premise and cloud).
- Experience with Linux systems administration, containerization (Docker/Kubernetes), and Infrastructure as Code (Terraform, Ansible).
Preferred:
- Experience in SaaS or regulated industries.
- Familiarity with cryptographic systems, PKI, and HSM management.