We are looking for a hands-on Cloud Operations Engineer to join our global cloud
infrastructure team. This role supports scalable and resilient SaaS operations across
AWS (primary), Azure, and GCP, with a strong emphasis on automation, observability,
security, and cost control.
The engineer will contribute to providing 24/7/365 support, ensuring continuous
operational coverage across multiple time zones. The ideal candidate can work
independently during off-peak hours, has strong skills in infrastructure
troubleshooting and CI/CD processes, and takes a proactive approach to maintaining
uptime, meeting SLAs, and managing incident response.
Responsibilities
- Design, maintain, and operate multi-cloud infrastructure (AWS, Azure, GCP)
- Develop CI/CD pipelines using GitHub Actions, GitLab CI/CD, AWS CodeBuild
- Automate infrastructure provisioning (IaC) using Terraform, AWS CDK, CloudFormation
- Operate Kubernetes clusters (EKS, AKS, GKE) with Helm
- Implement observability and monitoring using Grafana, Datadog, Prometheus, CloudWatch
- Build dashboards for SLA, availability, performance and cost insights
- Automate deployments, blue/green releases, rollbacks, and patching workflows
- Enforce security best practices: IAM policies, VPC, WAF, encryption, audit logging
- Manage cloud IAM across AWS Organizations, Microsoft Entra ID (formerly Azure AD), and GCP Cloud IAM
- Investigate and resolve production issues across distributed systems
- Collaborate with application developers to increase deployment maturity
- Propose and integrate cutting-edge technologies (e.g., serverless, event-driven)
Qualifications
- 4+ years of hands-on experience with AWS (primary), plus exposure to Azure and GCP
- Deep understanding of Kubernetes operations (EKS, AKS, GKE)
- Infrastructure-as-Code: Terraform, AWS CDK, or CloudFormation
- CI/CD pipeline development: GitHub Actions, GitLab CI/CD, AWS tools
- Observability tooling: Grafana, Prometheus, Datadog, native cloud logs
- Cloud-native networking, security hardening, IAM, and zero-trust architecture
- Strong Git practices for infrastructure versioning and collaboration
- Comfortable working solo during night shifts and handling on-call rotations
Nice-to-Have
- Experience in the Media & Entertainment domain (e.g., DRM, watermarking, media streaming)
- Serverless and event-driven technologies: AWS Lambda, EventBridge, Kafka
- AI/ML platform familiarity: SageMaker, Azure ML, Vertex AI
- Hybrid network familiarity: AWS Direct Connect, Azure ExpressRoute, GCP Interconnect
Soft Skills
- Proactive, detail-oriented troubleshooting in live production environments
- Clear communication, especially during incidents and in RCA documentation
- Strong self-discipline and consistency in solo/on-call hours
- High empathy and cultural sensitivity when working with global teams across time zones
- Eagerness to experiment, learn, and improve with evolving DevOps trends
- Cultural fit with DevOps values: openness, automation, collaboration, and ownership
Certifications (Required/Preferred)
- Required: AWS Associate- or Professional-level certification (Solutions Architect / DevOps Engineer)
- Preferred: Azure or GCP certifications (Fundamentals or Associate-level)
- Strong plus: Kubernetes CKA or CKAD certification