About the Role
The Site Reliability Engineering (SRE) team architects, builds, and maintains the rock-solid infrastructure that applications rely on. We work closely with development teams to ensure scalability, reliability, and efficiency. This collaboration empowers us to deliver exceptional customer experiences while enabling developers to focus on building great features.
What You Will Do:
- Deploy, automate, maintain, and manage various cloud-based and on-premises production systems.
- Understand the high-level architecture and systematically document new and existing requirements to ensure smooth project delivery without miscommunication.
- Work closely with the information security and infrastructure team to ensure that we adopt security best practices.
- Ensure the availability, performance, scalability, and security of production systems.
- Troubleshoot and resolve system issues across platform and application domains.
- Suggest architectural improvements and recommend process optimizations.
- Evaluate new technologies to enhance the infrastructure stack.
- Ensure that violations of system security policies are properly remediated.
- Drive and implement automated provisioning and scaling of servers, along with testing and compliance checks using automation tools.
- Handle operational tasks, including on-call duties, alerts, and incident management.
What We Are Looking For:
- Minimum 2 years of engineering experience.
- Bachelor's or Master's degree in a relevant field (e.g., IT, Computer Science) or a proven track record in DevOps.
- A strong willingness to continuously upgrade skills and stay up to date with the latest DevOps trends.
- Experience with cloud-native tools (e.g., Kubernetes, Docker, Nginx, OpenTelemetry) is a plus.
- Experience managing cloud servers (AWS, GCP).
- A desire to transition into engineering management is a valued addition.
- Experience with on-premises physical servers, databases, and storage solutions (MySQL, PostgreSQL, Redis) is a plus.
- Familiarity with Infrastructure as Code (IaC) tools (Terraform, Pulumi) is a plus.