Qualifications:

3–5 years of experience as a Performance Engineer or Site Reliability Engineer (SRE), with a focus on capacity and performance.
Bachelor's degree in Computer Science, Information Systems, or related field.
Proficient in monitoring and observability tools such as Prometheus, Grafana, the ELK Stack, Datadog, Amazon CloudWatch, or Google Cloud Monitoring.
Strong analytical skills; proficient in SQL and Excel/Google Sheets.
Skilled in Python (Pandas, Matplotlib) for data analysis and in scripting languages (Python or Bash) for automation.
Solid understanding of system performance metrics, databases, and networking.
Hands-on experience with Microsoft Azure, including auto-scaling, pricing models, and FinOps practices.
Knowledge of statistical modeling or machine learning for performance forecasting is a plus.
Familiarity with performance testing tools such as JMeter or Gatling is an advantage.

Job Description:

Ensure that IT infrastructure and resources have sufficient capacity to meet current and future business needs cost-effectively.
Develop and implement capacity plans for hardware, software, and network resources.
Monitor and analyze system performance and utilization data.
Build forecasting models to predict future capacity needs.
Identify IT resource requirements and recommend optimization strategies.
Collaborate with Architecture and Development teams to model the performance impact of new features.
Maintain dashboards and reports on capacity and performance.
Provide cost optimization recommendations (e.g., rightsizing, reserved instances).