System Engineer / Bandung West
Responsibilities

Administration

  • Perform installation, uninstallation, and modification of software, patches, and system components.
  • Review and analyze the monthly compliance reports provided by the customer, and create mitigation and resolution action plans from them.
  • Execute server OS patching, including remediation and rollback for failed patches, as well as patch deployment for critical Common Vulnerabilities and Exposures (CVE) issues.
  • Conduct server and hardware firmware upgrades as part of lifecycle management.

Problem Management

  • Isolate, diagnose, and troubleshoot system-related incidents.
  • Coordinate service incident management to ensure timely resolution and communication.
  • Raise and manage service requests on behalf of the customer when required.
  • Participate in root cause analysis (RCA) reviews and provide technical insights for preventive measures.
  • Review monthly system logs, identify anomalies, and highlight issues with supporting justification for Authority investigations where applicable.

Core Skills / Requirements

  • 4–5 years of experience in IT Operations, including OS upgrades, patching, and hardware/software lifecycle management.
  • Hands-on experience with NVIDIA Base Command Manager for GPU cluster administration.
  • Working knowledge of Kubernetes and NVIDIA GPU Operator.
  • Experience with NVIDIA AI Enterprise solutions.
  • Strong analytical, troubleshooting, and incident management skills.
  • Ability to deliver professional, structured reports and technical findings.