Your New Company
My client operates in the AI infrastructure sector, providing PaaS and SaaS platforms.
Your New Role
Lead a team of system operation engineers to ensure smooth and reliable data center operations.
Manage daily operations of GPU clusters, ensuring system health, uptime, and performance, timely resolution of system issues, and SLA compliance.
Develop and enforce standard operating procedures (SOPs) for data center operations and incident management, and optimize workflows, including hardware provisioning, monitoring, and scaling of GPU resources.
Collaborate with cross-functional teams to support AI workload requirements.
Manage the deployment, configuration, and optimization of GPU servers, network devices, and supporting infrastructure (e.g., CPU servers and storage).
What You’ll Need to Succeed
5+ years of experience in data center operations or system administration, with at least 2 years in a managerial role.
Extensive experience in data center operations and system administration, with strong expertise in server hardware, including GPU cards, CPU configurations, and storage solutions.
Understanding of Linux fundamentals and Kubernetes environments, and familiarity with monitoring tools (e.g., Prometheus, Grafana) and logging frameworks.
Strong experience with storage systems (NVMe, SAN, NAS), networking concepts, and protocols (e.g., TCP/IP, RDMA) is a plus.
Proficiency in operating ticketing systems and troubleshooting CPU/GPU clusters, with strong knowledge of GPU hardware (e.g., NVIDIA GPUs), server architecture, and storage solutions.
Knowledge of networking concepts (e.g., TCP/IP, VLANs, load balancing) and experience managing bare metal servers, GPU infrastructure, or high-performance computing systems is a bonus.
What You’ll Get in Return
In return for your dedication and hard work, you’ll be rewarded with:
What You Need to Do Now
If you think this is you, what are you waiting for? Hit "apply now" for more details or a confidential discussion. Please contact Julian Yew at Hays on +603-5870-5003 or email Julian.Yew@hays.com.my.
At Hays, we value diversity and are passionate about placing people in a role where they can flourish and succeed. We actively encourage people from diverse backgrounds to apply.