Head of Platform/AI Cluster Management - System Integrator (San Francisco) Job at Hamilton Barnes Associates Limited, San Francisco, CA

  • Hamilton Barnes Associates Limited
  • San Francisco, CA

Job Description

Ready to lead innovation at the intersection of platforms and artificial intelligence?

Join a pioneering technology company driving advancements in cloud, AI, and data-driven solutions across global markets. The organization is recognized for fostering innovation, scalability, and collaboration through cutting-edge platforms that empower enterprises to evolve intelligently.

The team is hiring a Head of Platform/AI Cluster Management to oversee the strategic development, integration, and optimization of AI and platform initiatives. The role will focus on leading cross-functional teams, enhancing performance and scalability, and aligning technology strategy with long-term business goals.

Shape the future of intelligent platforms and transformative innovation. Apply now!

Responsibilities

  • Own the scheduler/runtime layer (Slurm, Kubernetes, Ray), including multi-tenancy, quotas, and GPU/host fleet management.
  • Lead cluster operations across images, CI/CD, repair/health, performance/telemetry, and incident response.
  • Deliver platform services that ensure workload SLOs and reliable runtime execution.
  • Define and implement namespace/tenancy design, node health automation, golden images, admission controls, on-call runbooks, and go-live gates.
  • Collaborate closely with infra, SRE, and network teams to optimize workload placement and cluster efficiency.
  • Provide hands-on expertise in NCCL behaviours, placement strategies, and congestion signal management.

Requirements

  • Deep expertise in cluster management, scheduling, and runtime environments for large-scale compute.
  • Hands-on background with Slurm, Kubernetes, Ray, or similar orchestration platforms.
  • Strong understanding of NCCL performance tuning, workload isolation, and congestion management.
  • Experience scaling multi-tenant, GPU-heavy clusters with strict SLOs.
  • Ability to thrive in a startup environment with full ownership over platform and cluster strategy.

Salary

  • $500,000 gross per year (Negotiable)

Job Tags

Full time
