IT – Server Farm Staff System Engineer

Responsibilities

  • Supporting R&D users in EDA software development and IP design.
  • Managing the US-based server farm team and meeting operations SLAs.
  • Operating, managing and enhancing the internal compute farm and associated cloud (AWS).
  • Maintaining, monitoring, reporting on, and improving the efficiency of the compute farm.

Requirements

  • Extensive technical experience managing IBM LSF and RTM in a farm environment. Knowledge of LSF spanning farm to cloud is highly desirable.
  • Extensive technical knowledge of build and management automation using tools such as JumpStart, Kickstart, Chef, Puppet, or Ansible, and shell scripting. Sun Grid Engine experience is also desirable.
  • Solid understanding and proven operational experience with compute farms, job submission/management technologies, cloud, and associated management tools.
  • Three years of direct management experience of a global or regional compute farm and/or hybrid cloud environment consisting of 1,000 or more servers, with some remote direct reports.
  • Three years of technical experience architecting, maintaining, and managing a compute farm environment running Linux.
  • Proven experience working directly with R&D software development teams to collaboratively develop solutions that optimize their working environment (direct EDA experience desired).
  • Proven experience in capacity and performance management: optimizing performance, ensuring adequate capacity, working with R&D to optimize their workloads, and developing and maintaining key performance indicators.
  • At least three years working in a global group, coordinating support, strategies, projects, and operations across multiple geographies with a team-oriented approach.
  • A proven process focus, demonstrated through documentation, change management, incident management, and problem resolution activities.

Education

  • BS/MS in computer science or a related field