Site Reliability, Staff - 5354 - Synopsys

General Information

Job Title

Cloud Site Reliability, Staff

Job ID

5354

Country

India

City

Hyderabad

Date Posted

03-Sep-2024

Job Category

Information Technology

Job Subcategory

Site Reliability

Hire Type

Employee

Remote Eligible

Descriptions & Requirements

Job Description and Requirements

The Synopsys IT cloud team is responsible for providing a best-in-class EDA infrastructure and design environment in the public cloud, optimized to meet the scale and complexity of EDA workloads.

As we expand our cloud deployments, we are looking for a talented Site Reliability Engineer with experience in EDA/HPC environments to deliver insights from massive-scale data in real time. Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive user experiences at every interaction.

Responsibilities

Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
Implement, maintain, and consult on the observability stack that supports the needs of multiple internal stakeholders.
Utilize your deep experience and problem-solving skills to help prevent and investigate production issues.
Participate in the design and implementation of new system layers for high-complexity compute environments.
Research and recommend specific systems, architectures, and applications for cloud infrastructure solutions.

Desired Skills:

A degree in Computer Science or a related field, with a minimum of 5 years of experience in SRE roles.
Knowledge of Cloud engineering / architecture with Azure, AWS or GCP.
Familiarity with containerization technologies such as Docker, Swarm and Kubernetes.
Knowledge of Infrastructure as Code (IaC), configuration management, and systems automation tools at scale (e.g. Terraform, Ansible).
Deep knowledge of Linux OS, Networking and NFS technologies.
Experience with data stores and search engines such as Elasticsearch is a must. Experience with Prometheus, Grafana, and similar technologies is a plus.
Experience with CI/CD: GitOps / GitHub Actions, ArgoCD, Flux.
Solid Python programming skills and experience.
Experience with SLURM, Linux, networking, and NFS is required.
Excellent problem-solving skills and attention to detail.
Ability to work collaboratively with other teams and stakeholders.
Ability to work in a fast-paced and dynamic environment.
Experience implementing and delivering monitoring solutions in development, QA, and Production environments.
Domain knowledge of underlying infrastructure requirements such as networking, storage, and hardware optimization.
Proven experience in High-Performance Computing environments for HPC/EDA workloads.
Extremely strong problem-solving and troubleshooting skills. Knowledge of and experience with high availability (HA) and scalability.

Personal attributes:

A team player with strong collaboration skills. Proven communication skills, both verbal and written.
Passion for continuous learning and knowledge sharing.
Ability to drive continuous improvement and propose innovative solutions.

Inclusion and Diversity are important to us. Synopsys considers all applicants for employment without regard to race, color, religion, national origin, gender, sexual orientation, gender identity, age, military veteran status, or disability.