Staff Software Engineer - Production Engineering
Salary: not disclosed
Senior · Full-time
#343003 · Posted today
Source: Snowflake
Tech Stack / Keywords
Snowflake, AI, Golang, Testing, Kubernetes, Linux, Cloud, AWS
Company and Role
Snowflake is a company focused on powering the era of the agentic enterprise by integrating AI as a core collaborator in work processes. The Production Engineering Team drives the reliability tools and processes that ensure a top-tier customer experience: championing Service Level Objectives (SLOs), building infrastructure for rapid detection of reliability issues, and verifying system health after releases.
Requirements
- Bachelor's degree in Computer Science, a related technical field involving software engineering, or equivalent practical experience.
- Proficiency in at least one modern programming language, preferably Golang.
- Systematic problem-solving skills and effective communication skills.
Preferred qualifications:
- 10+ years of industry experience designing, building, and supporting large-scale systems in production.
- Experience with modern observability tools and production monitoring practices.
- Experience with capacity and load testing of distributed applications.
- Experience with containers and container orchestration systems such as Kubernetes.
- Experience deploying, managing, and operating scalable, fault-tolerant Linux infrastructure.
- Experience with SLO-driven reliability management processes.
- Hands-on experience with one or more public cloud providers (AWS, Azure, or GCP).
- Ability to identify systemic issues, define roadmaps, and guide other engineers in resolving them.
Responsibilities
- Lead improvements across the whole service lifecycle, from inception and design through deployment, operation, and refinement.
- Scale systems sustainably through automation, and drive changes that improve reliability and velocity.
- Establish and practice low-noise incident response rotations and blameless postmortems to prevent problem recurrence.
- Write and review code. Develop documentation and capacity plans, and debug the hardest problems on large distributed systems.
- Collaborate with software engineers to establish, maintain, and optimize functional and performance SLOs.
- Participate in a 24x1 on-call rotation.