Senior Software Engineer - Site Reliability Engineering, Vertex AI 3P SRE

No salary information
Mid · Full-time · Employment contract
#332927 · Added 11 days ago · 13
Source: theprotocol.it

Tech Stack / Keywords

Windows

Company and position

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google Cloud's services, both internally critical and externally visible systems, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. SRE also monitors system capacity and performance.

Much of the software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation. The SRE team manages complex challenges of scale unique to Google Cloud, using expertise in coding, algorithms, complexity analysis, and large-scale system design.

Vertex AI is Google Cloud's flagship AI platform, offering tools and services for the entire ML lifecycle, including data preparation, experimentation, training, tuning, and serving models at scale. Vertex AI supports third-party models, including open-source models from Model Garden and enterprise-grade models from partners like Anthropic.

The role focuses on ensuring the reliability and performance of the infrastructure underpinning third-party model deployments, especially the GKE-based serving stack.

The Technical Infrastructure team develops and maintains data centers and builds Google platforms to keep networks up and running, ensuring users have the best and fastest experience possible.


Requirements

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 5 years of experience with software development in one or more programming languages.
  • 3 years of experience in designing, analyzing, and troubleshooting large-scale distributed systems.
  • 2 years of experience leading projects and providing technical leadership.

Nice to have:

  • Master's degree in Computer Science or Engineering or a related technical field.
  • Experience with load balancing, high availability, and scalable system design.
  • Experience with monitoring and observability in cloud and cluster management systems (e.g., Cloud Monitoring, Monarch, Automon).
  • Experience with performance analysis and optimization (e.g., Dapper, pprof).
  • Experience with cluster management systems and Google Cloud interactions.

Responsibilities

  • Engage in and improve the whole lifecycle of services from inception and design through to deployment, operation, and refinement.
  • Support services before they go live through system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Maintain services once live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.

Offer

  • Sharing the costs of sports activities
  • Private medical care
  • Sharing the costs of foreign language classes
  • Sharing the costs of professional training & courses
  • Life insurance
  • Remote work opportunities
  • Fruits
  • Corporate products and services at discounted prices
  • Integration events
  • Dental care
  • Corporate gym
  • Corporate sports team
  • Retirement pension plan
  • Saving & investment scheme
  • Corporate library
  • No dress code
  • Coffee / tea
  • Drinks
  • Parking space for employees
  • Leisure zone
  • Extra social benefits
  • Meal passes
  • Redeployment package
  • Employee referral program
  • Opportunity to obtain permits and licenses
  • Charity initiatives
  • Extra leave
  • Sports card
  • Health care
  • Language courses
  • Training subsidies
  • Insurance
  • Flexible hours
  • Car parking
  • Canteen
  • Free drinks
  • Free snacks
Google
