Engineering Manager - Technical Platform Systems (Observability)

No salary information
Senior · Full-time
#335765 · Added 5 days ago
Source: nofluffjobs.com

Tech Stack / Keywords

Site reliability engineering, Infrastructure as Code

Company and position

The Technical Platform Observability team at Allegro keeps the platform stable by running 250,000 system health checks every minute. The team prevents problems before they impact customers and provides performance insights and proactive alerts for the entire Tech organization, supporting 1000+ 24/7 on-duty officers.


Requirements

  • Experience in leading and growing engineering teams with a focus on coaching, mentoring, and building a culture of ownership.
  • Strong technical background in Observability, Monitoring, or Site Reliability Engineering (SRE).
  • Proven track record of managing complex migrations or large-scale infrastructure projects.
  • Understanding of the "Last Mile of Observability" to ensure automated signals translate into effective human action.
  • Proficiency in modern infrastructure practices including Infrastructure as Code, GitOps, open-source, and high-availability distributed systems.
  • Ability to balance technical debt with delivery of new, scalable platform features.
  • Effective communication in English at a minimum B2 level.
  • Ability to translate technical complexities into clear business value.

Responsibilities

Leadership & Strategy:

  • Leading the team responsible for Allegro’s central observability and monitoring ecosystem.
  • Overseeing mission-critical alerting, routing infrastructure, and self-service monitoring platforms.

Mission-Critical Innovation:

  • Overseeing the strategic transition of the on-call management and incident response system.
  • Ensuring zero downtime in alerting coverage for over 2,000 services.

Platform Evolution:

  • Driving the evolution of monitoring-as-a-service capabilities.
  • Moving towards a fully declarative, Git-based workflow to democratize monitoring ownership.

Scalability & Performance:

  • Managing a massive-scale data ecosystem including VictoriaMetrics and Zabbix.
  • Ensuring physical infrastructure safety and long-term performance baselines.

Stakeholder Management:

  • Collaborating with Area Managers and the wider tech community to maintain Grafana as the primary operational front door.

System Discovery:

  • Maintaining automated collection targets for approximately 141,000 active instances.
  • Ensuring new services are monitored immediately upon deployment.

Technical Excellence:

  • Managing technical debt.
  • Providing expertise in high-complexity architectures to ensure platform stability during worst-case failure scenarios.

Offer

  • Flexible working hours in a hybrid model (4/1) with start times between 7:00 a.m. and 10:00 a.m.
  • Well-located offices with fully equipped kitchens, bicycle parking, terraces, and ergonomic work tools.
  • Choice of a 16" or 14" MacBook Pro or Dell with Windows and necessary accessories.
  • Wide selection of fringe benefits in a cafeteria plan including medical, sports, lunch packages, insurance, and purchase vouchers.
  • Employer-paid English classes related to the job.
  • Training budget, inter-team tourism, hackathons, and internal learning platform.
  • Additional day off for volunteering.
  • Social events such as Spin Kilometers, Family Day, Fat Thursday, Advent of Code.
  • Sport subscription.
  • Private healthcare.
  • Free coffee and beverages.
  • Canteen.
  • Bicycle parking.
  • Shower facilities.
  • Mobile phone.
  • In-house trainings.
  • Modern office environment.
  • No dress code.
Flexible hours
Sports card
Healthcare
Free drinks
Canteen
Bicycle parking
Shower
Company phone
In-house training
Training subsidies
Language courses
Allegro
