Engineering Manager - Technical Platform Systems (Observability)
Salary not disclosed
Senior · Full-time · Employment contract (umowa o pracę)
#336037 · Added 5 days ago
Source: theprotocol.it
Tech Stack / Keywords
Windows
Company and Position
The Technical Platform Observability team at Allegro safeguards platform stability by running 250,000 system health checks every minute, catching problems before they affect customers. The team supports more than 1,000 on-call engineers working 24/7 and provides clear performance insights and proactive alerting for the entire Tech organization. The role involves leading the evolution of the Observability ecosystem, including the transition of the incident management and on-call alerting systems.
Requirements
- Experience leading and growing engineering teams, with a focus on coaching, mentoring, and building a culture of ownership.
- Strong technical background in Observability, Monitoring, or Site Reliability Engineering (SRE).
- Proven track record of managing complex migrations or large-scale infrastructure projects.
- Understanding of the "Last Mile of Observability": ensuring that automated signals translate into effective human action.
- Proficiency in modern infrastructure practices, including Infrastructure as Code, GitOps, open-source tooling, and high-availability distributed systems.
- Ability to balance paying down technical debt with delivering new, scalable platform features.
- Effective communication in English at a minimum of B2 level.
- Ability to translate technical complexity into clear business value.
Responsibilities
Leadership & Strategy:
- Leading the team responsible for Allegro’s central observability and monitoring ecosystem, overseeing mission-critical alerting, routing infrastructure, and self-service monitoring platforms.
Mission-Critical Innovation:
- Overseeing the strategic transition of the on-call management and incident response systems, ensuring zero downtime in alerting coverage for more than 2,000 services.
Platform Evolution:
- Driving the evolution of monitoring-as-a-service capabilities towards a fully declarative, Git-based workflow to democratize monitoring ownership.
Scalability & Performance:
- Managing a massive-scale data ecosystem including VictoriaMetrics (ingesting 100M+ samples/sec) and Zabbix to ensure infrastructure safety and performance baselines.
Stakeholder Management:
- Collaborating with Area Managers and the tech community to ensure Grafana remains the primary operational front door for incident response and system behavior exploration.
System Discovery:
- Maintaining automated collection targets for approximately 141,000 active instances, ensuring new services are monitored upon deployment.
Technical Excellence:
- Managing technical debt and providing expertise in high-complexity architectures to ensure platform stability during worst-case failure scenarios.
What We Offer
- Flexible working hours in a hybrid model (4/1) with start times between 7:00 a.m. and 10:00 a.m.
- Well-located offices with fully equipped kitchens, bicycle parking, terraces, and ergonomic work tools.
- Choice of a 16" or 14" MacBook Pro, or a Dell laptop running Windows, plus the necessary accessories.
- Wide selection of fringe benefits in a cafeteria plan including medical, sports, lunch packages, insurance, and purchase vouchers.
- Employer-paid, job-related English classes.
- Training budget, inter-team tourism, hackathons, and internal learning platform.
- Additional day off for volunteering.
- Social events such as Spin Kilometers, Family Day, Fat Thursday, Advent of Code, and others.
Flexible hours
Bicycle parking
Healthcare
Language courses
Training subsidies
Sports card
Insurance
Team events
Allegro
124 active job offers