Senior Site Reliability Engineer
Company and position
SmartRecruiters delivers an AI-powered hiring platform built for global scale, automating and optimizing the entire talent acquisition process. More than 4,000 companies, including LinkedIn, McDonald's, VISA, CD Projekt Red, and Allegro, rely on SmartRecruiters to build winning teams. In 2025, SmartRecruiters joined SAP, the global leader in enterprise applications, to accelerate the reinvention of hiring by combining AI innovation with SAP’s ecosystem. The R&D organization is built around empowered product teams that are responsible for business outcomes and solve problems autonomously.
Requirements
- Most Senior Engineers have 5+ years of professional experience
- Working knowledge of SRE and observability industry standards and best practices (SLIs/SLOs, error budgets, incident management, on-call)
- Engineering experience with the JVM stack
- Experience with AWS (or other cloud provider), Kubernetes, and IaC tools and practices, including running and troubleshooting distributed applications
- Proven track record of delivering solutions for reliability, monitoring, and container management
- Deep knowledge of the Linux operating system, focusing on system hardening and troubleshooting performance issues
- Very good scripting skills (Bash, Golang or Python)
- Experience managing and troubleshooting database systems, both SQL and NoSQL, is a plus
- Solid understanding of networking standards, including TCP/IP, DNS, VPN, and load balancing, is a plus
- Comfortable partnering with teams to design resilient data access and use database observability to prevent and resolve incidents
- Strong communication skills in English, both verbal and written, with the ability to coach and influence other engineers
Responsibilities
- Cooperate closely with other Platform and Engineering teams on strategic reliability and observability initiatives across SmartRecruiters
- Improve, automate, and grow SmartRecruiters' observability and reliability tooling (metrics, logs, traces, alerting)
- Respond to production incidents and client threats, lead remediation, and drive follow-up improvements
- Partner with product engineers working in Java, Node.js, and Python to design, instrument, and operate services with failure in mind, jointly owning SLIs/SLOs and error budgets
- Create reusable building blocks (dashboards, alerts, libraries and IaC modules) that can be rolled out company-wide
- Mentor members of the engineering team and act as an advocate for modern SRE and observability practices
- Document standards, best practices, and policies for monitoring, alerting, incident response, and reliability
- Conduct capacity planning and performance testing of the platform
Offer
- Sport subscription
- Private healthcare
- Small teams
- International projects
- Unlimited vacation days
- Company shutdowns twice a year
- Free coffee
- Bike parking
- Playroom
- Shower
- Free parking
- In-house training
- Modern office
- Startup atmosphere
- No dress code
- Family events
- Company parties
- In-house hack days
Other information
Important: the position is available only under a standard contract of employment with 80% tax-deductible costs. You may be located anywhere in Poland and work remotely or from our Cracow office.
SmartRecruiters Inc.