Site Reliability Engineer

~21 840 - 35 280 PLN / month
Mid · Full-time
#299438 · Added 2 months ago · 56
Source: Antal

Tech Stack / Keywords

Cloud, Microservices, Architecture, Google Cloud Platform, Java, Spring Boot, Spring, Apache

Company and position

Antal is a leading recruitment and HR advisory company, present in Poland since 1996 and later expanded to the Czech Republic and Hungary. Across the CEE region, we employ around 150 professionals who deliver a full range of services – from specialist and executive recruitment, employee outsourcing and HR consulting, to employer branding and market research.

Our division-based structure combines deep industry expertise with functional specialisation, enabling us to provide tailored solutions for companies in every sector. We act as a trusted partner for both employers and candidates, sharing our knowledge and guiding them through every stage of the talent journey. We connect exceptional people with the right opportunities and help organisations build successful teams.


Requirements

  • 4+ years of experience supporting and/or developing distributed systems (Java-based environments)
  • Strong troubleshooting and analytical skills
  • Experience with disaster recovery processes
  • Hands-on experience with application lifecycle and CI/CD tooling (JIRA, Confluence, Jenkins, Ansible)
  • Experience supporting complex, cross-platform systems (Java / Python environments)
  • Knowledge of Agile/Kanban delivery models
  • Experience implementing monitoring and logging frameworks (e.g. Grafana, InfluxDB, Prometheus, Splunk, Loki or similar)
  • Basic knowledge of relational databases (Oracle, PostgreSQL)
  • Understanding of cloud platforms (preferably GCP)
  • Familiarity with Unix/Linux environments
  • Ability to lead technical discussions with global support teams
  • Strong communication skills and ability to work across regions

Technical Requirements:

  • Core Java knowledge
  • Application support experience
  • Monitoring tools (Grafana, InfluxDB, Prometheus or similar)
  • Basic cloud knowledge (GCP preferred)
  • Automation tools (Jenkins, Ansible)
  • Knowledge of relational databases (Oracle, PostgreSQL)

Responsibilities

  • Manage application support operations with focus on resiliency, availability and performance
  • Coordinate production incident resolution and conduct post-mortems / root cause analysis
  • Investigate and resolve complex production issues across distributed systems
  • Contribute to continuous service improvement and knowledge base documentation
  • Actively engage in Incident, Problem and Service Management processes
  • Apply SRE principles to enhance reliability, scalability and observability
  • Develop and improve monitoring, alerting and incident detection mechanisms
  • Support hybrid cloud environments and automation initiatives
  • Work in a 2-shift rotation (8:00 AM start / 4:00 PM start)
  • Participate in weekend and on-call rotations

What we offer

  • Opportunity to work on business-critical global risk platforms
  • Participation in a large-scale cloud and architecture transformation
  • Modern technology stack and DevOps culture
  • Hybrid working model (2 days per week in Kraków office)
  • Long-term project within a stable, global financial environment
Antal Sp. z o.o.
