#299684 • Source: Infogain

Network Operations Center Lead

27 720 - 30 240 PLN (normalized)

Experience

Senior

Location

—

Work mode

Remote

Employment type

Full-time

AWS, Azure, Bash, Dynatrace, Kubernetes, Linux, PowerShell, Python, Splunk, Prometheus, Grafana, CI/CD, Terraform, Datadog, GCP, GKE

About the offer

You will join an SRE-aligned operations team responsible for keeping a mission-critical, global cloud platform reliable, performant, and secure. The project focuses on 24/7 cloud operations, proactive monitoring, incident response, and continuous improvement of observability coverage across multi-region GCP environments. You will work closely with SRE, Cloud Engineering, and development teams to maintain high availability, support business continuity, and drive operational excellence.

Requirements

Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent practical experience).
3–5 years of experience in a NOC, operations, or cloud infrastructure support role.
Strong understanding of cloud platforms and core services (AWS / Azure / GCP), with hands-on exposure to production operations.
Familiarity with container orchestration (Kubernetes / GKE) and CI/CD pipelines.
Experience with monitoring and logging tools such as Datadog, Dynatrace, Prometheus, Grafana, ELK, CloudWatch, Splunk, Sumo Logic, New Relic (or equivalents).
Proficiency in Linux/Unix environments.
Basic scripting/automation skills (Python, Bash, PowerShell) and/or Infrastructure-as-Code exposure (Terraform).
Strong communication, documentation, and incident management skills; able to collaborate effectively across engineering teams.

Responsibilities

Monitor GCP cloud infrastructure across multiple regions using advanced observability tooling; identify monitoring gaps and implement improvements to increase coverage.
Respond to alerts and incidents in real time; gather supporting data for root cause analysis and escalate when required.
Investigate logs, APM traces, dashboards, and monitors to assess broader/tangential impact and provide incident forensics.
Troubleshoot issues related to cloud networking, containers, storage, APIs, and service reliability.
Create, maintain, and improve troubleshooting guides (TSGs), incident response procedures, runbooks, and operational documentation.
Provide leadership and mentorship to NOC engineers; help set operational standards and best practices.
Collaborate with SRE, Cloud Engineering, and development teams to resolve complex infrastructure and reliability issues.
Perform routine health checks across the cloud environment and ensure readiness for high availability.
Monitor observability platform spend and recommend optimization actions where appropriate.
Evaluate private beta/beta releases of observability tooling; summarize findings and advise on adoption.
Perform routine patching and upgrades of observability agents across the platform.
Contribute changes to a source repository (e.g., runbooks, configs, automation, monitoring-as-code).
Ensure compliance with SLAs, security policies, and operational standards.
Participate in a 24/7 on-call rotation and support disaster recovery and business continuity activities.
Analyze performance metrics and recommend opportunities for optimization and automation.

Benefits

Hybrid work model combining office & remote work
Attractively located office with collaboration spaces
Onsite parking space for employees
Referral program with financial bonus
Life Insurance
Budget for development (including language courses and others), clear career path with the possibility to gain experience in an international environment
Access to internal Learning Platform with multiple training courses oriented toward professional growth
Access to MyBenefit platform (Multisport included)
Team Building activities
Charity initiatives
Working environment promoting diversity and inclusion
Private medical care - Platinum Package

Other information

Candidates must hold a legal work permit in Poland.