#299684 • 15 • Source: Infogain
Network Operations Center Lead
27 720 - 30 240 PLN (normalized)
Experience
Senior
Location
—
Work mode
Remote
Employment type
Full-time
AWS, Azure, Bash, Dynatrace, Kubernetes, Linux, PowerShell, Python, Splunk, Prometheus, Grafana, CI/CD, Terraform, Datadog, GCP, GKE
About the offer
You will join an SRE-aligned operations team responsible for keeping a mission-critical, global cloud platform reliable, performant, and secure. The project focuses on 24/7 cloud operations, proactive monitoring, incident response, and continuous improvement of observability coverage across multi-region GCP environments. You will work closely with SRE, Cloud Engineering, and development teams to maintain high availability, support business continuity, and drive operational excellence.
Requirements
- Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent practical experience).
- 3–5 years of experience in a NOC, operations, or cloud infrastructure support role.
- Strong understanding of cloud platforms and core services (AWS / Azure / GCP), with hands-on exposure to production operations.
- Familiarity with container orchestration (Kubernetes / GKE) and CI/CD pipelines.
- Experience with monitoring and logging tools such as Datadog, Dynatrace, Prometheus, Grafana, ELK, CloudWatch, Splunk, Sumo Logic, New Relic (or equivalents).
- Proficiency in Linux/Unix environments.
- Basic scripting/automation skills (Python, Bash, PowerShell) and/or Infrastructure-as-Code exposure (Terraform).
- Strong communication, documentation, and incident management skills; able to collaborate effectively across engineering teams.
Responsibilities
- Monitor GCP cloud infrastructure across multiple regions using advanced observability tooling; identify monitoring gaps and implement improvements to increase coverage.
- Respond to alerts and incidents in real time; gather supporting data for root cause analysis and escalate when required.
- Investigate logs, APM traces, dashboards, and monitors to assess broader/tangential impact and provide incident forensics.
- Troubleshoot issues related to cloud networking, containers, storage, APIs, and service reliability.
- Create, maintain, and improve troubleshooting guides (TSGs), incident response procedures, runbooks, and operational documentation.
- Provide leadership and mentorship to NOC engineers; help set operational standards and best practices.
- Collaborate with SRE, Cloud Engineering, and development teams to resolve complex infrastructure and reliability issues.
- Perform routine health checks across the cloud environment and ensure readiness for high availability.
- Monitor observability platform spend and recommend optimization actions where appropriate.
- Evaluate private beta/beta releases of observability tooling; summarize findings and advise on adoption.
- Perform routine patching and upgrades of observability agents across the platform.
- Contribute changes to a source repository (e.g., runbooks, configs, automation, monitoring-as-code).
- Ensure compliance with SLAs, security policies, and operational standards.
- Participate in a 24/7 on-call rotation and support disaster recovery and business continuity activities.
- Analyze performance metrics and recommend opportunities for optimization and automation.
Benefits
- Hybrid work model combining office & remote work
- Attractively located office with collaboration spaces
- Onsite parking space for employees
- Referral program with financial bonus
- Life Insurance
- Budget for development (including language courses and others), clear career path with the possibility to gain experience in an international environment
- Access to an internal Learning Platform with multiple training courses oriented toward professional growth
- Access to MyBenefit platform (Multisport included)
- Team Building activities
- Charity initiatives
- Working environment promoting diversity and inclusion
- Private medical care - Platinum Package
Other information
Candidates must hold a legal work permit in Poland.