#300249 • Source: Graphcore
Senior Staff Engineer - Observability Infrastructure
29 225 - 39 534 PLN (normalized)
Experience
Senior
Location
Gdańsk
Work mode
On-site
Employment type
Full-time
AI, Architecture, Cloud, Testing, Unit Testing, Prometheus, Grafana, Kafka
About the offer
Graphcore is a company building AI compute technology, including semiconductor, software, and AI infrastructure at datacenter scale. It is part of the SoftBank Group and delivers technology into the SoftBank AI ecosystem. The company is expanding globally to address AI opportunities.
Requirements
- BSc or MSc degree in Computer Engineering, Computer Science, or related field, or equivalent experience.
- Proven success in architecting and implementing scalable, performant, reliable cluster management systems including telemetry collection and analysis engines.
- Experience managing large-scale datacenters with a focus on hardware observability solutions.
- Experience maintaining and scaling observability stacks using Prometheus, Grafana, OTEL, ClickHouse, Kafka, Superset, or Elastic Stack, with understanding of secure telemetry practices and data exposure controls.
- Working knowledge of Datadog, Dynatrace, or Splunk.
- Experience with large-scale telemetry datasets, time series databases, down-sampling techniques, and creating actionable dashboards.
- Experience with automation technologies such as Ansible or Terraform.
- Experience with containerization technologies like Docker and Kubernetes.
- Experience managing or developing in Linux environments.
- Strong skills in at least one of C, C++, Go, or Python.
- Excellent written and verbal communication skills.
Nice to have:
- 10+ years of relevant post-degree experience.
- Knowledge of cloud-native development and deployment methodologies (SaaS/PaaS/IaaS).
- Knowledge of data center networking and monitoring best practices.
- Knowledge of monitoring, observability, and management solutions used by hyperscalers.
- Knowledge of declarative management systems.
Responsibilities
- Contribute to all phases of product development including definition, architecture, design, implementation, debugging, testing, and early customer support.
- Design and implement fault-remediation solutions at scale.
- Implement multi-component integrations based on Graphcore and third-party technology stacks, covering data ingestion, decision making, management, monitoring, and UI.
- Create reference designs including documentation, configuration files, scripts, and source code.
- Deploy solutions internally to support engineering teams in debugging, performance analysis, benchmarking, and test/QA at all scales.
- Maintain and improve deployed infrastructure to provide the best service for customers.
- Ensure solutions are properly tested by collaborating with development and QA teams to enhance unit testing and comprehensive test plans.
- Mentor and guide junior engineers, fostering continuous learning and improvement.
Benefits
- Competitive salary.
- Annual leave policy.
- Medical and dental health plans.
- Gym card.
- Employee pension matched up to 4%.
- Yearly review of benefits program.
- Inclusive work environment with an equal-opportunity hiring process.
- Flexible interview approach and reasonable adjustments upon request.
Other information
Applicants must hold the right to work in Poland. Visa sponsorship or support for visa applications is not provided.