#300135 • Source: nofluffjobs.com

Senior Staff Engineer - Observability Infrastructure

350 700 - 474 400 PLN (normalized)

Experience

Senior

Location

Gdańsk

Work mode

Hybrid

Employment type

Full-time

Grafana, Prometheus, OTEL, Datadog, Kafka, Superset, Elastic Stack, Dynatrace, Splunk, Docker, Docker Containers, K8s, Kubernetes, Linux, Python, C, C++, Go, Ansible, Terraform, SaaS, PaaS, IaaS

About the offer

Graphcore is a company building the future of AI compute, combining semiconductor, software, and AI expertise to create a complete AI compute stack from silicon and software to infrastructure at datacenter scale. It is part of the SoftBank Group and delivers technology into the SoftBank AI ecosystem. The company is expanding globally to address AI opportunities.

Requirements

BSc or MSc degree in Computer Engineering, Computer Science, or related field, or equivalent experience.
Proven success in architecting and implementing scalable, performant, reliable cluster management systems including telemetry collection and analysis engines.
Experience managing large-scale datacenters with a focus on hardware observability solutions.
Experience maintaining and scaling modern observability stacks using Prometheus, Grafana, OTEL, ClickHouse, Kafka, Superset, or Elastic Stack.
Understanding of secure telemetry practices and data exposure controls.
Working knowledge of Datadog, Dynatrace, or Splunk.
Experience with large-scale telemetry datasets, time series databases, down-sampling techniques, and creating actionable dashboards.
Experience with automation technologies such as Ansible or Terraform.
Experience in containerization technologies including Docker and Kubernetes.
Experience managing or developing in Linux environments.
Strong skills in at least one of C, C++, Go, or Python.
Excellent written and verbal communication skills.

Desirable:

10+ years of relevant post-degree experience.
Knowledge of cloud-native development and deployment methodologies (SaaS, PaaS, IaaS).
Knowledge of data center networking and monitoring best practices.
Knowledge of monitoring, observability, and management solutions used by hyperscalers.
Knowledge of declarative management systems.

Responsibilities

Contribute to all phases of product development including definition, architecture, design, implementation, debugging, testing, and early customer support.
Design and implement fault-remediation solutions at scale.
Implement multi-component integrations based on Graphcore and third-party technology stacks, covering data ingestion to decision making, ensuring seamless management, monitoring, and UI.
Create reference designs including documentation, configuration files, scripts, and source code.
Deploy solutions internally to support engineering teams in debugging, performance analysis, benchmarking, and test/QA at all scales.
Maintain and improve deployed infrastructure to provide optimal service for customers.
Ensure solutions are properly tested by collaborating with development and QA teams to enhance unit testing and comprehensive test plans.
Mentor and guide junior engineers, fostering continuous learning and improvement.

Benefits

Competitive salary
Annual leave policy
Medical and dental health plans
Gym card
Employee pension matched up to 4%
Equal opportunity process with inclusive work environment
Flexible interview approach with reasonable adjustments
Private healthcare
International projects
Team events
Training budget
Stunning office view
Modern office
Free coffee, snacks, and beverages
Gym
Bike parking
Shower
Canteen
Playroom
Free parking
Startup atmosphere
No dress code
Free breakfast