#300135 • 12 • Source: nofluffjobs.com
Senior Staff Engineer - Observability Infrastructure
350 700 - 474 400 PLN (normalized)
Experience
Senior
Location
Gdańsk
Work mode
Hybrid
Working time
Full-time
Grafana, Prometheus, OTEL, Datadog, Kafka, Superset, Elastic Stack, Dynatrace, Splunk, Docker, Docker Containers, K8s, Kubernetes, Linux, Python, C, C++, Go, Ansible, Terraform, SaaS, PaaS, IaaS
About the offer
Graphcore is a company building the future of AI compute, combining semiconductor, software, and AI expertise to create a complete AI compute stack, from silicon and software to datacenter-scale infrastructure. It is part of the SoftBank Group and delivers technology into the SoftBank AI ecosystem. The company is expanding globally to address AI opportunities.
Requirements
- BSc or MSc degree in Computer Engineering, Computer Science, or related field, or equivalent experience.
- Proven success in architecting and implementing scalable, performant, and reliable cluster management systems, including telemetry collection and analysis engines.
- Experience managing large-scale datacenters with a focus on hardware observability solutions.
- Experience maintaining and scaling modern observability stacks using Prometheus, Grafana, OTEL, ClickHouse, Kafka, Superset, or Elastic Stack.
- Understanding of secure telemetry practices and data exposure controls.
- Working knowledge of Datadog, Dynatrace, or Splunk.
- Experience with large-scale telemetry datasets, time series databases, down-sampling techniques, and creating actionable dashboards.
- Experience with automation technologies such as Ansible or Terraform.
- Experience in containerization technologies including Docker and Kubernetes.
- Experience managing or developing in Linux environments.
- Strong skills in at least one of C, C++, Go, or Python.
- Excellent written and verbal communication skills.
Desirable:
- 10+ years of relevant post-degree experience.
- Knowledge of cloud-native development and deployment methodologies and service models (SaaS, PaaS, IaaS).
- Knowledge of data center networking and monitoring best practices.
- Knowledge of monitoring, observability, and management solutions used by hyperscalers.
- Knowledge of declarative management systems.
Responsibilities
- Contribute to all phases of product development including definition, architecture, design, implementation, debugging, testing, and early customer support.
- Design and implement fault-remediation solutions at scale.
- Implement multi-component integrations built on Graphcore and third-party technology stacks, covering everything from data ingestion to decision making and ensuring seamless management, monitoring, and UI.
- Create reference designs including documentation, configuration files, scripts, and source code.
- Deploy solutions internally to support engineering teams in debugging, performance analysis, benchmarking, and test/QA at all scales.
- Maintain and improve deployed infrastructure to provide optimal service for customers.
- Ensure solutions are properly tested by collaborating with development and QA teams to improve unit tests and build comprehensive test plans.
- Mentor and guide junior engineers, fostering continuous learning and improvement.
Benefits
- Competitive salary
- Annual leave policy
- Medical and dental health plans
- Gym card
- Employee pension matched up to 4%
- Equal opportunity process with inclusive work environment
- Flexible interview approach with reasonable adjustments
- Private healthcare
- International projects
- Team events
- Training budget
- Stunning office view
- Modern office
- Free coffee, snacks, and beverages
- Gym
- Bike parking
- Shower
- Canteen
- Playroom
- Free parking
- Startup atmosphere
- No dress code
- Free breakfast