Site Reliability Engineer (SRE/ DevOps) - Engineering Productivity

No salary information
Senior · Full-time
#346927 · Added today
Source: Arista Networks

Tech Stack / Keywords

DevOps, Networking, Go, Python, Shell Scripting, Linux, Unix

Company and role

Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus and routing environments. They leverage advancements in cloud computing, artificial intelligence, and software-defined networking to provide competitive solutions. The company values diversity and inclusivity and has received awards for Best Engineering Team, Best Company for Diversity, Compensation, and Work-Life Balance.


Requirements

Essential Skills:

  • At least BSc Computer Science or Engineering + 3 years’ experience, MS Computer Science or Engineering + 3 years’ experience, or equivalent work experience.
  • Knowledge of one or more of Go, Python, or shell scripting, sufficient to implement medium-complexity automation workflows.
  • Knowledge of Linux (or UNIX) from an administration and debugging perspective.
  • Hands-on experience operating software systems (infrastructure, complex applications) at scale.
  • Experience in server provisioning, especially from a storage and networking perspective.
  • Strong problem-solving and software troubleshooting skills.
  • Experience with infrastructure-as-code.

Desired Skills:

  • Experience managing databases such as MariaDB, PostgreSQL, MongoDB.
  • Experience with Docker and virtualization technologies like KVM, QEMU, Kata Containers.
  • Experience managing a monitoring stack including Prometheus, Loki, Tempo, InfluxDB, Grafana, Thanos.
  • Experience managing Elasticsearch clusters.
  • Experience managing Artifactory and Docker registry.
  • Experience managing CI/CD systems like ArgoCD, Spinnaker.
  • Experience managing version control systems like Perforce, Gerrit.
  • Experience with infrastructure-as-code frameworks like Ansible.
  • Experience managing large Java applications.
  • Experience in storage infrastructure management such as NAS, SAN, Ceph.

Responsibilities

  • Build, safely and incrementally deploy, and operate critical production systems, focusing on scalability, reliability, observability, performance, and security.
  • Monitor, support and enhance developer experience across services.
  • Build automation to remove toil and efficiently operate production systems.
  • Proactively monitor, respond to, and enhance alerts and set up automated alert handling.
  • Create and maintain incident response runbooks.
  • Triage platform and infrastructure issues and assist Arista software engineers in their triage; engage with third-party vendor support.
  • Write postmortem documents and build solutions to prevent incident recurrence.
  • Plan and communicate maintenance windows on production systems.
  • Work with product development teams to identify infrastructural bottlenecks and design and implement solutions.
  • Survey and adopt best practices around infrastructure/platform to maintain secure, scalable, and fault-tolerant systems.
  • Study the design and implementation details of OSS systems to improve triage and issue resolution.