Senior GPU Networking Architect

292 500 – 507 000 PLN / year · Employment contract (gross)
375 000 – 650 000 PLN / year · Employment contract (gross)
Senior · Full-time · Employment contract
#320960 · Added 28 days ago
Source: NVIDIA

Tech Stack / Keywords

Networking · AI · Architecture · Network · CUDA · NVSHMEM · LLM · PyTorch

Company and Position

NVIDIA is a technology company with over 25 years of experience in computer graphics, PC gaming, and accelerated computing. The company is focused on AI and GPU computing, developing software foundations for large-scale AI systems.


Requirements

  • 5+ years of hands-on CUDA programming, including writing and optimizing GPU kernels.
  • M.Sc. or equivalent experience in computer science, computer engineering, or related field.
  • Strong understanding of GPU architecture fundamentals including warp scheduling, shared memory, L2 cache, memory coalescing, occupancy tuning, and asynchronous execution.
  • Experience with systems-level C/C++ development in performance-critical environments.
  • Familiarity with GPU data movement mechanisms such as GPUDirect RDMA and GPU-initiated communication.
  • Ability to read and analyze GPU performance profiles (e.g., Nsight Compute, Nsight Systems) and apply optimizations.
  • Strong collaboration skills in a multi-national, interdisciplinary environment.
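To illustrate one of the architecture fundamentals named above (memory coalescing), here is a minimal, hypothetical CUDA sketch — not taken from the posting:

```cuda
// Coalesced global-memory access: consecutive threads in a warp read
// consecutive floats, so the hardware services the warp with a small
// number of wide memory transactions.
__global__ void scale_coalesced(const float* __restrict__ in,
                                float* __restrict__ out,
                                float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // thread i -> element i
    if (i < n) {
        out[i] = in[i] * factor;
    }
}

// By contrast, a strided pattern such as in[i * stride] breaks coalescing:
// each thread in the warp touches a different cache line, multiplying
// memory traffic for the same amount of useful data.
```

Reading the resulting memory-throughput difference out of an Nsight Compute profile is exactly the kind of analysis the last two bullets describe.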

Nice to have:

  • Experience developing or optimizing communication kernels in libraries such as NCCL, NVSHMEM, or similar.
  • Understanding of distributed deep learning parallelism techniques and their communication patterns.
  • Background in RDMA, InfiniBand, high-speed networking, and GPU system topology including NVLink, NVSwitch, PCIe, and network fabrics.
  • Experience with overlap techniques like kernel pipelining, persistent kernels, or cooperative groups.
  • Proven experience optimizing large-scale LLM training or inference workloads and familiarity with frameworks such as PyTorch, TensorRT-LLM, or vLLM.
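The "overlap techniques" bullet refers to patterns like the following hedged sketch, which hides a host-to-device copy behind computation using two CUDA streams; `process`, the buffer names, and the chunk sizes are placeholders:

```cuda
// Hypothetical pipelining sketch: the async copy for chunk k runs on
// copy_stream while earlier chunks compute on compute_stream.
// Assumes h_buf is pinned host memory (required for true async copies)
// and d_buf[2] are device-side double buffers.
cudaStream_t copy_stream, compute_stream;
cudaStreamCreate(&copy_stream);
cudaStreamCreate(&compute_stream);

for (int k = 0; k < num_chunks; ++k) {
    cudaMemcpyAsync(d_buf[k % 2], h_buf + k * chunk_bytes, chunk_bytes,
                    cudaMemcpyHostToDevice, copy_stream);

    // Make the compute stream wait only on this chunk's copy,
    // not on the whole copy stream.
    cudaEvent_t ready;
    cudaEventCreate(&ready);
    cudaEventRecord(ready, copy_stream);
    cudaStreamWaitEvent(compute_stream, ready, 0);

    process<<<grid, block, 0, compute_stream>>>(d_buf[k % 2], chunk_elems);
    cudaEventDestroy(ready);
}
cudaDeviceSynchronize();
```

Persistent kernels push the same idea further by keeping a single kernel resident and feeding it work, avoiding per-chunk launch latency.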

Responsibilities

  • Build, implement, and optimize GPU communication kernels for collective and point-to-point operations in large-scale AI systems.
  • Leverage knowledge of GPU architecture to improve kernel efficiency, minimize latency, and overlap computation with communication.
  • Develop GPU-resident communication primitives and device-side APIs for kernel-initiated data movement across nodes and accelerators.
  • Profile and tune GPU kernels end-to-end, identifying bottlenecks and driving targeted optimizations.
  • Collaborate with network software, hardware, and AI framework teams to co-design communication strategies aligned with GPU execution patterns.
  • Build proofs-of-concept, conduct experiments, and perform quantitative modeling to evaluate new communication strategies.
  • Contribute to the evolution of programming models exposing GPU-aware networking capabilities to application developers.
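The "GPU-resident communication primitives" responsibility is the domain of NVSHMEM's device-side API. A minimal, hypothetical sketch (kernel-launch and initialization details omitted; `sym_buf` is assumed to be a symmetric allocation from `nvshmem_malloc`):

```cuda
#include <nvshmem.h>
#include <nvshmemx.h>

// Device-initiated data movement: each thread writes one element directly
// into the symmetric buffer of a remote PE, with no host round-trip.
__global__ void push_to_neighbor(float* sym_buf, const float* local,
                                 int n, int next_pe) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Single-element put into next_pe's copy of sym_buf.
        nvshmem_float_p(sym_buf + i, local[i], next_pe);
    }
    // Order/complete this thread's outstanding puts before the kernel exits.
    nvshmem_quiet();
}
```

Because the communication is issued from inside the kernel, it can be fused and overlapped with computation — the pattern the responsibilities above describe.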

Offer

  • Highly competitive salaries.
  • Comprehensive benefits package.