T Hub - AI Expert

No salary information provided
Senior · Full-time · Employment contract
#331954 · Added about a month ago
Source: T-Mobile

Tech Stack / Keywords

AI · NLP · Llama · Python · PyTorch · Databases · Linux · OpenShift

Company and position

We seek an AI Expert with deep expertise in designing, implementing, and optimizing Retrieval-Augmented Generation (RAG) systems in on-premises environments. The ideal candidate will have hands-on experience with vLLM, LiteLLM, and open-source LLMs such as gpt-oss or qwen, along with a proven ability to integrate these tools into scalable, secure, and high-performance enterprise workflows.


Requirements

  • Bachelor’s/Master’s/PhD in Computer Science, AI, or related field.
  • 3+ years in ML/NLP roles, with 2+ years focused on RAG systems.
  • Proven experience deploying LLMs in on-prem or hybrid environments.
  • Proficiency with vLLM, LiteLLM, and open-source LLMs (e.g., Llama 3.2, DeepSeek, Mistral).
  • Strong Python expertise with frameworks like PyTorch, Hugging Face Transformers, and LangChain.
  • Experience with vector databases (e.g., Qdrant).
  • Familiarity with Linux-based systems and RedHat OpenShift.
  • Ability to communicate complex AI concepts to non-technical stakeholders.
  • Strong problem-solving skills and adaptability in fast-paced environments.

Responsibilities

RAG System Development:

  • Architect and deploy end-to-end RAG pipelines, combining retrieval mechanisms (e.g., vector databases like Qdrant) with generative models for enterprise use cases (a minimal retrieval sketch follows this list).
  • Fine-tune and optimize retrieval models to ensure high accuracy and low latency in on-prem environments.
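
For illustration, a minimal Python sketch of the retrieval half of such a pipeline, assuming a local Qdrant instance, a pre-populated "documents" collection with a "text" payload field, and a sentence-transformers embedding model; the collection name, model choice, and prompt template are illustrative assumptions, not part of the role description.

```python
# Minimal RAG retrieval step: embed the query, fetch nearest chunks from Qdrant,
# and assemble a grounded prompt for the generator.
# Assumptions: Qdrant runs locally and the "documents" collection already holds
# embedded chunks with a "text" payload field.
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative local embedding model
qdrant = QdrantClient(host="localhost", port=6333)

def retrieve(query: str, k: int = 5) -> list[str]:
    """Embed the query and return the k most similar chunks from Qdrant."""
    vector = embedder.encode(query).tolist()
    hits = qdrant.search(collection_name="documents", query_vector=vector, limit=k)
    return [hit.payload["text"] for hit in hits]

def build_prompt(query: str) -> str:
    """Concatenate retrieved context with the user question for the generator."""
    context = "\n\n".join(retrieve(query))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
```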

Model Integration & Deployment:

  • Implement and customize inference servers using vLLM for efficient LLM serving and LiteLLM for lightweight model orchestration (see the serving sketch after this list).
  • Integrate open-source LLMs with proprietary data sources and APIs.
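
A hedged sketch of how the two tools might fit together, assuming vLLM serves its OpenAI-compatible API on localhost and LiteLLM acts as the client; the model name, port, and prompt are assumptions for illustration only.

```python
# Serving side (shell), assuming a Llama 3.2 checkpoint is available locally:
#   python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000
#
# Client side: LiteLLM routes the call to the local OpenAI-compatible vLLM endpoint.
import litellm

response = litellm.completion(
    model="openai/meta-llama/Llama-3.2-3B-Instruct",  # "openai/" prefix -> OpenAI-compatible backend
    api_base="http://localhost:8000/v1",              # assumed local vLLM address
    api_key="unused-for-local-vllm",
    messages=[{"role": "user", "content": "Summarize the on-prem RAG architecture."}],
)
print(response.choices[0].message.content)
```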

On-Prem Infrastructure Management:

  • Design GPU-optimized, scalable infrastructure for LLM training and inference, ensuring compliance with security and data governance policies.
  • Collaborate with DevOps teams to containerize workflows using Docker/Kubernetes and automate MLOps pipelines.

Performance Optimization:

  • Apply techniques like quantization, pruning, and dynamic batching to maximize efficiency in resource-constrained on-prem setups (a quantized-inference sketch follows this list).
  • Monitor system performance, troubleshoot bottlenecks, and ensure high availability.
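
As one illustration of these techniques, a sketch using vLLM's offline API to load a pre-quantized (AWQ) checkpoint and let its continuous batching process a batch of prompts; the checkpoint name and memory setting are assumptions, not requirements of the role.

```python
from vllm import LLM, SamplingParams

# Pre-quantized AWQ weights keep the memory footprint small; vLLM's continuous
# batching schedules the prompts dynamically on the available GPU.
llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # assumed pre-quantized checkpoint
    quantization="awq",
    gpu_memory_utilization=0.90,  # leave headroom for co-located retrieval services
)
params = SamplingParams(temperature=0.2, max_tokens=256)

prompts = [
    "Explain dynamic batching in one sentence.",
    "List two benefits of weight quantization.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```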

Cross-Functional Collaboration:

  • Partner with data engineers to curate and preprocess domain-specific datasets for retrieval and generation tasks (a preprocessing sketch follows this list).
  • Translate business requirements into technical solutions for stakeholders in telco environments.
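
A brief preprocessing sketch under the same assumptions as the retrieval example above: raw text is split into overlapping chunks, embedded, and upserted into a Qdrant collection; chunk size, overlap, and collection name are illustrative.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
qdrant = QdrantClient(host="localhost", port=6333)

def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split raw text into fixed-size character chunks with a small overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def index_documents(docs: list[str]) -> None:
    """Embed every chunk and upsert it into the documents collection."""
    qdrant.recreate_collection(
        collection_name="documents",
        vectors_config=VectorParams(
            size=embedder.get_sentence_embedding_dimension(),
            distance=Distance.COSINE,
        ),
    )
    points = [
        PointStruct(id=idx, vector=embedder.encode(piece).tolist(), payload={"text": piece})
        for idx, piece in enumerate(p for doc in docs for p in chunk(doc))
    ]
    qdrant.upsert(collection_name="documents", points=points)
```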