Senior Machine Learning / AI Engineer (RL)
170 - 230 PLN/hour, B2B (net)
Senior · Full-time · B2B
#347762 · Added yesterday
Source: nofluffjobs.com
Tech Stack / Keywords
Python, Machine Learning, Reinforcement Learning, Claude Code, Codex
Company and position
You will be cooperating with a leading provider of AI evaluation and optimization solutions, trusted by multinational companies to optimize AI agents and detect performance issues in large language models. The company’s mission is to enable safe, verifiable, and aligned AGI through rigorous, real-world agent evaluation.
Requirements
- 6+ years of experience in Python software engineering.
- Minimum 3 years in a Machine Learning Engineer, Environment Engineer, or Data Scientist position.
- Practical knowledge of AI frameworks (LangChain, LangGraph, mcp-server).
- Extensive practical experience working with AI, including prompt engineering and vibe coding.
- Experience working with business requirements (analyzing, summarizing, responding to changes).
- Expertise in planning your own work or that of a small team.
- Ability to work from 2 p.m. to 10 p.m.
Nice to have:
- Knowledge of Codex or Claude Code.
- Experience in integrating AI with a system.
- Understanding of RL concepts: reward modeling, environment dynamics, verifiability, evaluation, and agent interaction loops.
- Familiarity with instrumentation, metrics, and data pipelines for RL evaluation.
Responsibilities
- Design and implement RL environments that support large-scale agent evaluation and reinforcement learning experiments.
- Build task generation pipelines, dynamic datasets, and scripted environments with controlled complexity and stochasticity.
- Develop verifiers and reward models to automatically score trajectories and evaluate model reasoning.
- Collaborate with infrastructure and systems engineers to ensure environments are scalable, reproducible, and instrumented for detailed telemetry.
- Design APIs and orchestration frameworks for running, resetting, and evaluating agents across environments.
- Optimize environment performance, logging, and reward reproducibility across distributed setups.
Offer
- Sport subscription
- Private healthcare
- Flat structure
- Small teams
- International projects
Other information
Due to the client’s time zone, we would appreciate a candidate who can work from 2 p.m. to 10 p.m.
Acaisoft