Senior ML Engineer (Token Factory)

CompraTica Empleos

EMP:Technology

Czechia, Europe, Germany, Israel, Netherlands, UK

Full-time

Remote


Description

About Nebius

Nebius is leading a new era in cloud infrastructure for the global AI economy.

We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.

Built by engineers, for engineers.

From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.

Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel.

Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.

The role

Token Factory is part of Nebius Cloud, one of the world's largest GPU clouds, running tens of thousands of GPUs.

We are building a high-performance inference and fine-tuning platform designed to push foundation models to their hardware limits.

Our mission is to maximise throughput, minimise latency and optimise cost-per-token across tens of thousands of GPUs.

Some directions we are currently working on, and which you can be part of:

Inference Optimization: Identifying LLM inference bottlenecks to drive production speedups. Squeezing maximum performance out of a wide range of LLM architectures at scale (e.g., GPT-OSS, Kimi K2.5, DeepSeek V3…).

Inference engine support: Implement novel speculative decoding architectures, optimise components of various LLM designs (dense/MoE, autoregressive/parallel) and contribute to open-source inference engines.

Low Precision Training & Inference: Design and productionise low-precision (FP8, NVFP4/MXFP4) training and inference pipelines with measurable gains in throughput and cost-efficiency.

We expect you to have:

A deep understanding of the theoretical foundations of machine learning and the transformer architecture.

Experience profiling GPU workloads using Nsight, …

Interested? Apply now.
