We are building a full-stack AI cloud platform that supports developers and enterprises from data and model training through to production deployment, without the cost and complexity of building large in-house AI/ML infrastructure.
Built by engineers, for engineers.
From large-scale GPU orchestration to inference optimization, we own the hard problems across compute, storage, networking and applied AI.
Listed on Nasdaq (NBIS) and headquartered in Amsterdam, we have a global footprint with R&D hubs across Europe, the UK, North America and Israel.
Our team of 1,500+ includes hundreds of engineers with deep expertise across hardware, software and AI R&D.
The role
Token Factory is part of Nebius Cloud, one of the world's largest GPU clouds, running tens of thousands of GPUs.
We are building a high-performance inference and fine-tuning platform designed to push foundation models to their hardware limits.
Our mission is to maximise throughput, minimise latency, and optimise cost-per-token across tens of thousands of GPUs.
Some directions we are currently working on, and which you can be part of:
Inference Optimization: Identifying LLM inference bottlenecks to drive production speedups.
Squeezing maximum performance out of a wide range of LLM architectures at scale (e.g., GPT-OSS, Kimi K2.5, DeepSeek V3).
Inference engine support: Implement novel speculative decoding architectures, optimise components of various LLM designs (dense/MoE, autoregressive/parallel), and contribute to open-source inference engines.
Low Precision Training & Inference: Design and productionise low-precision (FP8, NVFP4/MXFP4) training and inference pipelines with measurable gains in throughput and cost-efficiency.
We expect you to have:
A profound understanding of the theoretical foundations of machine learning and transformer architectures.
Experience profiling GPU workloads using Nsight,.