The data you'll work with is messy, incomplete, and arriving in every format imaginable, from structured ERP exports to blurry photos of warehouse labels.
Your job will be to own it: understand what matters, what's wrong, and what's missing, working closely with the DS/ML owner.
This role starts in Data Engineering but will grow with you toward data science and analytics as the platform scales.
Tasks

In this role you will:
Build Python-based data pipelines on AWS, taking them from early foundations to a robust, maintainable system that ingests, validates, transforms, and analyzes complex chemical data.
Parse and extract structured data from messy sources: fragmented proprietary databases, PDFs, photos, free-form text.
Understand why half the fields are blank, why the same chemical has 15 different names, and what to do about it.
Help define our data infrastructure while it's still taking shape.
The tools, patterns, and practices we establish now will stick around as the platform grows.
Use AI tools (we use Claude Code) to structure and accelerate your work.
Contribute to design discussions, code reviews, and retrospectives.
We are a team of six, so your opinions will matter.