Diffbot Data Operations in Palo Alto, California

About Us

At Diffbot, we believe that access to structured data will be the critical resource for the coming wave of intelligent applications: everything from new app experiences to search assistants to enterprise business intelligence.

That’s why our team of world-class AI engineers is building a machine that can autonomously synthesize the web—the world’s largest database of structured information—and looking for others to join this mission.

Data Operations

Critical to our efforts is “Databot”: the largest archive of human-curated web data in existence. Databot has tens of millions of records from hundreds of thousands of unique websites, and it’s just getting started.

We’re looking for a dynamic and expansive individual to own Diffbot’s data efforts. Data is our lifeblood, and you’ll be responsible for fueling our machine learning engineers with unique, useful and—most importantly—pristine data.


Responsibilities

• Manage the training data repository and evaluation system, the training of Diffbot models, and their deployment into production.

• Monitor data quality through a rigorous system of quality assurance checks and tools.

• Determine the most appropriate and useful data for new projects. Acquire new data sets through creative means.

• Oversee and grow our network of data trainers. Source, onboard and manage new contractors to ensure a consistent flow of new data.


Required Skills

• Significant experience with web technologies / web development

• Familiarity with machine learning concepts and evaluation measures

• Experience managing teams of engineers and contractors

Preferred Skills

• Proficiency in Python and Linux

• Research experience in machine learning, particularly active learning

• Experience with big data scalability

To apply, please submit:

• Resume / LinkedIn / homepage

• A short self-introduction (describe your interest and area of speciality)

• How did you hear about Diffbot?