Forge: Turn Unstructured Documents Into AI-Ready Datasets

Apr 23, 2026

90% of the world’s data is unstructured, and most of it is invisible to the AI systems built to use it. We started this journey because we believe today’s AI is backwards. Most AI systems are static, expensive, and slow to change. Most of the burden is placed on end users and enterprises to figure out how to make AI work for them, instead of the other way around.

Data is a great example of this inertia. Frontier AI systems unlock intelligence only when data is in very precise formats that require arcane structures, completeness, and quality. However, the real-world data that matters most is never ready on a silver platter. The largest knowledge stores of humanity are locked in PDFs, OCR documents, spreadsheets, docs, slide decks, and email threads. Formats designed for people, not machines.

Today we are changing that. We're introducing Forge, a new feature that extends Adaptive Data to the world of unstructured and real-world data. No upstream conversion. No bespoke pipelines. No schema work.

Where Your Most Valuable Data Has Been Hiding

Every organization sits on years of accumulated knowledge buried in documents. That knowledge should be powering decisions, training models, and shaping products. Not gathering dust because the format was wrong.

Earnings reports, legal contracts, insurance claims, medical records, procurement forms, research papers. This is where critical knowledge lives. And these formats break the moment you try to leverage them with AI. Inconsistent layouts, embedded tables, merged cells, handwriting, footnotes. Teams that recognize this problem end up building and maintaining custom preprocessing infrastructure just to make their data usable. It's slow, brittle, and expensive.

All Your Data Unlocked

Forge accepts documents in their raw, native form, with no prep required. The entire layer of parsing libraries, layout detection, and schema reconciliation you'd otherwise assemble is gone. You bring the documents; Forge handles extraction, structuring, and transformation, then maps the output into a form Adaptive Data accepts. Your team focuses on what to do with the data, not how to free it.
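To make the extract, structure, and map stages concrete, here is a deliberately tiny Python sketch of the kind of hand-built pipeline teams assemble for themselves today. It is not Forge's API or implementation: the `Record` type and the `extract`, `structure`, and `to_jsonl_row` functions are hypothetical names invented for illustration, and the sketch assumes the document is already plain text with simple "Key: value" lines.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class Record:
    """Hypothetical target row (illustrative only, not Forge's schema)."""
    source: str
    fields: dict


def extract(raw_text: str) -> list[tuple[str, str]]:
    """Extraction: pull 'Key: value' pairs out of free-form text."""
    pattern = r"^([^:\n]+):[ \t]*(.+)$"
    pairs = re.findall(pattern, raw_text, flags=re.MULTILINE)
    return [(key.strip(), value.strip()) for key, value in pairs]


def structure(source: str, pairs: list[tuple[str, str]]) -> Record:
    """Structuring: normalize keys to snake_case and build one record."""
    fields = {}
    for key, value in pairs:
        name = re.sub(r"\W+", "_", key.lower()).strip("_")
        fields[name] = value
    return Record(source=source, fields=fields)


def to_jsonl_row(record: Record) -> str:
    """Mapping: serialize the record as one JSON Lines row."""
    return json.dumps(asdict(record), sort_keys=True)


# Assumed sample input: a few lines of invoice-like text.
sample = (
    "ACME Corp\n"
    "Invoice No: 1042\n"
    "Total: $1,250.00\n"
    "Thanks for your business.\n"
)
row = to_jsonl_row(structure("invoice.txt", extract(sample)))
```

Even this toy version handles only one layout. Real documents add tables, merged cells, scans, and handwriting, which is where the brittleness and maintenance cost of do-it-yourself pipelines come from.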

Once we adapt your data, we observe 82% quality gains. Blueprint, our specification layer, lets you steer data quality and behavior, and language coverage now extends to 242 languages. Adaptive Data lets teams move from raw documents to a precisely shaped dataset step by step, selecting each capability as they need it. Define the properties you care about. The platform handles the rest.

Adaptive Data is the first of three pillars guiding our work over the next year, alongside Adaptive Interfaces and Adaptive Intelligence. We’ll be launching programs across each pillar, all designed to build AI systems that continuously learn and improve in deployment. Join our waitlist to stay involved.

Forge is available now for all Adaptive Data users. Unlock your documents. Evolve your data.

Author

Sudip Roy, Co-founder