[−]gergelycsegzi · 2026-07-02 Thu 07:09 UTC ·
link
Yes, we do it by having multiple stages to the pipeline. First we would extract the independent data points (from say both page 4 and 40) and a second pass step establishes relationship (we call this resolution).
On the scale aspect, because we go in multiple passes, we break the scope into small enough pieces and then build it back up in a later step. Iirc the largest document I've seen a customer use was over 1k pages.
There are more complex data dependency scenarios where we find that the data that's extracted and combined (e.g. from page 4 and 40), needs to then be further transformed in different ways (e.g. having an evaluation and a clarification outcome at the end). To make these be aligned in value we are soon releasing a feature for what we call derived agents.
On the scale aspect, because we go in multiple passes, we break the scope into small enough pieces and then build it back up in a later step. Iirc the largest document I've seen a customer use was over 1k pages.
There are more complex data dependency scenarios where we find that the data that's extracted and combined (e.g. from page 4 and 40), needs to then be further transformed in different ways (e.g. having an evaluation and a clarification outcome at the end). To make these be aligned in value we are soon releasing a feature for what we call derived agents.