nilirl | Hacker News

Comment by nilirl | original | Launch HN: Parsewise (YC P25) – Reason Across Documents with an API

nilirl · 2026-07-02 Thu 04:20 UTC

Does this also extract semantic relationships and data dependencies between fields?

I previously built an internal tool that transforms insurance PDFs into structured data. I wanted to extract explicit data dependencies between fields so I could use them for validation.

Insurance forms can run to 30-40 pages, and a field on page 40 can depend on fields on page 4 through a few nested if conditions. Would Parsewise be able to extract those relationships?

If yes, how do you do it for large documents?
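To make the question concrete, here is a minimal sketch of the kind of cross-page dependency I mean. Everything in it is hypothetical and for illustration only: the field names (`marital_status`, `coverage_type`, `spouse_name`), the `Field` and `Dependency` types, and the rule itself are made up, not taken from a real form or from Parsewise.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Field:
    name: str
    page: int
    value: object


@dataclass
class Dependency:
    target: str                    # field being validated
    sources: tuple[str, ...]       # fields the target depends on
    check: Callable[[dict], bool]  # returns True when the rule is satisfied


def validate(fields: list[Field], deps: list[Dependency]) -> list[str]:
    """Return the names of target fields whose dependency rule fails."""
    values = {f.name: f.value for f in fields}
    failures = []
    for dep in deps:
        # A rule cannot pass if any field it needs was never extracted.
        missing = [n for n in (*dep.sources, dep.target) if n not in values]
        if missing or not dep.check(values):
            failures.append(dep.target)
    return failures


def spouse_rule(v: dict) -> bool:
    # Nested condition (hypothetical): the spouse name on page 40 is only
    # required when page 4 says "married" AND the coverage type is "family".
    if v["marital_status"] == "married":
        if v["coverage_type"] == "family":
            return bool(v["spouse_name"])
    return True


fields = [
    Field("marital_status", 4, "married"),
    Field("coverage_type", 4, "family"),
    Field("spouse_name", 40, ""),  # left blank on page 40
]
deps = [Dependency("spouse_name", ("marital_status", "coverage_type"), spouse_rule)]
failures = validate(fields, deps)
```

The hard part is not running rules like this but recovering `deps` from the document itself, which is what I'm asking about.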

gergelycsegzi · 2026-07-02 Thu 07:09 UTC

Yes, we do it with a multi-stage pipeline. First we extract the independent data points (say, from both page 4 and page 40), and then a second pass establishes the relationships between them (we call this resolution).
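As a rough illustration of the two-pass shape (extract independently, then resolve), here is a toy sketch. It is not our actual implementation: the regex stands in for whatever model does the extraction, and `PAGES`, `DEPENDS_ON`, `extract_pass`, and `resolve_pass` are all made-up names for the example.

```python
import re

# Hypothetical page contents, keyed by page number.
PAGES = {
    4: "Marital status: married\nCoverage type: family",
    40: "Spouse name: Jane Doe",
}

# Hypothetical dependency table: target field -> fields it depends on.
DEPENDS_ON = {"spouse_name": ["marital_status", "coverage_type"]}


def extract_pass(pages):
    """Pass 1: pull data points from each page with no cross-page context."""
    points = []
    for page_no, text in pages.items():
        for line in text.splitlines():
            m = re.match(r"\s*([^:]+):\s*(.*)", line)
            if m:
                name = m.group(1).strip().lower().replace(" ", "_")
                points.append({"name": name, "value": m.group(2).strip(), "page": page_no})
    return points


def resolve_pass(points):
    """Pass 2: link extracted points into (source page, source, target page, target)."""
    by_name = {p["name"]: p for p in points}
    links = []
    for target, sources in DEPENDS_ON.items():
        if target not in by_name:
            continue
        for source in sources:
            if source in by_name:
                links.append(
                    (by_name[source]["page"], source, by_name[target]["page"], target)
                )
    return links


points = extract_pass(PAGES)
links = resolve_pass(points)
```

The point of the split is that pass 1 never needs to see page 4 and page 40 at the same time; only the much smaller set of extracted points has to fit into pass 2.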

On scale: because we work in multiple passes, we break the scope into small enough pieces and then build it back up in a later step. IIRC the largest document I've seen a customer use was over 1k pages.
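The break-down-then-rebuild idea can be sketched generically like this. The chunk size of 25 pages, the placeholder extractor, and the function names are assumptions for the example, not a description of how our pipeline is configured.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive batches of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def extract_from_chunk(chunk):
    # Placeholder extractor: a real one would return the data points found
    # in these pages. Here we only record the page number and text length.
    return [{"page": page_no, "chars": len(text)} for page_no, text in chunk]


def process(pages, size=25):
    # Each chunk is handled on its own, so no single step sees the whole document.
    partials = [extract_from_chunk(chunk) for chunk in chunked(pages, size)]
    # A later step builds the document-level view back up from the partial results.
    merged = [point for part in partials for point in part]
    return partials, merged


# A synthetic 1,000-page document of (page number, text) pairs.
pages = [(n, f"text of page {n}") for n in range(1, 1001)]
partials, merged = process(pages)
```

With 1,000 pages and 25 pages per chunk this produces 40 partial results that are then merged back into a single list covering every page.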

There are more complex data dependency scenarios where the data that's extracted and combined (e.g. from pages 4 and 40) then needs to be further transformed in different ways (e.g. producing both an evaluation and a clarification outcome at the end). To keep these outputs aligned in value, we are soon releasing a feature we call derived agents.