Source documents
Foundational
Books and PDFs are the first-class input. Upload and page persistence arrive in Phase 2.
Extraction runs
Versioned
Runs are explicit records with provenance and rerun boundaries. Execution wiring starts in the worker runtime.
Review boundary
Protected
Reviewed claims remain the source of truth. Publication and downstream payloads are kept separate from them.
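The review boundary can be enforced at the type level. The TypeScript sketch below is an illustration under stated assumptions, not Hedda's actual schema: the names `Claim`, `ReviewedClaim`, `isReviewed`, `PublicationPayload`, and `buildPayload` are all hypothetical. Only claims narrowed to `status: "reviewed"` can reach the payload builder, and the payload is a new object, so building it never mutates the reviewed claims themselves.

```typescript
// Hypothetical claim model; names and fields are illustrative assumptions.
type ClaimStatus = "draft" | "reviewed" | "rejected";

interface Claim {
  id: string;
  runId: string;             // extraction run that produced the draft
  evidenceSpanIds: string[]; // spans the claim is grounded in
  text: string;
  status: ClaimStatus;
}

// A claim whose status has been narrowed to "reviewed".
type ReviewedClaim = Claim & { status: "reviewed" };

// User-defined type guard: the only way to obtain a ReviewedClaim here.
function isReviewed(claim: Claim): claim is ReviewedClaim {
  return claim.status === "reviewed";
}

// Downstream payload: a separate record, not the claims themselves.
interface PublicationPayload {
  claimIds: string[];
  texts: string[];
}

// Accepts reviewed claims only; passing a plain Claim[] is a compile error.
function buildPayload(claims: readonly ReviewedClaim[]): PublicationPayload {
  return {
    claimIds: claims.map((c) => c.id),
    texts: claims.map((c) => c.text),
  };
}
```

Usage would look like `buildPayload(claims.filter(isReviewed))`; drafts and rejected claims are filtered out before anything downstream sees them.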
What this shell is showing
Hedda starts from a conservative source-first model. Documents, evidence spans, extraction runs, and reviewed claims are distinct concepts because the system must preserve provenance and rerun history instead of flattening everything into one event row.
- Documents remain visible even if the source file later goes missing.
- Extraction runs are versioned records, not silent in-place reruns.
- Published outputs arrive only after review and grouping decisions.
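The first two rules above can be illustrated with a small in-memory model. This is a hedged TypeScript sketch, assuming hypothetical names (`SourceDocument`, `fileStatus`, `ExtractionRun`, `supersedesRunId`, `RunLog`, `markMissing`) that do not come from Hedda itself: a document's record carries a file status instead of being deleted, and a rerun appends a new, higher-versioned run that points back at the one it supersedes rather than overwriting it.

```typescript
// Hypothetical record shapes; field names are illustrative assumptions.
interface SourceDocument {
  id: string;
  title: string;
  filePath: string;
  // Tracked separately so the record survives if the file goes missing.
  fileStatus: "present" | "missing";
}

interface ExtractionRun {
  id: string;
  documentId: string;
  version: number;                // 1 for the first run, +1 per rerun
  supersedesRunId: string | null; // previous run, kept for provenance
  extractorVersion: string;       // which extractor produced this run
  createdAt: string;
}

// Append-only log: reruns add a new record instead of editing an old one.
class RunLog {
  private readonly runs: ExtractionRun[] = [];

  latest(documentId: string): ExtractionRun | undefined {
    return this.runs
      .filter((r) => r.documentId === documentId)
      .reduce<ExtractionRun | undefined>(
        (best, r) => (!best || r.version > best.version ? r : best),
        undefined,
      );
  }

  startRun(documentId: string, extractorVersion: string, now: Date = new Date()): ExtractionRun {
    const prev = this.latest(documentId);
    const version = (prev?.version ?? 0) + 1;
    const run: ExtractionRun = {
      id: `${documentId}-run-${version}`,
      documentId,
      version,
      supersedesRunId: prev?.id ?? null,
      extractorVersion,
      createdAt: now.toISOString(),
    };
    this.runs.push(run);
    return run;
  }

  // Full rerun history for a document, oldest first.
  history(documentId: string): readonly ExtractionRun[] {
    return this.runs.filter((r) => r.documentId === documentId);
  }
}

// A missing file changes the status; the document record itself stays.
function markMissing(doc: SourceDocument): SourceDocument {
  return { ...doc, fileStatus: "missing" };
}
```

Keeping runs append-only means a rerun with a newer extractor leaves the earlier run, and anything reviewed against it, inspectable.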
Road ahead
- Phase 2 adds document ingest and page persistence.
- Phase 3 adds structured draft extraction with evidence spans.
- Phase 4 adds the review queue and claim editing surfaces.
- Later phases add normalization, event grouping, and publication.
Sample inspection paths
Once the Phase 1 sample has been seeded into the local Hedda database, these routes read the canonical document and extraction-run records directly from it.