Methodology

How Decisis answers a research question.

This page is the durable description of what the system does, how it fails safely, and what authority sits in the corpus across income tax, transfer tax, and cross-border tax research, treated as equal use cases for the CPAs, tax attorneys, estate planners, and wealth planners who rely on it. It is the page that competing AI tools do not publish. Its presence is the point.

01

Source acquisition

Title 26 U.S.C. is parsed from the Office of the Law Revision Counsel's USLM XML at each release point. 26 C.F.R. is pulled from the eCFR Versioner API. IRB items (Rev. Ruls., Rev. Procs., Notices, Announcements) are scraped weekly from irs.gov/irb. PLRs, TAMs, CCAs, and GCMs come from the written-determinations index. Court opinions arrive via CourtListener bulk data plus a licensed citator for negative-treatment depth. State estate-tax codes cover the 17 estate-tax jurisdictions. Treaties and uniform acts come from primary publishers.

02

Section-aware chunking

Tax law is hierarchical. Naive 512-token chunking destroys cite-ability. Every chunk preserves its full path (/usc/26/B/11/A/2010/c/5/A/ii) and carries a citation header so retrieval and synthesis both know what the chunk is.
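A minimal sketch of the idea, assuming a hypothetical Chunk record (the field names and header format here are illustrative, not the production schema):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    path: str      # full hierarchy path down to the clause
    citation: str  # human-readable pinpoint cite
    text: str      # the chunk body itself

def with_header(chunk: Chunk) -> str:
    # Prepend the citation header so both the retriever and the
    # synthesis model see what the chunk *is*, not just what it says.
    return f"[{chunk.citation} | {chunk.path}]\n{chunk.text}"

c = Chunk(path="/usc/26/B/11/A/2010/c/5/A/ii",
          citation="26 U.S.C. § 2010(c)(5)(A)(ii)",
          text="...portability election text...")
print(with_header(c).splitlines()[0])
```

Because the header travels with the chunk, a pinpoint cite survives any downstream reranking or truncation.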

03

Hybrid retrieval

Every query is rewritten by a classifier (Sonnet 4.6) into a query plan with jurisdiction, document-type, and temporal filters. Three retrieval methods run in parallel: dense semantic via Voyage-3-large, lexical via Postgres FTS, and citation lookup for any explicit references. Results merge through reciprocal rank fusion, get reranked by Cohere v3, then expanded one hop through the citation graph. Top 10–20 chunks pass to synthesis.
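The merge step can be sketched with the standard reciprocal rank fusion formula; the document IDs below are made up for illustration:

```python
def rrf(ranked_lists, k=60):
    """Reciprocal rank fusion: each list contributes 1/(k + rank)
    per document; scores are summed across lists, then sorted."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense   = ["s2010", "s2056", "reg-20.2010-2"]  # semantic ranking
lexical = ["s2010", "rev-proc-2022-32"]        # full-text-search ranking
cite    = ["s2010"]                            # explicit citation lookup
fused = rrf([dense, lexical, cite])
print(fused[0])  # s2010: rank 1 in all three lists, so it fuses first
```

RRF needs no score calibration across the three retrievers, which is why it is a common choice for merging dense, lexical, and lookup results before a reranker.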

04

Synthesis with citation enforcement

Synthesis runs on Claude Opus 4.7 with temperature=0 for reproducibility. The system prompt is explicit: no substantive claim without an inline pinpoint cite drawn from the provided sources block; sources block is the only authority; refuse rather than fabricate; surface conflicts explicitly; flag non-precedential authority; distinguish statute from regulation from ruling from case.

05

Post-synthesis verification

Every [Source N] marker is verified to resolve to a real chunk in the provided block. Any inline citation in the synthesized text is run through the citation parser and looked up against corpus_documents. If a citation cannot be resolved, the synthesis is flagged for regeneration; this catches the rare model hallucination that slips through prompt enforcement.
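The marker check reduces to a small set difference; a minimal sketch, assuming the [Source N] marker format described above:

```python
import re

def unresolved_markers(answer: str, provided_ids: set[int]) -> list[int]:
    """Return every [Source N] marker that does not resolve to a
    chunk actually supplied in the sources block."""
    cited = {int(n) for n in re.findall(r"\[Source (\d+)\]", answer)}
    return sorted(cited - provided_ids)

answer = "The exclusion is portable [Source 2] if elected [Source 9]."
bad = unresolved_markers(answer, provided_ids={1, 2, 3})
print(bad)  # any unresolved marker flags the synthesis for regeneration
```

An empty result means every marker resolved; anything else triggers the regeneration path.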

06

Refusal layers

Two refusal mechanisms. Retrieval-side: if the top reranked chunks score below 0.3 normalized, synthesis is skipped entirely and the practitioner sees a refusal with suggested next-step authorities. Synthesis-side: the prompt instructs the model to refuse when the sources block is insufficient. Correct-refusal rate on our out-of-corpus eval set: 96.8%.
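The retrieval-side gate is a one-line threshold check; a sketch, assuming normalized reranker scores in [0, 1]:

```python
def should_refuse(reranked_scores: list[float], threshold: float = 0.3) -> bool:
    """Retrieval-side refusal: skip synthesis entirely when no
    reranked chunk clears the normalized-score bar."""
    return not reranked_scores or max(reranked_scores) < threshold

print(should_refuse([0.12, 0.08, 0.27]))  # True: refuse, suggest next steps
print(should_refuse([0.81, 0.44]))        # False: proceed to synthesis
```

Gating on the maximum (rather than the mean) refuses only when nothing in the corpus is plausibly on point, which is the failure mode the layer exists to catch.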

07

Currency tracking

Every corpus document carries effective date, supersession status, amendment history, and (where applicable) a superseded_by reference. Retrieval filters on currency before reranking unless the query is explicitly historical. OBBBA repealed the TCJA sunset; the system automatically treats pre-2026 sunset analysis as historical context, not current authority.
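The currency filter can be sketched as follows; the dict keys and document IDs here are illustrative, not the production schema:

```python
from datetime import date

def currency_filter(docs, as_of=None, historical=False):
    """Drop superseded or not-yet-effective documents before
    reranking, unless the query is explicitly historical."""
    if historical:
        return docs  # historical queries see the full version history
    cutoff = as_of or date.today()
    return [d for d in docs
            if d["effective"] <= cutoff and d.get("superseded_by") is None]

docs = [
    {"id": "notice-old", "effective": date(2018, 1, 1), "superseded_by": "notice-new"},
    {"id": "notice-new", "effective": date(2024, 7, 1), "superseded_by": None},
]
print([d["id"] for d in currency_filter(docs, as_of=date(2026, 1, 1))])
```

Passing historical=True bypasses the filter, which is how a query about pre-repeal sunset law still retrieves the superseded authority as context.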

08

Audit and reproducibility

Every research session writes a research_session row, every retrieval writes research_retrieval rows, and every model call writes a research_synthesis row with the full prompt, prompt hash, model version, and generated text. A practitioner running the same query against the same corpus state gets the same answer. The firm's audit log retains everything per state-bar policy.
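What a research_synthesis row might record can be sketched like this; the field names and hashing choice (SHA-256 of the full prompt) are illustrative assumptions:

```python
import hashlib

def synthesis_row(prompt: str, model: str, output: str) -> dict:
    """Audit record for one model call: full prompt, a stable hash of
    it, the model version, and the generated text."""
    return {
        "model": model,
        "prompt": prompt,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output": output,
    }

row = synthesis_row("SYSTEM...\nSOURCES...\nQUESTION...", "opus-4.7", "answer text")
print(row["prompt_sha256"][:12])  # hash lets auditors confirm prompt identity
```

Hashing the full prompt gives a compact equality check across sessions: two rows with the same hash and model version were built from byte-identical inputs, which is what makes the reproducibility claim auditable.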

09

Evaluation

Continuous eval against 200+ hand-curated estate/tax research questions with attorney-verified ideal citations, plus a 100-question out-of-corpus refusal set, a 50-question conflict-surfacing set, and a 50-question currency-accuracy set. Every change to retrieval, reranking, prompt, or chunking is gated on the eval. Regressions block merge.
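The merge gate reduces to a per-suite comparison against baseline; a minimal sketch with made-up suite names and scores:

```python
def gate(baseline: dict, candidate: dict) -> bool:
    """Block merge if any eval suite regresses versus the baseline."""
    return all(candidate[suite] >= baseline[suite] for suite in baseline)

baseline  = {"citations": 0.94, "refusal": 0.968, "conflicts": 0.90, "currency": 0.92}
candidate = {"citations": 0.95, "refusal": 0.960, "conflicts": 0.91, "currency": 0.93}
print(gate(baseline, candidate))  # False: refusal regressed, merge blocked
```

Requiring every suite to hold or improve, rather than an averaged score, prevents a change from trading refusal safety for citation accuracy without anyone noticing.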

Last reviewed by Director of Legal Content · 2026-05-10