API Reference¶

Public symbols live in their respective submodules. Import directly from the submodule, e.g. `from pydantic_ai_provenance.capability import ProvenanceCapability`.

Core¶

`ProvenanceCapability`¶

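from dataclasses import dataclass, field

@dataclass
class ProvenanceCapability(AbstractCapability):
    # A bare [] default raises ValueError in a dataclass, so use a factory.
    source_tools: list[str] = field(default_factory=list)
    agent_name: str = "agent"
    inject_citation_instructions: bool = True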

A pydantic-ai `AbstractCapability` that hooks into agent lifecycle events to build the provenance DAG.

Parameters

Name	Type	Default	Description
`source_tools`	`list[str]`	`[]`	Tool names whose results are raw data sources. Each call gets a unique `d_*` citation key.
`agent_name`	`str`	`"agent"`	Label used in graph nodes.
`inject_citation_instructions`	`bool`	`True`	Auto-inject citation format instructions into the system prompt.

Properties

Name	Type	Description
`store`	`ProvenanceStore`	Available after the run starts. Raises `RuntimeError` if accessed before.

Methods

async def verify(
    self,
    text: str,
    *,
    claim_context_chars: int = 720,
    source_max_chars: int = 96_000,
    source_chunk_chars: int = 1_200,
    source_chunk_stride: int = 600,
    source_max_chunks: int = 400,
    min_score: float = 0.3,
    max_keys_per_tag: int = 2,
) -> CitationVerificationReport

Convenience wrapper around `verify_citations` that uses this capability's shared store. In a multi-agent setup it is safe to call on any capability, because all capabilities share the same underlying store.

`ProvenanceStore`¶

Central registry shared across all agents in a session.

store.register_data_source(node_id: str) -> str
store.register_agent_output(node_id: str) -> str
store.resolve_citation(key: str) -> str | None
store.citation_key_for_node(node_id: str) -> str | None
store.citation_summary() -> dict[str, dict]
store.to_html(title: str = "Provenance Graph") -> str
store.open_in_browser(title: str = "Provenance Graph") -> None
store.to_mermaid() -> str
store.to_dot(graph_name: str = "provenance") -> str
store.to_json() -> dict[str, Any]
store.to_json_str(indent: int = 2) -> str

Method	Returns	Description
`register_data_source(node_id)`	`str`	Assign a `d_*` key to a `DATA_READ` node.
`register_agent_output(node_id)`	`str`	Assign an `a_*` key to a `FINAL_OUTPUT` node.
`resolve_citation(key)`	`str | None`	Look up the node ID for a citation key.
`citation_key_for_node(node_id)`	`str | None`	Reverse lookup: node ID → citation key.
`citation_summary()`	`dict`	Human-readable map of every registered key → node metadata.
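
The relationship between the two registration methods and the two lookups can be pictured as a bidirectional map. The following is a minimal sketch, not the library's implementation: `KeyRegistry` is a hypothetical stand-in, and both the sequential numbering (`d_1`, `d_2`, ...) and the idempotent re-registration are assumptions.

```python
from itertools import count
from typing import Optional


class KeyRegistry:
    """Hypothetical stand-in for the citation-key part of ProvenanceStore."""

    def __init__(self) -> None:
        # One counter per prefix: "d" for data sources, "a" for agent outputs.
        self._counters = {"d": count(1), "a": count(1)}
        self._key_to_node: dict[str, str] = {}
        self._node_to_key: dict[str, str] = {}

    def _register(self, prefix: str, node_id: str) -> str:
        # Assumption: registering the same node twice returns the same key.
        if node_id in self._node_to_key:
            return self._node_to_key[node_id]
        key = f"{prefix}_{next(self._counters[prefix])}"
        self._key_to_node[key] = node_id
        self._node_to_key[node_id] = key
        return key

    def register_data_source(self, node_id: str) -> str:
        return self._register("d", node_id)

    def register_agent_output(self, node_id: str) -> str:
        return self._register("a", node_id)

    def resolve_citation(self, key: str) -> Optional[str]:
        return self._key_to_node.get(key)

    def citation_key_for_node(self, node_id: str) -> Optional[str]:
        return self._node_to_key.get(node_id)


registry = KeyRegistry()
first_key = registry.register_data_source("node-a")
second_key = registry.register_data_source("node-b")
output_key = registry.register_agent_output("node-c")
```

Unknown keys resolve to `None` rather than raising, which matches the `str | None` return types in the table above.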

Graph primitives¶

`NodeType`¶

class NodeType(StrEnum):
    INPUT          = "input"
    DATA_READ      = "data_read"
    TOOL_CALL      = "tool_call"
    TOOL_RESULT    = "tool_result"
    MODEL_REQUEST  = "model_request"
    MODEL_RESPONSE = "model_response"
    AGENT_RUN      = "agent_run"
    FINAL_OUTPUT   = "final_output"

`ProvenanceNode`¶

@dataclass
class ProvenanceNode:
    id: str
    type: NodeType
    label: str
    agent_name: str
    run_id: str
    timestamp: datetime
    data: dict[str, Any]

    @classmethod
    def create(cls, type, label, agent_name, run_id, **data) -> ProvenanceNode: ...

`ProvenanceEdge`¶

@dataclass
class ProvenanceEdge:
    source_id: str
    target_id: str
    label: str = ""

`ProvenanceGraph`¶

graph.add_node(node)
graph.add_edge(source_id, target_id, label="")
graph.predecessors(node_id) -> list[ProvenanceNode]
graph.successors(node_id) -> list[ProvenanceNode]
graph.ancestors(node_id) -> set[str]
graph.final_output_nodes() -> list[ProvenanceNode]
graph.source_nodes() -> list[ProvenanceNode]
graph.all_paths_to_sources(node_id) -> list[list[ProvenanceNode]]
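
To make the traversal methods concrete, here is a minimal sketch of what `ancestors` and `all_paths_to_sources` compute, written against a plain dict of predecessor IDs rather than the real `ProvenanceGraph`. The node IDs, the `is_source` predicate, and the source-first path ordering are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable


def ancestors(preds: dict[str, list[str]], node_id: str) -> set[str]:
    """Collect every node reachable by walking edges backwards from node_id."""
    seen: set[str] = set()
    queue = deque(preds.get(node_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(preds.get(current, []))
    return seen


def all_paths_to_sources(
    preds: dict[str, list[str]],
    node_id: str,
    is_source: Callable[[str], bool],
) -> list[list[str]]:
    """Enumerate backward paths from node_id to each source (graph assumed acyclic)."""
    paths: list[list[str]] = []

    def walk(current: str, trail: list[str]) -> None:
        trail = trail + [current]
        if is_source(current):
            paths.append(trail[::-1])  # report each path source-first
            return
        for parent in preds.get(current, []):
            walk(parent, trail)

    walk(node_id, [])
    return paths


# Two data reads feed one model request, which leads to the final output.
preds = {"out": ["resp"], "resp": ["req"], "req": ["d1", "d2"]}
found_ancestors = ancestors(preds, "out")
found_paths = all_paths_to_sources(preds, "out", lambda n: n.startswith("d"))
```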

Attribution¶

`attribute_output`¶

def attribute_output(
    store: ProvenanceStore,
    output_node_id: str | None = None,
) -> AttributionResult

Full path-level attribution for one FINAL_OUTPUT node. Uses the first FINAL_OUTPUT node if output_node_id is None.

`attribute_all_outputs`¶

def attribute_all_outputs(store: ProvenanceStore) -> list[AttributionResult]

Attribution for every FINAL_OUTPUT node in the graph.

`AttributionResult`¶

@dataclass
class AttributionResult:
    output_node: ProvenanceNode
    sources: list[ProvenanceNode]
    paths: list[AttributionPath]

    @property
    def source_labels(self) -> list[str]: ...
    def summary(self) -> str: ...

`AttributionPath`¶

@dataclass
class AttributionPath:
    source: ProvenanceNode
    path: list[ProvenanceNode]

    @property
    def source_label(self) -> str: ...
    @property
    def hop_count(self) -> int: ...

Citations¶

`parse_citations`¶

def parse_citations(text: str) -> list[CitationRef]

Extract all [REF|key1|key2|...] tags.

`citation_tag_spans`¶

def citation_tag_spans(text: str) -> list[tuple[int, int, CitationRef]]

Same as parse_citations but includes (start, end) character positions.
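
As an illustration of the tag format, the sketch below extracts `[REF|...]` tags with a regular expression and reports their character spans. It is a self-contained approximation, not the library's parser: the pattern, the `Ref` stand-in for `CitationRef`, and the function names are assumptions.

```python
import re
from dataclasses import dataclass

# Assumed tag grammar: "[REF|" + one or more "|"-separated keys + "]".
_REF_TAG = re.compile(r"\[REF\|([^\[\]]+)\]")


@dataclass
class Ref:
    """Hypothetical stand-in mirroring the documented CitationRef fields."""
    refs: list[str]
    raw: str


def tag_spans(text: str) -> list[tuple[int, int, Ref]]:
    """Return (start, end, ref) for every tag, in order of appearance."""
    spans = []
    for match in _REF_TAG.finditer(text):
        keys = [k.strip() for k in match.group(1).split("|") if k.strip()]
        spans.append((match.start(), match.end(), Ref(refs=keys, raw=match.group(0))))
    return spans


def parse_tags(text: str) -> list[Ref]:
    """Same as tag_spans, without the positions."""
    return [ref for _, _, ref in tag_spans(text)]


sample = "Revenue rose 4% [REF|d_1|a_2]. Costs fell [REF|d_3]."
spans = tag_spans(sample)
```

Slicing the text with a returned span, `sample[start:end]`, yields the raw tag, which is what makes the span variant useful for in-place rewriting.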

`strip_inline_citation_tags`¶

def strip_inline_citation_tags(text: str) -> str

Remove all [REF|…] tags.

`strip_inline_citation_tags_preserve_leading_ref_header`¶

def strip_inline_citation_tags_preserve_leading_ref_header(text: str) -> str

Remove inline tags but keep an opening [REF|…] block header on the first line.
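
The difference between the two stripping functions can be sketched as follows. This is one plausible reading of the descriptions above, not the library's code: the regular expressions, the function names, and the choice to also remove horizontal whitespace before a tag are assumptions.

```python
import re

# Assumed tag grammar; also swallow spaces or tabs directly before a tag.
_INLINE_TAG = re.compile(r"[ \t]*\[REF\|[^\[\]]+\]")
# A tag that opens the text, optionally preceded by whitespace.
_LEADING_TAG = re.compile(r"\s*\[REF\|[^\[\]]+\]")


def strip_tags(text: str) -> str:
    """Remove every [REF|...] tag."""
    return _INLINE_TAG.sub("", text)


def strip_tags_keep_header(text: str) -> str:
    """Remove inline tags but keep a tag that opens the text."""
    header = _LEADING_TAG.match(text)
    if header is None:
        return strip_tags(text)
    return text[:header.end()] + strip_tags(text[header.end():])


plain = strip_tags("Revenue rose 4% [REF|d_1|a_2]. Costs fell [REF|d_3].")
with_header = strip_tags_keep_header("[REF|d_1]\nSummary of the report [REF|d_1].")
```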

`CitationRef`¶

@dataclass
class CitationRef:
    refs: list[str]   # e.g. ["d_1", "a_2"]
    raw: str          # e.g. "[REF|d_1|a_2]"

Verification¶

`verify_citations`¶

async def verify_citations(
    text: str,
    store: ProvenanceStore,
    *,
    claim_context_chars: int = 720,
    source_max_chars: int = 96_000,
    source_chunk_chars: int = 1_200,
    source_chunk_stride: int = 600,
    source_max_chunks: int = 400,
    min_score: float = 0.3,
    max_keys_per_tag: int = 2,
) -> CitationVerificationReport

Runs step 1 (key sanitisation) and step 2 (TF-IDF overlap), then returns a `CitationVerificationReport`. In most cases, prefer calling `await provenance.verify(text)` on the capability instead.

`strip_unresolvable_citation_keys`¶

def strip_unresolvable_citation_keys(
    text: str,
    store: ProvenanceStore,
) -> tuple[str, list[CitationKeyFilterResult]]

Step 1 only. Returns (sanitized_text, filter_records).

`claim_source_tfidf_cosine`¶

def claim_source_tfidf_cosine(
    claim_text: str,
    source_text: str,
    *,
    max_source_chars: int = 96_000,
    chunk_chars: int = 1_200,
    chunk_stride: int = 600,
    max_chunks: int = 400,
) -> float

Maximum TF-IDF cosine similarity between a claim and sliding windows of a source. Returns a value in [0, 1].
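
The idea of scoring a claim against sliding windows of a source can be sketched in pure Python. The tokenisation, the smoothed IDF formula, and the helper names below are assumptions, since the reference does not document how the library computes its weights; only the parameter names and defaults are taken from the signature above.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list[str]:
    # Assumed tokenisation: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def _windows(text: str, size: int, stride: int, limit: int) -> list[str]:
    """Cut text into overlapping character windows, at most `limit` of them."""
    chunks = []
    for start in range(0, max(len(text), 1), stride):
        chunks.append(text[start:start + size])
        if start + size >= len(text) or len(chunks) >= limit:
            break
    return chunks


def max_window_cosine(
    claim_text: str,
    source_text: str,
    *,
    max_source_chars: int = 96_000,
    chunk_chars: int = 1_200,
    chunk_stride: int = 600,
    max_chunks: int = 400,
) -> float:
    """Best TF-IDF cosine between the claim and any window of the source."""
    chunks = _windows(source_text[:max_source_chars], chunk_chars, chunk_stride, max_chunks)
    docs = [Counter(_tokens(claim_text))] + [Counter(_tokens(c)) for c in chunks]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def weigh(doc: Counter) -> dict[str, float]:
        return {t: tf * idf[t] for t, tf in doc.items()}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return min(1.0, dot / (norm_a * norm_b))  # clamp float rounding

    claim_vec = weigh(docs[0])
    return max((cosine(claim_vec, weigh(d)) for d in docs[1:]), default=0.0)


identical = max_window_cosine("the cat sat", "the cat sat")
disjoint = max_window_cosine("alpha beta", "gamma delta")
partial = max_window_cosine(
    "revenue grew four percent",
    "Quarterly revenue grew four percent while costs fell.",
)
```

Because all weights are non-negative, the score stays within [0, 1]. Taking the maximum over windows keeps a short supporting passage from being diluted by the rest of a long source.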

`context_before_span`¶

def context_before_span(
    text: str,
    start: int,
    *,
    max_chars: int = 720,
    max_sentences: int = 1,
) -> str

Extract claim context from text immediately before position start.

`entailment_agent`¶

def entailment_agent(model: Any, *, instructions: str | None = None) -> Agent

Build a pydantic-ai agent for Step 3 LLM-based entailment scoring.

`refine_claim_source_similarities`¶

def refine_claim_source_similarities(
    records: list[ClaimSourceSimilarity],
    *,
    max_top_n_keys_per_tag: int = 2,
    min_score_for_shared_source: float = 0.3,
) -> list[ClaimSourceSimilarity]

Filter similarity records: keep top-N keys per tag and drop weak sources.
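
The filtering described above can be approximated as follows. The fields of `ClaimSourceSimilarity` are not listed in this reference, so the `Record` class and its `tag_index`, `key`, and `score` fields are hypothetical, and the exact meaning of `min_score_for_shared_source` may be more specific than the plain threshold used here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical similarity record: one (tag, cited key) pair and its score."""
    tag_index: int
    key: str
    score: float


def refine(records: list[Record], *, top_n: int = 2, min_score: float = 0.3) -> list[Record]:
    """Drop records below min_score, then keep the top_n best keys per tag."""
    by_tag: dict[int, list[Record]] = defaultdict(list)
    for record in records:
        if record.score >= min_score:
            by_tag[record.tag_index].append(record)
    kept: list[Record] = []
    for tag_index in sorted(by_tag):
        ranked = sorted(by_tag[tag_index], key=lambda r: r.score, reverse=True)
        kept.extend(ranked[:top_n])
    return kept


records = [
    Record(0, "d_1", 0.9),
    Record(0, "d_2", 0.5),
    Record(0, "d_3", 0.4),
    Record(1, "d_1", 0.1),
]
kept = refine(records)
```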

`CitationVerificationReport`¶

@dataclass
class CitationVerificationReport:
    original_text: str
    text_with_verified_citations: str
    claim_source_similarities: list[ClaimSourceSimilarity]

Visualization¶

All visualization methods are on ProvenanceStore (listed in the ProvenanceStore method block above). The to_json() dict schema is:

Node — citation_key is only present when the node has a registered citation key.

{
  "id": "...",
  "type": "data_read",
  "label": "[source] Tool: read_file",
  "agent_name": "summariser",
  "run_id": "...",
  "timestamp": "2024-01-01T00:00:00+00:00",
  "data": { "file_path": "report.txt" },
  "citation_key": "d_1"
}

Edge

{
  "source": "<node_id>",
  "target": "<node_id>",
  "label": "cited_in"
}
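
A short sketch of consuming these shapes: since `citation_key` is optional, code that indexes nodes by key should skip nodes without one. The helper names are hypothetical, and the functions take plain lists of node and edge dicts because the top-level layout of the `to_json()` result is not shown here.

```python
def citation_index(nodes: list[dict]) -> dict[str, str]:
    """Map each citation key to its node label, skipping uncited nodes."""
    return {n["citation_key"]: n["label"] for n in nodes if "citation_key" in n}


def successors_by_id(edges: list[dict]) -> dict[str, list[str]]:
    """Build a source ID -> target IDs adjacency map from edge dicts."""
    adjacency: dict[str, list[str]] = {}
    for edge in edges:
        adjacency.setdefault(edge["source"], []).append(edge["target"])
    return adjacency


# Sample data following the node and edge shapes shown above.
nodes = [
    {"id": "n1", "type": "data_read", "label": "[source] Tool: read_file", "citation_key": "d_1"},
    {"id": "n2", "type": "model_request", "label": "request"},
]
edges = [{"source": "n1", "target": "n2", "label": "cited_in"}]

index = citation_index(nodes)
adjacency = successors_by_id(edges)
```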

See the Visualization guide for full details.
