Skip to content

API Reference

Public symbols live in their respective submodules. Import directly from the submodule (e.g. from pydantic_ai_provenance.capability import ProvenanceCapability).


Core

ProvenanceCapability

@dataclass
class ProvenanceCapability(AbstractCapability):
    source_tools: list[str] = []
    agent_name: str = "agent"
    inject_citation_instructions: bool = True

pydantic-ai AbstractCapability that hooks into agent lifecycle events to build the provenance DAG.

Parameters

Name Type Default Description
source_tools list[str] [] Tool names whose results are raw data sources. Each call gets a unique d_* citation key.
agent_name str "agent" Label used in graph nodes.
inject_citation_instructions bool True Auto-inject citation format instructions into the system prompt.

Properties

Name Type Description
store ProvenanceStore Available after the run starts. Raises RuntimeError if accessed before.

Methods

async def verify(
    self,
    text: str,
    *,
    claim_context_chars: int = 720,
    source_max_chars: int = 96_000,
    source_chunk_chars: int = 1_200,
    source_chunk_stride: int = 600,
    source_max_chunks: int = 400,
    min_score: float = 0.3,
    max_keys_per_tag: int = 2,
) -> CitationVerificationReport

Convenience wrapper around verify_citations using this capability's shared store. Safe to call on any capability in a multi-agent setup — all capabilities share the same underlying store.


ProvenanceStore

Central registry shared across all agents in a session.

store.register_data_source(node_id: str) -> str
store.register_agent_output(node_id: str) -> str
store.resolve_citation(key: str) -> str | None
store.citation_key_for_node(node_id: str) -> str | None
store.citation_summary() -> dict[str, dict]
store.to_html(title: str = "Provenance Graph") -> str
store.open_in_browser(title: str = "Provenance Graph") -> None
store.to_mermaid() -> str
store.to_dot(graph_name: str = "provenance") -> str
store.to_json() -> dict[str, Any]
store.to_json_str(indent: int = 2) -> str
Method Returns Description
register_data_source(node_id) str Assign a d_* key to a DATA_READ node.
register_agent_output(node_id) str Assign an a_* key to a FINAL_OUTPUT node.
resolve_citation(key) str \| None Look up the node ID for a citation key.
citation_key_for_node(node_id) str \| None Reverse lookup: node ID → citation key.
citation_summary() dict Human-readable map of every registered key → node metadata.

Graph primitives

NodeType

class NodeType(StrEnum):
    INPUT          = "input"
    DATA_READ      = "data_read"
    TOOL_CALL      = "tool_call"
    TOOL_RESULT    = "tool_result"
    MODEL_REQUEST  = "model_request"
    MODEL_RESPONSE = "model_response"
    AGENT_RUN      = "agent_run"
    FINAL_OUTPUT   = "final_output"

ProvenanceNode

@dataclass
class ProvenanceNode:
    id: str
    type: NodeType
    label: str
    agent_name: str
    run_id: str
    timestamp: datetime
    data: dict[str, Any]

    @classmethod
    def create(cls, type, label, agent_name, run_id, **data) -> ProvenanceNode: ...

ProvenanceEdge

@dataclass
class ProvenanceEdge:
    source_id: str
    target_id: str
    label: str = ""

ProvenanceGraph

graph.add_node(node)
graph.add_edge(source_id, target_id, label="")
graph.predecessors(node_id) -> list[ProvenanceNode]
graph.successors(node_id) -> list[ProvenanceNode]
graph.ancestors(node_id) -> set[str]
graph.final_output_nodes() -> list[ProvenanceNode]
graph.source_nodes() -> list[ProvenanceNode]
graph.all_paths_to_sources(node_id) -> list[list[ProvenanceNode]]

Attribution

attribute_output

def attribute_output(
    store: ProvenanceStore,
    output_node_id: str | None = None,
) -> AttributionResult

Full path-level attribution for one FINAL_OUTPUT node. Uses the first FINAL_OUTPUT node if output_node_id is None.

attribute_all_outputs

def attribute_all_outputs(store: ProvenanceStore) -> list[AttributionResult]

Attribution for every FINAL_OUTPUT node in the graph.

AttributionResult

@dataclass
class AttributionResult:
    output_node: ProvenanceNode
    sources: list[ProvenanceNode]
    paths: list[AttributionPath]

    @property
    def source_labels(self) -> list[str]: ...
    def summary(self) -> str: ...

AttributionPath

@dataclass
class AttributionPath:
    source: ProvenanceNode
    path: list[ProvenanceNode]

    @property
    def source_label(self) -> str: ...
    @property
    def hop_count(self) -> int: ...

Citations

parse_citations

def parse_citations(text: str) -> list[CitationRef]

Extract all [REF|key1|key2|...] tags.

citation_tag_spans

def citation_tag_spans(text: str) -> list[tuple[int, int, CitationRef]]

Same as parse_citations but includes (start, end) character positions.

strip_inline_citation_tags

def strip_inline_citation_tags(text: str) -> str

Remove all [REF|…] tags.

strip_inline_citation_tags_preserve_leading_ref_header

def strip_inline_citation_tags_preserve_leading_ref_header(text: str) -> str

Remove inline tags but keep an opening [REF|…] block header on the first line.

CitationRef

@dataclass
class CitationRef:
    refs: list[str]   # e.g. ["d_1", "a_2"]
    raw: str          # e.g. "[REF|d_1|a_2]"

Verification

verify_citations

async def verify_citations(
    text: str,
    store: ProvenanceStore,
    *,
    claim_context_chars: int = 720,
    source_max_chars: int = 96_000,
    source_chunk_chars: int = 1_200,
    source_chunk_stride: int = 600,
    source_max_chunks: int = 400,
    min_score: float = 0.3,
    max_keys_per_tag: int = 2,
) -> CitationVerificationReport

Steps 1 (key sanitisation) + 2 (TF-IDF overlap). Returns a CitationVerificationReport. In most cases prefer calling await provenance.verify(text) on the capability instead.

strip_unresolvable_citation_keys

def strip_unresolvable_citation_keys(
    text: str,
    store: ProvenanceStore,
) -> tuple[str, list[CitationKeyFilterResult]]

Step 1 only. Returns (sanitized_text, filter_records).

claim_source_tfidf_cosine

def claim_source_tfidf_cosine(
    claim_text: str,
    source_text: str,
    *,
    max_source_chars: int = 96_000,
    chunk_chars: int = 1_200,
    chunk_stride: int = 600,
    max_chunks: int = 400,
) -> float

Maximum TF-IDF cosine similarity between a claim and sliding windows of a source. Returns a value in [0, 1].

context_before_span

def context_before_span(
    text: str,
    start: int,
    *,
    max_chars: int = 720,
    max_sentences: int = 1,
) -> str

Extract claim context from text immediately before position start.

entailment_agent

def entailment_agent(model: Any, *, instructions: str | None = None) -> Agent

Build a pydantic-ai agent for Step 3 LLM-based entailment scoring.

refine_claim_source_similarities

def refine_claim_source_similarities(
    records: list[ClaimSourceSimilarity],
    *,
    max_top_n_keys_per_tag: int = 2,
    min_score_for_shared_source: float = 0.3,
) -> list[ClaimSourceSimilarity]

Filter similarity records: keep top-N keys per tag and drop weak sources.

CitationVerificationReport

@dataclass
class CitationVerificationReport:
    original_text: str
    text_with_verified_citations: str
    claim_source_similarities: list[ClaimSourceSimilarity]

Visualization

All visualization methods are on ProvenanceStore (listed in the ProvenanceStore method block above). The to_json() dict schema is:

Nodecitation_key is only present when the node has a registered citation key.

{
  "id": "...",
  "type": "data_read",
  "label": "[source] Tool: read_file",
  "agent_name": "summariser",
  "run_id": "...",
  "timestamp": "2024-01-01T00:00:00+00:00",
  "data": { "file_path": "report.txt" },
  "citation_key": "d_1"
}

Edge

{
  "source": "<node_id>",
  "target": "<node_id>",
  "label": "cited_in"
}

See Visualization guide for full details.