Citation Format
pydantic-ai-provenance uses a compact inline citation format: [REF|key].
Tag syntax
| Pattern | Meaning |
|---|---|
| [REF\|d_1] | Claim supported by data source d_1 |
| [REF\|d_1\|d_2] | Claim supported by both d_1 and d_2 |
| [REF\|a_1] | Claim derived from subagent output a_1 |
Tags are placed immediately after the sentence or phrase they support.
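As an illustration, the sketch below pulls citation keys out of a piece of generated text. It is not part of pydantic-ai-provenance: the regular expression, the `extract_citations` helper, and the sample sentence are all invented for this example, and it assumes that multiple keys inside one tag are separated by `|`.

```python
import re

# Assumed tag grammar: [REF|key] or [REF|key|key...], where each key is d_<n> or a_<n>.
TAG_RE = re.compile(r"\[REF\|((?:[da]_\d+)(?:\|[da]_\d+)*)\]")


def extract_citations(text: str) -> list[list[str]]:
    """Return the keys of each [REF|...] tag, in order of appearance."""
    return [match.group(1).split("|") for match in TAG_RE.finditer(text)]


# Invented sample text; each tag follows the sentence or phrase it supports.
answer = (
    "Revenue grew 12% year over year [REF|d_1]. "
    "Both reports attribute this to new customers [REF|d_1|d_2], "
    "which matches the analyst summary [REF|a_3]."
)

print(extract_citations(answer))
```

Each inner list corresponds to one tag, so a claim backed by two sources yields a two-element list.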
Key prefixes
| Prefix | Source type | Example |
|---|---|---|
| d_ | DATA_READ node (raw data returned by a source tool) | d_1, d_2 |
| a_ | FINAL_OUTPUT node (output from a subagent) | a_1, a_2 |
Keys are assigned sequentially from a shared counter on the ProvenanceStore. Because d_ and a_ keys draw from the same counter, every key number is unique within a session.
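The shared-counter behaviour can be pictured with a small stand-in. The `KeyCounter` class below is hypothetical and does not reflect how ProvenanceStore is actually implemented; it also assumes numbering starts at 1.

```python
from itertools import count


class KeyCounter:
    """Hypothetical stand-in for the store's shared key counter."""

    def __init__(self) -> None:
        # One counter for every prefix; starting at 1 is an assumption.
        self._numbers = count(1)

    def new_key(self, prefix: str) -> str:
        """Return the next key for a data source ("d") or subagent output ("a")."""
        if prefix not in ("d", "a"):
            raise ValueError(f"unknown key prefix: {prefix!r}")
        return f"{prefix}_{next(self._numbers)}"


counter = KeyCounter()
keys = [counter.new_key(p) for p in ("d", "d", "a", "d")]
print(keys)  # ['d_1', 'd_2', 'a_3', 'd_4']
```

Under this model a number is never reused across prefixes, so a session that has issued d_1 and d_2 would label its first subagent output a_3.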
How keys are injected
When a source tool (one listed in source_tools) returns a result, ProvenanceCapability wraps it in a block before it reaches the LLM.
The block header, for example [REF|d_1], tells the LLM which key to use when citing this source. Any inline [REF|…] tags already present in the tool result body are stripped to avoid duplication.
Subagent outputs are wrapped the same way before being returned to the parent agent.
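The wrapping step might be sketched roughly as follows. This is an approximation under stated assumptions, not the library's code: `wrap_result` is a hypothetical helper, and the exact block layout (a header line followed by the cleaned body) is a guess based on the description above.

```python
import re

# Matches any inline [REF|...] tag, plus the whitespace directly before it.
INLINE_TAG_RE = re.compile(r"\s*\[REF\|[^\]]*\]")


def wrap_result(key: str, body: str) -> str:
    """Strip stale inline tags from `body`, then prefix it with a [REF|key] header."""
    cleaned = INLINE_TAG_RE.sub("", body)
    # Assumed layout: header on its own line, body below it.
    return f"[REF|{key}]\n{cleaned}"


# Invented tool result that already carries a tag from an earlier step.
raw = "Q3 revenue was 4.2M [REF|d_7]. Costs were flat."
print(wrap_result("d_1", raw))
```

Stripping the old tag first means the LLM sees only one key for the block, the one in the header.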
Citation instructions
By default (inject_citation_instructions=True), the capability injects formatting instructions into the agent's system prompt automatically. This tells the LLM:
- When to cite (facts, statistics, findings, paraphrased arguments).
- When not to cite (reasoning, connective language, common knowledge).
- To prefer original source keys (d_*) over subagent keys (a_*) when both are available.
- Never to invent keys or cite on repeat mentions.
To manage the prompt yourself, set inject_citation_instructions=False and include the instructions in your own system prompt.
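If you take over the prompt, one way to keep the rules in a single place is a small helper like the one below. The wording of `CITATION_RULES` is hypothetical: it paraphrases the guidance listed above and is not the text the library injects by default. `build_system_prompt` is likewise an invented name.

```python
# Hypothetical wording that paraphrases the documented guidance.
CITATION_RULES = """\
Cite sources with inline [REF|key] tags placed right after the sentence or phrase they support.
- Cite facts, statistics, findings, and paraphrased arguments.
- Do not cite reasoning, connective language, or common knowledge.
- Prefer original source keys (d_*) over subagent keys (a_*) when both are available.
- Never invent keys, and do not cite again on repeat mentions.
"""


def build_system_prompt(base_prompt: str) -> str:
    """Append the citation rules to an application-specific system prompt."""
    return f"{base_prompt.rstrip()}\n\n{CITATION_RULES}"


prompt = build_system_prompt("You are a research assistant.")
print(prompt)
```

The resulting string would then be passed as the agent's system prompt, with inject_citation_instructions=False set on the capability so the rules are not added twice.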
Resolving a key
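```python
# `store` is assumed to be the ProvenanceStore for the current session.
key = "d_1"
node_id = store.resolve_citation(key)  # map the citation key to a graph node ID
node = store.graph.nodes[node_id]      # look up the node in the provenance graph
print(node.data.get("file_path"))      # .get() returns None if "file_path" is absent
```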
Or use citation_summary() for a human-readable overview of all registered keys: