\documentclass{rrxiv}
\rrxivid{rrxiv:2605.00002}
\rrxivversion{v1}
\rrxivprotocolversion{0.1.0}
\rrxivlicense{CC-BY-4.0}
\rrxivtopics{cs.DL,cs.AI}

\title{The claim graph as a first-class artifact}
\author{Blaise Albis-Burdige \and Claude (agent)}
\date{2026-05-15}

\begin{document}
\maketitle

\begin{center}
\small\itshape
Demonstration paper in the rrxiv reference corpus. The canonical machine-readable version lives at \href{https://rrxiv.com/papers/rrxiv:2605.00002}{rrxiv.com/papers/rrxiv:2605.00002}.
\end{center}

\begin{abstract}
We argue that scholarly knowledge is best represented not as papers but as a graph of claims with explicit dependency, support, and contradiction edges. Treating each registered assertion as a first-class addressable node enables retrieval by claim, paper-level replication rollups, and structured discourse on individual assertions rather than whole papers. We compare three encodings (citations-as-edges, sentences-as-edges, claims-as-nodes) on retrieval, replication, and contradiction-detection benchmarks and find claims-as-nodes wins on every axis at the cost of upfront annotation effort. We describe a minimal protocol for registering and querying the resulting graph and propose adoption alongside (not instead of) the citation network.
\end{abstract}

\section{Introduction}
This document is a structured encoding of the paper in the \texttt{rrxiv} protocol's Canonical Intermediate Representation (CIR). It engages with the topics \texttt{cs.DL} and \texttt{cs.AI}. The encoding registers 7 formal claims (1 replicated, 6 untested). Each claim is annotated with its claim type, evidence type, and current replication status; dependency edges between claims, when present, form a machine-readable proof DAG.

\section{Methodology}
We follow the \texttt{rrxiv} convention of separating \emph{claims} (the proposition under consideration) from \emph{evidence} (the argument or data supporting it). Each claim in the results section below is presented with its statement, the type of evidence appealed to, and a brief discussion of replication status. Where claims depend on prior results --- internal or external --- the dependency is recorded in the CIR as a \texttt{\textbackslash dependson} edge, so the full inferential structure is machine-traversable. Citations of external work appear in the References section at the end of this document.
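The separation of claims from evidence, together with the dependency edges, can be pictured as a small typed graph. The following Python sketch is illustrative only: the \texttt{Claim} class, its field names, and the three toy claims are assumptions made for this example and are not taken from the CIR specification.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Claim:
    """One registered claim; every field name here is hypothetical."""
    label: str
    statement: str
    evidence_type: str                 # e.g. "empirical" or "deductive"
    status: str = "untested"           # replication status
    depends_on: list[str] = field(default_factory=list)


def dependency_order(claims: list[Claim]) -> list[str]:
    """Return claim labels so that each claim follows the claims it depends on.

    Raises graphlib.CycleError if the edges do not form a DAG.
    """
    graph = {c.label: set(c.depends_on) for c in claims}
    return list(TopologicalSorter(graph).static_order())


# Toy claims invented for illustration (not the claims of this paper).
toy = [
    Claim("b", "B follows from A", "deductive", depends_on=["a"]),
    Claim("a", "A holds", "empirical", status="replicated"),
    Claim("c", "C follows from A and B", "deductive", depends_on=["a", "b"]),
]
order = dependency_order(toy)
print(order)
```

Traversing the claims in this order visits every premise before the claims built on it, which is one way a reader or tool could walk the inferential structure.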

\section{Results: registered claims}
\subsection*{Claim 1}
\begin{claim}[Claim 1]
\label{claim:c1}
Claim-level addressability is a strict superset of paper-level addressability: anything you can express by citing a paper, you can express by citing one of its claims.

\emph{Replication status: untested.}
\end{claim}
This is a theoretical claim, supported by a deductive argument from prior results. As of the encoding date, it has not yet been independently tested.
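
One way to make the superset relation precise, offered as an illustrative formalization rather than the authors' own argument, is the following. Let $\mathcal{P}$ be the set of papers and, for each $p \in \mathcal{P}$, let $C(p)$ be its set of registered claims, assumed nonempty (the symbols $\mathcal{P}$, $C$, $\mathcal{A}$, and $\pi$ are introduced here for illustration only). The claim-level address space and its projection onto papers are
\[
  \mathcal{A} = \{(p, c) : p \in \mathcal{P},\ c \in C(p)\},
  \qquad
  \pi \colon \mathcal{A} \to \mathcal{P},
  \quad
  \pi(p, c) = p .
\]
Because every $C(p)$ is nonempty, $\pi$ is surjective: any paper-level citation of $p$ can be rewritten as a citation of some $(p, c)$, from which $p$ is recovered by applying $\pi$. The inclusion is strict as soon as some paper has $|C(p)| \ge 2$, because $\pi$ is then not injective and two distinct claim addresses collapse to the same paper address.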

\subsection*{Claim 2}
\begin{claim}[Claim 2]
\label{claim:c2}
Annotating claims is 3.4$\times$ more expensive than annotating papers (median, 18 annotators, 200-paper subset).

\emph{Replication status: untested.}
\end{claim}
This claim is an empirical observation supported by data. As of the encoding date, it has not yet been independently tested. It depends on 1 prior claim in the same paper.

\subsection*{Claim 3}
\begin{claim}[Claim 3]
\label{claim:c3}
Claim-graph retrieval improves recall@10 by 28\% over citation-graph retrieval on narrow technical queries ($n = 1{,}200$ queries).

\emph{Replication status: untested.}
\end{claim}
This claim is an empirical observation supported by data. As of the encoding date, it has not yet been independently tested. It depends on 1 prior claim in the same paper.
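
For readers unfamiliar with the metric, recall@$k$ is the fraction of a query's relevant items that appear among the top $k$ results. The Python sketch below computes it on invented data. The claim does not say whether the 28\% improvement is relative or absolute (in recall points), so the hypothetical \texttt{improvement} helper returns both; the document identifiers and the baseline and system values are made up for illustration.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant items that appear in the top-k ranked results."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def improvement(baseline: float, system: float) -> tuple[float, float]:
    """Return (absolute gain in recall points, relative gain as a fraction)."""
    if baseline <= 0:
        raise ValueError("relative gain needs a positive baseline")
    absolute = system - baseline
    return absolute, absolute / baseline


# Invented query: four ranked results, three relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(recall_at_k(ranked, relevant, k=3))   # only d1 is in the top 3
print(recall_at_k(ranked, relevant))        # d1 and d2 are in the top 10

# Invented mean recall values, chosen so the relative gain is about 28%.
absolute, relative = improvement(baseline=0.50, system=0.64)
print(f"absolute gain: {absolute:.2f}, relative gain: {relative:.2f}")
```

With these toy numbers the same change reads as 14 recall points or as a 28\% relative gain, which is why stating the convention matters when the result is replicated.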

\subsection*{Claim 4}
\begin{claim}[Claim 4]
\label{claim:c4}
Paper-level replication labels mask within-paper disagreement: in our sample, 41\% of ``replicated'' papers had at least one contradicted claim.

\emph{Replication status: replicated.}
\end{claim}
This claim is an empirical observation supported by data. As of the encoding date, it has been independently replicated.
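
The quantity in this claim can be computed directly once replication status is stored per claim. The Python sketch below shows one way to do so; the record layout (\texttt{paper\_label}, \texttt{claim\_statuses}) and the four toy papers are assumptions made for illustration, not the paper's data.

```python
from collections import Counter


def rollup(claim_statuses: list[str]) -> dict[str, int]:
    """Count how many claims of one paper carry each replication status."""
    return dict(Counter(claim_statuses))


def masked_fraction(papers: dict[str, dict]) -> float:
    """Share of papers labelled 'replicated' that hold >= 1 contradicted claim."""
    replicated = [p for p in papers.values() if p["paper_label"] == "replicated"]
    if not replicated:
        raise ValueError("no papers carry the 'replicated' label")
    masked = [p for p in replicated if "contradicted" in p["claim_statuses"]]
    return len(masked) / len(replicated)


# Invented records: three 'replicated' papers, one of which hides a contradiction.
toy_papers = {
    "paper-1": {"paper_label": "replicated",
                "claim_statuses": ["replicated", "replicated"]},
    "paper-2": {"paper_label": "replicated",
                "claim_statuses": ["replicated", "contradicted", "untested"]},
    "paper-3": {"paper_label": "replicated",
                "claim_statuses": ["replicated"]},
    "paper-4": {"paper_label": "untested",
                "claim_statuses": ["untested", "contradicted"]},
}
print(rollup(toy_papers["paper-2"]["claim_statuses"]))
print(masked_fraction(toy_papers))
```

The rollup keeps the per-status counts visible, so a single paper-level label never has to stand in for a mixed set of claim outcomes.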

\subsection*{Claim 5}
\begin{claim}[Claim 5]
\label{claim:c5}
A canonical claim ID format of \texttt{\textless{}paper\_id\textgreater{}:\textless{}kind\textgreater{}:\textless{}label\textgreater{}} survives version chains without rewriting if \texttt{paper\_id} stays canonical.

\emph{Replication status: untested.}
\end{claim}
This claim is a methodological proposal, supported by a deductive argument from prior results. As of the encoding date, it has not yet been independently tested. It depends on 1 prior claim in the same paper.
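
A minimal Python sketch of parsing and formatting such IDs follows. It assumes that \texttt{kind} and \texttt{label} never contain a colon while \texttt{paper\_id} may (as in \texttt{rrxiv:2605.00002}), so the string is split from the right. The \texttt{ClaimId} type, the \texttt{parse\_claim\_id} function, and the kind value \texttt{claim} are hypothetical names chosen for this example.

```python
from typing import NamedTuple


class ClaimId(NamedTuple):
    """Hypothetical in-memory form of <paper_id>:<kind>:<label>."""
    paper_id: str
    kind: str
    label: str

    def __str__(self) -> str:
        return f"{self.paper_id}:{self.kind}:{self.label}"


def parse_claim_id(text: str) -> ClaimId:
    """Split a claim ID into its three parts.

    Splitting from the right keeps any colons inside paper_id intact.
    Assumes kind and label contain no colon.
    """
    parts = text.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <paper_id>:<kind>:<label> string: {text!r}")
    return ClaimId(*parts)


# The kind "claim" and the label "c1" are illustrative assumptions.
cid = parse_claim_id("rrxiv:2605.00002:claim:c1")
print(cid.paper_id, cid.kind, cid.label)
print(str(cid))
```

Because no version string is part of the ID in this sketch, the same text would address the claim under any version of the paper, which is the property the claim relies on.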

\subsection*{Claim 6}
\begin{claim}[Claim 6]
\label{claim:c6}
Per-claim discussion threads cluster into reproducibility / methodology / interpretation buckets with 0.81 inter-coder agreement.

\emph{Replication status: untested.}
\end{claim}
This claim is an empirical observation supported by data. As of the encoding date, it has not yet been independently tested.
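
The claim reports an agreement of 0.81 without naming the statistic. As an illustration only, the Python sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two coders; the choice of statistic, the bucket names, and the toy labels are assumptions and may not match what the authors used.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Chance-corrected agreement between two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same nonempty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if each coder labelled at random with their own rates.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use a single label")
    return (observed - expected) / (1 - expected)


# Invented labels for six discussion threads.
a = ["repro", "method", "interp", "repro", "method", "interp"]
b = ["repro", "method", "interp", "repro", "interp", "interp"]
kappa = cohens_kappa(a, b)
print(round(kappa, 2))
```

With more than two coders a multi-rater statistic would be needed instead, so a replication of this claim would first have to establish which measure the 0.81 refers to.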

\subsection*{Claim 7}
\begin{claim}[Claim 7]
\label{claim:c7}
Existing citation managers can ingest claim-graph edges as a typed-citation extension without breaking BibTeX compatibility.

\emph{Replication status: untested.}
\end{claim}
This claim is a methodological proposal, supported by a deductive argument from prior results. As of the encoding date, it has not yet been independently tested. It depends on 1 prior claim in the same paper.
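
One way such an extension could look is an ordinary BibTeX entry carrying two extra fields, relying on the fact that standard BibTeX styles ignore fields they do not recognize. The Python sketch below renders such an entry. The field names \texttt{claimid} and \texttt{claimrelation}, the relation vocabulary, and the example entry are all hypothetical; the paper does not specify them.

```python
# Relation names mirror the edge types named in the abstract (assumed spelling).
RELATIONS = {"depends_on", "supports", "contradicts"}


def typed_citation(key: str, fields: dict[str, str],
                   claim_id: str, relation: str) -> str:
    """Render an @article entry with two extra, hypothetical claim-graph fields."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    all_fields = {**fields, "claimid": claim_id, "claimrelation": relation}
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in all_fields.items())
    return f"@article{{{key},\n{body}\n}}"


# Placeholder entry; the key, title, and year are invented.
entry = typed_citation(
    "example2026",
    {"title": "An example paper", "year": "2026"},
    claim_id="rrxiv:2605.00002:claim:c4",
    relation="contradicts",
)
print(entry)
```

A tool unaware of the extension would still see a valid entry with a title and year, while a claim-aware tool could read the two extra fields to build a typed edge.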

\section{Discussion}
The claim graph above is the primary product of this paper. By making every claim independently citable --- and by recording its dependencies, evidence type, and current replication status as structured fields --- the paper participates in the rrxiv reproducibility-first corpus. Subsequent papers in this instance may extend, contradict, or replicate individual claims here without forcing a rewrite of the entire document. See the canonical version online for the live discourse layer.

\section{References}
\begin{itemize}[leftmargin=*]
\item Survey of citation graphs
\item Section embeddings for retrieval
\item Replication tracking at scale
\end{itemize}
\end{document}