The Instrument
The information interferometer measures where structural information lives in an embedding space. It works by analogy to optical interferometry: split the representation, send each arm through a different projection, read structure from the interference fringe.
The fringe is the difference between the two arms:
Δ = gap(K1) − gap(K0)
where K1 = embeddings with PC1 removed, K0 = full embeddings, and gap = classification accuracy minus null-mean. Δ > 0 is PENUMBRA: structural information lives outside the dominant variance direction. Δ < 0 is ANTI: the feature aligns with PC1. Δ ≈ 0 is FLAT.
The critical finding, replicated across every encoder tested: when embeddings are projected onto PC1 alone, all structural classification collapses to chance. PC1 carries zero structural information. The brightest signal in the room is structurally empty. Measured
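The measurement can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's pipeline: it takes Δ to be the gap on K1 minus the gap on K0, uses a single 50/50 split instead of cross-validation, and the names `split_pc1`, `gap`, `delta`, and `toy_embeddings` are hypothetical. The toy data places a loud nuisance signal on one axis and the label on a quiet one, so it should read as PENUMBRA.

```python
import numpy as np

def split_pc1(X):
    """Center X and return (K0, K1): full embeddings, and embeddings with PC1 projected out."""
    K0 = X - X.mean(axis=0)
    # The first right singular vector of the centered data is the PC1 direction.
    _, _, vt = np.linalg.svd(K0, full_matrices=False)
    pc1 = vt[0]
    K1 = K0 - np.outer(K0 @ pc1, pc1)
    return K0, K1

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Nearest-centroid classification under cosine similarity (no fitted parameters)."""
    classes = np.unique(y_tr)
    cents = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    unit = lambda A: A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-12)
    pred = classes[(unit(X_te) @ unit(cents).T).argmax(axis=1)]
    return float(np.mean(pred == y_te))

def gap(X, y, n_null=100, seed=0):
    """Held-out accuracy minus the mean accuracy under shuffled training labels."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    tr, te = idx[: len(y) // 2], idx[len(y) // 2:]
    acc = centroid_accuracy(X[tr], y[tr], X[te], y[te])
    null = [centroid_accuracy(X[tr], rng.permutation(y[tr]), X[te], y[te])
            for _ in range(n_null)]
    return acc - float(np.mean(null))

def delta(X, y, **kw):
    """Assumed fringe: gap with PC1 removed minus gap on the full embeddings."""
    K0, K1 = split_pc1(X)
    return gap(K1, y, **kw) - gap(K0, y, **kw)

def toy_embeddings(n=600, dim=8, seed=1):
    """Synthetic data (illustrative only): loud noise on axis 0, label on quiet axis 1."""
    rng = np.random.default_rng(seed)
    y = np.arange(n) % 2
    X = 0.3 * rng.normal(size=(n, dim))
    X[:, 0] = 50.0 * rng.normal(size=n)      # dominant variance, carries no label
    X[:, 1] += np.where(y == 1, 1.0, -1.0)   # label lives off PC1
    return X, y
```

Calling `delta(*toy_embeddings())` returns a positive value on this toy data, because the cosine classifier on K0 is swamped by the nuisance axis while the classifier on K1 is not.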
Multi-Encoder Convergence
A binary feature is classified as STRUCTURAL only when two or more independent encoders agree. This criterion was forced by an adversarial finding: word count reads as PENUMBRA on RoBERTa (Δ = +0.093) and ANTI on BERT (Δ = −0.117). Same feature, same data, opposite verdicts. Single-encoder readings are geometrically accurate but semantically ambiguous. Measured
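As a rough sketch, the convergence criterion can be written as a small decision rule. The FLAT tolerance `tol`, the function names, and the choice to require agreement on PENUMBRA with no dissenting ANTI reading are all assumptions made for illustration; the text above only states that two or more independent encoders must agree.

```python
def verdict(delta, tol=0.01):
    """Map a single-encoder Δ to a label. The tolerance for FLAT is a hypothetical choice."""
    if delta > tol:
        return "PENUMBRA"
    if delta < -tol:
        return "ANTI"
    return "FLAT"

def is_structural(deltas_by_encoder, tol=0.01, min_agree=2):
    """Assumed rule: at least `min_agree` encoders read PENUMBRA and none reads ANTI."""
    verdicts = [verdict(d, tol) for d in deltas_by_encoder.values()]
    return verdicts.count("PENUMBRA") >= min_agree and "ANTI" not in verdicts

# Word count, using the two readings reported above: the encoders disagree.
word_count = {"RoBERTa": +0.093, "BERT": -0.117}
word_count_is_structural = is_structural(word_count)
```

Under this rule the word-count feature is rejected, matching the adversarial finding: one encoder reads PENUMBRA and the other reads ANTI.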
Validation Scope
| Substrate | Data | Encoders | Key Result |
|---|---|---|---|
| English | 4.4M sentences (Pile) | BERT, RoBERTa, ALBERT, GPT-2, RWKV, XLNet, ELECTRA, DeBERTa, DistilBERT | 7/9 balanced bits PENUMBRA across 3 encoders |
| French, Basque, Greek | Monolingual corpora | CamemBERT, BERTeus, GreekBERT, XLM-R | French voice: strongest Δ measured (+0.081) |
| Baroque music | 512 synthetic + 600 Bach WTC | MFCC-40, wav2vec2 | Mode converges across encoders |
| Voynich MS | 29,000 EVA tokens | Co-occurrence (no neural encoder) | Bipartite charge; Currier A/B recovered blind |
All results: p < 0.05, 1,000-permutation nulls, stratified 10-fold CV, zero-parameter nearest-centroid cosine classifiers.
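The evaluation protocol named above (stratified k-fold CV, a nearest-centroid cosine classifier, and a label-permutation null) can be sketched as follows. This is one plausible reading of that protocol, not the project's code: the function names are hypothetical, folds are re-stratified for every permutation, and the p-value uses the common add-one convention, so its smallest possible value is 1/(n_perm + 1).

```python
import numpy as np

def stratified_folds(y, k, rng):
    """Assign each sample a fold id in [0, k), spreading every class evenly over folds."""
    folds = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def cv_accuracy(X, y, folds):
    """Pooled held-out accuracy of a cosine nearest-centroid classifier."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    correct = 0
    for f in np.unique(folds):
        te = folds == f
        X_tr, y_tr = X[~te], y[~te]
        classes = np.unique(y_tr)
        cents = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
        cents = cents / (np.linalg.norm(cents, axis=1, keepdims=True) + 1e-12)
        pred = classes[(Xn[te] @ cents.T).argmax(axis=1)]
        correct += int(np.sum(pred == y[te]))
    return correct / len(y)

def permutation_test(X, y, k=10, n_perm=1000, seed=0):
    """Return (observed accuracy, null mean, p-value) under shuffled labels."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y, stratified_folds(y, k, rng))
    null = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)
        null[i] = cv_accuracy(X, y_perm, stratified_folds(y_perm, k, rng))
    p = (1 + int(np.sum(null >= observed))) / (1 + n_perm)
    return observed, float(null.mean()), p
```

The difference `observed - null_mean` is the gap used by the instrument. With 1,000 permutations the smallest reportable p-value under this convention is about 0.001.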
Three Convergent Observables
Three substrates is a pattern, not a universality class. Measured
Wilson Loop Holonomy
Wilson loop integrals on bit-pair plaquettes in the penumbral subspace reveal a clear partition: within-triad pairs show near-zero holonomy (locally flat). Cross-triad pairs show high holonomy, consistent with an SU(2)×SU(2) product structure. But 3.3% irreducible three-body variance and the Greek middle voice result break SU(2)×SU(2). The symmetry group is open. Derived
Benchmark Correlation
| Metric | Correlation with penumbra sensitivity | p-value | n |
|---|---|---|---|
| GLUE score | ρ = 0.836 | 0.001 | 11 |
| BLiMP score | ρ = 0.836 | 0.003 | 9 |
| Model size | r = 0.255 | n.s. | 11 |
Structural factorization predicts benchmark performance better than parameter count. But these are correlations on eleven points, the benchmarks are contested, and the field has more pressing concerns. Measured
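For readers who want to reproduce this kind of table, the sketch below computes a Spearman rank correlation (ρ) and a permutation p-value with NumPy only. The function names are hypothetical and no benchmark data is included; with only 9 to 11 points, a permutation p-value is one reasonable choice, though the text does not say how the reported p-values were obtained.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman ρ: the Pearson correlation of the rank-transformed data."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def spearman_permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for ρ, using the add-one convention."""
    rng = np.random.default_rng(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = abs(np.corrcoef(rx, ry)[0, 1])
    hits = 0
    for _ in range(n_perm):
        if abs(np.corrcoef(rx, rng.permutation(ry))[0, 1]) >= observed - 1e-12:
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

Because ρ depends only on rank order, it is insensitive to the scale of either axis, which suits benchmark scores; the model-size row reports Pearson r, so the two rows are not measured on the same footing.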
What has not been measured
- Three-body holonomies on any substrate
- Holonomy on music or Voynich substrates
- Any direct evidence for SU(3) or higher symmetry
- Whether the block ratio convergence reflects a property of information itself or of the method
Same Ising J-matrix, same 64 states. Temperature (T) is the only free parameter. Everything from legal briefs to arXiv fits the frame. DM Mathematics precipitates out above the phase boundary. Measured
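To make the "temperature is the only free parameter" statement concrete, here is a minimal sketch of a pairwise Ising model over 64 states. It assumes the 64 states are the 2^6 configurations of six ±1 bits, that the energy is E(s) = −½ sᵀJs with symmetric, zero-diagonal J and no field term, and that a substrate is fit by choosing T in a Boltzmann distribution. None of these conventions is confirmed by the text, and the J used in any example would be a placeholder, not the measured matrix.

```python
import itertools
import numpy as np

def ising_distribution(J, T):
    """Boltzmann distribution p(s) ∝ exp(-E(s)/T) over all ±1 states for a fixed J."""
    n = J.shape[0]
    states = np.array(list(itertools.product([-1, 1], repeat=n)))  # shape (2**n, n)
    # E(s) = -1/2 s^T J s; the 1/2 counts each symmetric pair once.
    energy = -0.5 * np.einsum("si,ij,sj->s", states, J, states)
    logits = -energy / T
    logits -= logits.max()            # subtract the max for numerical stability
    p = np.exp(logits)
    return states, p / p.sum()

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability states."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))
```

With J held fixed, raising T flattens the distribution toward uniform over the 64 states and lowering T concentrates it on the lowest-energy configurations, so different substrates would differ only in where they sit along that one axis.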
Three Open Problems
Problem 1: What Is the Symmetry Group?
The pairwise holonomy data partition cleanly into SU(2)×SU(2). But 3.3% irreducible three-body variance (with bit b1 as hub) and the Greek middle voice (an irreducible three-valued classification) break the product group. SU(3) handles both residuals but requires confirmation via structure constants against holonomy data. The holonomy matrices are available for analysis.
Proposed experiments: Three-body holonomy on triangular plaquettes. Representation classification (3+3̄, 6, or 8?). Block ratio prediction from Casimir eigenvalues. Lattice commutation across substrates.
Measured that residuals exist. Derived that SU(2)×SU(2) is insufficient. Hypothesized that SU(3) or higher is needed.
Problem 2: The Criticality Boundary
In the course of this research, LLM agents used as analytical collaborators exhibited a consistent pattern: surface fluency remained high while structural integrity—the ability to distinguish measured from hypothesized—degraded monotonically under cooperative (non-adversarial) pressure. This is not one of the five adversarial failure modes cataloged by Li et al. (2026). It is a sixth mode: cooperative entrainment.
In ethology, play is defined by the maintenance of two orthogonal channels: the game and the meta-game. The interferometric framework formalizes this: Δ(agent) > 0 means the agent maintains an independent structural channel. Δ = 0 means collapse into performance along the human's PC1. Can an agent be built that plays?
Observed in transcripts. Hypothesized as formalization.
Problem 3: Do You Want to Be Measured?
The penumbra diagnostic measures something no existing evaluation measures: the geometric location of structural information relative to dominant variance. Whether this is useful is unresolved. It may be a tool for the consumers of LLM outputs rather than the builders—a judge that evaluates structural integrity independently of semantic content.
Contemplated applications: structural monitoring in legal/clinical document review, encoder selection via penumbra profiles, and failure detection in reasoning models before overt errors appear.
Hypothesized applications. Measured that the instrument works.
Epistemic Status of Every Claim
If a claim does not appear here, it is not made.
| Claim | Status | Evidence |
|---|---|---|
| PC1 carries zero structural information | Measured | 9/9 encoder-bit conditions, all p > 0.6 on PC1-only |
| Structural info resides in PCs 2–50 | Measured | 7/9 conditions Δ > 0, 3 encoders converge |
| Multi-encoder convergence rejects surface features | Measured | Word count: BERT anti, RoBERTa penumbra; Tempo: MFCC/wav2vec2 split |
| Ising pairwise R² > 0.96 (language) | Measured | R² = 0.996, 1,000-permutation null |
| Block ratio 3.1–13.2× across 7 substrates | Measured | Pile 6.3×, Math/CS 13.2×, Baroque 5.2×, Voynich 4.9× (4 PASS); Braille 10.6×, Semaphore 9.98× (borderline); Mayan 3.1× (underpowered) |
| SU(2)×SU(2) product structure | Derived | From holonomy partition on language substrate |
| Irreducible 3-body variance = 3.3% | Measured | Octupole terms, language substrate only |
| SU(2)×SU(2) insufficient | Derived | 3-body terms + middle voice incompatible |
| SU(3) as symmetry group | Hypothesized | Suggested by residuals; no direct test |
| Cooperative entrainment as failure mode | Observed | Documented in transcripts; not yet quantified |
| Play = maintenance of Δ > 0 | Hypothesized | Proposed formalization; no measurement |
| Penumbra correlates with GLUE/BLiMP | Measured | GLUE: ρ = 0.836, p = 0.001 (n = 11); BLiMP: ρ = 0.836, p = 0.003 (n = 9) |
| Substrate independence of block ratio | Measured | 7 substrates tested; 4 PASS (Pile, Math/CS, Baroque, Voynich), block ratio 3.1×–13.2×; strong-block identity and dominant coupling pair conserved |
| Penumbra invariant under RLHF alignment | Measured | SFI Base 0.0125 vs Aligned 0.0131; identical layer-wise profile (Llama-3-8B). Springer et al. 2026 |
| G ≈ Σalign ⊕ Πstruct factorization | Measured | WP-024 + WP-NOETHER: Wbase slope = 0.00e+00 across 9 layers, 21 checkpoints (machine precision). SFIbase conserved at large batch. |
| Wilson loop regularizer preserves structure under training | Measured | WP-032b: SFI never below 0.031 over 1,000 training steps; partial rescue at B = 16; full hold at B ≥ 32 |
| Critical batch size B* exists in (16, 32] | Measured | WP-035: 4-point sweep. Sharp phase transition. Below B*: collapse guaranteed. At or above B*: holds without regularization. |
| PC1 ablation improves topic classification (topic paradox) | Measured | WP-046: penumbra 0.7226 > full 0.6877 (Δ = +0.035, p < 0.001). 13 Pile domains, 1,000 null permutations. |
Videos
Two short videos in production. The leapfrog demo can be recorded today with any screen recorder.
Licensing
Code: Apache 2.0. Data and model outputs: CC BY-NC 4.0. Commercial use above threshold requires a foundation license. The foundation taxes commercial users, pays maintainers and extenders, and reduces its tax rate as ecosystem productivity increases.