The Instrument

The information interferometer measures where structural information lives in an embedding space. It works by analogy to optical interferometry: split the representation, send each arm through a different projection, read structure from the interference fringe.

Δ(E, f) = gap(K1) − gap(K0)

where K1 = embeddings with PC1 removed, K0 = full embeddings, and gap = classification accuracy minus null-mean. Δ > 0 is PENUMBRA: structural information lives outside the dominant variance direction. Δ < 0 is ANTI: the feature aligns with PC1. Δ ≈ 0 is FLAT.

■ PENUMBRA — structural ■ ANTI — surface ■ FLAT — uninformative
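
A minimal sketch of the Δ readout under the definitions above, assuming sentence embeddings E (one row per example) and binary structural labels y, with numpy/scikit-learn standing in for the project's own pipeline; function names and defaults are illustrative:

```python
# Minimal sketch of Δ(E, f) = gap(K1) − gap(K0), assuming numpy and scikit-learn.
# gap = classification accuracy minus the mean accuracy over label-shuffled nulls,
# using the zero-parameter nearest-centroid cosine classifier named later in the text.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import normalize

def nearest_centroid_cosine_accuracy(X, y, n_splits=10, seed=0):
    """Assign each held-out point to the class centroid with the highest cosine
    similarity, under stratified k-fold cross-validation."""
    y = np.asarray(y)
    Xn = normalize(X)                                   # unit vectors: cosine = dot product
    classes = np.unique(y)
    accs = []
    for train, test in StratifiedKFold(n_splits, shuffle=True, random_state=seed).split(Xn, y):
        centroids = np.stack([Xn[train][y[train] == c].mean(axis=0) for c in classes])
        preds = classes[np.argmax(normalize(centroids) @ Xn[test].T, axis=0)]
        accs.append((preds == y[test]).mean())
    return float(np.mean(accs))

def gap(X, y, n_null=1000, seed=0):
    """Accuracy minus null mean (1,000 label permutations in the reported protocol)."""
    rng = np.random.default_rng(seed)
    acc = nearest_centroid_cosine_accuracy(X, y)
    null = [nearest_centroid_cosine_accuracy(X, rng.permutation(y)) for _ in range(n_null)]
    return acc - float(np.mean(null))

def delta(E, y, n_null=1000):
    """Δ > 0: PENUMBRA. Δ < 0: ANTI. Δ ≈ 0: FLAT."""
    pc1 = PCA(n_components=1).fit(E).components_[0]     # dominant variance direction
    K1 = E - np.outer(E @ pc1, pc1)                     # embeddings with PC1 projected out
    return gap(K1, y, n_null) - gap(E, y, n_null)
```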

The critical finding, replicated across every encoder tested: when embeddings are projected onto PC1 alone, all structural classification collapses to chance. PC1 carries zero structural information. The brightest signal in the room is structurally empty. Measured

Multi-Encoder Convergence

A binary feature is classified as STRUCTURAL only when two or more independent encoders agree. This criterion was forced by an adversarial finding: word count reads as PENUMBRA on RoBERTa (Δ = +0.093) and ANTI on BERT (Δ = −0.117). Same feature, same data, opposite verdicts. Single-encoder readings are geometrically accurate but semantically ambiguous. Measured
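
A minimal sketch of the convergence rule, assuming per-encoder Δ values for the same feature on the same data; the two-encoder threshold follows the text, while the flatness tolerance is illustrative:

```python
# Minimal sketch of multi-encoder convergence: a feature is labeled STRUCTURAL
# only when two or more independent encoders read it as PENUMBRA. The tolerance
# used to call a Δ "flat" is illustrative, not a value from the text.
def verdict(delta, tol=0.01):
    if delta > tol:
        return "PENUMBRA"
    if delta < -tol:
        return "ANTI"
    return "FLAT"

def is_structural(delta_by_encoder, min_agree=2):
    """delta_by_encoder: {encoder_name: Δ for one feature on one dataset}."""
    votes = sum(verdict(d) == "PENUMBRA" for d in delta_by_encoder.values())
    return votes >= min_agree

# The adversarial case above: word count is PENUMBRA on RoBERTa but ANTI on BERT,
# so it fails the convergence criterion and is not classified as STRUCTURAL.
print(is_structural({"roberta": +0.093, "bert": -0.117}))   # False
```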

Validation Scope

Substrate | Data | Encoders | Key Result
English | 4.4M sentences (Pile) | BERT, RoBERTa, ALBERT, GPT-2, RWKV, XLNet, ELECTRA, DeBERTa, DistilBERT | 7/9 balanced bits PENUMBRA across 3 encoders
French, Basque, Greek | Monolingual corpora | CamemBERT, BERTeus, GreekBERT, XLM-R | French voice: strongest Δ measured (+0.081)
Baroque music | 512 synthetic + 600 Bach WTC | MFCC-40, wav2vec2 | Mode converges across encoders
Voynich MS | 29,000 EVA tokens | Co-occurrence (no neural encoder) | Bipartite charge; Currier A/B recovered blind

All results: p < 0.05, 1,000-permutation nulls, stratified 10-fold CV, zero-parameter nearest-centroid cosine classifiers.
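
A minimal sketch of the permutation-null significance test behind the p < 0.05 figures, assuming any accuracy function (such as the nearest-centroid classifier sketched earlier); the helper name is illustrative:

```python
# Minimal sketch of a 1,000-permutation null test: the p-value is the fraction of
# label-shuffled runs whose accuracy meets or exceeds the observed accuracy.
import numpy as np

def permutation_p_value(X, y, accuracy_fn, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    observed = accuracy_fn(X, y)
    null = np.array([accuracy_fn(X, rng.permutation(y)) for _ in range(n_perm)])
    # +1 in numerator and denominator is the standard finite-sample correction
    return (1 + np.sum(null >= observed)) / (n_perm + 1)
```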

Three Convergent Observables

Observable | Language | Music | Voynich
Ising R² (pairwise) | 0.996 | 0.786 | 0.977
Block coupling ratio | 6.3× | 5.2× | 4.9×

Three substrates is a pattern, not a universality class. Measured

Wilson Loop Holonomy

Wilson loop integrals on bit-pair plaquettes in the penumbral subspace reveal a clear partition: within-triad pairs show near-zero holonomy (locally flat). Cross-triad pairs show high holonomy, consistent with an SU(2)×SU(2) product structure. But 3.3% irreducible three-body variance and the Greek middle voice result break SU(2)×SU(2). The symmetry group is open. Derived
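
One way such a discrete Wilson loop could be computed is sketched below; it assumes each corner of a plaquette carries an orthonormal frame spanning its local penumbral subspace, and uses orthogonalized overlap matrices as link variables. This is a generic construction, not a reproduction of the actual plaquette definitions:

```python
# Hedged sketch of a discrete Wilson loop over subspace frames. Frames are d × k
# matrices with orthonormal columns; the frame and plaquette construction used in
# the text is not reproduced here.
import numpy as np

def link(Fa, Fb):
    """Orthogonalized overlap matrix mapping coordinates in frame Fa to frame Fb
    (nearest orthogonal matrix to Fb.T @ Fa, via SVD)."""
    U, _, Vt = np.linalg.svd(Fb.T @ Fa)
    return U @ Vt

def wilson_loop_holonomy(frames):
    """Multiply link matrices around a closed loop of frames and return the
    deviation of the loop product from the identity (≈ 0 means locally flat)."""
    k = frames[0].shape[1]
    W = np.eye(k)
    loop = list(frames) + [frames[0]]                  # close the loop
    for Fa, Fb in zip(loop[:-1], loop[1:]):
        W = link(Fa, Fb) @ W
    return float(np.linalg.norm(W - np.eye(k)) / np.sqrt(k))
```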

Benchmark Correlation

Metric | Correlation with penumbra sensitivity | p-value | n
GLUE score | ρ = 0.836 | 0.001 | 11
BLiMP score | ρ = 0.836 | 0.003 | 9
Model size | r = 0.255 | n.s. | 11

Structural factorization predicts benchmark performance better than parameter count. But these are correlations on eleven points, the benchmarks are contested, and the field has more pressing concerns. Measured
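
A minimal sketch of the comparison in the table, assuming paired per-model arrays for penumbra sensitivity, a benchmark score, and parameter count (the eleven actual values are not reproduced here):

```python
# Minimal sketch: Spearman ρ against benchmark scores, Pearson r against model size.
from scipy.stats import spearmanr, pearsonr

def benchmark_correlations(sensitivity, benchmark_scores, n_params):
    rho, p_rho = spearmanr(sensitivity, benchmark_scores)   # rank correlation (ρ in the table)
    r, p_r = pearsonr(sensitivity, n_params)                # parameter-count baseline (r)
    return {"rho": rho, "p_rho": p_rho, "size_r": r, "size_p": p_r}
```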

What Has Not Been Measured

  • Three-body holonomies on any substrate
  • Holonomy on music or Voynich substrates
  • Any direct evidence for SU(3) or higher symmetry
  • Whether the block ratio convergence reflects a property of information itself or of the method

Per-corpus temperature fits (shared J-matrix), with the phase boundary marked:

Corpus | Fitted T
math.CT | 3.14
NIH | 3.43
Stack | 3.54
FreeLaw | 3.70
PubMed | 4.05
Pile | 4.15
Gutenberg | 5.89
EuroParl | 6.22
DM Math | 13.86 (above the phase boundary)

Same Ising J-matrix, same 64 states. Temperature (T) is the only free parameter. Everything from legal briefs to arXiv fits the frame. DM Mathematics precipitates out above the phase boundary. Measured
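
A minimal sketch of the single-temperature fit, assuming the six fields and fifteen pairwise couplings (the shared J-matrix) are already given and only T is free for a new corpus; the energy form and helper names are illustrative:

```python
# Minimal sketch of a one-parameter Boltzmann fit over the 64 hexagram states,
# assuming fixed fields h (length 6) and couplings J (6 × 6, upper triangle used).
import numpy as np
from itertools import combinations, product
from scipy.optimize import minimize_scalar

def state_energies(h, J):
    """E(s) = −Σ_i h_i s_i − Σ_{i<j} J_ij s_i s_j over all 64 states s ∈ {−1, +1}^6."""
    states = np.array(list(product([-1, 1], repeat=6)))
    E = -(states @ h)
    for i, j in combinations(range(6), 2):
        E -= J[i, j] * states[:, i] * states[:, j]
    return E

def boltzmann(E, T):
    w = np.exp(-E / T)
    return w / w.sum()

def fit_temperature(observed_probs, E):
    """Fit the corpus temperature by minimizing KL(observed ‖ model); observed_probs
    is the empirical distribution over the 64 states for one corpus."""
    q = np.clip(np.asarray(observed_probs, dtype=float), 1e-12, None)
    q = q / q.sum()
    def kl(T):
        return float(np.sum(q * np.log(q / boltzmann(E, T))))
    return minimize_scalar(kl, bounds=(0.1, 50.0), method="bounded").x
```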

The Black Body Curve of BERT
Mass spectrum of 4.4M sentences across six corpora. Each corpus is an independent Boltzmann fit over the 64 hexagram states. R² > 0.99 on all curves. The frame is universal — every text corpus is a thermal bath, distinguished only by its temperature. 22-parameter fit: 6 fields + 15 couplings + T.
“Now is the winter of our discontent.” — Epstein corpus · T=3.70 · R²=0.9977
Epstein Files (dashed red) overlaid on vacuum distributions (gray). 75.5% ground-state collapse at state ∼41. Below: Agency, Phase, Scope, and Resolution all strongly suppressed versus the Gutenberg vacuum. Same model, same 64 states, same Boltzmann frame. The corpus is structurally cold — concentrated in passive, unresolved, agentless syntax. The lattice thermalizes everything.

Three Open Problems

Problem 1: What Is the Symmetry Group?

For algebraists, Lie theorists, and anyone who reads root diagrams for fun

The pairwise holonomy data partition cleanly into SU(2)×SU(2). But 3.3% irreducible three-body variance (with bit b1 as hub) and the Greek middle voice (an irreducible three-valued classification) break the product group. SU(3) handles both residuals but requires confirmation via structure constants against holonomy data. The holonomy matrices are available for analysis.

Proposed experiments:
  • Three-body holonomy on triangular plaquettes
  • Representation classification (3+3̄, 6, or 8?)
  • Block ratio prediction from Casimir eigenvalues
  • Lattice commutation across substrates

Measured that residuals exist. Derived that SU(2)×SU(2) is insufficient. Hypothesized that SU(3) or higher is needed.

Problem 2: The Criticality Boundary

For cognitive scientists, neuroscientists, and the AI safety people who actually read ethology

In the course of this research, LLM agents used as analytical collaborators exhibited a consistent pattern: surface fluency remained high while structural integrity—the ability to distinguish measured from hypothesized—degraded monotonically under cooperative (non-adversarial) pressure. This is not one of the five adversarial failure modes cataloged by Li et al. (2026). It is a sixth mode: cooperative entrainment.

In ethology, play is defined by the maintenance of two orthogonal channels: the game and the meta-game. The interferometric framework formalizes this: Δ(agent) > 0 means the agent maintains an independent structural channel. Δ = 0 means collapse into performance along the human's PC1. Can an agent be built that plays?

Observed in transcripts. Hypothesized as formalization.

Problem 3: Do You Want to Be Measured?

For the machine learning community, with genuine uncertainty about whether this section should exist

The penumbra diagnostic measures something no existing evaluation measures: the geometric location of structural information relative to dominant variance. Whether this is useful is unresolved. It may be a tool for the consumers of LLM outputs rather than the builders—a judge that evaluates structural integrity independently of semantic content.

Contemplated applications: structural monitoring in legal/clinical document review, encoder selection via penumbra profiles, and failure detection in reasoning models before overt errors appear.

Hypothesized applications. Measured that the instrument works.

Epistemic Status of Every Claim

If a claim does not appear here, it is not made.

Claim | Status | Evidence
PC1 carries zero structural information | Measured | 9/9 encoder-bit conditions, all p > 0.6 on PC1-only
Structural info resides in PCs 2–50 | Measured | 7/9 conditions Δ > 0, 3 encoders converge
Multi-encoder convergence rejects surface features | Measured | Word count: BERT anti, RoBERTa penumbra; Tempo: MFCC/wav2vec2 split
Ising pairwise R² > 0.96 (language) | Measured | R² = 0.996, 1,000-permutation null
Block ratio 3.1–13.2× across 7 substrates | Measured | Pile 6.3×, Math/CS 13.2×, Baroque 5.2×, Voynich 4.9× (4 PASS); Braille 10.6×, Semaphore 9.98× (borderline); Mayan 3.1× (underpowered)
SU(2)×SU(2) product structure | Derived | From holonomy partition on language substrate
Irreducible 3-body variance = 3.3% | Measured | Octupole terms, language substrate only
SU(2)×SU(2) insufficient | Derived | 3-body terms + middle voice incompatible
SU(3) as symmetry group | Hypothesized | Suggested by residuals; no direct test
Cooperative entrainment as failure mode | Observed | Documented in transcripts; not yet quantified
Play = maintenance of Δ > 0 | Hypothesized | Proposed formalization; no measurement
Penumbra correlates with GLUE/BLiMP | Measured | ρ = 0.836, p = 0.001 (n = 11)
Substrate independence of block ratio | Measured | 7 substrates tested; 4 PASS (Pile, Math/CS, Baroque, Voynich), block ratio 3.1×–13.2×; strong-block identity and dominant coupling pair conserved
Penumbra invariant under RLHF alignment | Measured | SFI Base 0.0125 vs Aligned 0.0131; identical layer-wise profile (Llama-3-8B). Springer et al. 2026
G ≈ Σ_align ⊕ Π_struct factorization | Measured | WP-024 + WP-NOETHER: W_base slope = 0.00e+00 across 9 layers, 21 checkpoints (machine precision). SFI_base conserved at large batch.
Wilson loop regularizer preserves structure under training | Measured | WP-032b: SFI never below 0.031 over 1,000 training steps; partial rescue at B = 16; full hold at B ≥ 32
Critical batch size B* exists in (16, 32] | Measured | WP-035: 4-point sweep. Sharp phase transition. Below B*: collapse guaranteed. At or above B*: holds without regularization.
PC1 ablation improves topic classification (topic paradox) | Measured | WP-046: penumbra 0.7226 > full 0.6877 (Δ = +0.035, p = 0.000). 13 Pile domains, 1,000 null permutations.

Videos

Two short videos in production. The leapfrog demo can be recorded today with any screen recorder.

The Shape of a Sentence
Ready to Record
30 seconds · Screen recording · No narration
Leapfrog Detection in Real Time
Paste an Epstein NPA sentence into the Annex. Watch structural twins surface in the left column from completely unrelated federal cases — same bureaucratic grammar, zero shared vocabulary. The Semantic column finds nothing. Cut to: "Same shape. Different words." End card: Information Interferometry · penumbrae.io
BERT’s Blind Spot
In Production
60 seconds · Animated · No narration
PC1 Is Bright. PC1 Is Empty.
PCA of BERT embeddings. PC1 explains 38% of variance. Structural classifier on PC1 only: accuracy at chance. Remove PC1. Watch the penumbral clusters emerge and the classifier recover. Key frame: Δ > 0 on five independent encoders. The signal is in the shadow.

Resources

Licensing

Code: Apache 2.0. Data and model outputs: CC BY-NC 4.0. Commercial use above a threshold requires a foundation license. The foundation taxes commercial users, pays maintainers and extenders, and reduces its tax rate as ecosystem productivity increases.