Research
All findings are reported honestly. This page links only to verified or clearly-labeled work.
Core Papers
Why Normal Metric Judgment Cannot Determine the Value of THEOS
The February 2026 auto-scoring experiment showed THEOS scoring 9.77/15 versus single-pass at 14.03/15 (Cohen’s d = −3.46, large effect). This paper explains why that is not a THEOS failure — it is empirical evidence that standard evaluation is structurally incompatible with dialectical reasoning.
Standard metrics reward confident completeness. THEOS produces dialectical tension, hidden structure discovery, and productive disagreement. The instrument does not match the phenomenon.
Comparative Study: THEOS vs. Single-Pass AI
10 conceptual questions given as bare prompts to ChatGPT, Perplexity, Gemini, Copilot, and Grok (single-pass). The same questions run through THEOS (dual-engine, two inner passes per engine).
Finding: Single-pass AIs describe. THEOS deduces and interrogates.
Experiment Design
The Insight Detection Rubric (IDR)
The new rubric designed for what THEOS actually produces:
| Dimension | Question it answers |
|---|---|
| Revelation | Did you understand something you genuinely did not before? |
| Structural Discovery | Did it find dimensions hidden in the question? |
| Productive Tension | Does the dialectic produce more than either perspective alone? |
| Consequence Derivation | Does it derive non-trivial consequences? |
| Question Interrogation | Does it name hidden assumptions? |
→ Full Insight Detection Rubric and Experiment Design
What the Evidence Shows
| Claim | Status |
|---|---|
| Two-pass I→A→D→I finds non-obvious structure | SUPPORTED — egotism/arrogance, π examples |
| THEOS scores lower on standard metrics | CONFIRMED — and explained: wrong instrument |
| Standard rubrics cannot evaluate THEOS | SUPPORTED — specified in advance, not post-hoc |
| THEOS outperforms single-pass on insight metrics | NOT YET TESTED — IDR experiment pending |
| 402,000× cache speedup on repeat queries | MEASURED — real benchmark data |
What Is NOT Claimed
- No “95% accuracy” statistics — no source
- No “70% token reduction” — no source
- No consciousness claims — unknowable
- No fabricated performance charts — quarantined in archive
The foundation is small and honest. It is being built correctly.