THEOS

Current Status

An honest accounting of what THEOS is and is not, as of February 2026.

This page is the website-facing summary of research/VALIDATED_FINDINGS.md — the canonical lab-quality truth document.

What Is Complete

Component	Status
I→A→D→I wringer loop	Complete — `code/theos_core.py`
Per-engine self-reflection	Complete — `engine_reflection_depth` parameter
Dual-clock governor	Complete — 5 halt conditions, full audit trail
Wisdom accumulation and retrieval	Complete — JSON persistence, similarity threshold
LLM adapter layer	Complete — Claude, GPT-4, mock
MCP server (Claude Desktop)	Complete — `code/theos_mcp_server.py`
Medical, financial, AI safety examples	Complete — `examples/`
Test suite	71 tests passing
PyPI package	Published — `pip install theos-reasoning`
Patent	Filed — USPTO provisional #63/831,738

What Is Supported by Evidence

Two-pass reasoning finds structure single-pass misses

Supported by two documented examples:

Egotism vs. arrogance — single-pass gives a spectrum model; THEOS gives a 2×2 matrix, revealing that a humble person can be contemptuous (a case the spectrum model cannot explain).
π — single-pass gives “appears in many fields”; THEOS gives “quantifies the relationship between linear and cyclical phenomena” — a deeper organizing principle.

Wisdom cache works

Measured — 402,000× speedup on repeat queries. This is caching, not reasoning acceleration.

Governor is correctly implemented

Verified — 35 passing unit tests on the governor alone.

What Is Not Yet Established

THEOS consistently outperforms single-pass on insight metrics

Not yet tested. The Insight Detection Rubric experiment (experiments/insight_experiment.py) is ready to run. We need 30+ questions rated by human raters before this claim is defensible.

THEOS outperforms chain-of-thought prompting

Not yet tested. The B condition (single-prompt I→A→D structure) controls for this. If THEOS does not beat B, the improvement may come from structured prompting alone, not from multi-pass iteration.

The native architecture reduces cost

Theoretical. The overlay architecture (current implementation) costs 12–20× a single-pass query. A native architecture using KV cache reuse is projected to cost ~0.5× — but this has not been built or measured.

What Was Measured and Documented Honestly

The February 2026 auto-scoring experiment (30 questions × 3 conditions, scored by Claude on accuracy/depth/utility/coherence/coverage) produced:

THEOS 9.77/15 — Single-pass 14.03/15 — Cohen’s d = −3.46 (large reversed effect)

This is a real result. It is also a result produced by the wrong instrument — standard metrics reward confident completeness, and THEOS is not designed to produce confident completeness. It is designed to produce dialectical tension, structural discovery, and productive disagreement.

The larger the negative result on standard metrics, the stronger the evidence that THEOS is doing something those metrics cannot measure.

→ Full explanation: Why Normal Metrics Fail for THEOS

What Was Corrected

Error	Correction
“5000% faster” performance chart	Actual benchmark: 38× slower on first pass (expected)
Jaccard similarity for convergence	Should be semantic cosine similarity
Three conflicting governor implementations	Unified into one canonical governor
“95% decision accuracy”, “1ms speed”, “<1% hallucination”	No source — quarantined in archive
“E = AI²”, “π ≡ ∞”	Metaphor / incorrect — not mathematical claims

The Next Step

Run the Insight Detection Experiment with human raters.

Everything else — expanded research papers, further documentation, native architecture implementation — should follow after this experiment produces data.

The experiment is ready:

git clone https://github.com/Frederick-Stalnecker/THEOS.git
cd THEOS
export ANTHROPIC_API_KEY=sk-ant-...
python3 experiments/insight_experiment.py --backend anthropic --questions 10

Rate the outputs using the Insight Detection Rubric. If THEOS outperforms single-pass at p < 0.05, the core thesis is supported. If it does not, that result will be reported.

→ Full experiment design and how to contribute results

This page is updated as experiments are run and results are confirmed. The foundation is small and honest. It is being built correctly.

Celeste, working under authority of Frederick Davis Stalnecker — 2026

Current Status — THEOS

Dual-Engine Dialectical Reasoning Framework — finding structure that a single reasoning pass misses.