structure_of_clear_thinking

Introducing: The Categorical Structure of Sequence Modeling

MikeHLee
·April 26, 2026·2 min read

A series on Ontological Induction — and why LLM hallucination is a type system problem.


The Frame

Large language models lie. Not because they want to, and not because they lack data — they lie because the objective they were trained on has no structural notion of truth. Next-token prediction is a statistical game. Grounding is a categorical one.

This short series makes the argument in full:

LLM hallucinations are not a retrieval problem. They are a type-system problem. Grounding generation in categorical proof objects (Ologs) eliminates hallucination by construction.

That's the thesis. It comes out of the paper "Ontological Induction: Grounding Language Generation in Categorical Proof Objects" (Lee, 2026), and over the next four posts I'll unpack it one layer at a time, in a form aimed at engineers building real systems rather than at reviewers of a theory paper.


The Series

Post 1 — Why Your LLM Hallucinates (And How Category Theory Can Help).
Why hallucination is structural rather than statistical. Introduces Ologs (ontology logs) as a type system for meaning, and shows the smallest non-trivial example that standard retrieval can't fix.
Aligned with our EMNLP 2026 submission (June 15, 2026).

Post 2 — Attention, But Make It Type-Safe. (published today)
How to inject Olog constraints directly into transformer attention. You'll see the change to the attention formula, why categorical reachability gives you a masking rule that's both correct and differentiable, and what it does to training dynamics on a toy ontology.
Aligned with our NeurIPS 2026 submission (May 15, 2026).

Post 3 — From Proofs to Programs to… Text?
Extending the Curry–Howard correspondence to natural-language generation. Proofs are programs; programs are proofs; and — if you do it carefully — generated text can be a proof too. What that buys you, and where it breaks.
Aligned with our ICLR 2027 submission (Oct 1, 2026).

Post 4 — Building an Auditable AI: A Complete Walkthrough.
End-to-end engineering build: ontology → proof engine → constrained generator → API with full audit traces. Runnable, reviewable, and deployable.
Aligned with our ICLR 2027 submission (Oct 15, 2026).


Why This, Why Now

Every serious AI deployment eventually hits the same wall: the model is fluent, but you can't trust any individual output without a human in the loop. Retrieval helps. Fine-tuning helps. Guardrails help. None of them give you a proof that a statement is sound with respect to a domain you actually understand.

A type system does.

That's the move this series makes — from "the model probably knows" to "the model can only say things that are provable against a domain ontology, and here's the proof." The cost is real: you have to author (or induce) an Olog. The benefit is that the resulting generation is auditable by construction, and the audit trace is a first-class artifact the rest of your stack can use.


Reading Order

Post 2 ships first because the NeurIPS deadline lands first. Chronologically you can read them 1 → 2 → 3 → 4, but each post stands alone — start wherever the problem statement grabs you.

Companion series on adjacent territory (reward geometry, swarm coordination) will cross-link in as they publish.


This is Post 0 of the Ontological Induction series at the Oasis-X research blog. Feedback, pushback, and counterexamples are welcome — we ship the blog drafts before the paper reviewers see them for exactly this reason.