If you ask what Ontic does, the short answer is simple: we turn documents into governed answers.
The longer answer is that we do not treat AI output as a single model problem. We treat it as a systems problem.
Imagine a company asking: "What parts of ISO 27001 apply to our cloud deployment?" A typical AI system guesses from training data. Ontic retrieves the actual clauses from the normalized corpus, constrains the model to interpret those clauses under explicit rules, and records exactly which evidence produced the answer.
A useful system has to do five things well:
- prepare source material cleanly
- turn it into a retrieval-grade corpus
- retrieve the right evidence at answer time
- make the model speak inside a controlled contract
- preserve a forensic trail after the answer is delivered
That full chain is what Ontic is built to do.
The problem we are solving
Most AI products stop at one of these layers:
- a model with a prompt
- a vector database with semantic search
- a chat UI with citations
- a monitoring dashboard after the fact
Those pieces matter. They are not enough.
If the source document is malformed, retrieval quality degrades. If retrieval is weak, the model improvises. If the model is not governed, the answer can sound stronger than the evidence. If there is no forensic trail, nobody can explain what happened after a bad answer ships.
That is the failure pattern Ontic is designed to avoid.
What the platform does
At a high level, Ontic takes raw inputs such as standards, regulations, RFCs, internal docs, PDFs, and markdown, then moves them through a governed pipeline until they become usable evidence for live answers.
The platform has five layers.
1. Prepare the source documents
The first job is to make source material usable.
Our Python pipeline ingests PDFs, markdown, and other document formats, then normalizes them into a stable contract:
- cleaned markdown
- frontmatter
- S.I.R.E. metadata
- deterministic chunking
- watermarking and provenance
- embedding-ready chunks.jsonl
This is not cosmetic cleanup. It is where the system decides whether a document is actually fit to become evidence.
If a PDF still contains broken ligatures, ghost headers, shattered words, or fake section boundaries, retrieval quality will suffer later no matter how good the model is.
So Ontic treats source preparation as a first-class production concern, not a preprocessing detail.
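As a rough illustration, a single record in chunks.jsonl might look like the sketch below. The field names here (doc_id, chunk_id, section, sha256) are illustrative assumptions, not Ontic's actual schema:

```python
import hashlib
import json

def make_chunk_record(doc_id: str, section: str, text: str, index: int) -> dict:
    """Build one embedding-ready chunk record (illustrative schema, not Ontic's)."""
    return {
        "doc_id": doc_id,                         # canonical document identity
        "chunk_id": f"{doc_id}#{index:04d}",      # deterministic chunk id
        "section": section,                       # heading path in the cleaned markdown
        "text": text,
        "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),  # provenance fingerprint
    }

def write_chunks_jsonl(records, path):
    """Serialize records as one JSON object per line (the chunks.jsonl contract)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

The deterministic chunk id and content hash are what make the record auditable later: the same input always produces the same record, so provenance can be checked byte-for-byte.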
2. Organize the knowledge corpus
Once documents are normalized, we build a governed corpus.
That corpus is not just a bag of chunks. It includes:
- canonical document identities
- framework and subject metadata
- catalog entries so the system knows what it contains
- crosswalk artifacts so related sources can be mapped across frameworks and domains
This matters because real questions are rarely about one paragraph in one document. They are often about overlap, difference, applicability, or evidence coverage across multiple authorities.
A useful platform has to know not only what a chunk says, but what role that chunk plays inside the broader body of knowledge.
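To make that concrete, here is a minimal sketch of catalog entries and a crosswalk index. The clause references, class names, and fields are hypothetical stand-ins, not the platform's real artifacts:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CatalogEntry:
    """One catalog row: what the corpus contains and where it fits."""
    doc_id: str
    framework: str   # e.g. "ISO 27001"
    subject: str     # e.g. "cloud security"

def build_crosswalk(links):
    """Index clause-level links across frameworks.

    links: iterable of (source_ref, target_ref) pairs,
    e.g. ("ISO27001:A.5.23", "NIST800-53:SA-9").
    """
    crosswalk = {}
    for src, dst in links:
        crosswalk.setdefault(src, set()).add(dst)
        crosswalk.setdefault(dst, set()).add(src)  # make lookup symmetric
    return crosswalk
```

A symmetric index like this is what lets a question about one framework surface the corresponding material in another.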
3. Retrieve the right evidence
When a user asks a question, Ontic does not just hand the prompt to a model.
The system first decides what evidence needs to be retrieved, then gathers the relevant oracle chunks from the live corpus.
That includes internal RFCs, external standards, derived crosswalks, and other embedded material already staged in Supabase for Goober retrieval.
The point is not to produce the longest context window possible. The point is to retrieve the right evidence with enough structure that the answer can stay grounded.
That is why we care about catalog refreshes, crosswalk refreshes, corpus QA, and retrieval smoke tests. If the corpus is stale, the answer path is stale.
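A toy version of scoped retrieval looks like this, using term overlap as a stand-in for embedding similarity (the real corpus lives in Supabase; everything below is illustrative):

```python
def retrieve(question_terms, corpus, scope=None, k=3):
    """Toy scoped retrieval: restrict candidates to the requested scope,
    then rank by term overlap (a stand-in for vector similarity)."""
    candidates = [c for c in corpus if scope is None or c["framework"] == scope]

    def overlap(chunk):
        return len(question_terms & set(chunk["text"].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:k]
```

The important step is the scoping filter before ranking: evidence from the wrong framework never enters the candidate set, no matter how similar it sounds.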
4. Constrain the model's answer
The answer layer is Goober.
Goober is not just a chat model with a personality file. It runs inside a contract.
That contract defines:
- identity and voice
- evidence rules
- citation format
- boundary behavior
- response modes
- message architecture
The result is that the model is not asked to pretend it is a source of truth. It is asked to interpret evidence under explicit rules.
In practice, that means Goober can:
- answer casually when the request is casual
- switch into governed behavior when the request is evidence-sensitive
- separate grounded claims from general guidance
- cite the oracle or crosswalk material it actually used
The model is useful because the system around it is disciplined.
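The mode-switching behavior can be sketched as a small decision function. The trigger words and mode names below are assumptions for illustration, not Goober's actual contract:

```python
def answer_mode(request, evidence):
    """Pick a response mode under a simple contract:
    evidence-sensitive requests must be grounded or declined."""
    sensitive = any(w in request.lower() for w in ("clause", "requirement", "comply"))
    if not sensitive:
        return {"mode": "casual", "citations": []}
    if not evidence:
        # boundary behavior: refuse to assert without evidence
        return {"mode": "boundary", "citations": [],
                "note": "insufficient evidence; declining to assert"}
    return {"mode": "governed",
            "citations": [e["chunk_id"] for e in evidence]}
```

The point of the sketch is the ordering: the system decides whether governance applies before the model is asked to say anything.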
5. Record the forensic trail
A governed answer is only half the job. The other half is being able to inspect what happened.
Ontic records the answer path as an envelope that can include:
- request identifiers
- retrieved sources
- governance outcomes
- chunk and prompt context
- chat completion telemetry
- log probabilities
- entailment checks
- forensic summaries
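A minimal sketch of how such an envelope might be assembled (the field names are hypothetical, not Ontic's recorded schema):

```python
import time

def build_envelope(request_id, retrieved, answer_text, checks):
    """Assemble a forensic envelope for one answer (illustrative fields only)."""
    return {
        "request_id": request_id,
        "timestamp": time.time(),                                 # when the answer was produced
        "retrieved_sources": [r["chunk_id"] for r in retrieved],  # evidence actually used
        "governance": checks,                                     # e.g. {"entailment": "supported"}
        "answer": answer_text,
    }
```

Because the envelope is plain structured data, it can be stored, queried, and replayed long after the answer was delivered.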
That gives operators a way to do more than say, "the model answered badly."
They can ask:
- what sources were retrieved?
- what evidence was missing?
- what mode was active?
- was the answer supported?
- what exactly did the system do before it produced this text?
That is the difference between a demo and an operational system.
What makes this different from ordinary RAG
A lot of teams say they have RAG when they mean one of two things:
- semantic search plus a prompt template
- a citation list attached to the model output
Ontic is stricter than that.
We care about:
- whether the source docs are clean enough to embed
- whether the corpus metadata is coherent
- whether the retrieval path is scoped correctly
- whether the answer contract is explicit
- whether the full chain is inspectable afterward
That means the platform is doing more work than a standard chatbot stack. It also means the output is easier to govern, debug, and improve.
What this enables
When the full chain is working, the system can do something most AI stacks cannot do reliably:
- ingest a new source document
- clean and normalize it
- embed it into the live corpus
- refresh the catalog and crosswalk artifacts
- retrieve it in production
- answer from it with evidence
- preserve the forensic record of how the answer was produced
That is not one feature. That is an operating model.
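The chain above can be compressed into a toy end-to-end sketch, with naive stand-ins for each stage (none of this is Ontic's actual implementation):

```python
def normalize(doc_id, text):
    """Stage 1 stand-in: split cleaned text into deterministic paragraph chunks."""
    return [{"chunk_id": f"{doc_id}#{i}", "text": p.strip()}
            for i, p in enumerate(text.split("\n\n")) if p.strip()]

def retrieve(question, corpus, k=2):
    """Stage 3 stand-in: rank by naive term overlap instead of vector search."""
    q = set(question.lower().split())
    return sorted(corpus,
                  key=lambda c: len(q & set(c["text"].lower().split())),
                  reverse=True)[:k]

def answer_with_envelope(question, corpus):
    """Stages 4-5 stand-in: answer from evidence and record how it happened."""
    evidence = retrieve(question, corpus)
    answer = {"text": evidence[0]["text"] if evidence else "",
              "citations": [e["chunk_id"] for e in evidence]}
    envelope = {"request": question,
                "retrieved_sources": answer["citations"],
                "supported": bool(evidence)}
    return answer, envelope
```

Even in this toy form, the shape of the operating model is visible: every answer leaves behind a record of the evidence it came from.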
The practical takeaway
Ontic is not trying to make models sound smarter.
We are building the infrastructure that lets organizations move from raw documents to governed answers with a usable audit trail.
That means:
- source prep has to be real
- embeddings have to be real
- retrieval has to be real
- answer constraints have to be real
- forensics have to be real
If any one of those layers is weak, the rest of the stack starts performing confidence theater: output that looks like knowledge but cannot be explained or defended afterward.
The platform exists to stop that from happening.
In one sentence
Ontic turns source material into governed, retrievable evidence and carries that evidence all the way through chat completion and forensic review.