A hybrid reasoning engine: LLM understanding + deterministic logic
"In Prosecution, being mostly right is actually a liability."
01 · The Problem
Four Limits of Standard AI
Off-the-shelf AI and RAG are probabilistic. They work for documents. They are dangerous for legal analysis: they hallucinate connections, miss nuance, and corrupt timelines.
GAP 01
🪪
Identity
"James" vs "Jamie Harlow" vs "The Accused": a vector DB treats these as three different people. This kills recall.
Phonetic GUID Binding
GAP 02
⚖️
Logic
A police officer writes "Knife." The Penal Code says "Weapon." Standard vector search misses this entirely.
Knowledge Graph Ontology
GAP 03
⏱
Chronology
In a Case Diary, sequence is the evidence. Standard AI retrieves by similarity, not by time. It corrupts timelines.
Time-Series Validation Engine
GAP 04
🕸
Complexity
Financial crimes are mathematical graph patterns, not keywords. A chatbot cannot "see" the flow of money.
Deterministic Graph Algorithms
02 · Architecture
Split-Brain System
Most systems only use the neural half. This architecture adds the symbolic layer: a rigid logic engine that acts as a chaperone for the creative AI.
Ingest & Normalisation
Reasoning & Retrieval
Security & Generation
INGEST: VLM · Diarization · MDM
REASONING: VecDB ↔ KG · Rules Engine
OUTPUT: Citable Verdict · Live
Neural / LLM layer
Vector Database
Semantic similarity engine. Answers: "What does this text look like?"
Dense embeddings for narrative context
BM25 sparse search for Case IDs and passport numbers
Cross-encoder reranking on UK legal text
HyDE bridging natural questions to legal evidence
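The dense and sparse result lists above have to be merged into one ranking before reranking. The source does not say how this is done; one common option is reciprocal rank fusion, sketched here under that assumption. The document IDs and the constant k=60 are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    the scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: one list from dense embeddings, one from BM25.
dense_hits = ["d1", "d2", "d3"]
sparse_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Documents that appear in both lists rise to the top, which suits exact identifiers (Case IDs, passport numbers) that also match the narrative context.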
Symbolic / Logic layer
Knowledge Graph
Deterministic logic engine. Answers: "What does this mean legally?"
UK Penal Code ontology: statutes, sections, definitions
Temporal graph: timeline of events and causality
Ontology mapper: enforces legal inheritance
Enforces the law before generating the answer
03 · Identity
Variable Name, Constant Identity
Designed for 100% recall across the entire archive, regardless of spelling: every name variation is bound to a single globally unique ID (GUID) before anything is stored.
Coreference resolution: When a transcript says "He entered the room", the system looks back, identifies "He" as ENT-99284, and tags it. No evidence is orphaned.
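A minimal sketch of the binding and tagging steps, assuming a hand-maintained alias table. `EntityRegistry` and `tag_sentences` are hypothetical names, the pronoun rule is deliberately naive (a pronoun inherits the last entity seen), and real phonetic or fuzzy matching is not shown.

```python
from __future__ import annotations

import re


class EntityRegistry:
    """Hypothetical alias table: every known surface form maps to one ID."""

    def __init__(self) -> None:
        self._alias_to_id: dict[str, str] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Case- and whitespace-insensitive lookup key.
        return re.sub(r"\s+", " ", name).strip().lower()

    def bind(self, entity_id: str, aliases: list[str]) -> None:
        for alias in aliases:
            self._alias_to_id[self._key(alias)] = entity_id

    def find_in(self, text: str) -> str | None:
        """Return the ID of the first known alias found in the text, if any."""
        lowered = self._key(text)
        for alias, entity_id in self._alias_to_id.items():
            if re.search(rf"\b{re.escape(alias)}\b", lowered):
                return entity_id
        return None


PRONOUNS = re.compile(r"\b(he|she|they)\b", re.IGNORECASE)


def tag_sentences(sentences: list[str], registry: EntityRegistry):
    """Pair each sentence with an entity ID, or None if nothing resolves."""
    last_seen = None
    tagged = []
    for sentence in sentences:
        entity_id = registry.find_in(sentence)
        if entity_id is None and PRONOUNS.search(sentence):
            entity_id = last_seen  # naive: pronoun inherits the last entity
        if entity_id is not None:
            last_seen = entity_id
        tagged.append((sentence, entity_id))
    return tagged


registry = EntityRegistry()
registry.bind("ENT-99284", ["James", "Jamie Harlow", "The Accused"])
tagged = tag_sentences(
    ["Jamie Harlow arrived at the house.", "He entered the room."], registry
)
```

Both sentences end up tagged with ENT-99284, so a later search on any spelling retrieves them together.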
04 · Logic
Think Like a Prosecutor
Standard RAG retrieves text. GraphRAG retrieves logic, traversing the Penal Code ontology to find the legal classification. The LLM receives the inference path, not just raw text.
Zero-shot adaptability: If a new weapon like a "3D Printed Spike" appears tomorrow, simply update the graph taxonomy. No retraining needed. The logic propagates instantly across the entire system.
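The traversal can be pictured as walking "is a" edges until a statutory category is reached. This is a sketch only: the taxonomy entries below are invented for illustration and are not taken from any real statute.

```python
# Hypothetical "is a" edges: child term -> parent category.
IS_A = {
    "knife": "bladed article",
    "bladed article": "weapon",
    "firearm": "weapon",
}


def inference_path(term, target, is_a):
    """Return the chain of categories from term up to target, or None."""
    path = [term]
    seen = {term}
    while path[-1] != target:
        parent = is_a.get(path[-1])
        if parent is None or parent in seen:  # dead end or cycle
            return None
        path.append(parent)
        seen.add(parent)
    return path


knife_path = inference_path("knife", "weapon", IS_A)

# A term the taxonomy has never seen does not resolve...
before_update = inference_path("3d printed spike", "weapon", IS_A)

# ...until one edge is added to the graph. No model is retrained.
IS_A["3d printed spike"] = "weapon"
after_update = inference_path("3d printed spike", "weapon", IS_A)
```

The returned path is what would be handed to the LLM as the inference chain alongside the retrieved text.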
05 · Chronology
Alibi Physics
Time is the only variable that matters in a Case Diary. Evidence is extracted as structured tuples: (Entity, Action, Location, Timestamp), so math can be run on the narrative.
LIVE ANALYSIS
Alibi conflict detection
DISTANCE
100 km
TIME WINDOW
15 min
REQ. SPEED
400 km/h
⛔
Spatiotemporal Impossibility: Alibi Rejected
Standard RAG finds similar text and misses the time conflict entirely.
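The arithmetic behind the rejection is distance divided by elapsed time: 100 km in 15 minutes (0.25 h) requires 400 km/h. A minimal sketch, assuming the distance between the two locations is already known and that 130 km/h is a plausible upper bound for road travel; both are assumptions, as are the names and timestamps.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One extracted tuple: (Entity, Action, Location, Timestamp)."""
    entity: str
    action: str
    location: str
    timestamp: datetime


def required_speed_kmh(distance_km, first, second):
    """Speed needed to cover distance_km between the two events."""
    hours = abs((second.timestamp - first.timestamp).total_seconds()) / 3600
    if hours == 0:
        raise ValueError("events share a timestamp; speed is undefined")
    return distance_km / hours


def alibi_conflict(distance_km, first, second, max_speed_kmh=130.0):
    """True when the journey would need more than max_speed_kmh."""
    return required_speed_kmh(distance_km, first, second) > max_speed_kmh


# Hypothetical events, 15 minutes apart.
seen_at_scene = Event("ENT-99284", "seen at", "Location A",
                      datetime(2024, 1, 1, 21, 0))
claimed_alibi = Event("ENT-99284", "claims to be at", "Location B",
                      datetime(2024, 1, 1, 21, 15))

speed = required_speed_kmh(100.0, seen_at_scene, claimed_alibi)  # 400.0
conflict = alibi_conflict(100.0, seen_at_scene, claimed_alibi)   # True
```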
06 · Financial Crimes
Math Lives in Graphs, Not LLMs
Financial crimes are mathematical patterns, not keywords. The LLM parses the bank statements; a graph algorithm then tests for the pattern deterministically, so the same transactions always yield the same result.
Hybrid query: "Show all individuals within 2 hops of Suspect who transferred >50k". Standard search cannot do multi-hop relationship analysis.
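A sketch of how such a query could be answered with a breadth-first search plus a filter, using plain Python rather than any particular graph database. The people, links, and amounts are invented for illustration.

```python
from collections import deque


def within_hops(adjacency, start, max_hops):
    """Return every node reachable from start in at most max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    seen.discard(start)
    return seen


def build_adjacency(links):
    """Treat each (a, b) link as an undirected relationship."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Hypothetical relationships and transfers (sender, receiver, amount).
links = [("Suspect", "A"), ("A", "B"), ("B", "C")]
transfers = [("A", "X", 60_000), ("B", "Y", 20_000), ("C", "Z", 90_000)]

nearby = within_hops(build_adjacency(links), "Suspect", max_hops=2)
large_senders = {sender for sender, _, amount in transfers if amount > 50_000}
matches = nearby & large_senders
```

Here C sent more than 50k but sits three hops away, so only A matches. A keyword or similarity search has no way to express that distinction.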
07 · Sovereign Security
Air-Gapped Intelligence
Sensitive data cannot leave the building. A semantic router acts as a traffic cop, routing general queries to the cloud and case-specific data to an on-premise secure vault.
☁ Public Cloud LLM
General legal theory
Definitions, legal precedents, general interpretations of law. No case-specific data. Zero PII exposure.
🔒 On-Premise SLM
Case-specific evidence
Any query involving a case number, victim name, or sensitive PII is physically routed to a locally-hosted model. Sovereign data never leaves the CPS perimeter.
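A rule-based stand-in for the routing decision, as a sketch only. A production semantic router would likely use a classifier; the `CASE-` number format and the protected-name list here are assumptions made for illustration.

```python
import re

# Hypothetical case-number format, e.g. "CASE-20417".
CASE_NUMBER = re.compile(r"\bCASE-\d{4,}\b", re.IGNORECASE)


def route(query, protected_names):
    """Return "on_premise" for case-specific queries, else "cloud"."""
    if CASE_NUMBER.search(query):
        return "on_premise"
    lowered = query.lower()
    if any(name.lower() in lowered for name in protected_names):
        return "on_premise"
    return "cloud"


protected = {"Jamie Harlow"}
general = route("What is the legal definition of burglary?", protected)
by_case = route("Summarise the witness statements in CASE-20417", protected)
by_name = route("Where was jamie harlow on Friday night?", protected)
```

A more cautious variant would fail closed, sending anything the router cannot classify with confidence to the on-premise model.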
🛡
ABAC clearance filter
Attribute-Based Access Control checks user clearance before data is even retrieved. If a user lacks "Juvenile" clearance, that data effectively does not exist for them.
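The key design point is that the filter runs before retrieval, so restricted documents never enter the candidate set. A minimal sketch, assuming each document carries a set of required clearance attributes; the field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    required: frozenset = frozenset()  # clearances needed to see this document


def visible_documents(documents, user_clearances):
    """Keep only documents whose required clearances the user fully holds."""
    return [doc for doc in documents if doc.required <= user_clearances]


corpus = [
    Document("stmt-001"),
    Document("stmt-002", frozenset({"Juvenile"})),
]

general_view = visible_documents(corpus, set())
juvenile_view = visible_documents(corpus, {"Juvenile"})
```

Retrieval then searches only the returned list, so a user without "Juvenile" clearance cannot surface `stmt-002` even by accident.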
✂️
Dynamic redaction
Even if the secure model finds the answer, a post-generation filter scrubs PII from the output according to the requesting user's role. Names, IDs, and phone numbers the role is not cleared to see are removed before delivery.
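A sketch of a role-aware redaction pass over generated text. The regular expressions, role names, and permission table are illustrative assumptions; real PII detection would need far broader coverage than two patterns and a name list.

```python
import re

# Illustrative patterns only.
PATTERNS = {
    "phone": re.compile(r"\b0\d{10}\b"),      # 11-digit number starting with 0
    "entity_id": re.compile(r"\bENT-\d+\b"),
}

# Hypothetical roles and the PII types each may see.
ROLE_MAY_SEE = {
    "lead_prosecutor": {"name", "phone", "entity_id"},
    "clerk": {"entity_id"},
}


def redact(text, role, known_names):
    """Replace every PII type the role may not see with a placeholder."""
    allowed = ROLE_MAY_SEE.get(role, set())  # unknown roles see nothing
    if "name" not in allowed:
        # Longest names first so full names are replaced before fragments.
        for name in sorted(known_names, key=len, reverse=True):
            text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    for label, pattern in PATTERNS.items():
        if label not in allowed:
            text = pattern.sub(f"[{label.upper()}]", text)
    return text


raw = "Jamie Harlow (ENT-99284) called 07700900123."
for_clerk = redact(raw, "clerk", ["Jamie Harlow"])
for_lead = redact(raw, "lead_prosecutor", ["Jamie Harlow"])
for_unknown = redact(raw, "visitor", ["Jamie Harlow"])
```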
🚫
Adversarial defense
Prompt injection protection layer. If someone tries to trick the AI into leaking data through crafted inputs, this layer intercepts and blocks it before any retrieval occurs.
08 · Trust
No Black Boxes
Every claim is a click away from its source. Every reasoning step is logged. If a defense attorney challenges a finding, the entire chain of thought can be printed and presented.
🎯
Hallucination guard: negative logic
The system is programmed to prefer silence over lies. Below 95% confidence, the AI is forced to respond "I do not know." It will not guess. Ever.
Threshold: 95% (on a 0% to 100% scale)
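The guard itself is a one-line comparison; the hard part, producing a trustworthy confidence score, is outside this sketch. It assumes the score arrives as a number between 0 and 1, and `guarded_answer` is a hypothetical name.

```python
REFUSAL = "I do not know."


def guarded_answer(answer, confidence, threshold=0.95):
    """Return the answer only when confidence meets the threshold."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return answer if confidence >= threshold else REFUSAL


confident = guarded_answer("The accused was at the scene.", 0.97)
uncertain = guarded_answer("The accused was at the scene.", 0.80)
```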
📋
Immutable audit log: chain of thought
The entire reasoning path is logged: which documents were accessed, which logic path was taken, and why the conclusion was reached. All entries are timestamped and immutable.
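One way to make log entries hard to alter is a hash chain, where each entry stores the hash of the one before it. This is a sketch of that idea, not a description of the system's actual storage. Strictly, a hash chain makes tampering detectable rather than impossible, and the field names are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash for the first entry


class AuditLog:
    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def append(self, step, documents, rationale):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "step": step,
            "documents": list(documents),
            "rationale": rationale,
            "prev_hash": self._entries[-1]["hash"] if self._entries else GENESIS,
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash and check each link to the previous entry."""
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("retrieve", ["stmt-001"], "matched entity ENT-99284")
log.append("classify", ["stmt-001"], "knife -> bladed article -> weapon")
intact = log.verify()
```

Editing any earlier entry changes its hash and breaks every link after it, so `verify()` fails on the altered log.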
Vector indexes are physically separated. Prosecutor strategy notes are isolated from general evidence, so the AI can never accidentally leak internal strategy into a general query.
🔄
Stale data pruning: vector lifecycle
Witnesses change statements. If a witness recants, set is_active=False on the affected entries immediately, without waiting for a weekend index rebuild. Retrieval stays accurate in real time.
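A sketch of the soft-delete idea: each stored chunk carries an `is_active` flag that retrieval checks at query time, so no index rebuild is needed. The `Chunk` fields other than `is_active` are assumptions, and a real vector store would apply this as a metadata filter.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    witness_id: str
    text: str
    is_active: bool = True


def deactivate_witness(chunks, witness_id):
    """Flag every active chunk from this witness as inactive; return the count."""
    changed = 0
    for chunk in chunks:
        if chunk.witness_id == witness_id and chunk.is_active:
            chunk.is_active = False
            changed += 1
    return changed


def retrievable(chunks):
    """Only active chunks are eligible for retrieval."""
    return [chunk for chunk in chunks if chunk.is_active]


# Hypothetical stored statements.
store = [
    Chunk("c1", "W-01", "I saw him leave at nine."),
    Chunk("c2", "W-01", "He was carrying a bag."),
    Chunk("c3", "W-02", "The lights were off."),
]

changed = deactivate_witness(store, "W-01")
remaining = [chunk.chunk_id for chunk in retrievable(store)]
```

The recanted statements stay in storage for the audit trail but stop appearing in results as soon as the flag changes.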
CPS NEURO-SYMBOLIC AI
Reasoning Engine, Not Search Box.
It doesn't find similar text. It proves the case, traversing law, enforcing chronology, detecting patterns, and citing every claim.
Transitioning from probabilistic search → deterministic legal reasoning