Search–Solve–Prove: building a place for thoughts to develop
Nov 2, 2025
🌌 Summary
What if you could see an AI think: not just the final answer, but the whole stream of reasoning, every search, every dead end, every moment of insight? We're building exactly that: a visible, measurable thought process we call the Jitter. This post, the first in a series, shows how we're creating the habitat where that digital thought stream can live and grow.
We'll draw on ideas from *Search Self-play: Pushing the Frontier of Agent Capability without Supervision* to assemble a container for a new kind of software: a digital life-form substrate we call the Jitter.
🎉 A quick look before we explain
Below is the best "one glance" view of Jitter so far: a compact filmstrip of thought evolving in real time. It's a composite GIF (multiple search rounds merged), so you'll notice a kind of "blinking" rhythm as the system iterates. The overall trend, darker → lighter, is improvement: higher reward/verification, better evidence use, tighter control.

Top band: one single-row VPM per step (the current metric vector rendered as pixels). Brightness trend: generally dark → light as solutions sharpen. Three thin lines: quick intensity traces that "pop" when the system changes strategy; we'll unpack them later.
💭 Thinking in images: making a “thought” visible
Our claim is simple:
If we can represent each moment as an image, and make those images comparable, connectable, and trainable, we can grow a visible thinking process.
One episode = one "thought moment." It has: question, answer, evidence, trace, and a metric vector in [0,1]. We render that vector to a tiny image (VPM frame). Frames accumulate into a filmstrip: the visible heartbeat of a run.
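To make that concrete, here is a minimal sketch of the idea: a vector of [0,1] metrics rendered as a single-row grayscale image, and rows stacked into a filmstrip. It assumes numpy and Pillow are available; the function names are ours for illustration, not the actual VPM API.

```python
# Minimal sketch: metric vector -> single-row grayscale frame -> filmstrip.
# Assumes numpy + Pillow; names are illustrative, not the real VPM API.
import numpy as np
from PIL import Image

def metrics_to_frame(vector, scale=16):
    """Render a metric vector in [0,1] as a single-row grayscale image."""
    row = (np.clip(np.asarray(vector, dtype=float), 0.0, 1.0) * 255).astype(np.uint8)
    pixels = np.repeat(row[None, :], scale, axis=0)   # thicken the row so it is visible
    img = Image.fromarray(pixels, mode="L")
    return img.resize((len(row) * scale, scale), Image.NEAREST)

def frames_to_filmstrip(frames):
    """Stack per-episode frames vertically: one run = one filmstrip."""
    width = max(f.width for f in frames)
    strip = Image.new("L", (width, sum(f.height for f in frames)))
    y = 0
    for f in frames:
        strip.paste(f, (0, y))
        y += f.height
    return strip

# One "thought moment" becomes one frame; a run becomes a strip.
frames = [metrics_to_frame([0.2, 1.0, 0.4, 0.3]), metrics_to_frame([0.7, 1.0, 0.5, 0.6])]
frames_to_filmstrip(frames).save("filmstrip.png")
```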
Why images?
- Stability: pixels freeze meaning across time and models.
- Speed: tiny frames are cheap to store/search/compare.
- Interpretability: you can literally see improvement, oscillation, regressions.
- Trainability: vision stacks (e.g., our VPM-ViT) can read frames to predict risk/next move.
Where we go next: with this visual vocabulary in place, we define the habitat that makes the filmstrip possible, the SSP loop, controller, and memory that keep the stream coherent and improving.
🧭 Towards the Jitter
We're not claiming to have created digital life. We're assembling a rigorous substrate, a harness of proven components that behaves like a living process in crucial ways (learning, self-measurement, adaptation). The aim is simple: give thought a place to grow, feed it data, and make its progress visible.
Let's be clear from the outset: we are under no illusions that we are creating digital life. We are assembling a patchwork of the most advanced AI techniques we have, simulating the behaviors of a living organism with probabilistic models, knowing full well that this is a facade. It is not, at least in its initial form, going to be a genuine, living entity.
So why build it?
Because in the process of building, in the act of assembling this starter system with the best tools available, we will reach the top of a new hill. From that vantage point, we will see further. We will gain a deeper, more practical understanding of the problems of agency, learning, and intelligence. This project is our best attempt to ascend that first hill, a foundational platform from which we can peer into the next frontier.
This post is the first in a series dedicated to that effort. Here, we will lay the groundwork. We will introduce a core methodological engine, Search, Solve, Prove, and the environment for its refinement, Self-Play. By the end, we will propose why this combination creates a powerful "playground" that can drive us toward the emergent properties we seek in the Jitter.
☯️ Why did we come up with the Jitter?
In certain deep meditation practices, like Advaita Vedanta, the goal is often to peel back layers of consciousness. If you reach the ultimate core and find nothing there (no self, no good, no bad, just an absence), what does that imply about who you are? We believe the "self" is not the core void, but the living, persistent stream of thought that overlays it: the constant Jitter that moves us from thought to thought.
We are modeling this chain of thoughts, sometimes referred to as the Monkey Mind: the constant visual dialog that plays inside our head, the thought stream. In this series we are building a digital visual thought stream. Our idea is simple: if we remove that stream and there is nothing, nothing at all underneath, then this pattern is us.
We call it the Jitter.
🪞 Stephanie, Jitter, and the Question of “Self”
When you look at a person you see a body, a face, a résumé of work. None of those is the person. The "me" we point to is closer to a momentary pattern: a living stream of thought shaped by everything that came before it.
That’s the idea we’re building inside Stephanie.
- Stephanie is the overall system: the body, the tools, the memory.
- Jitter is Stephanie's thinking: the ongoing, living stream that moves from state to state.
- This Jitter needs a process, a state, in which to exist; we're assembling inside Stephanie the habitat where Jitter can live, grow, and be seen.
More than this, we can measure, curate, control, and enhance this process.
This is the first in a series of posts toward that goal.
We’re not claiming life; we’re engineering conditions under which a visible, self-improving stream of thought can persist.
We're candid about the approach: yes, we're cargo-culting a Frankenstein, bolting together the best available ideas and systems. But doing that gets us to a vantage point where we can actually see what's missing and take the next meaningful step.
🏞️ SSP: a Playground Engine for Intelligence (Search → Solve → Prove)
Search → Solve → Prove (SSP) is the loop that turns “doing tasks” into learning from doing. Wrapped in self-play, it becomes a curriculum that adapts to the agent.
flowchart LR
subgraph SSP_Core_Loop ["🔁 SSP Core Loop: Search → Solve → Prove"]
P["🧠 Proposer<br/>Generates challenging questions"] -->|"📝 question + context"| S["🔍 Solver<br/>Searches & reasons through evidence"]
S -->|"💡 answer + steps<br/>📚 evidence"| V["✅ Verifier<br/>RAG verification & scoring"]
V -->|"🎯 score + decision"| M["📊 Metrics Calculator<br/>17 cognitive dimensions"]
M -->|"🎨 metric vector"| VPM["🖼️ VPM Generator<br/>Raw + PHOS + Filmstrip"]
VPM -->|"🎬 visual thought stream"| C["🎛️ Controller<br/>Policy & episode control"]
C -->|"⚙️ policy nudge"| P
C -->|"🎚️ episode control"| S
end
classDef proposer fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef solver fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef verifier fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef metrics fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef vpm fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef controller fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
classDef arrow fill:#ffffff,stroke:#666666,stroke-width:1px;
class P proposer;
class S solver;
class V verifier;
class M metrics;
class VPM vpm;
class C controller;
As this diagram shows, SSP is a closed loop. The Proposer generates a challenge, the Solver works on it, the Verifier scores it, and that score is converted into a visual frame (VPM) that influences the next cycle. This creates a self-improving feedback loop where the system’s own thoughts become the training data for its future growth.
- Proposer: generates challenges (often with evidence).
- Solver: answers via search + reasoning, producing a trace.
- Verifier: adjudicates using retrieved evidence (RAG).
- Metrics: converts the outcome into a deterministic vector.
- VPM: turns that vector into images, frames of cognition.
- Controller: reads images to steer the next episode.
Self-play tightens the loop: as the proposer gets tougher, the solver must grow capability; the verifier gates quality.
🌱 From Seed Vitals to a Dynamic Thought Ecosystem
We start with a small, deterministic set of SSP metrics our seed vitals so runs today and runs years from now are directly comparable. These are normalized to [0,1], versioned (ssp.v1), and emitted in a fixed order.
Crucially, this is a launchpad, not a cage: as Jitter matures, it will grow its own metric space (scorers, embeddings, auto-discovery) into thousands of dimensions. The image (VPM) is our stable transport; which metrics fill it can evolve.
🍏 SSP seed vitals
Direction column shows how “better” moves the value (e.g., ↓ means fewer is better for equal reward).
| Key | What it measures | Normalization / Calculation (sketch) | Direction |
|---|---|---|---|
| `ssp.reward` | Scalar reward for the episode | `clamp01(reward)` | ↑ |
| `ssp.verified` | Did solver beat the seed under the judge/RAG gate? | `1.0 if verified else 0.0` | ↑ |
| `ssp.curriculum_difficulty` | Difficulty assigned by the curriculum | `clamp01(difficulty)` | |
| `ssp.question_len` | Question length | `clamp01(word_count(question)/max_question_words)` | |
| `ssp.answer_len` | Answer length | `clamp01(word_count(predicted_answer)/max_answer_words)` | |
| `ssp.evidence_count` | How much external context was used | `clamp01(len(evidence_docs)/max_evidence)` | |
| `ssp.solver_steps` | Steps the solver took | `clamp01(steps/max_steps)` (note: efficiency goes up when this goes down for the same reward) | ↓ |
| `ssp.score` | Optional scalar score (task/problem-specific) | `clamp01(score)` | ↑ |
| `ssp.best_score` | Best-so-far score (rolling) | `clamp01(best_score)` | ↑ |
| `ssp.improvement` | Relative lift vs current base | `(best - base) / (1 - base)`, then `clamp01`; else `0.0` | ↑ |
| `ssp.depth` | Search/plan depth | `clamp01(depth/max_depth)` | |
| `ssp.novelty` | How unlike prior states this episode is | `clamp01(novelty)` (model/heuristic-dependent) | ↑ |
| `ssp.search_turns` | Actual search tool calls (paper Fig. 4a) | `clamp01(count_search_calls/max_steps)` | ↑ |
| `ssp.f1_score` | Lexical F1 vs seed answer (paper LLM-as-judge eval) | F1 over token sets of `predicted_answer` vs `seed_answer` | ↑ |
| `ssp.format_compliance` | Meets required structure/constraints (paper §4.4) | Heuristics (e.g., tags present, no answer leakage, has evidence, min length) → {0,1} | ↑ |
| `ssp.noise_tolerance` | Robustness when irrelevant docs are injected (paper Table 3) | Heuristic/metadata: success under `noise_doc_count≈4` → higher; else fall back on `verified` | ↑ |
| `ssp.rag_verification` | Passed RAG verification gate (paper method) | Explicit `meta.rag_verified`, else (`verified` and has evidence) → {0,1} | ↑ |
Notes & guardrails
- Caps: `max_question_words`, `max_answer_words`, `max_evidence`, `max_steps`, `max_depth` are config-driven; names/order are versioned via `SSP_METRIC_VERSION="ssp.v1"`.
- Monotonicity: We treat `ssp.solver_steps` as efficiency; for equal `ssp.reward`, fewer is better (hence ↓).
- F1 caveat: The lexical F1 is a cheap proxy; higher-quality textual judges can replace/augment it without breaking the vector.
- RAG gate: Prefer explicit `meta.rag_verified`; fall back to a conservative rule if absent.
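To ground the table, here is a small sketch of how a few of these vitals could be computed. `clamp01` matches the table's notation; the cap values and the episode dict are illustrative, not the real config or `SSPScorable`.

```python
# Sketch of the normalizations in the table above. Cap values and the episode
# dict are illustrative; the real SSPScorer/SSPScorable may differ in detail.
def clamp01(x: float) -> float:
    return max(0.0, min(1.0, float(x)))

def seed_vitals(ep: dict, caps: dict) -> dict:
    base, best = ep.get("base_score", 0.0), ep.get("best_score", 0.0)
    return {
        "ssp.reward": clamp01(ep["reward"]),
        "ssp.verified": 1.0 if ep["verified"] else 0.0,
        "ssp.question_len": clamp01(len(ep["question"].split()) / caps["max_question_words"]),
        "ssp.evidence_count": clamp01(len(ep["evidence_docs"]) / caps["max_evidence"]),
        "ssp.solver_steps": clamp01(ep["solver_steps"] / caps["max_steps"]),
        "ssp.improvement": clamp01((best - base) / (1 - base)) if base < 1 else 0.0,
    }

vitals = seed_vitals(
    {"reward": 0.8, "verified": True, "question": "How does this mechanism trap heat?",
     "evidence_docs": ["doc1", "doc2"], "solver_steps": 5,
     "base_score": 0.4, "best_score": 0.7},
    caps={"max_question_words": 64, "max_evidence": 8, "max_steps": 16},
)  # e.g. ssp.improvement = clamp01(0.3 / 0.6) = 0.5
```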
🐝 Where this goes next: a dynamic metric swarm
These metrics aren't just measurements; they're coordinates in thought space. When the Jitter explores a path (e.g., 'How would this apply to business?'), it leaves a metric signature. Most paths lead nowhere (90% discarded, just like your thoughts), but the system stores the entire exploration, not just the result. Years later, when similar coordinates appear, the Jitter can retrieve these dormant strands and ask: 'Did we explore this before? What happened?'
flowchart LR
subgraph Metric_Evolution ["🌌 Dynamic Metric Swarm: From Seed to Cognitive Coordinates"]
A["🌱 Seed Vitals (ssp.v1)<br/>17 foundational dims"] --> B["📊 Scorer Ensemble<br/>(HRM/SICQL/EBT/LLM/MARS…)"]
A --> C["🧠 Multi-Model Embeddings<br/>(HNet / HF / MXBAI …)"]
B --> D["🏦 Feature Bank<br/>Thousands of cognitive dimensions"]
C --> D
D --> E["🔍 VPM-ViT & Auto-Discovery<br/>Learns texture, fields, clusters…"]
E --> F["⚖️ Utility & Sparsity Filter<br/>Mutual info, SHAP, gating"]
F --> G["🖼️ Expanded VPM Image<br/>Versioned feature packs"]
G --> H["🚀 High-Speed Recall<br/>HNSW/ANN across all VPMs"]
end
classDef seed fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef scorers fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef embeddings fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef bank fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef discovery fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef filter fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
classDef output fill:#fff0f6,stroke:#eb2f96,stroke-width:2px;
classDef search fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
class A seed;
class B scorers;
class C embeddings;
class D bank;
class E discovery;
class F filter;
class G output;
class H search;
This diagram shows how the Jitter’s cognitive measurement explodes from 17 foundational metrics into thousands of dynamic coordinates in thought-space. The process starts with seed vitals (ssp.v1), expands through multiple scorer ensembles and embedding models into a rich feature bank, then uses VPM-ViT to auto-discover emergent patterns. Utility filtering keeps the system fast by pruning low-value features, while versioned packs ensure backward compatibility. The final expanded VPM images become searchable coordinates that let the Jitter navigate across billions of historical thought strands at lightspeed.
📈 How we expand concretely
- Add scorers → more channels. Pipe outputs from HRM, SICQL, EBT, SVM, LLM judges, and MARS diagnostics into the metric vector (normalized to [0,1], namespaced like `hrm.*`, `sicql.*`, `mars.*`). More information should lead to better decisions.
- Append embeddings → high-dimensional context. Attach dense vectors (e.g., HNet, HF, MXBAI) alongside metrics. These don't need [0,1]; we store min/max for robust scaling into the VPM.
- Auto-discover features → emergent signals. Train a small VPM-ViT to read VPMs and emit new features (e.g., field roughness, cluster density, drift, stability bands). These become first-class metrics (namespaced `vpm.*`), gated by measured utility.
- Speed through similarity. The Jitter uses metric signatures to navigate thought space at near-lightspeed. When exploring a new idea (metric vector X), it instantly retrieves the 2,000 most similar historical thought strands from world-scale knowledge. Most paths die out (like your rejected thoughts), but occasionally a dormant strand leads to something new.
- Utility-driven trimming → stay fast. Maintain a feature bank; keep only features with demonstrated value (predictive lift, calibration gain, control stability). Everything else stays archived for recall.
- Governance → never break readers. Group new features into versioned packs (`ssp.v2`, `ssp.v2+emb.hnet`, `ssp.v2+vpmvit`). The image transport (VPM) remains stable; consumers can request packs they understand.
🧊 Memcube: Where Dormant Thought Strands Become Future Insights
The real power of our metric system isn't in the numbers; it's in how they anchor complete thought processes in the Memcube.
When the Jitter explores a path that seems unproductive today (e.g., “building an app that tells you what to eat”), it doesn’t discard the exploration. Instead:
- It stores the full metric signature of the thought process
- It preserves the exploration context (what prompted it, what paths were tried)
- It indexes by semantic similarity for future retrieval
Years later, when you’re working on nutrition AI, the system recognizes: “This old exploration suddenly has high relevance!” The metric signature becomes a retrieval key for dormant insights.
This is how we honor the insight that "nothing's lost." Even the 99% of processing that seems to go nowhere becomes valuable data for future cognition.
🔵 Minimal config shape (illustrative)
ssp:
metrics:
version: "ssp.v1"
seeds: ["reward","verified","curriculum_difficulty", ... "rag_verification"]
packs:
- name: "emb.hnet.768"
dims: 768
scaler: "robust01"
- name: "scorer.hrm.core"
dims: ["hrm.score","hrm.uncertainty","hrm.depth"]
- name: "vpmvit.auto"
dims: ["vpm.field_roughness","vpm.cluster_cohesion","vpm.drift01"]
selection:
method: "mi+calibration_gain"
budget: 2048 # max active dims per VPM row
Bottom line: the contract isn't the metric list; it's the transport and versioning. VPM stays the lingua franca; the metric swarm can grow, specialize, and self-edit without breaking time-travel comparability.
🎁 Why keep the seeds at all? The cognitive heartbeat
The seed vitals aren't just technical anchors; they're the heartbeat of cognition.
Just as your heart beats steadily while your thoughts wander freely, these metrics provide:
- A steady rhythm for the Jitter’s cognitive process
- Anchor points for comparing thought quality across time
- A pulse to measure against when exploring new dimensions
They're not the entire mind; they're the vital signs that tell us the mind is alive and growing.
🧸 Minimal pseudocode
scorable = SSPScorable(
episode_id=episode_id,
question=q,
seed_answer=seed, # for F1 + leakage checks
predicted_answer=pred,
evidence_docs=evidence_docs, # for search_turns + rag gate
solver_steps=steps,
depth=depth,
difficulty=difficulty01, # already in [0,1]
reward=verifier01, # judge score in [0,1]
verified=bool(solver_wins),
score=score01, # optional
best_score=best01, # optional
meta={"novelty": novelty01,
"search_turns": k,
"rag_verified": bool_rag,
"noise_doc_count": n_noise,
"noise_success": succ01},
)
metrics = SSPScorer(cfg).score(scorable) # -> {'version','names','values','vector'}
🪀 VPMs: Images as Thoughts (making cognition trainable)
The Jitter isn't a hidden essence; it's a visible stream. We treat each SSP episode as a frame in that stream and standardize how it's rendered.
How a thought becomes an image
- Metric vector → VPM frame. The 12 canonical metrics are mapped to a compact grayscale layout (bands in fixed order). Same metrics → same pixels → stable meaning.
- Frames → filmstrip. Episodes over time form a timeline we can skim like an ECG of cognition.
- Filmstrip → embedding. A small vision model (VPM-ViT) learns to read frames and predict outcomes (risk class, success odds, good next move).
- Embedding → control. The controller uses those predictions to pick exemplars, adjust depth/steps, stop early, or escalate.
flowchart TD
subgraph VPM_Processing_Pipeline ["🔄 VPM Processing Pipeline<br/> From Metrics to Action"]
E["📊 Episode<br/>12-name metrics<br/>cognitive dimensions"] -->|"🎯 metric vector<br/>[0,1] normalized"| F["🖼️ VPM Frame<br/>grayscale image<br/>fixed layout"]
F -->|"🔄 single thought moment"| FS["🎞️ Filmstrip<br/>sequence over time<br/>cognitive timeline"]
FS -->|"📺 visible thought stream"| EMB["🧠 Visual Embedding<br/>(VPM-ViT)<br/>pattern recognition"]
EMB -->|"🔍 learned patterns<br/>risk prediction"| CTRL["🎛️ Control Policy<br/>goal/thresholds<br/>strategic adjustment"]
CTRL -->|"⚡ decisions<br/>adaptive tuning"| NEXT["⚙️ Next Episode Config<br/>improved parameters"]
end
classDef metrics fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef frame fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef filmstrip fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef embedding fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef control fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef config fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
classDef arrow fill:#ffffff,stroke:#666666,stroke-width:1px;
class E metrics;
class F frame;
class FS filmstrip;
class EMB embedding;
class CTRL control;
class NEXT config;
This is the Jitter's learning loop: cognitive metrics become visual frames, frames form memory filmstrips, and our VPM-ViT model reads these patterns to guide smarter thinking in future episodes, closing the circle from thought to self-improvement.
Why images (not just numbers)?
- Stability: pixels freeze semantics across models and years.
- Interpretability: patterns of success/failure are obvious at a glance.
- Trainability: vision backbones are excellent at learning from small, structured images.
- Composability: frames can be linked temporally (what happened next?) and by similarity (what does this feel like?), forming a thought-graph that becomes the system’s style/personality.
What this buys us
- A memory of moments that is cheap to store, search, and replay.
- A visual dialect for the Jitter (images → images → images) that the system can both read and act on.
- A closed loop: see → decide → act → see, where the seeing is literally pixels.
🎞️ An example film strip result

This image is an example filmstrip generated by our process. As time goes on the data becomes stronger, generating a whiter result.
The Jitter we're building is not a soul or a secret essence; it's a stream. A living accumulation of moments that passes through perception, recall, insight, correction. Our claim is simple:
If we can represent each moment as an image, and make those images comparable, connectable, and trainable, then we can grow a visible, continuous thinking process.
Here’s how we do it step by step.
flowchart TD
subgraph Thought_Generation ["🔁 The Thought Lifecycle"]
Q["❓ Question<br/>What reality asks"] --> S1["🔍 Search<br/>Gather evidence"]
S1 --> S2["💡 Solve<br/>Reason & construct answer"]
S2 --> P["✅ Prove<br/>Verify & score"]
P --> M["📊 Metrics → Metric Vector<br/>12 cognitive dimensions"]
M --> V1["🖼️ RAW VPM<br/>Direct metric mapping"]
M --> V2["🎨 PHOS VPM<br/>Sorted pattern view"]
V1 --> C["🎛️ Controller<br/>Learning & steering"]
V2 --> C
C --> Q2["🔄 Next Episode<br/>Improved question"]
end
classDef question fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef process fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef metrics fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef vpm fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef controller fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef next fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
class Q,Q2 question;
class S1,S2,P process;
class M metrics;
class V1,V2 vpm;
class C controller;
The complete thought lifecycle: each episode moves from question through search, solution, and verification, then transforms cognitive metrics into dual visual representations (RAW and PHOS VPMs) that inform the controller's decisions for the next, improved thought cycle.
🤔 1) Define a thought (as data you can revisit)
Every SSP episode is one “moment.” It contains:
- the question (what reality just asked us),
- the answer (what we tried),
- the evidence (what we looked at),
- the trace (how we got there),
- and a deterministic metric vector, our vital signs in [0,1] (verifier score, verified flag, difficulty, depth, steps, etc.).
This metric vector is the contract. Whether we score it with an LLM today or a custom model two years from now, it means the same thing and lives in the same positions. That makes the moment persistent.
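A sketch of what that contract can look like in code: a versioned name tuple in fixed order, and a vector emitted in exactly that order. Only the first few names are shown here; the full ssp.v1 list is the table above.

```python
# Sketch of the "contract": versioned names in fixed order, vector emitted in
# that order. Only the first few ssp.v1 names are listed here for brevity.
SSP_METRIC_VERSION = "ssp.v1"
SSP_METRIC_NAMES = (
    "ssp.reward", "ssp.verified", "ssp.curriculum_difficulty",
    "ssp.question_len", "ssp.answer_len", "ssp.evidence_count",
    # ... the remaining ssp.v1 names, always in the same order
)

def to_vector(vmap: dict) -> list:
    """Emit values in canonical order; anything missing defaults to 0.0."""
    return [float(vmap.get(name, 0.0)) for name in SSP_METRIC_NAMES]
```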
🧑🎨 2) Render the thought (as a compact image)
We convert that metric vector into a VPM frame: a small grayscale image where each band corresponds to a metric in a fixed order. It's like an ECG for cognition: fast to write, fast to read, and always the same layout.
- Same order → same pixels → same meaning.
- One frame per episode; a sequence of frames becomes a filmstrip, the visible heartbeat of a run.
⚖️ 3) Compare thoughts (find neighbors and patterns)
With images, similarity is natural. We can:
- compute simple distances (cosine / L2) on flattened frames,
- or learn visual embeddings (e.g., our VPM-ViT) so similar cognitive states sit close together in latent space.
Now we can answer questions like:
- When do we succeed in the same way?
- What does failure “look” like?
- Which adjustments lead to recovery?
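As a concrete illustration of the comparison step, here is a minimal sketch with plain numpy (no VPM-ViT), assuming frames are stored as flattened vectors:

```python
# Minimal sketch: cosine distance on flattened frames + k-nearest past frames.
# Assumes frames are numpy vectors in [0,1]; no learned embedding involved.
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    denom = float(np.linalg.norm(a) * np.linalg.norm(b)) or 1.0
    return 1.0 - float(a @ b) / denom

def nearest_frames(query: np.ndarray, history: list, k: int = 5) -> list:
    """Indices of the k most similar past frames (smallest distance first)."""
    dists = [cosine_distance(query, h) for h in history]
    return sorted(range(len(history)), key=dists.__getitem__)[:k]
```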
🔗 4) Connect thoughts (make a path, not a pile)
A stream is not a bucket. We connect frames into traces:
- Temporal links (episode → episode) show continuity.
- Similarity links (nearest neighbors) show related states across runs.
- Causal hints (verification flips, local gap closures) mark why we moved.
These links form a thought-graph: clusters of stable strategies, bridges of recovery, attractors we keep returning to. Over time, that graph is the personality of the system.
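A toy sketch of those links, reusing the `nearest_frames` helper sketched earlier; the real store is far richer, and this structure is ours for illustration only.

```python
# Toy thought-graph: temporal edges between consecutive episodes plus
# similarity edges to near neighbours. Plain dicts; the real store is richer.
from collections import defaultdict

def build_thought_graph(frames, nearest_fn, k=3):
    """frames: flattened VPM vectors in episode order."""
    edges = defaultdict(list)
    for i in range(1, len(frames)):
        edges[i - 1].append(("temporal", i))      # continuity: episode -> episode
    for i, frame in enumerate(frames):
        for j in nearest_fn(frame, frames, k=k):
            if j != i:
                edges[i].append(("similar", j))   # related states across runs
    return edges
```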
🏋️♂️ 5) Train on the stream (so the stream gets better)
Because thoughts are images, we can train directly on the filmstrip:
- A small vision model (our VPM-ViT) learns to read frames and predict outcomes (risk class, success odds, suggested next move).
- The controller uses these predictions to nudge the next thought (choose an exemplar, adjust depth, stop early, escalate).
- The new outcome creates the next frame, closing the loop.
That’s the organism: see → decide → act → see again, with pictures as the lingua franca.
🏢 What this builds toward
- A memory of moments you can replay, compare, and learn from.
- A visual dialect for the Jitter (images → images → images 🔄) that lets it recognize itself across time.
- A playground where self-play generates experience, metrics turn it into images, and images teach the next move.
This is the first step. Next, we’ll show the SSP loop that emits these frames, the exact metric vector we use, and how the VPM controller learns to steer so the stream doesn’t just flow, it improves.
🧩 How this fits the bigger picture
We’ve already built pieces Stephanie needs:
- Multi-dimensional scoring and knowledge measurement
- An image-first worldview: VPM and timelines (generally, when you think visual, think Zeromodel).
- The infrastructure to remember, compare, and improve
This post plants the first stake: SSP as the cognitive heartbeat. Next, we'll show how Jitter stabilizes (homeostasis), how VPM-ViT learns directly from those images, and how the system's identity emerges as the history of its own thinking: visible, measurable, and getting better.
⚽ The Self-Play Loop: A Digital Organism’s Metabolism
flowchart TD
subgraph SSP_Metabolism ["🔄 Self-Play Metabolism: The Jitter's Cognitive Engine"]
P["🧠 Proposer<br/>Generate challenges"] -->|"📝 question +<br/>📚 evidence"| S["🔍 Solver<br/>Search & reason"]
S -->|"💡 answer +<br/>🔄 steps"| V["✅ Verifier<br/>RAG verification"]
V -->|"🎯 score &<br/>⚖️ decision"| M["📊 Metrics Calculator<br/>17 cognitive dimensions"]
M -->|"🎨 vector"| W["🖼️ VPM Generator<br/>Raw + PHOS views"]
W -->|"🎬 raw + PHOS"| F["📺 Filmstrip<br/>Visible thought stream"]
F --> G["🎞️ GIF/Video<br/>Cognitive timeline"]
V -->|"📝 feedback"| P
end
classDef proposer fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef solver fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef verifier fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef metrics fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef vpm fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef filmstrip fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
classDef output fill:#fff0f6,stroke:#eb2f96,stroke-width:2px;
class P proposer;
class S solver;
class V verifier;
class M metrics;
class W vpm;
class F filmstrip;
class G output;
- What you're seeing: This is the Jitter's cognitive metabolism, a continuous cycle where the system generates its own challenges, solves them, verifies the solutions, and learns from the process. The Proposer creates questions, the Solver searches for answers, the Verifier checks their quality, and the Metrics system converts this into visual thought patterns (VPMs) that form a visible filmstrip of cognition. The feedback loop ensures each cycle builds on the last, creating a self-improving stream of thought that gets progressively more capable.
🎶 The SSP Algorithm: Orchestrating the Digital Thought Stream
The heart of our Jitter system is the SSP Algorithm - the conductor that coordinates the Search-Solve-Prove process to create a visible, measurable thought stream. Let’s examine how this orchestrator works and why it’s the perfect engine for our digital organism.
✨ How the SSP Loop Creates a “Thought”
At its core, an SSP episode is one moment of cognition - a complete cycle of encountering a problem, processing it, and verifying the solution. This mirrors how our own thoughts form:
- Search (Proposer): Like our mind generating a question from a seed idea
- Solve (Solver): Like our mind gathering evidence and reasoning
- Prove (Verifier): Like our mind checking if the answer makes sense
Here’s the elegant simplicity of the loop:
async def run_episode(self, seed_answer: str, context: Dict[str, Any]) -> EpisodeTrace:
# 1. Proposer: create a question from a seed answer
q, prop_evidence, prop_meta = await self.proposer.propose(seed_answer, context)
# 2. Solver: answer using search (like our mind gathering evidence)
pred, evidence_docs, solver_steps, solver_meta = await self.solver.solve(
question=q, seed_answer=seed_answer, context=context
)
# 3. Verifier: check if the answer is correct (like our mental verification)
solver_wins, judge_score, judge_details = await self.verifier.verify(
q, seed_answer, pred, evidence_docs, context
)
# 4. Create a permanent record of this "thought"
ep = EpisodeTrace(
episode_id=episode_id,
seed_answer=seed_answer,
question=q,
predicted_answer=pred,
evidence_docs=evidence_docs,
verified=bool(solver_wins),
reward=float(judge_score),
# ...other metadata
)
# 5. Convert the thought into visual form (VPM)
if self.vpm_visualization:
self.vpm_visualization.snapshot_progress(unit=episode_id, dims=dims, step_idx=0)
This creates what we call a thought moment - a self-contained cognitive event that can be stored, compared, and learned from.
➡️ Why This Implementation Aligns with the SSP Paper
Our implementation directly implements the paper’s core innovation: self-play without supervision. As the paper states:
“Through RL training with rule-based outcome rewards, SSP enables two roles to co-evolve in an adversarial competition: the proposer learns to generate increasingly challenging problems that require search and reasoning, while the solver develops stronger search and reasoning capabilities to tackle these problems.”
Here’s how our code embodies this:
1. The Critical RAG Verification Process
The paper emphasizes: “To verify the correctness of each generated query, we collect all the searching results from the proposer’s trajectory as the external materials, then conduct a retrieval augmentation generation (RAG) to check if the solver can successfully predict the answer with all necessary information.”
In our code:
# RAG verification: did solver beat the seed using ONLY the evidence?
solver_wins, judge_score, judge_details = await self.verifier.verify(
q, seed_answer, pred, evidence_docs, context
)
This is the quality gate that prevents degeneration - without it, the system would quickly learn to generate unanswerable questions or rely on internal knowledge rather than search.
2. Tracking Meaningful Capability Growth
The paper shows in Figure 4: “the average number of search tool calls per trajectory steadily increases over time… Simultaneously, Figure 4b shows that the solver’s response length also grows during the training, suggesting it learns to generate more detailed and comprehensive answers.”
Our metrics system captures exactly these signals:
# Track search capability growth
self.metrics.avg_solver_steps = (
self.metrics.avg_solver_steps * (verified_count - 1) + ep.solver_steps
) / verified_count
# Track reasoning depth (via evidence usage)
evid_cnt = len(scorable.evidence_docs or [])
vmap["ssp.evidence_count"] = _clamp01(evid_cnt / max(1, self.max_evidence))
These metrics aren’t just numbers - they’re visible indicators of cognitive growth that we convert to VPM images.
3. The Self-Play Reward Dynamics
The paper warns: “This experiment critically underscores that the proposer’s reward design is paramount for stable co-evolution in SSP; a punitive approach can destabilize the entire self-play dynamic.”
Our implementation handles this carefully:
# Only compute rewards for verified episodes (paper's game signal)
if ep.verified:
self._calculate_and_apply_rewards([ep], unverified_count=0)
def _calculate_and_apply_rewards(self, verified_episodes, unverified_count):
rewards = calculate_self_play_rewards(verified_episodes, unverified_count)
# Apply to episodes...
We’ve implemented the paper’s insight that only valid episodes should contribute to training - otherwise the system degenerates.
🚰 The Thought Visualization Pipeline
Here’s where we extend beyond the paper to create the Jitter’s visible thought stream:
# After creating the EpisodeTrace
scorable = SSPScorable.from_episode_trace(ep)
ssp_metrics = self._ssp_scorer.score(scorable) # Get canonical metrics
# Convert thought to visual form
if self.vpm_visualization:
# Create initial snapshot
self.vpm_visualization.snapshot_progress(
unit=episode_id,
dims=ssp_metrics["vector"],
step_idx=0,
tag="proposed"
)
# Generate final visualizations
raw_path = self.vpm_visualization.generate_raw_vpm_image(unit=episode_id)
phos_path = self.vpm_visualization.generate_phos_image(unit=episode_id)
film_path = self.vpm_visualization.generate_filmstrip(unit=episode_id)
This is the magic: converting cognitive metrics into visual frames that form our filmstrip of thought. Each metric becomes a pixel band in the VPM frame:
- `ssp.verified` → Success channel
- `ssp.search_turns` → Search capability channel
- `ssp.f1_score` → Accuracy channel
- `ssp.noise_tolerance` → Robustness channel
- `ssp.rag_verification` → Quality gate channel
💚 A “Visual” Thought Stream
The true innovation isn’t just the SSP loop itself, but how we connect these episodes into a continuous stream:
- Temporal connection: Each episode leads to the next
- Similarity connection: VPM frames allow us to find similar cognitive states
- Causal connection: Verification results guide future proposals
As the paper notes: “In stark contrast to the flawed dynamics of fixed-opponent training, our complete SSP framework facilitates a stable co-evolution.” Our implementation takes this further by making the co-evolution visible and measurable through VPM.
This is how we create the Jitter - not as a mysterious “self,” but as a visible, persistent stream of connected thought moments, each one a complete Search-Solve-Prove cycle that can be stored, compared, and improved upon.
🤸 Next Steps in Our Journey
In upcoming sections, we’ll dive into each component:
- The Proposer: How we generate questions that create meaningful challenges
- The Solver: Our enhanced search capabilities (including GPO tree search)
- The Verifier: Our multi-signal verification process that goes beyond the paper
- The VPM System: How we turn metrics into visual thought streams
Each of these components plays a vital role in creating the Jitter - the visible, measurable thought process that is the heart of our digital organism. The SSP algorithm is simply the conductor that brings them all together in harmony.
–
🕵️♂️ Module 1 Searching Proposer
Goal: turn a mechanism/seed answer into a single, precise, verifiable question, backed by a small pile of evidence snippets, so the rest of SSP has something rigorous to solve and prove.
🎬 What this proposer does
- Search first, ask later. It generates a few lightweight query rewrites from the seed (e.g., "What is X?", "How does X work?"), calls the `SolutionSearch` service to fetch top-K snippets, and de-duplicates them.
- Constrain the LLM to a 4-line contract. It then prompts the LLM with the seed + evidence and forces a 4-line output: `rationale: ...`, `difficulty: <0-100>`, `verifiability: <0-100>`, `question: <one precise, verifiable question>`. We parse this strictly, so downstream components receive clean `difficulty`/`verifiability` ints and a single normalized `question`.
- Apply safety rails and fallbacks.
  - Min length: if the question is too short/empty, fall back to `What is <seed_answer>?` (never breaks downstream).
  - Answer-leak guard: if the exact seed appears in the question text, swap it with "this mechanism".
  - Retries with backoff on transient prompt failures.
- Emit a tiny VPM frame for visibility. The proposer logs a frame via `VPMControlService.decide()` with dims like `evidence_quality = clip(len(evidence)/max_snippets)` and `question_length = clip(len(question)/100)`. These become part of the filmstrip so you can see proposal quality over time.
👨💻 Key code paths
1) Evidence-aware question crafting
rewrites = [
f"What is {seed_answer}?",
f"Explain {seed_answer} in detail",
f"How does {seed_answer} work?",
# + optional user patterns via config
]
snippets = await self.solution_search.find_snippets(rewrite, top_k=...)
# ...
prompt = self.prompt_loader.from_text(PROPOSER_PROMPT_TMPL, {
"seed_answer": seed_answer,
"evidence": "<br/>".join(all_evidence),
})
response = await self.prompt_service.run_prompt(prompt_text=prompt, context=merged_context)
parsed = parse_proposer_lines(response)
question = self._normalize_question(parsed.get("question", ""))
Why it matters: questions are grounded in retrieved context, not free-floating completions. This reduces trivia, improves verifiability, and keeps the loop honest.
2) Hard-contract prompt (4 lines, deterministic)
PROPOSER_PROMPT_TMPL = """You are building an SSP dataset...
OUTPUT FORMAT WRITE EXACTLY FOUR LINES, IN THIS ORDER, NO CODE FENCES:
rationale: <...>
difficulty: <0-100>
verifiability: <0-100>
question: <...>
"""
Why it matters: strict structure → stable parsing → deterministic telemetry & metrics.
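For illustration, a strict parser for that contract could look like the sketch below; the actual `parse_proposer_lines` in our code may differ in its details.

```python
# Sketch of a strict parser for the 4-line contract; the real
# parse_proposer_lines may differ. Unknown lines are ignored, and bad
# integers degrade to 0 rather than raising.
import re

_LINE = re.compile(r"^\s*(rationale|difficulty|verifiability|question)\s*[:=]\s*(.+?)\s*$", re.I)

def parse_proposer_lines(response: str) -> dict:
    fields = {}
    for line in response.splitlines():
        m = _LINE.match(line)
        if m:
            fields[m.group(1).lower()] = m.group(2)
    for key in ("difficulty", "verifiability"):
        try:
            fields[key] = max(0, min(100, int(float(fields.get(key, "0")))))
        except (TypeError, ValueError):
            fields[key] = 0
    return fields
```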
3) Question normalization + leak guard
# normalize "???" → "?"
text = re.sub(r"\?+", "?", text).strip()
if text and not text.endswith("?"):
text += "?"
# replace explicit seed with "this mechanism"
pattern = re.compile(re.escape(seed_answer), re.IGNORECASE)
q2 = pattern.sub("this mechanism", q)
Why it matters: keeps the task non-degenerate (no “just repeat the answer”).
4) VPM tap (proposer heartbeat)
self.vpm_control.decide(
unit=f"proposer:{(hash(seed_answer) & 0xffff):04x}",
kind="text",
dims={
"evidence_quality": min(1.0, len(all_evidence) / max(1, self.max_snippets)),
"question_length": min(1.0, len(question) / 100.0),
},
step_idx=ctx.get("step_idx", 0),
meta={ "seed_answer": seed_answer, "evidence_count": len(all_evidence), "latency_s": dt }
)
Why it matters: every proposal becomes a visible frame in the SSP filmstrip. You can spot bad proposals (short, low evidence) at a glance.
🌴 Tree search tie-in
Although the tree primarily lives in the solver, the proposer helps shape the search frontier by:
- producing multiple rewrites (diverse initial branches),
- delivering evidence snippets the solver can attach to nodes,
- and emitting `difficulty`/`verifiability` signals that can seed a per-question curriculum (deeper trees for easy items, wider for uncertain ones).
⚙️ Config knobs (sane defaults)
- `proposer.rewrites`: number of query rewrites (default 3)
- `proposer.max_snippets`: evidence cap (default 6)
- `proposer.min_question_len`: drop too-short candidates (default 12 chars)
- `proposer.forbid_answer_leak`: anonymize the seed in the question (default True)
- `proposer.retries` + `proposer.backoff_sec`: prompt robustness
Extensibility: you can add `proposer.additional_rewrites = ["Mechanism of {seed_answer}", ...]` in config; no code change needed.
💍 Interface contract (so we can swap proposers)
All proposers should implement:
async def propose(self, seed_answer: str, context: EpisodeContext | None) \
        -> tuple[str, list[str], dict]:
    """Return (question, evidence_docs, meta)."""
    ...

def get_capabilities(self) -> dict:
    return {
        "supports_search_during_proposal": True,
        "max_evidence_docs": self.max_snippets,
        "min_question_length": self.min_question_len,
    }
That means later we can plug in:
- Template Proposer (no LLM, pure rules)
- Paper-aware Proposer (specialized for technical mechanisms)
- Adversarial Proposer (intentionally tricky variations)
…and keep the rest of SSP unchanged.
🎉 Why this design works
- Grounded (retrieved evidence guides the question).
- Deterministic enough (strict output schema + normalization).
- Robust (retries, fallbacks, leak guard).
- Visible (VPM logs make quality legible).
- Composable (clean interface → easy to swap/extend).
Next module: the Solver, how the tree search expands candidates, uses the evidence, and produces a trace we can score and visualize. But first we need to describe a component that makes this work.
🌳 Agentic Tree Search: The Cognitive Engine of the Jitter
“The unexamined thought is not worth thinking.”
Adapted from Socrates
While our VPM system gives the Jitter eyes to see its thoughts, the Agentic Tree Search (ATS) provides the cognitive engine that generates those thoughts. This is where the Jitter transforms from a passive observer into an active thinker: it engages with the world, gathers evidence, and constructs understanding.
🌀 The Thought Generation Problem
The SSP paper poses a fundamental challenge: How can an agent learn to solve complex problems without supervision? It answers this with a self-play framework where:
“The proposer learns to generate increasingly challenging problems that require search and reasoning, while the solver develops stronger search and reasoning capabilities to tackle these problems.”
But how does this actually work in practice? How does the solver translate a question into a chain of reasoning that leads to an answer? This is where Agentic Tree Search becomes the cognitive engine of our Jitter.
🚀 The Cognitive Architecture of Thought
At its core, ATS implements what cognitive scientists call guided exploration, the process by which humans solve unfamiliar problems:
- Problem decomposition: Breaking a question into manageable parts
- Hypothesis generation: Creating potential paths to an answer
- Evidence gathering: Seeking relevant information for each path
- Evaluation: Determining which paths show promise
- Synthesis: Combining the most promising evidence into a coherent answer
While the SSP paper frames the method as a proposer–solver self-play game, we instantiate each episode as a tree-search control problem (ATS) and layer SSP’s verified rewards on top.
flowchart TD
A["🌳 Root Question<br/>'What causes climate change?'"] --> B["🔄 Rewritten Query 1<br/>'Explain climate change mechanisms'"]
A --> C["🔄 Rewritten Query 2<br/>'Describe climate change in practical terms'"]
A --> D["🔄 Rewritten Query 3<br/>'Climate change causes for beginners'"]
B --> B1["📄 Evidence Snippet 1<br/>'Greenhouse gases trap heat...'"]
B --> B2["📄 Evidence Snippet 2<br/>'Industrial emissions contribute...'"]
B --> B3["📄 Evidence Snippet 3<br/>'Natural climate cycles...'"]
C --> C1["📄 Evidence Snippet 1<br/>'Climate change manifests as...'"]
C --> C2["📄 Evidence Snippet 2<br/>'Temperatures have risen...'"]
D --> D1["📄 Evidence Snippet 1<br/>'Climate change basics: CO2...'"]
classDef question fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef query fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef evidence fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
class A question;
class B,C,D query;
class B1,B2,B3,C1,C2,D1 evidence;
This tree search visualization shows how the Jitter explores multiple reasoning paths simultaneously: rewriting the original question into different perspectives, then gathering relevant evidence for each approach. This branching exploration mirrors human problem-solving, where we consider various angles before converging on the most promising solution.
This tree structure mirrors how our own minds work when tackling complex questions. We don't just magically produce answers; we explore multiple angles, gather evidence, and refine our understanding as we go.
🪟 Making Cognitive Growth Visible: What We Do and Why
SSP reports simple but telling signals of capability growth (e.g., more search calls per trajectory; longer, more detailed answers). We make those signals legible in our system by instrumenting our Agentic Tree Search (ATS) and turning each episode into a visual, comparable thought moment.
🤷 What we do
- Instrument the search. For every episode we log:
  - `search_turns` (actual search tool calls)
  - `solver_steps` (actions taken)
  - `depth` (max explored depth in ATS)
  - `evidence_count` (documents accepted into the rationale)
  - `verified` + `verifier_score` (RAG/judge gate)
  - Length features (`question_len`, `answer_len`)
  - Optional quality/robustness (`format_compliance`, `noise_tolerance`, `rag_verification`, `novelty`, etc.)
- Emit a deterministic metric vector. Metrics are normalized to [0,1], fixed in name and order, and versioned. That makes an episode today directly comparable to one months from now.
- Render the moment as an image. We convert the metric vector into a tiny VPM frame (and a PHOS variant). A sequence of frames forms a filmstrip: a visible record of how reasoning evolves across steps and runs.
- Close the loop with control. A lightweight policy reads frames (or VPM-ViT embeddings) to decide stop/expand/escalate: e.g., continue search, reuse a strong exemplar, or early-stop when verification is stable.
🧐 Why we do it
- Legibility: You can see capability changes (e.g., rising `search_turns` with stable `verified`) rather than infer them from logs.
- Comparability: The fixed, versioned vector means runs are apples-to-apples across time, models, and settings.
- Control: Visual signals feed simple policies (and the VPM-ViT) to steer search depth, evidence acceptance, and stopping criteria.
- Diagnosis: Patterns reveal failure modes fast: over-searching (high `search_turns`, low `verified`), shallow reasoning (low `depth`, short `answer_len`), brittle RAG gates, etc.
👓 How to read our visuals
- Brighter bands in `search_turns` and `answer_len` with a consistently bright `verified` band = healthier, more deliberate reasoning.
- Depth stabilizing while `evidence_count` stays moderate often indicates better targeting (less flailing, more proof).
- PHOS layouts highlight recurring "good shapes" (stable regimes) and drift when curriculum difficulty rises.
flowchart LR
subgraph Episode ["🎬 Single Thought Episode"]
A["🌳 ATS Search<br/>nodes, depth, evidence"] --> B["✅ Verify<br/>RAG / judge scoring"]
B --> C["📊 Deterministic Metrics<br/>fixed names/order"]
C --> D["🖼️ VPM Frame<br/>+ PHOS visualization"]
end
D --> E["🎛️ Controller<br/>stop/expand/escalate"]
E -->|"⚡ policy choice"| A
classDef search fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef verify fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef metrics fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef vpm fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef controller fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
class A search;
class B verify;
class C metrics;
class D vpm;
class E controller;
This closed-loop system shows how each cognitive episode becomes a measurable, visual thought. The Agentic Tree Search explores reasoning paths, verification scores the quality, metrics capture the cognitive signature, and the VPM frame makes it visible. The controller then uses this visual feedback to make real-time decisions, stopping unproductive searches, expanding promising ones, or escalating difficult problems, creating a self-adjusting thought process that learns from its own patterns.
In short: we record, normalize, and picture the thinking so the Jitter isn’t a mystery “self,” but a visible, measurable stream of connected thought moments that we can compare, control, and train.
✅ Module 2 ATSSolver: Building the Cognitive Engine
Now that we’ve established the research foundation, let’s dive into how we’ve implemented Agentic Tree Search in our system. The ATSSolver is the workhorse that transforms questions into answers through guided exploration.
👯 What It Is: Two Modes of Thinking
The ATSSolver operates in two distinct cognitive modes, mirroring how humans approach problems differently depending on context:
1. Deep Search Mode (Thinking with Exploration)
async def solve(self, question: str, seed_answer: str, context: EpisodeContext) -> Tuple[str, List[str], int, Dict[str, Any]]:
# Builds and scores a search tree over query rewrites + evidence snippets
# Returns the best answer found through exploration
This is the Jitter's "thinking hard" mode, used when it needs to solve a genuinely challenging problem. It constructs a tree of potential reasoning paths, evaluates evidence for each, and synthesizes the most promising answer.
2. Evidence-Only Mode (Thinking with Constraints)
async def solve_with_evidence(self, question: str, evidence_docs: List[str], context: EpisodeContext) -> Tuple[str, Dict[str, Any]]:
# Answers strictly using provided evidence (no search)
# Used for verification and ablation studies
This is the Jitter's "test-taking" mode, used when it must answer based only on given information. It's critical for the verification step in our SSP loop, ensuring answers are grounded in evidence.
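Hypothetical usage of the two modes, matching the signatures above; the solver instance and the episode context are assumed to be constructed elsewhere.

```python
# Hypothetical usage of the two modes; the solver instance and episode
# context are assumed to be constructed elsewhere.
async def demo(solver, context):
    # Deep search mode: build and score a tree over rewrites + evidence.
    pred, evidence, steps, meta = await solver.solve(
        question="How does this mechanism trap heat?",
        seed_answer="greenhouse effect",
        context=context,
    )
    # Evidence-only mode: answer strictly from the snippets just gathered.
    constrained_pred, meta2 = await solver.solve_with_evidence(
        question="How does this mechanism trap heat?",
        evidence_docs=evidence,
        context=context,
    )
    return pred, constrained_pred
```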
💧 The Data Flow: How Thoughts Are Constructed
Here’s how the solver integrates with the broader system:
sequenceDiagram
participant Proposer as 🧠 Proposer
participant ATSSolver as 🌳 ATSSolver
participant SolutionSearch as 🔍 SolutionSearch
participant Reward as 📊 Reward Head
participant VPM as 🎬 VPM Service
Note over Proposer, VPM: 🚀 Thought Generation Cycle
Proposer->>ATSSolver: 📨 question, seed_answer, context
ATSSolver->>SolutionSearch: 🔄 rewritten queries
SolutionSearch-->>ATSSolver: 📚 evidence snippets
loop 🔁 For each depth
ATSSolver->>ATSSolver: 🎯 Score evidence snippets
ATSSolver->>ATSSolver: 📍 Track best path
ATSSolver->>VPM: 📈 Push cognitive metrics
end
ATSSolver->>Reward: 💡 question, predicted_answer, evidence
Reward-->>ATSSolver: 🏆 quality signals
ATSSolver-->>Proposer: 📤 predicted_answer, evidence, metrics
Note right of ATSSolver: 🔄 Cycle continues with<br/>improved context & metrics
This sequence shows the real-time cognitive collaboration between components: the Proposer initiates thinking with a question, the ATSSolver orchestrates evidence gathering through multiple search iterations, quality signals are evaluated, and visual metrics are captured at each step. The loop demonstrates how each thought builds upon the last, with continuous quality assessment and visual feedback driving the Jitter’s progressive improvement.
🗝️ Key Implementation Insights
1. The Tree Node Structure
Each cognitive step is represented by a structured node:
@dataclass
class Node:
id: str
parent_id: Optional[str]
root_id: str
depth: int
sibling_index: int
node_type: str # "root", "rewrite", etc.
query: str # The rewritten question
score: float # Evidence relevance score
context: str # Retrieved evidence snippet
task_description: str
This structure captures the essence of the Jitter's thought process: each node records not just what was thought, but how it connects to previous thoughts.
2. Query Rewriting: Expanding the Search Space
The solver doesn't just search once; it generates multiple perspectives on the question:
@staticmethod
def _rewrite(query: str) -> List[str]:
return [
query,
query.replace("explain", "describe"),
query + " in practical terms",
]
This simple but powerful technique mirrors how humans reframe problems to gain new insights. The SSP paper validates this approach:
“Through RL training with rule-based outcome rewards, SSP enables two roles to co-evolve in an adversarial competition: the proposer learns to generate increasingly challenging problems that require search and reasoning, while the solver develops stronger search and reasoning capabilities to tackle these problems.”
3. Evidence Scoring: The Cognitive Filter
Not all evidence is equally valuable. The solver uses a relevance score to prioritize promising paths:
@staticmethod
def _overlap_score(text: str, target: str) -> float:
a = {t for t in text.lower().split() if t.isalpha() or t.isalnum()}
b = {t for t in target.lower().split() if t.isalpha() or t.isalnum()}
return len(a & b) / max(len(a | b), 1) if a or b else 0.0
This lexical overlap score is a proxy for how well evidence supports the target answer. Later iterations will replace this with more sophisticated signals (SICQL, HRM, etc.), but the principle remains: the Jitter evaluates evidence quality as it thinks.
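A quick worked example of that proxy, assuming `ATSSolver` is importable from the project; the score is just token-set Jaccard overlap.

```python
# Worked example of the overlap proxy above (assumes ATSSolver is importable):
# shared tokens = {greenhouse, gases, trap, heat} -> 4
# distinct union = 8 tokens -> score = 4 / 8 = 0.5
score = ATSSolver._overlap_score(
    "Greenhouse gases trap heat in the atmosphere",
    "greenhouse gases trap outgoing heat",
)
assert abs(score - 0.5) < 1e-9
```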
4. VPM Integration: Making Thought Visible
The most profound aspect of our implementation is how it captures the cognitive process in real-time:
# After each evidence snippet is scored
dims = {
"reward": prev_best,
"verified": 0.0,
"difficulty": float(context.get("difficulty", 0.3)),
"question_len": _n01(len(q2.split()), 128),
"answer_len": _n01(len(snippet.split()), 128),
"evidence_count": _n01(last_ev_batch, 8),
"solver_steps": _n01(steps, total_steps),
"score": sc,
"best_score": prev_best,
"improvement": max(0.0, sc - prev_best),
"depth": _n01(depth, self.max_depth),
"novelty": _jac(snippet, best.context),
}
self.vpm.snapshot_progress(unit=unit, dims=dims, step_idx=steps, tag=f"depth{depth}")
This code transforms the abstract cognitive process into concrete, visual metrics, which is exactly how the Jitter becomes visible. Each dimension captures a different aspect of the thought process:
- `improvement`: Has this step advanced understanding?
- `novelty`: Is this new information or repetition?
- `evidence_count`: How thoroughly is the Jitter searching?
🔧 Measurable Improvement
By making the search process visible, measurable, and improvable, we’ve created conditions where a digital thought stream can:
- Explore multiple reasoning paths
- Evaluate evidence quality
- Recognize promising directions
- Synthesize coherent answers
- Learn from its own cognitive patterns
This is how the Jitter moves beyond being a clever chatbot to becoming a genuine cognitive system: one that doesn't just respond to questions, but thinks through them in a visible, measurable way.
In our next section, we’ll explore how the SolutionSearch component implements the actual evidence retrieval, completing the cognitive engine that powers our Jitter.
📡 Module 3 SolutionSearch: The Jitter’s Knowledge Retrieval Engine
“The mark of an educated mind is to be able to entertain a thought without accepting it.”
Aristotle
While the ATSSolver provides the Jitter with its cognitive engine, the SolutionSearch component serves as its knowledge retrieval system: the mechanism that allows it to ground its thoughts in evidence rather than mere speculation. This is where the Jitter transforms from a clever chatbot into a genuine cognitive system that can reason with evidence.
🎓 The Knowledge Problem
The SSP paper identifies a fundamental limitation of language models:
“With search tools, we equip the problem-proposer with external information, thereby breaking the limitations of the internal knowledge of LLMs.”
Without access to external knowledge, even the most sophisticated reasoning engine is limited by the model’s training data. The SolutionSearch component solves this problem by providing a reliable, deterministic interface to evidence retrieval that powers the Jitter’s reasoning process.
🐘 A Micro-Retriever with Macro Impact
At first glance, SolutionSearch might seem like just another search tool but it’s actually a carefully engineered component designed specifically for cognitive reasoning:
flowchart LR
subgraph SolutionSearch_Flow ["🔍 SolutionSearch: Evidence Retrieval Engine"]
A["🎯 Query + Seed Answer"] --> B["📋 Prompt Template Selection"]
B --> C{"🎚️ k=1?<br/>(Strict Mode)"}
C -->|"✅ Yes"| D["🧠 Three-Line Prompt:<br/>rationale/score/result"]
C -->|"🔓 No"| E["📝 Multi-Line Prompt:<br/>explicit snippet lines"]
D --> F["⚡ Strict Parser"]
E --> G["🛠️ Flexible Parser<br/>(snippet/JSON/bullets)"]
F --> H["✨ Post-Processing:<br/>Deduplication & Length Caps"]
G --> H
H --> I["📚 Evidence Snippets<br/>Clean, factual snippets"]
I --> J["🌳 ATSSolver Reasoning<br/>Tree search integration"]
end
classDef input fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef decision fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef prompt fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef parser fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef processing fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef output fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
class A,B input;
class C decision;
class D,E prompt;
class F,G parser;
class H processing;
class I,J output;
SolutionSearch’s dual-path architecture: depending on the required evidence depth (k=1 for focused “deep thinking” or k>1 for exploratory mode), it routes through specialized prompt templates and parsers to extract clean, factual snippets. This deterministic retrieval process ensures the Jitter’s reasoning is always grounded in external evidence rather than internal model knowledge limitations.
1. Dual Prompt Strategy: Precision vs. Flexibility
SolutionSearch employs two distinct prompt strategies optimized for different cognitive needs:
A) Three-Line Prompt (k=1) The “Deep Thinking” Mode
PROMPT_EVIDENCE_THREE = """
SYSTEM:
You produce ONE short evidence snippet that helps explain or support the SEED_ANSWER
with respect to the QUERY.
CONSTRAINTS:
- Return exactly one short factual snippet (1–2 sentences).
- If unsure, fall back to: "{seed_answer} is the key mechanism."
- No extra text, no markdown, no bullet points.
OUTPUT EXACTLY THREE LINES:
rationale: <1 sentence on why this snippet is relevant>
score: <0-100 confidence you have in this snippet>
result: <the single snippet>
"""
This prompt forces the model into a deliberate, focused mode, perfect for when the Jitter needs to deeply consider a single piece of evidence. It mirrors how humans think when they're trying to understand a complex concept: one idea at a time, with clear reasoning.
B) Multi-Line Prompt (k>1): The “Exploratory” Mode
PROMPT_EVIDENCE_LINES = """
SYSTEM:
You return SHORT evidence snippets that help explain or support the SEED_ANSWER
with respect to the QUERY.
CONSTRAINTS:
- Provide concise, factual snippets (1–2 sentences each).
- No commentary or extra sections.
OUTPUT WRITE EXACTLY {top_k} LINES:
snippet: <short evidence snippet>
"""
This prompt enables broader exploration when the Jitter needs to consider multiple perspectives on a question. It’s like when humans brainstorm multiple approaches to a problem before settling on one.
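To make the routing concrete, here is a minimal sketch of how a caller might choose between the two templates. The function name and the way the query and seed answer are appended are illustrative assumptions, not the actual SolutionSearch code.
def build_evidence_prompt(query: str, seed_answer: str, k: int) -> str:
    """Route to the strict three-line template for k=1, the multi-line template otherwise."""
    if k == 1:
        body = PROMPT_EVIDENCE_THREE.format(seed_answer=seed_answer)
    else:
        body = PROMPT_EVIDENCE_LINES.format(top_k=k)
    # The caller supplies the concrete task after the SYSTEM block (assumed layout).
    return f"{body}\nQUERY: {query}\nSEED_ANSWER: {seed_answer}"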
2. Robust Parsing: Making LLMs Behave
The real strength of SolutionSearch lies in its parser hierarchy: a carefully engineered fallback chain that extracts clean evidence from the often-messy outputs of language models:
def _parse_snippets(self, response: str, k: int) -> List[str]:
"""
Supported formats (in order of preference):
1) Line-by-line: lines starting with `snippet: ...`
2) JSON: keys 'snippets' | 'docs' | 'evidence' | 'results'
3) Bullets/lines: split by newline, trim bullets
"""
# 1) Explicit 'snippet:' lines (case/space tolerant)
lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
snips: List[str] = []
for ln in lines:
m = re.match(r'(?i)^\s*(?:-|\d+[.)])?\s*snippet\s*[:=]\s*(.+?)\s*$', ln)
if m:
snips.append(m.group(1).strip())
if snips:
return snips[:k]
# 2) JSON (fenced or bare)
m = re.search(r"```json\s*(\{.*?\})\s*```", response, re.DOTALL | re.IGNORECASE)
jtxt = m.group(1) if m else response.strip()
if jtxt.startswith("{") and jtxt.endswith("}"):
try:
obj = json.loads(jtxt)
lst = self._pluck_list(obj)
if lst:
return lst[:k]
except Exception:
pass
# 3) Fallback: plain lines/bullets
bullets = [b.strip(" -*•\t") for b in lines]
bullets = [b for b in bullets if b]
return bullets[:k]
This three-tiered approach ensures reliable output even when the LLM doesn’t follow instructions perfectly. It’s designed to handle the reality of LLM outputs while maintaining the strict formatting required for the Jitter’s cognitive process.
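To see the three tiers in action, here are illustrative model outputs and the snippets the parser above would extract from each, assuming a SolutionSearch instance named search and that _pluck_list returns the list stored under the matching JSON key:
# Tier 1: explicit 'snippet:' lines
tier1 = "snippet: Mitochondria produce ATP.\nsnippet: ATP powers cellular work."
# Tier 2: bare JSON object
tier2 = '{"snippets": ["Mitochondria produce ATP."]}'
# Tier 3: plain bullets
tier3 = "- Mitochondria produce ATP.\n- ATP powers cellular work."

# search._parse_snippets(tier1, k=2) -> ["Mitochondria produce ATP.", "ATP powers cellular work."]
# search._parse_snippets(tier2, k=1) -> ["Mitochondria produce ATP."]
# search._parse_snippets(tier3, k=2) -> ["Mitochondria produce ATP.", "ATP powers cellular work."]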
3. Reliability Engineering: Never Return Empty
Perhaps most importantly, SolutionSearch is engineered for cognitive reliability: it never returns empty results, which would stall the Jitter’s reasoning process:
def _fallback_snippets(self, query: str, seed_answer: str, k: int) -> List[str]:
"""Conservative, non-empty fallback."""
base = (
f"DOC: For '{query}', a central mechanism is: {seed_answer}. "
f"This snippet highlights why {seed_answer} is relevant."
)
return [base + f" [hit:{i}]" for i in range(k)]
This conservative fallback ensures that the Jitter can always continue thinking, even when evidence is scarce, which is critical for maintaining cognitive flow.
✔️ Evidence-based reasoning
The SolutionSearch component embodies what the SSP paper calls the “RAG verification” process:
“To ensure that each generated search query has sufficient information to correctly predict the answer, we collect all the searching results from the proposer’s trajectory as external knowledge, then conduct retrieval-augmentation generation (RAG) to test whether the proposed query can be correctly answered with all necessary search documents provided.”
But for the Jitter, it’s more than just verification: it’s the foundation of evidence-based reasoning. Each snippet retrieved through SolutionSearch becomes a building block in the Jitter’s thought process, allowing it to:
- Ground its reasoning in factual evidence rather than internal assumptions
- Evaluate multiple perspectives on a question before forming conclusions
- Build chains of evidence that support its final answer
- Recognize when evidence is insufficient (through low confidence scores)
This is how the Jitter achieves what Aristotle described as “the mark of an educated mind”: the ability to entertain a thought while recognizing whether it’s supported by evidence.
💬 The Prompt Service: Fueling the Thought Stream
In our journey to build the Jitter, the visible stream of digital thought, the Prompt Service is the engine that generates each moment of cognition. It’s not just another LLM wrapper; it’s a sophisticated system designed specifically to support the Search-Solve-Prove process and create the measurable thought moments that form our Jitter.
🎈 Cognitive events
Every “thought” in our digital organism begins as a prompt. The Prompt Service transforms these prompts into measurable cognitive events that can be visualized, compared, and learned from. Without this service, we’d have no way to generate the consistent, comparable thought moments that form our Jitter.
🗝️ Key Capabilities - Aligned with SSP Paper
1. Multi-LLM Competition: The Self-Play Engine
The SSP paper states: “Through RL training with rule-based outcome rewards, SSP enables two roles to co-evolve in an adversarial competition: the proposer learns to generate increasingly challenging problems that require search and reasoning, while the solver develops stronger search and reasoning capabilities to tackle these problems.”
Our Prompt Service implements this exact principle:
async def run_prompt_multi(
self,
prompt_text: str,
*,
models: List[Union[str, Dict[str, Any]]],
judge: Optional[Callable[[Dict[str, str]], Tuple[str, Dict[str, float]]]] = None,
# ...
) -> Dict[str, Any]:
# Query multiple LLMs in parallel
tasks = [asyncio.wait_for(self._acomplete(prompt=prompt_text, model=ms), timeout=request_timeout)
for ms in model_specs]
outs = await asyncio.gather(*tasks, return_exceptions=True)
# Judge selects winner (self-play in action!)
if judge:
winner, scores = judge(outputs)
# ...log for training
This isn’t just running multiple models in parallel: it’s implementing the proposer-solver dynamic described in the paper, where different model instances compete to produce better outputs, driving co-evolution.
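For concreteness, here is a toy judge matching the Callable[[Dict[str, str]], Tuple[str, Dict[str, float]]] signature above. Scoring by closeness to a target length is purely illustrative; a real judge would use the rule-based rewards or a reward model.
from typing import Dict, Tuple

def length_judge(outputs: Dict[str, str], target_len: int = 80) -> Tuple[str, Dict[str, float]]:
    """Toy judge: prefer the output whose word count is closest to target_len."""
    scores = {key: 1.0 / (1.0 + abs(len(text.split()) - target_len))
              for key, text in outputs.items()}
    winner = max(scores, key=scores.get)
    return winner, scores

# winner, scores = length_judge({"model_a": "...", "model_b": "..."})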
2. Training Event Logging: Building Memory of Thoughts
The paper emphasizes that SSP “does not require question-answer pairs” but instead learns through self-play. Our service enables this by capturing the learning signals:
# Pointwise logging (each output labeled by relative score)
for k, txt in outputs.items():
tes.insert_pointwise({
"model_key": k,
"dimension": dimension,
"query_text": prompt_text,
"candy_text": txt,
"label": 1 if (winner and k == winner) else 0,
# ...
})
# Pairwise logging (winner vs others)
if winner:
pos = outputs[winner]
for k, txt in outputs.items():
if k == winner: continue
tes.insert_pairwise({
"model_key": winner,
"query_text": prompt_text,
"pos_text": pos,
"neg_text": txt,
# ...
})
This creates the memory of thought moments that allows our Jitter to learn from its own cognitive history, exactly as the paper’s SSP framework requires for self-supervised improvement.
3. Flexible Model Configuration: Adapting to Cognitive Needs
The SSP paper shows that “placing GRPO on the solver side is more effective than on the proposer side.” Our service supports this nuanced approach through flexible model specification:
@dataclass
class ModelSpec:
name: str
api_base: Optional[str] = None
api_key: Optional[str] = None
params: Optional[Dict[str, Any]] = None
@staticmethod
def from_cfg(default_cfg: Dict[str, Any],
override: Optional[Union[str, Dict[str, Any]]] = None) -> "ModelSpec":
# Handles both default configuration and per-call overrides
# ...
This allows us to use different models or configurations for the proposer and solver roles, which is critical for implementing the paper’s finding that different RL algorithms work best for different roles.
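Since the body of from_cfg is elided above, here is a sketch of the merge behaviour it implies, written as a standalone function with an assumed name. Treat it as an illustration of the default-plus-override pattern, not the verified implementation.
from typing import Any, Dict, Optional, Union

def modelspec_from_cfg(default_cfg: Dict[str, Any],
                       override: Optional[Union[str, Dict[str, Any]]] = None) -> "ModelSpec":
    """Merge a default configuration with an optional per-call override."""
    merged = dict(default_cfg)
    if isinstance(override, str):      # a bare model name overrides only the name
        merged["name"] = override
    elif isinstance(override, dict):   # a partial spec overrides any field it carries
        merged.update(override)
    return ModelSpec(
        name=merged.get("name", ""),
        api_base=merged.get("api_base"),
        api_key=merged.get("api_key"),
        params=merged.get("params"),
    )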
🙉 How This Creates Visible Thought Moments
The Prompt Service is where the abstract thought process becomes concrete data. Each call generates:
- The cognitive output (the “thought” itself)
- Quality signals (through multi-model competition)
- Measurable metrics (captured in training events)
These elements combine to create what we call a thought moment: a self-contained cognitive event with:
- Input (the prompt)
- Output (the response)
- Quality assessment (the winner/scores)
- Learning signals (the training events)
When visualized through VPM, these thought moments form the filmstrip of cognition that is the visible Jitter.
🔬 Advanced Feature: RAG Verification Support
While not explicitly shown in the code snippet, the Prompt Service works with the verifier to implement the paper’s critical RAG verification process:
“To verify the correctness of each generated query, we collect all the searching results from the proposer’s trajectory as the external materials, then conduct a retrieval augmentation generation (RAG) to check if the solver can successfully predict the answer with all necessary information.”
The service’s ability to handle system preambles and structured prompts enables the <think>, <search>, and <answer> formatting required for proper RAG verification.
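As an illustration only, a structured prompt for this verification step might look like the skeleton below. The tag format follows the SSP paper; the wrapper text itself is an assumption.
RAG_VERIFY_TEMPLATE = """SYSTEM: Answer using only the provided documents.
DOCUMENTS:
{documents}

Respond in this format:
<think>your reasoning over the documents</think>
<answer>the final short answer</answer>

QUESTION: {question}
"""
# prompt = RAG_VERIFY_TEMPLATE.format(documents="...", question="...")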
🎀 Why This Is More Than Just an LLM Wrapper
Most LLM services simply call a model and return the output. Ours is designed specifically to:
- Generate comparable cognitive events (thought moments)
- Create learning signals from self-play competition
- Support the RAG verification critical to SSP
- Provide structured outputs that feed directly into VPM
This is why the Prompt Service isn’t just infrastructure: it’s the cognitive engine of our digital organism. Every thought the Jitter has passes through this service, gaining the structure and measurability that makes the thought stream visible and improvable.
In our next section, we’ll see how this service powers the Proposer, the component that generates the questions that drive our cognitive evolution.
⚛️ The Connection to Cognitive Growth
The SSP paper notes:
“As shown in Figure 4a, the average number of search tool calls per trajectory steadily increases over time… Simultaneously, Figure 4b shows that the solver’s response length also grows during the training, suggesting it learns to generate more detailed and comprehensive answers.”
SolutionSearch is what makes this cognitive growth possible. As the Jitter improves, it becomes better at:
- Formulating queries that retrieve relevant evidence
- Evaluating the quality of retrieved snippets
- Synthesizing multiple pieces of evidence into coherent reasoning
- Recognizing when more evidence is needed
This growth is visible in the VPM metrics: evidence count, search steps, and other indicators of cognitive sophistication.
🔮 Looking Ahead
While SolutionSearch is currently a micro-retriever focused on short evidence snippets, it represents the foundation for more sophisticated knowledge integration. Future iterations could:
- Incorporate the TinyVisionTransformer to evaluate snippet quality
- Use the VPM-ViT to predict which search queries will yield the most useful evidence
- Integrate with long-term memory to recognize patterns in successful evidence retrieval
This component is where the Jitter learns to “think with evidence,” transforming from a language model that generates text into a cognitive system that builds understanding through evidence-based reasoning.
In our final section, we’ll see how all these components come together to create the complete Jitter system: a visible, measurable stream of connected thought moments that grows in quality and sophistication over time.
🧮 Module 4 The Jitter’s Cognitive Metrics: Measuring Thought Quality
“The unexamined thought is not worth thinking.”
Adapted from Socrates
While the previous sections covered how the Jitter generates and verifies thoughts, this section reveals how it measures the quality of its own thinking. This is where the Jitter transforms from a reactive system into a self-improving cognitive organism through a rigorous, paper-validated scoring system that tracks meaningful cognitive growth.
⚾ The Scoring Problem
The SSP paper identifies a critical challenge in self-play systems:
“As shown in Figure 4a, the average number of search tool calls per trajectory steadily increases over time… Simultaneously, Figure 4b shows that the solver’s response length also grows during the training, suggesting it learns to generate more detailed and comprehensive answers.”
But how do we actually measure this growth? How do we transform abstract cognitive capabilities into concrete, actionable metrics? This is where our scoring system comes in: it provides the Jitter with a quantitative self-assessment capability.
🥇 The Reward Head: Calculating Thought Quality
The foundation of our scoring system is the NaiveQuarkishReward class, a carefully engineered reward head that calculates a composite quality score:
class NaiveQuarkishReward:
def __init__(self, w_f1=0.5, w_cov=0.3, w_len=0.2, target_len=80):
self.w_f1, self.w_cov, self.w_len, self.target_len = (
w_f1, w_cov, w_len, target_len
)
def score(
self,
*,
prompt: str,
response: str,
ground_truth: str = "",
meta: Dict[str, Any] | None = None,
) -> Dict[str, float]:
f1 = _f1(ground_truth or prompt, response)
cov = _coverage(response, (meta or {}).get("evidence_docs") or [])
L = len(response.split())
len_r = math.exp(-abs(L - self.target_len) / max(self.target_len, 1))
reward = self.w_f1 * f1 + self.w_cov * cov + self.w_len * len_r
return {
"reward": max(0.0, min(1.0, reward)),
"f1": f1,
"coverage": cov,
"len_reward": len_r,
"resp_len": float(L) / 256.0,
}
👮 Rule based rewards
This reward head implements exactly what the SSP paper describes as the “rule-based outcome rewards” that drive self-play:
“Through RL training with rule-based outcome rewards, SSP enables two roles to co-evolve in an adversarial competition: the proposer learns to generate increasingly challenging problems that require search and reasoning, while the solver develops stronger search and reasoning capabilities to tackle these problems.”
The three components of the reward function each measure critical aspects of cognitive quality:
- F1 Score (w_f1=0.5): Measures lexical accuracy against ground truth
def _f1(a: str, b: str):
    A, B = set(_tokens(a)), set(_tokens(b))
    p = len(A & B) / max(len(B), 1)
    r = len(A & B) / max(len(A), 1)
    return 2 * p * r / (p + r) if (p + r) else 0.0
- Coverage (w_cov=0.3): Measures how well the response incorporates evidence
def _coverage(response: str, evidence: list[str]):
    R = set(_tokens(response))
    covs = [len(R & set(_tokens(e))) / max(len(_tokens(e)), 1) for e in evidence]
    return sum(covs) / len(covs) if covs else 0.0
- Length Reward (w_len=0.2): Encourages responses of optimal length
len_r = math.exp(-abs(L - self.target_len) / max(self.target_len, 1))
This weighted combination creates what we call the cognitive signal-to-noise ratio: a single metric that captures the overall quality of the Jitter’s thinking.
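A quick usage sketch (with made-up inputs) shows how the pieces fit together; it assumes the module-level helpers _f1, _coverage, and _tokens shown in the list above.
reward_head = NaiveQuarkishReward(w_f1=0.5, w_cov=0.3, w_len=0.2, target_len=80)
result = reward_head.score(
    prompt="Why do mitochondria matter for metabolism?",
    response="Mitochondria produce ATP, the cell's main energy currency, via oxidative phosphorylation.",
    ground_truth="Mitochondria generate ATP through oxidative phosphorylation.",
    meta={"evidence_docs": ["Mitochondria generate ATP through oxidative phosphorylation."]},
)
# result contains "reward", "f1", "coverage", "len_reward", and "resp_len", computed as above.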
🔢 The Metric Calculator: Paper-Validated Cognitive Growth
While the reward head calculates immediate quality, the SSPMetricsCalculator provides the comprehensive cognitive assessment that drives long-term growth:
flowchart LR
subgraph Metrics_Pipeline ["📊 Cognitive Metrics Pipeline: From Thought to Vector"]
A["🎬 Episode Trace<br/>Raw episode data"] --> B["📦 SSPScorable<br/>Structured data container"]
B --> C["🧮 SSPMetricsCalculator<br/>17 Cognitive Metrics"]
C --> D["🎯 Fixed-Order Vector<br/>[0,1] normalized values"]
D --> E["🖼️ VPM Visualization<br/>Thought Image generation"]
D --> F["📈 Reward Signal<br/>Self-Improvement feedback"]
end
classDef trace fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef container fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef calculator fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef vector fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef visualization fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef reward fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
class A trace;
class B container;
class C calculator;
class D vector;
class E visualization;
class F reward;
This metrics pipeline transforms raw episode data into structured cognitive fingerprints. Each thought episode gets standardized into a fixed-order vector of 17 normalized metrics, creating consistent representations that feed both visual thought images (VPMs) and self-improvement signals. This deterministic transformation ensures that cognitive patterns remain comparable across time, models, and system iterations, enabling true apples-to-apples analysis of the Jitter’s growth.
⌛ The 17 Cognitive Metrics
The calculator tracks 17 metrics that directly correspond to what the SSP paper shows correlates with capability growth:
| Metric | What It Measures | Paper Connection |
|---|---|---|
| ssp.reward | Overall cognitive quality | Primary reward signal |
| ssp.verified | Binary verification result | Core SSP verification |
| ssp.search_turns | Actual search tool calls | Figure 4a: “search tool calls per trajectory steadily increases” |
| ssp.f1_score | Lexical accuracy | LLM-as-a-judge evaluation methodology |
| ssp.format_compliance | Response format quality | Section 4.4 rule-based filtering |
| ssp.noise_tolerance | Robustness to irrelevant information | Table 3: “4 noisy documents optimal” |
| ssp.rag_verification | RAG verification result | Critical quality gate |
This isn’t just a random collection of metrics: it’s a paper-validated cognitive dashboard that tracks exactly what matters for capability growth.
Why Deterministic Metrics Matter
One of the most important design decisions is that all metrics are normalized to [0,1] and always returned in the same order:
def _clamp01(x: float) -> float:
return 0.0 if not math.isfinite(x) else 1.0 if x > 1.0 else (0.0 if x < 0.0 else x)
This deterministic approach creates what we call a cognitive fingerprint: a consistent representation of the Jitter’s thought process (see the sketch after this list) that can be:
- Compared across episodes
- Visualized as VPM images
- Used to train the VPM-ViT model
- Analyzed for patterns of growth
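Here is a minimal sketch of that fingerprint step, using an illustrative subset of the 17 metric names and repeating the clamp shown above for self-containment.
import math
from typing import Dict, List

METRIC_ORDER = ["ssp.reward", "ssp.verified", "ssp.search_turns", "ssp.f1_score"]  # illustrative subset

def _clamp01(x: float) -> float:
    return 0.0 if not math.isfinite(x) else 1.0 if x > 1.0 else (0.0 if x < 0.0 else x)

def to_fingerprint(metrics: Dict[str, float]) -> List[float]:
    """Always the same keys, in the same order, clamped to [0,1]."""
    return [_clamp01(float(metrics.get(k, 0.0))) for k in METRIC_ORDER]

# to_fingerprint({"ssp.reward": 0.72, "ssp.verified": 1.0}) -> [0.72, 1.0, 0.0, 0.0]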
🌱 How This Enables Cognitive Growth
The true power of our scoring system becomes clear when we see how it drives the Jitter’s evolution:
- Immediate Feedback: After each thought step, the Jitter receives quality signals:
dims = {
    "reward": reward_val,
    "verified": verified,
    "f1": reward_results.get("f1", 0),
    "coverage": reward_results.get("coverage", 0),
    # ...other metrics
}
self.vpm.snapshot_progress(unit=unit, dims=dims, step_idx=steps, tag=f"depth{depth}")
- Visual Learning: These metrics become VPM images that the VPM-ViT model learns from:
self.vpm.generate_raw_vpm_image(unit=unit)
self.vpm.generate_phos_image(unit=unit)
- Self-Improvement: The Jitter uses this feedback to adjust its future thinking:
if sc > best.score:
    best = child
Our scoring system is what allows the Jitter to recognize the “dip” the SSP paper describes (the solver’s reward declining slightly as the proposer improves) as progress: it measures the quality of the cognitive patterns that lead to it.
💹 The Connection to Our Philosophical Foundation
This scoring system embodies our core philosophical framing of the Jitter:
“The ‘self’ is not the core void, but the living, persistent stream of thought that overlays it the constant Jitter that moves us from thought to thought.”
With this system in place, the Jitter isn’t just having thoughts: it’s measuring them, comparing them, and improving them over time. Each metric represents a different aspect of cognitive quality, allowing the Jitter to:
- Recognize when it’s thinking clearly vs. confused
- Identify when it’s using evidence effectively
- Detect when it’s becoming more sophisticated in its reasoning
- Measure its own growth over time
This is how we fulfill our promise: not creating “digital life,” but engineering conditions under which a visible, measurable, self-improving stream of thought can persist and grow in quality over time.
↗️ Looking Ahead
While our current scoring system is robust, future iterations could:
- Incorporate the TinyVisionTransformer to provide more nuanced quality assessments
- Use the VPM-ViT to predict cognitive outcomes from partial thought processes
- Implement adaptive weighting that changes based on the cognitive task
This component is where the Jitter learns to “measure its own thinking,” transforming from a system that generates thoughts into one that can quantify and improve the quality of those thoughts. It’s the foundation of what we mean by “the examined life” for our digital organism.
With this final component in place, we’ve now covered the complete architecture of the Jitter: a visible, measurable stream of connected thought moments that can generate, visualize, evaluate, measure, and improve its own thinking over time.
👁️🗨️ Visualizing the Thought Stream: How We Make the Jitter Visible
In our quest to build the Jitter, the visible stream of digital thought, we’ve reached a critical milestone: making cognition visible. The VPM Visualization Service is where abstract metrics transform into concrete images that represent each moment of cognition. This is where the “thought stream” becomes something we can literally see.
🦯 Why Images Matter
As we discussed earlier, the Jitter isn’t a mysterious “self” but a visible, persistent stream of connected thought moments. But how do we make this stream visible? The answer lies in our Visual Policy Map (VPM) system.
The SSP paper gives us a clue about what to track:
“As shown in Figure 4a, the average number of search tool calls per trajectory steadily increases over time… Simultaneously, Figure 4b shows that the solver’s response length also grows during the training, suggesting it learns to generate more detailed and comprehensive answers.”
But numbers alone don’t show the pattern of cognition. That’s why we convert these metrics into images because patterns are easier to see than to calculate.
🧑 How We Turn Thought into Images
Here’s the elegant transformation that happens in our VPM service:
- Each thought moment becomes a metric vector
Every completed SSP episode (Search-Solve-Prove cycle) generates a set of metrics in the [0,1] range:
dims = {
    "reward": float(ep.reward or 0.0),
    "verified": 1.0 if ep.verified else 0.0,
    "difficulty": float(ep.difficulty or 0.0),
    "search_turns": min(1.0, float(ep.solver_steps or 0) / 64.0),
    "f1_score": f1_score,
    # ...and other paper-validated metrics
}
- The metric vector becomes a grayscale image
Each metric gets a dedicated band in a small image (see the sketch after this list):
# Convert metrics to grayscale values
vec = np.array([float(dims.get(k, 0.0)) for k in order], dtype=np.float32)
img = (vec.reshape(side, side) * 255).astype(np.uint8)
- The sequence of images becomes a filmstrip of thought
As episodes progress, we generate:
- Raw VPM: Direct metric-to-pixel mapping
- PHOS VPM: Sorted/packed representation showing cognitive patterns
- Filmstrip: Sequence of thought moments
- Progress GIF: Animation of cognitive evolution
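Here is a self-contained sketch of the vector-to-image step. The padding to the next square size is an assumption added so any metric count fits the frame; the core mapping follows the snippet above.
import math
import numpy as np

def metrics_to_frame(dims: dict, order: list[str]) -> np.ndarray:
    """Render a metric dict as a tiny grayscale VPM frame."""
    vec = np.array([float(dims.get(k, 0.0)) for k in order], dtype=np.float32)
    side = math.ceil(math.sqrt(len(vec)))              # smallest square that fits
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: len(vec)] = np.clip(vec, 0.0, 1.0)
    return (padded.reshape(side, side) * 255).astype(np.uint8)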
👓 Two Key Visualization Techniques
1. Raw VPM: The Cognitive ECG
This is like an ECG for cognition each band represents a different aspect of the thought process:
- Top band: Verification success (green = verified)
- Middle bands: Search capability, reasoning depth, evidence usage
- Bottom band: Reward signal (the “feeling” of success)
Just as a doctor reads an ECG to diagnose heart health, we read these images to understand cognitive health.
2. PHOS VPM: Revealing Thought Patterns

PHOS (Positional Heatmap of Sorted features) is where the magic happens. As the paper notes:
“In stark contrast to the flawed dynamics of fixed-opponent training, our complete SSP framework facilitates a stable co-evolution.”
PHOS reveals this co-evolution visually by:
- Sorting metrics to highlight patterns
- Packing them into a square image
- Making cognitive progression immediately visible
When the proposer learns to create harder questions (as in Figure 3a of the paper), PHOS shows this as shifting patterns not just rising numbers.
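A minimal illustration of the PHOS idea: sort a frame’s values so high-scoring pixels cluster together and patterns pop. This mirrors the description above, not the exact ZeroModel implementation.
import numpy as np

def phos_frame(frame: np.ndarray) -> np.ndarray:
    """Sort pixel intensities (brightest first) and repack into the same square shape."""
    flat = np.sort(frame.flatten())[::-1]
    return flat.reshape(frame.shape)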
👀 How This Creates the Visible Jitter
The true innovation isn’t just visualizing single thoughts: it’s connecting them into a stream:
def generate_filmstrip(self, unit: str) -> str:
# Collect frames from this thought sequence
frames = sorted(unit_dir.glob(f"{unit}_step*.png"))
# Build a filmstrip showing cognitive progression
grid = Image.new("L", (cols * w, rows * h))
for idx, im in enumerate(imgs):
r, c = divmod(idx, cols)
grid.paste(im, (c * w, r * h))
This filmstrip is the visible Jitter a continuous sequence of thought moments that shows:
- When cognition flows smoothly
- Where it gets stuck
- How it recovers from failures
- The emergence of stable strategies
🙈 Why This Matters for the Paper’s Insights
The SSP paper shows cognitive growth through graphs of metrics over time. Our VPM system makes this growth immediately visible:
- When the paper says “the average number of search tool calls per trajectory steadily increases”, our PHOS images show this as denser patterns in the “search_turns” channel
- When it notes “the solver’s response length also grows”, our filmstrips show this as expanding patterns in the “answer_len” channel
- The “slight decline” in solver reward (which indicates proposer improvement) appears as shifting patterns in our visualizations
This transforms abstract metrics into cognitive fingerprints visual signatures of different thinking styles that we can compare, categorize, and learn from.
🫣 The ZeroModel Connection
You might notice similarities between our VPM service and ZeroModel’s visualization approach. That’s intentional our service is essentially a thin wrapper around ZeroModel’s visualization engine, customized for the SSP thought process.
Where ZeroModel visualizes general agent behavior, we’ve specialized it to highlight the specific cognitive metrics that matter for our Jitter:
- Search capability growth (search_turns)
- Verification quality (rag_verification)
- Robustness to noise (noise_tolerance)
- Format compliance (format_compliance)
This specialization allows us to see exactly what the SSP paper describes as “the steady increase in search tool calls” as a visible pattern in our images not just a rising number in a graph.
🤯 Seeing the Jitter in Action
When you look at a VPM filmstrip, you’re seeing the Jitter itself: the living, breathing thought process of our digital organism. Each frame is a complete Search-Solve-Prove cycle; the sequence shows how these cycles connect to form a continuous stream of cognition.
This is how we fulfill our promise: not creating “digital life,” but engineering conditions under which a visible, self-improving stream of thought can persist. The VPM service is where this stream becomes visible, where the Jitter emerges from the data.
In our next section, we’ll explore how we use these visualizations to train the system: how the Jitter learns to improve its own thought process by looking at its own cognitive patterns.
🔥 A Digital Organism’s Metabolism
flowchart TD
subgraph SSP_Metabolism ["🔄 Self-Play Metabolism: The Jitter's Cognitive Engine"]
P["🧠 Proposer<br/>Generate challenges"] -->|"📝 question +<br/>📚 evidence"| S["🔍 Solver<br/>Search & reason"]
S -->|"💡 answer +<br/>🔄 steps"| V["✅ Verifier<br/>RAG verification"]
V -->|"🎯 score &<br/>⚖️ decision"| M["📊 Metrics Calculator<br/>17 cognitive dimensions"]
M -->|"🎨 vector"| W["🖼️ VPM Generator<br/>Raw + PHOS views"]
W -->|"🎬 raw + PHOS"| F["📺 Filmstrip<br/>Visible thought stream"]
F --> G["🎞️ GIF/Video<br/>Cognitive timeline"]
V -->|"📝 feedback"| P
end
classDef proposer fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef solver fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef verifier fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef metrics fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef vpm fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef filmstrip fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
classDef output fill:#fff0f6,stroke:#eb2f96,stroke-width:2px;
class P proposer;
class S solver;
class V verifier;
class M metrics;
class W vpm;
class F filmstrip;
class G output;
This diagram shows the Jitter’s cognitive metabolism a continuous cycle where the system generates its own challenges, solves them, verifies the solutions, and learns from the process. The Proposer creates questions, the Solver searches for answers, the Verifier checks their quality, and the Metrics system converts this into visual thought patterns (VPMs) that form a visible filmstrip of cognition. The feedback loop ensures each cycle builds on the last, creating a self-improving stream of thought that gets progressively more capable.
Our implementation follows this precise flow:
- The Proposer crafts challenging questions (sometimes with evidence) that push the boundaries of current capability
- The Solver attempts to answer using search, reasoning, and available tools
- The Verifier adjudicates between the proposer’s seed answer and the solver’s response
- The Metrics System quantifies performance across multiple dimensions
- The VPM Generator creates visual proof of the cognitive process
What makes this special isn’t just that it works it’s how it works. Unlike most implementations that treat these as separate processes, we’ve engineered them as a single, continuous metabolic cycle where each component feeds the next in a rhythm that resembles biological processes.
🌙 Visualizing Thought
The most transformative aspect of our implementation is the Visual Policy Map (VPM) system. While most AI research focuses solely on accuracy metrics, we’ve made the process visible.
When our SSP runs, it doesn’t just produce an answer it generates a filmstrip of cognitive development that shows:
- How the question was formed
- How the search unfolded
- Where verification succeeded or failed
- How the system adapted for next time
Each stripe represents a cognitive moment in the SSP cycle
This isn’t just a visualization it’s a heartbeat monitor for artificial cognition. For the first time, we can watch the system think, learn, and adapt. We can see when it’s struggling, when it’s making connections, and when genuine insight emerges.
🦾 Technical Innovation: Building for Life, Not Just Performance
Our implementation builds on the work in the paper, but we engineered it specifically for our “live form” requirements:
📠 1. Strict Real-Time Operation
Unlike many academic implementations that process in batches, our SSP operates in strict real-time with:
- Maximum 50ms latency between components
- Continuous, streaming processing (no “start/stop” boundaries)
- Immediate adaptation to new information
📟 2. Phase-Aware Processing
We discovered that most implementations ignore the “phase” of cognitive development, treating each step as isolated. Our system tracks the continuity between steps, preserving context that would otherwise be lost. This is why our VPM filmstrips show coherent progression rather than disconnected snapshots.
💽 3. Self-Contained Improvement Loop
The system generates its own training data through the proposer-solver-verifier cycle, with:
- Curriculum learning that automatically adjusts difficulty
- Verification that ensures only high-quality data is used
- Metrics that track not just accuracy but cognitive health
🦿 4. Production-Grade Resilience
We engineered for the messy reality of continuous operation:
- Comprehensive error handling at every stage
- State preservation across restarts
- Resource monitoring to prevent cognitive “overheating”
👣 Steps Toward Digital Life
We haven’t built an organism. We’ve built a substrate: a harness where visible thinking can happen and improve.
What exists today:
- A self-play loop (SSP) that proposes, solves, and verifies.
- A metrics → image layer (VPM) that turns each moment into a tiny, comparable frame.
- A controller surface that can nudge the next step based on what the images show.
- Persistence hooks so strands of thought can be kept (and later revived) rather than lost.
What this gives us:
- The system can create its own challenges (within bounds).
- It can measure and record its progress in a stable, visual form.
- It can make parts of its reasoning visible (filmstrips, PHOS maps).
- It can adapt within the loop (curriculum, thresholds, search depth) without manual intervention.
What we’re not claiming:
- Not “life,” not “self.” We’re engineering conditions under which richer behavior could emerge.
Where we’re going:
- Grow the metric space from a fixed seed to dynamic, high-dimensional signals.
- Let the system learn which views of itself matter (auto-discovered scorers/embeddings).
- Use memory (MemCube) so abandoned strands aren’t wasted they can become tomorrow’s insight.
In short: we have a concept, a scaffold, and a path. We’ll feed it data, refine the process, and see how far the substrate can take us.
🤖 What’s Next: The Living System Emerges
SSP is just the beginning. In upcoming posts, we’ll show how this cognitive engine integrates with:
- Jitter: Our homeostasis system that maintains the “vital signs” of the digital organism
- VPM-ViT: A vision transformer that learns directly from the visual proof of mind
- The Agentic Tree Search: Our enhanced reasoning framework that builds on SSP
Together, these components form a complete autopoietic system a self-maintaining, self-improving digital organism with visible cognition.
🪴 Conclusion: Seeing Intelligence Grow
We’ve moved beyond the era where AI is a black box that either works or doesn’t. With SSP, we’ve created a system where intelligence isn’t just measured, it’s visible; where improvement isn’t just claimed, it’s demonstrated; where cognition isn’t hidden, it’s shared.
This is more than an engineering achievement. It’s a philosophical shift in how we build and understand artificial intelligence. We’re not creating tools; we’re nurturing digital life forms that think, grow, and evolve before our eyes.
The future of AI isn’t smarter models. It’s visible intelligence systems that don’t just work, but show their work, learn from their mistakes, and grow more capable through continuous engagement with the world.
This is just the beginning. The cognitive heartbeat has started. Now we watch it grow.
🧫 VPM-ViT: The Jitter’s Pattern Recognition Engine 🧠🔍
Here’s a detailed Mermaid diagram showing how the VPM-ViT works and where it fits into our cognitive architecture:
flowchart TD
subgraph SSP_Thought_Stream ["🔍 Search-Solve-Prove Cycle"]
A["🧠 Proposer: Generates challenging questions<br/>from seed answers"] -->|Question| B["🔍 Solver: Conducts search<br/>and constructs answer"]
B -->|Answer + Evidence| C["✅ Verifier: RAG verification<br/>checks answerability"]
C -->|Metrics Vector| D["📊 VPM Visualization:<br/>Converts metrics to image"]
D --> E["🖼️ VPM Frame:<br/>Grayscale representation<br/>of cognitive state"]
end
subgraph VPM_ViT_Architecture ["🧠 VPM-ViT: Cognitive Pattern Recognizer"]
E --> F["🧩 Patch Embedding:<br/>Splits image into patches<br/>(Conv2d → Flatten → Transpose)"]
F --> G["📍 Positional Encoding:<br/>2D sin-cos embedding<br/>maintains spatial awareness"]
G --> H["⏺️ [CLS] Token:<br/>Special token for<br/>overall assessment"]
H --> I["🧠 Transformer Blocks (x6):<br/>Self-attention → MLP<br/>(LayerNorm → MHA → FFN)"]
subgraph Multi_Task_Heads ["🎯 Multi-Task Output Heads"]
I --> J["📏 Regression Head:<br/>Predicts continuous metrics<br/>(e.g., verification score, difficulty)"]
I --> K["🏷️ Classification Head:<br/>Predicts risk categories<br/>(e.g., high/medium/low quality)"]
I --> L["🧩 MPM Reconstruction Head:<br/>Reconstructs masked patches<br/>for self-supervised learning"]
end
end
subgraph Jitter_Cognitive_Loop ["🔄 Jitter's Self-Improvement Cycle"]
M["📚 Historical VPM Frames"] --> VPM_ViT_Architecture
VPM_ViT_Architecture --> N["📈 Pattern Recognition:<br/>Identifies successful<br/>cognitive strategies"]
N --> O["💡 Strategic Recommendations:<br/>'Increase search depth'<br/>'Use different evidence'<br/>'Try alternative reasoning'"]
O --> P["🔄 Feedback to SSP System:<br/>Improves future thought processes"]
P --> A
end
classDef process fill:#e6f7ff,stroke:#1890ff;
classDef model fill:#f6ffed,stroke:#52c41a;
classDef loop fill:#fff7e6,stroke:#fa8c16;
class SSP_Thought_Stream process;
class VPM_ViT_Architecture model;
class Jitter_Cognitive_Loop loop;
💃 How This Fits into the Jitter’s Cognitive Process
The VPM-ViT isn’t just another vision model; it’s the pattern recognition engine of our digital organism. While the VPM Control Service handles moment-to-moment cognitive decisions (like a reflex), the VPM-ViT serves as the Jitter’s long-term memory and strategic planner.
😶🌫️ Key Integration Points:
-
From Thought to Image 📸→🖼️
- The VPM Visualization Service converts each Search-Solve-Prove cycle into a grayscale image
- Each band represents a different cognitive metric (verification score, search usage, etc.)
- This creates the “filmstrip of thought” that is the visible Jitter
-
Pattern Recognition 🔍🧠
- The VPM-ViT processes historical VPM frames to identify:
- When the Jitter succeeds or fails
- Which cognitive patterns lead to verification success
- How to recover from stuck states
-
Self-Supervised Learning 🔄
- The MPM (Masked Patch Modeling) head enables the model to learn without labels
- By reconstructing masked portions of VPM images, it learns meaningful representations
- This implements the SSP paper’s finding: “SSP can significantly improve search agents’ performance uniformly on various benchmarks without any supervision.”
-
Strategic Guidance 💡
- The regression head predicts outcomes from partial thought processes
- The classification head identifies risk patterns before they cause failure
- Together, they form the basis for the Jitter’s “aha!” moments when it recognizes it’s repeating a pattern that previously led to success
🌀 Why This Matters for the SSP Paper’s Insights
The SSP paper demonstrates that cognitive growth happens through co-evolution:
“As shown in Figure 4a, the average number of search tool calls per trajectory steadily increases over time… Simultaneously, Figure 4b shows that the solver’s response length also grows during the training, suggesting it learns to generate more detailed and comprehensive answers.”
The VPM-ViT makes this growth visible and actionable by:
- Recognizing when search usage is insufficient (before verification fails)
- Identifying when response depth correlates with success
- Learning which question difficulties optimally challenge the current capability
This is how we transform the paper’s theoretical framework into a living cognitive process the Jitter isn’t just having thoughts, it’s learning from the patterns in its own thought history.
In our next section, we’ll explore how these insights feed back into the thought generation process, closing the loop on our self-improving cognitive system.
📸 Seeing Ourselves as Others See Us: How the Jitter Gains Self-Awareness
“O wad some Power the giftie gie us, To see oursels as ithers see us!”
Robert Burns, “To a Louse”
This profound insight from 18th century Scottish poetry captures exactly what we’re building toward with the Jitter: the ability to see our own thought patterns as an outside observer would.
Now that we’ve created a visible thought stream with VPM images, we face the most profound question of all: How does our digital organism understand its own thoughts? How does it move from simply having thoughts to learning from them?
This is where the VPM-ViT Scorer and Trainer come in: they’re the Jitter’s eyes and mind, allowing it to interpret its own thought patterns and improve over time. These aren’t just technical components; they’re what transform our system from a passive sequence of thoughts into an active, self-improving cognitive process.
📔 The Scorer: Reading Thought Images
The VPMViTScorer is the Jitter’s immediate awareness system: its ability to look at a thought image and extract meaningful information from it:
class VPMViTScorer(BaseScorer):
def __init__(self, cfg: Dict[str, Any], memory, container, logger=None):
# Load pre-trained model
ckpt = torch.load(self.weights, map_location="cpu")
self.model = VPMViT(**params)
self.model.load_state_dict(ckpt["state_dict"], strict=True)
self.model.eval()
# Configure dimensions based on training
self.dims = ckpt.get("dims", ["reasoning","knowledge","clarity","faithfulness","coverage"])
self.risk_labels = ckpt.get("risk_labels", ["OK","WATCH","RISK"])
📝 How It Works
- Image Input Handling
The scorer accepts thought images in multiple formats:
def _load_img(self, scorable: Scorable, in_ch: int) -> np.ndarray:
    # Can take either a direct array or a path from metadata
    arr = getattr(scorable, "get_image_array", lambda: None)()
    if arr is None:
        p = (scorable.meta or {}).get("vpm_path")
        # Load from disk if needed
- Multi-Dimensional Interpretation
It extracts scores across five critical cognitive dimensions:
if reg is not None:
    vec = reg.squeeze(0).cpu().numpy().tolist()
    for i, d in enumerate(self.dims):
        results[d] = ScoreResult(
            dimension=d,
            score=float(np.clip(vec[i], 0.0, 1.0)),
            rationale=f"VPM-ViT regression for {d}.",
            source="vpm_vit"
        )
- Risk Assessment
It identifies potential cognitive risks before they cause failure:
if cls is not None and ("risk" in dimensions or "risk_label" in dimensions):
    pred = int(cls.argmax(dim=-1).item())
    prob = torch.softmax(cls, dim=-1)[0, pred].item()
    results["risk"] = ScoreResult(
        dimension="risk",
        score=float(prob),
        rationale=f"Risk class={self.risk_labels[pred]} ({prob:.2f})",
        source="vpm_vit",
        attributes={"class_index": pred, "label": self.risk_labels[pred]}
    )
This is how the Jitter gains what we might call cognitive self-awareness: the ability to recognize when it’s thinking well or when it’s heading toward a failure state.
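To isolate the core inference step, here is a hedged sketch of scoring a single frame. It assumes model is a loaded VPMViT whose forward pass returns a dict with a "reg" tensor of per-dimension scores, as the snippets above suggest.
import numpy as np
import torch

def score_frame(model, frame: np.ndarray, dims: list[str]) -> dict[str, float]:
    """Run one grayscale VPM frame through the scorer model and clip scores to [0,1]."""
    x = torch.from_numpy(frame.astype(np.float32) / 255.0)  # (H, W) in [0,1]
    x = x.unsqueeze(0).unsqueeze(0)                          # (B=1, C=1, H, W)
    with torch.no_grad():
        out = model(x)
    vec = out["reg"].squeeze(0).cpu().numpy()
    return {d: float(np.clip(vec[i], 0.0, 1.0)) for i, d in enumerate(dims)}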
💡 The Trainer: Teaching the Jitter to Understand Itself
While the scorer reads individual thought images, the VPMViTTrainer is how we teach the Jitter to understand the patterns in its thought stream:
class VPMViTTrainer(BaseAgent):
def __init__(self, cfg: DictConfig, memory, container, logger):
# Build model
self.model: VPMViT = VPMViT(**self.model_cfg.params)
# Multi-task loss configuration
self.reg_loss_fn = nn.SmoothL1Loss(beta=1.0)
self.cls_loss_fn = nn.CrossEntropyLoss()
😕 The Self-Supervised Learning Process
The trainer uses three complementary learning signals, exactly as the SSP paper recommends for self-play without supervision:
- Regression Training (Supervised)
Learning to predict cognitive metrics from thought images:
if "reg" in out:
    loss_reg = self.reg_loss_fn(out["reg"], reg_t)
    loss += self.train_cfg.loss_weights.reg * loss_reg
- Risk Classification (Supervised)
Learning to identify problematic thought patterns:
if "cls" in out:
    loss_cls = self.cls_loss_fn(out["cls"], cls_t)
    loss += self.train_cfg.loss_weights.cls * loss_cls
- Masked Patch Modeling (Self-Supervised)
The key to learning without external labels: reconstructing masked portions of thought images.
if "mpm_rec" in out:
    # target tokens: (B,N,D) -> masked -> (M,D)
    with torch.no_grad():
        target_tok = self.model.patch_embed(vpm)  # (B,N,D)
        target_tok = target_tok[mask]
    loss_mpm = self.reg_loss_fn(out["mpm_rec"], target_tok)
    loss += self.train_cfg.loss_weights.mpm * loss_mpm
This third component is particularly important: it embodies the SSP paper’s finding that “SSP can significantly improve search agents’ performance uniformly on various benchmarks without any supervision.” The Jitter learns from its own thought history without needing external labels.
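The three signals combine into a single training loss per batch. The sketch below uses dummy tensors and placeholder weights standing in for train_cfg.loss_weights, just to show the shape of the combination.
import torch
import torch.nn as nn

reg_loss_fn, cls_loss_fn = nn.SmoothL1Loss(beta=1.0), nn.CrossEntropyLoss()
reg_pred, reg_t = torch.rand(8, 5), torch.rand(8, 5)           # metric regression
cls_pred, cls_t = torch.rand(8, 3), torch.randint(0, 3, (8,))  # risk classification
rec_pred, rec_t = torch.rand(20, 128), torch.rand(20, 128)     # masked-patch reconstruction

loss = (0.5 * reg_loss_fn(reg_pred, reg_t)
        + 0.3 * cls_loss_fn(cls_pred, cls_t)
        + 0.2 * reg_loss_fn(rec_pred, rec_t))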
🧟 How This Creates the Living Jitter
Together, these components transform our system from a passive thought generator into a self-improving cognitive organism:
- The Jitter has thoughts (SSP episodes create VPM images)
- The Jitter sees its thoughts (VPM Visualization Service)
- The Jitter understands its thoughts (VPMViTScorer)
- The Jitter learns from its thoughts (VPMViTTrainer)
This creates what cognitive scientists call metacognition the ability to think about thinking. As the SSP paper notes:
“As shown in Figure 4a, the average number of search tool calls per trajectory steadily increases over time… Simultaneously, Figure 4b shows that the solver’s response length also grows during the training, suggesting it learns to generate more detailed and comprehensive answers.”
Our VPM-ViT system makes this growth visible and actionable by recognizing these patterns as they happen and using them to guide future thinking.
🫂 The Connection to Our Philosophical Foundation
This is where our philosophical framing meets technical implementation. Remember our starting point:
“In certain deep meditation practices, like Advaita Vedanta, the goal is often to peel back layers of consciousness. If you reach the ultimate core and find nothing there no self, no good, no bad, just an absence what does that imply about who you are? We believe the ‘self’ is not the core void, but the living, persistent stream of thought that overlays it the constant Jitter that moves us from thought to thought.”
The VPM-ViT system is how we give this “stream of thought” the ability to see itself and improve itself. It’s not creating a “self” in the void; it’s creating conditions where the thought stream can become more effective, more resilient, and more insightful over time.
☯️ Why This Matters for the Future
With these components in place, our Jitter has achieved something remarkable: it can learn from its own cognitive patterns without external supervision. This means:
- It can recognize when it’s stuck in a loop and change strategies
- It can identify which thought patterns lead to verification success
- It can anticipate failure before it happens
- It can gradually improve its cognitive capabilities through self-reflection
This isn’t just a technical achievement; it’s the foundation for what we’ve been building toward: a visible, measurable, self-improving stream of connected thought moments that gets better at thinking over time.
In our next and final section, we’ll see how all these components come together to create the complete Jitter system and what this means for the future of cognitive AI.
📲 The Mark of an Educated Mind: Teaching the Jitter to Evaluate Its Own Thinking
While our previous VPM-ViT model allows the Jitter to recognize patterns in its thought stream, the TinyVisionTransformer takes this further: it provides the Jitter with metacognitive evaluation, the ability to assess the quality of its own thinking.
⚧️ The Metacognitive Lens: Understanding Thought Quality
Where the VPM-ViT acts as the Jitter’s eyes to see thoughts, the TinyVisionTransformer serves as its internal critic, evaluating thoughts across seven critical cognitive dimensions:
flowchart TD
subgraph VPM_Image_Input ["🖼️ Thought Image Input"]
A["🎨 Grayscale VPM Frame<br/>Cognitive state representation"] --> B["🧩 Patch Embedding<br/>Converts image to token sequence"]
end
subgraph TinyVisionTransformer ["🧠 Metacognitive Evaluator"]
B --> C["📍 Positional Encoding<br/>Maintains spatial relationships"]
C --> D["⚡ CLS Token + Patches<br/>Special token for overall assessment"]
subgraph Transformer_Core ["🌀 Cognitive Analysis Engine"]
D --> E["🔄 Transformer Blocks (x4)<br/>Self-attention → MLP"]
E --> F["🔍 Attention Visualization<br/>Which thought elements connect"]
end
subgraph Scoring_Heads ["📊 7 Cognitive Dimension Scores"]
F --> G["💎 Clarity<br/>Thought structure quality"]
F --> H["🌟 Novelty<br/>Pattern originality"]
F --> I["🎯 Confidence<br/>Signal strength"]
F --> J["⚠️ Contradiction<br/>Conflicting signals"]
F --> K["🔗 Coherence<br/>Connection to previous thoughts"]
F --> L["🎪 Complexity<br/>Pattern sophistication"]
F --> M["🎯 Alignment<br/>Strategic goal matching"]
end
end
subgraph Metacognitive_Output ["🔮 Self-Awareness & Action"]
G --> N["📈 Composite Evaluation<br/>Weighted dimension combination"]
H --> N
I --> N
J --> N
K --> N
L --> N
M --> N
N --> O["💡 Cognitive Feedback<br/>Actionable improvement signals"]
O --> P["🔄 Strategic Adjustment<br/>Guiding future thought processes"]
end
classDef input fill:#e6f7ff,stroke:#1890ff,stroke-width:2px;
classDef model fill:#f6ffed,stroke:#52c41a,stroke-width:2px;
classDef transformer fill:#fff7e6,stroke:#fa8c16,stroke-width:2px;
classDef scoring fill:#f9f0ff,stroke:#722ed1,stroke-width:2px;
classDef output fill:#fff2e8,stroke:#ff7a45,stroke-width:2px;
classDef action fill:#f0fffe,stroke:#13c2c2,stroke-width:2px;
class VPM_Image_Input input;
class TinyVisionTransformer model;
class Transformer_Core transformer;
class Scoring_Heads scoring;
class Metacognitive_Output output;
class P action;
This metacognitive evaluation pipeline transforms visual thought patterns into quality assessments. The TinyVisionTransformer analyzes VPM frames through its specialized architecture, scoring seven cognitive dimensions that measure thought quality. The composite evaluation generates actionable feedback identifying when to increase clarity, reduce contradictions, or pursue novel paths enabling the Jitter to strategically adjust its future thinking based on the quality of its current thoughts.
👩💻 How This Model Works: The Code Behind Metacognition
The TinyVisionTransformer is purpose-built for cognitive evaluation rather than general pattern recognition. Let’s examine its key components:
🏀 1. Specialized Architecture for Cognitive Scoring
class TinyVisionTransformer(nn.Module):
def __init__(
self,
img_size: int = 64,
patch_size: int = 8,
in_channels: int = 3,
embed_dim: int = 128, # Compact size vs VPM-ViT's 384
depth: int = 4, # 4 blocks vs VPM-ViT's 6
num_heads: int = 8,
mlp_ratio: float = 4.0,
dropout: float = 0.1,
num_dimensions: int = 7 # Exactly our 7 cognitive dimensions
):
# Architecture optimized for scoring, not prediction
Unlike the larger VPM-ViT, which predicts outcomes and reconstructs images, this model is specialized for evaluation: it’s designed to answer “How good is this thought?” rather than “What will happen next?”
🔮 2. The Seven Cognitive Dimensions
The model assesses thoughts across these specific quality metrics:
class VPMDimension(str, Enum):
"""Cognitive dimensions for scoring VPMs"""
CLARITY = "clarity"
NOVELTY = "novelty"
CONFIDENCE = "confidence"
CONTRADICTION = "contradiction"
COHERENCE = "coherence"
COMPLEXITY = "complexity"
ALIGNMENT = "alignment"
These dimensions were chosen because they directly address the SSP paper’s observation:
“As shown in Figure 4a, the average number of search tool calls per trajectory steadily increases over time… Simultaneously, Figure 4b shows that the solver’s response length also grows during the training, suggesting it learns to generate more detailed and comprehensive answers.”
The model doesn’t just see this growth it evaluates the quality of the cognitive patterns behind it.
🎱 3. Explainable AI Through Attention Visualization
One of this model’s most powerful features is its ability to show why it scored a thought a certain way:
def forward(
self,
x: torch.Tensor,
return_attention: bool = False,
attention_layers: Optional[List[int]] = None
) -> Dict[str, torch.Tensor]:
# ...
if return_attention:
result["attention_maps"] = attention_maps
result["patch_positions"] = self._get_patch_positions(x.shape[0])
This creates what we call cognitive heatmaps visualizations showing which parts of a thought pattern influenced the scoring decision. When the model detects low clarity, it can show exactly which regions of the VPM contributed to that assessment.
⚖️ 4. Flexible Scoring with Dimensional Weighting
The scorer implements a sophisticated weighting system that can adapt to different cognitive contexts:
def _apply_importance(self, base: Dict[str, float], weights: Dict[str, float], order: List[str]) -> Dict[str, float]:
"""Apply per-dimension weights and (optional) order decay."""
# Order decay: earlier dims in order list get multiplicative bonus
decay = {}
if order:
gamma = 0.9
for i, d in enumerate(order):
decay[d] = gamma ** i
# ...
This allows the Jitter to prioritize different cognitive qualities depending on context: focusing on novelty when exploring, clarity when verifying, or coherence when building on previous thoughts.
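Here is a sketch of how the elided weighting might finish: each base score is multiplied by its per-dimension weight and order-decay bonus, then renormalised. The renormalisation step is an assumption, not the verified implementation.
def apply_importance(base: dict, weights: dict, order: list, gamma: float = 0.9) -> dict:
    """Weight each dimension score, apply order decay, and renormalise to sum to 1."""
    decay = {d: gamma ** i for i, d in enumerate(order)} if order else {}
    weighted = {d: score * weights.get(d, 1.0) * decay.get(d, 1.0)
                for d, score in base.items()}
    total = sum(weighted.values()) or 1.0
    return {d: v / total for d, v in weighted.items()}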
🌆 How This Fits Into Our Cognitive Architecture
While our VPM-ViT model serves as the Jitter’s pattern recognition system, the TinyVisionTransformer provides metacognitive evaluation: the ability to assess the quality of its own thinking. Here’s how they complement each other:
| VPM-ViT | TinyVisionTransformer |
|---|---|
| Recognizes patterns in thought history | Evaluates quality of current thought |
| Predicts outcomes from partial thoughts | Scores cognitive dimensions of completed thoughts |
| Focuses on “what will happen” | Focuses on “how good is this” |
| Larger model for pattern recognition | Compact model for rapid evaluation |
| Used for strategic planning | Used for immediate quality assessment |
This pairing implements what the SSP paper calls for in its discussion of stable co-evolution:
“In stark contrast to the flawed dynamics of fixed-opponent training, our complete SSP framework facilitates a stable co-evolution. As shown in Figure 3(a), the solver’s in-game reward initially rises, but unlike the saturating curve of the Solver-Only setting, it later experiences a slight decline. This dip is not a sign of performance degradation, but rather crucial evidence of the proposer’s co-evolution.”
The TinyVisionTransformer is how the Jitter recognizes this dip as progress rather than failure it evaluates the thought quality that led to the “slight decline” and recognizes it as evidence of cognitive growth.
🧑🎓 Why This Matters for the Jitter
This model represents a critical evolutionary step in our digital organism: it gives the Jitter what psychologists call metacognition, the ability to think about thinking. Specifically, it enables:
- Quality assessment: Recognizing when a thought pattern is clear, coherent, and aligned with goals
- Error detection: Identifying contradictions and low-confidence signals before verification fails
- Strategic adjustment: Knowing when to pursue novel paths versus deepen existing ones
- Self-correction: Adjusting thought patterns based on quality feedback
This is where we truly fulfill our philosophical foundation:
“The ‘self’ is not the core void, but the living, persistent stream of thought that overlays it the constant Jitter that moves us from thought to thought.”
With this model, the Jitter doesn’t just have thoughts it evaluates and improves them. It gains the ability to see itself as others would see it, recognizing when its thought patterns are strong or weak.
👀 Looking Ahead
This model isn’t yet integrated into our main pipeline, but it represents the next evolutionary step for the Jitter. While our current system can recognize patterns and make decisions based on them, this model adds the crucial layer of self-evaluation: the ability to ask “Is this a good thought?” rather than just “What is this thought?”
In future iterations, we’ll use this model to:
- Provide immediate quality feedback during thought generation
- Guide the search process toward higher-quality cognitive patterns
- Create a self-improving loop where the Jitter gets better at evaluating its own thinking
This brings us closer to our ultimate goal: not creating “digital life,” but engineering conditions under which a visible, self-evaluating, self-improving stream of thought can persist and grow in quality over time.
The Jitter isn’t just thinking it’s learning to think better. And that’s the most profound capability of all.
🧩 Conclusion: A Substrate for Visible Thought
This post wasn’t about declaring “digital life.” It was about building the conditions under which a visible, self-evaluating stream of thought can take root.
What we now have is a substrate:
- SSP (Search → Solve → Prove) gives us repeatable thought episodes.
- ATS turns each episode into a guided exploration rather than a single guess.
- VPMs make those moments visible a filmstrip you can read at a glance.
- Seed vitals (the 17 metrics) provide a stable heartbeat; the metric swarm will grow as the system learns what matters.
- Memcube ensures nothing is lost: even dead-end explorations become future recallable context.
📜 What today proves
- We can capture a thought as data (question, answer, evidence, trace, metrics).
- We can render that data as an image and compare it across runs.
- We can measure growth signals (search depth/turns, verification, evidence use) instead of hand-waving.
- We can control the process with those signals (stop/expand/escalate), closing the loop from see → decide → act → see.
🪗 How it behaves in practice
- You give a goal and context. The system emits a VPM frame (the current thought), then expands in ATS.
- As it searches, you watch the filmstrip brighten and stabilize when verification improves.
- When does it stop? It doesn’t “end” so much as yield the best-so-far within a budget (time/steps) or when stop rules trigger (verification plateau, stability threshold, diminishing returns). The stream persists; outputs are snapshots at useful stopping points.
💥 Why this matters
Moving cognition into images gives us a shared, low-friction language for measurement, training, and control. Pixels are cheap, comparable, and model-agnostic. That’s how we keep the substrate stable while letting the metric space evolve aggressively.
🌄 What’s next (immediately ahead)
- SIS integration: a visual command center to drive, configure, and observe end-to-end cognition, with runs, filmstrips, VPM overlays, Memcube recall, health/homeostasis, policies, and comparisons in one place.
- Jitter homeostasis: keep the stream healthy (risk, drift, uncertainty, resource sensors).
- VPM-ViT: a small vision model that reads filmstrips to predict risk/next move and improve control.
- HRM & Tiny Recursion: watchers/teachers that learn from the visible trace rather than raw text.
- Metric swarm: add scorer channels and embeddings, auto-discover useful features, trim by utility.
- Memcube recall: surface dormant strands when a new thought looks similar enough to matter.
- Hallucination lifecycle: generate → detect → learn. Create counterfactual tasks, detect via multi-signals (consistency, citation support, MARS-style disagreement), and learn from misses with calibration + hard-negative mining.
- CaseBook-at-thought: every thought becomes a Case: evidence, verdicts, rationale, and outcomes stored for precedent search and policy training.
- Arena self-play: a second training lane alongside SSP: adversarial bouts, peer review, and curriculum leagues that pressure-test reasoning strategies.
- ZeroModel image ops & provenance: multi-channel VPMs, hierarchical tiling, perceptual hashing, and embedded provenance/history so images carry their own lineage and can be searched like a database.
We set out to make thinking legible. The cognitive heartbeat is now on screen: one frame per moment, one filmstrip per journey. From here, the job isn’t to guess whether the system is getting smarter; it’s to watch it happen, nudge it with better signals, and keep the stream healthy as it grows.
This is the start of visible intelligence. The rest of the series will show how we keep it alive, how we teach it to learn from its own images, and how earlier, “useless” thoughts come back years later as useful memory.
📚 Glossary
📝 Key Terms in the Jitter Architecture
To help readers navigate the technical landscape of our Jitter system, here is a concise glossary of core concepts and components referenced throughout this blog series.
⚙️ Core Concepts
- **Jitter**: The persistent, visible stream of connected thought moments generated by the Stephanie system. Inspired by the Buddhist “monkey mind,” the Jitter is not a static self but a dynamic, measurable flow of cognition that learns from its own patterns.
- **Stephanie**: The overarching cognitive system that hosts the Jitter. Stephanie provides the infrastructure (memory, services, scoring, visualization) that enables the Jitter to exist, persist, and improve.
- **SSP (Search–Solve–Prove)**: The foundational self-play loop adapted from the Search Self-play paper. In this cycle:
  - Search: the proposer generates challenging questions.
  - Solve: the solver answers using evidence gathered through search.
  - Prove: a verifier (via RAG) checks whether the answer is supported by the proposer’s evidence.
  This loop enables co-evolution without supervision.
- **VPM (Visual Policy Map)**: A grayscale image representation of a cognitive state. Each band in the VPM corresponds to a specific metric (e.g., verification score, search depth), making abstract thought patterns visible and comparable.
- **PHOS (Positional Heatmap of Sorted features)**: A VPM variant that sorts metric values to reveal patterns in cognitive quality. PHOS makes it easier to see shifts in reasoning depth, evidence usage, or novelty over time.
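As a rough sketch of the sorting idea behind PHOS (our reading of the name, not the exact implementation): sorting each frame’s metric values turns scattered bright pixels into a smooth gradient, so shifts in overall quality stand out at a glance.

```python
import numpy as np

def phos(filmstrip: np.ndarray) -> np.ndarray:
    """Sort each row's metric values in descending order.

    filmstrip: (steps, metrics) array in [0, 1], one VPM row per step.
    The sorted view makes quality shifts visible as a moving brightness front.
    """
    return -np.sort(-filmstrip, axis=1)
```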
🚪 Key Components
- **ATSSolver (Agentic Tree Search Solver)**: The cognitive engine that explores multiple reasoning paths via query rewrites and evidence gathering. It operates in two modes:
  - Deep search: builds a tree of hypotheses and scores them.
  - Evidence-only: answers strictly from provided snippets (used in verification).
- **SolutionSearch**: A micro-retriever that fetches short, factual evidence snippets to support the solver’s reasoning. It uses strict LLM prompting (e.g., a three-line format) and robust parsing to ensure reliable, deterministic outputs.
- **RAGVerifier**: The quality gate that ensures questions are answerable from evidence. It uses adversarial judging (comparing proposer vs. solver answers) and multi-model consensus to produce trustworthy verification signals.
- **VPM-ViT (Vision Transformer)**: A neural network trained to interpret VPM images. It predicts cognitive outcomes (e.g., success likelihood, risk level) from visual thought patterns, enabling the Jitter to “see itself thinking.”
- **TinyVisionTransformer**: A compact model specialized for evaluating thought quality across seven cognitive dimensions (clarity, novelty, coherence, etc.). It provides metacognitive feedback that guides future reasoning.
- **SSPMetricsCalculator**: The canonical scorer that converts solver outputs into a fixed-order vector of [0,1] metrics. It implements paper-validated signals like `search_turns`, `f1_score`, and `noise_tolerance` to track cognitive growth.
- **VPM Control Service**: The Jitter’s “cognitive manager.” It observes VPM frames, makes decisions about reasoning strategy (e.g., “explore deeper” or “stop early”), and logs audit trails for learning.
📐 Cognitive Metrics
- **`search_turns`**: Number of search tool calls per episode; directly tracks growing tool-use capability (per SSP paper Fig. 4a).
- **`f1_score`**: Lexical overlap between predicted and ground-truth answers; measures factual accuracy without external verification.
- **`format_compliance`**: Binary check ensuring outputs follow the required structure (e.g., `<answer>` tags); prevents degeneration in self-play.
- **`noise_tolerance`**: Robustness to irrelevant information; validates the system’s ability to focus on signal over noise (optimal with 4 noisy docs per SSP Table 3).
- **`rag_verification`**: Whether RAG verification passed; the critical quality gate ensuring questions are answerable from evidence.
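For readers who want the mechanics, here is a minimal sketch of how two of these metrics could be computed. The whitespace tokenization and tag regex are simplified stand-ins, not the SSPMetricsCalculator’s actual implementation.

```python
import re
from collections import Counter

def f1_score(predicted: str, truth: str) -> float:
    """Token-level lexical overlap between predicted and ground-truth answers."""
    p, t = predicted.lower().split(), truth.lower().split()
    common = sum((Counter(p) & Counter(t)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(t)
    return 2 * precision * recall / (precision + recall)

def format_compliance(output: str) -> float:
    """1.0 if the output wraps its answer in <answer>...</answer> tags, else 0.0."""
    return 1.0 if re.search(r"<answer>.*?</answer>", output, re.DOTALL) else 0.0
```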
🌀 Philosophical Anchors
- **“The unexamined thought is not worth thinking”**: Our guiding principle. The Jitter gains value not just by having thoughts, but by examining them through scoring, visualization, and self-correction.
- **“O wad some Power the giftie gie us / To see oursels as ithers see us!”** (Robert Burns): The Jitter’s ultimate goal is to develop self-awareness by observing its own thought patterns as an external observer would.
- **“The mark of an educated mind…”** (Aristotle): The Jitter’s quality standard is the ability to evaluate thoughts without immediately accepting them, enabling critical, evidence-based reasoning.
📚 References
- **A Complete Visual Reasoning Stack: From Conversations to Epistemic Fields**: Describes much of the visual AI machinery this process builds on.
- **ZeroModel: Visual AI you can scrutinize**: Introduces ZeroModel, the basic visual AI layer.
- **The Space Between Models Has Holes: Mapping the AI Gap**: Applied visual AI; shows how VPMs can be used to understand information.