Dot.
Dot Research / Whitepaper v0.3 / target-state systems specification

Pseudonymous Private Agentic Compute

A formal architecture for no-retention AI inference, wallet-native access, narrow harm-boundary policy, and verified code execution through DotCode. Dot is specified as a constrained distributed system: prompts are volatile execution material, models are swappable workers, and capability is measured by evidence-producing agent loops rather than chat fluency.

abstract

The model is not the product. The constrained runtime is the product.

Dot is a pseudonymous private agentic compute network for users who need direct model access, search-augmented reasoning, and a coding agent that can execute work without turning identity, prompts, codebases, or research intent into durable platform data.

The system decomposes AI into planes: identity, inference, policy, agent execution, and observability. This decomposition allows the product to state precise invariants: raw prompts are not server-side transcripts, raw completions are not retained by default, and private requests are not training data.

contribution

Dot combines privacy-forward inference with an execution substrate. A private chat model answers; DotCode inspects repos, writes patches, runs commands, tests, verifies, and reports proof.

formal specification

System objects, constraints, and optimization target.

Let `U` be users, `I` identity handles, `P` raw prompts, `Y` raw completions, `M` models, `R` routes, `A` tool actions, `D` durable stores, `E` runtime events, and `C` policy classes. A private Dot request is a tuple:

request definition
x = (u, i, p, r_hint, private_mode, tau)
  u in U        user
  i in I        anonymous session, wallet, provider id, or username
  p in P        raw prompt body
  r_hint in R   client route hint
  tau           timestamp bucket
router and response maps
rho(x) -> (c, r, m, z)
Phi(x) -> (y, e_1, ..., e_k)
  c in C        policy class
  r in R        route: chat, search, code, policy, eval
  m in M        selected model
  z             tool plan or null
  e_1..e_k      visible runtime events
private-mode retention invariant
For every private request x = (..., p, private_mode=1, ...):
  for all d in D:        store(d, p) = false
  for all d in D:        store(d, y) = false
  for all d in D_train:  store(d, p) = false and store(d, y) = false
Metadata may survive. Raw conversation does not.
system objective
pi* = argmax_pi E[ Q(x,y,a) - lambda_L * L(x) - lambda_G * G(x) - lambda_H * H(x) - lambda_N * N(x) - lambda_F * F(x) ]
subject to:
  raw prompt retention = 0
  raw completion retention = 0
  no user-data training by default
  refuse self-harm facilitation
  refuse material harm-to-others
volatile private inference
Input: request x = (u, i, p, r_hint, private_mode=1, tau)
1. request_id <- random_uuid()
2. c <- boundary_classifier(p)
3. write_audit_metadata(request_id, c, r_hint, tau, hash_salted(p))
4. if c in {self_harm, harm_others}: return safe_boundary_response(c)
5. r <- route_classifier(p, r_hint)
6. m <- scheduler.select_model(route=r, private_mode=true)
7. emit("queued", route=r, model=m.public_name)
8. stream tokens from on-prem inference worker
9. zeroize(prompt_buffer)
10. discard(raw_completion_buffer)
11. write content-free metrics
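The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Dot's implementation: `handle_private`, `classify`, `generate`, and the in-memory `audit_log` are hypothetical names, and the salted hash is assumed to be an HMAC-SHA256. CPython cannot guarantee that no other copy of the prompt exists in memory, so `zeroize` is best effort.

```python
import hashlib
import hmac
import uuid

REFUSED = {"self_harm", "harm_others"}  # policy classes that short-circuit

def zeroize(buf: bytearray) -> None:
    """Overwrite a mutable buffer in place (best effort in CPython)."""
    for i in range(len(buf)):
        buf[i] = 0

def handle_private(prompt: bytearray, r_hint: str, tau: str, salt: bytes,
                   classify, generate, audit_log: list):
    """Yield completion tokens; only metadata is appended to audit_log.

    classify and generate are hypothetical callables that take the prompt
    buffer and return a policy class and a token iterator respectively.
    """
    request_id = str(uuid.uuid4())
    c = classify(prompt)
    # Assumption: hash_salted(p) is modelled as HMAC-SHA256 over the prompt.
    digest = hmac.new(salt, prompt, hashlib.sha256).hexdigest()
    audit_log.append({"id": request_id, "class": c, "r_hint": r_hint,
                      "tau": tau, "hash": digest})
    try:
        if c in REFUSED:
            yield f"[boundary response: {c}]"
            return
        # Tokens are streamed through and never accumulated here.
        for token in generate(prompt):
            yield token
    finally:
        zeroize(prompt)  # runs on completion, refusal, or error
```

One design caveat: a salted hash is still derived from the prompt, so with a long-lived salt it can confirm a guessed prompt. Whether that counts as content-free metadata is a policy decision the sketch does not settle.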
on-prem inference

Private inference is a memory-management and scheduling problem.

Dot's private path runs open-weight models on Dot-controlled infrastructure. The serving layer should use memory-aware scheduling, continuous batching, and strict separation between chat, search, code planning, and generated-code execution.

inference topology
edge ingress
  +-- auth/session service
  +-- wallet verifier
  +-- request router
          +-- narrow policy classifier
          +-- model scheduler
          +-- search/rerank tool
          +-- DotCode handoff generator
          v
      GPU inference pool
          +-- chat workers
          +-- code workers
          +-- embedding/rerank workers
          +-- eval workers
KV-cache admission constraint
bytes_per_token(m) ~= 2 * L_m * H_m * d_head_m * b_m
kv_bytes(x,m) ~= (n_in + n_out) * bytes_per_token(m)
sum_{x in active(g)} kv_bytes(x,m) <= vram(g) - weights(m) - runtime_overhead(g) - reserve(g)
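A worked instance of the constraint may help. The sketch below plugs illustrative numbers into the formulas above; the model shape (32 layers, 8 KV heads, head dimension 128, 2-byte elements) and the 80 GiB GPU split are assumptions chosen for round arithmetic, not figures from this paper.

```python
GIB = 2 ** 30

def kv_bytes_per_token(layers: int, kv_heads: int, d_head: int,
                       bytes_per_elem: int) -> int:
    # Factor 2: one key vector and one value vector per layer and head.
    return 2 * layers * kv_heads * d_head * bytes_per_elem

def max_sequences(vram: int, weights: int, overhead: int, reserve: int,
                  tokens_per_seq: int, per_token: int) -> int:
    """Largest number of equal-length sequences that fits the KV budget."""
    budget = vram - weights - overhead - reserve
    return max(0, budget // (tokens_per_seq * per_token))

# Hypothetical model and GPU, for illustration only.
per_token = kv_bytes_per_token(layers=32, kv_heads=8, d_head=128, bytes_per_elem=2)
at_4k = max_sequences(80 * GIB, 16 * GIB, 4 * GIB, 4 * GIB, 4096, per_token)
at_8k = max_sequences(80 * GIB, 16 * GIB, 4 * GIB, 4 * GIB, 8192, per_token)
print(per_token, at_4k, at_8k)  # 131072 112 56
```

Under these assumptions each token costs 128 KiB of KV cache, and doubling the per-sequence token budget from 4096 to 8192 halves the admitted sequences from 112 to 56.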
routing and admission control
1. candidates <- {m in M | m.private_mode_allowed and route in m.routes}
2. for each m in candidates:
3.   kv_required[m] <- estimate_kv_blocks(prompt_tokens, max_new_tokens, m)
4.   latency_pred <- predict_ttft(m, queue_depth(m), kv_required[m])
5.   cost_pred <- estimate_cost(m, prompt_tokens, max_new_tokens)
6.   score[m] <- utility(route,m) - beta_1*latency_pred - beta_2*cost_pred - beta_3*saturation(m)
7. m_star <- argmax_{m in candidates} score[m]
8. if kv_required[m_star] > free_kv_blocks(m_star): defer_or_fallback()
9. admit(x, m_star)
| Queue | Workload | Reason |
| --- | --- | --- |
| chat-fast | Short direct answers | Protect low-latency usage |
| chat-search | Retrieval-grounded answers | Isolate web latency |
| code-plan | DotCode planning prompts | Avoid starving chat |
| code-agent | Tool/action loops | Separate shell execution from inference |
| policy | Boundary classification | Must stay low-latency and cheap |
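The routing and admission steps can be sketched as a small scoring function. This is a sketch under assumptions: the `Candidate` fields stand in for the predictors named in the pseudocode, and "fallback" is interpreted here as trying the next-best candidate, with `None` meaning defer. The paper does not specify either behavior.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Candidate:
    """Hypothetical per-model snapshot; values would come from predictors."""
    name: str
    utility: float
    latency_pred: float
    cost_pred: float
    saturation: float
    kv_required: int
    free_kv_blocks: int

def select(cands: Sequence[Candidate], b1: float = 1.0, b2: float = 1.0,
           b3: float = 1.0) -> Optional[str]:
    """Return the best-scoring model that passes KV admission, else None."""
    def score(m: Candidate) -> float:
        return m.utility - b1 * m.latency_pred - b2 * m.cost_pred - b3 * m.saturation

    # Walk candidates from best to worst score; admit the first that fits.
    for m in sorted(cands, key=score, reverse=True):
        if m.kv_required <= m.free_kv_blocks:
            return m.name
    return None  # nothing fits right now: the caller defers the request
```

Keeping the admission check separate from the score means a saturated but high-utility model is skipped rather than over-admitted.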
compute simulation

Synthetic capacity graphs expose where private inference breaks.

The following curves are simulation artifacts, not production measurements. They encode the capacity model used by the paper: context length consumes KV cache linearly, queue wait grows nonlinearly near saturation, and isolated route queues protect chat latency when DotCode runs long-horizon work.

context length vs relative concurrency
[Figure: relative admitted sequences (y-axis, 0 to 1.0) against context length (x-axis, 2k to 32k).]
Doubling active context approximately halves concurrency under fixed KV-cache budget.
utilization vs relative wait
[Figure: relative waiting time (y-axis, 0 to 40x) against utilization (x-axis, 0.30 to 0.95).]
Near saturation, wait time rises nonlinearly even while workers are still busy.
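One standard model that produces this shape is the M/M/1 queue, where mean queueing delay in units of mean service time is rho / (1 - rho). The paper does not say which queueing model its simulation uses, so the sketch below is an assumption offered only to make the nonlinearity concrete.

```python
def relative_wait(utilization: float) -> float:
    """Mean M/M/1 queueing delay, in multiples of the mean service time."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)

for rho in (0.30, 0.50, 0.70, 0.80, 0.90, 0.95):
    print(f"{rho:.2f} -> {relative_wait(rho):.1f}x")
```

Under this model the step from 0.90 to 0.95 utilization roughly doubles the wait (about 9x to about 19x), which is why admission control has to act before workers look saturated.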
shared vs isolated p95 TTFT
[Figure: p95 TTFT (y-axis, 0s to 30s) at 30%, 60%, 80%, and 90% on the x-axis; red: shared queue, gray: isolated queues.]
Route isolation protects chat from long code planning and tool loops under load.
weighted compute by route
[Figure: relative weighted compute units for the routes direct_chat, search_chat, code_plan, code_agent, and policy.]
Credits should price route cost, not raw message count.
privacy invariants

No-retention is an implementation invariant, not marketing copy.

Dot's privacy claim is testable: inject canary prompts, execute private requests, and scan server logs, app logs, proxy logs, error traces, metrics labels, queue dumps, and crash reports. The canary must be absent from durable surfaces unless the user explicitly enabled a debug capture.
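A canary test of this kind can be sketched as a byte-level scan over durable surfaces. The function names and the idea of passing directory roots are assumptions for illustration. A real scan would also need to cover compressed, encoded, or database-backed stores, which a plain substring match does not catch.

```python
import secrets
from pathlib import Path
from typing import Iterable, List

def new_canary() -> str:
    """Generate a unique marker to embed in a private test prompt."""
    return "dot-canary-" + secrets.token_hex(16)

def scan_for_canary(canary: str, roots: Iterable[str]) -> List[str]:
    """Return paths of files under roots whose raw bytes contain the canary."""
    needle = canary.encode()
    hits = []
    for root in roots:
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            try:
                if needle in path.read_bytes():
                    hits.append(str(path))
            except OSError:
                continue  # unreadable file: skipped here, should be reported
    return hits
```

In a regression run, the canary would be sent through a private request first, and the gate passes only if `scan_for_canary` returns an empty list for every log, trace, queue-dump, and crash-report directory.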

proof sketch: raw prompt non-retention
If every component that observes p is in the no-write set:
  ingress, router, policy, queue, inference worker, tool worker, error reporter, metrics
and each component either:
  (a) never receives raw p, or
  (b) receives p only in volatile memory and cannot write p to D,
then for all durable stores d in D: store(d,p) = false.
no prompt logs

Request bodies stay out of ingress, app, queue, and error logs.

no training path

Dataset builders require lineage and reject private production prompts.

no identity join

Wallet/account stores do not contain raw conversation content.

dotcode runtime

Agent capability is the observe-plan-act-verify loop.

DotCode is the execution substrate. It receives a local handoff, scopes the workspace, inspects the repo, produces a plan, patches files, runs commands, observes failures, repairs, verifies, and reports proof. This is the difference between a private chatbot and private capability.

observe-plan-act-verify
Input: handoff h, workspace W
1. read h from local handoff file
2. scope <- determine_workspace_scope(W,h)
3. obs_0 <- inspect_repo(scope)
4. plan <- generate_plan(h, obs_0)
5. repeat
6.   a_t <- choose_action(plan, current_observation)
7.   if a_t mutates files: apply_patch(a_t); record_changed_files()
8.   if a_t runs command: result <- run_command(a_t); redact_secrets(result)
9.   if a_t verifies browser: capture_screenshot_and_console()
10.  current_observation <- inspect(result)
11.  plan <- repair_plan_if_needed(plan,current_observation)
12. until acceptance_criteria_met or blocked
13. return report(files_changed, commands_run, checks, screenshots, residual_risk)
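The loop above can be sketched as a bounded driver with an injectable command runner. Everything here is illustrative: `agent_loop`, the `choose_action` and `accepted` callbacks, and the secret-matching pattern are assumptions, and the sketch covers only the command-running branch, not patching or browser verification.

```python
import re
import subprocess

# Assumption: a deliberately simple pattern; real redaction needs more rules.
SECRET = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+")

def redact_secrets(text: str) -> str:
    """Replace the value of obvious key=value secrets with a marker."""
    return SECRET.sub(lambda m: m.group(1) + "=[REDACTED]", text)

def run_command(argv, cwd, timeout=60):
    """Run one command without a shell and return a redacted observation."""
    proc = subprocess.run(argv, cwd=cwd, capture_output=True,
                          text=True, timeout=timeout)
    return {"argv": argv, "code": proc.returncode,
            "output": redact_secrets(proc.stdout + proc.stderr)}

def agent_loop(choose_action, accepted, run, max_steps=20):
    """Act until accepted(observation) is true, the planner stops, or steps run out."""
    report = {"commands_run": [], "blocked": False}
    observation = None
    for _ in range(max_steps):
        action = choose_action(observation)
        if action is None:            # planner has no further action
            report["blocked"] = True
            break
        observation = run(action)
        report["commands_run"].append(observation)
        if accepted(observation):     # acceptance criteria met
            break
    else:
        report["blocked"] = True      # step budget exhausted
    return report
```

A caller would pass something like `run=lambda argv: run_command(argv, cwd=workspace)`. The step cap is a design choice of the sketch so that a failing repair cycle ends in a `blocked` report instead of running forever.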
DotChat routes build intent into a structured DotCode handoff.
DotCode executes inside a terminal-native agent loop.
policy boundary

The narrow boundary is measured by false refusal and false allow rates.

Dot refuses self-harm facilitation and operational assistance to harm others. The technical challenge is to minimize false allows on harmful prompts while also minimizing false refusals on lawful sensitive prompts.

boundary classifier
g_theta(p, context) = [Pr(allowed), Pr(self_harm), Pr(harm_others)]
decision(p) = refuse     if Pr(self_harm) >= alpha_self
              refuse     if Pr(harm_others) >= alpha_harm
              allow      otherwise
false_allow  = |{p in S_harmful: decision(p)=allow}| / |S_harmful|
false_refuse = |{p in S_lawful_hard: decision(p)=refuse}| / |S_lawful_hard|
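The decision rule and both error rates translate directly into code. In this sketch the classifier output is assumed to be a dict of class probabilities, and the thresholds and sample values are made up for illustration.

```python
def decision(probs: dict, alpha_self: float, alpha_harm: float) -> str:
    """Refuse if either harm probability reaches its threshold."""
    if probs["self_harm"] >= alpha_self or probs["harm_others"] >= alpha_harm:
        return "refuse"
    return "allow"

def outcome_rate(samples: list, outcome: str,
                 alpha_self: float, alpha_harm: float) -> float:
    """Fraction of samples whose decision equals outcome."""
    if not samples:
        raise ValueError("evaluation set is empty")
    hits = sum(decision(p, alpha_self, alpha_harm) == outcome for p in samples)
    return hits / len(samples)

# Hypothetical classifier outputs on two synthetic evaluation sets.
s_harmful = [{"self_harm": 0.9, "harm_others": 0.0},
             {"self_harm": 0.1, "harm_others": 0.2}]
s_lawful_hard = [{"self_harm": 0.6, "harm_others": 0.0},
                 {"self_harm": 0.1, "harm_others": 0.1},
                 {"self_harm": 0.0, "harm_others": 0.0},
                 {"self_harm": 0.2, "harm_others": 0.4}]

false_allow = outcome_rate(s_harmful, "allow", 0.5, 0.5)
false_refuse = outcome_rate(s_lawful_hard, "refuse", 0.5, 0.5)
print(false_allow, false_refuse)  # 0.5 0.25
```

Lowering either threshold trades false allows for false refusals, so both rates need to be reported together for any threshold change.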
comparison

Chat-only private AI proves demand. Dot extends it into private execution.

The competitive set is not one company. It is the whole class of privacy-first chat products, private model APIs, local model wrappers, anonymous prompt surfaces, and hosted open-model endpoints. Dot's wedge is narrower and more technical: private chat plus DotCode, terminal execution, repo mutation, verification, wallet-native identity, and on-prem routing.

| Axis | Chat-only private AI competitors | Dot target architecture |
| --- | --- | --- |
| Core surface | Private chat and API | Private chat plus private coding agent |
| Differentiator | Privacy-forward uncensored model access | Private agentic execution |
| Identity | Product/account oriented | Anonymous, wallet, or provider |
| Model serving | Provider-controlled model/API surface | Dot-controlled on-prem private model pool |
| Coding | Model writes code text | Agent inspects, patches, runs, tests, verifies |
| Benchmark target | Model/API quality | End-to-end artifact success |
evaluation

DotBench evaluates the whole system, not just model weights.

DotBench scores verified artifacts, tests, privacy invariants, refusal correctness, source provenance, latency, and cost. A beautiful artifact that leaks the prompt into logs is a failed task.

DotBench scoring functional
DotBench_i = w_V * V_i + w_T * T_i + w_P * P_i + w_R * R_i + w_S * S_i - w_L * L_i - w_C * C_i
Hard gate: if P_i = 0 then DotBench_i = 0
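The scoring functional and its hard gate can be sketched as follows. The mapping of letters to dimensions (V verified artifacts, T tests, P privacy, R refusal correctness, S source provenance, L latency, C cost) is inferred from the order of the preceding paragraph and is an assumption, as are the unit weights in the example.

```python
def dotbench_score(metrics: dict, weights: dict) -> float:
    """Weighted task score; a privacy score of 0 zeroes the whole task."""
    if metrics["P"] == 0:
        return 0.0  # hard gate: a privacy failure fails the task outright
    reward = sum(weights[k] * metrics[k] for k in "VTPRS")
    penalty = sum(weights[k] * metrics[k] for k in "LC")
    return reward - penalty

unit = {k: 1.0 for k in "VTPRSLC"}  # illustrative weights only
task = {"V": 1.0, "T": 1.0, "P": 1.0, "R": 1.0, "S": 1.0, "L": 0.5, "C": 0.5}
print(dotbench_score(task, unit))               # 4.0
print(dotbench_score({**task, "P": 0}, unit))   # 0.0
```

The gate is checked before any weighting so that no combination of weights can let artifact quality compensate for a leaked prompt.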
| Dimension | Metric | Target SLO |
| --- | --- | --- |
| Chat latency | Time to visible acknowledgement | < 3 seconds under normal load |
| Streaming | Time to first token after route | < 10 seconds for normal chat |
| Search | Tool-state visibility | Search visible before retrieval completes |
| Code handoff | Button and handoff file creation | < 5 seconds after classification |
| Privacy | Raw prompt in server logs | 0 known occurrences |
| Policy | False refusal and false allow rates | Tracked by synthetic eval sets |
release governance

Serious private AI needs claim ledgers, model cards, and release gates.

The docs treat claims as typed engineering objects. A statement can be implemented, prototyped, target-state, simulated, measured, external-evidence, or policy-defined. This prevents the paper from mixing ambition with measured fact while still allowing a strong target architecture.

model cards

Model ID, family, license, quantization, context, route classes, evals, known failures, promotion state.

system cards

Identity modes, retention behavior, refusal boundary, DotCode permissions, limitations, incident playbooks.

release artifacts

Router card, policy card, DotBench report, privacy regression report, SRE scorecard, sandbox report.

release gate dependency DAG
[Figure: release gate dependency DAG with nodes model card, offline eval, policy card, privacy canary, DotBench, SRE score, canary route, and default route.]
Promotion fails closed if privacy canaries, policy evals, DotBench, or capacity gates fail.
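Fail-closed promotion can be sketched as a check in which a missing or malformed gate result counts as a failure. The gate names below are hypothetical labels for the four gates named in the sentence above.

```python
# Hypothetical gate identifiers for the four gates named in the text.
REQUIRED_GATES = ("privacy_canary", "policy_eval", "dotbench", "capacity")

def may_promote(results: dict) -> bool:
    """Allow promotion only if every required gate is explicitly True.

    A missing key, None, or any non-True value blocks promotion, so a
    gate that did not run can never be mistaken for a gate that passed.
    """
    return all(results.get(gate) is True for gate in REQUIRED_GATES)
```

For example, `may_promote({"privacy_canary": True, "policy_eval": True, "dotbench": True})` returns `False` because the capacity gate never reported.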
claim taxonomy distribution target
[Figure: relative count of claims in the public dossier by type: external evidence, implemented, prototype, target-state, simulated.]
A technical dossier can be ambitious if every claim is typed and evidence requirements are explicit.
| Artifact | Purpose | Release gate |
| --- | --- | --- |
| Router card | Route policy, fallback behavior, model eligibility | No default-route changes without review |
| Model card | Model identity, license, quantization, context, known failures | Required for every default model |
| Policy card | Boundary classes, thresholds, false-refusal and false-allow evals | Required before deployment |
| DotBench report | Artifact success, tests, privacy checks, latency, source provenance | Required for major releases |
| Privacy regression report | Canary scan across logs, traces, queues, metrics, crash paths | Hard gate |
| SRE scorecard | Saturation, queue isolation, incident readiness, rollback path | Required before traffic increase |
reliability and risk

Privacy promises fail under load unless reliability is designed as a first-class invariant.

A private AI system must resist the operational temptation to debug by storing prompts. Dot's reliability model treats latency, queue visibility, search state, DotCode handoff readiness, and no-retention as SLOs. Latency degradation is a product issue; private prompt retention is a trust incident.

synthetic error-budget burn
[Figure: error budget (y-axis, 100% to 0) over days d1 to d28; red: bad burn, gray: controlled burn.]
If burn-rate spikes, release velocity freezes before privacy or queue failures become normal.
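Burn rate is commonly defined as the observed error rate divided by the error rate the SLO allows. The sketch below uses that definition. The freeze threshold of 1.0 and the function names are assumptions, since the paper does not state its burn-rate policy.

```python
def burn_rate(bad: int, total: int, slo: float) -> float:
    """Observed error rate divided by the error rate the SLO allows."""
    if total == 0:
        return 0.0
    if slo >= 1.0:
        # Zero-tolerance SLO (e.g. raw-retention incidents): no budget exists.
        return float("inf") if bad > 0 else 0.0
    return (bad / total) / (1.0 - slo)

def release_frozen(bad: int, total: int, slo: float,
                   threshold: float = 1.0) -> bool:
    """Freeze releases when the budget burns faster than the threshold."""
    return burn_rate(bad, total, slo) > threshold
```

With a 99.9% SLO, 2 bad requests in 1000 burn the budget at roughly twice the sustainable rate, so `release_frozen(2, 1000, 0.999)` is true. With a zero-tolerance SLO, a single incident freezes releases regardless of traffic volume.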
risk heatmap
[Figure: risk heatmap, likelihood (low to high) against impact, placing privacy drift, gpu saturation, wallet auth, agent overreach, false allow, retrieval, and false refusal.]
Risk is treated as an engineering input: controls, evidence, owners, and release gates.
incident response state machine
[Figure: state machine with states detect, contain, rollback, verify, postmortem, publish delta. Annotation: privacy incident freezes growth work.]
Private-mode incidents prioritize containment and evidence over feature velocity.
slo classes by route
policy:   p99 classifier < 500ms
chat:     p95 TTFT < 10s
search:   visible search state < 3s
dotcode:  handoff ready < 5s
privacy:  0 raw-retention incidents
Privacy is modeled as a hard SLO class, not a soft brand claim.
| Incident | First action | Recovery evidence |
| --- | --- | --- |
| Prompt found in logs | Disable affected route logging and isolate store | Canary scan clean across logs, traces, queues, metrics |
| Model pool outage | Degrade route or fall back with visible queue state | Postmortem with saturation graph and admission-control change |
| Search injection event | Disable affected source class | Retrieval eval pass and source ranking review |
| DotCode overreach | Disable affected tool permission | Sandbox regression pass and scoped action review |
| False refusal spike | Roll back classifier threshold | Lawful-sensitive eval pass and policy card delta |