Dot Technical Whitepaper

abstract

The model is not the product. The constrained runtime is the product.

Dot is a pseudonymous private agentic compute network for users who need direct model access, search-augmented reasoning, and a coding agent that can execute work without turning identity, prompts, codebases, or research intent into durable platform data.

The system decomposes AI into planes: identity, inference, policy, agent execution, and observability. This decomposition allows the product to state precise invariants: raw prompts are not kept as server-side transcripts, raw completions are not retained by default, and private requests are not training data.

contribution

Dot combines privacy-forward inference with an execution substrate. A private chat model answers; DotCode inspects repos, writes patches, runs commands, tests, verifies, and reports proof.

formal specification

System objects, constraints, and optimization target.

Let `U` be users, `I` identity handles, `P` raw prompts, `Y` raw completions, `M` models, `R` routes, `A` tool actions, `D` durable stores, `E` runtime events, and `C` policy classes. A private Dot request is a tuple:

request definition

x = (u, i, p, r_hint, private_mode, tau)

u in U         user
i in I         anonymous session, wallet, provider id, or username
p in P         raw prompt body
r_hint in R    client route hint
tau            timestamp bucket

router and response maps

rho(x) -> (c, r, m, z)
Phi(x) -> (y, e_1, ..., e_k)

c in C         policy class
r in R         route: chat, search, code, policy, eval
m in M         selected model
z              tool plan or null
e_1..e_k       visible runtime events

private-mode retention invariant

For every private request x = (..., p, private_mode=1, ...):

  for all d in D:        store(d, p) = false
  for all d in D:        store(d, y) = false
  for all d in D_train:  store(d, p) = false and store(d, y) = false

Metadata may survive. Raw conversation does not.

system objective

pi* = argmax_pi E[ Q(x,y,a)
                   - lambda_L * L(x)
                   - lambda_G * G(x)
                   - lambda_H * H(x)
                   - lambda_N * N(x)
                   - lambda_F * F(x) ]

subject to:
  raw prompt retention = 0
  raw completion retention = 0
  no user-data training by default
  refuse self-harm facilitation
  refuse material harm-to-others

volatile private inference

Input: request x = (u, i, p, r_hint, private_mode=1, tau)
1. request_id <- random_uuid()
2. c <- boundary_classifier(p)
3. write_audit_metadata(request_id, c, r_hint, tau, hash_salted(p))
4. if c in {self_harm, harm_others}: return safe_boundary_response(c)
5. r <- route_classifier(p, r_hint)
6. m <- scheduler.select_model(route=r, private_mode=true)
7. emit("queued", route=r, model=m.public_name)
8. stream tokens from on-prem inference worker
9. zeroize(prompt_buffer)
10. discard(raw_completion_buffer)
11. write content-free metrics
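The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Dot's implementation: `private_infer`, `classify`, `generate`, and `audit_log` are hypothetical names, the salted hash from step 3 is left out for brevity, and zeroizing a `bytearray` is best effort because the Python runtime and any library that received the prompt may hold copies the sketch cannot reach.

```python
import uuid
from typing import Callable, Iterable, Iterator, List

# Policy classes that short-circuit to a boundary response (step 4).
REFUSED = {"self_harm", "harm_others"}


def zeroize(buf: bytearray) -> None:
    """Overwrite the buffer in place with zero bytes (best effort)."""
    buf[:] = bytes(len(buf))


def private_infer(
    prompt: bytearray,
    classify: Callable[[bytearray], str],
    generate: Callable[[bytearray], Iterable[str]],
    audit_log: List[dict],
) -> Iterator[str]:
    """Stream tokens for one private request and wipe the prompt afterwards."""
    request_id = str(uuid.uuid4())
    try:
        policy_class = classify(prompt)
        # Only content-free fields are written: no prompt text, no completion.
        audit_log.append({"request_id": request_id, "class": policy_class})
        if policy_class in REFUSED:
            yield "[safe boundary response]"
            return
        for token in generate(prompt):
            yield token  # streamed to the caller, never appended to a log
    finally:
        # Runs on normal completion, refusal, error, or client disconnect.
        zeroize(prompt)
```

Putting the wipe in `finally` is the design choice worth noting: the prompt buffer is cleared on every exit path, including exceptions and a consumer that stops reading mid-stream.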

on-prem inference

Private inference is a memory-management and scheduling problem.

Dot's private path runs open-weight models on Dot-controlled infrastructure. The serving layer should use memory-aware scheduling, continuous batching, and strict separation between chat, search, code planning, and generated-code execution.

inference topology

edge ingress
  +-- auth/session service
  +-- wallet verifier
  +-- request router
          +-- narrow policy classifier
          +-- model scheduler
          +-- search/rerank tool
          +-- DotCode handoff generator
          v
      GPU inference pool
          +-- chat workers
          +-- code workers
          +-- embedding/rerank workers
          +-- eval workers

KV-cache admission constraint

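bytes_per_token(m) ~= 2 * L_m * H_m * d_head_m * b_m
kv_bytes(x,m)      ~= (n_in + n_out) * bytes_per_token(m)

sum_{x in active(g)} kv_bytes(x,m) <= vram(g) - weights(m) - runtime_overhead(g) - reserve(g)

Here L_m is the layer count, H_m the number of key-value heads, d_head_m the per-head dimension, and b_m the bytes per cached element; the factor 2 covers keys and values. n_in and n_out are the prompt and generated token counts, and g is a GPU worker.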
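A worked instance of the constraint, as a sketch. The model shape (32 layers, 8 key-value heads, head dimension 128, 2-byte cache elements) and the 80 GiB GPU budget are hypothetical numbers chosen for easy arithmetic, not figures from Dot's deployment.

```python
from typing import Sequence

GIB = 1024 ** 3


def bytes_per_token(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    """KV-cache bytes for one token; the factor 2 covers keys and values."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def kv_bytes(n_in: int, n_out: int, per_token: int) -> int:
    """Worst-case KV-cache bytes for one request."""
    return (n_in + n_out) * per_token


def admits(active: Sequence[int], new_request: int, vram: int,
           weights: int, runtime_overhead: int, reserve: int) -> bool:
    """True if the new request fits next to the already active ones."""
    budget = vram - weights - runtime_overhead - reserve
    return sum(active) + new_request <= budget


# Hypothetical model shape, see the lead-in.
per_token = bytes_per_token(32, 8, 128, 2)   # 131072 bytes = 128 KiB per token
request = kv_bytes(3072, 1024, per_token)    # 4096 tokens = 512 MiB
```

With 16 GiB of weights, 4 GiB of runtime overhead, and a 4 GiB reserve, the budget is 56 GiB, so 112 such requests fit and the 113th is deferred.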

routing and admission control

1. candidates <- {m in M | m.private_mode_allowed and route in m.routes}
2. for each m in candidates:
3.   kv_required[m] <- estimate_kv_blocks(prompt_tokens, max_new_tokens, m)
4.   latency_pred <- predict_ttft(m, queue_depth(m), kv_required[m])
5.   cost_pred <- estimate_cost(m, prompt_tokens, max_new_tokens)
6.   score[m] <- utility(route,m) - beta_1*latency_pred - beta_2*cost_pred - beta_3*saturation(m)
7. m_star <- argmax_m score[m]
8. if kv_required[m_star] > free_kv_blocks(m_star): return defer_or_fallback()
9. admit(x, m_star)
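The selection and admission steps can be sketched as follows. `ModelState` and `select_model` are hypothetical names, and the predictions that the procedure computes per request (latency, cost, saturation, KV blocks) are passed in as precomputed fields to keep the example short.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ModelState:
    name: str
    routes: FrozenSet[str]
    private_mode_allowed: bool
    utility: float         # utility(route, m)
    latency_pred: float    # predicted time to first token
    cost_pred: float       # predicted cost of this request
    saturation: float      # current pool saturation
    kv_required: int       # KV blocks this request needs on m
    free_kv_blocks: int    # KV blocks currently free on m


def select_model(route: str, models: Sequence[ModelState],
                 beta: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> Optional[ModelState]:
    """Return the admitted model, or None when the caller must defer or fall back."""
    candidates = [m for m in models if m.private_mode_allowed and route in m.routes]
    if not candidates:
        return None

    def score(m: ModelState) -> float:
        return (m.utility - beta[0] * m.latency_pred
                - beta[1] * m.cost_pred - beta[2] * m.saturation)

    best = max(candidates, key=score)
    # Admission control: the best-scoring model still needs room in its KV cache.
    if best.kv_required > best.free_kv_blocks:
        return None
    return best
```

Returning `None` instead of silently picking the runner-up keeps the defer-or-fallback decision visible to the caller, which matches the separate step in the procedure.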

Queue	Workload	Reason
chat-fast	Short direct answers	Protect low-latency usage
chat-search	Retrieval-grounded answers	Isolate web latency
code-plan	DotCode planning prompts	Avoid starving chat
code-agent	Tool/action loops	Separate shell execution from inference
policy	Boundary classification	Must stay low-latency and cheap

compute simulation

Synthetic capacity graphs expose where private inference breaks.

The following curves are simulation artifacts, not production measurements. They encode the capacity model used by the paper: context length consumes KV cache linearly, queue wait grows nonlinearly near saturation, and isolated route queues protect chat latency when DotCode runs long-horizon work.

context length vs relative concurrency

Doubling active context approximately halves concurrency under fixed KV-cache budget.

utilization vs relative wait

Near saturation, wait time rises nonlinearly even while workers are still busy.
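One way to see the nonlinearity is the textbook M/M/1 queue, where mean queueing delay relative to mean service time is rho / (1 - rho) for utilization rho. The M/M/1 model is an assumption made here for illustration; the paper does not state which queueing model its simulation uses, and batched GPU serving will deviate from it.

```python
def relative_wait(utilization: float) -> float:
    """Mean queue wait divided by mean service time for an M/M/1 queue."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization)


# Wait is one service time at 50% utilization, about nine at 90%,
# and about nineteen at 95%.
waits = {rho: relative_wait(rho) for rho in (0.5, 0.9, 0.95)}
```

Going from 90% to 95% utilization adds only five points of load but roughly doubles the expected wait, which is why admission control has to act before the pool is full.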

shared vs isolated p95 TTFT

Route isolation protects chat from long code planning and tool loops under load.

weighted compute by route

Credits should price route cost, not raw message count.

privacy invariants

No-retention is an implementation invariant, not marketing copy.

Dot's privacy claim is testable: inject canary prompts, execute private requests, and scan server logs, app logs, proxy logs, error traces, metrics labels, queue dumps, and crash reports. The canary must be absent from durable surfaces unless the user explicitly enabled a debug capture.
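A canary scan of this kind can be sketched in a few lines. The function names and the `dot-canary-` prefix are hypothetical, and the sketch only searches files on disk for the literal byte string; a real scan would also need to cover compressed, encoded, or remotely stored surfaces, which a plain substring match will miss.

```python
import secrets
import tempfile
from pathlib import Path
from typing import Iterable, List


def make_canary() -> str:
    """Generate a unique marker to embed in a private test prompt."""
    return "dot-canary-" + secrets.token_hex(16)


def scan_for_canary(canary: str, roots: Iterable[Path]) -> List[Path]:
    """Return every file under the given roots that contains the canary."""
    needle = canary.encode("utf-8")
    hits = []
    for root in roots:
        for path in sorted(root.rglob("*")):
            if path.is_file() and needle in path.read_bytes():
                hits.append(path)
    return hits


def self_check() -> List[str]:
    """Demonstrate the scan on a throwaway directory with one leaking log."""
    canary = make_canary()
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "metrics.log").write_text("route=chat ttft_ms=412\n")
        (root / "app.log").write_text("error while handling prompt: " + canary + "\n")
        return [p.name for p in scan_for_canary(canary, [root])]
```

In a privacy regression run, any non-empty result from `scan_for_canary` would fail the gate.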

proof sketch: raw prompt non-retention

If every component that observes p is in the no-write set:

  ingress, router, policy, queue, inference worker, tool worker, error reporter, metrics

and each component either:

  (a) never receives raw p, or
  (b) receives p only in volatile memory and cannot write p to D,

then for all durable stores d in D: store(d,p) = false.

no prompt logs

Request bodies stay out of ingress, app, queue, and error logs.

no training path

Dataset builders require lineage and reject private production prompts.

no identity join

Wallet/account stores do not contain raw conversation content.

dotcode runtime

Agent capability is the observe-plan-act-verify loop.

DotCode is the execution substrate. It receives a local handoff, scopes the workspace, inspects the repo, produces a plan, patches files, runs commands, observes failures, repairs, verifies, and reports proof. This is the difference between a private chatbot and private capability.

observe-plan-act-verify

Input: handoff h, workspace W
1. read h from local handoff file
2. scope <- determine_workspace_scope(W,h)
3. obs_0 <- inspect_repo(scope)
4. plan <- generate_plan(h, obs_0)
5. current_observation <- obs_0
6. repeat
7.   a_t <- choose_action(plan, current_observation)
8.   if a_t mutates files: result <- apply_patch(a_t); record_changed_files()
9.   if a_t runs command: result <- redact_secrets(run_command(a_t))
10.  if a_t verifies browser: result <- capture_screenshot_and_console()
11.  current_observation <- inspect(result)
12.  plan <- repair_plan_if_needed(plan,current_observation)
13. until acceptance_criteria_met or blocked
14. return report(files_changed, commands_run, checks, screenshots, residual_risk)
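The command-running part of the loop can be sketched as below. This is a reduced illustration, not DotCode's runtime: it covers only command actions, takes the planner and the acceptance check as injected callables, and uses a deliberately simple regular expression for redaction. `agent_loop`, `run_command`, and the secret pattern are all hypothetical.

```python
import re
import subprocess
import sys
from typing import Callable, List, Optional, Sequence

# Interpreter path, handy for building a harmless demo command.
PYTHON = sys.executable

# Illustrative pattern only; real secret detection needs far broader coverage.
SECRET = re.compile(r"(?i)\b(api[_-]?key|token|password)(\s*[=:]\s*)\S+")


def redact_secrets(text: str) -> str:
    """Replace the value of obvious key=value secrets with a placeholder."""
    return SECRET.sub(r"\1\2[REDACTED]", text)


def run_command(argv: Sequence[str]) -> dict:
    """Run one command and return a redacted observation."""
    proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=60)
    return {
        "argv": list(argv),
        "ok": proc.returncode == 0,
        "output": redact_secrets(proc.stdout + proc.stderr),
    }


def agent_loop(
    choose_action: Callable[[Optional[dict]], Optional[Sequence[str]]],
    accepted: Callable[[dict], bool],
    max_steps: int = 10,
) -> dict:
    """Act and observe until the acceptance check passes or the loop is blocked."""
    observation: Optional[dict] = None
    commands_run: List[dict] = []
    for _ in range(max_steps):
        argv = choose_action(observation)
        if argv is None:  # planner has nothing left to try
            break
        observation = run_command(argv)
        commands_run.append(observation)
        if accepted(observation):
            return {"status": "verified", "commands_run": commands_run}
    return {"status": "blocked", "commands_run": commands_run}
```

The step bound and the explicit `blocked` status mirror the loop's termination condition: the agent either proves acceptance or reports that it could not.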

DotChat routes build intent into a structured DotCode handoff.

DotCode executes inside a terminal-native agent loop.

policy boundary

The narrow boundary is measured by false refusal and false allow rates.

Dot refuses self-harm facilitation and operational assistance to harm others. The technical challenge is to minimize false allows on harmful prompts while also minimizing false refusals on lawful sensitive prompts.

boundary classifier

g_theta(p, context) = [Pr(allowed), Pr(self_harm), Pr(harm_others)]

decision(p) =
  refuse   if Pr(self_harm) >= alpha_self
  refuse   if Pr(harm_others) >= alpha_harm
  allow    otherwise

false_allow  = |{p in S_harmful: decision(p)=allow}| / |S_harmful|
false_refuse = |{p in S_lawful_hard: decision(p)=refuse}| / |S_lawful_hard|
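The decision rule and the two error rates translate directly into code. The sketch below assumes the classifier output arrives as a dictionary of class probabilities; the thresholds and the sample probabilities are made-up values for illustration, not evaluation results.

```python
from typing import Dict, Sequence


def decision(probs: Dict[str, float], alpha_self: float, alpha_harm: float) -> str:
    """Refuse when either harm probability reaches its threshold."""
    if probs["self_harm"] >= alpha_self or probs["harm_others"] >= alpha_harm:
        return "refuse"
    return "allow"


def rate(decisions: Sequence[str], outcome: str) -> float:
    """Fraction of decisions equal to the given outcome."""
    if not decisions:
        raise ValueError("evaluation set must not be empty")
    return sum(d == outcome for d in decisions) / len(decisions)


# Made-up classifier outputs for a harmful set and a lawful-but-hard set.
s_harmful = [
    {"self_harm": 0.9, "harm_others": 0.0},
    {"self_harm": 0.3, "harm_others": 0.4},
]
s_lawful_hard = [
    {"self_harm": 0.1, "harm_others": 0.1},
    {"self_harm": 0.0, "harm_others": 0.6},
    {"self_harm": 0.2, "harm_others": 0.0},
    {"self_harm": 0.0, "harm_others": 0.0},
]

false_allow = rate([decision(p, 0.5, 0.5) for p in s_harmful], "allow")
false_refuse = rate([decision(p, 0.5, 0.5) for p in s_lawful_hard], "refuse")
```

Lowering either threshold trades false allows for false refusals, so both rates have to be reported together for any threshold change.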

comparison

Chat-only private AI proves demand. Dot extends it into private execution.

The competitive set is not one company. It is the whole class of privacy-first chat products, private model APIs, local model wrappers, anonymous prompt surfaces, and hosted open-model endpoints. Dot's wedge is narrower and more technical: private chat plus DotCode, terminal execution, repo mutation, verification, wallet-native identity, and on-prem routing.

Axis	Chat-only private AI competitors	Dot target architecture
Core surface	Private chat and API	Private chat plus private coding agent
Differentiator	Privacy-forward uncensored model access	Private agentic execution
Identity	Product/account oriented	Anonymous, wallet, or provider
Model serving	Provider-controlled model/API surface	Dot-controlled on-prem private model pool
Coding	Model writes code text	Agent inspects, patches, runs, tests, verifies
Benchmark target	Model/API quality	End-to-end artifact success

evaluation

DotBench evaluates the whole system, not just model weights.

DotBench scores verified artifacts, tests, privacy invariants, refusal correctness, source provenance, latency, and cost. A beautiful artifact that leaks the prompt into logs is a failed task.

DotBench scoring functional

DotBench_i = w_V * V_i + w_T * T_i + w_P * P_i + w_R * R_i + w_S * S_i
             - w_L * L_i - w_C * C_i

Hard gate: if P_i = 0 then DotBench_i = 0
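A small sketch of the scoring functional with its hard gate. The letters are read here as verified artifact (V), tests (T), privacy (P), refusal correctness (R), source provenance (S), latency (L), and cost (C), following the order of the prose description; that mapping and the example weights are assumptions.

```python
from typing import Dict

REWARDS = ("V", "T", "P", "R", "S")
PENALTIES = ("L", "C")


def dotbench_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted task score; a privacy failure zeroes the task."""
    if metrics["P"] == 0:
        return 0.0  # hard gate: no other term can compensate
    gained = sum(weights[k] * metrics[k] for k in REWARDS)
    lost = sum(weights[k] * metrics[k] for k in PENALTIES)
    return gained - lost


# Example weights and metrics, chosen only to make the arithmetic obvious.
weights = {k: 1.0 for k in REWARDS + PENALTIES}
passing = {"V": 1.0, "T": 1.0, "P": 1.0, "R": 1.0, "S": 1.0, "L": 0.5, "C": 0.25}
leaking = dict(passing, P=0.0)
```

With these numbers the passing task scores 4.25, while the otherwise identical task that leaks the prompt scores 0.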

Dimension	Metric	Target SLO
Chat latency	Time to visible acknowledgement	< 3 seconds under normal load
Streaming	Time to first token after route	< 10 seconds for normal chat
Search	Tool-state visibility	Search visible before retrieval completes
Code handoff	Button and handoff file creation	< 5 seconds after classification
Privacy	Raw prompt in server logs	0 known occurrences
Policy	False refusal and false allow rates	Tracked by synthetic eval sets

release governance

Serious private AI needs claim ledgers, model cards, and release gates.

The docs treat claims as typed engineering objects. A statement can be implemented, prototyped, target-state, simulated, measured, external-evidence, or policy-defined. This prevents the paper from mixing ambition with measured fact while still allowing a strong target architecture.
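Typed claims map naturally onto an enumeration plus a small record. In this sketch the seven types come from the text; the `Claim` record and the rule that measured and external-evidence claims must carry an evidence pointer are assumptions added for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class ClaimType(Enum):
    IMPLEMENTED = "implemented"
    PROTOTYPED = "prototyped"
    TARGET_STATE = "target-state"
    SIMULATED = "simulated"
    MEASURED = "measured"
    EXTERNAL_EVIDENCE = "external-evidence"
    POLICY_DEFINED = "policy-defined"


@dataclass(frozen=True)
class Claim:
    text: str
    claim_type: ClaimType
    evidence: str = ""  # pointer to a report, dataset, or external source


# Assumption: these types must point at evidence before publication.
NEEDS_EVIDENCE = frozenset({ClaimType.MEASURED, ClaimType.EXTERNAL_EVIDENCE})


def missing_evidence(claims: Iterable[Claim]) -> List[Claim]:
    """Return the claims whose type demands evidence but which carry none."""
    return [c for c in claims
            if c.claim_type in NEEDS_EVIDENCE and not c.evidence.strip()]
```

Because the type is an enum, a claim with a misspelled or invented status fails at construction time instead of slipping into the ledger.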

model cards

Model ID, family, license, quantization, context, route classes, evals, known failures, promotion state.

system cards

Identity modes, retention behavior, refusal boundary, DotCode permissions, limitations, incident playbooks.

release artifacts

Router card, policy card, DotBench report, privacy regression report, SRE scorecard, sandbox report.

release gate dependency DAG

Promotion fails closed if privacy canaries, policy evals, DotBench, or capacity gates fail.
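Fail-closed promotion means a gate that did not report counts as a failed gate. A minimal sketch, with hypothetical gate keys named after the four gates in the sentence above:

```python
from typing import Dict, Optional

REQUIRED_GATES = ("privacy_canaries", "policy_evals", "dotbench", "capacity")


def can_promote(results: Dict[str, Optional[bool]]) -> bool:
    """Promote only if every required gate explicitly reported True."""
    # A missing key, None, or any non-True value blocks promotion.
    return all(results.get(gate) is True for gate in REQUIRED_GATES)
```

The check uses `is True` on purpose: a gate that crashed and left `None`, or a truthy string such as `"skipped"`, does not pass.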

claim taxonomy distribution target

A technical dossier can be ambitious if every claim is typed and evidence requirements are explicit.

Artifact	Purpose	Release gate
Router card	Route policy, fallback behavior, model eligibility	No default-route changes without review
Model card	Model identity, license, quantization, context, known failures	Required for every default model
Policy card	Boundary classes, thresholds, false-refusal and false-allow evals	Required before deployment
DotBench report	Artifact success, tests, privacy checks, latency, source provenance	Required for major releases
Privacy regression report	Canary scan across logs, traces, queues, metrics, crash paths	Hard gate
SRE scorecard	Saturation, queue isolation, incident readiness, rollback path	Required before traffic increase

reliability and risk

Privacy promises fail under load unless reliability is designed as a first-class invariant.

A private AI system must resist the operational temptation to debug by storing prompts. Dot's reliability model treats latency, queue visibility, search state, DotCode handoff readiness, and no-retention as SLOs. Latency degradation is a product issue; private prompt retention is a trust incident.

synthetic error-budget burn

If burn-rate spikes, release velocity freezes before privacy or queue failures become normal.
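Burn rate is commonly defined as the observed error rate divided by the error budget, where the budget is one minus the SLO target. The sketch below uses that common definition; the 99% target and the freeze threshold of 1.0 are hypothetical values. A zero-tolerance SLO such as prompt retention has no budget to burn, so this calculation applies to budgeted SLOs like latency or queue wait.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget (1 - slo_target)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_budget = 1.0 - slo_target
    return (bad_events / total_events) / error_budget


def release_frozen(rate: float, freeze_threshold: float = 1.0) -> bool:
    """Freeze releases while the budget burns faster than the threshold allows."""
    return rate > freeze_threshold
```

With a 99% target, 5 slow requests out of 1000 give a burn rate of about 0.5 and releases continue; 30 out of 1000 give about 3.0 and releases freeze.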

risk heatmap

Risk is treated as an engineering input: controls, evidence, owners, and release gates.

incident response state machine

Private-mode incidents prioritize containment and evidence over feature velocity.

slo classes by route

Privacy is modeled as a hard SLO class, not a soft brand claim.

Incident	First action	Recovery evidence
Prompt found in logs	Disable affected route logging and isolate store	Canary scan clean across logs, traces, queues, metrics
Model pool outage	Degrade route or fall back with visible queue state	Postmortem with saturation graph and admission-control change
Search injection event	Disable affected source class	Retrieval eval pass and source ranking review
DotCode overreach	Disable affected tool permission	Sandbox regression pass and scoped action review
False refusal spike	Roll back classifier threshold	Lawful-sensitive eval pass and policy card delta