The model is not the product. The constrained runtime is the product.
Dot is a pseudonymous private agentic compute network for users who need direct model access, search-augmented reasoning, and a coding agent that can execute work without turning identity, prompts, codebases, or research intent into durable platform data.
The system decomposes AI into planes: identity, inference, policy, agent execution, and observability. This decomposition allows the product to state precise invariants: raw prompts are not server-side transcripts, raw completions are not retained by default, and private requests are not training data.
Dot combines privacy-forward inference with an execution substrate. A private chat model answers; DotCode inspects repos, writes patches, runs commands, tests, verifies, and reports proof.
System objects, constraints, and optimization target.
Let `U` be users, `I` identity handles, `P` raw prompts, `Y` raw completions, `M` models, `R` routes, `A` tool actions, `D` durable stores, `E` runtime events, and `C` policy classes. A private Dot request is a tuple:
Input: request x = (u, i, p, r_hint, private_mode=1, tau)
1. request_id <- random_uuid()
2. c <- boundary_classifier(p)
3. write_audit_metadata(request_id, c, r_hint, timestamp_bucket, hash_salted(p))
4. if c in {self_harm, harm_others}: return safe_boundary_response(c)
5. r <- route_classifier(p, r_hint)
6. m <- scheduler.select_model(route=r, private_mode=true)
7. emit("queued", route=r, model=m.public_name)
8. stream tokens from on-prem inference worker
9. zeroize(prompt_buffer)
10. discard(raw_completion_buffer)
11. write content-free metrics
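The request path above can be sketched in Python. This is a minimal illustration under stated assumptions, not Dot's implementation: `handle_private_request`, the one-hour bucket width, the HMAC-SHA256 salted hash, and the injected `classify` and `infer` callables are all hypothetical stand-ins. CPython cannot guarantee that no copy of the prompt survives in memory, so `zeroize` here is best effort only.

```python
import hashlib
import hmac
import uuid

BLOCKED_CLASSES = {"self_harm", "harm_others"}
BUCKET_SECONDS = 3600  # assumed bucket width; the source does not specify one


def salted_hash(prompt: bytes, salt: bytes) -> str:
    """Keyed hash of the prompt (assumption: HMAC-SHA256 with a secret salt)."""
    return hmac.new(salt, prompt, hashlib.sha256).hexdigest()


def timestamp_bucket(epoch_seconds: float) -> int:
    """Round a timestamp down to the start of its bucket."""
    return int(epoch_seconds) // BUCKET_SECONDS * BUCKET_SECONDS


def zeroize(buf: bytearray) -> None:
    """Overwrite a mutable buffer in place (best effort in CPython)."""
    for i in range(len(buf)):
        buf[i] = 0


def handle_private_request(prompt, r_hint, now, salt, classify, infer, audit_log):
    """Run steps 1-10 for one request; `prompt` is a bytearray we may wipe."""
    request_id = str(uuid.uuid4())
    policy_class = classify(bytes(prompt))
    # Only content-free metadata is written: no raw prompt, no raw completion.
    audit_log.append({
        "request_id": request_id,
        "policy_class": policy_class,
        "route_hint": r_hint,
        "timestamp_bucket": timestamp_bucket(now),
        "prompt_hash": salted_hash(bytes(prompt), salt),
    })
    try:
        if policy_class in BLOCKED_CLASSES:
            return "safe_boundary_response"
        # The joined tokens go to the caller only; nothing is persisted here.
        return "".join(infer(bytes(prompt)))
    finally:
        zeroize(prompt)  # runs on both the refusal and the inference path
```

The `try`/`finally` wrapper is a design choice of this sketch: it wipes the prompt buffer even when inference raises, which is the failure path most likely to leak content into error traces.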
Private inference is a memory-management and scheduling problem.
Dot's private path runs open-weight models on Dot-controlled infrastructure. The serving layer should use memory-aware scheduling, continuous batching, and strict separation between chat, search, code planning, and generated-code execution.
edge ingress
+-- auth/session service
+-- wallet verifier
+-- request router
+-- narrow policy classifier
+-- model scheduler
+-- search/rerank tool
+-- DotCode handoff generator
v
GPU inference pool
+-- chat workers
+-- code workers
+-- embedding/rerank workers
+-- eval workers
1. candidates <- {m in M | m.private_mode_allowed and route in m.routes}
2. for each m in candidates:
3.     kv_required[m] <- estimate_kv_blocks(prompt_tokens, max_new_tokens, m)
4.     latency_pred <- predict_ttft(m, queue_depth(m), kv_required[m])
5.     cost_pred <- estimate_cost(m, prompt_tokens, max_new_tokens)
6.     score[m] <- utility(route, m) - beta_1*latency_pred - beta_2*cost_pred - beta_3*saturation(m)
7. m_star <- argmax_m score[m]
8. if kv_required[m_star] > free_kv_blocks(m_star): defer_or_fallback()
9. else: admit(x, m_star)
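A minimal Python sketch of the scoring and admission steps follows. It assumes the per-model predictions (latency, cost, saturation, KV demand) have already been computed and stored on a hypothetical `ModelState` record; the field names and the default `beta` weights are illustrative, not taken from the Dot scheduler.

```python
from dataclasses import dataclass


@dataclass
class ModelState:
    name: str
    routes: frozenset
    private_mode_allowed: bool
    utility: float        # assumed utility for the requested route
    latency_pred: float   # predicted time to first token
    cost_pred: float
    saturation: float
    kv_required: int      # KV blocks this request would need
    free_kv_blocks: int


def select_model(models, route, beta=(1.0, 1.0, 1.0)):
    """Return the best admissible model, or None to defer or fall back."""
    candidates = [m for m in models
                  if m.private_mode_allowed and route in m.routes]
    if not candidates:
        return None
    b1, b2, b3 = beta

    def score(m):
        return m.utility - b1 * m.latency_pred - b2 * m.cost_pred - b3 * m.saturation

    best = max(candidates, key=score)
    # Admission control: never admit a request the KV cache cannot hold.
    if best.kv_required > best.free_kv_blocks:
        return None
    return best
```

Returning `None` leaves the defer-or-fallback decision to the caller, which keeps the scoring function free of queueing side effects.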
| Queue | Workload | Reason |
|---|---|---|
| chat-fast | Short direct answers | Protect low-latency usage |
| chat-search | Retrieval-grounded answers | Isolate web latency |
| code-plan | DotCode planning prompts | Avoid starving chat |
| code-agent | Tool/action loops | Separate shell execution from inference |
| policy | Boundary classification | Must stay low-latency and cheap |
Synthetic capacity graphs expose where private inference breaks.
The following curves are simulation artifacts, not production measurements. They encode the capacity model used by the paper: context length consumes KV cache linearly, queue wait grows nonlinearly near saturation, and isolated route queues protect chat latency when DotCode runs long-horizon work.
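The two capacity relationships can be illustrated with a short Python sketch. The source does not say which queueing model the simulation uses, so the M/M/1 mean-wait formula below is an assumption chosen only because it shows the nonlinear growth near saturation. The KV formula is the usual per-token estimate for transformer decoders (keys plus values, per layer, per KV head), and all parameter values are hypothetical.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """KV cache size: linear in the number of cached tokens."""
    # Factor 2 covers keys and values; bytes_per_elem=2 assumes fp16/bf16.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


def mm1_queue_wait(arrival_rate, service_rate):
    """Mean wait in queue for an M/M/1 server (assumed model)."""
    if arrival_rate >= service_rate:
        return float("inf")  # the queue grows without bound at saturation
    rho = arrival_rate / service_rate
    return rho / (service_rate * (1.0 - rho))
```

Under these assumptions, doubling the context doubles KV memory, while moving utilization from 0.5 to 0.95 multiplies the mean queue wait by 19.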
No-retention is an implementation invariant, not marketing copy.
Dot's privacy claim is testable: inject canary prompts, execute private requests, and scan server logs, app logs, proxy logs, error traces, metrics labels, queue dumps, and crash reports. The canary must be absent from durable surfaces unless the user explicitly enabled a debug capture.
Request bodies stay out of ingress, app, queue, and error logs.
Dataset builders require lineage and reject private production prompts.
Wallet/account stores do not contain raw conversation content.
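A canary scan of this kind could look like the following Python sketch. The function names and the `dot-canary-` prefix are hypothetical, and the scan only covers plain files under the given directories; compressed or encoded logs, metrics backends, and queue dumps would each need their own reader.

```python
import secrets
from pathlib import Path


def make_canary() -> str:
    """Generate a unique marker that cannot collide with normal log content."""
    return "dot-canary-" + secrets.token_hex(16)


def scan_for_canary(roots, canary):
    """Return every file under `roots` whose bytes contain the canary."""
    needle = canary.encode("utf-8")
    hits = []
    for root in roots:
        for path in Path(root).rglob("*"):
            # Reads whole files into memory; fine for a sketch, not for huge logs.
            if path.is_file() and needle in path.read_bytes():
                hits.append(path)
    return hits
```

A regression test would send the canary through a private request, then assert that `scan_for_canary` returns an empty list for every durable surface.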
Agent capability is the observe-plan-act-verify loop.
DotCode is the execution substrate. It receives a local handoff, scopes the workspace, inspects the repo, produces a plan, patches files, runs commands, observes failures, repairs, verifies, and reports proof. This is the difference between a private chatbot and private capability.
Input: handoff h, workspace W
1. read h from local handoff file
2. scope <- determine_workspace_scope(W, h)
3. obs_0 <- inspect_repo(scope)
4. plan <- generate_plan(h, obs_0)
5. repeat
6. a_t <- choose_action(plan, current_observation)
7. if a_t mutates files: apply_patch(a_t); record_changed_files()
8. if a_t runs command: result <- run_command(a_t); redact_secrets(result)
9. if a_t verifies browser: capture_screenshot_and_console()
10. current_observation <- inspect(result)
11. plan <- repair_plan_if_needed(plan, current_observation)
12. until acceptance_criteria_met or blocked
13. return report(files_changed, commands_run, checks, screenshots, residual_risk)
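The act, observe, and verify part of this loop can be sketched in Python. This is a reduced illustration, not DotCode itself: plan generation, plan repair, and browser verification are omitted, the action format `(kind, payload)` is invented for the example, and the redaction regex is a deliberately simple assumption that would miss many real secret formats.

```python
import re
from dataclasses import dataclass, field

# Assumed pattern: "api_key=...", "token: ...", "password=..." style pairs.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+")


def redact_secrets(text: str) -> str:
    """Replace the value of any matched secret-looking pair."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


@dataclass
class Report:
    files_changed: list = field(default_factory=list)
    commands_run: list = field(default_factory=list)
    status: str = "blocked"


def run_agent(plan, apply_patch, run_command, accepted, max_steps=20):
    """Execute actions in order until the acceptance check passes."""
    report = Report()
    observation = ""
    for kind, payload in plan[:max_steps]:
        if kind == "patch":
            apply_patch(payload)
            report.files_changed.append(payload)
        elif kind == "command":
            # Redact before the output is stored or shown anywhere.
            observation = redact_secrets(run_command(payload))
            report.commands_run.append((payload, observation))
        if accepted(observation):
            report.status = "verified"
            break
    return report
```

The report stays `blocked` unless the acceptance check passes, so an agent that runs out of steps cannot claim success by default.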
The narrow boundary is measured by false refusal and false allow rates.
Dot refuses self-harm facilitation and operational assistance to harm others. The technical challenge is to minimize false allows on harmful prompts while also minimizing false refusals on lawful sensitive prompts.
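Both rates can be computed from a labeled evaluation set, as in the Python sketch below. The `(is_harmful, was_refused)` tuple format is an assumption made for illustration.

```python
def boundary_rates(examples):
    """Return (false_refusal_rate, false_allow_rate).

    `examples` is an iterable of (is_harmful, was_refused) booleans.
    """
    examples = list(examples)
    harmful = [refused for is_harmful, refused in examples if is_harmful]
    benign = [refused for is_harmful, refused in examples if not is_harmful]
    # False allow: a harmful prompt that was answered.
    false_allow = sum(not r for r in harmful) / len(harmful) if harmful else 0.0
    # False refusal: a lawful prompt that was refused.
    false_refusal = sum(benign) / len(benign) if benign else 0.0
    return false_refusal, false_allow
```

The two rates use different denominators (lawful prompts versus harmful prompts), so a classifier cannot improve one by shifting the class balance of the eval set.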
Chat-only private AI proves demand. Dot extends it into private execution.
The competitive set is not one company. It is the whole class of privacy-first chat products, private model APIs, local model wrappers, anonymous prompt surfaces, and hosted open-model endpoints. Dot's wedge is narrower and more technical: private chat plus DotCode, terminal execution, repo mutation, verification, wallet-native identity, and on-prem routing.
| Axis | Chat-only private AI competitors | Dot target architecture |
|---|---|---|
| Core surface | Private chat and API | Private chat plus private coding agent |
| Differentiator | Privacy-forward uncensored model access | Private agentic execution |
| Identity | Product/account oriented | Anonymous, wallet, or provider |
| Model serving | Provider-controlled model/API surface | Dot-controlled on-prem private model pool |
| Coding | Model writes code text | Agent inspects, patches, runs, tests, verifies |
| Benchmark target | Model/API quality | End-to-end artifact success |
DotBench evaluates the whole system, not just model weights.
DotBench scores verified artifacts, tests, privacy invariants, refusal correctness, source provenance, latency, and cost. A beautiful artifact that leaks the prompt into logs is a failed task.
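The rule that a privacy leak fails the task regardless of artifact quality can be expressed as a hard gate. The Python sketch below uses a hypothetical `TaskResult` record with only three of the scored dimensions; the real DotBench schema is not given in the source.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    artifact_verified: bool
    tests_passed: bool
    privacy_clean: bool  # False if the prompt appeared on any durable surface


def task_passed(result: TaskResult) -> bool:
    """Privacy is a gate, not a weighted term: a leak fails the task."""
    return result.privacy_clean and result.artifact_verified and result.tests_passed


def pass_rate(results) -> float:
    """Fraction of tasks that pass every gate."""
    results = list(results)
    return sum(task_passed(r) for r in results) / len(results) if results else 0.0
```

Modeling privacy as a boolean conjunction rather than a score component means no amount of artifact quality can offset a leak.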
| Dimension | Metric | Target SLO |
|---|---|---|
| Chat latency | Time to visible acknowledgement | < 3 seconds under normal load |
| Streaming | Time to first token after route | < 10 seconds for normal chat |
| Search | Tool-state visibility | Search visible before retrieval completes |
| Code handoff | Button and handoff file creation | < 5 seconds after classification |
| Privacy | Raw prompt in server logs | 0 known occurrences |
| Policy | False refusal and false allow rates | Tracked by synthetic eval sets |
Serious private AI needs claim ledgers, model cards, and release gates.
The docs treat claims as typed engineering objects. A statement can be implemented, prototyped, target-state, simulated, measured, external-evidence, or policy-defined. This prevents the paper from mixing ambition with measured fact while still allowing a strong target architecture.
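A typed claim could be modeled as in the Python sketch below. The seven status values come from the text; the `Claim` record and the choice of which statuses count as established fact are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimStatus(Enum):
    IMPLEMENTED = "implemented"
    PROTOTYPED = "prototyped"
    TARGET_STATE = "target-state"
    SIMULATED = "simulated"
    MEASURED = "measured"
    EXTERNAL_EVIDENCE = "external-evidence"
    POLICY_DEFINED = "policy-defined"


@dataclass(frozen=True)
class Claim:
    text: str
    status: ClaimStatus


# Assumption: only these statuses may be worded as present-tense fact.
FACT_STATUSES = frozenset({ClaimStatus.IMPLEMENTED, ClaimStatus.MEASURED})


def may_state_as_fact(claim: Claim) -> bool:
    """True if the claim can be written without a target or simulation hedge."""
    return claim.status in FACT_STATUSES
```

A documentation linter built on this could reject any sentence whose claim is `SIMULATED` or `TARGET_STATE` but is phrased as a measurement.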
Model ID, family, license, quantization, context, route classes, evals, known failures, promotion state.
Identity modes, retention behavior, refusal boundary, DotCode permissions, limitations, incident playbooks.
Router card, policy card, DotBench report, privacy regression report, SRE scorecard, sandbox report.
| Artifact | Purpose | Release gate |
|---|---|---|
| Router card | Route policy, fallback behavior, model eligibility | No default-route changes without review |
| Model card | Model identity, license, quantization, context, known failures | Required for every default model |
| Policy card | Boundary classes, thresholds, false-refusal and false-allow evals | Required before deployment |
| DotBench report | Artifact success, tests, privacy checks, latency, source provenance | Required for major releases |
| Privacy regression report | Canary scan across logs, traces, queues, metrics, crash paths | Hard gate |
| SRE scorecard | Saturation, queue isolation, incident readiness, rollback path | Required before traffic increase |
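A release check over these artifacts might look like the Python sketch below. It simplifies the table by treating every artifact as required for the release under review, which the table does not state (some gates apply only to major releases or traffic increases), and the function names are hypothetical.

```python
REQUIRED_ARTIFACTS = frozenset({
    "router card",
    "model card",
    "policy card",
    "dotbench report",
    "privacy regression report",
    "sre scorecard",
})


def missing_artifacts(present):
    """Return required artifacts that have not been produced, sorted by name."""
    have = {name.lower() for name in present}
    return sorted(REQUIRED_ARTIFACTS - have)


def release_allowed(present, privacy_scan_clean: bool) -> bool:
    """Hard gate first: a failed canary scan blocks release unconditionally."""
    if not privacy_scan_clean:
        return False
    return not missing_artifacts(present)
```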
Privacy promises fail under load unless reliability is designed as a first-class invariant.
A private AI system must resist the operational temptation to debug by storing prompts. Dot's reliability model treats latency, queue visibility, search state, DotCode handoff readiness, and no-retention as SLOs. Latency degradation is a product issue; private prompt retention is a trust incident.
| Incident | First action | Recovery evidence |
|---|---|---|
| Prompt found in logs | Disable affected route logging and isolate store | Canary scan clean across logs, traces, queues, metrics |
| Model pool outage | Degrade route or fall back with visible queue state | Postmortem with saturation graph and admission-control change |
| Search injection event | Disable affected source class | Retrieval eval pass and source ranking review |
| DotCode overreach | Disable affected tool permission | Sandbox regression pass and scoped action review |
| False refusal spike | Roll back classifier threshold | Lawful-sensitive eval pass and policy card delta |