Executive Summary · At a Glance

What Mihwar is — and where it goes from here

Mihwar (محور · "pivot, axis") is the operating system for an AI consulting practice. It ships in three real versions and one speculative one. V0.1 is the personal MVP — a single page where Ahmed types an idea and Mihwar returns a stack-aware build playbook in minutes. V1 productizes the same engine as a 5-stage cockpit NMO uses to ship $25,000 client Blueprints. V2 opens it to clients as a self-serve SaaS. V-future is a marketplace bet kept alive only in the architecture.

First ship
V0.1 · 3–7 days
personal idea compiler
Then
V1 · 6 weeks
consultant cockpit · $25k Blueprints
V1 cycle
≤7 days
workspace → signed Blueprint
Backup
Daily · off-server
encrypted · restore-tested

One-sentence pitch

Mihwar in a sentence
Mihwar turns ideas into stack-aware build playbooks — first for Ahmed himself in a single page (V0.1), then for NMO consulting teams as a $25,000 Blueprint engine (V1), then for clients as a self-serve SaaS (V2). One engine, three operators in sequence, no rewrite between them.
محور · من فكرة شخصية إلى مخطط بناء في صفحة واحدة، ثم محرك استشاري بـ ٢٥ ألف دولار، ثم منصة ذاتية الخدمة. نفس المحرك، ثلاثة مشغلين. From personal idea to client SaaS — one engine, three operators in sequence.

The versions, at a glance

Mihwar ships in four layers. Each inherits the one below. V0.1 is the next thing to build — the personal MVP that closes the loop in days. V1 productizes it for NMO consulting in 6 weeks after V0.1 validates. V2 unlocks when clients pull. V-future is a bet kept alive only so today's architecture doesn't foreclose it.

What's in each version

V0.1 ◉ MVP · ships first

Idea Compiler

A single page where Ahmed types an idea; Mihwar returns a stack-aware build playbook with explicit local-vs-cloud flags and agent assignments — in minutes.

OperatorAhmed only · single user · zero auth surface
InputFree-form idea + pre-loaded Operator Profile (your stack: VPS, agents, services, APIs)
OutputBuild playbook — architecture, agent assignments, sequenced steps, local/cloud breakdown, cost estimate, risks
Local/cloudEvery component flagged 🔵 LOCAL (your VPS) · ☁ CLOUD (3rd-party / API) · 🌗 HYBRID
Cycle1 idea → 1 playbook in <5 minutes (single LLM call)
WorkflowSingle-shot · structured output · no multi-stage gating
StackNext.js single page · Anthropic API · Operator Profile in JSON · no Postgres yet
Build effort3–7 days · one Claude Code prompt · displaces nothing in V1's design
Why firstDogfood the engine on your own ideas before selling V1 as $25k Blueprints — validates the loop in days, not weeks
StatusSpecced · ready to build
V1 ● After V0.1 · 6 weeks

Consultant Cockpit

The cockpit NMO Partners uses to run client engagements end-to-end.

OperatorNMO Partners — Ahmed today, the growing team from Month 4
TenancySingle tenant in operation · multi-tenant in the data model from day one
Deliverable$25,000 Blueprint · single self-contained HTML · bilingual EN/AR · signed manifest
Cycle≤7 days from kickoff to signed Blueprint · target ≥30% conversion to build
Workflow5 stages — Lab → Discovery → Architecture → Playbook → Handoff
StackPostgres · Redis · FastAPI · Next.js · Hostinger VPS · Coolify · Traefik
SecurityWireGuard admin · per-tenant DEK · audit trail · signed exports · self-isolated from Apex
BackupDaily off-server encrypted · restore drill on Day 5 · drill repeated quarterly
Cost target≤$30 in API spend per Blueprint · cache hit rate ≥80%
StatusDesigned · queued behind V0.1  masterplan live at this URL
V2 ◐ Triggered · post-V1

Client Platform · SaaS

Same engine, exposed to clients. Self-serve AI visioning inside the client's organisation.

OperatorClients (CTO / Head of AI) — self-serve, no NMO consultant in the loop
TenancyMulti-tenant · per-tenant DEK · row-level security · cross-tenant isolation tested
DeliverableSame Blueprint format · per-tenant branding · same signed manifest
PricingPer-seat OR per-Blueprint subscription · billing tier preview lives in §SaaS Billing
New vs V1Org Profile · public sign-up · billing · branding · embedded coaching · annual profile review
Trigger10+ inbound prospects asking for self-serve access (logged starting Week 6)
ReusesEngine · catalog · 5-stage workflow · Blueprint format · security model · logging
BackupPer-tenant DEK · same daily off-server cadence · per-tenant restore-tested
StatusDesigned  not yet building · waiting for pull signal
V-future ○ Speculative · year 2+

Federated Catalog & Build Bridge

Speculative. Not committed. Listed so the architecture doesn't foreclose it.

IdeaCurated catalog opens to partners · Blueprint → Build handoff to NMO Apex agents
Why laterNeeds ≥25 V2 tenants and a separate, mature product (Apex) before it earns attention
RiskMarketplace dynamics are hard · unfair to commit to before V2 data exists
StatusA bet · not a plan · revisit after first 25 V2 tenants ship
Plan honesty
V0.1 is the next thing being built — the personal MVP that proves the loop in days. V1 productizes the same loop into the 5-stage Blueprint engine after V0.1 validates. V2 is fully designed but conditional on inbound pull — capital is not burned building it before clients ask. V-future is on this page only because today's architecture decisions should not foreclose it. That's the honest version of the roadmap.

The four mental tests Mihwar is built against

Every implementation choice in this document — every endpoint, every query, every prompt — is checked against four questions. They appear as flags throughout the rest of the masterplan.

Scale

Would this still work at 100× current load?

Security

How could this be abused by a hostile actor?

Observability

Could this be investigated at 2am, six months from now?

Economics

Affordable at 100× usage? Cost per user per month?

Where to read next

Six entry points, depending on what you came here for:

If you want the why

Vision & Promise

The product story. Who needs it, why now, what it replaces.

Open Vision →
If you want the how

Architecture & Stack

System architecture, data model, multi-tenancy, security boundary.

Open Architecture →
If you want when

6-Week Roadmap

Day-by-day plan from VPS Day 0 to first signed engagement.

Open Roadmap →
If you want what could go wrong

Risks & Mitigations

12 named risks, mapped to mitigations baked into the plan.

Open Risks →
If you want the SaaS picture

Phase 2 & SaaS

What V2 is, when it triggers, how it sells, how it bills.

Open SaaS Path →
If you start Monday

Monday Morning Actions

The 7-day plan from "approved" to "first prompt running on the VPS".

Open Monday →
This is a one-screen view
Every claim above is unpacked, justified, and made falsifiable in one of the 30+ sections that follow. Use the sidebar.
V0.1 · Personal MVP · Ships First

V0.1 · The Idea Compiler

One page. You type an idea. Mihwar returns a stack-aware build playbook — what to build, which of your agents owns each step, what runs locally on your VPS, what runs in the cloud. Three to seven days from spec to live URL. The personal MVP that ships before V1, dogfoods the engine on Ahmed's own ideas, and earns the right to build the rest.

Build time
3–7 days
single Claude Code prompt
User cycle
≤5 min
idea → playbook
Operator
1 (you)
no auth · no tenancy
Cost / playbook
≤$0.50
single Sonnet call · cached profile

Why V0.1 exists

V1 is six weeks of build before the engine compiles its first idea into a deliverable. That's too long to validate the core loop. V0.1 compresses everything that matters about V1 into a single page that Ahmed uses on himself, every day, on whatever idea is on his mind that morning. If the playbooks it produces are useful, V1 is worth building. If not, the masterplan changes before any client sees it.

The loop V0.1 closes
You have an idea → Mihwar grounds it in your actual stack → Mihwar returns a build playbook with local/cloud flags and agent assignments → you build it (or you don't) → next idea. Same loop V1 closes for clients, just one tier up the abstraction ladder.

The single page

Input — what the page asks for

  • Idea field — free-form textarea. "I want to build X for Y reason." Two to five sentences typical. No structure forced.
  • Operator Profile — pre-loaded from a JSON file. The user does not re-enter it every time. Edits happen once via a settings page (Phase 2 of V0.1, optional).
  • Optional context — paste a link to an existing repo, or a one-line preference like use Next.js, not Astro.

Output — the build playbook

SectionWhat it containsWhy it's here
1 · Idea summary One sentence reflecting the idea back, plus the success metric implied by it. Confirms the engine understood. Catches misreads early.
2 · Architecture Component list. Each component flagged 🔵 LOCAL · ☁ CLOUD · 🌗 HYBRID. Includes runtime, storage, queue, frontend, observability. Local/cloud is the whole reason V0.1 exists. Surfaces decisions that change cost, latency, and sovereignty before you build.
3 · Agent assignments For each build step, the suggested agent from your roster (e.g. PM, Dev-1, VPS Admin, Cyber). External tasks (e.g. domain registration) flagged as human-only. Lets you forward sections of the playbook directly to the agent that will execute them.
4 · Build sequence Ordered steps with effort estimate (≈hours), dependencies, and "definition of done" per step. So "build playbook" doesn't mean "vague TODO list". You can start work after reading.
5 · Cost estimate Monthly cost split by 🔵 LOCAL (sunk · already paid via VPS) and ☁ CLOUD (per-API / per-month). Worst-case at 100× usage. Mental-test #4 (Economics) baked in from idea zero.
6 · Risks & unknowns 3–5 named risks with likelihood and mitigation. Explicit "what we don't know yet" list. Prevents the playbook from feeling more confident than it should.

The local-vs-cloud flag system

Every component in the architecture section gets one flag. The flag is the whole point of V0.1.

🔵 Local

Runs on your VPS

Postgres in a container, Coolify-managed services, n8n flows, files on disk, the cron that runs daily backups. Sunk cost — no per-call billing. Sovereignty stays with you.

☁ Cloud

External SaaS or API

Anthropic API calls, GitHub repos, Linear issues, Stripe payments, S3-compatible backup target, third-party SMTP. Per-call billing. Scales without your hardware.

🌗 Hybrid

Local with cloud fallback

Local Ollama with Anthropic API failover, on-VPS embedding model with cloud fallback for spikes, local SMTP relay routed through SES on volume. The pragmatic default for variable load.

Operator Profile — Ahmed's edition

The Operator Profile is the JSON Mihwar reads as static context on every call. It's pre-loaded for Ahmed; future operators (NMO consultants, then clients in V2) will get their own.

SectionExamplesUpdate cadence
InfrastructureHostinger VPS · Coolify · Traefik · Postgres available · Redis available · WireGuard adminAnnual or on stack change
Cloud APIsAnthropic API · OpenAI fallback · GitHub · Linear · n8n · Hostinger DNSOn API key rotation
Agents availableThe agents in your Apex roster — PM · Productizer · VPS Admin · Dev-1 · Dev-2 · Data Sci · Cyber · HR · MarketingOn agent roster change
Personal preferencesStack defaults (Next.js · FastAPI · Postgres) · auth library · deploy toolWhenever taste changes
ConstraintsVPS RAM cap · cost cap per project per month · regions allowed · sovereign-cloud requirementAnnual
Existing assetsSibling products on the same VPS (e.g. Apex, n8n) · their networks · domains owned · wildcard cert availabilityOn infrastructure change
Sibling-product safety Security
The Operator Profile names sibling products on the same VPS so V0.1 can avoid collisions (port, network, domain). It does not share filesystems, networks, or credentials with them. Mihwar V0.1 generates plans that respect the existing isolation boundary; it never proposes touching sibling-product internals.

Architecture · what V0.1 itself looks like

LayerWhatFlag
FrontendSingle Next.js page · textarea + submit · renders the returned playbook as styled HTML🔵 Local · runs in your VPS container
BackendOne Next.js API route or FastAPI endpoint · receives idea + reads Operator Profile from disk · calls Anthropic API · returns structured playbook🔵 Local app · ☁ Anthropic call
LLMClaude Sonnet 4.6 · structured output (JSON schema) · prompt-cached static prefix (system + Operator Profile)☁ Cloud (Anthropic API)
StorageOperator Profile lives in a single JSON file on disk · past playbooks saved as HTML files in a folder · no database🔵 Local
AuthBehind WireGuard / IP allowlist — Ahmed only · no login UI · no tenancy logic🔵 Local
ObservabilityPer-call log to a JSON file: model · input tokens · output tokens · cache_read_tokens · cost_usd · request_id · timestamp · idea hash🔵 Local
Cost ceilingHard cap: ≤$0.50 per playbook · monthly soft alert at $20 spend (it'll never get close)☁ Anthropic billing

Build effort — 3 to 7 days, one Claude Code prompt

  1. Day 1 — Scaffold Next.js page, paste Operator Profile JSON, hardcode a sample idea, get the LLM call working with structured output.
  2. Day 2 — Wire the form, render playbook sections, style the local/cloud flags, add the cost-log JSON file.
  3. Day 3 — Prompt caching on the static prefix, verify cache_read_tokens > 0 on call #2, tighten the prompt to enforce the 6-section output.
  4. Day 4–5 — Dogfood on five real ideas Ahmed has been sitting on. Observe where the playbook gets vague, sharpen the prompt, refine the Operator Profile.
  5. Day 6–7 — Deploy to https://mihwar.nmopartners.com/v01 behind WireGuard. Add a tiny TOC of past playbooks. Decide whether V1 is worth building based on whether you used V0.1 daily.

What V0.1 inherits to V1

  • The loop — idea → grounded playbook → execute. V1's 5-stage workflow is the same loop with explicit gates.
  • Operator Profile concept — V1 generalises it: each consultant has one, each client (in V2) has an Org Profile that is the same idea at the buyer level.
  • Local/cloud flagging — promoted to a first-class field in V1's Blueprint format. Clients get the same colour-coded breakdown.
  • Agent assignment language — V1's Stage 4 (Playbook) inherits the same agent-references vocabulary.
  • Cost-log shape — the JSON written per call in V0.1 becomes the schema for the ai_calls table in V1's Postgres.
  • Sonnet + prompt caching — V0.1 proves the cache hit rate behaviour before V1 scales it across five distinct prompts.

What V0.1 explicitly is not

  • Not a Blueprint engine. Output is a build playbook for Ahmed, not a $25k client deliverable.
  • Not multi-tenant. One JSON profile. One user. One folder of past playbooks.
  • Not a five-stage workflow. Single shot. No gates. No async forms. No house-style filter.
  • Not the catalog. Recommendations are grounded in the Operator Profile, not in a curated library of vendors and patterns.
  • Not for clients. Behind WireGuard. Never shown externally.

Success criteria

  • Ahmed uses V0.1 on at least 5 of his own ideas in the first 14 days.
  • At least one playbook leads to a built thing that ships.
  • Cost stays under $20/month.
  • Cache hit rate ≥ 80% by call #3 in any session.
  • Ahmed is willing to commit the next 6 weeks to V1 because V0.1 worked — or honestly say "the loop isn't useful" and reshape V1 before building it.
V0.1 is the cheapest correct first step
Three to seven days. One prompt. One static JSON file. One LLM call. If V0.1 doesn't change how Ahmed builds, V1 won't change how clients buy. Find that out in days, not weeks.
Part A · Vision

Mihwar — A consultant's cockpit today, a client platform tomorrow

Mihwar (محور · "pivot, axis") is a two-phase platform. Phase 1 is a single-operator web app that turns 3-week AI consulting discoveries into 3-day Blueprint deliverables for NMO Partners. Phase 2 is a SaaS where clients run their own AI visioning and roadmaps inside the same engine. The same axis turns — the operator changes.

Phase 1 cycle
≤7 days
empty workspace → signed Blueprint
Phase 1 price anchor
$25k
single Blueprint deliverable
Phase 2 trigger
10+ pulls
clients asking for self-serve access
Build budget
6 weeks
VPS to first signed engagement

What Mihwar is, in one paragraph

Mihwar is the operating system for an AI consulting practice. In Phase 1 it is the private cockpit Ahmed and the NMO team use to run client engagements: a five-stage workflow that takes a vague client wish and produces a $25,000 Blueprint deliverable in a working week, grounded in a curated catalog of vendors, models, and patterns and in the client's own infrastructure inventory. In Phase 2 the same engine is exposed to clients directly so they can self-serve AI visioning and roadmaps inside their own organisations, paying NMO a subscription for the platform and the catalog. The Phase 1 codebase is built so the Phase 2 pivot is a deployment change, not a rewrite.

The promise

Without Mihwar

3 weeks · PowerPoint · dies in committee

  • Discovery drags: 14 unrecorded conversations, contradictions, sprawling notes.
  • Architecture is gut-call: senior architect picks tools from memory, no audit trail.
  • Deliverable is a 60-slide deck plus a Word doc — unsearchable, unverifiable, unforwardable.
  • Conversion to build: ≈15%. Most decks rot in a finance committee for 90 days.
With Mihwar

7 days · interactive HTML · ships

  • Discovery is structured, partially async, and refuses to advance until complete.
  • Architecture is grounded in the curated catalog and the client's actual stack — every recommendation traceable.
  • Deliverable is a single self-contained HTML file: bilingual, navigable, signed manifest, opens in any browser.
  • Target conversion to build: ≥30%. The CTO can forward the Blueprint, search inside it, and reason about it.

What V1 must be

  • Single-tenant in operation, multi-tenant in design. Tenancy lives at the data layer from day one so V3 isn't a rewrite.
  • Built primarily by Claude Code, in five carefully-scoped prompts. Ahmed reviews diffs and steers; the LLM writes most of the lines.
  • Boring stack. Postgres, Redis, FastAPI, Next.js, Hostinger VPS, Coolify. No Kubernetes, no microservices.
  • Bilingual EN/AR by default. KSA market. Arabic is not an afterthought.
  • Self-defended. Mihwar's own security and infrastructure get the same rigour we sell to clients.

What V1 is not

  • Not a SaaS yet. No public sign-up, no Stripe, no per-tenant theming. The data model is ready; the UI is not.
  • Not a build platform. Mihwar produces the Blueprint and the Playbook. Building is downstream — by NMO Apex's agent team, by a partner, or by the client themselves.
  • Not a generic "AI ChatGPT for consulting". Every recommendation is grounded in a curated catalog and an inventoried environment. House style is enforced. House voice is mandatory.

The four mental tests

Every implementation decision in this masterplan is checked against four questions. They run through every section that follows.

Scale

Would this still work at 100× current load?

Security

How could this be abused by a hostile actor?

Observability

Could this be investigated at 2am, six months from now?

Economics

Affordable at 100× usage? What's the cost per user per month?

A note on this document
This masterplan is the v2 — it preserves the v1 vision and substance, and adds: explicit Phase 1 / Phase 2 framing, an Organisation Infrastructure Profile section (so the same client doesn't re-enter their stack each engagement), a dedicated Mihwar Self-Security & Infrastructure section, an AI Economics section, an Observability & Logs Page section, an Operations Handbook, and a deeper Phase 2 product spec. Every prior section is kept and tightened.
Part A · Vision

The problem & the insight

Why AI consulting projects stall in KSA right now, and the single observation that turns Mihwar from "another workshop tool" into a defensible product.

The problem — three layers

Layer 1 · Discovery is slow and expensive

A typical AI use-case discovery in a KSA enterprise takes 3–6 weeks. Stakeholders are scattered across IT, business units, security, procurement, vendors. Information arrives in WhatsApp threads, email PDFs, three different SharePoint tenants and a printed spreadsheet from a DBA. The consultant spends 60% of the engagement chasing data dictionaries and license terms, not designing the system.

Layer 2 · Architecture decisions are gut-call

By the time the inventory is "good enough", the senior architect picks tools from memory. The recommendation is rarely written down beside the alternatives that were rejected. Six months later when the build runs into trouble, no one remembers why Snowflake was chosen over BigQuery — and there is no audit trail to consult.

Layer 3 · The deliverable is dead on arrival

Most engagements end with a 60-slide PowerPoint plus a Word document. CTOs forward them to a procurement committee, who can't navigate them, can't search them, can't share fragments without re-formatting, and can't verify whether the architecture has been validated against the actual environment. The artifact is dead on arrival.

The insight

The bottleneck in AI consulting is not the architecture. It's the discovery interview. The architecture step takes a senior architect a few days at most — pattern-match the use case, pick from the toolkit, run the numbers. What takes weeks is dragging information out of stakeholders. So the leverage is not "AI that designs systems" — it is "AI that runs the interview." — The thesis

The reframe

Mihwar is not a chatbot for architects. It is an interviewing instrument. Stage 1 sharpens the use case. Stage 2 conducts the inventory. Both stages structurally refuse to advance until the inputs to Stage 3 are complete. Stage 3 — architecture synthesis — is fast precisely because Stages 1 and 2 made it possible. Most AI consulting tools start at Stage 3 and skip the discovery; that is exactly why their outputs feel hallucinated.

Why this works in KSA, specifically

  • KSA enterprises have real budgets for AI right now and weak internal AI talent. They need consultants who move fast but stay rigorous.
  • The local consulting market is dominated by Big Four-style PowerPoint teams. A bilingual interactive HTML deliverable signed by an Arabic-fluent boutique is a credible differentiator.
  • PDPL, SAMA, NCA cybersecurity controls — these introduce architecture constraints that grounded recommendations must respect. Generic AI tools can't see those constraints; Mihwar's catalog encodes them.
Defensibility
The Mihwar workflow is copyable in 90 days. The Mihwar catalog — opinionated, KSA-localised, vendor-vetted, refreshed quarterly — is the moat. Every engagement adds rows. Every quarterly review prunes them. By month 12, the catalog is a deliverable in itself.
Part A · Vision

Market context

2026 is the loudest year in KSA AI consulting history. Mihwar's job is to be the most differentiated voice in the room — not the loudest.

The 2026 KSA AI landscape

Saudi Arabia has declared 2026 the Year of AI. Concretely:

  • SDAIA is funding AI capability programs across ministries and Vision-2030 entities. Tier-2 and Tier-3 government bodies (universities, regulators, regional admin) are being told to "have an AI strategy" by year-end.
  • Banks under SAMA are running parallel AI initiatives — fraud, KYC summarisation, contact centre — under increasing regulatory scrutiny.
  • Mid-market enterprises (logistics, retail, healthcare networks, family offices) are watching the giants and want a credible mid-budget option for AI exploration.
  • The Big Four are quoting 8–14 weeks and $200k+ for AI strategy decks. Most clients can't afford that or won't.

The competitive shape

CompetitorStrengthWeakness Mihwar exploits
Big Four (Deloitte, EY, PwC, KPMG)Brand, regulatory comfort, large delivery teamsSlow, expensive, generic decks, junior delivery on senior pitch
BCG / Bain / McKinseyStrategy chops, board-level access$300k+ floor, no implementation grounding, no KSA-localised vendor view
Local SI consultanciesRelationships, ministry pre-qualsBody-shop economics, no productised IP, no AI-specific differentiation
Boutique AI shops (regional / overseas)Technical depthNo Arabic delivery, no PDPL fluency, no in-region presence
"AI strategy" SaaS toolsCheap, fastGeneric catalog, not grounded in client's actual stack, no consultant orchestration

The wedge

Mihwar plus NMO occupies a specific gap: a senior, KSA-fluent consultant team backed by a productised workflow that produces a verifiable, interactive deliverable in 7 days for a $25k anchor price. No Big Four competes there because their cost structure forbids it. No SaaS competes there because they have no senior consultant. No body-shop competes there because they have no productised IP.

Two-tier client thesis

Government clients

Vision-2030 entities, ministries, regulators.

  • Want: defensible architecture, PDPL/NCA compliance, bilingual deliverable for ministerial review.
  • Pain: Big Four cost, slow delivery, decks that don't survive procurement scrutiny.
  • Mihwar fit: Tier 2 (Blueprint + Playbook + RFP spec) at $40–60k. Stage 4 outputs an RFP-ready spec they can put to public tender.

Mid-market clients

Banks, logistics, retail, healthcare, family offices.

  • Want: a fast, defensible AI strategy that doesn't need a $300k commitment to start.
  • Pain: CFO won't sign $200k for a deck. CTO won't trust a $5k SaaS tool.
  • Mihwar fit: Tier 1 ($15–30k Blueprint), conversion to Tier 3 build later.

Phase 2 market

When Mihwar opens to clients directly (Phase 2), the addressable market widens substantially: every mid-market enterprise that does not need a consultant in the room but does need a structured visioning process becomes a buyer. Pricing shifts from engagement-based to per-seat or per-Blueprint subscription. NMO captures consultancies as a meta-tier — small AI shops who license the Mihwar engine and the catalog and use it inside their own client engagements.

Risk acknowledged
The KSA market is rapidly consolidating around 4–5 large preferred suppliers. Mihwar's window is the next 18 months. After that, the ground hardens. Every milestone in this masterplan is calibrated to that window.
Part A · Vision

Positioning & pricing

How Mihwar is sold, what it costs the client, and how it makes NMO defensibly profitable.

The three-tier pricing model

TierDeliverablePriceCycleMargin
Tier 1 · Blueprint Bilingual interactive HTML Blueprint signed off by client CTO. One 90-min walkthrough. $15–30k 1–2 weeks ≥75%
Tier 2 · Blueprint + Playbook Tier 1 plus 6-week build plan, risk register, vendor short-list, RFP-ready spec. $30–60k 2–3 weeks ≥65%
Tier 3 · End-to-end engagement Tier 2 plus orchestrated build (NMO Apex agents or partner squad). $120k+ 3–9 months 30–50% on build portion

The pricing anchor

Always quote the Blueprint price first. Even when a client wants a full build, the conversation starts with: "The first deliverable is the Blueprint. It's $25,000 and it takes us about a week. Once you have it, you'll decide what to build, when, and with whom — including potentially us." — Sales discipline

This anchors the value of discovery, separates it from build risk, and makes Tier 1 feel reasonable. Never quote a Tier 3 price first. It triggers procurement scrutiny that the engagement isn't sized for.

Phase 2 pricing (preview)

When Mihwar becomes a SaaS, pricing shifts. The Blueprint becomes a unit of work the customer self-produces; NMO charges for access to the engine and the catalog.

Phase 2 planAudiencePrice targetWhat's included
StarterSingle AI champion at a mid-market enterprise$1,200/mo or $9,600/yr1 workspace, 3 Blueprints/yr, premium catalog read-only, EN/AR
Team5-seat AI office$3,500/mo5 workspaces, unlimited Blueprints, custom branding, SSO
ConsultancyBoutique AI shops licensing Mihwar for their clients$25k/yr + per-BlueprintWhite-label, multi-client workspaces, customer-private catalog tier, NMO catalog as premium
EnterpriseLarge org with strict residency / SSO needsCustomDedicated tenant in-region, BYO IDP, audit export, contractual residency

Margin discipline

  • Phase 1: The Blueprint price minus the Anthropic/embeddings cost minus 1.5 days of senior consultant time must clear 65% margin. If a Blueprint burns more than 1.5 days of consultant attention, the workflow has failed.
  • Phase 2: Per-Blueprint AI cost must stay below $40 at the 95th percentile via prompt caching, two-tier model selection, and batch-API for non-realtime steps. See AI Economics.
House style
Every Mihwar deliverable carries a recognisable visual and verbal signature. Tight typography. No lorem-ipsum tone. Decisions named, not hedged. Trade-offs explicit. The deliverables look like $30k consulting artifacts, not "an AI generated this."
Part A · Vision

Personas

Mihwar serves four distinct user roles. The product treats each one differently. Phase 1 is built for the first three; Phase 2 adds the fourth.

Persona 1 · Ahmed (Founding Consultant) — the driver

Role
Founder of NMO Partners. Senior technologist. Runs every engagement personally in V1.
Touches the tool
Daily during active engagements. Power user.
Jobs to be done
Move clients from "we want AI" to a signed Blueprint in <7 working days. Maintain consistency across engagements. Build catalog IP.
Jobs avoided
Slow tooling. Generic AI voice. Forced sequence when context demands flexibility. Lost work.
Phase 1 access
Full admin. Phase 2: meta-admin role across all client tenants.

Persona 2 · NMO Consultant (future, Month 4+) — the growing team

Role
First or second consultant Ahmed hires as engagement volume grows.
Jobs to be done
Run engagements without Ahmed in the room, get the same Blueprint quality, learn from prior engagements (pattern reuse), hand off to Ahmed for review at clear checkpoints.
Implications for V1
RBAC is deferred to V2, but the data model supports per-user audit from day one. Every workspace action records the consultant who took it.

Persona 3 · Client CTO / Head of AI — the audience

Role
Senior technology leader at the client. Receives the Blueprint as the engagement deliverable.
Touches the tool
Indirectly: opens the Blueprint HTML, possibly fills async forms during Stage 2, attends the 90-min walkthrough.
Jobs to be done
Get a defensible, board-ready AI strategy. Know it is grounded in his/her actual stack. Be able to forward and search it.
Frustrations to avoid
Decks that look generic. Recommendations that ignore PDPL or KSA presence. Inability to verify source of a claim.

Persona 4 · Phase 2 self-serve client — the future buyer Phase 2

Role
Mid-market AI champion or in-house digital lead. Uses Mihwar without an NMO consultant in the room.
Touches the tool
Self-serve workspace, runs Stages 1–3 themselves, optionally pays NMO for a 2-hour expert review before Stage 4.
Jobs to be done
Get a board-ready AI roadmap. Reuse the Org Profile across multiple use cases. Export to Confluence / SharePoint / PDF.
Implications for design
The interview surfaces must work without a senior consultant interpreting them. Tooltips, examples, "what good looks like" hints throughout. Stage 2 must self-validate.
Phase 1 readiness
Already accounted for: the data model is multi-tenant, the workflow is structured, the catalog tier system is designed. Phase 2 is a deployment + UI polish, not a rewrite.
Why four personas matter for V1
Even though Phase 2 is months away, every V1 design decision is checked against all four. A surface that only Ahmed could love will not survive the pivot. A schema that only V1 needs will require a rewrite. Designing for the audience now costs little; designing for them later costs months.
Part A · Vision · New

The two-phase strategy

One codebase, two operating modes. Phase 1 is the consulting cockpit operated by NMO. Phase 2 is the self-serve platform operated by clients. The same engine drives both — the difference is who holds the steering wheel.

Why two phases, in this order

  • Phase 1 first earns the right to Phase 2. Without ten signed engagements producing real Blueprints, Mihwar has no proof, no testimonials, no catalog moat, and no understanding of where the workflow actually breaks for a non-expert user.
  • Phase 1 funds Phase 2. Each $25k Blueprint at 75% margin contributes $18k toward the SaaS lift. Five engagements pay for the Phase 2 build wholesale.
  • Phase 1 stress-tests every Phase 2 surface. Every UX paper-cut Ahmed hits with a real client is a paper-cut a self-serve user would have hit harder. Phase 1 is a moving usability test.
  • Phase 2 too early kills the consulting margin. Self-serve at $1,200/mo dilutes the perceived value of the $25k engagement. Phase 2 launches when Phase 1 is sold out, not before.

What Phase 1 builds that Phase 2 inherits

CapabilityPhase 1 usePhase 2 inheritance
Five-stage workflowNMO consultant runs itSelf-serve user runs it with embedded coaching
CatalogNMO's curated knowledge basePremium tier (NMO) + customer-private tier
Blueprint format$25k deliverableSelf-produced artifact
Multi-tenant data layer (RLS)One tenant: NMOMany tenants: subscribers
Org Infrastructure ProfileCaptured per engagement, reused on repeatCaptured per organisation, drives every Blueprint they make
aiproxy + AI economics disciplineCost control across few engagementsCost discipline at scale; per-tenant budget caps
Audit logPer-user actions for NMO teamCompliance trail for regulated subscribers

What Phase 2 adds on top of Phase 1

  • Identity provider integration: OIDC, SAML, Microsoft Entra, Google Workspace.
  • Self-serve onboarding: sign-up flow, email verification, workspace creation wizard, first-Blueprint guide.
  • Embedded coaching: the AI plays "consultant in the room" for users without one. Higher tooltips density, "show me an example" affordances, sample answers from the catalog.
  • Per-tenant customisation: theme, logo, custom blueprint cover page, branded export.
  • Billing & metering: Stripe, per-seat or per-Blueprint counters, hard caps, soft alerts.
  • Catalog tiering: NMO premium catalog (read-only, paid), customer-private catalog (writeable, scoped to that tenant).
  • Public Mihwar website & pricing page.

The phase-pivot triggers

Phase 2 development starts when any one of the following becomes true:

  • Demand pull: ≥10 distinct prospects ask "can we get a Mihwar login" within any 6-month window.
  • Capacity ceiling: NMO's consultant team is fully booked, pipeline is stronger than capacity, and adding consultants doesn't scale margin.
  • Catalog moat is mature: ≥300 entries with quarterly review cycles. Catalog itself is now a deliverable.
Until then
V1 stays disciplined. Phase 2 features only land in the codebase if they cost nothing to add now — schema columns, tenant scoping, request-id propagation, scrubbing middleware. UI for Phase 2 is built when the trigger fires, not before.
Part B · Product

The five-stage workflow

Mihwar's core mechanic. Five sequential stages, each producing a versioned artifact, each unlocking the next. The Architecture Gate between Stages 2 and 3 is the rule that earns Mihwar its existence.

The complete workflow

StageModeDurationOutputAI model
1 · Ideation LabLive workshop · Socratic AI60–90 minSharpened use case (1-pager)Claude Sonnet
2 · DiscoveryHybrid live + async forms2–3 elapsed daysInfrastructure inventoryHaiku for filtering · Sonnet for synthesis
⚑ Architecture Gate · Stage 3 locked until Stage 2 is signed off
3 · ArchitectureAI synthesis · consultant edits~1 dayUse Case BlueprintSonnet (extended thinking)
4 · PlaybookOptional · Tier 2+ only~1 dayBuild plan · risks · vendors · RFP specSonnet
5 · HandoffCompile · present · export90-min walkthroughFinal HTML Blueprint deliverable

The stage mechanics

Each stage is a panel in the Mihwar UI with three sub-panels:

  • The interview — a chat-like surface where the consultant works through prompts. The AI asks follow-ups; the consultant types, edits, or pastes.
  • The artifact — the structured output of the stage, continuously updated as the interview progresses. The consultant sees it forming.
  • The signoff — a single button at the bottom: "I am satisfied this stage is complete." Clicking it freezes the artifact, creates an immutable version row, and unlocks the next stage. The consultant always controls signoff — never the AI.

Why "the gate" matters

The Architecture Gate is the most important rule in Mihwar. Every other AI consulting tool will happily synthesise architecture from incomplete inputs, because the AI doesn't care. Mihwar refuses. The gate is what makes the deliverable trustworthy. — Architecture discipline

Concretely: when the consultant tries to advance to Stage 3, the system checks Stage 2 completeness against the use case category. Missing critical fields ("nobody has told us where the data is") block advance with a specific, actionable list. The consultant cannot bypass this from the UI; they would have to edit the database directly to override.

Stages can iterate
The flow is sequential, but versioned. If Stage 3 reveals a missing data point, the consultant can re-open Stage 2, capture it, re-sign off, and Stage 3 re-synthesises with the new context. The audit log records every back-and-forth.
Part B · Product · Stage 1

Stage 1 — The Ideation Lab

A 60–90 minute live conversation that turns a vague client wish ("we want to use AI in our call centre") into a sharp, scoped use case with measurable success criteria. Socratic AI interrogates ambiguity until consultant and client agree on what they're actually building.

When this stage runs

Typically the first or second meeting with a new client. The CTO has expressed interest, may have a fuzzy idea of what they want, and needs the consultant to help them sharpen it. The Lab can also be skipped if the client arrives with a fully-scoped use case (rare) — they get a discount for not needing it.

The six sharpening questions

Mihwar runs the Lab through six question phases. The AI generates the specific questions in context, but they always probe these dimensions:

#DimensionThe question behind the question
1The painWhat specific operational pain are we removing? Not "improving efficiency" — "reducing first-call resolution time from 14 minutes to under 6 minutes."
2The userWho is the human in the loop? Internal employee? External customer? Regulated principal?
3The current stateHow is this done today? With what tools, by whom, at what cost? Sketch the unhappy path.
4The success metricIf we did this perfectly, what number moves and by how much? Who measures it?
5The blast radiusWhat happens if the AI is wrong 5% of the time? 20%? Tolerable / catastrophic?
6The first-mile constraintsWho has the data? Who has the budget? Who must approve?

The AI's behaviour

The Lab uses Claude Sonnet (latest) with a system prompt that turns it into a Socratic interviewer. Behaviour rules:

  • Asks one question at a time. Never bundle two probes.
  • Reflects what it heard before moving on. Confirms understanding in the consultant's words.
  • Surfaces contradictions politely. "Earlier you said X. This sounds like Y. Which one is the real one?"
  • Refuses to recommend solutions. Stage 1 is about the problem, not the answer. If the consultant tries to leap forward, the AI parks the answer for later.
  • Honours house style. No exclamation marks. No "I'd love to help!" No emoji. Direct, professional, KSA-appropriate.

The artifact: the 1-pager

As the conversation progresses, the artifact panel renders a structured Use Case 1-pager:

USE CASE: [name]
PAIN:    [one sentence]
USER:    [persona, role, jurisdiction]
TODAY:   [current process, cost, owner]
TARGET:  [metric, baseline, goal, by when]
BLAST:   [tolerable failure modes, intolerable failure modes]
INPUTS:  [what data is needed, who owns it]
DECISION-OWNER: [who signs off the build]
OUT-OF-SCOPE: [explicit non-goals]

The signoff

When the consultant is satisfied, they hit "Sign off Stage 1". The 1-pager is frozen as v1. If they re-open later, edits create v2, v3, etc. — never overwrite. This becomes the input to Stage 2's question-set tailoring.

Phase 2 considerations Phase 2

For self-serve Phase 2 users, Stage 1 needs more scaffolding: example 1-pagers from the catalog ("see how a contact-centre AI was scoped"), inline tooltips that explain each dimension, and a "show me a strong answer" affordance on each prompt. The schema doesn't change — just the surface.

Cost note Cost
Stage 1 averages ~30 turns. With prompt caching on the system prompt + house style + catalog examples, marginal cost per turn is dominated by output tokens. Expected per-Lab spend: $1.20–$2.50 at $3/Mtok input, $15/Mtok output.
Part B · Product · Stage 2

Stage 2 — Discovery

The infrastructure inventory. The most labour-intensive stage and the one most clients hate. Mihwar's job is to make it bearable, structured, partially async — and to refuse to advance until it's actually complete.

Why this stage is the bottleneck

In a traditional engagement, Stage 2 takes 3–6 weeks. It's where consultants chase stakeholders for data dictionaries, screenshots of dashboards, license confirmations, GPU specs. It's where projects stall.

Mihwar compresses to 2–3 elapsed days by:

  • Generating only the questions that matter. The AI uses the Stage 1 1-pager to filter the discovery taxonomy down to ~30 questions out of a possible ~150.
  • Splitting live and async. Questions the consultant can answer live; questions that need a DBA or vendor contract get sent as a structured form to the right person via a single-use, time-limited link.
  • Auto-detecting completeness. The AI tells the consultant exactly which questions are still blocking architecture synthesis.
  • Reusing the Org Profile. If the client has done a previous engagement (or is a Phase 2 user), most infrastructure questions are already answered. See Org Infra Profile.

The discovery taxonomy

DomainWhat we capture
Data sourcesWarehouses (Teradata, Snowflake, BigQuery), lakes (S3, ADLS), operational DBs, file shares, SaaS APIs, Excel sprawl. License terms. Volume. Freshness. Owner.
ComputeCloud accounts, on-prem servers, GPU clusters, Kubernetes, VPS providers, edge devices. Capacity. Region. Procurement model.
Identity & accessIDP (Entra, Okta, custom), SSO state, MFA coverage, service-account hygiene, secret stores.
Network & perimeterVPN, ZTNA, private endpoints, egress controls, region restrictions, SAMA / NCA controls applicable.
Existing AI/MLModels in production, vendors used, licensing, evaluation discipline, MLOps maturity.
CompliancePDPL, SAMA, NCA ECC, sector-specific (healthcare, education). Data classification scheme.
PeopleSponsors, decision owners, champions, blockers. Skill availability.
Budget & procurementApproved spend envelope. Procurement vehicle (direct, RFP, framework). Vendor preferences.
ConstraintsResidency, on-prem mandates, vendor exclusions, contractual SLA shape, audit cadence.

Async forms — how they work

For each async question, Mihwar generates a single-use form link, scoped to the question, time-limited (default 7 days), bound to the recipient's email and IP-logged. The link looks like:

https://mihwar.nmopartners.com/async/01HV7Z9K3J5XPQ8WMY4N6T2RES

Recipients land on a clean, branded page with one or two questions, an "I don't know — ask X" escape, and a submit button. No login required. Submissions stream back into the consultant's Stage 2 panel.

Security flag Security
Async-link tokens are cryptographically random ≥128-bit (ULIDs server-generated via secrets.token_urlsafe(16) not uuid4 when used for auth). Single-use: marked consumed on first valid submission. Time-bound: hard expiry at 7 days, rejected at the API layer. IP-logged for audit. Form pages return generic errors on invalid/expired tokens, never leak whether the token existed. See Client Security & PDPL.

The completeness check

The AI maintains a running gate-check: which Stage 3 architecture decisions can be made given current Stage 2 inputs? The consultant sees this as a live readiness meter, with the specific blocking questions named:

Stage 3 readiness: 76% · 4 questions remain blocking
✓ Data residency captured
✓ Identity provider captured
✗ GPU availability — pending response from CloudOps (sent 3 days ago)
✗ PDPL classification of customer voice transcripts — pending Legal
✗ SAMA AI governance applicability — async link expired, resend?
✗ Production traffic peak — async link sent today

Phase 2 considerations Phase 2

Self-serve users don't have a consultant orchestrating Stage 2. Mihwar must:

  • Pre-populate from the persistent Org Profile wherever it overlaps.
  • Suggest who to ask for each blocker ("typically your DBA can answer this — here's a template message").
  • Allow inviting collaborators into the workspace to answer their slice directly.
Part B · Product · Stage 3

Stage 3 — Architecture synthesis

The AI proposes a complete reference architecture for the use case, grounded entirely in the client's actual infrastructure (Stage 2) and the curated catalog. The consultant reviews, edits, and signs off the result.

What "grounded" means

The AI is given:

  • The Stage 1 1-pager (sharp use case).
  • The Stage 2 inventory (what they have).
  • The full catalog, RAG-retrieved (vendors, models, frameworks, patterns, constraints).
  • The house style guide and banned-phrases list.

The AI is forbidden from:

  • Recommending a vendor not in the catalog.
  • Recommending a vendor without KSA presence when the inventory says residency is required.
  • Recommending a tool the inventory shows the client doesn't have a license for, without flagging procurement implications.
  • Inventing pricing.

The six architecture outputs

  1. Layered architecture diagram — auto-generated SVG using the 10-layer model from the AI Ecosystem Primer, populated with chosen tools.
  2. Component manifest — table of every component, its role, its catalog reference, the rationale.
  3. Data flow diagram — auto-generated, showing how data moves from sources to user-visible outputs and back.
  4. Trade-offs & alternatives — what was considered and rejected, and why. With explicit catalog references.
  5. Open questions — anything the AI flagged as needing human judgement before build commits.
  6. Compliance overlay — a separate read of the architecture against PDPL, SAMA, NCA controls, depending on what Stage 2 captured.

How synthesis actually runs

Synthesis is asynchronous. The consultant clicks "Generate Architecture v1"; the request lands in a background queue (BullMQ-equivalent on Redis). A worker:

  1. Loads the 1-pager, the inventory, the relevant catalog slice (RAG: top-K embeddings).
  2. Runs Sonnet with extended thinking enabled, system prompt cached via cache_control.
  3. Streams progress to the consultant's UI via SSE.
  4. Persists the result as stage_artifacts v1.
  5. Auto-renders the SVG diagrams from the structured component manifest.

Total elapsed time: typically 60–120 seconds. The consultant sees a "thinking…" beam during synthesis and reads the result when it lands.

Scale flag Scale
Synthesis is the heaviest call in Mihwar (10–30k output tokens with extended thinking). Running it inline in the request thread would block the API for two minutes. Background queue + SSE keeps the UI responsive and lets us retry on transient failures without losing user work. See System Architecture.

Editing the result

The consultant can:

  • Edit any field directly (textarea / structured editors per output).
  • Ask the AI to "regenerate just the trade-offs section with these adjustments."
  • Override a vendor recommendation and have the AI re-rationalise.
  • Accept and freeze the result, creating Stage 3 v1.
Cost flag Cost
One Stage 3 synthesis run costs ~$3–8 depending on extended-thinking depth. Prompt caching of the catalog snapshot (~50k tokens at hit rate >80%) dominates the savings. See AI Economics.
Part B · Product · Stage 4

Stage 4 — The Build Playbook

Optional but high-margin. Adds detailed build planning, risk register, vendor short-list, and reference repositories. Sold as Tier 2+ pricing. The Playbook is what a buy-side procurement officer actually reads.

Why this stage is optional

Many engagements stop at Stage 3 — the client signs off the Blueprint, takes it to their finance committee, comes back later for the build. Mihwar respects that — Stage 4 is opt-in and adds days, not hours.

When clients do want Stage 4, they're typically committed to building and need the planning rigour. They're paying $30–60k for the Blueprint+Playbook combo and they expect a deliverable they can hand to a build team.

The five Playbook outputs

  1. 6-Week Build Plan — week-by-week milestones, dependencies, owner per task. Conservative estimates.
  2. Risk Register — every risk identified during discovery and architecture, with severity, mitigation, owner.
  3. Vendor & Tooling Short-List — for every component, 1–3 specific vendors with KSA presence, pricing model, contact, last-reviewed date.
  4. Reference Repositories — pointers to NMO Apex's accumulated build patterns: starter Helm charts, FastAPI templates, Next.js shells. (Tier 3 only — IP is gated.)
  5. RFP Specification — for government clients: a procurement-ready scope of work, evaluation criteria, and acceptance test plan. Optional add-on.

The Risk Register format

RiskLikelihoodImpactMitigationOwner
Anthropic API quota tightened mid-buildMedHighMulti-region key, fallback to second model familyNMO platform lead
Customer voice transcripts contain PHI under MoH classificationHighHighPre-classify sample, redact pipeline before LLM, legal sign-off Week 1Client legal + NMO

The RFP spec

For government engagements, the RFP spec is the keystone. It mirrors the Blueprint structurally but reformats it as a procurement document: scope of work, deliverables, milestones, acceptance criteria, evaluation matrix, security clauses (NCA-ECC, PDPL), and pre-qualified vendor categories. The client's procurement team can lift it into their tender platform with minimal editing.

Margin discipline
Stage 4 is high-margin only if Stage 1–3 outputs are clean. If the consultant has to rebuild Stage 2 inventory in Stage 4 to get Vendor short-list right, the gate has been violated upstream.
Part B · Product · Stage 5

Stage 5 — Handoff

The final stage. Compiles the Blueprint, generates the proposal/scope document if relevant, and exports the deliverable.

What gets delivered

  • The Blueprint HTML — single self-contained file. Bilingual EN/AR. Mihwar branding minimal; NMO + client logos prominent. Opens in any browser, works offline, prints respectably.
  • Manifest & signature — embedded JSON manifest with version, signoffs, catalog snapshot hash, generation timestamp. Cryptographically signed.
  • Source pack (optional) — for clients on Tier 2+, a zip of the structured artifacts (1-pager, inventory, architecture, playbook) as JSON, for downstream tooling.
  • Walkthrough recording — Tier 1+ engagements include a 90-minute walkthrough. With consent, the recording becomes a deliverable too.

Presentation mode

Mihwar supports a presentation mode — full-screen, larger fonts, navigable page-by-page. The consultant shares the Blueprint screen, walks the client CTO through each section, answers questions, captures any final adjustments. Adjustments create a v(n+1) without invalidating the original signed manifest.

After handoff

The workspace doesn't disappear. Mihwar retains it indefinitely (subject to data retention policy). NMO can:

  • Reopen later to update the Blueprint if the client requests changes.
  • Reference patterns in future engagements (with consent and anonymisation).
  • Track which Blueprints converted to builds — feeds the catalog's "used in N past engagements" field.
  • Run quarterly catalog reviews against the body of past Blueprints to catch drift.
Retention & deletion Security
Default retention is indefinite for active clients. On client request (PDPL right of erasure) or at end of business relationship, the workspace can be hard-deleted with audit-logged confirmation. Catalog patterns derived from a deleted workspace stay in the catalog only if the rationale is structurally anonymised (no client name, no specific volumes). See Client Security & PDPL.
Part B · Product

The Blueprint format

The Blueprint is the deliverable. Everything in Mihwar exists to produce it. This page specifies exactly what it looks like, how it's structured, and why each design choice matters.

Five hard requirements

  1. Self-contained. Single HTML file, no external dependencies (one optional Google-Fonts link). Opens offline. Works on any device, any year.
  2. Bilingual. EN by default, AR view available with full RTL. Translation is consultant-reviewed, not raw machine output.
  3. Navigable. Sidebar TOC, in-page anchors, search box (Ctrl+K). The CTO must be able to find any claim in <10 seconds.
  4. Branded. Client logo, NMO logo, project name, version, document classification, date. Looks like a $30k document, not a generated artifact.
  5. Verifiable. A signed manifest and version stamp prove provenance. Reader can hash-check.

The Blueprint structure

§SectionContent
0CoverClient logo, project name, date, NMO logo, version, document classification
1Executive SummaryOne-page overview. The CFO reads only this.
2Use Case DefinitionThe Stage 1 1-pager, formatted
3Current StateStage 2 inventory, summarised — what they have today
4Proposed ArchitectureDiagram, component manifest, rationale
5Data & Agent FlowHow information and decisions move through the system
6Trade-Offs & AlternativesWhat we considered and rejected, and why
7Compliance & RiskPDPL / SAMA / NCA reading. Risk register summary.
8Build Playbook (Tier 2+)Plan, vendors, RFP spec
9GlossaryPlain-language definitions of every acronym used
AManifestVersions, signoffs, catalog hash, signature

Why HTML, not PDF

  • Searchable. Ctrl+F works inside the browser. PDFs garble across reader software.
  • Linkable. Internal anchors. The CTO can email a link to "§4.2 Vector Store choice" rather than "look at page 23 of the attached".
  • Copyable. Tables paste into Confluence. Code blocks copy clean. PDFs are write-only.
  • Live diagrams. SVG architecture diagrams scale on retina; PDF diagrams pixelate.
  • Forward-shippable. A 1.2MB HTML file forwards cleanly. A 40MB PDF gets stripped by mail filters.

The manifest

{
  "blueprint_id": "01HV8XQGT7K5R2W3M9N6P8Y4ZS",
  "client": "Tadawul",
  "project": "Customer Voice AI",
  "version": "1.0",
  "generated_at": "2026-05-07T14:32:18.420Z",
  "engagement_id": "eng-0042",
  "tenant_id": "nmo-001",
  "stages": {
    "stage_1": {"version": 3, "signed_off_by": "ahmed@nmopartners.com",
                "signed_at": "2026-05-03T10:14:00Z"},
    "stage_2": {"version": 5, "signed_off_at": "2026-05-05T16:22:00Z"},
    "stage_3": {"version": 2, "signed_off_at": "2026-05-06T09:08:00Z"}
  },
  "catalog_snapshot_hash": "sha256:7c4b8d…",
  "signing_key_id": "mihwar-prod-2026-05",
  "signature": "ed25519:0x4f2a1c9b…"
}
Why a manifest?
Six months from now, when a procurement officer asks "is this the same Blueprint we received in May?" — the manifest answers in 5 seconds. The Ed25519 signature also lets clients verify cryptographically that the Blueprint hasn't been tampered with after signoff.
Part B · Product

The Catalog

Mihwar's grounding source. A curated, opinionated reference of vendors, models, frameworks, and patterns — maintained by NMO, used by the AI for every recommendation.

Why a catalog

If the AI is allowed to recommend any vendor based on its training data, three things go wrong:

  • Hallucinations. It recommends vendors that don't exist in KSA, or quotes pricing from 2023.
  • Inconsistency. Two engagements get different recommendations for the same problem.
  • Loss of differentiation. NMO's Blueprint reads like every other consultancy's because it's drawn from the same public training data.

The catalog solves all three. It's NMO's opinionated knowledge base, evolving with every engagement.

The 10-layer architecture atlas

The catalog is organised around a 10-layer reference architecture covering the full AI stack. Every catalog entry attaches to one or more layers. The same atlas is used in Stage 3's auto-generated diagram.

Catalog schema

EntityFields
Vendorname, layers (1–10), region availability, KSA presence (none / partner / direct), pricing model, NMO opinion (rating 1–5 + notes), known limits, partner contacts, last reviewed
Modelname, family, provider, context window, cost/M input, cost/M output, languages (incl. AR strength), strengths, weaknesses, NMO opinion
Frameworkname, layer, license, language, maturity, NMO opinion, when-to-use, when-to-avoid
Patternname, problem solved, components used, reference repo, used in N past engagements, success notes
Constrainttype (PDPL, SAMA, NCA, on-prem-only, etc.), description, implication for architecture
Questiondomain, question text (EN + AR), category, async/live default, depends-on Stage-1 fields, gating Stage-3 decisions

Catalog seeding (V1)

The catalog is seeded from two sources:

  • The AI Ecosystem Primer (NMO's existing reference doc) — every vendor, model, and framework listed there imports as a catalog entry.
  • Pattern templates derived from NMO Apex's prior builds — contact-centre AI, document Q&A, fraud screening, KSA/AR-language tasks.

Seed target for V1: 80–120 vendors, 30 models, 20 frameworks, 12 patterns, 30 constraints, 150 questions. Within 6 months of operation: 200+ entries, quarterly review cycle.

Catalog tiers (Phase 2 preview) Phase 2

  • Public tier — minimal entries, good for the public Mihwar marketing site.
  • Premium tier (NMO) — the full opinionated catalog. Phase 2 subscribers get read-only access, NMO writes.
  • Customer-private tier — each Phase 2 tenant can add their own private entries (preferred internal vendors, contractual exceptions, data-classification overrides). Never visible to other tenants.
The catalog is the moat
The Mihwar workflow is copyable in 90 days. The Mihwar catalog — opinionated, KSA-localised, vendor-vetted, refreshed quarterly — is the moat. Every engagement adds rows. Every quarterly review prunes them.
Part B · Product · New

The Organisation Infrastructure Profile

Stage 2 asks 30+ questions about the client's stack. Most of those answers don't change between engagements with the same client. The Org Profile captures them once, persists them at the organisation level, and pre-populates every future Blueprint — both inside Phase 1 and across Phase 2.

The problem this solves

Today, when NMO does a second engagement with a returning client, the consultant manually re-enters 80% of Stage 2 — same Snowflake, same Entra tenant, same SAMA-registered subsidiary, same procurement rules. The client wonders why they're answering the same questions twice. Phase 2 makes this unbearable: a self-serve user shouldn't face a 30-question infrastructure quiz on every Blueprint they generate.

What the Org Profile captures

The Org Profile is a structured, versioned document attached to a tenant (Phase 2) or to a client entity within NMO's tenant (Phase 1). It mirrors the Stage 2 taxonomy:

SectionExamplesUpdate cadence
Identity & tenantLegal entity, sector, regulator(s), HQ region, employee count, AR/EN preferenceAnnual or on change
Data platformWarehouses, lakes, ETL tooling, BI tools, classification schemePer use case unless changed
Compute & cloudCloud accounts, regions, Kubernetes, GPU access, on-prem footprintQuarterly
Identity & securityIDP, MFA coverage, ZTNA, secret stores, SOC, incident response shapeAnnual
Compliance posturePDPL applicable, SAMA registered, NCA-ECC tier, sector controls (MoH, MoE)Annual or on regulatory change
ProcurementApproved vendor list, RFP framework, procurement vehicle, budget cycleAnnual
AI maturityModels in production, MLOps state, AI champion, governance committeePer use case
ConstraintsData residency mandates, vendor exclusions, on-prem-only systems, sovereign-cloud requirementAnnual or on change

How the Profile feeds discovery

Why this is a separate concept from Stage 2

  • Different lifecycle. Stage 2 inventory is engagement-bound and frozen with the Blueprint. The Org Profile lives across engagements and is updated in place.
  • Different ownership. Stage 2 is owned by the consultant. The Org Profile is owned by the client (Phase 2) or by the consultant on behalf of the client (Phase 1).
  • Different write surface. Stage 2 is a workflow. The Org Profile is a settings page.
  • Different security needs. Org Profile contains the sensitive long-form picture of the organisation — encrypt at rest at field level, see Mihwar's Own Security.

The "delta" experience

When a returning client starts a new engagement:

  1. Mihwar loads the Org Profile, marks it as the baseline for Stage 2.
  2. The AI scans the Stage 1 1-pager, identifies which Stage 2 questions are still unanswered or stale for this specific use case (e.g. a contact-centre use case might trigger questions about voice telephony platforms not relevant to a previous fraud-screening engagement).
  3. The consultant only sees the delta: ~5–8 use-case-specific questions instead of 30.
  4. On signoff, any edits flow back into the Org Profile (with an "this updates the profile?" confirmation), versioning the profile too.

Versioning

Org Profile is versioned in the same pattern as stage_artifacts: every meaningful update creates an immutable version row with author + timestamp. Stage 2 inventories link to the specific Profile version they were derived from — so re-reading a Blueprint a year later shows what the world looked like then, not now.

Security & privacy

Org Profile is sensitive Security
The Profile contains the most sensitive picture of the client: cloud accounts, security tooling, compliance posture, vendor names. Treat it as crown-jewel data. Encrypt at field level with per-tenant DEKs (KMS-wrapped). Restrict export. Audit every read. See Mihwar's Own Security.

Phase 2 fit Phase 2

The Org Profile is the central artifact of Phase 2. A self-serve user fills it once at onboarding (with a guided wizard), then every Blueprint they generate inherits it. Profile review becomes an annual event — pushed by Mihwar with email reminders. Without the Profile concept, Phase 2 is unusable; with it, the second Blueprint a customer generates feels effortless.

Part C · Architecture

System architecture

Five containers on a single VPS, joined to a private Docker network. Egress to Anthropic strictly through the aiproxy. Boring, well-understood, swap-safe.

The six containers

ContainerRolePortNotes
mihwar-webNext.js 14 frontend (SSR + SSE)3000 internalRenders the workspace UI & the Blueprint viewer
mihwar-apiFastAPI backend8000 internalAuth, data, async-link issuing, all business logic
mihwar-workerarq async workerStage 3 synthesis, embeddings, scheduled jobs
mihwar-aiproxyLiteLLM gateway4000 internalSingle egress to Anthropic + Voyage; cost meter; cache
mihwar-redisRedis 76380 internalQueue, session store, rate-limit counters, response cache
mihwar-postgresPostgres 16 + pgvector5435 internalPersistent state. Nightly backup. RLS enforced.

All containers run on a private Docker network mihwar_net. Only Caddy (managed by Coolify) is exposed to the public internet on 80/443. Postgres and Redis ports are never bound to host or to the public Apex network.

Request flow — typical write path

  1. Browser → Caddy (TLS, HSTS, CSP applied) → mihwar-web.
  2. Server component issues fetch to mihwar-api with the user's session cookie.
  3. API authenticates the session, attaches verified identity to the request context (contextvars), generates or accepts request_id.
  4. API authorises the action against tenant + workspace ownership.
  5. Long path: API enqueues a job on Redis (with request_id + user_id + tenant_id in payload) and returns 202 + job ID. Worker consumes, calls aiproxy, persists result.
  6. Short path: API writes to Postgres directly, returns 200.
  7. Streaming path (Stage 1 chat, Stage 3 progress): API holds an SSE connection, relays tokens from aiproxy as they arrive.

Single egress — why aiproxy

  • One key in one place. The Anthropic key lives only in the aiproxy environment. Worker / API / web never see it.
  • Cost meter. Every call is logged with model, input tokens, output tokens, cache hit, cost. See AI Economics.
  • Rate cap. Per-tenant + per-feature soft caps with hard refusal at threshold.
  • Model swap. Switching from Anthropic to a regional sovereign-cloud LLM in Phase 2 is a config change in aiproxy, not a code change in 14 places.
  • Cache. aiproxy can layer response-level caching for deterministic prompts (catalog questions, glossary expansions).

Outbound allowlist

The VPS firewall (UFW) restricts outbound traffic to:

  • api.anthropic.com, api.voyageai.com — from aiproxy only.
  • Coolify update servers, OS package mirrors — for system updates.
  • Backup target (object storage in a separate region) — encrypted, signed.

Anything else is denied by default. This kills two attack classes at once: data exfiltration via a compromised container, and prompt-injection-driven outbound calls.

Scale flag Scale
One VPS handles V1's load comfortably (~5 active engagements concurrently, <200 req/min peak). Phase 2 needs horizontal scale: read replicas, multiple worker nodes, Redis Sentinel, Anthropic key sharding. The architecture supports this — the data model is sharded by tenant_id and the work surfaces are stateless except for Postgres + Redis. See Multi-Tenancy.
Part C · Architecture

Data model

17 tables. Multi-tenant from day one. Versioned artifacts. Immutable audit log. Designed so Phase 2 doesn't require a rewrite.

Schema overview

TablePurpose
tenantsThe org owning a Mihwar instance. V1 has exactly one row (NMO). Phase 2 has many.
usersPeople who can log in. Belongs to a tenant.
sessionsLogin sessions. Cookie-bound, expiry-tracked, regenerated on login, IP-bound (soft).
service_principals newNon-user callers: aiproxy, worker, async-form-submitter, cron. Each has its own credential type.
clientsThe end-customer organisation NMO is consulting for. Belongs to a tenant. Owns Org Profiles.
org_profiles newThe persistent infrastructure profile of a client. Versioned. Field-level encrypted at rest for sensitive sections.
workspacesOne per client engagement. The unit of work. References the Org Profile version it started from.
workspace_membersWhich users have access to which workspace, at what role.
stage_artifactsThe output of each stage, per workspace. Versioned: every signoff creates a new immutable row.
messagesConversational log per stage — every AI exchange, every consultant entry. Linked to request_id.
catalog_entriesVendors, models, frameworks, patterns, constraints. Tenant-scoped (Phase 2 supports tier system).
questionsDiscovery question bank. Tenant-scoped, multilingual.
async_linksPer-stakeholder async form URLs. Time-limited, single-use, IP-logged.
async_responsesAnswers submitted via async links.
blueprintsCompiled Blueprint exports. Stored as both structured JSON and rendered HTML, with manifest hash.
audit_logImmutable. Every privileged action — signoffs, edits, exports, recommendations.
ai_callsEvery aiproxy call: input/output tokens, cache hits, model, cost, workspace, request_id, latency.

Versioning rules

All artifacts (stage_artifacts, blueprints, org_profiles) follow the same versioning pattern:

  • New row per signoff — never UPDATE.
  • version column auto-increments per parent.
  • signed_by + signed_at on the row that becomes "current".
  • parent_version link for diffing.
  • Working draft kept in a separate *_draft column or table; only frozen on signoff.

The tenant boundary

-- every business table has tenant_id with NOT NULL
ALTER TABLE workspaces ADD COLUMN tenant_id UUID NOT NULL
  REFERENCES tenants(id);
CREATE INDEX idx_workspaces_tenant ON workspaces(tenant_id);

-- row-level security enforced at the DB layer
ALTER TABLE workspaces ENABLE ROW LEVEL SECURITY;
CREATE POLICY ws_tenant_isolation ON workspaces
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- API sets the session-local var on every request
SET LOCAL app.tenant_id = '01HV8Z…';

Indexes — the non-negotiable list

  • Every business table: (tenant_id) first, then (tenant_id, workspace_id) compound.
  • messages(workspace_id, stage, created_at DESC) — chat history retrieval.
  • stage_artifacts(workspace_id, stage, version DESC) — load latest version fast.
  • audit_log(tenant_id, actor_id, created_at DESC) — operator Logs page queries.
  • ai_calls(tenant_id, created_at DESC), plus (tenant_id, feature, created_at DESC) — cost dashboards.
  • async_links(token_hash) unique — single lookup on form load.
  • pgvector: HNSW index on catalog_entries.embedding for RAG retrieval.
Scale flag Scale
Pagination is keyset, not OFFSET. Long lists (messages, audit_log, ai_calls) use (created_at, id) < cursor. OFFSET on a 5M-row audit_log will scan from the start every page; keyset stays O(log n). See Observability Logs page.

Migrations

Alembic. Every migration declares its indexes with CONCURRENTLY for tables expected to grow past 100k rows (messages, ai_calls, audit_log). Migrations are reviewed in PR before being applied — no auto-apply on deploy.

Part C · Architecture

Stack choices

Every choice annotated with why. The bias is toward boring, well-documented, swap-safe technology — the kind a future contributor will thank us for.

Backend

ChoiceWhy
Python 3.12The AI ecosystem is Python-native. Anthropic SDK, vector DBs, embeddings — all Python-first.
FastAPIModern async framework, OpenAPI auto-gen, Pydantic-driven request validation.
SQLModel + SQLAlchemy 2.0One model serves database + API. No drift between schema and types.
AlembicMature schema migration. Boring on purpose.
asyncpgFastest Postgres driver in Python.
arqLightweight Redis-backed task queue. Idempotency keys, retries, DLQ.
Anthropic SDK (Python)First-party. Streaming, tool use, prompt caching, extended thinking.
LiteLLMThe aiproxy. Single egress, model swap, cache, cost.
structlogStructured JSON logs with auto context. See Observability.
OpenTelemetry SDKTraces. Quiet in V1, ready for distributed in V3.

Frontend

ChoiceWhy
Next.js 14 (App Router)Server components reduce JS shipped to browser. Perfect for the Blueprint viewer.
TypeScriptCatches errors at build time. Required for a multi-month codebase.
Tailwind CSSUtility-first. Lets the LLM (Claude Code) write consistent components without designing from scratch each time.
shadcn/ui (selected)Composable, accessible. Lifted into the repo, not added as a dependency.
ZodShared validation between client and server. Pydantic models at the API end, Zod schemas at the form end, both generated from the same source.
SWRClient-side caching for read endpoints. Optimistic updates for the workshop UI.

Data & vectors

ChoiceWhy
Postgres 16RLS, JSONB, generated columns, extensions. The default for everything.
pgvector + HNSWCatalog has <5k entries — pgvector handles it well at this scale. Phase 2 may justify a dedicated vector store; pgvector is the right starting point.
Voyage AI embeddingsStrong multilingual including Arabic. Paid API, kept behind aiproxy.
Redis 7Queue, session, rate-limit, response cache. One tool, four jobs.

Infra & ops

ChoiceWhy
Hostinger KVM VPSPredictable cost, root access, KSA-adjacent regions. Sufficient for V1 throughput.
CoolifySelf-hosted deployment platform. Git-driven deploys, rollbacks, env management.
Caddy (managed by Coolify)Automatic TLS, HSTS, CSP injection.
Docker ComposeFive containers, one VPS. Kubernetes is overkill at this scale.
UFWOutbound allowlist, default deny.
CloudflareDNS, DDoS shield, optional country-restriction rules.
Bahrain S3-compatible object storageEncrypted backup target, separate region.

Why no microservices

Microservices are a horizontal-scaling pattern. Mihwar V1 has one tenant and a handful of users. Microservices would buy nothing and cost weeks of build time, more failure modes, harder local development. The shape of "5 containers, one VPS" lets us ship the workflow in 6 weeks. Phase 2 may eventually warrant horizontal scaling — but that's a graduation move, not a starting point.

Boring is a feature
Every choice on this page can be hired against in KSA today. Every choice has 5+ years of production track record. Every choice has a clear "what would replace this" answer if it ever needs to change.
Part C · Architecture

Multi-tenancy strategy

V1 has one tenant. Phase 2 may have hundreds. The data model and security boundaries are designed today so the V3 pivot is a deployment change, not a rewrite.

The core decision

Bake tenancy in from day one. This is the #1 architectural decision in this masterplan. Every AI startup that "added multi-tenancy later" rebuilt their backend at month 9. — Architectural commitment

The cost of doing it now is one column on a few tables and one Postgres feature (RLS). The cost of doing it later is months of refactoring while engagements are paused.

The three tenancy levels

LevelStatusWhere it livesWhat it gives
1 · Schema-awareV1tenant_id column on every business table; index leads with itCheap query scoping; trivial to add
2 · Row-level securityV1Postgres RLS policies use app.tenant_id session varDB enforces tenant isolation even if app has a bug
3 · Tenant context plumbingV1FastAPI dep extracts tenant from session, sets SET LOCAL app.tenant_id per requestApplication layer is incapable of cross-tenant queries by accident
4 · Per-tenant DEKV1 for sensitive fieldsKMS-wrapped data encryption keys, one per tenantField-level encryption for Org Profile sensitive sections; tenant deletion = key deletion
5 · Schema-per-tenantPhase 2 enterprise tierDedicated schema per tenant, switched via search_pathStronger isolation for regulated subscribers
6 · DB-per-tenantPhase 2 sovereign tierDedicated Postgres instance per tenant, deployed in-regionHard residency, full backup separation

Cross-tenant tests

Every CI run executes a "tenant fence test": create two tenants, two users, two workspaces. Authenticate as user-A. Try to read user-B's workspace, message, audit log, blueprint. Assert 404 (not 403 — 403 leaks the existence of the resource). The test fails the build if any cross-tenant read returns data.

Tenant deletion (Phase 2)

When a Phase 2 tenant cancels and confirms erasure:

  1. Org Profile DEK is destroyed in KMS — encrypted fields become unrecoverable.
  2. All tenant_id-scoped rows are hard-deleted in a single transaction.
  3. Backup retention for that tenant is honoured (90 days) then purged.
  4. An audit log entry is written to a separate tenant-deletion ledger (kept indefinitely for compliance reasons).
A common mistake to avoid
Do not stuff tenant_id into the JWT and trust it client-side. The client never names its own tenant. The server resolves session_id → user_id → tenant_id on every request and uses the server-resolved value. Trusting client-supplied tenant IDs is one of the top sources of multi-tenant data leaks.
Part C · Architecture

Client security & PDPL

Mihwar handles client data — sometimes sensitive infrastructure inventories, sometimes regulated information. Security is not a sprinkle on the end; it's a structural choice baked into the architecture.

The threat model

Mihwar must defend against, in order of likelihood:

  1. Accidental data exposure. Bug returning the wrong client's data. Mitigated by RLS at the database layer.
  2. Compromised API keys. Anthropic key leaked. Mitigated by aiproxy as single egress and Anthropic's per-key rate limits + outbound allowlist.
  3. Stolen session token. Mitigated by short cookie lifetime, httpOnly, SameSite=Strict, IP-binding (soft), regenerate-on-login, MFA on the passphrase.
  4. Unauthorised async-link access. Mitigated by single-use cryptographically-random tokens, time expiry, IP logging, generic 404 on invalid.
  5. Malicious prompt injection in client docs. Mitigated by input quarantine (untrusted data in a delimited <document> block, never as system prompt), tool-use isolation, output filtering before any tool invocation.
  6. Mihwar host compromise. Mitigated by isolated networks, encrypted backups offsite, no shared secrets between containers, OS hardening, regular patching. See Mihwar's Own Security.
  7. Insider error. Mitigated by per-user audit log, separation of admin vs operator roles, two-person sign-off for destructive actions (V3).

Authentication

  • Single passphrase + TOTP MFA for V1. Passphrase Argon2id-hashed (memory cost ≥64 MB, time cost ≥3, parallelism 1). MFA code stored as TOTP secret encrypted at rest.
  • Account lockout with exponential backoff after 5 failed attempts within 15 min. Logged.
  • Session cookies: httpOnly, Secure, SameSite=Strict, ≤8h lifetime, sliding refresh. Tokens are secrets.token_urlsafe(32) — 256 bits of entropy. Stored as SHA-256 hashes in the DB.
  • Session regenerated on login (no session fixation). Old session ID invalidated on logout.
  • Phase 2: SSO via OIDC (Microsoft Entra, Google Workspace, Okta) and SAML for enterprise tier.

Caller identity model new

Every API call identifies its caller before any work. Caller types are explicit and disjoint, each with its own credential mechanism:

Actor typeCredentialWhere it livesExample
userSession cookie (Argon2id-derived)Browser, httpOnlyAhmed running a Lab
serviceService token (random ≥256-bit)Container env, never loggedWorker calling api
agentTool-use token, scoped per callIssued per-job by APIaiproxy-driven tool call
webhookHMAC-signed payloadSigning secret rotated quarterlyAsync form submission
cronService token, restricted to cron pathsCoolify envNightly catalog re-embed

Verified identity is attached to the request context (contextvars) and used for every downstream check. Permission is checked against the verified caller, never against client-supplied identifiers. Rate limit applies per verified identity, not IP alone.

Authorisation

  • Default DENY at framework layer. Every endpoint declares its required permission explicitly.
  • Object-level ownership check on every read/write. "Does this user have access to this workspace?" — answered server-side, no exceptions.
  • tenant_id in every query at the app layer + RLS at the DB. Two layers of defence; the second one catches the first one's bugs.
  • Async-link tokens are scoped to a single question + recipient + workspace and expire. They are not session tokens.

Input validation

  • Zod / Pydantic schema at every API boundary. Reject malformed, never "clean".
  • Body size limits at HTTP layer (1 MB default; 10 MB for known upload paths).
  • File uploads: validate MIME + extension + magic bytes; UUID filename server-side; outside web root; AV scan on receive (ClamAV); never executed.
  • SQL: parameterised queries only. Dynamic identifiers (rare — ORDER BY columns) come from a server-side allowlist.
  • Command injection: never pass user input to shell. Argument arrays only.

XSS / CSRF / headers

  • Auto-escape via React. Raw HTML rendering only via DOMPurify with strict allowlist (used in Blueprint preview, never in chat).
  • Strict CSP: default-src 'self'; img-src 'self' data:; style-src 'self' 'unsafe-inline' fonts.googleapis.com; font-src fonts.gstatic.com; connect-src 'self' — no unsafe-eval, no unsafe-inline scripts. Nonces for inline if absolutely needed.
  • HSTS preloaded. X-Content-Type-Options: nosniff, Referrer-Policy: strict-origin-when-cross-origin, X-Frame-Options: DENY, Permissions-Policy tightly restricted.
  • Strip Server / X-Powered-By.
  • CSRF: SameSite=Strict cookies + double-submit token on state-changing endpoints from Next.js Server Actions.

Data protection

  • TLS 1.2+ everywhere. HSTS enabled.
  • Sensitive fields at rest: Org Profile sensitive sections (cloud account IDs, security tooling vendor names, regulatory pointers) and the discovery inventory free-text fields are field-level AES-256-GCM encrypted using per-tenant DEKs wrapped by a master KEK in KMS.
  • Display masking: sensitive fields show •••• •••• until explicitly revealed; reveal is audit-logged.
  • Data minimisation: Stage 2 captures structural facts, not raw documents. If the consultant pastes a vendor contract into chat, Mihwar warns and recommends extracting only the structural answer.
  • Retention: active workspaces indefinite; closed workspaces 5 years (PDPL records-of-processing rationale); audit log indefinite or per-legal; chat messages 2 years; ai_calls 1 year aggregated then summarised.

PDPL compliance

The Saudi Personal Data Protection Law applies whenever Mihwar processes personal data of KSA residents. Key obligations:

  • Lawful basis. Discovery interviews capture infrastructure data and stakeholder names — processed under "performance of a contract" with NMO's client. Consent collected separately for case-study reuse.
  • Data residency. If a client requests residency, NMO can deploy a dedicated Mihwar instance in a Bahrain or Riyadh region. Default Hostinger VPS is sufficient for most engagements but does not meet strict residency for SAMA-Tier-1 banks. Phase 2 enterprise tier ships with explicit residency contractual commitments.
  • AI calls. Anthropic's API processes data outside KSA. Disclosed in client engagement agreement. For residency-sensitive clients, Phase 2 sovereign tier routes through a regional model deployment via aiproxy.
  • Data subject rights. Right of access (export Org Profile + Blueprint history). Right of erasure (tenant deletion flow above). Right of correction (edit Org Profile, versioned).
  • Breach notification. Documented runbook (see Ops Handbook) — SDAIA notification within 72h for qualifying breaches.

Prompt-injection defence

Untrusted input — pasted client docs, async form responses, third-party content — is treated as data, not instruction:

  • Wrapped in delimited blocks (<document index="1">…</document>) in the prompt. The system prompt instructs the model to treat block content as data only.
  • Tool-use is gated: any tool that would write to the database, send an email, or call an external API requires explicit consultant confirmation in the UI before execution. The AI cannot autonomously act.
  • Output goes through a guardrail check before any side-effect: if the AI emits an unexpected tool call, malformed JSON, or attempts to address the user with apparent privileged instructions, the call is rejected and logged as a suspected injection.

Dependency hygiene

  • Pinned versions, lockfiles committed (uv.lock or poetry.lock for Python; pnpm-lock.yaml for JS).
  • CI runs pip-audit + pnpm audit + trivy fs+image + gitleaks. HIGH or CRITICAL fails the build.
  • One-paragraph justification on every new dependency in the PR description.
  • Quarterly dependency review in addition to CI gating.

Errors & information disclosure

Production errors return: {"error":"internal","reference":"ERR-7K2P9X"}. The reference ID maps server-side to the full stack trace + request_id + tenant_id + user_id. Stack traces, paths, and schema info never reach the client. The Logs page lets the operator look up any reference ID in 5 seconds.

CRITICAL flag Security
Any HIGH or CRITICAL security risk identified in this section is fixed before deployment, not behind a feature flag. The list above is the working contract; deviations require explicit, dated, written exceptions.
Part C · Architecture · New

Mihwar's own security & infrastructure

We give clients $30k consulting on AI architecture security. We will not run a sloppy host. This page is the rigour we apply to Mihwar itself — what we lock down, how we patch, where the keys live, what the backups look like, who responds when something breaks.

The principle

A consultancy that runs sloppy infrastructure cannot credibly sell architecture advice. Mihwar's own posture is the first thing a security-aware client will probe — and it had better answer well. — Operating standard

VPS hardening — day one

  • Non-root user. No login as root. SSH key-only, password auth disabled.
  • SSH: port moved off 22 (low-effort but cuts ambient scan noise), fail2ban with bans on auth failure, allowed-from-IP list for the operator's static IPs (with break-glass procedure documented).
  • Firewall (UFW): default deny inbound; allow only :443 (Caddy), the moved SSH port, and the WireGuard endpoint for ops access. Default deny outbound; allowlist Anthropic/Voyage/Coolify/backups.
  • OS: unattended-upgrades enabled for security patches. Auto-reboot scheduled in low-traffic window with notification.
  • Auditd running with rules for SSH login, sudo, and config-file modification. Logs ship to the same structured pipeline as application logs.
  • No bare ports. Postgres + Redis + aiproxy + worker bind to 127.0.0.1 on the host, exposed to other containers via the Docker network only.
  • WireGuard ops VPN. Admin / DB / Coolify dashboards reachable only inside the VPN, not on public internet.

Secrets management

SecretWhere it livesRotation
Anthropic API keyaiproxy env (Coolify-injected)Quarterly + on suspicion
Voyage API keyaiproxy envQuarterly
Postgres superuserCoolify-managed, never in repoAnnual
App DB userCoolify env, least-privilegeAnnual
Session signing secretCoolify env, ≥256-bitQuarterly
HMAC webhook secret (async forms)Coolify envQuarterly
Blueprint signing key (Ed25519)aiproxy env, archived versions kept for verificationAnnual
KMS master keyExternal KMS (DigitalOcean / Hetzner / cloud-managed)Annual + on suspicion
Per-tenant DEKKMS-wrapped in DB; plaintext only in app memory at request timeOn tenant request or annually
Backup encryption passphraseOffline copy in a 1Password vault + sealed envelope physically heldAnnual

Never in source. A pre-commit hook (gitleaks) and CI scan reject any push that looks like a secret. The .env.example file is committed with placeholder values; the real .env is gitignored and lives only on the VPS via Coolify.

Database safety

  • App connects as a least-privilege role (no DROP, no ALTER, no TRUNCATE). Migrations run as a separate role only during deploys.
  • Daily logical backup via pg_dump, encrypted client-side with the backup passphrase, shipped to off-region object storage. 30-day retention.
  • Weekly base backup + WAL archiving for PITR (point-in-time recovery up to 7 days).
  • Quarterly restore drill — restore yesterday's backup into a sandbox, confirm checksum match, walk a smoke test, document timing.
  • Backup encryption verified by openssl enc -d dry-run weekly via cron; alert on failure.

Container safety

  • Images pinned by digest, not :latest.
  • Read-only root filesystem where the app permits (Postgres and Redis need writable; api / web / aiproxy / worker can all run RO).
  • Drop all Linux capabilities except those the process actually needs. No --privileged.
  • Secrets injected via env, never baked into images. Build args reviewed.
  • Docker socket NOT mounted into any application container.
  • trivy image scans every image at build time; HIGH/CRITICAL fails the deploy.

CI/CD security

  • GitHub Actions runners use OIDC to fetch deploy credentials — no long-lived secrets in repo settings.
  • Branch protection on main: required reviews, required status checks (lint, types, tests, vuln scan, gitleaks).
  • Signed commits enforced (Ahmed's GPG key documented).
  • Deploy is a Coolify webhook fired by CI on green main. Deploys produce a release tag + git SHA + image digest record.
  • Rollback is one click in Coolify or one command on the VPS — last 5 deploys retained.

Incident response

A documented runbook in /srv/mihwar/runbooks/incident.md on the VPS itself (so it's available even if the website is down). Phases:

  1. Detect. Alerts (cost spike, 5xx burst, auth-failure burst, backup failure).
  2. Triage. Severity classification: data exposure / availability / cost / minor.
  3. Contain. Standard containment per severity. For suspected key leak: aiproxy rotates the Anthropic key immediately and revokes; new key activated within 10 minutes.
  4. Communicate. Active engagement clients told within 24h if their data plausibly affected. SDAIA notification within 72h for qualifying PDPL breaches.
  5. Restore. Per the playbook for each scenario.
  6. Postmortem. Blameless within 7 days. Lessons feed the catalog and the runbook.

Disaster recovery

ScenarioRPORTOProcedure
VPS lost (provider outage)≤24h (last backup)≤4hProvision new VPS via Terraform-recipes (kept in repo); Coolify recovery; restore latest backup; rotate all secrets; validate.
Database corruption≤1h (WAL)≤2hPITR to last clean point; replay missed work from messages log + audit log; client notification if signoffs invalidated.
Anthropic API outageaiproxy fails open with a "synthesis temporarily unavailable" UI message. Background queue retains jobs; resumes on recovery.
Key compromise≤30 min to rotateRunbook drives rotation: Anthropic key, session signing, KMS keys (with re-wrap), HMAC, Ed25519 signing.
Single-passphrase compromise≤10 minForce logout all sessions, rotate passphrase + TOTP, audit-log review for unexpected actions.

Monitoring & alerts

  • Health checks: /health on api & web; Caddy probes them every 30s. Down for >2 min → Pushover alert to Ahmed.
  • Cost alerts: hourly aiproxy cost > threshold (e.g. $10/h sustained 2h) → alert. Daily $200+ → page.
  • Auth alerts: 10+ failed logins in 5 min from any IP → notify; specific user 5+ in 15 min → forced lockout + alert.
  • Backup alerts: nightly backup not produced by 03:00 → page. Backup checksum failure → page.
  • Disk: <15% free → notify; <5% → page.
  • Certificate: Caddy auto-renews; alert if renewal fails.

Annual security tasks

  • External penetration test against staging environment (post-Phase 1, before Phase 2 launch).
  • Restore drill with timing measurement.
  • Key rotation: KMS master, signing keys, all long-lived secrets.
  • Dependency review beyond CI: pruning unused, evaluating maintenance state.
  • Access review: who has VPS access, who has Coolify dashboard access, who has DB access — and is that still right?
  • Runbook tabletop: walk an incident scenario end-to-end with the team.

What gets deferred to Phase 2

  • SOC 2 Type II evidence (start collecting in V1; certify after the second full year of operation).
  • Bug-bounty program.
  • WAF in front of Caddy (Cloudflare provides much of this for free; managed WAF added when client demand justifies).
  • Customer-facing security portal with SOC reports + sub-processor list.
  • Dedicated SIEM. (V1 ships logs to a structured pipeline — see Observability — and grep-via-Logs-page covers V1 needs.)
Posture summary
None of the above is exotic. All of it is achievable in 6 weeks for a disciplined operator. The whole point: "boring, well-executed, documented" beats "modern, half-implemented, undocumented" every time. This is the rigour we sell.
Part C · Architecture · New

AI economics

Mihwar is built on Claude. Claude is the most expensive line item in the operating cost. The discipline that keeps a $25k Blueprint at 75% margin in Phase 1 — and makes a $1,200/mo subscription affordable in Phase 2 — is on this page.

The unit economics, modelled

Before any feature ships, we model: cost per call × calls per Blueprint × Blueprints per month. The targets:

StageCalls / BlueprintAvg cost / callCost contribution
Stage 1 · Lab (Sonnet · streaming)~30 turns$0.05–$0.12~$1.50–$3.50
Stage 2 · Discovery filtering (Haiku)~5 calls$0.01–$0.03~$0.10
Stage 2 · Async prompt drafting (Haiku)~10 calls$0.01~$0.10
Stage 3 · Synthesis (Sonnet · ext. thinking)1–3 generations$3–$8~$5–$20
Stage 4 · Playbook generation1–2 generations$2–$5~$3–$8
Embedding catalog reads (Voyage)~50 lookups$0.0005~$0.03
Total per Blueprint (target)$10–$32

At a $25k Blueprint, AI cost is ≤0.13% of revenue. The discipline below is what keeps it there.

The seven levers

1 · Smallest capable model

Two-tier routing throughout. Haiku handles: discovery question filtering, async prompt drafting, glossary expansion, classification (is this an inventory question or a use-case question?), single-turn lookups, simple tool selection. Sonnet handles: Lab interviewing, architecture synthesis, Playbook generation, anything where reasoning quality matters. Never use Opus unless an unsolved-for-Sonnet workload appears — and that becomes a separate budgeted decision.

2 · Prompt caching (every static prefix)

The catalog snapshot, the house style guide, and the system prompt for each stage are cached via cache_control: ephemeral. Order: stable → variable. Verify on call #2+ that cache_read_input_tokens > 0; if zero, the prefix is drifting (timestamp, random tool order, mutable preamble). Hit rate target: ≥80% for repeated within a 5-min cache TTL.

messages = client.messages.create(
  model="claude-sonnet-4-6",
  system=[
    {"type":"text","text":HOUSE_STYLE,
     "cache_control":{"type":"ephemeral"}},
    {"type":"text","text":CATALOG_SNAPSHOT,   # ~50k tokens
     "cache_control":{"type":"ephemeral"}},
    {"type":"text","text":STAGE3_PROMPT,
     "cache_control":{"type":"ephemeral"}},
  ],
  messages=conversation_history,
  max_tokens=4096,
)
# log: input_tokens, cache_read_tokens, cache_creation_tokens

3 · Batch API for non-realtime work

50% off — used for: nightly catalog re-embedding, retroactive question generation when the catalog changes, eval runs against past Blueprints to spot regressions, scheduled summarisation of long workspace histories. Anything tolerating >seconds latency.

4 · Context discipline

  • Never dump full conversation history. Past turns >20 are summarised into a "rolling synopsis" by Haiku and re-injected.
  • Never pass the full catalog. RAG with Voyage embeddings → top-K (typically 8–15 entries).
  • Cap max_tokens on every call. Stage 1 turn: 1024. Stage 3 synthesis: 8192. Async draft: 256.
  • Ask for terse output / structured JSON in the system prompt. "No preamble. JSON only." saves 10–20% output tokens.
  • Stream + cancel for user-cancellable surfaces (chat). Cancel kills the call mid-token; tokens to that point are still billed but the rest is not.

5 · Response caching

Hash (model, prompt, tools, temperature) → cache the response in Redis for hours when the prompt is non-personalised (catalog-only Q&A, glossary expansions). Semantic cache for near-duplicate queries (cosine ≥ 0.95) — not used in V1 but designed-for. Pre-compute predictable queries on a schedule (e.g. "expand each catalog entry into a one-paragraph summary" — done in Batch API, served from cache).

6 · Cheaper alternatives first

Before any LLM call, the question: is there a regex / SQL aggregation / classical-ML / rules path that gets us to the answer 100×–10,000× cheaper? Examples in Mihwar:

  • Email validation in async forms — regex, not Sonnet.
  • Stage 2 question selection when the use case category is well-known — rules-based filter, not "ask the LLM which questions to ask".
  • Markdown rendering of artifacts — server-side renderer, not "ask Sonnet to format this nicely".
  • PII scrubbing in logs — regex blocklist, not LLM.

7 · Agentic loop budgets

Any tool-using flow caps max_iterations (default 10) AND max_tokens_per_session (default 30k). Tool selection is done by Haiku where possible. Tool results are cached. Independent tool calls are parallelised. The loop refuses on hitting a budget rather than spending unbounded.

Per-call observability

Every aiproxy call writes a row to ai_calls:

{
  "request_id": "01HV…",
  "tenant_id": "nmo-001",
  "user_id": "ahmed",
  "workspace_id": "ws-0042",
  "feature": "stage3.synthesis",
  "model": "claude-sonnet-4-6",
  "input_tokens": 52800,
  "cache_read_tokens": 50100,
  "cache_creation_tokens": 0,
  "output_tokens": 4200,
  "latency_ms": 38400,
  "cost_usd": 0.279,
  "cache_hit_rate": 0.949
}

Per-feature / per-tenant cost dashboards

The operator Logs page (see Observability) includes a Cost view: per-feature bar chart for the last 30 days, per-tenant ranking, anomaly highlights (a tenant burning 10× their normal rate). Drilling in shows the calls behind any bar.

Hard budget caps

  • Per-tenant monthly cap. Soft warn at 80%, hard refuse at 100%. Phase 1: $500 default for NMO (well above expected). Phase 2: tied to subscription tier.
  • Per-feature daily cap. Stage 3 synthesis: 50 generations / day / tenant. Beyond is rate-limited with a clear UI message.
  • Per-user 1-min cap. Anti-runaway: 20 calls in 60 seconds → temporary cooldown.

Phase 2 cost discipline Phase 2

Phase 2 changes the math: many tenants, lower revenue per Blueprint, more risk of pathological usage. Discipline tightens:

  • Subscription tiers include Blueprint-count caps (Starter: 3 Blueprints/yr; Team: unlimited within $1,200/mo notional cost; overage charged).
  • Per-tenant aiproxy budget enforced atomically: counter incremented in Redis, hard refused at threshold, transparent UI.
  • Cheap-path features promoted: Org Profile reuse cuts ~40% of Stage 2 cost on repeat Blueprints; the cost saved becomes Phase 2 margin.
  • Free-tier-evaluation: a "draft Blueprint" mode using only Haiku for evaluation prospects, with clear "upgrade to full" CTA.
Cost flag Cost · the rule
If a feature would push expected cost-per-Blueprint > $40, it is redesigned (cheaper model, cached prefix, RAG instead of dump, batch instead of realtime) before shipping. There is no "we'll optimise later" — later is when the cost has already trained users to expect the feature.
Part C · Architecture · New

Observability & the Logs page

Every product Mihwar produces ships with an operator Logs page on day one. We hold ourselves to the same standard: when a client says "something happened on Tuesday at 3pm", an operator can reconstruct it in 60 seconds.

The principle

Logs are not for grep'ing on the day of an incident. Logs are the system's memory. Mihwar's logs let an operator at 2am, six months from now, answer: which user did what, with what data, when, with what result, and what did the system do downstream?

The mandatory envelope

Every line of every service is structured JSON, one event per line, with this envelope:

{
  "timestamp": "2026-05-07T14:32:18.420Z",
  "level": "info",
  "service": "mihwar-api",
  "env": "prod",
  "event": "stage.signoff",
  "message": "Stage 2 signed off",

  "request_id": "01HV8Z9K3J5XPQ8WMY4N6T2RES",
  "tenant_id": "nmo-001",
  "user_id": "ahmed",
  "actor_type": "user",
  "session_id_hash": "sha256:7c4b…",
  "ip": "91.193.x.x",
  "user_agent": "Mozilla/5.0 …",

  "workspace_id": "ws-0042",
  "stage": 2,
  "version": 5,
  "duration_ms": 14
}

Identity on every line

  • user_id — stable internal ID, never email. Explicit null with reason for unauthenticated paths.
  • tenant_id — required on every request/job line. No exception.
  • actor_typeuser | service | agent | webhook | cron | system.
  • request_id — generated at the edge (Caddy via X-Request-Id if present, else minted by api). Propagates to every downstream call.
  • session_id_hash — for grouping a user's actions in a session without exposing the raw token.
  • Login attempts log the email/username attempted. Never the password attempted.

Request-id propagation

What gets logged

Event classExamplesReason
Auth eventsauth.login.success, auth.login.failure, auth.logout, auth.mfa.enrolled, auth.token.refresh, auth.lockoutForensic reconstruction of who-was-where.
Sensitive readsorg_profile.read with field list, blueprint.exportPDPL audit trail for personal/regulated data access.
WritesStage signoffs, profile updates with compact diff of changed fieldsReconstruct what changed when a client disputes a recommendation.
External callsaiproxy.call with model, status, latency, retries, costCost forensics; vendor incident correlation.
Jobsjob.enqueued, job.started, job.succeeded, job.failed, job.dead_lettered"Why did Stage 3 never finish?" answered in 5s.
ErrorsStack trace + reference ID + tenant + user + request_idMap a client's "ERR-7K2P" reference back to root cause.
Async link eventsasync.issued, async.opened, async.submitted, async.expiredForensics on form-based data submissions.

What never gets logged

  • Passwords / token values / API keys / JWTs / session secrets / refresh tokens / signing keys / encryption keys / TLS private keys.
  • Full credit-card numbers / CVVs / bank-account numbers / full national IDs / passport numbers.
  • Raw request bodies for password / payment / sensitive-PII endpoints.
  • Authorization headers / session cookies / any credential-bearing header.
  • Full personal addresses / phone numbers / email addresses unless the event specifically requires them (login attempts include the username; profile updates include the changed field but with the value redacted unless it's structurally non-PII).

Scrubbing middleware

Two-layer defence. Field-name blocklist (password, token, secret, cookie, authorization, api_key, plus tenant-specific entries) recursively replaces values with ***REDACTED***. Value-pattern scrubbing catches credit-card / JWT / AWS-key shapes regardless of field name. Unit tests assert that a known sensitive payload never reaches the sink intact — these tests fail the build.

The Logs page

Every Mihwar product has a Logs page from V1 — including Mihwar itself. Operator UI features:

  • Filter by user_id / tenant_id / request_id / event class / level / service / time range.
  • One-click "all events for this request_id" — joins every line across services into a chronological view.
  • One-click "all events for this user_id in last N hours" — for forensic and support workflows.
  • One-click "trace this error reference" — paste an ERR-… code, see the stack + context.
  • Cost view (see AI Economics): per-feature, per-tenant, per-user.
  • Permission-gated: logs:read for general; logs:read:sensitive for sensitive-read events; logs:export for CSV export with audit-log entry per export.
  • Export with row cap (10k default) and audit log entry stating who exported what window.

Retention

ClassHot retentionCold retention
App logs (info)30 daysCompressed off-host for 90 days, then deleted
Errors / warns90 daysOff-host for 1 year
Audit log (auth, permissions, sensitive reads, admin)1 year hotIndefinite cold storage with integrity hashing
ai_calls90 days rawAggregated (per-feature daily) kept indefinitely
Debug logs≤7 days, off in prod by default

Distributed-tracing readiness

OpenTelemetry SDK is wired in V1 but quiet. Spans are created for: HTTP request, DB query, aiproxy call, queue job. Exporter is configured but pointed at a dev sink. When Phase 2 demands distributed tracing (e.g. dedicated DB tier triggers cross-host calls), turning on Tempo / Honeycomb / cloud trace is a config change, not a code change.

Alerts driven by logs

  • 5xx burst (10+ in 60s on api) → page.
  • Auth-failure burst → Pushover.
  • Backup-not-emitted by 03:00 → page.
  • aiproxy cost > threshold/hour → notify.
  • Worker DLQ depth > 0 → notify within 15 min.
  • Async-link bulk submission anomaly (e.g. 50 submissions on one token in 1 minute) → flag suspected abuse.
Observability flag Observability
The Logs page is not a v2 feature. It ships in Week 4 of the V1 build. Without it, Mihwar is uninvestigable; with it, Mihwar is operable by one person at 2am.
Part D · Build

6-week roadmap

From empty Hostinger directory to first signed-off engagement Blueprint in six calendar weeks. Built primarily by Claude Code with Ahmed reviewing and steering.

Build philosophy

Mihwar is built in vertical slices: each week ends with something demoable, not a half-finished horizontal layer. By Week 2 there's a working login and a working Lab. By Week 4 there's a working full Stage 1 → Stage 3 → Blueprint export. Weeks 5 and 6 are polish, AR localisation, and dogfooding on a real client engagement.

Week 1 · Foundation — VPS, Docker, schema, auth

  • Provision mihwar.nmopartners.com on Hostinger.
  • Set up mihwar_net with Docker Compose: postgres, redis, aiproxy, api skeleton, web skeleton, worker.
  • Implement single-passphrase auth + TOTP scaffolding.
  • Define and migrate the 17-table schema (incl. service_principals, org_profiles).
  • Set up Coolify deployment with branch-protected GitHub repo.
  • VPS hardening per Mihwar's Own Security: SSH, UFW, fail2ban, WireGuard.

Demo at end of week: Ahmed logs in to an empty workspace UI on a real domain, healthcheck green, all containers running, backups firing nightly.

Week 2 · Stage 1 Lab — fully working

  • Workspace shell (sidebar, stage panel layout, theme toggle).
  • Stage 1 end-to-end: chat surface, streaming AI responses (Sonnet via aiproxy), live-updating 1-pager artifact panel, signoff button.
  • Seed catalog with ~10 sample entries (just enough to test).
  • House-style prompt + banned-phrases filter live.
  • Prompt caching wired and verified (cache_read_input_tokens > 0 by call 2).

Demo: Ahmed runs a full Lab session with a real client, produces a 1-pager.

Week 3 · Stage 2 Discovery + Org Profile

  • Discovery taxonomy + question selection logic (Haiku-driven).
  • Org Profile schema + settings UI + field-level encryption with per-tenant DEK.
  • Async link issuing, form rendering at /async/{token}, response capture.
  • Stage 2 readiness meter live.
  • Stage 3 unlock gate enforced server-side.

Demo: a Stage 2 inventory completed across two live answers + three async-form submissions, with Stage 3 cleanly unlocked.

Week 4 · Stage 3 Architecture synthesis + Logs page

  • arq worker + Stage 3 synthesis job.
  • Catalog RAG via pgvector + Voyage embeddings.
  • SVG diagram auto-generation from component manifest.
  • Blueprint v1 compile (HTML render with all sections).
  • Logs page MVP: filter by request_id / user_id / tenant_id, "all events for this request" join.
  • ai_calls table + cost view in Logs page.

Demo: empty workspace → Stage 1 → Stage 2 → Stage 3 → click "Compile Blueprint" → bilingual HTML opens. Logs page shows the entire chain by request_id.

Week 5 · Stage 4 Playbook + AR localisation polish

  • Stage 4 implementation (build plan, risk register, vendor short-list, RFP spec template).
  • Full AR translation review of UI + Blueprint render. RTL polish.
  • Presentation mode for Blueprint walkthrough.
  • Manifest signing (Ed25519).
  • Catalog seeding from AI Ecosystem Primer (target 80–120 entries).

Demo: a full Tier 2 engagement walked end-to-end with all five stages, bilingual export, signed manifest verifiable.

Week 6 · Dogfood + ship

  • Run the first paying engagement on the live tool.
  • Track every paper-cut Ahmed hits — fix the breaking ones, file the rest.
  • External penetration test booked (runs in Week 8 against staging clone).
  • Backup restore drill — full recovery from yesterday's backup into a sandbox.
  • Final audit: master pre-commit checklist (see top of doc) — every box ticked or explicitly deferred with date.

Demo: First $25k Blueprint shipped. Mihwar is real.

Slip discipline
If Week 4 doesn't land Stage 3 working, the Playbook (Week 5) drops first — never compromise the gate or the Logs page. Mihwar without those is a different product, less defensible.
Part D · Build

Claude Code prompts

Five sequential prompts covering the build. Each prompt is self-contained and is run inside a single Claude Code session. Run them in order. After each one, review the diff, commit, and proceed.

Pre-flight

Before running any prompt:

  • SSH to the Hostinger VPS as a sudo-capable user.
  • Confirm Coolify is running and accessible at the WireGuard-only admin URL.
  • Confirm *.nmopartners.com resolves to the VPS.
  • Have these env vars ready: ANTHROPIC_API_KEY, VOYAGE_API_KEY, ADMIN_PASSPHRASE (Argon2id-hashed at boot), SESSION_SIGNING_SECRET (≥256-bit), HMAC_WEBHOOK_SECRET, BLUEPRINT_SIGNING_KEY (Ed25519 private), KMS_MASTER_KEY_ID.
  • Create empty directory /srv/mihwar/.
  • Create empty GitHub repo Arcahmed93/mihwar (private), with branch protection on main.
  • Configure pre-commit hook with gitleaks + ruff + mypy + biome.

Prompt 1 · Foundation

Scaffold the project, set up the database schema, implement single-passphrase auth, get Mihwar deployable on Hostinger via Coolify.

You are building Mihwar, a private consulting cockpit for AI use-case
discovery. This prompt scaffolds the project, sets up the database schema,
implements single-passphrase auth, and gets Mihwar deployable on Hostinger
via Coolify.

# DEPLOYMENT TARGET
- Hostinger KVM VPS, working directory /srv/mihwar/
- Subdomain mihwar.nmopartners.com (DNS already resolves to the VPS)
- Reverse proxy + TLS managed by Caddy via Coolify
- Containers on a NEW Docker network called mihwar_net (do NOT join apex_net)

# SIX CONTAINERS
1. mihwar-postgres — Postgres 16 + pgvector, volume mihwar_pg_data,
   port 5435 internal only
2. mihwar-redis — Redis 7, port 6380 internal only
3. mihwar-aiproxy — LiteLLM proxy, routes claude-* via Anthropic and
   voyage-* via Voyage. Port 4000 internal only
4. mihwar-api — Python 3.12 / FastAPI / SQLModel / asyncpg / arq client,
   port 8000 internal only
5. mihwar-worker — same image as api, runs arq worker
6. mihwar-web — Next.js 14 (App Router) / TypeScript / Tailwind / shadcn,
   SSR, port 3000 internal only

# FOUNDATIONAL RULES
- Pin every image by digest. No :latest.
- Containers run as non-root. Read-only root FS where possible.
- App DB user is least-privilege; migrations run as a separate role.
- gitleaks pre-commit hook in repo. CI runs trivy + pip-audit + pnpm audit.
- Structured JSON logging from line one (structlog in Python; pino in Node).
- request_id middleware on api: accept X-Request-Id, else mint ULID.
- Caller-identity context: contextvars carrying user_id, tenant_id,
  actor_type, request_id. Every log line emits these via a structlog processor.

# SCHEMA (17 tables — see masterplan p-data)
Generate the SQLModel definitions and an initial Alembic migration.
Every business table has tenant_id NOT NULL with an index leading on it.
Enable Postgres RLS on every business table; policies use
current_setting('app.tenant_id')::uuid.

# AUTH
Single-passphrase login with Argon2id (memory_cost=65536, time_cost=3).
TOTP enrolment endpoint (issues secret + QR via otpauth URL, stored encrypted
under the per-tenant DEK). Session cookies: httpOnly, Secure, SameSite=Strict,
8h sliding. Sessions stored as SHA-256 hash of token.
Account lockout: 5 failures in 15min → 15min cooldown, exponential.

# CALLER IDENTITY
service_principals table seeded with:
  - svc:worker (token in env, used for worker→api calls)
  - svc:aiproxy (token in env, used for api→aiproxy calls)
  - svc:cron (used for nightly jobs)
  - webhook:async-form (HMAC verifier for /async/* submissions)

# DELIVERABLES
- /srv/mihwar/docker-compose.yml
- /srv/mihwar/api/ (FastAPI app, models, migrations, auth, identity)
- /srv/mihwar/web/ (Next.js scaffold, login page, theme toggle)
- /srv/mihwar/aiproxy/ (LiteLLM config, env)
- /srv/mihwar/worker/ (arq worker entrypoint)
- /srv/mihwar/.env.example with placeholders
- /srv/mihwar/Caddyfile (TLS, HSTS, CSP, security headers)
- /srv/mihwar/runbooks/ (incident.md, backup-restore.md, key-rotation.md)
- README.md with one-command bootstrap
- A green CI run, an opening commit, and a green Coolify deploy.

# DONE WHEN
Visiting https://mihwar.nmopartners.com presents the login page,
correct passphrase + TOTP yields an empty workspace UI, healthcheck
endpoint returns 200, structured JSON logs flow with request_id +
tenant_id + user_id on every authenticated line, and a sample
async-form GET returns a generic 404 for an unknown token.

Prompt 2 · Stage 1 Lab

Build Stage 1 of the Mihwar workflow: the Ideation Lab.

# UI
- Workspace shell with persistent sidebar (workspace list,
  current workspace, stage navigator).
- Stage 1 panel: three sub-panels — chat (left), 1-pager artifact (right),
  signoff bar (bottom).
- Theme toggle, EN/AR toggle.

# CHAT
- Streaming via SSE from /api/v1/workspaces/{ws}/stages/1/messages.
- Each turn enqueues a synchronous (not background) Sonnet call via aiproxy
  with prompt-caching on the system prompt + house style.
- Verify cache_read_tokens > 0 by turn 2; log it.
- Cap max_tokens at 1024 per turn.
- Persist every turn in messages with request_id, user_id, tenant_id.

# SYSTEM PROMPT (cached)
"You are a Socratic AI use-case interviewer for NMO Partners… [full prompt
in /srv/mihwar/api/prompts/stage1.md]"

# ARTIFACT (1-PAGER)
- Live-rendered structured object: USE_CASE, PAIN, USER, TODAY, TARGET,
  BLAST, INPUTS, DECISION_OWNER, OUT_OF_SCOPE.
- Updated incrementally as the conversation progresses (the AI emits
  structured updates which the renderer applies).
- Versioned on signoff. signoff button calls /api/v1/.../sign with a
  confirmation modal.

# CATALOG
Seed catalog_entries with 10 sample entries (provided in seed.json).
Stage 1 doesn't query the catalog yet; it's used in Stage 3.

# DONE WHEN
Ahmed runs a 30-turn Lab against a sample use case ("AI for our customer
voice line") and ends with a frozen v1 1-pager. Logs page shows every turn
joined by request_id. Cost view shows the lab session cost broken out.

Prompt 3 · Stage 2 Discovery + Org Profile

Build Stage 2 of the Mihwar workflow plus the Org Profile foundation.

# ORG PROFILE
- New table org_profiles, versioned, tenant-scoped, linked to clients.
- Field-level encryption (AES-256-GCM) for sensitive sections using a
  per-tenant DEK. DEK created on tenant creation, KMS-wrapped, stored as
  ciphertext in tenants.dek_wrapped. App decrypts in-memory per request.
- Settings UI under /workspace/{ws}/profile to edit; versioned on save.
- Display masking by default; reveal explicit, audit-logged.

# STAGE 2
- Discovery taxonomy seeded in questions table (CSV + script).
- "Filter questions" Haiku call: given the Stage 1 1-pager + Org Profile
  baseline, return the ~30 questions that need fresh answers.
- Stage 2 panel shows question list grouped by domain, each with:
  status (unasked / answered / sent-async / awaiting / blocking),
  inline answer, "send as async" button.
- /async/{token} endpoint:
  - validates token (single-use, time-limited, tenant-scoped)
  - renders a clean form with the one or two questions
  - HMAC-signs submissions
  - rate-limited
  - generic 404 on invalid/expired
- Readiness meter computed server-side; Stage 3 unlock blocked until
  blocking-set is empty (or consultant explicitly overrides with reason
  captured in audit_log).

# DONE WHEN
A Stage 2 round-trip works end-to-end: 5 questions answered live,
3 async links sent and submitted, readiness reaches 100%, Stage 3
unlocks. Org Profile updated from Stage 2 deltas with confirmation.
Logs page shows: async.issued, async.opened, async.submitted events
per token. No sensitive value appears in any log line.

Prompt 4 · Stage 3 Architecture + Blueprint v1 + Logs page

Build Stage 3 synthesis, the Blueprint compiler, and the operator Logs page.

# STAGE 3
- arq job stage3.synthesise:
  inputs = stage1_artifact, stage2_inventory, org_profile, catalog_rag(top_K=12)
  flow = aiproxy → claude-sonnet-4-6 with extended thinking, prompt-cached
  catalog snapshot. max_tokens=8192. Streams progress to the api which
  forwards via SSE.
- Output: structured JSON manifest with components, data-flow nodes,
  trade-offs, alternatives, open-questions, compliance-overlay.
- Auto-render SVG layered diagram + data-flow diagram from manifest.
- stage_artifacts.v1 stored on completion. Re-runs create v2, etc.

# BLUEPRINT
- /api/v1/workspaces/{ws}/blueprint/compile job:
  takes latest stage_artifacts, renders to a single HTML file using
  /srv/mihwar/web/templates/blueprint.html (server-side render with
  inlined CSS and inlined SVG). Manifest signed with Ed25519.
- Stored in blueprints table; downloadable + viewable in-browser.

# LOGS PAGE
At /admin/logs (gated by logs:read permission):
- Filters: time range, user_id, tenant_id, request_id, event class, level.
- "Trace request_id": joins all events with that request_id from api +
  worker + aiproxy logs into a chronological timeline.
- "Trace user_id": last N hours of all events.
- "Trace error reference": paste ERR-… → the full stack trace + context.
- Cost view: ai_calls aggregated per feature / tenant / user / day.
- Export to CSV (capped 10k rows; logs:export permission; audit-logged).

# DONE WHEN
A workspace progresses cleanly from empty → 1-pager → inventory → synthesis
→ Blueprint HTML download → walkthrough mode. The Logs page reconstructs
every step. Stage 3 synthesis costs < $10 per run with cache hit rate
> 80%.

Prompt 5 · Stage 4 Playbook + AR localisation + signing + dogfood

Wrap up Mihwar V1: Stage 4 (Build Playbook), full Arabic localisation,
Blueprint manifest signing, and dogfooding hooks.

# STAGE 4
- Five outputs: 6-week build plan, risk register, vendor short-list,
  reference repos pointer (Tier-3 only), RFP spec (optional).
- Tier flag on workspace controls which outputs are produced.
- Each output is editable in the UI before signoff.

# AR LOCALISATION
- Translate the UI shell using Mihwar AR pack (provided).
- RTL layout for AR mode (logical CSS properties; no physical
  margin-left/right).
- Blueprint render in AR uses Amiri for body, Plus Jakarta for
  numerals/code; the manifest carries language tags.

# MANIFEST SIGNING
- On Blueprint compile, the manifest JSON is canonicalised
  (RFC 8785 JCS), hashed (SHA-256), signed with Ed25519
  using BLUEPRINT_SIGNING_KEY. Public key embedded for offline
  verification.

# DOGFOOD HOOKS
- /admin/feedback inline form for Ahmed to log paper-cuts during
  the first real engagement; entries auto-tagged with the workspace
  and request_id at the moment.

# DONE WHEN
A full Tier-2 engagement runs end-to-end, EN and AR Blueprints both
render correctly, manifest verifies via the embedded public key, and
the engagement Blueprint is the first $25k delivery.
Working with Claude Code
For each prompt: open a fresh Claude Code session in the repo, paste the prompt, let it scaffold, then iterate in small commits. Don't run prompt 2 in the same session as prompt 1 — start fresh so context stays clean. Review every diff. Reject what doesn't match the masterplan; the model will accept the correction.
Part D · Build · New

Operations handbook

Mihwar runs as a single-operator service with a small team layered in over time. This is the day-to-day playbook: deploys, on-call, change windows, customer-facing incidents.

Deploy cadence

  • Active engagement weeks: deploy ≤ once per day, only outside the client's working hours (KSA 06:00 GMT+3 — 18:00). Ship-right-now-please reserved for security or correctness fixes.
  • Quiet weeks: trunk-based, ship as needed.
  • Schema migrations: reviewed in PR. Big-table migrations use CREATE INDEX CONCURRENTLY + chunked backfills. Never apply in the middle of a Stage 3 synthesis.
  • Release tagging: every prod deploy creates a tag vYYYY.MM.DD-HHmm-sha. Coolify retains last 5 deploys for one-click rollback.

On-call (V1)

Ahmed is on call 24/7 in V1. The job: respond to alerts within 2 hours during the working day, 4 hours overnight. Pager is Pushover on a personal device.

  • Sev-1 (data exposure / total outage during active engagement): drop everything.
  • Sev-2 (degraded service / non-critical alert): respond before next business day.
  • Sev-3 (cosmetic / forecast): triage in next standup with self.

Standard incidents — short playbooks

SymptomFirst 5 minutesResolution path
500s spikingCheck Logs page → top error references for the spike window. Identify offending endpoint.Rollback if regression; hotfix if data shape; communicate if external dependency.
aiproxy cost spikeLogs cost view → which feature, which tenant, which user.If runaway loop: kill jobs, raise max_iterations floor. If catalog cache miss: fix cache_control. If legitimate: confirm with consultant.
Worker DLQ fillingLogs page → DLQ events → root cause for the type of job failing.Fix and replay. If transient (Anthropic 5xx), wait + retry from DLQ.
Backup didn't fireCron status, disk, backup target reachability.Trigger manually. If recurring, ticket runbook fix.
Suspected key leakRotate the suspected key in Coolify (single command). Force logout all sessions.Audit log review for the exposure window. Communicate per DR table.
Async link mis-issued (wrong recipient)Revoke the token via POST /admin/async/revoke. Confirm not consumed.Re-issue to correct recipient. Audit trail captured.
Customer "I can't see my Blueprint"Logs by user_id → most recent compile event → status.If failed: reproduce in staging; if version mismatch: re-compile.

Change-management discipline

  • Every change to main goes through a PR with at least one reviewer (Ahmed reviews Claude Code's PRs; another consultant reviews Ahmed's, when one exists).
  • CI must be green: lint, types, tests, vuln scans, gitleaks.
  • Schema changes have a "rollback notes" section in the PR description. If rollback isn't safe, reviewer pushes back.
  • "Boring change" exceptions: copy edits, README, comment-only diffs — solo merge allowed.

Customer communication

For active engagement clients, communication is direct (Ahmed → CTO). For Phase 2 customers, a status page (status.mihwar.app) is published from V1's Day 1 even though it has nothing on it; this normalises the surface for when it matters.

  • Outages affecting a live engagement: client notified within 30 minutes by Ahmed directly.
  • Data exposure (any plausible): client notified within 24 hours; SDAIA within 72 hours if PDPL-qualifying.
  • Maintenance: 48h notice for non-trivial windows; mid-night by default.
  • Security advisories from upstream: reviewed and disclosed within 7 days if customer-affecting.

Adding a second operator

When NMO hires consultant #2, the handoff:

  1. SSH access via the WireGuard VPN (their key only; never share Ahmed's).
  2. Coolify dashboard read-only.
  3. Postgres DB user: scoped read of audit_log only; no app-write privileges.
  4. Pushover added to the alert routing.
  5. Tabletop runbook walkthrough — incident scenario end-to-end before they're on call.
  6. First 30 days: Ahmed reviews every PR. Second 30 days: reviewer + author rotates.

Quarterly operating cadence

  • Catalog review — prune, refresh, add. Documented "what changed and why" per quarter.
  • Engagement retro — every Blueprint shipped that quarter, what worked, what didn't, what feeds the catalog.
  • Cost review — actual aiproxy spend vs forecast; per-feature optimisations identified.
  • Security review — CVE sweep beyond CI gating, dependency pruning, access list audit.
  • Restore drill — annual minimum; quarterly preferred.
  • Forecast refresh — pipeline, capacity, burn — feeds Phase 2 trigger evaluation.
Part E · SaaS Phase

The SaaS path — Phase 2 overview

If V1 succeeds, Mihwar evolves from a consultant's cockpit into a self-serve platform clients run themselves. This section sketches what that looks like — the product, the billing, the go-to-market — so V1 architecture stays compatible.

When Phase 2 becomes real

Three triggers, any one of which validates the pivot:

  • Demand pull. ≥10 distinct prospects ask "can we get a Mihwar login" in any 6-month window.
  • Capacity ceiling. NMO's consultant team is fully booked, pipeline stronger than capacity, and adding consultants doesn't scale margin.
  • Catalog moat is mature. NMO's catalog reaches 300+ entries with quarterly review cycles, making it a deliverable in itself.

Until then, V1 stays disciplined. Phase 2 too early kills the consulting margin.

What changes

ConceptPhase 1Phase 2
Tenants1 (NMO)Many (each subscriber org)
AuthSingle passphrase + TOTPSSO (OIDC, SAML), invite-only first, public sign-up later
BillingEngagement invoices (manual)Stripe, per-seat or per-Blueprint, with metered overage
CatalogNMO's, used internallyNMO premium tier (read-only, paid) + customer-private tier (writeable)
TemplatesHard-coded NMO brandingPer-tenant theming, custom logos, optional white-label
Consultant roleDrives every engagementOptional 2-hour expert-review at Tier; otherwise self-serve
OperatorAhmedCustomer admin per tenant + NMO meta-admin
SupportEmail + WhatsAppIn-app chat, knowledge base, ticketed

What stays the same

  • The five-stage workflow.
  • The Architecture Gate.
  • The Blueprint format.
  • The catalog schema (just expanded with tiers).
  • The data model (multi-tenant from day one).
  • The Org Profile concept (becomes central).
  • aiproxy as single egress.
  • RLS at the database layer.
  • The Logs page.

Phase 2 build budget — sketch

Estimating from scratch when triggers fire:

  • Auth migration to OIDC + SAML: 2 weeks.
  • Stripe integration + billing pages: 2 weeks.
  • Self-serve onboarding flow + first-Blueprint guide: 2 weeks.
  • Embedded coaching surfaces (tooltips, "show me an example", inline catalog samples): 2 weeks.
  • Per-tenant theming + white-label: 1 week.
  • Catalog tiering enforcement + customer-private writes: 1 week.
  • Status page, public marketing site, pricing page: 1 week.
  • Polish, telemetry, beta program: 2 weeks.

Total: ~13 weeks (≈3 months) for Phase 2 v1, assuming V1 architecture has held the line. Funded by ~5 V1 engagements at $25k.

The risk of getting this wrong

  • Diluting the consulting brand. If Phase 2 launches before NMO has 10+ shipped Blueprints, the SaaS sells "AI strategy in a box" — a generic value prop that competes with $99/mo tools, not $25k engagements.
  • Self-serve UX without the discipline. If the gate doesn't survive contact with non-expert users, Mihwar's differentiator dies. Embedded coaching has to enforce the gate, not soften it.
  • Cost runaway. Phase 2 multiplies usage. Without the AI-economics discipline holding, margin collapses. See AI Economics.
  • Cross-tenant leak. Phase 2 is the moment one bad query becomes a regulatory event. The cross-tenant fence test must be ironclad before public sign-up opens.
The phasing principle
Phase 1 earns the right to Phase 2. The signals that earn that right are real money + real Blueprints + real testimonials, not hopes. When the triggers fire, the team that earned the consulting brand turns it into a product. That's the order.
Part E · SaaS Phase · New

The self-serve product spec

What changes about the experience when a client — not a senior NMO consultant — drives the workflow. The engine stays the same; the surfaces around it must compensate for the absence of the consultant in the room.

The core challenge

In Phase 1, the consultant interprets, refines, pushes back. In Phase 2, the user is alone with the AI. Without compensation, three failure modes appear:

  • Surface confusion. The user doesn't know what "blast radius" means and abandons the question.
  • Hallucinated confidence. The AI accepts a vague answer and produces a Blueprint that misrepresents the use case.
  • Gate erosion. The user is impatient, can't be talked through Stage 2 by a human, and either gives up or pressures Mihwar to skip it.

The compensations

1 · Embedded coaching

Each prompt in Stage 1 ships with three affordances:

  • "What good looks like" example — collapsible card showing a sample answer derived from a real (anonymised) past Blueprint.
  • Inline tooltip with a one-paragraph definition of the term.
  • "I'm stuck — coach me" button that asks Haiku for a context-aware question rephrasing or a suggestion of who in their org to ask.

2 · Org Profile-driven personalisation

The Profile is the engine of the self-serve experience. A user who's been on Mihwar 6 months has a Profile that pre-fills 70%+ of every Stage 2 they touch. The third Blueprint they make takes a third of the time of the first.

3 · Smarter gate-enforcement

The gate adapts when the consultant isn't there. Instead of "go ask your DBA", Mihwar offers:

  • A pre-filled email template ("Here's the question I need answered. Forward to your DBA.").
  • An "invite a colleague to fill this slice" link — a workspace member at limited role.
  • A "skip with admission" path: the user can mark a question "I don't know and have no way to find out" — the architecture is then synthesised with that explicitly noted in the Blueprint as an open question, not silently glossed.

4 · Optional expert review

For Tier "Team" and above, the user can pay for a 2-hour NMO expert review of their Blueprint draft before signoff. The reviewer reads the Blueprint, leaves margin notes, has a 30-minute call with the user, signs off the result with NMO's seal. This is the bridge between self-serve and consulting — and a high-margin upsell.

The onboarding flow

  1. Sign-up — email + workspace name. Email verified.
  2. SSO setup (Team+ only) — connect Microsoft Entra / Google / Okta.
  3. Org Profile wizard — guided 8–12 question version of the Profile (full version takes longer; the wizard captures the high-leverage stuff first).
  4. "Your first Blueprint" guide — a guided Stage 1 with extra coaching density.
  5. Catalog browse — the user reads through the NMO catalog, picks 5 entries to "favourite" (drives recommendation tailoring).
  6. First Blueprint generated — at this point, normal coaching density resumes; user is "on board".

Workspace roles (Phase 2)

RolePermissions
OwnerWorkspace admin, billing, can invite, can delete.
EditorRun stages, edit artifacts, request signoff.
ReviewerRead-only access plus comment on artifacts.
ContributorLimited access — fill assigned Stage 2 slices, no Stage 3 access.
NMO Reviewer (paid)External NMO consultant invited for Tier expert review.

Public Mihwar product surfaces

  • mihwar.app — public marketing site, pricing, sign-up.
  • app.mihwar.app — the application itself.
  • status.mihwar.app — public status page.
  • docs.mihwar.app — documentation, video walkthroughs, tutorials, sample Blueprints.
  • nmopartners.com/mihwar — Phase 1 cockpit URL preserved for NMO's continued internal use.

What the Phase 2 user can't do

  • Skip the gate without admission.
  • Override the catalog to recommend an arbitrary vendor.
  • Generate a Blueprint with sensitive client data they're not authorised on (workspace permissions enforced).
  • Export beyond their tier's monthly cap without an upgrade prompt.
  • Bypass NMO's premium catalog (read-only) — they can add private entries, not edit NMO's.
The product principle
Phase 2 is not "Mihwar with the consultant ripped out". It is "Mihwar where the discipline of the consultant is encoded into the surface." Coaching, gating, expert-review upsells — these are how the discipline survives self-serve.
Part E · SaaS Phase · New

Billing & tiers

Phase 2 plans, what each includes, and the metering that makes them work without runaway cost.

The four plans

PlanPriceAudienceIncludes
Starter $1,200/mo or $9,600/yr Single AI champion at a mid-market enterprise 1 workspace · 3 Blueprints/yr · premium catalog read-only · EN/AR · email support · standard branding
Team $3,500/mo 5-seat AI office 5 seats · unlimited Blueprints (within fair-use cost cap) · custom branding · SSO · 1 expert review/qtr included · priority support
Consultancy $25k/yr + per-Blueprint Boutique AI shops licensing Mihwar for their clients White-label · multi-client workspaces · customer-private catalog tier · NMO catalog as premium · API access · per-Blueprint metering ($150 each beyond 50/yr included)
Enterprise Custom (from $80k/yr) Large org with strict residency / SSO / audit needs Dedicated tenant in-region · BYO IDP · audit export · contractual residency · SLA · dedicated support

Metering

Two meters, both implemented atomically in Redis:

  • Blueprint count — incremented at compile success. Reset on plan period (monthly or annual).
  • aiproxy cost — incremented on every aiproxy call by the call's cost_usd. Soft warn at 80% of plan-implied budget; hard refuse at 110% (the 10% headroom prevents incidents from a clean compile failing for a single dollar).

Stripe integration

  • Stripe customer = tenant. One subscription per tenant.
  • Subscription items: base plan + metered overage components (per-Blueprint for Consultancy, per-seat for Team).
  • Payment failures handled per dunning rules: grace 7 days, then suspend writes (reads still allowed for export/download), then suspend reads after 30 days, then hard offboard after 90 days with the tenant-deletion procedure.
  • Invoices include the cost-meter window report (with rounding) — transparency feeds trust.

Promo / pilots / trials

  • Free 14-day pilot — Starter-tier features, capped at 1 Blueprint. Requires email + light KYC for KSA buyers.
  • Conversion incentive: 30% off year-1 if a pilot user converts within 30 days.
  • NMO-introduced clients get a Concierge code that bundles Starter for 6 months at $0 if they signed an engagement. This protects the consulting margin while letting Phase 2 build the case-study set.

Cost-to-serve modelling

PlanExpected Blueprints/yrAI costOther costMargin at sticker
Starter3~$60–$100~$120 (infra share, support)~97%
Team~20~$400–$700~$1,800 (incl. 1 expert review)~94%
Consultancy50–150~$2,500–$5,000~$3,000 (white-label support)≈75–80%
Enterprisevariesvariesdedicated infra share + named CSM≈60–70%

Margins look generous — they assume V1's AI economics discipline survives. Without prompt caching, batch usage, two-tier model selection and tenant cost caps, those numbers degrade fast. See AI Economics.

Phase 2 invoicing & tax

  • VAT applied per KSA rules (currently 15%) for KSA-resident buyers.
  • Withholding tax handled per buyer's regulatory regime — captured in onboarding KYC.
  • Currency SAR primary; USD available for international.
  • Receipts and invoices generated automatically; archived for 10 years per local accounting standards.
Cost flag Cost
The Consultancy and Enterprise tiers can become loss-makers fast if a single sub-tenant of a Consultancy buyer drives extreme cost. Per-sub-tenant caps inside Consultancy plans are designed-for in V1's data model and turned on at Phase 2 launch.
Part E · SaaS Phase · New

Phase 2 go-to-market

How Mihwar opens to the public when the trigger fires. A 12-week launch plan from "trigger met" to "first 20 paying tenants."

Pre-launch — weeks 1–4

  • Build (per SaaS Path): auth, billing, onboarding, embedded coaching, per-tenant theming.
  • Marketing site: mihwar.app pricing page, 5–7 case studies from V1 engagements (anonymised), bilingual.
  • Documentation: 5 explainer videos (one per stage), 20 KB articles, sample Blueprint downloads.
  • Sales motion: NMO's existing pipeline gets the first invitation; conversion incentive offered.

Beta — weeks 5–8

  • Closed beta: 8–12 invited users from prior NMO engagements + 4 boutique AI consultancies. Free Starter for the duration.
  • Weekly user-feedback sessions. Each session feeds catalog updates and UX patches.
  • Beta SLA: 4-hour response on issues; each beta user has a direct line to Ahmed.
  • Beta exit criteria: at least 8 Blueprints generated by users without intervention; cross-tenant fence test passes; cost-per-Blueprint within model.

Public launch — weeks 9–12

  • Signup opens. Concierge code reserved for NMO-introduced clients (preserves consulting margin).
  • Cohort 1 marketing push: KSA tech press, LinkedIn, Vision-2030-aligned content.
  • NMO email list (estimated ~600 senior-CTO contacts) gets a launch announcement with case studies.
  • Pricing live; 14-day pilot live.
  • Weekly cohort check-ins: monitoring activation rate, time-to-first-Blueprint, cost-per-Blueprint, support volume.
  • Status page goes from "internal" to public.

Acquisition channels

ChannelPhase 2 fitEffortCAC ceiling
NMO existing pipelineHighest — warm, in-marketLow~10% of ARR
LinkedIn thought leadership (Ahmed)Direct — KSA AI champions followMed~15% of ARR
SDAIA / Vision-2030 conferencesStrong for government tierHigh~25% of ARR
Boutique AI shops (Consultancy plan)Two-sided lever — they bring their clientsMed~30% of ARR
Public docs & SEOLong compounding — start day 1Med~5% of ARR
Paid adsAvoid in V1 — low intent

Phase 2 success metrics

MetricMonth 3 targetMonth 12 target
Paying tenants20120
ARR$200k$1.5M
Activation rate (sign-up → first Blueprint)40%60%
Time-to-first-Blueprint≤14 days≤7 days
Net revenue retention≥110%
Avg cost-per-Blueprint≤$30≤$25
NPS (customer survey)≥40≥55
Phase 1 → Phase 2 cannibalisation<10% engagement loss0% (Phase 1 is now upsell)

Phase 2 ↔ Phase 1 relationship

Phase 2 isn't a replacement for Phase 1 — it's a complement. Mihwar's full motion at year-2 looks like:

  • Top of funnel: Self-serve Starter customers explore AI use cases inside Mihwar. Catalog ships them to a $25–60k engagement when their use case is ambitious.
  • Mid-funnel: Team customers run Blueprints solo, occasionally pay for an expert-review upsell.
  • Top of value: Consultancies licensing Mihwar drag NMO into joint engagements at the strategic layer; NMO Apex picks up the build.
  • Bottom of value: Enterprise tier funds the dedicated-infrastructure and compliance roadmap, which strengthens every other tier.
The right ordering
Phase 1 builds the brand. Phase 2 turns it into volume. Both halves are needed; neither half is sufficient. The masterplan is built for both.
Part F · Operations

Success metrics

What "Mihwar is working" actually means, measured in numbers Ahmed can read off a dashboard. Lead measures predict business outcomes; lag measures confirm them.

Three tiers of metrics

Mihwar's metrics fall into three tiers. Tier 1 is the only one Ahmed checks daily. Tier 2 is reviewed weekly. Tier 3 is the quarterly retrospective.

Tier 1 — Business outcomes

MetricV1 targetWhy it matters
Engagements signed per quarter5+ by Q3The revenue line. Below 3 means Mihwar isn't shifting deals.
Blueprint price realised (avg)$20k+Below this, NMO is competing on price not on quality.
Conversion: Blueprint → Build≥30%The most important number. Mihwar's whole thesis. Below 20% means Blueprints aren't selling next-stage work.
Margin per Blueprint≥65%Engagement P&L test. Below this, the tool isn't compressing time enough.
NPS from Blueprint recipients≥50Survey delivered 30 days after Blueprint signoff. Drives word-of-mouth referrals.

Tier 2 — Operating health

MetricV1 targetWhy
Time-to-Blueprint≤7 working daysThe core promise. Engagements that overrun erode the value proposition.
Stage 2 → Stage 3 cycle time≤4 daysDiscovery is the bottleneck Mihwar exists to fix. Trend down.
Catalog growth rate+5 entries/monthCompounding IP. Stagnant catalog means Mihwar isn't learning.
aiproxy cost per Blueprint≤$30 (P50), ≤$60 (P95)Margin discipline; verifies prompt caching, two-tier model, batch.
aiproxy cache hit rate≥80%Direct verification of AI Economics discipline.
Stage signoff rework rate<15%Stages reopened after signoff. Above 15% means the AI's output isn't trustworthy.
Async response rate≥70% within 7 daysIf async forms aren't being filled, Stage 2 stalls.
Uptime (rolling 30 days)≥99.5%Engagements get cancelled by 12-hour outages.
Backup success rate100%Anything below 100% is a Sev-2 incident.
5xx rate<0.5%Above this, the Logs page becomes the daily site.

Tier 3 — Quarterly health

  • Catalog freshness: ≥80% of entries reviewed within last 6 months.
  • Engagement retro coverage: 100% of completed Blueprints retro'd within 14 days.
  • Cross-tenant fence test: green on every CI run; deviations = Sev-1.
  • Restore drill: performed at least once per quarter; documented timing.
  • Security review: dep audit, key rotation, access review.
  • Phase 2 trigger evaluation: demand-pull count, capacity utilisation, catalog size.

The dashboard

Part F · Operations

Risks

What could go wrong, ranked by likelihood × impact, with concrete mitigations. The discipline of writing risks down is half the mitigation.

Engagement risks

R-1 · Client refuses to do discovery HIGH × HIGH

The CTO wants the architecture deck now and is impatient with Stage 2 questions. This is the #1 expected friction point.

Mitigation: Sales script up-front: "We do discovery before architecture. That's not negotiable. It's why our deliverable doesn't fall apart in your procurement committee." If a client truly won't do Stage 2, NMO walks. The gate is the product.

R-2 · Client gives wrong inventory data MED × HIGH

Stage 2 captures what the client believes their environment looks like. Reality occasionally diverges. Architecture lands, build starts, surprise.

Mitigation: Stage 2 captures source-of-truth pointers (DBA name, dashboard URL) for every claim. Every architecture component cites its inventory source. Trade-offs section explicitly flags assumptions. Build phase starts with a 1-day "validate Stage 2" sprint.

R-3 · Blueprint conversion to build below target MED × HIGH

Below 20%, the whole productisation thesis weakens.

Mitigation: 30-day post-Blueprint follow-up call mandatory. Common conversion blockers tracked, fed back into Stage 4 templates and the catalog. NPS survey identifies dissatisfaction before it becomes lost revenue.

Product / technical risks

R-4 · Cross-tenant data leak LOW × CATASTROPHIC

A bug in a query lets one tenant see another's data. In Phase 1 this is one bug; in Phase 2 it ends the company.

Mitigation: RLS at DB layer + tenant_id in every app query (defence in depth). Cross-tenant fence test in CI on every commit. Schema-per-tenant for Enterprise tier. Manual audit of every new query that joins multiple workspace_id values. See Multi-Tenancy.

R-5 · Anthropic API outage during a live engagement LOW × HIGH

Stage 1 is mid-conversation; Stage 3 is mid-synthesis; Anthropic returns 5xx for an hour.

Mitigation: aiproxy retries with exponential backoff. Background jobs survive transient failures via DLQ. Stage 1 chat shows a "service temporarily unavailable" banner without losing draft state. Multi-region key support designed-for; failover provider candidacy reviewed quarterly.

R-6 · Cost runaway MED × HIGH

A bug or pathological prompt drives 10× expected aiproxy spend.

Mitigation: Hard tenant + per-feature daily caps in aiproxy. Cost-spike alert at $10/h sustained. Logs cost view drives same-day diagnosis. Budget gate on agentic loops. See AI Economics.

R-7 · Prompt injection from client document LOW × MED

Client pastes a contract; the contract carries a hidden instruction to exfiltrate data via a tool call.

Mitigation: Untrusted input wrapped in delimited blocks. Tools require human-in-the-loop confirmation for any side effect. Outbound allowlist blocks unauthorised destinations. Output guardrail rejects unexpected tool calls. See Client Security.

R-8 · House-style drift MED × MED

Over 3–6 months the Blueprint voice creeps toward generic LLM tone — exclamation marks, "I'd love to help!", emoji.

Mitigation: Banned-phrases filter in aiproxy rejects offending output. Quarterly Blueprint review by Ahmed catches subtle drift. House-style prompt versioned and updated based on observed regressions.

Business risks

R-9 · KSA market consolidates around large suppliers MED × HIGH

Vision-2030 procurement vehicles favour mega-vendors; boutique consultancies are squeezed out of preferred-supplier lists.

Mitigation: Speed of Mihwar's deliverable (7 days) gives NMO an entry point that mega-vendors cannot match. The Consultancy Phase 2 plan turns the squeeze into an opportunity (boutiques license Mihwar). Government tier pursued via Tier 2 Playbook + RFP spec deliverables.

R-10 · Phase 2 cannibalises Phase 1 LOW × MED

Self-serve Phase 2 erodes the perceived value of $25k consulting engagements.

Mitigation: Phase 2 priced for the segment that wouldn't have engaged a $25k consultant anyway. Concierge codes preserve the engagement margin for NMO-introduced clients. Expert-review upsell within Phase 2 funnels into Phase 1 work.

R-11 · Single-operator key-person risk HIGH × HIGH

Mihwar is one Ahmed. Anything happening to Ahmed is an existential risk.

Mitigation: Hire consultant #2 by Q3 (operational redundancy). Documented runbooks for every system surface. Backup passphrases sealed-envelope held by trusted party. Insurance review.

R-12 · Regulatory change (PDPL / SDAIA) LOW × MED

New PDPL implementing regulation tightens residency or processing rules.

Mitigation: Multi-tenancy levels 5 + 6 (dedicated schema / dedicated DB / in-region) designed-for. aiproxy abstracts model provider. Phase 2 sovereign tier ready as escape hatch for regulated clients.

Risk review cadence

This list is reviewed quarterly. Risk status (likelihood × impact) is re-rated. New risks added; resolved risks moved to an archive. Any risk that goes up in either dimension drives a same-quarter mitigation plan, not a "we'll think about it" slot.

Part F · Operations

Monday morning

The masterplan is real only when the first action is taken. This page lists, in order, the concrete actions Ahmed takes in the first working week to turn this document into a running app.

Day 1 · Decisions & domain

  • Re-read this masterplan in one sitting. Note any disagreement. Edit before moving.
  • Confirm the domain: mihwar.nmopartners.com for Phase 1; reserve mihwar.app for Phase 2.
  • Provision the Hostinger KVM VPS (4 vCPU, 16 GB RAM minimum). Point DNS.
  • Create empty private repo Arcahmed93/mihwar. Apply branch protection on main.
  • Open Anthropic + Voyage accounts. Generate scoped API keys. Store in 1Password.
  • Order the Pushover license. Configure on Ahmed's phone.
  • Identify the first paying client to dogfood with — secure the pre-engagement agreement so Week 6 has a real engagement on the tool.

Day 2 · VPS hardening

  • Provision non-root user. Disable password SSH. Move SSH port. Apply fail2ban.
  • UFW: default deny, allowlist :443 + custom SSH + WireGuard.
  • Set up WireGuard, install on Ahmed's laptop and phone.
  • Install Coolify (over WireGuard endpoint).
  • Generate Ed25519 Blueprint signing key; archive private half securely; embed public half in repo.
  • Configure off-region object-storage backup target with encryption passphrase.

Day 3 · Run Prompt 1

  • Open fresh Claude Code session in /srv/mihwar/.
  • Paste Prompt 1 from Claude Code Prompts. Steer through scaffolding.
  • Review every diff. Reject what doesn't match the masterplan.
  • Commit, push, watch CI go green, watch Coolify deploy.
  • Visit https://mihwar.nmopartners.com. Login page renders. Auth works. Healthcheck green.
  • Verify nightly backup fires (manual trigger to test).
  • Verify outbound allowlist blocks an unauthorised destination (try curl from a container — should fail).

Day 4 · Run Prompt 2

  • Fresh Claude Code session. Paste Prompt 2.
  • Review the workspace shell + Stage 1 Lab implementation.
  • Run a sample Lab against a placeholder use case. Verify cache_read_tokens > 0 by turn 2.
  • Verify house-style filter rejects "I'd love to help!" if it appears.
  • Verify Logs page shows the conversation joined by request_id.
  • Verify ai_calls table is populating with cost data.
  • Commit, deploy, dogfood with one warm prospect over Zoom — capture friction.

Day 5 · Plan Week 2 & communicate

  • Triage the friction list from the dogfood Lab.
  • Land 3–5 quick fixes; defer the rest.
  • Send a one-paragraph update to NMO's mailing list: "Mihwar V1 is being built; first engagements available from Week 6." Soft-book first paying engagement.
  • Schedule Day-1-of-Week-3 Prompt 3 session.
  • Restore drill: pick yesterday's backup, restore into a sandbox container, smoke test, document timing.
  • Pre-commit master checklist (top of doc): walk every applicable box; flag any that won't be met by Week 6.

By end of Week 6

  • First paying engagement Blueprint shipped on a real client.
  • External penetration test booked for Week 8.
  • Catalog has 80+ entries seeded from the AI Ecosystem Primer.
  • Logs page operational; one real ERR-… reference traced end-to-end.
  • Cost-per-Blueprint inside the $30 cap.
  • Cache hit rate ≥ 80%.
  • Backup + restore drill green.
  • Phase 2 trigger log started — first prospect that asks "can we get a login" gets recorded with date.

The first 90 days

  • Days 1–42: Build & ship V1 (the 6-week roadmap).
  • Days 43–60: Ship 2 more engagements at full price. Refine catalog, fix paper-cuts.
  • Days 61–90: Ship 2 more. Run the first quarterly catalog review. Begin Phase 2 trigger watching in earnest.
محور · يدفع المحادثة من "نريد الذكاء الاصطناعي" إلى مخطط ${"$"}25,000 موقّع في أسبوع عمل واحد. Mihwar — pivots the conversation from "we want AI" to a signed $25k Blueprint in a working week.
The masterplan ends here, the build begins now
This document is v2 — Phase 1 + Phase 2, security baked in, AI economics modelled, observability shipped, the Org Profile concept that stops Phase 2 being an unbearable infrastructure quiz, the operator Logs page that lets one person run this at 2am six months from now. Everything that needs to be true on day one of the build is on these pages. The rest is execution.