Executive Summary · At a Glance

What Mihwar is — and where it goes from here

Mihwar (محور · "pivot, axis") is the operating system for an AI consulting practice. It ships in three real versions and one speculative one. V0.1 is the personal MVP — a single page where Ahmed types an idea and Mihwar returns a stack-aware build playbook in minutes. V1 productizes the same engine as a 5-stage cockpit NMO uses to ship $25,000 client Blueprints. V2 opens it to clients as a self-serve SaaS. V-future is a marketplace bet kept alive only in the architecture.

First ship

V0.1 · 3–7 days

personal idea compiler

Then

V1 · 6 weeks

consultant cockpit · $25k Blueprints

V1 cycle

≤7 days

workspace → signed Blueprint

Backup

Daily · off-server

encrypted · restore-tested

One-sentence pitch

Mihwar in a sentence

Mihwar turns ideas into stack-aware build playbooks — first for Ahmed himself in a single page (V0.1), then for NMO consulting teams as a $25,000 Blueprint engine (V1), then for clients as a self-serve SaaS (V2). One engine, three operators in sequence, no rewrite between them.

محور · من فكرة شخصية إلى مخطط بناء في صفحة واحدة، ثم محرك استشاري بـ ٢٥ ألف دولار، ثم منصة ذاتية الخدمة. نفس المحرك، ثلاثة مشغلين. From personal idea to client SaaS — one engine, three operators in sequence.

The versions, at a glance

Mihwar ships in four layers. Each inherits the one below. V0.1 is the next thing to build — the personal MVP that closes the loop in days. V1 productizes it for NMO consulting in 6 weeks after V0.1 validates. V2 unlocks when clients pull. V-future is a bet kept alive only so today's architecture doesn't foreclose it.

What's in each version

V0.1 ◉ MVP · ships first

Idea Compiler

A single page where Ahmed types an idea; Mihwar returns a stack-aware build playbook with explicit local-vs-cloud flags and agent assignments — in minutes.

OperatorAhmed only · single user · zero auth surface

InputFree-form idea + pre-loaded Operator Profile (your stack: VPS, agents, services, APIs)

OutputBuild playbook — architecture, agent assignments, sequenced steps, local/cloud breakdown, cost estimate, risks

Local/cloudEvery component flagged 🔵 LOCAL (your VPS) · ☁ CLOUD (3rd-party / API) · 🌗 HYBRID

Cycle1 idea → 1 playbook in <5 minutes (single LLM call)

WorkflowSingle-shot · structured output · no multi-stage gating

StackNext.js single page · Anthropic API · Operator Profile in JSON · no Postgres yet

Build effort3–7 days · one Claude Code prompt · displaces nothing in V1's design

Why firstDogfood the engine on your own ideas before selling V1 as $25k Blueprints — validates the loop in days, not weeks

StatusSpecced · ready to build

V0.1 spec → Roadmap →

V1 ● After V0.1 · 6 weeks

Consultant Cockpit

The cockpit NMO Partners uses to run client engagements end-to-end.

OperatorNMO Partners — Ahmed today, the growing team from Month 4

TenancySingle tenant in operation · multi-tenant in the data model from day one

Deliverable$25,000 Blueprint · single self-contained HTML · bilingual EN/AR · signed manifest

Cycle≤7 days from kickoff to signed Blueprint · target ≥30% conversion to build

Workflow5 stages — Lab → Discovery → Architecture → Playbook → Handoff

StackPostgres · Redis · FastAPI · Next.js · Hostinger VPS · Coolify · Traefik

SecurityWireGuard admin · per-tenant DEK · audit trail · signed exports · self-isolated from Apex

BackupDaily off-server encrypted · restore drill on Day 5 · drill repeated quarterly

Cost target≤$30 in API spend per Blueprint · cache hit rate ≥80%

StatusDesigned · queued behind V0.1 masterplan live at this URL

Two-Phase Strategy → Stages 1–5 → 6-Week Roadmap → Self-Security →

V2 ◐ Triggered · post-V1

Client Platform · SaaS

Same engine, exposed to clients. Self-serve AI visioning inside the client's organisation.

OperatorClients (CTO / Head of AI) — self-serve, no NMO consultant in the loop

TenancyMulti-tenant · per-tenant DEK · row-level security · cross-tenant isolation tested

DeliverableSame Blueprint format · per-tenant branding · same signed manifest

PricingPer-seat OR per-Blueprint subscription · billing tier preview lives in §SaaS Billing

New vs V1Org Profile · public sign-up · billing · branding · embedded coaching · annual profile review

Trigger10+ inbound prospects asking for self-serve access (logged starting Week 6)

ReusesEngine · catalog · 5-stage workflow · Blueprint format · security model · logging

BackupPer-tenant DEK · same daily off-server cadence · per-tenant restore-tested

StatusDesigned not yet building · waiting for pull signal

SaaS Path → Self-Serve Product → Billing & Tiers → Phase 2 GTM →

V-future ○ Speculative · year 2+

Federated Catalog & Build Bridge

Speculative. Not committed. Listed so the architecture doesn't foreclose it.

IdeaCurated catalog opens to partners · Blueprint → Build handoff to NMO Apex agents

Why laterNeeds ≥25 V2 tenants and a separate, mature product (Apex) before it earns attention

RiskMarketplace dynamics are hard · unfair to commit to before V2 data exists

StatusA bet · not a plan · revisit after first 25 V2 tenants ship

Plan honesty

V0.1 is the next thing being built — the personal MVP that proves the loop in days. V1 productizes the same loop into the 5-stage Blueprint engine after V0.1 validates. V2 is fully designed but conditional on inbound pull — capital is not burned building it before clients ask. V-future is on this page only because today's architecture decisions should not foreclose it. That's the honest version of the roadmap.

The four mental tests Mihwar is built against

Every implementation choice in this document — every endpoint, every query, every prompt — is checked against four questions. They appear as flags throughout the rest of the masterplan.

1×

Scale

Would this still work at 100× current load?

2×

Security

How could this be abused by a hostile actor?

3×

Observability

Could this be investigated at 2am, six months from now?

4×

Economics

Affordable at 100× usage? Cost per user per month?

Where to read next

Six entry points, depending on what you came here for:

If you want the why

Vision & Promise

The product story. Who needs it, why now, what it replaces.

Open Vision →

If you want the how

Architecture & Stack

System architecture, data model, multi-tenancy, security boundary.

Open Architecture →

If you want when

6-Week Roadmap

Day-by-day plan from VPS Day 0 to first signed engagement.

Open Roadmap →

If you want what could go wrong

Risks & Mitigations

12 named risks, mapped to mitigations baked into the plan.

Open Risks →

If you want the SaaS picture

Phase 2 & SaaS

What V2 is, when it triggers, how it sells, how it bills.

Open SaaS Path →

If you start Monday

Monday Morning Actions

The 7-day plan from "approved" to "first prompt running on the VPS".

Open Monday →

This is a one-screen view

Every claim above is unpacked, justified, and made falsifiable in one of the 30+ sections that follow. Use the sidebar.

Last page← Monday Morning NextVision & Promise →

V0.1 · Personal MVP · Ships First

V0.1 · The Idea Compiler

One page. You type an idea. Mihwar returns a stack-aware build playbook — what to build, which of your agents owns each step, what runs locally on your VPS, what runs in the cloud. Three to seven days from spec to live URL. The personal MVP that ships before V1, dogfoods the engine on Ahmed's own ideas, and earns the right to build the rest.

Build time

3–7 days

single Claude Code prompt

User cycle

≤5 min

idea → playbook

Operator

1 (you)

no auth · no tenancy

Cost / playbook

≤$0.50

single Sonnet call · cached profile

Why V0.1 exists

V1 is six weeks of build before the engine compiles its first idea into a deliverable. That's too long to validate the core loop. V0.1 compresses everything that matters about V1 into a single page that Ahmed uses on himself, every day, on whatever idea is on his mind that morning. If the playbooks it produces are useful, V1 is worth building. If not, the masterplan changes before any client sees it.

The loop V0.1 closes

You have an idea → Mihwar grounds it in your actual stack → Mihwar returns a build playbook with local/cloud flags and agent assignments → you build it (or you don't) → next idea. Same loop V1 closes for clients, just one tier up the abstraction ladder.

The single page

Input — what the page asks for

Idea field — free-form textarea. "I want to build X for Y reason." Two to five sentences typical. No structure forced.
Operator Profile — pre-loaded from a JSON file. The user does not re-enter it every time. Edits happen once via a settings page (Phase 2 of V0.1, optional).
Optional context — paste a link to an existing repo, or a one-line preference like use Next.js, not Astro.

Output — the build playbook

Section	What it contains	Why it's here
1 · Idea summary	One sentence reflecting the idea back, plus the success metric implied by it.	Confirms the engine understood. Catches misreads early.
2 · Architecture	Component list. Each component flagged 🔵 LOCAL · ☁ CLOUD · 🌗 HYBRID. Includes runtime, storage, queue, frontend, observability.	Local/cloud is the whole reason V0.1 exists. Surfaces decisions that change cost, latency, and sovereignty before you build.
3 · Agent assignments	For each build step, the suggested agent from your roster (e.g. `PM`, `Dev-1`, `VPS Admin`, `Cyber`). External tasks (e.g. domain registration) flagged as human-only.	Lets you forward sections of the playbook directly to the agent that will execute them.
4 · Build sequence	Ordered steps with effort estimate (≈hours), dependencies, and "definition of done" per step.	So "build playbook" doesn't mean "vague TODO list". You can start work after reading.
5 · Cost estimate	Monthly cost split by 🔵 LOCAL (sunk · already paid via VPS) and ☁ CLOUD (per-API / per-month). Worst-case at 100× usage.	Mental-test #4 (Economics) baked in from idea zero.
6 · Risks & unknowns	3–5 named risks with likelihood and mitigation. Explicit "what we don't know yet" list.	Prevents the playbook from feeling more confident than it should.

The local-vs-cloud flag system

Every component in the architecture section gets one flag. The flag is the whole point of V0.1.

🔵 Local

Runs on your VPS

Postgres in a container, Coolify-managed services, n8n flows, files on disk, the cron that runs daily backups. Sunk cost — no per-call billing. Sovereignty stays with you.

☁ Cloud

External SaaS or API

Anthropic API calls, GitHub repos, Linear issues, Stripe payments, S3-compatible backup target, third-party SMTP. Per-call billing. Scales without your hardware.

🌗 Hybrid

Local with cloud fallback

Local Ollama with Anthropic API failover, on-VPS embedding model with cloud fallback for spikes, local SMTP relay routed through SES on volume. The pragmatic default for variable load.

Operator Profile — Ahmed's edition

The Operator Profile is the JSON Mihwar reads as static context on every call. It's pre-loaded for Ahmed; future operators (NMO consultants, then clients in V2) will get their own.

Section	Examples	Update cadence
Infrastructure	Hostinger VPS · Coolify · Traefik · Postgres available · Redis available · WireGuard admin	Annual or on stack change
Cloud APIs	Anthropic API · OpenAI fallback · GitHub · Linear · n8n · Hostinger DNS	On API key rotation
Agents available	The agents in your Apex roster — PM · Productizer · VPS Admin · Dev-1 · Dev-2 · Data Sci · Cyber · HR · Marketing	On agent roster change
Personal preferences	Stack defaults (Next.js · FastAPI · Postgres) · auth library · deploy tool	Whenever taste changes
Constraints	VPS RAM cap · cost cap per project per month · regions allowed · sovereign-cloud requirement	Annual
Existing assets	Sibling products on the same VPS (e.g. Apex, n8n) · their networks · domains owned · wildcard cert availability	On infrastructure change

Sibling-product safety Security

The Operator Profile names sibling products on the same VPS so V0.1 can avoid collisions (port, network, domain). It does not share filesystems, networks, or credentials with them. Mihwar V0.1 generates plans that respect the existing isolation boundary; it never proposes touching sibling-product internals.

Architecture · what V0.1 itself looks like

Layer	What	Flag
Frontend	Single Next.js page · textarea + submit · renders the returned playbook as styled HTML	🔵 Local · runs in your VPS container
Backend	One Next.js API route or FastAPI endpoint · receives idea + reads Operator Profile from disk · calls Anthropic API · returns structured playbook	🔵 Local app · ☁ Anthropic call
LLM	Claude Sonnet 4.6 · structured output (JSON schema) · prompt-cached static prefix (system + Operator Profile)	☁ Cloud (Anthropic API)
Storage	Operator Profile lives in a single JSON file on disk · past playbooks saved as HTML files in a folder · no database	🔵 Local
Auth	Behind WireGuard / IP allowlist — Ahmed only · no login UI · no tenancy logic	🔵 Local
Observability	Per-call log to a JSON file: model · input tokens · output tokens · cache_read_tokens · cost_usd · request_id · timestamp · idea hash	🔵 Local
Cost ceiling	Hard cap: ≤$0.50 per playbook · monthly soft alert at $20 spend (it'll never get close)	☁ Anthropic billing

Build effort — 3 to 7 days, one Claude Code prompt

Day 1 — Scaffold Next.js page, paste Operator Profile JSON, hardcode a sample idea, get the LLM call working with structured output.
Day 2 — Wire the form, render playbook sections, style the local/cloud flags, add the cost-log JSON file.
Day 3 — Prompt caching on the static prefix, verify cache_read_tokens > 0 on call #2, tighten the prompt to enforce the 6-section output.
Day 4–5 — Dogfood on five real ideas Ahmed has been sitting on. Observe where the playbook gets vague, sharpen the prompt, refine the Operator Profile.
Day 6–7 — Deploy to https://mihwar.nmopartners.com/v01 behind WireGuard. Add a tiny TOC of past playbooks. Decide whether V1 is worth building based on whether you used V0.1 daily.

What V0.1 inherits to V1

The loop — idea → grounded playbook → execute. V1's 5-stage workflow is the same loop with explicit gates.
Operator Profile concept — V1 generalises it: each consultant has one, each client (in V2) has an Org Profile that is the same idea at the buyer level.
Local/cloud flagging — promoted to a first-class field in V1's Blueprint format. Clients get the same colour-coded breakdown.
Agent assignment language — V1's Stage 4 (Playbook) inherits the same agent-references vocabulary.
Cost-log shape — the JSON written per call in V0.1 becomes the schema for the ai_calls table in V1's Postgres.
Sonnet + prompt caching — V0.1 proves the cache hit rate behaviour before V1 scales it across five distinct prompts.

What V0.1 explicitly is not

Not a Blueprint engine. Output is a build playbook for Ahmed, not a $25k client deliverable.
Not multi-tenant. One JSON profile. One user. One folder of past playbooks.
Not a five-stage workflow. Single shot. No gates. No async forms. No house-style filter.
Not the catalog. Recommendations are grounded in the Operator Profile, not in a curated library of vendors and patterns.
Not for clients. Behind WireGuard. Never shown externally.

Success criteria

Ahmed uses V0.1 on at least 5 of his own ideas in the first 14 days.
At least one playbook leads to a built thing that ships.
Cost stays under $20/month.
Cache hit rate ≥ 80% by call #3 in any session.
Ahmed is willing to commit the next 6 weeks to V1 because V0.1 worked — or honestly say "the loop isn't useful" and reshape V1 before building it.

V0.1 is the cheapest correct first step

Three to seven days. One prompt. One static JSON file. One LLM call. If V0.1 doesn't change how Ahmed builds, V1 won't change how clients buy. Find that out in days, not weeks.

Previous← Executive Summary NextVision & Promise →

Part A · Vision

Mihwar — A consultant's cockpit today, a client platform tomorrow

Mihwar (محور · "pivot, axis") is a two-phase platform. Phase 1 is a single-operator web app that turns 3-week AI consulting discoveries into 3-day Blueprint deliverables for NMO Partners. Phase 2 is a SaaS where clients run their own AI visioning and roadmaps inside the same engine. The same axis turns — the operator changes.

Phase 1 cycle

≤7 days

empty workspace → signed Blueprint

Phase 1 price anchor

$25k

single Blueprint deliverable

Phase 2 trigger

10+ pulls

clients asking for self-serve access

Build budget

6 weeks

VPS to first signed engagement

What Mihwar is, in one paragraph

Mihwar is the operating system for an AI consulting practice. In Phase 1 it is the private cockpit Ahmed and the NMO team use to run client engagements: a five-stage workflow that takes a vague client wish and produces a $25,000 Blueprint deliverable in a working week, grounded in a curated catalog of vendors, models, and patterns and in the client's own infrastructure inventory. In Phase 2 the same engine is exposed to clients directly so they can self-serve AI visioning and roadmaps inside their own organisations, paying NMO a subscription for the platform and the catalog. The Phase 1 codebase is built so the Phase 2 pivot is a deployment change, not a rewrite.

The promise

Without Mihwar

3 weeks · PowerPoint · dies in committee

Discovery drags: 14 unrecorded conversations, contradictions, sprawling notes.
Architecture is gut-call: senior architect picks tools from memory, no audit trail.
Deliverable is a 60-slide deck plus a Word doc — unsearchable, unverifiable, unforwardable.
Conversion to build: ≈15%. Most decks rot in a finance committee for 90 days.

With Mihwar

7 days · interactive HTML · ships

Discovery is structured, partially async, and refuses to advance until complete.
Architecture is grounded in the curated catalog and the client's actual stack — every recommendation traceable.
Deliverable is a single self-contained HTML file: bilingual, navigable, signed manifest, opens in any browser.
Target conversion to build: ≥30%. The CTO can forward the Blueprint, search inside it, and reason about it.

What V1 must be

Single-tenant in operation, multi-tenant in design. Tenancy lives at the data layer from day one so V3 isn't a rewrite.
Built primarily by Claude Code, in five carefully-scoped prompts. Ahmed reviews diffs and steers; the LLM writes most of the lines.
Boring stack. Postgres, Redis, FastAPI, Next.js, Hostinger VPS, Coolify. No Kubernetes, no microservices.
Bilingual EN/AR by default. KSA market. Arabic is not an afterthought.
Self-defended. Mihwar's own security and infrastructure get the same rigour we sell to clients.

What V1 is not

Not a SaaS yet. No public sign-up, no Stripe, no per-tenant theming. The data model is ready; the UI is not.
Not a build platform. Mihwar produces the Blueprint and the Playbook. Building is downstream — by NMO Apex's agent team, by a partner, or by the client themselves.
Not a generic "AI ChatGPT for consulting". Every recommendation is grounded in a curated catalog and an inventoried environment. House style is enforced. House voice is mandatory.

The four mental tests

Every implementation decision in this masterplan is checked against four questions. They run through every section that follows.

1×

Scale

Would this still work at 100× current load?

2×

Security

How could this be abused by a hostile actor?

3×

Observability

Could this be investigated at 2am, six months from now?

4×

Economics

Affordable at 100× usage? What's the cost per user per month?

A note on this document

This masterplan is the v2 — it preserves the v1 vision and substance, and adds: explicit Phase 1 / Phase 2 framing, an Organisation Infrastructure Profile section (so the same client doesn't re-enter their stack each engagement), a dedicated Mihwar Self-Security & Infrastructure section, an AI Economics section, an Observability & Logs Page section, an Operations Handbook, and a deeper Phase 2 product spec. Every prior section is kept and tightened.

Previous← V0.1 · Idea Compiler NextProblem & Insight →

Part A · Vision

The problem & the insight

Why AI consulting projects stall in KSA right now, and the single observation that turns Mihwar from "another workshop tool" into a defensible product.

The problem — three layers

Layer 1 · Discovery is slow and expensive

A typical AI use-case discovery in a KSA enterprise takes 3–6 weeks. Stakeholders are scattered across IT, business units, security, procurement, vendors. Information arrives in WhatsApp threads, email PDFs, three different SharePoint tenants and a printed spreadsheet from a DBA. The consultant spends 60% of the engagement chasing data dictionaries and license terms, not designing the system.

Layer 2 · Architecture decisions are gut-call

By the time the inventory is "good enough", the senior architect picks tools from memory. The recommendation is rarely written down beside the alternatives that were rejected. Six months later when the build runs into trouble, no one remembers why Snowflake was chosen over BigQuery — and there is no audit trail to consult.

Layer 3 · The deliverable is dead on arrival

Most engagements end with a 60-slide PowerPoint plus a Word document. CTOs forward them to a procurement committee, who can't navigate them, can't search them, can't share fragments without re-formatting, and can't verify whether the architecture has been validated against the actual environment. The artifact is dead on arrival.

The insight

The bottleneck in AI consulting is not the architecture. It's the discovery interview. The architecture step takes a senior architect a few days at most — pattern-match the use case, pick from the toolkit, run the numbers. What takes weeks is dragging information out of stakeholders. So the leverage is not "AI that designs systems" — it is "AI that runs the interview." — The thesis

The reframe

Mihwar is not a chatbot for architects. It is an interviewing instrument. Stage 1 sharpens the use case. Stage 2 conducts the inventory. Both stages structurally refuse to advance until the inputs to Stage 3 are complete. Stage 3 — architecture synthesis — is fast precisely because Stages 1 and 2 made it possible. Most AI consulting tools start at Stage 3 and skip the discovery; that is exactly why their outputs feel hallucinated.

Why this works in KSA, specifically

KSA enterprises have real budgets for AI right now and weak internal AI talent. They need consultants who move fast but stay rigorous.
The local consulting market is dominated by Big Four-style PowerPoint teams. A bilingual interactive HTML deliverable signed by an Arabic-fluent boutique is a credible differentiator.
PDPL, SAMA, NCA cybersecurity controls — these introduce architecture constraints that grounded recommendations must respect. Generic AI tools can't see those constraints; Mihwar's catalog encodes them.

Defensibility

The Mihwar workflow is copyable in 90 days. The Mihwar catalog — opinionated, KSA-localised, vendor-vetted, refreshed quarterly — is the moat. Every engagement adds rows. Every quarterly review prunes them. By month 12, the catalog is a deliverable in itself.

Previous← Executive Summary NextMarket Context →

Part A · Vision

Market context

2026 is the loudest year in KSA AI consulting history. Mihwar's job is to be the most differentiated voice in the room — not the loudest.

The 2026 KSA AI landscape

Saudi Arabia has declared 2026 the Year of AI. Concretely:

SDAIA is funding AI capability programs across ministries and Vision-2030 entities. Tier-2 and Tier-3 government bodies (universities, regulators, regional admin) are being told to "have an AI strategy" by year-end.
Banks under SAMA are running parallel AI initiatives — fraud, KYC summarisation, contact centre — under increasing regulatory scrutiny.
Mid-market enterprises (logistics, retail, healthcare networks, family offices) are watching the giants and want a credible mid-budget option for AI exploration.
The Big Four are quoting 8–14 weeks and $200k+ for AI strategy decks. Most clients can't afford that or won't.

The competitive shape

Competitor	Strength	Weakness Mihwar exploits
Big Four (Deloitte, EY, PwC, KPMG)	Brand, regulatory comfort, large delivery teams	Slow, expensive, generic decks, junior delivery on senior pitch
BCG / Bain / McKinsey	Strategy chops, board-level access	$300k+ floor, no implementation grounding, no KSA-localised vendor view
Local SI consultancies	Relationships, ministry pre-quals	Body-shop economics, no productised IP, no AI-specific differentiation
Boutique AI shops (regional / overseas)	Technical depth	No Arabic delivery, no PDPL fluency, no in-region presence
"AI strategy" SaaS tools	Cheap, fast	Generic catalog, not grounded in client's actual stack, no consultant orchestration

The wedge

Mihwar plus NMO occupies a specific gap: a senior, KSA-fluent consultant team backed by a productised workflow that produces a verifiable, interactive deliverable in 7 days for a $25k anchor price. No Big Four competes there because their cost structure forbids it. No SaaS competes there because they have no senior consultant. No body-shop competes there because they have no productised IP.

Two-tier client thesis

Government clients

Vision-2030 entities, ministries, regulators.

Want: defensible architecture, PDPL/NCA compliance, bilingual deliverable for ministerial review.
Pain: Big Four cost, slow delivery, decks that don't survive procurement scrutiny.
Mihwar fit: Tier 2 (Blueprint + Playbook + RFP spec) at $40–60k. Stage 4 outputs an RFP-ready spec they can put to public tender.

Mid-market clients

Banks, logistics, retail, healthcare, family offices.

Want: a fast, defensible AI strategy that doesn't need a $300k commitment to start.
Pain: CFO won't sign $200k for a deck. CTO won't trust a $5k SaaS tool.
Mihwar fit: Tier 1 ($15–30k Blueprint), conversion to Tier 3 build later.

Phase 2 market

When Mihwar opens to clients directly (Phase 2), the addressable market widens substantially: every mid-market enterprise that does not need a consultant in the room but does need a structured visioning process becomes a buyer. Pricing shifts from engagement-based to per-seat or per-Blueprint subscription. NMO captures consultancies as a meta-tier — small AI shops who license the Mihwar engine and the catalog and use it inside their own client engagements.

Risk acknowledged

The KSA market is rapidly consolidating around 4–5 large preferred suppliers. Mihwar's window is the next 18 months. After that, the ground hardens. Every milestone in this masterplan is calibrated to that window.

Previous← Problem & Insight NextPositioning & Pricing →

Part A · Vision

Positioning & pricing

How Mihwar is sold, what it costs the client, and how it makes NMO defensibly profitable.

The three-tier pricing model

Tier	Deliverable	Price	Cycle	Margin
Tier 1 · Blueprint	Bilingual interactive HTML Blueprint signed off by client CTO. One 90-min walkthrough.	$15–30k	1–2 weeks	≥75%
Tier 2 · Blueprint + Playbook	Tier 1 plus 6-week build plan, risk register, vendor short-list, RFP-ready spec.	$30–60k	2–3 weeks	≥65%
Tier 3 · End-to-end engagement	Tier 2 plus orchestrated build (NMO Apex agents or partner squad).	$120k+	3–9 months	30–50% on build portion

The pricing anchor

Always quote the Blueprint price first. Even when a client wants a full build, the conversation starts with: "The first deliverable is the Blueprint. It's $25,000 and it takes us about a week. Once you have it, you'll decide what to build, when, and with whom — including potentially us." — Sales discipline

This anchors the value of discovery, separates it from build risk, and makes Tier 1 feel reasonable. Never quote a Tier 3 price first. It triggers procurement scrutiny that the engagement isn't sized for.

Phase 2 pricing (preview)

When Mihwar becomes a SaaS, pricing shifts. The Blueprint becomes a unit of work the customer self-produces; NMO charges for access to the engine and the catalog.

Phase 2 plan	Audience	Price target	What's included
Starter	Single AI champion at a mid-market enterprise	$1,200/mo or $9,600/yr	1 workspace, 3 Blueprints/yr, premium catalog read-only, EN/AR
Team	5-seat AI office	$3,500/mo	5 workspaces, unlimited Blueprints, custom branding, SSO
Consultancy	Boutique AI shops licensing Mihwar for their clients	$25k/yr + per-Blueprint	White-label, multi-client workspaces, customer-private catalog tier, NMO catalog as premium
Enterprise	Large org with strict residency / SSO needs	Custom	Dedicated tenant in-region, BYO IDP, audit export, contractual residency

Margin discipline

Phase 1: The Blueprint price minus the Anthropic/embeddings cost minus 1.5 days of senior consultant time must clear 65% margin. If a Blueprint burns more than 1.5 days of consultant attention, the workflow has failed.
Phase 2: Per-Blueprint AI cost must stay below $40 at the 95th percentile via prompt caching, two-tier model selection, and batch-API for non-realtime steps. See AI Economics.

House style

Every Mihwar deliverable carries a recognisable visual and verbal signature. Tight typography. No lorem-ipsum tone. Decisions named, not hedged. Trade-offs explicit. The deliverables look like $30k consulting artifacts, not "an AI generated this."

Previous← Market Context NextPersonas →

Part A · Vision

Personas

Mihwar serves four distinct user roles. The product treats each one differently. Phase 1 is built for the first three; Phase 2 adds the fourth.

Persona 1 · Ahmed (Founding Consultant) — the driver

Role

Founder of NMO Partners. Senior technologist. Runs every engagement personally in V1.

Touches the tool

Daily during active engagements. Power user.

Jobs to be done

Move clients from "we want AI" to a signed Blueprint in <7 working days. Maintain consistency across engagements. Build catalog IP.

Jobs avoided

Slow tooling. Generic AI voice. Forced sequence when context demands flexibility. Lost work.

Phase 1 access

Full admin. Phase 2: meta-admin role across all client tenants.

Persona 2 · NMO Consultant (future, Month 4+) — the growing team

Role

First or second consultant Ahmed hires as engagement volume grows.

Jobs to be done

Run engagements without Ahmed in the room, get the same Blueprint quality, learn from prior engagements (pattern reuse), hand off to Ahmed for review at clear checkpoints.

Implications for V1

RBAC is deferred to V2, but the data model supports per-user audit from day one. Every workspace action records the consultant who took it.

Persona 3 · Client CTO / Head of AI — the audience

Role

Senior technology leader at the client. Receives the Blueprint as the engagement deliverable.

Touches the tool

Indirectly: opens the Blueprint HTML, possibly fills async forms during Stage 2, attends the 90-min walkthrough.

Jobs to be done

Get a defensible, board-ready AI strategy. Know it is grounded in his/her actual stack. Be able to forward and search it.

Frustrations to avoid

Decks that look generic. Recommendations that ignore PDPL or KSA presence. Inability to verify source of a claim.

Persona 4 · Phase 2 self-serve client — the future buyer Phase 2

Role

Mid-market AI champion or in-house digital lead. Uses Mihwar without an NMO consultant in the room.

Touches the tool

Self-serve workspace, runs Stages 1–3 themselves, optionally pays NMO for a 2-hour expert review before Stage 4.

Jobs to be done

Get a board-ready AI roadmap. Reuse the Org Profile across multiple use cases. Export to Confluence / SharePoint / PDF.

Implications for design

The interview surfaces must work without a senior consultant interpreting them. Tooltips, examples, "what good looks like" hints throughout. Stage 2 must self-validate.

Phase 1 readiness

Already accounted for: the data model is multi-tenant, the workflow is structured, the catalog tier system is designed. Phase 2 is a deployment + UI polish, not a rewrite.

Why four personas matter for V1

Even though Phase 2 is months away, every V1 design decision is checked against all four. A surface that only Ahmed could love will not survive the pivot. A schema that only V1 needs will require a rewrite. Designing for the audience now costs little; designing for them later costs months.

Previous← Positioning & Pricing NextTwo-Phase Strategy →

Part A · Vision · New

The two-phase strategy

One codebase, two operating modes. Phase 1 is the consulting cockpit operated by NMO. Phase 2 is the self-serve platform operated by clients. The same engine drives both — the difference is who holds the steering wheel.

Why two phases, in this order

Phase 1 first earns the right to Phase 2. Without ten signed engagements producing real Blueprints, Mihwar has no proof, no testimonials, no catalog moat, and no understanding of where the workflow actually breaks for a non-expert user.
Phase 1 funds Phase 2. Each $25k Blueprint at 75% margin contributes $18k toward the SaaS lift. Five engagements pay for the Phase 2 build wholesale.
Phase 1 stress-tests every Phase 2 surface. Every UX paper-cut Ahmed hits with a real client is a paper-cut a self-serve user would have hit harder. Phase 1 is a moving usability test.
Phase 2 too early kills the consulting margin. Self-serve at $1,200/mo dilutes the perceived value of the $25k engagement. Phase 2 launches when Phase 1 is sold out, not before.

What Phase 1 builds that Phase 2 inherits

Capability	Phase 1 use	Phase 2 inheritance
Five-stage workflow	NMO consultant runs it	Self-serve user runs it with embedded coaching
Catalog	NMO's curated knowledge base	Premium tier (NMO) + customer-private tier
Blueprint format	$25k deliverable	Self-produced artifact
Multi-tenant data layer (RLS)	One tenant: NMO	Many tenants: subscribers
Org Infrastructure Profile	Captured per engagement, reused on repeat	Captured per organisation, drives every Blueprint they make
aiproxy + AI economics discipline	Cost control across few engagements	Cost discipline at scale; per-tenant budget caps
Audit log	Per-user actions for NMO team	Compliance trail for regulated subscribers

What Phase 2 adds on top of Phase 1

Identity provider integration: OIDC, SAML, Microsoft Entra, Google Workspace.
Self-serve onboarding: sign-up flow, email verification, workspace creation wizard, first-Blueprint guide.
Embedded coaching: the AI plays "consultant in the room" for users without one. Higher tooltips density, "show me an example" affordances, sample answers from the catalog.
Per-tenant customisation: theme, logo, custom blueprint cover page, branded export.
Billing & metering: Stripe, per-seat or per-Blueprint counters, hard caps, soft alerts.
Catalog tiering: NMO premium catalog (read-only, paid), customer-private catalog (writeable, scoped to that tenant).
Public Mihwar website & pricing page.

The phase-pivot triggers

Phase 2 development starts when any one of the following becomes true:

Demand pull: ≥10 distinct prospects ask "can we get a Mihwar login" within any 6-month window.
Capacity ceiling: NMO's consultant team is fully booked, pipeline is stronger than capacity, and adding consultants doesn't scale margin.
Catalog moat is mature: ≥300 entries with quarterly review cycles. Catalog itself is now a deliverable.

Until then

V1 stays disciplined. Phase 2 features only land in the codebase if they cost nothing to add now — schema columns, tenant scoping, request-id propagation, scrubbing middleware. UI for Phase 2 is built when the trigger fires, not before.

Previous← Personas NextFive-Stage Workflow →

Part B · Product

The five-stage workflow

Mihwar's core mechanic. Five sequential stages, each producing a versioned artifact, each unlocking the next. The Architecture Gate between Stages 2 and 3 is the rule that earns Mihwar its existence.

The complete workflow

Stage	Mode	Duration	Output	AI model
1 · Ideation Lab	Live workshop · Socratic AI	60–90 min	Sharpened use case (1-pager)	Claude Sonnet
2 · Discovery	Hybrid live + async forms	2–3 elapsed days	Infrastructure inventory	Haiku for filtering · Sonnet for synthesis
⚑ Architecture Gate · Stage 3 locked until Stage 2 is signed off
3 · Architecture	AI synthesis · consultant edits	~1 day	Use Case Blueprint	Sonnet (extended thinking)
4 · Playbook	Optional · Tier 2+ only	~1 day	Build plan · risks · vendors · RFP spec	Sonnet
5 · Handoff	Compile · present · export	90-min walkthrough	Final HTML Blueprint deliverable	—

The stage mechanics

Each stage is a panel in the Mihwar UI with three sub-panels:

The interview — a chat-like surface where the consultant works through prompts. The AI asks follow-ups; the consultant types, edits, or pastes.
The artifact — the structured output of the stage, continuously updated as the interview progresses. The consultant sees it forming.
The signoff — a single button at the bottom: "I am satisfied this stage is complete." Clicking it freezes the artifact, creates an immutable version row, and unlocks the next stage. The consultant always controls signoff — never the AI.

Why "the gate" matters

The Architecture Gate is the most important rule in Mihwar. Every other AI consulting tool will happily synthesise architecture from incomplete inputs, because the AI doesn't care. Mihwar refuses. The gate is what makes the deliverable trustworthy. — Architecture discipline

Concretely: when the consultant tries to advance to Stage 3, the system checks Stage 2 completeness against the use case category. Missing critical fields ("nobody has told us where the data is") block advance with a specific, actionable list. The consultant cannot bypass this from the UI; they would have to edit the database directly to override.

Stages can iterate

The flow is sequential, but versioned. If Stage 3 reveals a missing data point, the consultant can re-open Stage 2, capture it, re-sign off, and Stage 3 re-synthesises with the new context. The audit log records every back-and-forth.

Previous← Two-Phase Strategy NextStage 1 · Lab →

Part B · Product · Stage 1

Stage 1 — The Ideation Lab

A 60–90 minute live conversation that turns a vague client wish ("we want to use AI in our call centre") into a sharp, scoped use case with measurable success criteria. Socratic AI interrogates ambiguity until consultant and client agree on what they're actually building.

When this stage runs

Typically the first or second meeting with a new client. The CTO has expressed interest, may have a fuzzy idea of what they want, and needs the consultant to help them sharpen it. The Lab can also be skipped if the client arrives with a fully-scoped use case (rare) — they get a discount for not needing it.

The six sharpening questions

Mihwar runs the Lab through six question phases. The AI generates the specific questions in context, but they always probe these dimensions:

#	Dimension	The question behind the question
1	The pain	What specific operational pain are we removing? Not "improving efficiency" — "reducing first-call resolution time from 14 minutes to under 6 minutes."
2	The user	Who is the human in the loop? Internal employee? External customer? Regulated principal?
3	The current state	How is this done today? With what tools, by whom, at what cost? Sketch the unhappy path.
4	The success metric	If we did this perfectly, what number moves and by how much? Who measures it?
5	The blast radius	What happens if the AI is wrong 5% of the time? 20%? Tolerable / catastrophic?
6	The first-mile constraints	Who has the data? Who has the budget? Who must approve?

The AI's behaviour

The Lab uses Claude Sonnet (latest) with a system prompt that turns it into a Socratic interviewer. Behaviour rules:

Asks one question at a time. Never bundle two probes.
Reflects what it heard before moving on. Confirms understanding in the consultant's words.
Surfaces contradictions politely. "Earlier you said X. This sounds like Y. Which one is the real one?"
Refuses to recommend solutions. Stage 1 is about the problem, not the answer. If the consultant tries to leap forward, the AI parks the answer for later.
Honours house style. No exclamation marks. No "I'd love to help!" No emoji. Direct, professional, KSA-appropriate.

The artifact: the 1-pager

As the conversation progresses, the artifact panel renders a structured Use Case 1-pager:

USE CASE: [name]
PAIN:    [one sentence]
USER:    [persona, role, jurisdiction]
TODAY:   [current process, cost, owner]
TARGET:  [metric, baseline, goal, by when]
BLAST:   [tolerable failure modes, intolerable failure modes]
INPUTS:  [what data is needed, who owns it]
DECISION-OWNER: [who signs off the build]
OUT-OF-SCOPE: [explicit non-goals]

The signoff

When the consultant is satisfied, they hit "Sign off Stage 1". The 1-pager is frozen as v1. If they re-open later, edits create v2, v3, etc. — never overwrite. This becomes the input to Stage 2's question-set tailoring.

Phase 2 considerations Phase 2

For self-serve Phase 2 users, Stage 1 needs more scaffolding: example 1-pagers from the catalog ("see how a contact-centre AI was scoped"), inline tooltips that explain each dimension, and a "show me a strong answer" affordance on each prompt. The schema doesn't change — just the surface.

Cost note Cost

Stage 1 averages ~30 turns. With prompt caching on the system prompt + house style + catalog examples, marginal cost per turn is dominated by output tokens. Expected per-Lab spend: $1.20–$2.50 at $3/Mtok input, $15/Mtok output.

Previous← Five-Stage Workflow NextStage 2 · Discovery →

Part B · Product · Stage 2

Stage 2 — Discovery

The infrastructure inventory. The most labour-intensive stage and the one most clients hate. Mihwar's job is to make it bearable, structured, partially async — and to refuse to advance until it's actually complete.

Why this stage is the bottleneck

In a traditional engagement, Stage 2 takes 3–6 weeks. It's where consultants chase stakeholders for data dictionaries, screenshots of dashboards, license confirmations, GPU specs. It's where projects stall.

Mihwar compresses to 2–3 elapsed days by:

Generating only the questions that matter. The AI uses the Stage 1 1-pager to filter the discovery taxonomy down to ~30 questions out of a possible ~150.
Splitting live and async. Questions the consultant can answer live; questions that need a DBA or vendor contract get sent as a structured form to the right person via a single-use, time-limited link.
Auto-detecting completeness. The AI tells the consultant exactly which questions are still blocking architecture synthesis.
Reusing the Org Profile. If the client has done a previous engagement (or is a Phase 2 user), most infrastructure questions are already answered. See Org Infra Profile.

The discovery taxonomy

Domain	What we capture
Data sources	Warehouses (Teradata, Snowflake, BigQuery), lakes (S3, ADLS), operational DBs, file shares, SaaS APIs, Excel sprawl. License terms. Volume. Freshness. Owner.
Compute	Cloud accounts, on-prem servers, GPU clusters, Kubernetes, VPS providers, edge devices. Capacity. Region. Procurement model.
Identity & access	IDP (Entra, Okta, custom), SSO state, MFA coverage, service-account hygiene, secret stores.
Network & perimeter	VPN, ZTNA, private endpoints, egress controls, region restrictions, SAMA / NCA controls applicable.
Existing AI/ML	Models in production, vendors used, licensing, evaluation discipline, MLOps maturity.
Compliance	PDPL, SAMA, NCA ECC, sector-specific (healthcare, education). Data classification scheme.
People	Sponsors, decision owners, champions, blockers. Skill availability.
Budget & procurement	Approved spend envelope. Procurement vehicle (direct, RFP, framework). Vendor preferences.
Constraints	Residency, on-prem mandates, vendor exclusions, contractual SLA shape, audit cadence.

Async forms — how they work

For each async question, Mihwar generates a single-use form link, scoped to the question, time-limited (default 7 days), bound to the recipient's email and IP-logged. The link looks like:

https://mihwar.nmopartners.com/async/01HV7Z9K3J5XPQ8WMY4N6T2RES

Recipients land on a clean, branded page with one or two questions, an "I don't know — ask X" escape, and a submit button. No login required. Submissions stream back into the consultant's Stage 2 panel.

Security flag Security

Async-link tokens are cryptographically random ≥128-bit (ULIDs server-generated via secrets.token_urlsafe(16) not uuid4 when used for auth). Single-use: marked consumed on first valid submission. Time-bound: hard expiry at 7 days, rejected at the API layer. IP-logged for audit. Form pages return generic errors on invalid/expired tokens, never leak whether the token existed. See Client Security & PDPL.

The completeness check

The AI maintains a running gate-check: which Stage 3 architecture decisions can be made given current Stage 2 inputs? The consultant sees this as a live readiness meter, with the specific blocking questions named:

Stage 3 readiness: 76% · 4 questions remain blocking
✓ Data residency captured
✓ Identity provider captured
✗ GPU availability — pending response from CloudOps (sent 3 days ago)
✗ PDPL classification of customer voice transcripts — pending Legal
✗ SAMA AI governance applicability — async link expired, resend?
✗ Production traffic peak — async link sent today

Phase 2 considerations Phase 2

Self-serve users don't have a consultant orchestrating Stage 2. Mihwar must:

Pre-populate from the persistent Org Profile wherever it overlaps.
Suggest who to ask for each blocker ("typically your DBA can answer this — here's a template message").
Allow inviting collaborators into the workspace to answer their slice directly.

Previous← Stage 1 · Lab NextStage 3 · Architecture →

Part B · Product · Stage 3

Stage 3 — Architecture synthesis

The AI proposes a complete reference architecture for the use case, grounded entirely in the client's actual infrastructure (Stage 2) and the curated catalog. The consultant reviews, edits, and signs off the result.

What "grounded" means

The AI is given:

The Stage 1 1-pager (sharp use case).
The Stage 2 inventory (what they have).
The full catalog, RAG-retrieved (vendors, models, frameworks, patterns, constraints).
The house style guide and banned-phrases list.

The AI is forbidden from:

Recommending a vendor not in the catalog.
Recommending a vendor without KSA presence when the inventory says residency is required.
Recommending a tool the inventory shows the client doesn't have a license for, without flagging procurement implications.
Inventing pricing.

The six architecture outputs

Layered architecture diagram — auto-generated SVG using the 10-layer model from the AI Ecosystem Primer, populated with chosen tools.
Component manifest — table of every component, its role, its catalog reference, the rationale.
Data flow diagram — auto-generated, showing how data moves from sources to user-visible outputs and back.
Trade-offs & alternatives — what was considered and rejected, and why. With explicit catalog references.
Open questions — anything the AI flagged as needing human judgement before build commits.
Compliance overlay — a separate read of the architecture against PDPL, SAMA, NCA controls, depending on what Stage 2 captured.

How synthesis actually runs

Synthesis is asynchronous. The consultant clicks "Generate Architecture v1"; the request lands in a background queue (BullMQ-equivalent on Redis). A worker:

Loads the 1-pager, the inventory, the relevant catalog slice (RAG: top-K embeddings).
Runs Sonnet with extended thinking enabled, system prompt cached via cache_control.
Streams progress to the consultant's UI via SSE.
Persists the result as stage_artifacts v1.
Auto-renders the SVG diagrams from the structured component manifest.

Total elapsed time: typically 60–120 seconds. The consultant sees a "thinking…" beam during synthesis and reads the result when it lands.

Scale flag Scale

Synthesis is the heaviest call in Mihwar (10–30k output tokens with extended thinking). Running it inline in the request thread would block the API for two minutes. Background queue + SSE keeps the UI responsive and lets us retry on transient failures without losing user work. See System Architecture.

Editing the result

The consultant can:

Edit any field directly (textarea / structured editors per output).
Ask the AI to "regenerate just the trade-offs section with these adjustments."
Override a vendor recommendation and have the AI re-rationalise.
Accept and freeze the result, creating Stage 3 v1.

Cost flag Cost

One Stage 3 synthesis run costs ~$3–8 depending on extended-thinking depth. Prompt caching of the catalog snapshot (~50k tokens at hit rate >80%) dominates the savings. See AI Economics.

Previous← Stage 2 · Discovery NextStage 4 · Playbook →

Part B · Product · Stage 4

Stage 4 — The Build Playbook

Optional but high-margin. Adds detailed build planning, risk register, vendor short-list, and reference repositories. Sold as Tier 2+ pricing. The Playbook is what a buy-side procurement officer actually reads.

Why this stage is optional

Many engagements stop at Stage 3 — the client signs off the Blueprint, takes it to their finance committee, comes back later for the build. Mihwar respects that — Stage 4 is opt-in and adds days, not hours.

When clients do want Stage 4, they're typically committed to building and need the planning rigour. They're paying $30–60k for the Blueprint+Playbook combo and they expect a deliverable they can hand to a build team.

The five Playbook outputs

6-Week Build Plan — week-by-week milestones, dependencies, owner per task. Conservative estimates.
Risk Register — every risk identified during discovery and architecture, with severity, mitigation, owner.
Vendor & Tooling Short-List — for every component, 1–3 specific vendors with KSA presence, pricing model, contact, last-reviewed date.
Reference Repositories — pointers to NMO Apex's accumulated build patterns: starter Helm charts, FastAPI templates, Next.js shells. (Tier 3 only — IP is gated.)
RFP Specification — for government clients: a procurement-ready scope of work, evaluation criteria, and acceptance test plan. Optional add-on.

The Risk Register format

Risk	Likelihood	Impact	Mitigation	Owner
Anthropic API quota tightened mid-build	Med	High	Multi-region key, fallback to second model family	NMO platform lead
Customer voice transcripts contain PHI under MoH classification	High	High	Pre-classify sample, redact pipeline before LLM, legal sign-off Week 1	Client legal + NMO

The RFP spec

For government engagements, the RFP spec is the keystone. It mirrors the Blueprint structurally but reformats it as a procurement document: scope of work, deliverables, milestones, acceptance criteria, evaluation matrix, security clauses (NCA-ECC, PDPL), and pre-qualified vendor categories. The client's procurement team can lift it into their tender platform with minimal editing.

Margin discipline

Stage 4 is high-margin only if Stage 1–3 outputs are clean. If the consultant has to rebuild Stage 2 inventory in Stage 4 to get Vendor short-list right, the gate has been violated upstream.

Previous← Stage 3 · Architecture NextStage 5 · Handoff →

Part B · Product · Stage 5

Stage 5 — Handoff

The final stage. Compiles the Blueprint, generates the proposal/scope document if relevant, and exports the deliverable.

What gets delivered

The Blueprint HTML — single self-contained file. Bilingual EN/AR. Mihwar branding minimal; NMO + client logos prominent. Opens in any browser, works offline, prints respectably.
Manifest & signature — embedded JSON manifest with version, signoffs, catalog snapshot hash, generation timestamp. Cryptographically signed.
Source pack (optional) — for clients on Tier 2+, a zip of the structured artifacts (1-pager, inventory, architecture, playbook) as JSON, for downstream tooling.
Walkthrough recording — Tier 1+ engagements include a 90-minute walkthrough. With consent, the recording becomes a deliverable too.

Presentation mode

Mihwar supports a presentation mode — full-screen, larger fonts, navigable page-by-page. The consultant shares the Blueprint screen, walks the client CTO through each section, answers questions, captures any final adjustments. Adjustments create a v(n+1) without invalidating the original signed manifest.

After handoff

The workspace doesn't disappear. Mihwar retains it indefinitely (subject to data retention policy). NMO can:

Reopen later to update the Blueprint if the client requests changes.
Reference patterns in future engagements (with consent and anonymisation).
Track which Blueprints converted to builds — feeds the catalog's "used in N past engagements" field.
Run quarterly catalog reviews against the body of past Blueprints to catch drift.

Retention & deletion Security

Default retention is indefinite for active clients. On client request (PDPL right of erasure) or at end of business relationship, the workspace can be hard-deleted with audit-logged confirmation. Catalog patterns derived from a deleted workspace stay in the catalog only if the rationale is structurally anonymised (no client name, no specific volumes). See Client Security & PDPL.

Previous← Stage 4 · Playbook NextBlueprint Format →

Part B · Product

The Blueprint format

The Blueprint is the deliverable. Everything in Mihwar exists to produce it. This page specifies exactly what it looks like, how it's structured, and why each design choice matters.

Five hard requirements

Self-contained. Single HTML file, no external dependencies (one optional Google-Fonts link). Opens offline. Works on any device, any year.
Bilingual. EN by default, AR view available with full RTL. Translation is consultant-reviewed, not raw machine output.
Navigable. Sidebar TOC, in-page anchors, search box (Ctrl+K). The CTO must be able to find any claim in <10 seconds.
Branded. Client logo, NMO logo, project name, version, document classification, date. Looks like a $30k document, not a generated artifact.
Verifiable. A signed manifest and version stamp prove provenance. Reader can hash-check.

The Blueprint structure

§	Section	Content
0	Cover	Client logo, project name, date, NMO logo, version, document classification
1	Executive Summary	One-page overview. The CFO reads only this.
2	Use Case Definition	The Stage 1 1-pager, formatted
3	Current State	Stage 2 inventory, summarised — what they have today
4	Proposed Architecture	Diagram, component manifest, rationale
5	Data & Agent Flow	How information and decisions move through the system
6	Trade-Offs & Alternatives	What we considered and rejected, and why
7	Compliance & Risk	PDPL / SAMA / NCA reading. Risk register summary.
8	Build Playbook (Tier 2+)	Plan, vendors, RFP spec
9	Glossary	Plain-language definitions of every acronym used
A	Manifest	Versions, signoffs, catalog hash, signature

Why HTML, not PDF

Searchable. Ctrl+F works inside the browser. PDFs garble across reader software.
Linkable. Internal anchors. The CTO can email a link to "§4.2 Vector Store choice" rather than "look at page 23 of the attached".
Copyable. Tables paste into Confluence. Code blocks copy clean. PDFs are write-only.
Live diagrams. SVG architecture diagrams scale on retina; PDF diagrams pixelate.
Forward-shippable. A 1.2MB HTML file forwards cleanly. A 40MB PDF gets stripped by mail filters.

The manifest

{
  "blueprint_id": "01HV8XQGT7K5R2W3M9N6P8Y4ZS",
  "client": "Tadawul",
  "project": "Customer Voice AI",
  "version": "1.0",
  "generated_at": "2026-05-07T14:32:18.420Z",
  "engagement_id": "eng-0042",
  "tenant_id": "nmo-001",
  "stages": {
    "stage_1": {"version": 3, "signed_off_by": "ahmed@nmopartners.com",
                "signed_at": "2026-05-03T10:14:00Z"},
    "stage_2": {"version": 5, "signed_off_at": "2026-05-05T16:22:00Z"},
    "stage_3": {"version": 2, "signed_off_at": "2026-05-06T09:08:00Z"}
  },
  "catalog_snapshot_hash": "sha256:7c4b8d…",
  "signing_key_id": "mihwar-prod-2026-05",
  "signature": "ed25519:0x4f2a1c9b…"
}

Why a manifest?

Six months from now, when a procurement officer asks "is this the same Blueprint we received in May?" — the manifest answers in 5 seconds. The Ed25519 signature also lets clients verify cryptographically that the Blueprint hasn't been tampered with after signoff.

Previous← Stage 5 · Handoff NextThe Catalog →

Part B · Product

The Catalog

Mihwar's grounding source. A curated, opinionated reference of vendors, models, frameworks, and patterns — maintained by NMO, used by the AI for every recommendation.

Why a catalog

If the AI is allowed to recommend any vendor based on its training data, three things go wrong:

Hallucinations. It recommends vendors that don't exist in KSA, or quotes pricing from 2023.
Inconsistency. Two engagements get different recommendations for the same problem.
Loss of differentiation. NMO's Blueprint reads like every other consultancy's because it's drawn from the same public training data.

The catalog solves all three. It's NMO's opinionated knowledge base, evolving with every engagement.

The 10-layer architecture atlas

The catalog is organised around a 10-layer reference architecture covering the full AI stack. Every catalog entry attaches to one or more layers. The same atlas is used in Stage 3's auto-generated diagram.

Catalog schema

Entity	Fields
Vendor	name, layers (1–10), region availability, KSA presence (none / partner / direct), pricing model, NMO opinion (rating 1–5 + notes), known limits, partner contacts, last reviewed
Model	name, family, provider, context window, cost/M input, cost/M output, languages (incl. AR strength), strengths, weaknesses, NMO opinion
Framework	name, layer, license, language, maturity, NMO opinion, when-to-use, when-to-avoid
Pattern	name, problem solved, components used, reference repo, used in N past engagements, success notes
Constraint	type (PDPL, SAMA, NCA, on-prem-only, etc.), description, implication for architecture
Question	domain, question text (EN + AR), category, async/live default, depends-on Stage-1 fields, gating Stage-3 decisions

Catalog seeding (V1)

The catalog is seeded from two sources:

The AI Ecosystem Primer (NMO's existing reference doc) — every vendor, model, and framework listed there imports as a catalog entry.
Pattern templates derived from NMO Apex's prior builds — contact-centre AI, document Q&A, fraud screening, KSA/AR-language tasks.

Seed target for V1: 80–120 vendors, 30 models, 20 frameworks, 12 patterns, 30 constraints, 150 questions. Within 6 months of operation: 200+ entries, quarterly review cycle.

Catalog tiers (Phase 2 preview) Phase 2

Public tier — minimal entries, good for the public Mihwar marketing site.
Premium tier (NMO) — the full opinionated catalog. Phase 2 subscribers get read-only access, NMO writes.
Customer-private tier — each Phase 2 tenant can add their own private entries (preferred internal vendors, contractual exceptions, data-classification overrides). Never visible to other tenants.

The catalog is the moat

Previous← Blueprint Format NextOrg Infra Profile →

Part B · Product · New

The Organisation Infrastructure Profile

Stage 2 asks 30+ questions about the client's stack. Most of those answers don't change between engagements with the same client. The Org Profile captures them once, persists them at the organisation level, and pre-populates every future Blueprint — both inside Phase 1 and across Phase 2.

The problem this solves

Today, when NMO does a second engagement with a returning client, the consultant manually re-enters 80% of Stage 2 — same Snowflake, same Entra tenant, same SAMA-registered subsidiary, same procurement rules. The client wonders why they're answering the same questions twice. Phase 2 makes this unbearable: a self-serve user shouldn't face a 30-question infrastructure quiz on every Blueprint they generate.

What the Org Profile captures

The Org Profile is a structured, versioned document attached to a tenant (Phase 2) or to a client entity within NMO's tenant (Phase 1). It mirrors the Stage 2 taxonomy:

Section	Examples	Update cadence
Identity & tenant	Legal entity, sector, regulator(s), HQ region, employee count, AR/EN preference	Annual or on change
Data platform	Warehouses, lakes, ETL tooling, BI tools, classification scheme	Per use case unless changed
Compute & cloud	Cloud accounts, regions, Kubernetes, GPU access, on-prem footprint	Quarterly
Identity & security	IDP, MFA coverage, ZTNA, secret stores, SOC, incident response shape	Annual
Compliance posture	PDPL applicable, SAMA registered, NCA-ECC tier, sector controls (MoH, MoE)	Annual or on regulatory change
Procurement	Approved vendor list, RFP framework, procurement vehicle, budget cycle	Annual
AI maturity	Models in production, MLOps state, AI champion, governance committee	Per use case
Constraints	Data residency mandates, vendor exclusions, on-prem-only systems, sovereign-cloud requirement	Annual or on change

How the Profile feeds discovery

Why this is a separate concept from Stage 2

Different lifecycle. Stage 2 inventory is engagement-bound and frozen with the Blueprint. The Org Profile lives across engagements and is updated in place.
Different ownership. Stage 2 is owned by the consultant. The Org Profile is owned by the client (Phase 2) or by the consultant on behalf of the client (Phase 1).
Different write surface. Stage 2 is a workflow. The Org Profile is a settings page.
Different security needs. Org Profile contains the sensitive long-form picture of the organisation — encrypt at rest at field level, see Mihwar's Own Security.

The "delta" experience

When a returning client starts a new engagement:

Mihwar loads the Org Profile, marks it as the baseline for Stage 2.
The AI scans the Stage 1 1-pager, identifies which Stage 2 questions are still unanswered or stale for this specific use case (e.g. a contact-centre use case might trigger questions about voice telephony platforms not relevant to a previous fraud-screening engagement).
The consultant only sees the delta: ~5–8 use-case-specific questions instead of 30.
On signoff, any edits flow back into the Org Profile (with an "this updates the profile?" confirmation), versioning the profile too.

Versioning

Org Profile is versioned in the same pattern as stage_artifacts: every meaningful update creates an immutable version row with author + timestamp. Stage 2 inventories link to the specific Profile version they were derived from — so re-reading a Blueprint a year later shows what the world looked like then, not now.

Security & privacy

Org Profile is sensitive Security

The Profile contains the most sensitive picture of the client: cloud accounts, security tooling, compliance posture, vendor names. Treat it as crown-jewel data. Encrypt at field level with per-tenant DEKs (KMS-wrapped). Restrict export. Audit every read. See Mihwar's Own Security.

Phase 2 fit Phase 2

The Org Profile is the central artifact of Phase 2. A self-serve user fills it once at onboarding (with a guided wizard), then every Blueprint they generate inherits it. Profile review becomes an annual event — pushed by Mihwar with email reminders. Without the Profile concept, Phase 2 is unusable; with it, the second Blueprint a customer generates feels effortless.

Previous← The Catalog NextSystem Architecture →

Part C · Architecture

System architecture

Five containers on a single VPS, joined to a private Docker network. Egress to Anthropic strictly through the aiproxy. Boring, well-understood, swap-safe.

The six containers

Container	Role	Port	Notes
`mihwar-web`	Next.js 14 frontend (SSR + SSE)	3000 internal	Renders the workspace UI & the Blueprint viewer
`mihwar-api`	FastAPI backend	8000 internal	Auth, data, async-link issuing, all business logic
`mihwar-worker`	arq async worker	—	Stage 3 synthesis, embeddings, scheduled jobs
`mihwar-aiproxy`	LiteLLM gateway	4000 internal	Single egress to Anthropic + Voyage; cost meter; cache
`mihwar-redis`	Redis 7	6380 internal	Queue, session store, rate-limit counters, response cache
`mihwar-postgres`	Postgres 16 + pgvector	5435 internal	Persistent state. Nightly backup. RLS enforced.

All containers run on a private Docker network mihwar_net. Only Caddy (managed by Coolify) is exposed to the public internet on 80/443. Postgres and Redis ports are never bound to host or to the public Apex network.

Request flow — typical write path

Browser → Caddy (TLS, HSTS, CSP applied) → mihwar-web.
Server component issues fetch to mihwar-api with the user's session cookie.
API authenticates the session, attaches verified identity to the request context (contextvars), generates or accepts request_id.
API authorises the action against tenant + workspace ownership.
Long path: API enqueues a job on Redis (with request_id + user_id + tenant_id in payload) and returns 202 + job ID. Worker consumes, calls aiproxy, persists result.
Short path: API writes to Postgres directly, returns 200.
Streaming path (Stage 1 chat, Stage 3 progress): API holds an SSE connection, relays tokens from aiproxy as they arrive.

Single egress — why aiproxy

One key in one place. The Anthropic key lives only in the aiproxy environment. Worker / API / web never see it.
Cost meter. Every call is logged with model, input tokens, output tokens, cache hit, cost. See AI Economics.
Rate cap. Per-tenant + per-feature soft caps with hard refusal at threshold.
Model swap. Switching from Anthropic to a regional sovereign-cloud LLM in Phase 2 is a config change in aiproxy, not a code change in 14 places.
Cache. aiproxy can layer response-level caching for deterministic prompts (catalog questions, glossary expansions).

Outbound allowlist

The VPS firewall (UFW) restricts outbound traffic to:

api.anthropic.com, api.voyageai.com — from aiproxy only.
Coolify update servers, OS package mirrors — for system updates.
Backup target (object storage in a separate region) — encrypted, signed.

Anything else is denied by default. This kills two attack classes at once: data exfiltration via a compromised container, and prompt-injection-driven outbound calls.

Scale flag Scale

One VPS handles V1's load comfortably (~5 active engagements concurrently, <200 req/min peak). Phase 2 needs horizontal scale: read replicas, multiple worker nodes, Redis Sentinel, Anthropic key sharding. The architecture supports this — the data model is sharded by tenant_id and the work surfaces are stateless except for Postgres + Redis. See Multi-Tenancy.

Previous← Org Infra Profile NextData Model →

Part C · Architecture

Data model

17 tables. Multi-tenant from day one. Versioned artifacts. Immutable audit log. Designed so Phase 2 doesn't require a rewrite.

Schema overview

Table	Purpose
`tenants`	The org owning a Mihwar instance. V1 has exactly one row (NMO). Phase 2 has many.
`users`	People who can log in. Belongs to a tenant.
`sessions`	Login sessions. Cookie-bound, expiry-tracked, regenerated on login, IP-bound (soft).
`service_principals` new	Non-user callers: aiproxy, worker, async-form-submitter, cron. Each has its own credential type.
`clients`	The end-customer organisation NMO is consulting for. Belongs to a tenant. Owns Org Profiles.
`org_profiles` new	The persistent infrastructure profile of a client. Versioned. Field-level encrypted at rest for sensitive sections.
`workspaces`	One per client engagement. The unit of work. References the Org Profile version it started from.
`workspace_members`	Which users have access to which workspace, at what role.
`stage_artifacts`	The output of each stage, per workspace. Versioned: every signoff creates a new immutable row.
`messages`	Conversational log per stage — every AI exchange, every consultant entry. Linked to `request_id`.
`catalog_entries`	Vendors, models, frameworks, patterns, constraints. Tenant-scoped (Phase 2 supports tier system).
`questions`	Discovery question bank. Tenant-scoped, multilingual.
`async_links`	Per-stakeholder async form URLs. Time-limited, single-use, IP-logged.
`async_responses`	Answers submitted via async links.
`blueprints`	Compiled Blueprint exports. Stored as both structured JSON and rendered HTML, with manifest hash.
`audit_log`	Immutable. Every privileged action — signoffs, edits, exports, recommendations.
`ai_calls`	Every aiproxy call: input/output tokens, cache hits, model, cost, workspace, request_id, latency.

Versioning rules

All artifacts (stage_artifacts, blueprints, org_profiles) follow the same versioning pattern:

New row per signoff — never UPDATE.
version column auto-increments per parent.
signed_by + signed_at on the row that becomes "current".
parent_version link for diffing.
Working draft kept in a separate *_draft column or table; only frozen on signoff.

The tenant boundary

-- every business table has tenant_id with NOT NULL
ALTER TABLE workspaces ADD COLUMN tenant_id UUID NOT NULL
  REFERENCES tenants(id);
CREATE INDEX idx_workspaces_tenant ON workspaces(tenant_id);

-- row-level security enforced at the DB layer
ALTER TABLE workspaces ENABLE ROW LEVEL SECURITY;
CREATE POLICY ws_tenant_isolation ON workspaces
  USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- API sets the session-local var on every request
SET LOCAL app.tenant_id = '01HV8Z…';

Indexes — the non-negotiable list

Every business table: (tenant_id) first, then (tenant_id, workspace_id) compound.
messages(workspace_id, stage, created_at DESC) — chat history retrieval.
stage_artifacts(workspace_id, stage, version DESC) — load latest version fast.
audit_log(tenant_id, actor_id, created_at DESC) — operator Logs page queries.
ai_calls(tenant_id, created_at DESC), plus (tenant_id, feature, created_at DESC) — cost dashboards.
async_links(token_hash) unique — single lookup on form load.
pgvector: HNSW index on catalog_entries.embedding for RAG retrieval.

Scale flag Scale

Pagination is keyset, not OFFSET. Long lists (messages, audit_log, ai_calls) use (created_at, id) < cursor. OFFSET on a 5M-row audit_log will scan from the start every page; keyset stays O(log n). See Observability Logs page.

Migrations

Alembic. Every migration declares its indexes with CONCURRENTLY for tables expected to grow past 100k rows (messages, ai_calls, audit_log). Migrations are reviewed in PR before being applied — no auto-apply on deploy.

Previous← System Architecture NextStack Choices →

Part C · Architecture

Stack choices

Every choice annotated with why. The bias is toward boring, well-documented, swap-safe technology — the kind a future contributor will thank us for.

Backend

Choice	Why
Python 3.12	The AI ecosystem is Python-native. Anthropic SDK, vector DBs, embeddings — all Python-first.
FastAPI	Modern async framework, OpenAPI auto-gen, Pydantic-driven request validation.
SQLModel + SQLAlchemy 2.0	One model serves database + API. No drift between schema and types.
Alembic	Mature schema migration. Boring on purpose.
asyncpg	Fastest Postgres driver in Python.
arq	Lightweight Redis-backed task queue. Idempotency keys, retries, DLQ.
Anthropic SDK (Python)	First-party. Streaming, tool use, prompt caching, extended thinking.
LiteLLM	The aiproxy. Single egress, model swap, cache, cost.
structlog	Structured JSON logs with auto context. See Observability.
OpenTelemetry SDK	Traces. Quiet in V1, ready for distributed in V3.

Frontend

Choice	Why
Next.js 14 (App Router)	Server components reduce JS shipped to browser. Perfect for the Blueprint viewer.
TypeScript	Catches errors at build time. Required for a multi-month codebase.
Tailwind CSS	Utility-first. Lets the LLM (Claude Code) write consistent components without designing from scratch each time.
shadcn/ui (selected)	Composable, accessible. Lifted into the repo, not added as a dependency.
Zod	Shared validation between client and server. Pydantic models at the API end, Zod schemas at the form end, both generated from the same source.
SWR	Client-side caching for read endpoints. Optimistic updates for the workshop UI.

Data & vectors

Choice	Why
Postgres 16	RLS, JSONB, generated columns, extensions. The default for everything.
pgvector + HNSW	Catalog has <5k entries — pgvector handles it well at this scale. Phase 2 may justify a dedicated vector store; pgvector is the right starting point.
Voyage AI embeddings	Strong multilingual including Arabic. Paid API, kept behind aiproxy.
Redis 7	Queue, session, rate-limit, response cache. One tool, four jobs.

Infra & ops

Choice	Why
Hostinger KVM VPS	Predictable cost, root access, KSA-adjacent regions. Sufficient for V1 throughput.
Coolify	Self-hosted deployment platform. Git-driven deploys, rollbacks, env management.
Caddy (managed by Coolify)	Automatic TLS, HSTS, CSP injection.
Docker Compose	Five containers, one VPS. Kubernetes is overkill at this scale.
UFW	Outbound allowlist, default deny.
Cloudflare	DNS, DDoS shield, optional country-restriction rules.
Bahrain S3-compatible object storage	Encrypted backup target, separate region.

Why no microservices

Microservices are a horizontal-scaling pattern. Mihwar V1 has one tenant and a handful of users. Microservices would buy nothing and cost weeks of build time, more failure modes, harder local development. The shape of "5 containers, one VPS" lets us ship the workflow in 6 weeks. Phase 2 may eventually warrant horizontal scaling — but that's a graduation move, not a starting point.

Boring is a feature

Every choice on this page can be hired against in KSA today. Every choice has 5+ years of production track record. Every choice has a clear "what would replace this" answer if it ever needs to change.

Previous← Data Model NextMulti-Tenancy →

Part C · Architecture

Multi-tenancy strategy

V1 has one tenant. Phase 2 may have hundreds. The data model and security boundaries are designed today so the V3 pivot is a deployment change, not a rewrite.

The core decision

Bake tenancy in from day one. This is the #1 architectural decision in this masterplan. Every AI startup that "added multi-tenancy later" rebuilt their backend at month 9. — Architectural commitment

The cost of doing it now is one column on a few tables and one Postgres feature (RLS). The cost of doing it later is months of refactoring while engagements are paused.

The three tenancy levels

Level	Status	Where it lives	What it gives
1 · Schema-aware	V1	`tenant_id` column on every business table; index leads with it	Cheap query scoping; trivial to add
2 · Row-level security	V1	Postgres RLS policies use `app.tenant_id` session var	DB enforces tenant isolation even if app has a bug
3 · Tenant context plumbing	V1	FastAPI dep extracts tenant from session, sets `SET LOCAL app.tenant_id` per request	Application layer is incapable of cross-tenant queries by accident
4 · Per-tenant DEK	V1 for sensitive fields	KMS-wrapped data encryption keys, one per tenant	Field-level encryption for Org Profile sensitive sections; tenant deletion = key deletion
5 · Schema-per-tenant	Phase 2 enterprise tier	Dedicated schema per tenant, switched via `search_path`	Stronger isolation for regulated subscribers
6 · DB-per-tenant	Phase 2 sovereign tier	Dedicated Postgres instance per tenant, deployed in-region	Hard residency, full backup separation

Cross-tenant tests

Every CI run executes a "tenant fence test": create two tenants, two users, two workspaces. Authenticate as user-A. Try to read user-B's workspace, message, audit log, blueprint. Assert 404 (not 403 — 403 leaks the existence of the resource). The test fails the build if any cross-tenant read returns data.

Tenant deletion (Phase 2)

When a Phase 2 tenant cancels and confirms erasure:

Org Profile DEK is destroyed in KMS — encrypted fields become unrecoverable.
All tenant_id-scoped rows are hard-deleted in a single transaction.
Backup retention for that tenant is honoured (90 days) then purged.
An audit log entry is written to a separate tenant-deletion ledger (kept indefinitely for compliance reasons).

A common mistake to avoid

Do not stuff tenant_id into the JWT and trust it client-side. The client never names its own tenant. The server resolves session_id → user_id → tenant_id on every request and uses the server-resolved value. Trusting client-supplied tenant IDs is one of the top sources of multi-tenant data leaks.

Previous← Stack Choices NextClient Security & PDPL →

Part C · Architecture

Client security & PDPL

Mihwar handles client data — sometimes sensitive infrastructure inventories, sometimes regulated information. Security is not a sprinkle on the end; it's a structural choice baked into the architecture.

The threat model

Mihwar must defend against, in order of likelihood:

Accidental data exposure. Bug returning the wrong client's data. Mitigated by RLS at the database layer.
Compromised API keys. Anthropic key leaked. Mitigated by aiproxy as single egress and Anthropic's per-key rate limits + outbound allowlist.
Stolen session token. Mitigated by short cookie lifetime, httpOnly, SameSite=Strict, IP-binding (soft), regenerate-on-login, MFA on the passphrase.
Unauthorised async-link access. Mitigated by single-use cryptographically-random tokens, time expiry, IP logging, generic 404 on invalid.
Malicious prompt injection in client docs. Mitigated by input quarantine (untrusted data in a delimited <document> block, never as system prompt), tool-use isolation, output filtering before any tool invocation.
Mihwar host compromise. Mitigated by isolated networks, encrypted backups offsite, no shared secrets between containers, OS hardening, regular patching. See Mihwar's Own Security.
Insider error. Mitigated by per-user audit log, separation of admin vs operator roles, two-person sign-off for destructive actions (V3).

Authentication

Single passphrase + TOTP MFA for V1. Passphrase Argon2id-hashed (memory cost ≥64 MB, time cost ≥3, parallelism 1). MFA code stored as TOTP secret encrypted at rest.
Account lockout with exponential backoff after 5 failed attempts within 15 min. Logged.
Session cookies: httpOnly, Secure, SameSite=Strict, ≤8h lifetime, sliding refresh. Tokens are secrets.token_urlsafe(32) — 256 bits of entropy. Stored as SHA-256 hashes in the DB.
Session regenerated on login (no session fixation). Old session ID invalidated on logout.
Phase 2: SSO via OIDC (Microsoft Entra, Google Workspace, Okta) and SAML for enterprise tier.

Caller identity model new

Every API call identifies its caller before any work. Caller types are explicit and disjoint, each with its own credential mechanism:

Actor type	Credential	Where it lives	Example
`user`	Session cookie (Argon2id-derived)	Browser, httpOnly	Ahmed running a Lab
`service`	Service token (random ≥256-bit)	Container env, never logged	Worker calling api
`agent`	Tool-use token, scoped per call	Issued per-job by API	aiproxy-driven tool call
`webhook`	HMAC-signed payload	Signing secret rotated quarterly	Async form submission
`cron`	Service token, restricted to cron paths	Coolify env	Nightly catalog re-embed

Verified identity is attached to the request context (contextvars) and used for every downstream check. Permission is checked against the verified caller, never against client-supplied identifiers. Rate limit applies per verified identity, not IP alone.

Authorisation

Default DENY at framework layer. Every endpoint declares its required permission explicitly.
Object-level ownership check on every read/write. "Does this user have access to this workspace?" — answered server-side, no exceptions.
tenant_id in every query at the app layer + RLS at the DB. Two layers of defence; the second one catches the first one's bugs.
Async-link tokens are scoped to a single question + recipient + workspace and expire. They are not session tokens.

Input validation

Zod / Pydantic schema at every API boundary. Reject malformed, never "clean".
Body size limits at HTTP layer (1 MB default; 10 MB for known upload paths).
File uploads: validate MIME + extension + magic bytes; UUID filename server-side; outside web root; AV scan on receive (ClamAV); never executed.
SQL: parameterised queries only. Dynamic identifiers (rare — ORDER BY columns) come from a server-side allowlist.
Command injection: never pass user input to shell. Argument arrays only.

XSS / CSRF / headers

Auto-escape via React. Raw HTML rendering only via DOMPurify with strict allowlist (used in Blueprint preview, never in chat).
Strict CSP: default-src 'self'; img-src 'self' data:; style-src 'self' 'unsafe-inline' fonts.googleapis.com; font-src fonts.gstatic.com; connect-src 'self' — no unsafe-eval, no unsafe-inline scripts. Nonces for inline if absolutely needed.
HSTS preloaded. X-Content-Type-Options: nosniff, Referrer-Policy: strict-origin-when-cross-origin, X-Frame-Options: DENY, Permissions-Policy tightly restricted.
Strip Server / X-Powered-By.
CSRF: SameSite=Strict cookies + double-submit token on state-changing endpoints from Next.js Server Actions.

Data protection

TLS 1.2+ everywhere. HSTS enabled.
Sensitive fields at rest: Org Profile sensitive sections (cloud account IDs, security tooling vendor names, regulatory pointers) and the discovery inventory free-text fields are field-level AES-256-GCM encrypted using per-tenant DEKs wrapped by a master KEK in KMS.
Display masking: sensitive fields show •••• •••• until explicitly revealed; reveal is audit-logged.
Data minimisation: Stage 2 captures structural facts, not raw documents. If the consultant pastes a vendor contract into chat, Mihwar warns and recommends extracting only the structural answer.
Retention: active workspaces indefinite; closed workspaces 5 years (PDPL records-of-processing rationale); audit log indefinite or per-legal; chat messages 2 years; ai_calls 1 year aggregated then summarised.

PDPL compliance

The Saudi Personal Data Protection Law applies whenever Mihwar processes personal data of KSA residents. Key obligations:

Lawful basis. Discovery interviews capture infrastructure data and stakeholder names — processed under "performance of a contract" with NMO's client. Consent collected separately for case-study reuse.
Data residency. If a client requests residency, NMO can deploy a dedicated Mihwar instance in a Bahrain or Riyadh region. Default Hostinger VPS is sufficient for most engagements but does not meet strict residency for SAMA-Tier-1 banks. Phase 2 enterprise tier ships with explicit residency contractual commitments.
AI calls. Anthropic's API processes data outside KSA. Disclosed in client engagement agreement. For residency-sensitive clients, Phase 2 sovereign tier routes through a regional model deployment via aiproxy.
Data subject rights. Right of access (export Org Profile + Blueprint history). Right of erasure (tenant deletion flow above). Right of correction (edit Org Profile, versioned).
Breach notification. Documented runbook (see Ops Handbook) — SDAIA notification within 72h for qualifying breaches.

Prompt-injection defence

Untrusted input — pasted client docs, async form responses, third-party content — is treated as data, not instruction:

Wrapped in delimited blocks (<document index="1">…</document>) in the prompt. The system prompt instructs the model to treat block content as data only.
Tool-use is gated: any tool that would write to the database, send an email, or call an external API requires explicit consultant confirmation in the UI before execution. The AI cannot autonomously act.
Output goes through a guardrail check before any side-effect: if the AI emits an unexpected tool call, malformed JSON, or attempts to address the user with apparent privileged instructions, the call is rejected and logged as a suspected injection.

Dependency hygiene

Pinned versions, lockfiles committed (uv.lock or poetry.lock for Python; pnpm-lock.yaml for JS).
CI runs pip-audit + pnpm audit + trivy fs+image + gitleaks. HIGH or CRITICAL fails the build.
One-paragraph justification on every new dependency in the PR description.
Quarterly dependency review in addition to CI gating.

Errors & information disclosure

Production errors return: {"error":"internal","reference":"ERR-7K2P9X"}. The reference ID maps server-side to the full stack trace + request_id + tenant_id + user_id. Stack traces, paths, and schema info never reach the client. The Logs page lets the operator look up any reference ID in 5 seconds.

CRITICAL flag Security

Any HIGH or CRITICAL security risk identified in this section is fixed before deployment, not behind a feature flag. The list above is the working contract; deviations require explicit, dated, written exceptions.

Previous← Multi-Tenancy NextMihwar's Own Security →

Part C · Architecture · New

Mihwar's own security & infrastructure

We give clients $30k consulting on AI architecture security. We will not run a sloppy host. This page is the rigour we apply to Mihwar itself — what we lock down, how we patch, where the keys live, what the backups look like, who responds when something breaks.

The principle

A consultancy that runs sloppy infrastructure cannot credibly sell architecture advice. Mihwar's own posture is the first thing a security-aware client will probe — and it had better answer well. — Operating standard

VPS hardening — day one

Non-root user. No login as root. SSH key-only, password auth disabled.
SSH: port moved off 22 (low-effort but cuts ambient scan noise), fail2ban with bans on auth failure, allowed-from-IP list for the operator's static IPs (with break-glass procedure documented).
Firewall (UFW): default deny inbound; allow only :443 (Caddy), the moved SSH port, and the WireGuard endpoint for ops access. Default deny outbound; allowlist Anthropic/Voyage/Coolify/backups.
OS: unattended-upgrades enabled for security patches. Auto-reboot scheduled in low-traffic window with notification.
Auditd running with rules for SSH login, sudo, and config-file modification. Logs ship to the same structured pipeline as application logs.
No bare ports. Postgres + Redis + aiproxy + worker bind to 127.0.0.1 on the host, exposed to other containers via the Docker network only.
WireGuard ops VPN. Admin / DB / Coolify dashboards reachable only inside the VPN, not on public internet.

Secrets management

Secret	Where it lives	Rotation
Anthropic API key	aiproxy env (Coolify-injected)	Quarterly + on suspicion
Voyage API key	aiproxy env	Quarterly
Postgres superuser	Coolify-managed, never in repo	Annual
App DB user	Coolify env, least-privilege	Annual
Session signing secret	Coolify env, ≥256-bit	Quarterly
HMAC webhook secret (async forms)	Coolify env	Quarterly
Blueprint signing key (Ed25519)	aiproxy env, archived versions kept for verification	Annual
KMS master key	External KMS (DigitalOcean / Hetzner / cloud-managed)	Annual + on suspicion
Per-tenant DEK	KMS-wrapped in DB; plaintext only in app memory at request time	On tenant request or annually
Backup encryption passphrase	Offline copy in a 1Password vault + sealed envelope physically held	Annual

Never in source. A pre-commit hook (gitleaks) and CI scan reject any push that looks like a secret. The .env.example file is committed with placeholder values; the real .env is gitignored and lives only on the VPS via Coolify.

Database safety

App connects as a least-privilege role (no DROP, no ALTER, no TRUNCATE). Migrations run as a separate role only during deploys.
Daily logical backup via pg_dump, encrypted client-side with the backup passphrase, shipped to off-region object storage. 30-day retention.
Weekly base backup + WAL archiving for PITR (point-in-time recovery up to 7 days).
Quarterly restore drill — restore yesterday's backup into a sandbox, confirm checksum match, walk a smoke test, document timing.
Backup encryption verified by openssl enc -d dry-run weekly via cron; alert on failure.

Container safety

Images pinned by digest, not :latest.
Read-only root filesystem where the app permits (Postgres and Redis need writable; api / web / aiproxy / worker can all run RO).
Drop all Linux capabilities except those the process actually needs. No --privileged.
Secrets injected via env, never baked into images. Build args reviewed.
Docker socket NOT mounted into any application container.
trivy image scans every image at build time; HIGH/CRITICAL fails the deploy.

CI/CD security

GitHub Actions runners use OIDC to fetch deploy credentials — no long-lived secrets in repo settings.
Branch protection on main: required reviews, required status checks (lint, types, tests, vuln scan, gitleaks).
Signed commits enforced (Ahmed's GPG key documented).
Deploy is a Coolify webhook fired by CI on green main. Deploys produce a release tag + git SHA + image digest record.
Rollback is one click in Coolify or one command on the VPS — last 5 deploys retained.

Incident response

A documented runbook in /srv/mihwar/runbooks/incident.md on the VPS itself (so it's available even if the website is down). Phases:

Detect. Alerts (cost spike, 5xx burst, auth-failure burst, backup failure).
Triage. Severity classification: data exposure / availability / cost / minor.
Contain. Standard containment per severity. For suspected key leak: aiproxy rotates the Anthropic key immediately and revokes; new key activated within 10 minutes.
Communicate. Active engagement clients told within 24h if their data plausibly affected. SDAIA notification within 72h for qualifying PDPL breaches.
Restore. Per the playbook for each scenario.
Postmortem. Blameless within 7 days. Lessons feed the catalog and the runbook.

Disaster recovery

Scenario	RPO	RTO	Procedure
VPS lost (provider outage)	≤24h (last backup)	≤4h	Provision new VPS via Terraform-recipes (kept in repo); Coolify recovery; restore latest backup; rotate all secrets; validate.
Database corruption	≤1h (WAL)	≤2h	PITR to last clean point; replay missed work from messages log + audit log; client notification if signoffs invalidated.
Anthropic API outage	—	—	aiproxy fails open with a "synthesis temporarily unavailable" UI message. Background queue retains jobs; resumes on recovery.
Key compromise	—	≤30 min to rotate	Runbook drives rotation: Anthropic key, session signing, KMS keys (with re-wrap), HMAC, Ed25519 signing.
Single-passphrase compromise	—	≤10 min	Force logout all sessions, rotate passphrase + TOTP, audit-log review for unexpected actions.

Monitoring & alerts

Health checks: /health on api & web; Caddy probes them every 30s. Down for >2 min → Pushover alert to Ahmed.
Cost alerts: hourly aiproxy cost > threshold (e.g. $10/h sustained 2h) → alert. Daily $200+ → page.
Auth alerts: 10+ failed logins in 5 min from any IP → notify; specific user 5+ in 15 min → forced lockout + alert.
Backup alerts: nightly backup not produced by 03:00 → page. Backup checksum failure → page.
Disk: <15% free → notify; <5% → page.
Certificate: Caddy auto-renews; alert if renewal fails.

Annual security tasks

External penetration test against staging environment (post-Phase 1, before Phase 2 launch).
Restore drill with timing measurement.
Key rotation: KMS master, signing keys, all long-lived secrets.
Dependency review beyond CI: pruning unused, evaluating maintenance state.
Access review: who has VPS access, who has Coolify dashboard access, who has DB access — and is that still right?
Runbook tabletop: walk an incident scenario end-to-end with the team.

What gets deferred to Phase 2

SOC 2 Type II evidence (start collecting in V1; certify after the second full year of operation).
Bug-bounty program.
WAF in front of Caddy (Cloudflare provides much of this for free; managed WAF added when client demand justifies).
Customer-facing security portal with SOC reports + sub-processor list.
Dedicated SIEM. (V1 ships logs to a structured pipeline — see Observability — and grep-via-Logs-page covers V1 needs.)

Posture summary

None of the above is exotic. All of it is achievable in 6 weeks for a disciplined operator. The whole point: "boring, well-executed, documented" beats "modern, half-implemented, undocumented" every time. This is the rigour we sell.

Previous← Client Security & PDPL NextAI Economics →

Part C · Architecture · New

AI economics

Mihwar is built on Claude. Claude is the most expensive line item in the operating cost. The discipline that keeps a $25k Blueprint at 75% margin in Phase 1 — and makes a $1,200/mo subscription affordable in Phase 2 — is on this page.

The unit economics, modelled

Before any feature ships, we model: cost per call × calls per Blueprint × Blueprints per month. The targets:

Stage	Calls / Blueprint	Avg cost / call	Cost contribution
Stage 1 · Lab (Sonnet · streaming)	~30 turns	$0.05–$0.12	~$1.50–$3.50
Stage 2 · Discovery filtering (Haiku)	~5 calls	$0.01–$0.03	~$0.10
Stage 2 · Async prompt drafting (Haiku)	~10 calls	$0.01	~$0.10
Stage 3 · Synthesis (Sonnet · ext. thinking)	1–3 generations	$3–$8	~$5–$20
Stage 4 · Playbook generation	1–2 generations	$2–$5	~$3–$8
Embedding catalog reads (Voyage)	~50 lookups	$0.0005	~$0.03
Total per Blueprint (target)			$10–$32

At a $25k Blueprint, AI cost is ≤0.13% of revenue. The discipline below is what keeps it there.

The seven levers

1 · Smallest capable model

Two-tier routing throughout. Haiku handles: discovery question filtering, async prompt drafting, glossary expansion, classification (is this an inventory question or a use-case question?), single-turn lookups, simple tool selection. Sonnet handles: Lab interviewing, architecture synthesis, Playbook generation, anything where reasoning quality matters. Never use Opus unless an unsolved-for-Sonnet workload appears — and that becomes a separate budgeted decision.

2 · Prompt caching (every static prefix)

The catalog snapshot, the house style guide, and the system prompt for each stage are cached via cache_control: ephemeral. Order: stable → variable. Verify on call #2+ that cache_read_input_tokens > 0; if zero, the prefix is drifting (timestamp, random tool order, mutable preamble). Hit rate target: ≥80% for repeated within a 5-min cache TTL.

messages = client.messages.create(
  model="claude-sonnet-4-6",
  system=[
    {"type":"text","text":HOUSE_STYLE,
     "cache_control":{"type":"ephemeral"}},
    {"type":"text","text":CATALOG_SNAPSHOT,   # ~50k tokens
     "cache_control":{"type":"ephemeral"}},
    {"type":"text","text":STAGE3_PROMPT,
     "cache_control":{"type":"ephemeral"}},
  ],
  messages=conversation_history,
  max_tokens=4096,
)
# log: input_tokens, cache_read_tokens, cache_creation_tokens

3 · Batch API for non-realtime work

50% off — used for: nightly catalog re-embedding, retroactive question generation when the catalog changes, eval runs against past Blueprints to spot regressions, scheduled summarisation of long workspace histories. Anything tolerating >seconds latency.

4 · Context discipline

Never dump full conversation history. Past turns >20 are summarised into a "rolling synopsis" by Haiku and re-injected.
Never pass the full catalog. RAG with Voyage embeddings → top-K (typically 8–15 entries).
Cap max_tokens on every call. Stage 1 turn: 1024. Stage 3 synthesis: 8192. Async draft: 256.
Ask for terse output / structured JSON in the system prompt. "No preamble. JSON only." saves 10–20% output tokens.
Stream + cancel for user-cancellable surfaces (chat). Cancel kills the call mid-token; tokens to that point are still billed but the rest is not.

5 · Response caching

Hash (model, prompt, tools, temperature) → cache the response in Redis for hours when the prompt is non-personalised (catalog-only Q&A, glossary expansions). Semantic cache for near-duplicate queries (cosine ≥ 0.95) — not used in V1 but designed-for. Pre-compute predictable queries on a schedule (e.g. "expand each catalog entry into a one-paragraph summary" — done in Batch API, served from cache).

6 · Cheaper alternatives first

Before any LLM call, the question: is there a regex / SQL aggregation / classical-ML / rules path that gets us to the answer 100×–10,000× cheaper? Examples in Mihwar:

Email validation in async forms — regex, not Sonnet.
Stage 2 question selection when the use case category is well-known — rules-based filter, not "ask the LLM which questions to ask".
Markdown rendering of artifacts — server-side renderer, not "ask Sonnet to format this nicely".
PII scrubbing in logs — regex blocklist, not LLM.

7 · Agentic loop budgets

Any tool-using flow caps max_iterations (default 10) AND max_tokens_per_session (default 30k). Tool selection is done by Haiku where possible. Tool results are cached. Independent tool calls are parallelised. The loop refuses on hitting a budget rather than spending unbounded.

Per-call observability

Every aiproxy call writes a row to ai_calls:

{
  "request_id": "01HV…",
  "tenant_id": "nmo-001",
  "user_id": "ahmed",
  "workspace_id": "ws-0042",
  "feature": "stage3.synthesis",
  "model": "claude-sonnet-4-6",
  "input_tokens": 52800,
  "cache_read_tokens": 50100,
  "cache_creation_tokens": 0,
  "output_tokens": 4200,
  "latency_ms": 38400,
  "cost_usd": 0.279,
  "cache_hit_rate": 0.949
}

Per-feature / per-tenant cost dashboards

The operator Logs page (see Observability) includes a Cost view: per-feature bar chart for the last 30 days, per-tenant ranking, anomaly highlights (a tenant burning 10× their normal rate). Drilling in shows the calls behind any bar.

Hard budget caps

Per-tenant monthly cap. Soft warn at 80%, hard refuse at 100%. Phase 1: $500 default for NMO (well above expected). Phase 2: tied to subscription tier.
Per-feature daily cap. Stage 3 synthesis: 50 generations / day / tenant. Beyond is rate-limited with a clear UI message.
Per-user 1-min cap. Anti-runaway: 20 calls in 60 seconds → temporary cooldown.

Phase 2 cost discipline Phase 2

Phase 2 changes the math: many tenants, lower revenue per Blueprint, more risk of pathological usage. Discipline tightens:

Subscription tiers include Blueprint-count caps (Starter: 3 Blueprints/yr; Team: unlimited within $1,200/mo notional cost; overage charged).
Per-tenant aiproxy budget enforced atomically: counter incremented in Redis, hard refused at threshold, transparent UI.
Cheap-path features promoted: Org Profile reuse cuts ~40% of Stage 2 cost on repeat Blueprints; the cost saved becomes Phase 2 margin.
Free-tier-evaluation: a "draft Blueprint" mode using only Haiku for evaluation prospects, with clear "upgrade to full" CTA.

Cost flag Cost · the rule

If a feature would push expected cost-per-Blueprint > $40, it is redesigned (cheaper model, cached prefix, RAG instead of dump, batch instead of realtime) before shipping. There is no "we'll optimise later" — later is when the cost has already trained users to expect the feature.

Previous← Mihwar's Own Security NextObservability & Logs →

Part C · Architecture · New

Observability & the Logs page

Every product Mihwar produces ships with an operator Logs page on day one. We hold ourselves to the same standard: when a client says "something happened on Tuesday at 3pm", an operator can reconstruct it in 60 seconds.

The principle

Logs are not for grep'ing on the day of an incident. Logs are the system's memory. Mihwar's logs let an operator at 2am, six months from now, answer: which user did what, with what data, when, with what result, and what did the system do downstream?

The mandatory envelope

Every line of every service is structured JSON, one event per line, with this envelope:

{
  "timestamp": "2026-05-07T14:32:18.420Z",
  "level": "info",
  "service": "mihwar-api",
  "env": "prod",
  "event": "stage.signoff",
  "message": "Stage 2 signed off",

  "request_id": "01HV8Z9K3J5XPQ8WMY4N6T2RES",
  "tenant_id": "nmo-001",
  "user_id": "ahmed",
  "actor_type": "user",
  "session_id_hash": "sha256:7c4b…",
  "ip": "91.193.x.x",
  "user_agent": "Mozilla/5.0 …",

  "workspace_id": "ws-0042",
  "stage": 2,
  "version": 5,
  "duration_ms": 14
}

Identity on every line

user_id — stable internal ID, never email. Explicit null with reason for unauthenticated paths.
tenant_id — required on every request/job line. No exception.
actor_type — user | service | agent | webhook | cron | system.
request_id — generated at the edge (Caddy via X-Request-Id if present, else minted by api). Propagates to every downstream call.
session_id_hash — for grouping a user's actions in a session without exposing the raw token.
Login attempts log the email/username attempted. Never the password attempted.

Request-id propagation

What gets logged

Event class	Examples	Reason
Auth events	`auth.login.success`, `auth.login.failure`, `auth.logout`, `auth.mfa.enrolled`, `auth.token.refresh`, `auth.lockout`	Forensic reconstruction of who-was-where.
Sensitive reads	`org_profile.read` with field list, `blueprint.export`	PDPL audit trail for personal/regulated data access.
Writes	Stage signoffs, profile updates with compact diff of changed fields	Reconstruct what changed when a client disputes a recommendation.
External calls	`aiproxy.call` with model, status, latency, retries, cost	Cost forensics; vendor incident correlation.
Jobs	`job.enqueued`, `job.started`, `job.succeeded`, `job.failed`, `job.dead_lettered`	"Why did Stage 3 never finish?" answered in 5s.
Errors	Stack trace + reference ID + tenant + user + request_id	Map a client's "ERR-7K2P" reference back to root cause.
Async link events	`async.issued`, `async.opened`, `async.submitted`, `async.expired`	Forensics on form-based data submissions.

What never gets logged

Passwords / token values / API keys / JWTs / session secrets / refresh tokens / signing keys / encryption keys / TLS private keys.
Full credit-card numbers / CVVs / bank-account numbers / full national IDs / passport numbers.
Raw request bodies for password / payment / sensitive-PII endpoints.
Authorization headers / session cookies / any credential-bearing header.
Full personal addresses / phone numbers / email addresses unless the event specifically requires them (login attempts include the username; profile updates include the changed field but with the value redacted unless it's structurally non-PII).

Scrubbing middleware

Two-layer defence. Field-name blocklist (password, token, secret, cookie, authorization, api_key, plus tenant-specific entries) recursively replaces values with ***REDACTED***. Value-pattern scrubbing catches credit-card / JWT / AWS-key shapes regardless of field name. Unit tests assert that a known sensitive payload never reaches the sink intact — these tests fail the build.

The Logs page

Every Mihwar product has a Logs page from V1 — including Mihwar itself. Operator UI features:

Filter by user_id / tenant_id / request_id / event class / level / service / time range.
One-click "all events for this request_id" — joins every line across services into a chronological view.
One-click "all events for this user_id in last N hours" — for forensic and support workflows.
One-click "trace this error reference" — paste an ERR-… code, see the stack + context.
Cost view (see AI Economics): per-feature, per-tenant, per-user.
Permission-gated: logs:read for general; logs:read:sensitive for sensitive-read events; logs:export for CSV export with audit-log entry per export.
Export with row cap (10k default) and audit log entry stating who exported what window.

Retention

Class	Hot retention	Cold retention
App logs (info)	30 days	Compressed off-host for 90 days, then deleted
Errors / warns	90 days	Off-host for 1 year
Audit log (auth, permissions, sensitive reads, admin)	1 year hot	Indefinite cold storage with integrity hashing
ai_calls	90 days raw	Aggregated (per-feature daily) kept indefinitely
Debug logs	≤7 days, off in prod by default	—

Distributed-tracing readiness

OpenTelemetry SDK is wired in V1 but quiet. Spans are created for: HTTP request, DB query, aiproxy call, queue job. Exporter is configured but pointed at a dev sink. When Phase 2 demands distributed tracing (e.g. dedicated DB tier triggers cross-host calls), turning on Tempo / Honeycomb / cloud trace is a config change, not a code change.

Alerts driven by logs

5xx burst (10+ in 60s on api) → page.
Auth-failure burst → Pushover.
Backup-not-emitted by 03:00 → page.
aiproxy cost > threshold/hour → notify.
Worker DLQ depth > 0 → notify within 15 min.
Async-link bulk submission anomaly (e.g. 50 submissions on one token in 1 minute) → flag suspected abuse.

Observability flag Observability

The Logs page is not a v2 feature. It ships in Week 4 of the V1 build. Without it, Mihwar is uninvestigable; with it, Mihwar is operable by one person at 2am.

Previous← AI Economics Next6-Week Roadmap →

Part D · Build

6-week roadmap

From empty Hostinger directory to first signed-off engagement Blueprint in six calendar weeks. Built primarily by Claude Code with Ahmed reviewing and steering.

Build philosophy

Mihwar is built in vertical slices: each week ends with something demoable, not a half-finished horizontal layer. By Week 2 there's a working login and a working Lab. By Week 4 there's a working full Stage 1 → Stage 3 → Blueprint export. Weeks 5 and 6 are polish, AR localisation, and dogfooding on a real client engagement.

Week 1 · Foundation — VPS, Docker, schema, auth

Provision mihwar.nmopartners.com on Hostinger.
Set up mihwar_net with Docker Compose: postgres, redis, aiproxy, api skeleton, web skeleton, worker.
Implement single-passphrase auth + TOTP scaffolding.
Define and migrate the 17-table schema (incl. service_principals, org_profiles).
Set up Coolify deployment with branch-protected GitHub repo.
VPS hardening per Mihwar's Own Security: SSH, UFW, fail2ban, WireGuard.

Demo at end of week: Ahmed logs in to an empty workspace UI on a real domain, healthcheck green, all containers running, backups firing nightly.

Week 2 · Stage 1 Lab — fully working

Workspace shell (sidebar, stage panel layout, theme toggle).
Stage 1 end-to-end: chat surface, streaming AI responses (Sonnet via aiproxy), live-updating 1-pager artifact panel, signoff button.
Seed catalog with ~10 sample entries (just enough to test).
House-style prompt + banned-phrases filter live.
Prompt caching wired and verified (cache_read_input_tokens > 0 by call 2).

Demo: Ahmed runs a full Lab session with a real client, produces a 1-pager.

Week 3 · Stage 2 Discovery + Org Profile

Discovery taxonomy + question selection logic (Haiku-driven).
Org Profile schema + settings UI + field-level encryption with per-tenant DEK.
Async link issuing, form rendering at /async/{token}, response capture.
Stage 2 readiness meter live.
Stage 3 unlock gate enforced server-side.

Demo: a Stage 2 inventory completed across two live answers + three async-form submissions, with Stage 3 cleanly unlocked.

Week 4 · Stage 3 Architecture synthesis + Logs page

arq worker + Stage 3 synthesis job.
Catalog RAG via pgvector + Voyage embeddings.
SVG diagram auto-generation from component manifest.
Blueprint v1 compile (HTML render with all sections).
Logs page MVP: filter by request_id / user_id / tenant_id, "all events for this request" join.
ai_calls table + cost view in Logs page.

Demo: empty workspace → Stage 1 → Stage 2 → Stage 3 → click "Compile Blueprint" → bilingual HTML opens. Logs page shows the entire chain by request_id.

Week 5 · Stage 4 Playbook + AR localisation polish

Stage 4 implementation (build plan, risk register, vendor short-list, RFP spec template).
Full AR translation review of UI + Blueprint render. RTL polish.
Presentation mode for Blueprint walkthrough.
Manifest signing (Ed25519).
Catalog seeding from AI Ecosystem Primer (target 80–120 entries).

Demo: a full Tier 2 engagement walked end-to-end with all five stages, bilingual export, signed manifest verifiable.

Week 6 · Dogfood + ship

Run the first paying engagement on the live tool.
Track every paper-cut Ahmed hits — fix the breaking ones, file the rest.
External penetration test booked (runs in Week 8 against staging clone).
Backup restore drill — full recovery from yesterday's backup into a sandbox.
Final audit: master pre-commit checklist (see top of doc) — every box ticked or explicitly deferred with date.

Demo: First $25k Blueprint shipped. Mihwar is real.

Slip discipline

If Week 4 doesn't land Stage 3 working, the Playbook (Week 5) drops first — never compromise the gate or the Logs page. Mihwar without those is a different product, less defensible.

Previous← Observability & Logs NextClaude Code Prompts →

Part D · Build

Claude Code prompts

Five sequential prompts covering the build. Each prompt is self-contained and is run inside a single Claude Code session. Run them in order. After each one, review the diff, commit, and proceed.

Pre-flight

Before running any prompt:

SSH to the Hostinger VPS as a sudo-capable user.
Confirm Coolify is running and accessible at the WireGuard-only admin URL.
Confirm *.nmopartners.com resolves to the VPS.
Have these env vars ready: ANTHROPIC_API_KEY, VOYAGE_API_KEY, ADMIN_PASSPHRASE (Argon2id-hashed at boot), SESSION_SIGNING_SECRET (≥256-bit), HMAC_WEBHOOK_SECRET, BLUEPRINT_SIGNING_KEY (Ed25519 private), KMS_MASTER_KEY_ID.
Create empty directory /srv/mihwar/.
Create empty GitHub repo Arcahmed93/mihwar (private), with branch protection on main.
Configure pre-commit hook with gitleaks + ruff + mypy + biome.

Prompt 1 · Foundation

Scaffold the project, set up the database schema, implement single-passphrase auth, get Mihwar deployable on Hostinger via Coolify.

You are building Mihwar, a private consulting cockpit for AI use-case
discovery. This prompt scaffolds the project, sets up the database schema,
implements single-passphrase auth, and gets Mihwar deployable on Hostinger
via Coolify.

# DEPLOYMENT TARGET
- Hostinger KVM VPS, working directory /srv/mihwar/
- Subdomain mihwar.nmopartners.com (DNS already resolves to the VPS)
- Reverse proxy + TLS managed by Caddy via Coolify
- Containers on a NEW Docker network called mihwar_net (do NOT join apex_net)

# SIX CONTAINERS
1. mihwar-postgres — Postgres 16 + pgvector, volume mihwar_pg_data,
   port 5435 internal only
2. mihwar-redis — Redis 7, port 6380 internal only
3. mihwar-aiproxy — LiteLLM proxy, routes claude-* via Anthropic and
   voyage-* via Voyage. Port 4000 internal only
4. mihwar-api — Python 3.12 / FastAPI / SQLModel / asyncpg / arq client,
   port 8000 internal only
5. mihwar-worker — same image as api, runs arq worker
6. mihwar-web — Next.js 14 (App Router) / TypeScript / Tailwind / shadcn,
   SSR, port 3000 internal only

# FOUNDATIONAL RULES
- Pin every image by digest. No :latest.
- Containers run as non-root. Read-only root FS where possible.
- App DB user is least-privilege; migrations run as a separate role.
- gitleaks pre-commit hook in repo. CI runs trivy + pip-audit + pnpm audit.
- Structured JSON logging from line one (structlog in Python; pino in Node).
- request_id middleware on api: accept X-Request-Id, else mint ULID.
- Caller-identity context: contextvars carrying user_id, tenant_id,
  actor_type, request_id. Every log line emits these via a structlog processor.

# SCHEMA (17 tables — see masterplan p-data)
Generate the SQLModel definitions and an initial Alembic migration.
Every business table has tenant_id NOT NULL with an index leading on it.
Enable Postgres RLS on every business table; policies use
current_setting('app.tenant_id')::uuid.

# AUTH
Single-passphrase login with Argon2id (memory_cost=65536, time_cost=3).
TOTP enrolment endpoint (issues secret + QR via otpauth URL, stored encrypted
under the per-tenant DEK). Session cookies: httpOnly, Secure, SameSite=Strict,
8h sliding. Sessions stored as SHA-256 hash of token.
Account lockout: 5 failures in 15min → 15min cooldown, exponential.

# CALLER IDENTITY
service_principals table seeded with:
  - svc:worker (token in env, used for worker→api calls)
  - svc:aiproxy (token in env, used for api→aiproxy calls)
  - svc:cron (used for nightly jobs)
  - webhook:async-form (HMAC verifier for /async/* submissions)

# DELIVERABLES
- /srv/mihwar/docker-compose.yml
- /srv/mihwar/api/ (FastAPI app, models, migrations, auth, identity)
- /srv/mihwar/web/ (Next.js scaffold, login page, theme toggle)
- /srv/mihwar/aiproxy/ (LiteLLM config, env)
- /srv/mihwar/worker/ (arq worker entrypoint)
- /srv/mihwar/.env.example with placeholders
- /srv/mihwar/Caddyfile (TLS, HSTS, CSP, security headers)
- /srv/mihwar/runbooks/ (incident.md, backup-restore.md, key-rotation.md)
- README.md with one-command bootstrap
- A green CI run, an opening commit, and a green Coolify deploy.

# DONE WHEN
Visiting https://mihwar.nmopartners.com presents the login page,
correct passphrase + TOTP yields an empty workspace UI, healthcheck
endpoint returns 200, structured JSON logs flow with request_id +
tenant_id + user_id on every authenticated line, and a sample
async-form GET returns a generic 404 for an unknown token.

Prompt 2 · Stage 1 Lab

Build Stage 1 of the Mihwar workflow: the Ideation Lab.

# UI
- Workspace shell with persistent sidebar (workspace list,
  current workspace, stage navigator).
- Stage 1 panel: three sub-panels — chat (left), 1-pager artifact (right),
  signoff bar (bottom).
- Theme toggle, EN/AR toggle.

# CHAT
- Streaming via SSE from /api/v1/workspaces/{ws}/stages/1/messages.
- Each turn enqueues a synchronous (not background) Sonnet call via aiproxy
  with prompt-caching on the system prompt + house style.
- Verify cache_read_tokens > 0 by turn 2; log it.
- Cap max_tokens at 1024 per turn.
- Persist every turn in messages with request_id, user_id, tenant_id.

# SYSTEM PROMPT (cached)
"You are a Socratic AI use-case interviewer for NMO Partners… [full prompt
in /srv/mihwar/api/prompts/stage1.md]"

# ARTIFACT (1-PAGER)
- Live-rendered structured object: USE_CASE, PAIN, USER, TODAY, TARGET,
  BLAST, INPUTS, DECISION_OWNER, OUT_OF_SCOPE.
- Updated incrementally as the conversation progresses (the AI emits
  structured updates which the renderer applies).
- Versioned on signoff. signoff button calls /api/v1/.../sign with a
  confirmation modal.

# CATALOG
Seed catalog_entries with 10 sample entries (provided in seed.json).
Stage 1 doesn't query the catalog yet; it's used in Stage 3.

# DONE WHEN
Ahmed runs a 30-turn Lab against a sample use case ("AI for our customer
voice line") and ends with a frozen v1 1-pager. Logs page shows every turn
joined by request_id. Cost view shows the lab session cost broken out.

Prompt 3 · Stage 2 Discovery + Org Profile

Build Stage 2 of the Mihwar workflow plus the Org Profile foundation.

# ORG PROFILE
- New table org_profiles, versioned, tenant-scoped, linked to clients.
- Field-level encryption (AES-256-GCM) for sensitive sections using a
  per-tenant DEK. DEK created on tenant creation, KMS-wrapped, stored as
  ciphertext in tenants.dek_wrapped. App decrypts in-memory per request.
- Settings UI under /workspace/{ws}/profile to edit; versioned on save.
- Display masking by default; reveal explicit, audit-logged.

# STAGE 2
- Discovery taxonomy seeded in questions table (CSV + script).
- "Filter questions" Haiku call: given the Stage 1 1-pager + Org Profile
  baseline, return the ~30 questions that need fresh answers.
- Stage 2 panel shows question list grouped by domain, each with:
  status (unasked / answered / sent-async / awaiting / blocking),
  inline answer, "send as async" button.
- /async/{token} endpoint:
  - validates token (single-use, time-limited, tenant-scoped)
  - renders a clean form with the one or two questions
  - HMAC-signs submissions
  - rate-limited
  - generic 404 on invalid/expired
- Readiness meter computed server-side; Stage 3 unlock blocked until
  blocking-set is empty (or consultant explicitly overrides with reason
  captured in audit_log).

# DONE WHEN
A Stage 2 round-trip works end-to-end: 5 questions answered live,
3 async links sent and submitted, readiness reaches 100%, Stage 3
unlocks. Org Profile updated from Stage 2 deltas with confirmation.
Logs page shows: async.issued, async.opened, async.submitted events
per token. No sensitive value appears in any log line.

Prompt 4 · Stage 3 Architecture + Blueprint v1 + Logs page

Build Stage 3 synthesis, the Blueprint compiler, and the operator Logs page.

# STAGE 3
- arq job stage3.synthesise:
  inputs = stage1_artifact, stage2_inventory, org_profile, catalog_rag(top_K=12)
  flow = aiproxy → claude-sonnet-4-6 with extended thinking, prompt-cached
  catalog snapshot. max_tokens=8192. Streams progress to the api which
  forwards via SSE.
- Output: structured JSON manifest with components, data-flow nodes,
  trade-offs, alternatives, open-questions, compliance-overlay.
- Auto-render SVG layered diagram + data-flow diagram from manifest.
- stage_artifacts.v1 stored on completion. Re-runs create v2, etc.

# BLUEPRINT
- /api/v1/workspaces/{ws}/blueprint/compile job:
  takes latest stage_artifacts, renders to a single HTML file using
  /srv/mihwar/web/templates/blueprint.html (server-side render with
  inlined CSS and inlined SVG). Manifest signed with Ed25519.
- Stored in blueprints table; downloadable + viewable in-browser.

# LOGS PAGE
At /admin/logs (gated by logs:read permission):
- Filters: time range, user_id, tenant_id, request_id, event class, level.
- "Trace request_id": joins all events with that request_id from api +
  worker + aiproxy logs into a chronological timeline.
- "Trace user_id": last N hours of all events.
- "Trace error reference": paste ERR-… → the full stack trace + context.
- Cost view: ai_calls aggregated per feature / tenant / user / day.
- Export to CSV (capped 10k rows; logs:export permission; audit-logged).

# DONE WHEN
A workspace progresses cleanly from empty → 1-pager → inventory → synthesis
→ Blueprint HTML download → walkthrough mode. The Logs page reconstructs
every step. Stage 3 synthesis costs < $10 per run with cache hit rate
> 80%.

Prompt 5 · Stage 4 Playbook + AR localisation + signing + dogfood

Wrap up Mihwar V1: Stage 4 (Build Playbook), full Arabic localisation,
Blueprint manifest signing, and dogfooding hooks.

# STAGE 4
- Five outputs: 6-week build plan, risk register, vendor short-list,
  reference repos pointer (Tier-3 only), RFP spec (optional).
- Tier flag on workspace controls which outputs are produced.
- Each output is editable in the UI before signoff.

# AR LOCALISATION
- Translate the UI shell using Mihwar AR pack (provided).
- RTL layout for AR mode (logical CSS properties; no physical
  margin-left/right).
- Blueprint render in AR uses Amiri for body, Plus Jakarta for
  numerals/code; the manifest carries language tags.

# MANIFEST SIGNING
- On Blueprint compile, the manifest JSON is canonicalised
  (RFC 8785 JCS), hashed (SHA-256), signed with Ed25519
  using BLUEPRINT_SIGNING_KEY. Public key embedded for offline
  verification.

# DOGFOOD HOOKS
- /admin/feedback inline form for Ahmed to log paper-cuts during
  the first real engagement; entries auto-tagged with the workspace
  and request_id at the moment.

# DONE WHEN
A full Tier-2 engagement runs end-to-end, EN and AR Blueprints both
render correctly, manifest verifies via the embedded public key, and
the engagement Blueprint is the first $25k delivery.

Working with Claude Code

For each prompt: open a fresh Claude Code session in the repo, paste the prompt, let it scaffold, then iterate in small commits. Don't run prompt 2 in the same session as prompt 1 — start fresh so context stays clean. Review every diff. Reject what doesn't match the masterplan; the model will accept the correction.

Previous← 6-Week Roadmap NextOps Handbook →

Part D · Build · New

Operations handbook

Mihwar runs as a single-operator service with a small team layered in over time. This is the day-to-day playbook: deploys, on-call, change windows, customer-facing incidents.

Deploy cadence

Active engagement weeks: deploy ≤ once per day, only outside the client's working hours (KSA 06:00 GMT+3 — 18:00). Ship-right-now-please reserved for security or correctness fixes.
Quiet weeks: trunk-based, ship as needed.
Schema migrations: reviewed in PR. Big-table migrations use CREATE INDEX CONCURRENTLY + chunked backfills. Never apply in the middle of a Stage 3 synthesis.
Release tagging: every prod deploy creates a tag vYYYY.MM.DD-HHmm-sha. Coolify retains last 5 deploys for one-click rollback.

On-call (V1)

Ahmed is on call 24/7 in V1. The job: respond to alerts within 2 hours during the working day, 4 hours overnight. Pager is Pushover on a personal device.

Sev-1 (data exposure / total outage during active engagement): drop everything.
Sev-2 (degraded service / non-critical alert): respond before next business day.
Sev-3 (cosmetic / forecast): triage in next standup with self.

Standard incidents — short playbooks

Symptom	First 5 minutes	Resolution path
500s spiking	Check Logs page → top error references for the spike window. Identify offending endpoint.	Rollback if regression; hotfix if data shape; communicate if external dependency.
aiproxy cost spike	Logs cost view → which feature, which tenant, which user.	If runaway loop: kill jobs, raise `max_iterations` floor. If catalog cache miss: fix cache_control. If legitimate: confirm with consultant.
Worker DLQ filling	Logs page → DLQ events → root cause for the type of job failing.	Fix and replay. If transient (Anthropic 5xx), wait + retry from DLQ.
Backup didn't fire	Cron status, disk, backup target reachability.	Trigger manually. If recurring, ticket runbook fix.
Suspected key leak	Rotate the suspected key in Coolify (single command). Force logout all sessions.	Audit log review for the exposure window. Communicate per DR table.
Async link mis-issued (wrong recipient)	Revoke the token via `POST /admin/async/revoke`. Confirm not consumed.	Re-issue to correct recipient. Audit trail captured.
Customer "I can't see my Blueprint"	Logs by user_id → most recent compile event → status.	If failed: reproduce in staging; if version mismatch: re-compile.

Change-management discipline

Every change to main goes through a PR with at least one reviewer (Ahmed reviews Claude Code's PRs; another consultant reviews Ahmed's, when one exists).
CI must be green: lint, types, tests, vuln scans, gitleaks.
Schema changes have a "rollback notes" section in the PR description. If rollback isn't safe, reviewer pushes back.
"Boring change" exceptions: copy edits, README, comment-only diffs — solo merge allowed.

Customer communication

For active engagement clients, communication is direct (Ahmed → CTO). For Phase 2 customers, a status page (status.mihwar.app) is published from V1's Day 1 even though it has nothing on it; this normalises the surface for when it matters.

Outages affecting a live engagement: client notified within 30 minutes by Ahmed directly.
Data exposure (any plausible): client notified within 24 hours; SDAIA within 72 hours if PDPL-qualifying.
Maintenance: 48h notice for non-trivial windows; mid-night by default.
Security advisories from upstream: reviewed and disclosed within 7 days if customer-affecting.

Adding a second operator

When NMO hires consultant #2, the handoff:

SSH access via the WireGuard VPN (their key only; never share Ahmed's).
Coolify dashboard read-only.
Postgres DB user: scoped read of audit_log only; no app-write privileges.
Pushover added to the alert routing.
Tabletop runbook walkthrough — incident scenario end-to-end before they're on call.
First 30 days: Ahmed reviews every PR. Second 30 days: reviewer + author rotates.

Quarterly operating cadence

Catalog review — prune, refresh, add. Documented "what changed and why" per quarter.
Engagement retro — every Blueprint shipped that quarter, what worked, what didn't, what feeds the catalog.
Cost review — actual aiproxy spend vs forecast; per-feature optimisations identified.
Security review — CVE sweep beyond CI gating, dependency pruning, access list audit.
Restore drill — annual minimum; quarterly preferred.
Forecast refresh — pipeline, capacity, burn — feeds Phase 2 trigger evaluation.

Previous← Claude Code Prompts NextSaaS Path →

Part E · SaaS Phase

The SaaS path — Phase 2 overview

If V1 succeeds, Mihwar evolves from a consultant's cockpit into a self-serve platform clients run themselves. This section sketches what that looks like — the product, the billing, the go-to-market — so V1 architecture stays compatible.

When Phase 2 becomes real

Three triggers, any one of which validates the pivot:

Demand pull. ≥10 distinct prospects ask "can we get a Mihwar login" in any 6-month window.
Capacity ceiling. NMO's consultant team is fully booked, pipeline stronger than capacity, and adding consultants doesn't scale margin.
Catalog moat is mature. NMO's catalog reaches 300+ entries with quarterly review cycles, making it a deliverable in itself.

Until then, V1 stays disciplined. Phase 2 too early kills the consulting margin.

What changes

Concept	Phase 1	Phase 2
Tenants	1 (NMO)	Many (each subscriber org)
Auth	Single passphrase + TOTP	SSO (OIDC, SAML), invite-only first, public sign-up later
Billing	Engagement invoices (manual)	Stripe, per-seat or per-Blueprint, with metered overage
Catalog	NMO's, used internally	NMO premium tier (read-only, paid) + customer-private tier (writeable)
Templates	Hard-coded NMO branding	Per-tenant theming, custom logos, optional white-label
Consultant role	Drives every engagement	Optional 2-hour expert-review at Tier; otherwise self-serve
Operator	Ahmed	Customer admin per tenant + NMO meta-admin
Support	Email + WhatsApp	In-app chat, knowledge base, ticketed

What stays the same

The five-stage workflow.
The Architecture Gate.
The Blueprint format.
The catalog schema (just expanded with tiers).
The data model (multi-tenant from day one).
The Org Profile concept (becomes central).
aiproxy as single egress.
RLS at the database layer.
The Logs page.

Phase 2 build budget — sketch

Estimating from scratch when triggers fire:

Auth migration to OIDC + SAML: 2 weeks.
Stripe integration + billing pages: 2 weeks.
Self-serve onboarding flow + first-Blueprint guide: 2 weeks.
Embedded coaching surfaces (tooltips, "show me an example", inline catalog samples): 2 weeks.
Per-tenant theming + white-label: 1 week.
Catalog tiering enforcement + customer-private writes: 1 week.
Status page, public marketing site, pricing page: 1 week.
Polish, telemetry, beta program: 2 weeks.

Total: ~13 weeks (≈3 months) for Phase 2 v1, assuming V1 architecture has held the line. Funded by ~5 V1 engagements at $25k.

The risk of getting this wrong

Diluting the consulting brand. If Phase 2 launches before NMO has 10+ shipped Blueprints, the SaaS sells "AI strategy in a box" — a generic value prop that competes with $99/mo tools, not $25k engagements.
Self-serve UX without the discipline. If the gate doesn't survive contact with non-expert users, Mihwar's differentiator dies. Embedded coaching has to enforce the gate, not soften it.
Cost runaway. Phase 2 multiplies usage. Without the AI-economics discipline holding, margin collapses. See AI Economics.
Cross-tenant leak. Phase 2 is the moment one bad query becomes a regulatory event. The cross-tenant fence test must be ironclad before public sign-up opens.

The phasing principle

Phase 1 earns the right to Phase 2. The signals that earn that right are real money + real Blueprints + real testimonials, not hopes. When the triggers fire, the team that earned the consulting brand turns it into a product. That's the order.

Previous← Ops Handbook NextSelf-Serve Product →

Part E · SaaS Phase · New

The self-serve product spec

What changes about the experience when a client — not a senior NMO consultant — drives the workflow. The engine stays the same; the surfaces around it must compensate for the absence of the consultant in the room.

The core challenge

In Phase 1, the consultant interprets, refines, pushes back. In Phase 2, the user is alone with the AI. Without compensation, three failure modes appear:

Surface confusion. The user doesn't know what "blast radius" means and abandons the question.
Hallucinated confidence. The AI accepts a vague answer and produces a Blueprint that misrepresents the use case.
Gate erosion. The user is impatient, can't be talked through Stage 2 by a human, and either gives up or pressures Mihwar to skip it.

The compensations

1 · Embedded coaching

Each prompt in Stage 1 ships with three affordances:

"What good looks like" example — collapsible card showing a sample answer derived from a real (anonymised) past Blueprint.
Inline tooltip with a one-paragraph definition of the term.
"I'm stuck — coach me" button that asks Haiku for a context-aware question rephrasing or a suggestion of who in their org to ask.

2 · Org Profile-driven personalisation

The Profile is the engine of the self-serve experience. A user who's been on Mihwar 6 months has a Profile that pre-fills 70%+ of every Stage 2 they touch. The third Blueprint they make takes a third of the time of the first.

3 · Smarter gate-enforcement

The gate adapts when the consultant isn't there. Instead of "go ask your DBA", Mihwar offers:

A pre-filled email template ("Here's the question I need answered. Forward to your DBA.").
An "invite a colleague to fill this slice" link — a workspace member at limited role.
A "skip with admission" path: the user can mark a question "I don't know and have no way to find out" — the architecture is then synthesised with that explicitly noted in the Blueprint as an open question, not silently glossed.

4 · Optional expert review

For Tier "Team" and above, the user can pay for a 2-hour NMO expert review of their Blueprint draft before signoff. The reviewer reads the Blueprint, leaves margin notes, has a 30-minute call with the user, signs off the result with NMO's seal. This is the bridge between self-serve and consulting — and a high-margin upsell.

The onboarding flow

Sign-up — email + workspace name. Email verified.
SSO setup (Team+ only) — connect Microsoft Entra / Google / Okta.
Org Profile wizard — guided 8–12 question version of the Profile (full version takes longer; the wizard captures the high-leverage stuff first).
"Your first Blueprint" guide — a guided Stage 1 with extra coaching density.
Catalog browse — the user reads through the NMO catalog, picks 5 entries to "favourite" (drives recommendation tailoring).
First Blueprint generated — at this point, normal coaching density resumes; user is "on board".

Workspace roles (Phase 2)

Role	Permissions
Owner	Workspace admin, billing, can invite, can delete.
Editor	Run stages, edit artifacts, request signoff.
Reviewer	Read-only access plus comment on artifacts.
Contributor	Limited access — fill assigned Stage 2 slices, no Stage 3 access.
NMO Reviewer (paid)	External NMO consultant invited for Tier expert review.

Public Mihwar product surfaces

mihwar.app — public marketing site, pricing, sign-up.
app.mihwar.app — the application itself.
status.mihwar.app — public status page.
docs.mihwar.app — documentation, video walkthroughs, tutorials, sample Blueprints.
nmopartners.com/mihwar — Phase 1 cockpit URL preserved for NMO's continued internal use.

What the Phase 2 user can't do

Skip the gate without admission.
Override the catalog to recommend an arbitrary vendor.
Generate a Blueprint with sensitive client data they're not authorised on (workspace permissions enforced).
Export beyond their tier's monthly cap without an upgrade prompt.
Bypass NMO's premium catalog (read-only) — they can add private entries, not edit NMO's.

The product principle

Phase 2 is not "Mihwar with the consultant ripped out". It is "Mihwar where the discipline of the consultant is encoded into the surface." Coaching, gating, expert-review upsells — these are how the discipline survives self-serve.

Previous← SaaS Path NextBilling & Tiers →

Part E · SaaS Phase · New

Billing & tiers

Phase 2 plans, what each includes, and the metering that makes them work without runaway cost.

The four plans

Plan	Price	Audience	Includes
Starter	$1,200/mo or $9,600/yr	Single AI champion at a mid-market enterprise	1 workspace · 3 Blueprints/yr · premium catalog read-only · EN/AR · email support · standard branding
Team	$3,500/mo	5-seat AI office	5 seats · unlimited Blueprints (within fair-use cost cap) · custom branding · SSO · 1 expert review/qtr included · priority support
Consultancy	$25k/yr + per-Blueprint	Boutique AI shops licensing Mihwar for their clients	White-label · multi-client workspaces · customer-private catalog tier · NMO catalog as premium · API access · per-Blueprint metering ($150 each beyond 50/yr included)
Enterprise	Custom (from $80k/yr)	Large org with strict residency / SSO / audit needs	Dedicated tenant in-region · BYO IDP · audit export · contractual residency · SLA · dedicated support

Metering

Two meters, both implemented atomically in Redis:

Blueprint count — incremented at compile success. Reset on plan period (monthly or annual).
aiproxy cost — incremented on every aiproxy call by the call's cost_usd. Soft warn at 80% of plan-implied budget; hard refuse at 110% (the 10% headroom prevents incidents from a clean compile failing for a single dollar).

Stripe integration

Stripe customer = tenant. One subscription per tenant.
Subscription items: base plan + metered overage components (per-Blueprint for Consultancy, per-seat for Team).
Payment failures handled per dunning rules: grace 7 days, then suspend writes (reads still allowed for export/download), then suspend reads after 30 days, then hard offboard after 90 days with the tenant-deletion procedure.
Invoices include the cost-meter window report (with rounding) — transparency feeds trust.

Promo / pilots / trials

Free 14-day pilot — Starter-tier features, capped at 1 Blueprint. Requires email + light KYC for KSA buyers.
Conversion incentive: 30% off year-1 if a pilot user converts within 30 days.
NMO-introduced clients get a Concierge code that bundles Starter for 6 months at $0 if they signed an engagement. This protects the consulting margin while letting Phase 2 build the case-study set.

Cost-to-serve modelling

Plan	Expected Blueprints/yr	AI cost	Other cost	Margin at sticker
Starter	3	~$60–$100	~$120 (infra share, support)	~97%
Team	~20	~$400–$700	~$1,800 (incl. 1 expert review)	~94%
Consultancy	50–150	~$2,500–$5,000	~$3,000 (white-label support)	≈75–80%
Enterprise	varies	varies	dedicated infra share + named CSM	≈60–70%

Margins look generous — they assume V1's AI economics discipline survives. Without prompt caching, batch usage, two-tier model selection and tenant cost caps, those numbers degrade fast. See AI Economics.

Phase 2 invoicing & tax

VAT applied per KSA rules (currently 15%) for KSA-resident buyers.
Withholding tax handled per buyer's regulatory regime — captured in onboarding KYC.
Currency SAR primary; USD available for international.
Receipts and invoices generated automatically; archived for 10 years per local accounting standards.

Cost flag Cost

The Consultancy and Enterprise tiers can become loss-makers fast if a single sub-tenant of a Consultancy buyer drives extreme cost. Per-sub-tenant caps inside Consultancy plans are designed-for in V1's data model and turned on at Phase 2 launch.

Previous← Self-Serve Product NextPhase 2 GTM →

Part E · SaaS Phase · New

Phase 2 go-to-market

How Mihwar opens to the public when the trigger fires. A 12-week launch plan from "trigger met" to "first 20 paying tenants."

Pre-launch — weeks 1–4

Build (per SaaS Path): auth, billing, onboarding, embedded coaching, per-tenant theming.
Marketing site: mihwar.app pricing page, 5–7 case studies from V1 engagements (anonymised), bilingual.
Documentation: 5 explainer videos (one per stage), 20 KB articles, sample Blueprint downloads.
Sales motion: NMO's existing pipeline gets the first invitation; conversion incentive offered.

Beta — weeks 5–8

Closed beta: 8–12 invited users from prior NMO engagements + 4 boutique AI consultancies. Free Starter for the duration.
Weekly user-feedback sessions. Each session feeds catalog updates and UX patches.
Beta SLA: 4-hour response on issues; each beta user has a direct line to Ahmed.
Beta exit criteria: at least 8 Blueprints generated by users without intervention; cross-tenant fence test passes; cost-per-Blueprint within model.

Public launch — weeks 9–12

Signup opens. Concierge code reserved for NMO-introduced clients (preserves consulting margin).
Cohort 1 marketing push: KSA tech press, LinkedIn, Vision-2030-aligned content.
NMO email list (estimated ~600 senior-CTO contacts) gets a launch announcement with case studies.
Pricing live; 14-day pilot live.
Weekly cohort check-ins: monitoring activation rate, time-to-first-Blueprint, cost-per-Blueprint, support volume.
Status page goes from "internal" to public.

Acquisition channels

Channel	Phase 2 fit	Effort	CAC ceiling
NMO existing pipeline	Highest — warm, in-market	Low	~10% of ARR
LinkedIn thought leadership (Ahmed)	Direct — KSA AI champions follow	Med	~15% of ARR
SDAIA / Vision-2030 conferences	Strong for government tier	High	~25% of ARR
Boutique AI shops (Consultancy plan)	Two-sided lever — they bring their clients	Med	~30% of ARR
Public docs & SEO	Long compounding — start day 1	Med	~5% of ARR
Paid ads	Avoid in V1 — low intent	—	—

Phase 2 success metrics

Metric	Month 3 target	Month 12 target
Paying tenants	20	120
ARR	$200k	$1.5M
Activation rate (sign-up → first Blueprint)	40%	60%
Time-to-first-Blueprint	≤14 days	≤7 days
Net revenue retention	—	≥110%
Avg cost-per-Blueprint	≤$30	≤$25
NPS (customer survey)	≥40	≥55
Phase 1 → Phase 2 cannibalisation	<10% engagement loss	0% (Phase 1 is now upsell)

Phase 2 ↔ Phase 1 relationship

Phase 2 isn't a replacement for Phase 1 — it's a complement. Mihwar's full motion at year-2 looks like:

Top of funnel: Self-serve Starter customers explore AI use cases inside Mihwar. Catalog ships them to a $25–60k engagement when their use case is ambitious.
Mid-funnel: Team customers run Blueprints solo, occasionally pay for an expert-review upsell.
Top of value: Consultancies licensing Mihwar drag NMO into joint engagements at the strategic layer; NMO Apex picks up the build.
Bottom of value: Enterprise tier funds the dedicated-infrastructure and compliance roadmap, which strengthens every other tier.

The right ordering

Phase 1 builds the brand. Phase 2 turns it into volume. Both halves are needed; neither half is sufficient. The masterplan is built for both.

Previous← Billing & Tiers NextSuccess Metrics →

Part F · Operations

Success metrics

What "Mihwar is working" actually means, measured in numbers Ahmed can read off a dashboard. Lead measures predict business outcomes; lag measures confirm them.

Three tiers of metrics

Mihwar's metrics fall into three tiers. Tier 1 is the only one Ahmed checks daily. Tier 2 is reviewed weekly. Tier 3 is the quarterly retrospective.

Tier 1 — Business outcomes

Metric	V1 target	Why it matters
Engagements signed per quarter	5+ by Q3	The revenue line. Below 3 means Mihwar isn't shifting deals.
Blueprint price realised (avg)	$20k+	Below this, NMO is competing on price not on quality.
Conversion: Blueprint → Build	≥30%	The most important number. Mihwar's whole thesis. Below 20% means Blueprints aren't selling next-stage work.
Margin per Blueprint	≥65%	Engagement P&L test. Below this, the tool isn't compressing time enough.
NPS from Blueprint recipients	≥50	Survey delivered 30 days after Blueprint signoff. Drives word-of-mouth referrals.

Tier 2 — Operating health

Metric	V1 target	Why
Time-to-Blueprint	≤7 working days	The core promise. Engagements that overrun erode the value proposition.
Stage 2 → Stage 3 cycle time	≤4 days	Discovery is the bottleneck Mihwar exists to fix. Trend down.
Catalog growth rate	+5 entries/month	Compounding IP. Stagnant catalog means Mihwar isn't learning.
aiproxy cost per Blueprint	≤$30 (P50), ≤$60 (P95)	Margin discipline; verifies prompt caching, two-tier model, batch.
aiproxy cache hit rate	≥80%	Direct verification of AI Economics discipline.
Stage signoff rework rate	<15%	Stages reopened after signoff. Above 15% means the AI's output isn't trustworthy.
Async response rate	≥70% within 7 days	If async forms aren't being filled, Stage 2 stalls.
Uptime (rolling 30 days)	≥99.5%	Engagements get cancelled by 12-hour outages.
Backup success rate	100%	Anything below 100% is a Sev-2 incident.
5xx rate	<0.5%	Above this, the Logs page becomes the daily site.

Tier 3 — Quarterly health

Catalog freshness: ≥80% of entries reviewed within last 6 months.
Engagement retro coverage: 100% of completed Blueprints retro'd within 14 days.
Cross-tenant fence test: green on every CI run; deviations = Sev-1.
Restore drill: performed at least once per quarter; documented timing.
Security review: dep audit, key rotation, access review.
Phase 2 trigger evaluation: demand-pull count, capacity utilisation, catalog size.

The dashboard

Previous← Phase 2 GTM NextRisks →

Part F · Operations

Risks

What could go wrong, ranked by likelihood × impact, with concrete mitigations. The discipline of writing risks down is half the mitigation.

Engagement risks

R-1 · Client refuses to do discovery HIGH × HIGH

The CTO wants the architecture deck now and is impatient with Stage 2 questions. This is the #1 expected friction point.

Mitigation: Sales script up-front: "We do discovery before architecture. That's not negotiable. It's why our deliverable doesn't fall apart in your procurement committee." If a client truly won't do Stage 2, NMO walks. The gate is the product.

R-2 · Client gives wrong inventory data MED × HIGH

Stage 2 captures what the client believes their environment looks like. Reality occasionally diverges. Architecture lands, build starts, surprise.

Mitigation: Stage 2 captures source-of-truth pointers (DBA name, dashboard URL) for every claim. Every architecture component cites its inventory source. Trade-offs section explicitly flags assumptions. Build phase starts with a 1-day "validate Stage 2" sprint.

R-3 · Blueprint conversion to build below target MED × HIGH

Below 20%, the whole productisation thesis weakens.

Mitigation: 30-day post-Blueprint follow-up call mandatory. Common conversion blockers tracked, fed back into Stage 4 templates and the catalog. NPS survey identifies dissatisfaction before it becomes lost revenue.

Product / technical risks

R-4 · Cross-tenant data leak LOW × CATASTROPHIC

A bug in a query lets one tenant see another's data. In Phase 1 this is one bug; in Phase 2 it ends the company.

Mitigation: RLS at DB layer + tenant_id in every app query (defence in depth). Cross-tenant fence test in CI on every commit. Schema-per-tenant for Enterprise tier. Manual audit of every new query that joins multiple workspace_id values. See Multi-Tenancy.

R-5 · Anthropic API outage during a live engagement LOW × HIGH

Stage 1 is mid-conversation; Stage 3 is mid-synthesis; Anthropic returns 5xx for an hour.

Mitigation: aiproxy retries with exponential backoff. Background jobs survive transient failures via DLQ. Stage 1 chat shows a "service temporarily unavailable" banner without losing draft state. Multi-region key support designed-for; failover provider candidacy reviewed quarterly.

R-6 · Cost runaway MED × HIGH

A bug or pathological prompt drives 10× expected aiproxy spend.

Mitigation: Hard tenant + per-feature daily caps in aiproxy. Cost-spike alert at $10/h sustained. Logs cost view drives same-day diagnosis. Budget gate on agentic loops. See AI Economics.

R-7 · Prompt injection from client document LOW × MED

Client pastes a contract; the contract carries a hidden instruction to exfiltrate data via a tool call.

Mitigation: Untrusted input wrapped in delimited blocks. Tools require human-in-the-loop confirmation for any side effect. Outbound allowlist blocks unauthorised destinations. Output guardrail rejects unexpected tool calls. See Client Security.

R-8 · House-style drift MED × MED

Over 3–6 months the Blueprint voice creeps toward generic LLM tone — exclamation marks, "I'd love to help!", emoji.

Mitigation: Banned-phrases filter in aiproxy rejects offending output. Quarterly Blueprint review by Ahmed catches subtle drift. House-style prompt versioned and updated based on observed regressions.

Business risks

R-9 · KSA market consolidates around large suppliers MED × HIGH

Vision-2030 procurement vehicles favour mega-vendors; boutique consultancies are squeezed out of preferred-supplier lists.

Mitigation: Speed of Mihwar's deliverable (7 days) gives NMO an entry point that mega-vendors cannot match. The Consultancy Phase 2 plan turns the squeeze into an opportunity (boutiques license Mihwar). Government tier pursued via Tier 2 Playbook + RFP spec deliverables.

R-10 · Phase 2 cannibalises Phase 1 LOW × MED

Self-serve Phase 2 erodes the perceived value of $25k consulting engagements.

Mitigation: Phase 2 priced for the segment that wouldn't have engaged a $25k consultant anyway. Concierge codes preserve the engagement margin for NMO-introduced clients. Expert-review upsell within Phase 2 funnels into Phase 1 work.

R-11 · Single-operator key-person risk HIGH × HIGH

Mihwar is one Ahmed. Anything happening to Ahmed is an existential risk.

Mitigation: Hire consultant #2 by Q3 (operational redundancy). Documented runbooks for every system surface. Backup passphrases sealed-envelope held by trusted party. Insurance review.

R-12 · Regulatory change (PDPL / SDAIA) LOW × MED

New PDPL implementing regulation tightens residency or processing rules.

Mitigation: Multi-tenancy levels 5 + 6 (dedicated schema / dedicated DB / in-region) designed-for. aiproxy abstracts model provider. Phase 2 sovereign tier ready as escape hatch for regulated clients.

Risk review cadence

This list is reviewed quarterly. Risk status (likelihood × impact) is re-rated. New risks added; resolved risks moved to an archive. Any risk that goes up in either dimension drives a same-quarter mitigation plan, not a "we'll think about it" slot.

Previous← Success Metrics NextMonday Morning →

Part F · Operations

Monday morning

The masterplan is real only when the first action is taken. This page lists, in order, the concrete actions Ahmed takes in the first working week to turn this document into a running app.

Day 1 · Decisions & domain

Re-read this masterplan in one sitting. Note any disagreement. Edit before moving.
Confirm the domain: mihwar.nmopartners.com for Phase 1; reserve mihwar.app for Phase 2.
Provision the Hostinger KVM VPS (4 vCPU, 16 GB RAM minimum). Point DNS.
Create empty private repo Arcahmed93/mihwar. Apply branch protection on main.
Open Anthropic + Voyage accounts. Generate scoped API keys. Store in 1Password.
Order the Pushover license. Configure on Ahmed's phone.
Identify the first paying client to dogfood with — secure the pre-engagement agreement so Week 6 has a real engagement on the tool.

Day 2 · VPS hardening

Provision non-root user. Disable password SSH. Move SSH port. Apply fail2ban.
UFW: default deny, allowlist :443 + custom SSH + WireGuard.
Set up WireGuard, install on Ahmed's laptop and phone.
Install Coolify (over WireGuard endpoint).
Generate Ed25519 Blueprint signing key; archive private half securely; embed public half in repo.
Configure off-region object-storage backup target with encryption passphrase.

Day 3 · Run Prompt 1

Open fresh Claude Code session in /srv/mihwar/.
Paste Prompt 1 from Claude Code Prompts. Steer through scaffolding.
Review every diff. Reject what doesn't match the masterplan.
Commit, push, watch CI go green, watch Coolify deploy.
Visit https://mihwar.nmopartners.com. Login page renders. Auth works. Healthcheck green.
Verify nightly backup fires (manual trigger to test).
Verify outbound allowlist blocks an unauthorised destination (try curl from a container — should fail).

Day 4 · Run Prompt 2

Fresh Claude Code session. Paste Prompt 2.
Review the workspace shell + Stage 1 Lab implementation.
Run a sample Lab against a placeholder use case. Verify cache_read_tokens > 0 by turn 2.
Verify house-style filter rejects "I'd love to help!" if it appears.
Verify Logs page shows the conversation joined by request_id.
Verify ai_calls table is populating with cost data.
Commit, deploy, dogfood with one warm prospect over Zoom — capture friction.

Day 5 · Plan Week 2 & communicate

Triage the friction list from the dogfood Lab.
Land 3–5 quick fixes; defer the rest.
Send a one-paragraph update to NMO's mailing list: "Mihwar V1 is being built; first engagements available from Week 6." Soft-book first paying engagement.
Schedule Day-1-of-Week-3 Prompt 3 session.
Restore drill: pick yesterday's backup, restore into a sandbox container, smoke test, document timing.
Pre-commit master checklist (top of doc): walk every applicable box; flag any that won't be met by Week 6.

By end of Week 6

First paying engagement Blueprint shipped on a real client.
External penetration test booked for Week 8.
Catalog has 80+ entries seeded from the AI Ecosystem Primer.
Logs page operational; one real ERR-… reference traced end-to-end.
Cost-per-Blueprint inside the $30 cap.
Cache hit rate ≥ 80%.
Backup + restore drill green.
Phase 2 trigger log started — first prospect that asks "can we get a login" gets recorded with date.

The first 90 days

Days 1–42: Build & ship V1 (the 6-week roadmap).
Days 43–60: Ship 2 more engagements at full price. Refine catalog, fix paper-cuts.
Days 61–90: Ship 2 more. Run the first quarterly catalog review. Begin Phase 2 trigger watching in earnest.

محور · يدفع المحادثة من "نريد الذكاء الاصطناعي" إلى مخطط ${"$"}25,000 موقّع في أسبوع عمل واحد. Mihwar — pivots the conversation from "we want AI" to a signed $25k Blueprint in a working week.

The masterplan ends here, the build begins now

This document is v2 — Phase 1 + Phase 2, security baked in, AI economics modelled, observability shipped, the Org Profile concept that stops Phase 2 being an unbearable infrastructure quiz, the operator Logs page that lets one person run this at 2am six months from now. Everything that needs to be true on day one of the build is on these pages. The rest is execution.

Previous← Risks Back to startExecutive Summary →