Idea Compiler
A single page where Ahmed types an idea; Mihwar returns a stack-aware build playbook with explicit local-vs-cloud flags and agent assignments — in minutes.
Mihwar (محور · "pivot, axis") is the operating system for an AI consulting practice. It ships in three real versions and one speculative one. V0.1 is the personal MVP — a single page where Ahmed types an idea and Mihwar returns a stack-aware build playbook in minutes. V1 productizes the same engine as a 5-stage cockpit NMO uses to ship $25,000 client Blueprints. V2 opens it to clients as a self-serve SaaS. V-future is a marketplace bet kept alive only in the architecture.
Mihwar ships in four layers. Each inherits the one below. V0.1 is the next thing to build — the personal MVP that closes the loop in days. V1 productizes it for NMO consulting in 6 weeks after V0.1 validates. V2 unlocks when clients pull. V-future is a bet kept alive only so today's architecture doesn't foreclose it.
A single page where Ahmed types an idea; Mihwar returns a stack-aware build playbook with explicit local-vs-cloud flags and agent assignments — in minutes.
The cockpit NMO Partners uses to run client engagements end-to-end.
Same engine, exposed to clients. Self-serve AI visioning inside the client's organisation.
Speculative. Not committed. Listed so the architecture doesn't foreclose it.
Every implementation choice in this document — every endpoint, every query, every prompt — is checked against four questions. They appear as flags throughout the rest of the masterplan.
1. Would this still work at 100× current load?
2. How could this be abused by a hostile actor?
3. Could this be investigated at 2am, six months from now?
4. Affordable at 100× usage? What's the cost per user per month?
Six entry points, depending on what you came here for:
- **Vision** · The product story. Who needs it, why now, what it replaces. Open Vision →
- **Architecture** · System architecture, data model, multi-tenancy, security boundary. Open Architecture →
- **Roadmap** · Day-by-day plan from VPS Day 0 to first signed engagement. Open Roadmap →
- **Risks** · 12 named risks, mapped to mitigations baked into the plan. Open Risks →
- **SaaS Path** · What V2 is, when it triggers, how it sells, how it bills. Open SaaS Path →
- **Monday** · The 7-day plan from "approved" to "first prompt running on the VPS". Open Monday →

One page. You type an idea. Mihwar returns a stack-aware build playbook — what to build, which of your agents owns each step, what runs locally on your VPS, what runs in the cloud. Three to seven days from spec to live URL. The personal MVP that ships before V1, dogfoods the engine on Ahmed's own ideas, and earns the right to build the rest.
V1 is six weeks of build before the engine compiles its first idea into a deliverable. That's too long to validate the core loop. V0.1 compresses everything that matters about V1 into a single page that Ahmed uses on himself, every day, on whatever idea is on his mind that morning. If the playbooks it produces are useful, V1 is worth building. If not, the masterplan changes before any client sees it.
use Next.js, not Astro.

| Section | What it contains | Why it's here |
|---|---|---|
| 1 · Idea summary | One sentence reflecting the idea back, plus the success metric implied by it. | Confirms the engine understood. Catches misreads early. |
| 2 · Architecture | Component list. Each component flagged 🔵 LOCAL · ☁ CLOUD · 🌗 HYBRID. Includes runtime, storage, queue, frontend, observability. | Local/cloud is the whole reason V0.1 exists. Surfaces decisions that change cost, latency, and sovereignty before you build. |
| 3 · Agent assignments | For each build step, the suggested agent from your roster (e.g. PM, Dev-1, VPS Admin, Cyber). External tasks (e.g. domain registration) flagged as human-only. | Lets you forward sections of the playbook directly to the agent that will execute them. |
| 4 · Build sequence | Ordered steps with effort estimate (≈hours), dependencies, and "definition of done" per step. | So "build playbook" doesn't mean "vague TODO list". You can start work after reading. |
| 5 · Cost estimate | Monthly cost split by 🔵 LOCAL (sunk · already paid via VPS) and ☁ CLOUD (per-API / per-month). Worst-case at 100× usage. | Mental-test #4 (Economics) baked in from idea zero. |
| 6 · Risks & unknowns | 3–5 named risks with likelihood and mitigation. Explicit "what we don't know yet" list. | Prevents the playbook from feeling more confident than it should. |
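A minimal sketch of how the six sections could be checked as a structured output. The section keys and the validation shape here are illustrative assumptions, not the shipped schema:

```python
# Illustrative keys mirroring the six sections in the table above (assumptions).
PLAYBOOK_SECTIONS = [
    "idea_summary", "architecture", "agent_assignments",
    "build_sequence", "cost_estimate", "risks",
]

FLAGS = {"LOCAL", "CLOUD", "HYBRID"}  # every architecture component gets one

def validate_playbook(playbook: dict) -> list[str]:
    """Return a list of problems; an empty list means the playbook passes."""
    problems = [f"missing section: {s}" for s in PLAYBOOK_SECTIONS
                if s not in playbook]
    # Each architecture component must carry exactly one local/cloud flag.
    for comp in playbook.get("architecture", []):
        if comp.get("flag") not in FLAGS:
            problems.append(f"component {comp.get('component')!r} lacks a valid flag")
    return problems
```

Tightening the prompt until this validator comes back empty is the V0.1 feedback loop.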
Every component in the architecture section gets one flag. The flag is the whole point of V0.1.
- 🔵 **LOCAL** · Postgres in a container, Coolify-managed services, n8n flows, files on disk, the cron that runs daily backups. Sunk cost — no per-call billing. Sovereignty stays with you.
- ☁ **CLOUD** · Anthropic API calls, GitHub repos, Linear issues, Stripe payments, S3-compatible backup target, third-party SMTP. Per-call billing. Scales without your hardware.
- 🌗 **HYBRID** · Local Ollama with Anthropic API failover, on-VPS embedding model with cloud fallback for spikes, local SMTP relay routed through SES on volume. The pragmatic default for variable load.
The Operator Profile is the JSON Mihwar reads as static context on every call. It's pre-loaded for Ahmed; future operators (NMO consultants, then clients in V2) will get their own.
| Section | Examples | Update cadence |
|---|---|---|
| Infrastructure | Hostinger VPS · Coolify · Traefik · Postgres available · Redis available · WireGuard admin | Annual or on stack change |
| Cloud APIs | Anthropic API · OpenAI fallback · GitHub · Linear · n8n · Hostinger DNS | On API key rotation |
| Agents available | The agents in your Apex roster — PM · Productizer · VPS Admin · Dev-1 · Dev-2 · Data Sci · Cyber · HR · Marketing | On agent roster change |
| Personal preferences | Stack defaults (Next.js · FastAPI · Postgres) · auth library · deploy tool | Whenever taste changes |
| Constraints | VPS RAM cap · cost cap per project per month · regions allowed · sovereign-cloud requirement | Annual |
| Existing assets | Sibling products on the same VPS (e.g. Apex, n8n) · their networks · domains owned · wildcard cert availability | On infrastructure change |
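Because the Operator Profile is static across calls, it belongs in the prompt-cached prefix. A sketch of assembling the request payload (the model id, system text, and dict-shaped payload are assumptions; the real route would hand this to the Anthropic client):

```python
import json

def build_request(idea: str, profile: dict) -> dict:
    """Assemble a Messages-style payload whose static prefix (system prompt +
    Operator Profile) is marked for prompt caching; only the idea varies."""
    return {
        "model": "claude-sonnet-4-5",  # assumed model id
        "max_tokens": 8000,
        "system": [{
            "type": "text",
            # Deterministic serialisation keeps the cached prefix byte-stable.
            "text": "You are Mihwar, an idea compiler.\n\nOPERATOR PROFILE:\n"
                    + json.dumps(profile, sort_keys=True),
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [{"role": "user", "content": idea}],
    }
```

On call #2 and later, only the idea text is billed at full input rate; the profile reads from cache.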
| Layer | What | Flag |
|---|---|---|
| Frontend | Single Next.js page · textarea + submit · renders the returned playbook as styled HTML | 🔵 Local · runs in your VPS container |
| Backend | One Next.js API route or FastAPI endpoint · receives idea + reads Operator Profile from disk · calls Anthropic API · returns structured playbook | 🔵 Local app · ☁ Anthropic call |
| LLM | Claude Sonnet 4.6 · structured output (JSON schema) · prompt-cached static prefix (system + Operator Profile) | ☁ Cloud (Anthropic API) |
| Storage | Operator Profile lives in a single JSON file on disk · past playbooks saved as HTML files in a folder · no database | 🔵 Local |
| Auth | Behind WireGuard / IP allowlist — Ahmed only · no login UI · no tenancy logic | 🔵 Local |
| Observability | Per-call log to a JSON file: model · input tokens · output tokens · cache_read_tokens · cost_usd · request_id · timestamp · idea hash | 🔵 Local |
| Cost ceiling | Hard cap: ≤$0.50 per playbook · monthly soft alert at $20 spend (it'll never get close) | ☁ Anthropic billing |
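The observability and cost-ceiling rows combine naturally into one append-only logger. A sketch, with field names from the table; the cap-enforcement behaviour is an assumption about how V0.1 might wire it:

```python
import hashlib
import json
import time

COST_CAP_USD = 0.50  # hard cap per playbook, from the table above

def log_call(path: str, *, model: str, input_tokens: int, output_tokens: int,
             cache_read_tokens: int, cost_usd: float, request_id: str,
             idea: str) -> dict:
    """Append one JSON line per Anthropic call; refuse calls over the cap."""
    if cost_usd > COST_CAP_USD:
        raise RuntimeError(f"${cost_usd:.2f} exceeds the ${COST_CAP_USD:.2f} cap")
    entry = {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cache_read_tokens": cache_read_tokens,
        "cost_usd": cost_usd,
        "request_id": request_id,
        "timestamp": time.time(),
        # Store a hash, not the idea text, so the log can be shared safely.
        "idea_hash": hashlib.sha256(idea.encode()).hexdigest()[:16],
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

The same JSON lines migrate into V1's `ai_calls` table without reshaping.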
Verify cache_read_tokens > 0 on call #2; tighten the prompt to enforce the 6-section output. Serve it at https://mihwar.nmopartners.com/v01 behind WireGuard. Add a tiny TOC of past playbooks. Decide whether V1 is worth building based on whether you used V0.1 daily. When V1 lands, fold the per-call log into the ai_calls table in V1's Postgres.

Mihwar (محور · "pivot, axis") is a two-phase platform. Phase 1 is a single-operator web app that turns 3-week AI consulting discoveries into 3-day Blueprint deliverables for NMO Partners. Phase 2 is a SaaS where clients run their own AI visioning and roadmaps inside the same engine. The same axis turns — the operator changes.
Mihwar is the operating system for an AI consulting practice. In Phase 1 it is the private cockpit Ahmed and the NMO team use to run client engagements: a five-stage workflow that takes a vague client wish and produces a $25,000 Blueprint deliverable in a working week, grounded in a curated catalog of vendors, models, and patterns and in the client's own infrastructure inventory. In Phase 2 the same engine is exposed to clients directly so they can self-serve AI visioning and roadmaps inside their own organisations, paying NMO a subscription for the platform and the catalog. The Phase 1 codebase is built so the Phase 2 pivot is a deployment change, not a rewrite.
Every implementation decision in this masterplan is checked against four questions. They run through every section that follows.
1. Would this still work at 100× current load?
2. How could this be abused by a hostile actor?
3. Could this be investigated at 2am, six months from now?
4. Affordable at 100× usage? What's the cost per user per month?
Why AI consulting projects stall in KSA right now, and the single observation that turns Mihwar from "another workshop tool" into a defensible product.
A typical AI use-case discovery in a KSA enterprise takes 3–6 weeks. Stakeholders are scattered across IT, business units, security, procurement, vendors. Information arrives in WhatsApp threads, email PDFs, three different SharePoint tenants and a printed spreadsheet from a DBA. The consultant spends 60% of the engagement chasing data dictionaries and license terms, not designing the system.
By the time the inventory is "good enough", the senior architect picks tools from memory. The recommendation is rarely written down beside the alternatives that were rejected. Six months later when the build runs into trouble, no one remembers why Snowflake was chosen over BigQuery — and there is no audit trail to consult.
Most engagements end with a 60-slide PowerPoint plus a Word document. CTOs forward them to a procurement committee, who can't navigate them, can't search them, can't share fragments without re-formatting, and can't verify whether the architecture has been validated against the actual environment. The artifact is dead on arrival.
Mihwar is not a chatbot for architects. It is an interviewing instrument. Stage 1 sharpens the use case. Stage 2 conducts the inventory. Both stages structurally refuse to advance until the inputs to Stage 3 are complete. Stage 3 — architecture synthesis — is fast precisely because Stages 1 and 2 made it possible. Most AI consulting tools start at Stage 3 and skip the discovery; that is exactly why their outputs feel hallucinated.
2026 is the loudest year in KSA AI consulting history. Mihwar's job is to be the most differentiated voice in the room — not the loudest.
Saudi Arabia has declared 2026 the Year of AI. Concretely:
| Competitor | Strength | Weakness Mihwar exploits |
|---|---|---|
| Big Four (Deloitte, EY, PwC, KPMG) | Brand, regulatory comfort, large delivery teams | Slow, expensive, generic decks, junior delivery on senior pitch |
| BCG / Bain / McKinsey | Strategy chops, board-level access | $300k+ floor, no implementation grounding, no KSA-localised vendor view |
| Local SI consultancies | Relationships, ministry pre-quals | Body-shop economics, no productised IP, no AI-specific differentiation |
| Boutique AI shops (regional / overseas) | Technical depth | No Arabic delivery, no PDPL fluency, no in-region presence |
| "AI strategy" SaaS tools | Cheap, fast | Generic catalog, not grounded in client's actual stack, no consultant orchestration |
Mihwar plus NMO occupies a specific gap: a senior, KSA-fluent consultant team backed by a productised workflow that produces a verifiable, interactive deliverable in 7 days for a $25k anchor price. No Big Four competes there because their cost structure forbids it. No SaaS competes there because they have no senior consultant. No body-shop competes there because they have no productised IP.
Vision-2030 entities, ministries, regulators.
Banks, logistics, retail, healthcare, family offices.
When Mihwar opens to clients directly (Phase 2), the addressable market widens substantially: every mid-market enterprise that does not need a consultant in the room but does need a structured visioning process becomes a buyer. Pricing shifts from engagement-based to per-seat or per-Blueprint subscription. NMO captures consultancies as a meta-tier — small AI shops who license the Mihwar engine and the catalog and use it inside their own client engagements.
How Mihwar is sold, what it costs the client, and how it makes NMO defensibly profitable.
| Tier | Deliverable | Price | Cycle | Margin |
|---|---|---|---|---|
| Tier 1 · Blueprint | Bilingual interactive HTML Blueprint signed off by client CTO. One 90-min walkthrough. | $15–30k | 1–2 weeks | ≥75% |
| Tier 2 · Blueprint + Playbook | Tier 1 plus 6-week build plan, risk register, vendor short-list, RFP-ready spec. | $30–60k | 2–3 weeks | ≥65% |
| Tier 3 · End-to-end engagement | Tier 2 plus orchestrated build (NMO Apex agents or partner squad). | $120k+ | 3–9 months | 30–50% on build portion |
This anchors the value of discovery, separates it from build risk, and makes Tier 1 feel reasonable. Never quote a Tier 3 price first. It triggers procurement scrutiny that the engagement isn't sized for.
When Mihwar becomes a SaaS, pricing shifts. The Blueprint becomes a unit of work the customer self-produces; NMO charges for access to the engine and the catalog.
| Phase 2 plan | Audience | Price target | What's included |
|---|---|---|---|
| Starter | Single AI champion at a mid-market enterprise | $1,200/mo or $9,600/yr | 1 workspace, 3 Blueprints/yr, premium catalog read-only, EN/AR |
| Team | 5-seat AI office | $3,500/mo | 5 workspaces, unlimited Blueprints, custom branding, SSO |
| Consultancy | Boutique AI shops licensing Mihwar for their clients | $25k/yr + per-Blueprint | White-label, multi-client workspaces, customer-private catalog tier, NMO catalog as premium |
| Enterprise | Large org with strict residency / SSO needs | Custom | Dedicated tenant in-region, BYO IDP, audit export, contractual residency |
Mihwar serves four distinct user roles. The product treats each one differently. Phase 1 is built for the first three; Phase 2 adds the fourth.
One codebase, two operating modes. Phase 1 is the consulting cockpit operated by NMO. Phase 2 is the self-serve platform operated by clients. The same engine drives both — the difference is who holds the steering wheel.
| Capability | Phase 1 use | Phase 2 inheritance |
|---|---|---|
| Five-stage workflow | NMO consultant runs it | Self-serve user runs it with embedded coaching |
| Catalog | NMO's curated knowledge base | Premium tier (NMO) + customer-private tier |
| Blueprint format | $25k deliverable | Self-produced artifact |
| Multi-tenant data layer (RLS) | One tenant: NMO | Many tenants: subscribers |
| Org Infrastructure Profile | Captured per engagement, reused on repeat | Captured per organisation, drives every Blueprint they make |
| aiproxy + AI economics discipline | Cost control across few engagements | Cost discipline at scale; per-tenant budget caps |
| Audit log | Per-user actions for NMO team | Compliance trail for regulated subscribers |
Phase 2 development starts when any one of the following becomes true:
Mihwar's core mechanic. Five sequential stages, each producing a versioned artifact, each unlocking the next. The Architecture Gate between Stages 2 and 3 is the rule that earns Mihwar its existence.
| Stage | Mode | Duration | Output | AI model |
|---|---|---|---|---|
| 1 · Ideation Lab | Live workshop · Socratic AI | 60–90 min | Sharpened use case (1-pager) | Claude Sonnet |
| 2 · Discovery | Hybrid live + async forms | 2–3 elapsed days | Infrastructure inventory | Haiku for filtering · Sonnet for synthesis |
| ⚑ Architecture Gate · Stage 3 locked until Stage 2 is signed off | | | | |
| 3 · Architecture | AI synthesis · consultant edits | ~1 day | Use Case Blueprint | Sonnet (extended thinking) |
| 4 · Playbook | Optional · Tier 2+ only | ~1 day | Build plan · risks · vendors · RFP spec | Sonnet |
| 5 · Handoff | Compile · present · export | 90-min walkthrough | Final HTML Blueprint deliverable | — |
Each stage is a panel in the Mihwar UI with three sub-panels:
Concretely: when the consultant tries to advance to Stage 3, the system checks Stage 2 completeness against the use case category. Missing critical fields ("nobody has told us where the data is") block advance with a specific, actionable list. The consultant cannot bypass this from the UI; they would have to edit the database directly to override.
A 60–90 minute live conversation that turns a vague client wish ("we want to use AI in our call centre") into a sharp, scoped use case with measurable success criteria. Socratic AI interrogates ambiguity until consultant and client agree on what they're actually building.
Typically the first or second meeting with a new client. The CTO has expressed interest, may have a fuzzy idea of what they want, and needs the consultant to help them sharpen it. The Lab can also be skipped if the client arrives with a fully-scoped use case (rare) — they get a discount for not needing it.
Mihwar runs the Lab through six question phases. The AI generates the specific questions in context, but they always probe these dimensions:
| # | Dimension | The question behind the question |
|---|---|---|
| 1 | The pain | What specific operational pain are we removing? Not "improving efficiency" — "reducing first-call resolution time from 14 minutes to under 6 minutes." |
| 2 | The user | Who is the human in the loop? Internal employee? External customer? Regulated principal? |
| 3 | The current state | How is this done today? With what tools, by whom, at what cost? Sketch the unhappy path. |
| 4 | The success metric | If we did this perfectly, what number moves and by how much? Who measures it? |
| 5 | The blast radius | What happens if the AI is wrong 5% of the time? 20%? Tolerable / catastrophic? |
| 6 | The first-mile constraints | Who has the data? Who has the budget? Who must approve? |
The Lab uses Claude Sonnet (latest) with a system prompt that turns it into a Socratic interviewer. Behaviour rules:
As the conversation progresses, the artifact panel renders a structured Use Case 1-pager:
USE CASE: [name]
PAIN: [one sentence]
USER: [persona, role, jurisdiction]
TODAY: [current process, cost, owner]
TARGET: [metric, baseline, goal, by when]
BLAST: [tolerable failure modes, intolerable failure modes]
INPUTS: [what data is needed, who owns it]
DECISION-OWNER: [who signs off the build]
OUT-OF-SCOPE: [explicit non-goals]
When the consultant is satisfied, they hit "Sign off Stage 1". The 1-pager is frozen as v1. If they re-open later, edits create v2, v3, etc. — never overwrite. This becomes the input to Stage 2's question-set tailoring.
For self-serve Phase 2 users, Stage 1 needs more scaffolding: example 1-pagers from the catalog ("see how a contact-centre AI was scoped"), inline tooltips that explain each dimension, and a "show me a strong answer" affordance on each prompt. The schema doesn't change — just the surface.
The infrastructure inventory. The most labour-intensive stage and the one most clients hate. Mihwar's job is to make it bearable, structured, partially async — and to refuse to advance until it's actually complete.
In a traditional engagement, Stage 2 takes 3–6 weeks. It's where consultants chase stakeholders for data dictionaries, screenshots of dashboards, license confirmations, GPU specs. It's where projects stall.
Mihwar compresses to 2–3 elapsed days by:
| Domain | What we capture |
|---|---|
| Data sources | Warehouses (Teradata, Snowflake, BigQuery), lakes (S3, ADLS), operational DBs, file shares, SaaS APIs, Excel sprawl. License terms. Volume. Freshness. Owner. |
| Compute | Cloud accounts, on-prem servers, GPU clusters, Kubernetes, VPS providers, edge devices. Capacity. Region. Procurement model. |
| Identity & access | IDP (Entra, Okta, custom), SSO state, MFA coverage, service-account hygiene, secret stores. |
| Network & perimeter | VPN, ZTNA, private endpoints, egress controls, region restrictions, SAMA / NCA controls applicable. |
| Existing AI/ML | Models in production, vendors used, licensing, evaluation discipline, MLOps maturity. |
| Compliance | PDPL, SAMA, NCA ECC, sector-specific (healthcare, education). Data classification scheme. |
| People | Sponsors, decision owners, champions, blockers. Skill availability. |
| Budget & procurement | Approved spend envelope. Procurement vehicle (direct, RFP, framework). Vendor preferences. |
| Constraints | Residency, on-prem mandates, vendor exclusions, contractual SLA shape, audit cadence. |
For each async question, Mihwar generates a single-use form link, scoped to the question, time-limited (default 7 days), bound to the recipient's email and IP-logged. The link looks like:
https://mihwar.nmopartners.com/async/01HV7Z9K3J5XPQ8WMY4N6T2RES
Recipients land on a clean, branded page with one or two questions, an "I don't know — ask X" escape, and a submit button. No login required. Submissions stream back into the consultant's Stage 2 panel.
Async-link tokens are generated with secrets.token_urlsafe(16), not uuid4, because they are used for auth. Single-use: marked consumed on first valid submission. Time-bound: hard expiry at 7 days, rejected at the API layer. IP-logged for audit. Form pages return generic errors on invalid/expired tokens, never leaking whether the token existed. See Client Security & PDPL.

The AI maintains a running gate-check: which Stage 3 architecture decisions can be made given current Stage 2 inputs? The consultant sees this as a live readiness meter, with the specific blocking questions named:
Stage 3 readiness: 76% · 4 questions remain blocking
✓ Data residency captured
✓ Identity provider captured
✗ GPU availability — pending response from CloudOps (sent 3 days ago)
✗ PDPL classification of customer voice transcripts — pending Legal
✗ SAMA AI governance applicability — async link expired, resend?
✗ Production traffic peak — async link sent today
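The meter reduces to a completeness check over the gating questions. A minimal sketch (question names are taken from the meter above; the real check is category-aware per use case):

```python
def readiness(gating: dict[str, bool]) -> tuple[int, list[str]]:
    """Percent of gating questions answered, plus the named blockers."""
    if not gating:
        return 0, []
    answered = sum(gating.values())
    pct = round(100 * answered / len(gating))
    return pct, [q for q, done in gating.items() if not done]

def can_advance_to_stage_3(gating: dict[str, bool]) -> bool:
    """The Architecture Gate: no UI bypass, advance only at 100%."""
    return readiness(gating)[0] == 100
```

Wiring the gate at the API layer, not the UI, is what makes it structurally unbypassable.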
Self-serve users don't have a consultant orchestrating Stage 2. Mihwar must:
The AI proposes a complete reference architecture for the use case, grounded entirely in the client's actual infrastructure (Stage 2) and the curated catalog. The consultant reviews, edits, and signs off the result.
The AI is given:
The AI is forbidden from:
Synthesis is asynchronous. The consultant clicks "Generate Architecture v1"; the request lands in a background queue (BullMQ-equivalent on Redis). A worker:
- builds the prompt, marking the static prefix with cache_control;
- calls the model through the aiproxy;
- persists the result as stage_artifacts v1.

Total elapsed time: typically 60–120 seconds. The consultant sees a "thinking…" beam during synthesis and reads the result when it lands.
The consultant can:
Optional but high-margin. Adds detailed build planning, risk register, vendor short-list, and reference repositories. Sold as Tier 2+ pricing. The Playbook is what a buy-side procurement officer actually reads.
Many engagements stop at Stage 3 — the client signs off the Blueprint, takes it to their finance committee, comes back later for the build. Mihwar respects that — Stage 4 is opt-in and adds days, not hours.
When clients do want Stage 4, they're typically committed to building and need the planning rigour. They're paying $30–60k for the Blueprint+Playbook combo and they expect a deliverable they can hand to a build team.
| Risk | Likelihood | Impact | Mitigation | Owner |
|---|---|---|---|---|
| Anthropic API quota tightened mid-build | Med | High | Multi-region key, fallback to second model family | NMO platform lead |
| Customer voice transcripts contain PHI under MoH classification | High | High | Pre-classify sample, redact pipeline before LLM, legal sign-off Week 1 | Client legal + NMO |
For government engagements, the RFP spec is the keystone. It mirrors the Blueprint structurally but reformats it as a procurement document: scope of work, deliverables, milestones, acceptance criteria, evaluation matrix, security clauses (NCA-ECC, PDPL), and pre-qualified vendor categories. The client's procurement team can lift it into their tender platform with minimal editing.
The final stage. Compiles the Blueprint, generates the proposal/scope document if relevant, and exports the deliverable.
Mihwar supports a presentation mode — full-screen, larger fonts, navigable page-by-page. The consultant shares the Blueprint screen, walks the client CTO through each section, answers questions, captures any final adjustments. Adjustments create a v(n+1) without invalidating the original signed manifest.
The workspace doesn't disappear. Mihwar retains it indefinitely (subject to data retention policy). NMO can:
The Blueprint is the deliverable. Everything in Mihwar exists to produce it. This page specifies exactly what it looks like, how it's structured, and why each design choice matters.
| § | Section | Content |
|---|---|---|
| 0 | Cover | Client logo, project name, date, NMO logo, version, document classification |
| 1 | Executive Summary | One-page overview. The CFO reads only this. |
| 2 | Use Case Definition | The Stage 1 1-pager, formatted |
| 3 | Current State | Stage 2 inventory, summarised — what they have today |
| 4 | Proposed Architecture | Diagram, component manifest, rationale |
| 5 | Data & Agent Flow | How information and decisions move through the system |
| 6 | Trade-Offs & Alternatives | What we considered and rejected, and why |
| 7 | Compliance & Risk | PDPL / SAMA / NCA reading. Risk register summary. |
| 8 | Build Playbook (Tier 2+) | Plan, vendors, RFP spec |
| 9 | Glossary | Plain-language definitions of every acronym used |
| A | Manifest | Versions, signoffs, catalog hash, signature |
{
"blueprint_id": "01HV8XQGT7K5R2W3M9N6P8Y4ZS",
"client": "Tadawul",
"project": "Customer Voice AI",
"version": "1.0",
"generated_at": "2026-05-07T14:32:18.420Z",
"engagement_id": "eng-0042",
"tenant_id": "nmo-001",
"stages": {
"stage_1": {"version": 3, "signed_off_by": "ahmed@nmopartners.com",
"signed_at": "2026-05-03T10:14:00Z"},
"stage_2": {"version": 5, "signed_off_at": "2026-05-05T16:22:00Z"},
"stage_3": {"version": 2, "signed_off_at": "2026-05-06T09:08:00Z"}
},
"catalog_snapshot_hash": "sha256:7c4b8d…",
"signing_key_id": "mihwar-prod-2026-05",
"signature": "ed25519:0x4f2a1c9b…"
}
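The catalog_snapshot_hash field could be derived over a canonical serialisation of the catalog, so the same catalog always hashes the same regardless of entry order. A sketch; the canonicalisation scheme is an assumption:

```python
import hashlib
import json

def catalog_snapshot_hash(entries: list[dict]) -> str:
    """Canonical JSON over name-sorted entries, hashed with SHA-256.
    Any edit to any entry changes the hash, pinning the Blueprint to
    the exact catalog state it was generated from."""
    canonical = json.dumps(sorted(entries, key=lambda e: e["name"]),
                           sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()
```

The ed25519 signature in the manifest then signs this hash along with the rest of the manifest body.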
Mihwar's grounding source. A curated, opinionated reference of vendors, models, frameworks, and patterns — maintained by NMO, used by the AI for every recommendation.
If the AI is allowed to recommend any vendor based on its training data, three things go wrong:
The catalog solves all three. It's NMO's opinionated knowledge base, evolving with every engagement.
The catalog is organised around a 10-layer reference architecture covering the full AI stack. Every catalog entry attaches to one or more layers. The same atlas is used in Stage 3's auto-generated diagram.
| Entity | Fields |
|---|---|
| Vendor | name, layers (1–10), region availability, KSA presence (none / partner / direct), pricing model, NMO opinion (rating 1–5 + notes), known limits, partner contacts, last reviewed |
| Model | name, family, provider, context window, cost/M input, cost/M output, languages (incl. AR strength), strengths, weaknesses, NMO opinion |
| Framework | name, layer, license, language, maturity, NMO opinion, when-to-use, when-to-avoid |
| Pattern | name, problem solved, components used, reference repo, used in N past engagements, success notes |
| Constraint | type (PDPL, SAMA, NCA, on-prem-only, etc.), description, implication for architecture |
| Question | domain, question text (EN + AR), category, async/live default, depends-on Stage-1 fields, gating Stage-3 decisions |
The catalog is seeded from two sources:
Seed target for V1: 80–120 vendors, 30 models, 20 frameworks, 12 patterns, 30 constraints, 150 questions. Within 6 months of operation: 200+ entries, quarterly review cycle.
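To make the grounding concrete, here is a sketch of how Stage 3 might shortlist vendors from the catalog. The entity fields follow the table above, but the Vendor shape and rating threshold are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Vendor:
    """Illustrative shape of a catalog Vendor entry."""
    name: str
    layers: set[int]        # which of the 10 reference-architecture layers
    ksa_presence: str       # "none" | "partner" | "direct"
    nmo_rating: int         # NMO opinion, 1-5
    region_availability: list[str] = field(default_factory=list)

def shortlist(catalog: list[Vendor], layer: int,
              min_rating: int = 3) -> list[Vendor]:
    """Only vendors that cover the layer and meet NMO's opinion threshold,
    best-rated first. The AI recommends from this list, never from memory."""
    hits = [v for v in catalog
            if layer in v.layers and v.nmo_rating >= min_rating]
    return sorted(hits, key=lambda v: -v.nmo_rating)
```

Constraints from Stage 2 (residency, vendor exclusions) would filter this list further before it reaches the prompt.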
Stage 2 asks 30+ questions about the client's stack. Most of those answers don't change between engagements with the same client. The Org Profile captures them once, persists them at the organisation level, and pre-populates every future Blueprint — both inside Phase 1 and across Phase 2.
Today, when NMO does a second engagement with a returning client, the consultant manually re-enters 80% of Stage 2 — same Snowflake, same Entra tenant, same SAMA-registered subsidiary, same procurement rules. The client wonders why they're answering the same questions twice. Phase 2 makes this unbearable: a self-serve user shouldn't face a 30-question infrastructure quiz on every Blueprint they generate.
The Org Profile is a structured, versioned document attached to a tenant (Phase 2) or to a client entity within NMO's tenant (Phase 1). It mirrors the Stage 2 taxonomy:
| Section | Examples | Update cadence |
|---|---|---|
| Identity & tenant | Legal entity, sector, regulator(s), HQ region, employee count, AR/EN preference | Annual or on change |
| Data platform | Warehouses, lakes, ETL tooling, BI tools, classification scheme | Per use case unless changed |
| Compute & cloud | Cloud accounts, regions, Kubernetes, GPU access, on-prem footprint | Quarterly |
| Identity & security | IDP, MFA coverage, ZTNA, secret stores, SOC, incident response shape | Annual |
| Compliance posture | PDPL applicable, SAMA registered, NCA-ECC tier, sector controls (MoH, MoE) | Annual or on regulatory change |
| Procurement | Approved vendor list, RFP framework, procurement vehicle, budget cycle | Annual |
| AI maturity | Models in production, MLOps state, AI champion, governance committee | Per use case |
| Constraints | Data residency mandates, vendor exclusions, on-prem-only systems, sovereign-cloud requirement | Annual or on change |
When a returning client starts a new engagement:
Org Profile is versioned in the same pattern as stage_artifacts: every meaningful update creates an immutable version row with author + timestamp. Stage 2 inventories link to the specific Profile version they were derived from — so re-reading a Blueprint a year later shows what the world looked like then, not now.
The Org Profile is the central artifact of Phase 2. A self-serve user fills it once at onboarding (with a guided wizard), then every Blueprint they generate inherits it. Profile review becomes an annual event — pushed by Mihwar with email reminders. Without the Profile concept, Phase 2 is unusable; with it, the second Blueprint a customer generates feels effortless.
Five containers on a single VPS, joined to a private Docker network. Egress to Anthropic strictly through the aiproxy. Boring, well-understood, swap-safe.
| Container | Role | Port | Notes |
|---|---|---|---|
| mihwar-web | Next.js 14 frontend (SSR + SSE) | 3000 internal | Renders the workspace UI & the Blueprint viewer |
| mihwar-api | FastAPI backend | 8000 internal | Auth, data, async-link issuing, all business logic |
| mihwar-worker | arq async worker | — | Stage 3 synthesis, embeddings, scheduled jobs |
| mihwar-aiproxy | LiteLLM gateway | 4000 internal | Single egress to Anthropic + Voyage; cost meter; cache |
| mihwar-redis | Redis 7 | 6380 internal | Queue, session store, rate-limit counters, response cache |
| mihwar-postgres | Postgres 16 + pgvector | 5435 internal | Persistent state. Nightly backup. RLS enforced. |
All containers run on a private Docker network mihwar_net. Only Caddy (managed by Coolify) is exposed to the public internet on 80/443. Postgres and Redis ports are never bound to host or to the public Apex network.
- Caddy terminates TLS and proxies to mihwar-web.
- mihwar-web calls mihwar-api with the user's session cookie.
- mihwar-api carries request context (contextvars), generates or accepts request_id.
- Long-running work is enqueued (request_id + user_id + tenant_id in payload) and returns 202 + job ID. Worker consumes, calls aiproxy, persists result.

The VPS firewall (UFW) restricts outbound traffic to:
- api.anthropic.com, api.voyageai.com — from aiproxy only.

Anything else is denied by default. This kills two attack classes at once: data exfiltration via a compromised container, and prompt-injection-driven outbound calls.
Every query is scoped by tenant_id, and the work surfaces are stateless except for Postgres + Redis. See Multi-Tenancy.

17 tables. Multi-tenant from day one. Versioned artifacts. Immutable audit log. Designed so Phase 2 doesn't require a rewrite.
| Table | Purpose |
|---|---|
| tenants | The org owning a Mihwar instance. V1 has exactly one row (NMO). Phase 2 has many. |
| users | People who can log in. Belongs to a tenant. |
| sessions | Login sessions. Cookie-bound, expiry-tracked, regenerated on login, IP-bound (soft). |
| service_principals (new) | Non-user callers: aiproxy, worker, async-form-submitter, cron. Each has its own credential type. |
| clients | The end-customer organisation NMO is consulting for. Belongs to a tenant. Owns Org Profiles. |
| org_profiles (new) | The persistent infrastructure profile of a client. Versioned. Field-level encrypted at rest for sensitive sections. |
| workspaces | One per client engagement. The unit of work. References the Org Profile version it started from. |
| workspace_members | Which users have access to which workspace, at what role. |
| stage_artifacts | The output of each stage, per workspace. Versioned: every signoff creates a new immutable row. |
| messages | Conversational log per stage — every AI exchange, every consultant entry. Linked to request_id. |
| catalog_entries | Vendors, models, frameworks, patterns, constraints. Tenant-scoped (Phase 2 supports tier system). |
| questions | Discovery question bank. Tenant-scoped, multilingual. |
| async_links | Per-stakeholder async form URLs. Time-limited, single-use, IP-logged. |
| async_responses | Answers submitted via async links. |
| blueprints | Compiled Blueprint exports. Stored as both structured JSON and rendered HTML, with manifest hash. |
| audit_log | Immutable. Every privileged action — signoffs, edits, exports, recommendations. |
| ai_calls | Every aiproxy call: input/output tokens, cache hits, model, cost, workspace, request_id, latency. |
All artifacts (stage_artifacts, blueprints, org_profiles) follow the same versioning pattern:
- `version` column auto-increments per parent.
- `signed_by` + `signed_at` on the row that becomes "current".
- `parent_version` link for diffing.
- No separate `*_draft` column or table; a row is only frozen on signoff.

```sql
-- every business table has tenant_id with NOT NULL
ALTER TABLE workspaces ADD COLUMN tenant_id UUID NOT NULL
    REFERENCES tenants(id);
CREATE INDEX idx_workspaces_tenant ON workspaces(tenant_id);

-- row-level security enforced at the DB layer
ALTER TABLE workspaces ENABLE ROW LEVEL SECURITY;
CREATE POLICY ws_tenant_isolation ON workspaces
    USING (tenant_id = current_setting('app.tenant_id')::uuid);

-- API sets the session-local var on every request
SET LOCAL app.tenant_id = '01HV8Z…';
```
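The per-request plumbing can be sketched as a small helper — a sketch only, with an illustrative name (`tenant_guard_sql`); production code would prefer `set_config()` with a bound parameter, but the transaction-scoped intent is the same:

```python
import uuid

def tenant_guard_sql(tenant_id: str) -> str:
    """Build the transaction-scoped SET LOCAL statement for one request.

    Parses the value as a UUID first, so nothing non-UUID can ever reach
    the SQL string. SET LOCAL reverts at COMMIT/ROLLBACK, so a pooled
    connection carries the right tenant for exactly one transaction.
    """
    tid = uuid.UUID(tenant_id)  # raises ValueError on malformed input
    return f"SET LOCAL app.tenant_id = '{tid}'"

# A FastAPI dependency would execute this at the start of every request's
# transaction, using the SERVER-resolved tenant, never a client value.
```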
- `(tenant_id)` first, then `(tenant_id, workspace_id)` compound.
- `messages(workspace_id, stage, created_at DESC)` — chat history retrieval.
- `stage_artifacts(workspace_id, stage, version DESC)` — load latest version fast.
- `audit_log(tenant_id, actor_id, created_at DESC)` — operator Logs page queries.
- `ai_calls(tenant_id, created_at DESC)`, plus `(tenant_id, feature, created_at DESC)` — cost dashboards.
- `async_links(token_hash)` unique — single lookup on form load.
- HNSW index on `catalog_entries.embedding` for RAG retrieval.

Pagination is keyset: `(created_at, id) < cursor`. OFFSET on a 5M-row audit_log will scan from the start every page; keyset stays O(log n). See Observability Logs page.

Migrations: Alembic. Every migration declares its indexes with CONCURRENTLY for tables expected to grow past 100k rows (messages, ai_calls, audit_log). Migrations are reviewed in PR before being applied — no auto-apply on deploy.
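Why keyset beats OFFSET can be sketched in pure Python — integer pairs stand in for `(created_at, id)`, and the filter mirrors `WHERE (created_at, id) < ($1, $2) ORDER BY created_at DESC, id DESC LIMIT $3`:

```python
def keyset_page(rows, cursor=None, page_size=3):
    """One page of (created_at, id) tuples, newest first, resuming
    strictly after `cursor` — the (created_at, id) of the last row
    already seen. Unlike OFFSET, cost does not grow with page depth."""
    ordered = sorted(rows, reverse=True)                 # created_at DESC, id DESC
    if cursor is not None:
        ordered = [r for r in ordered if r < cursor]     # (created_at, id) < cursor
    page = ordered[:page_size]
    next_cursor = page[-1] if page else None
    return page, next_cursor

rows = [(t, t) for t in range(1, 6)]   # five audit rows, oldest t=1
page1, cur = keyset_page(rows)         # newest three rows
page2, cur = keyset_page(rows, cur)    # resumes exactly where page1 ended
```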
Every choice annotated with why. The bias is toward boring, well-documented, swap-safe technology — the kind a future contributor will thank us for.
| Choice | Why |
|---|---|
| Python 3.12 | The AI ecosystem is Python-native. Anthropic SDK, vector DBs, embeddings — all Python-first. |
| FastAPI | Modern async framework, OpenAPI auto-gen, Pydantic-driven request validation. |
| SQLModel + SQLAlchemy 2.0 | One model serves database + API. No drift between schema and types. |
| Alembic | Mature schema migration. Boring on purpose. |
| asyncpg | Fastest Postgres driver in Python. |
| arq | Lightweight Redis-backed task queue. Idempotency keys, retries, DLQ. |
| Anthropic SDK (Python) | First-party. Streaming, tool use, prompt caching, extended thinking. |
| LiteLLM | The aiproxy. Single egress, model swap, cache, cost. |
| structlog | Structured JSON logs with auto context. See Observability. |
| OpenTelemetry SDK | Traces. Quiet in V1, ready for distributed in V3. |
| Choice | Why |
|---|---|
| Next.js 14 (App Router) | Server components reduce JS shipped to browser. Perfect for the Blueprint viewer. |
| TypeScript | Catches errors at build time. Required for a multi-month codebase. |
| Tailwind CSS | Utility-first. Lets the LLM (Claude Code) write consistent components without designing from scratch each time. |
| shadcn/ui (selected) | Composable, accessible. Lifted into the repo, not added as a dependency. |
| Zod | Shared validation between client and server. Pydantic models at the API end, Zod schemas at the form end, both generated from the same source. |
| SWR | Client-side caching for read endpoints. Optimistic updates for the workshop UI. |
| Choice | Why |
|---|---|
| Postgres 16 | RLS, JSONB, generated columns, extensions. The default for everything. |
| pgvector + HNSW | Catalog has <5k entries — pgvector handles it well at this scale. Phase 2 may justify a dedicated vector store; pgvector is the right starting point. |
| Voyage AI embeddings | Strong multilingual including Arabic. Paid API, kept behind aiproxy. |
| Redis 7 | Queue, session, rate-limit, response cache. One tool, four jobs. |
| Choice | Why |
|---|---|
| Hostinger KVM VPS | Predictable cost, root access, KSA-adjacent regions. Sufficient for V1 throughput. |
| Coolify | Self-hosted deployment platform. Git-driven deploys, rollbacks, env management. |
| Caddy (managed by Coolify) | Automatic TLS, HSTS, CSP injection. |
| Docker Compose | Six containers, one VPS. Kubernetes is overkill at this scale. |
| UFW | Outbound allowlist, default deny. |
| Cloudflare | DNS, DDoS shield, optional country-restriction rules. |
| Bahrain S3-compatible object storage | Encrypted backup target, separate region. |
Microservices are a horizontal-scaling pattern. Mihwar V1 has one tenant and a handful of users. Microservices would buy nothing and cost weeks of build time, more failure modes, harder local development. The shape of "six containers, one VPS" lets us ship the workflow in 6 weeks. Phase 2 may eventually warrant horizontal scaling — but that's a graduation move, not a starting point.
V1 has one tenant. Phase 2 may have hundreds. The data model and security boundaries are designed today so the V3 pivot is a deployment change, not a rewrite.
The cost of doing it now is one column on a few tables and one Postgres feature (RLS). The cost of doing it later is months of refactoring while engagements are paused.
| Level | Status | Where it lives | What it gives |
|---|---|---|---|
| 1 · Schema-aware | V1 | tenant_id column on every business table; index leads with it | Cheap query scoping; trivial to add |
| 2 · Row-level security | V1 | Postgres RLS policies use app.tenant_id session var | DB enforces tenant isolation even if app has a bug |
| 3 · Tenant context plumbing | V1 | FastAPI dep extracts tenant from session, sets SET LOCAL app.tenant_id per request | Application layer is incapable of cross-tenant queries by accident |
| 4 · Per-tenant DEK | V1 for sensitive fields | KMS-wrapped data encryption keys, one per tenant | Field-level encryption for Org Profile sensitive sections; tenant deletion = key deletion |
| 5 · Schema-per-tenant | Phase 2 enterprise tier | Dedicated schema per tenant, switched via search_path | Stronger isolation for regulated subscribers |
| 6 · DB-per-tenant | Phase 2 sovereign tier | Dedicated Postgres instance per tenant, deployed in-region | Hard residency, full backup separation |
Every CI run executes a "tenant fence test": create two tenants, two users, two workspaces. Authenticate as user-A. Try to read user-B's workspace, message, audit log, blueprint. Assert 404 (not 403 — 403 leaks the existence of the resource). The test fails the build if any cross-tenant read returns data.
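The fence test's core assertion can be shown in miniature — the in-memory lookup is a stand-in for the real HTTP test; the point is that tenant scoping happens before the existence check:

```python
# In-memory stand-in for an RLS-scoped query: rows are keyed by tenant.
WORKSPACES = {
    ("tenant-a", "ws-1"): {"name": "A's engagement"},
    ("tenant-b", "ws-2"): {"name": "B's engagement"},
}

def get_workspace(caller_tenant: str, ws_id: str):
    # The caller's tenant scopes the lookup first, so a cross-tenant id
    # and a nonexistent id are indistinguishable: both return 404.
    row = WORKSPACES.get((caller_tenant, ws_id))
    return (200, row) if row is not None else (404, None)

assert get_workspace("tenant-a", "ws-1") == (200, {"name": "A's engagement"})
assert get_workspace("tenant-a", "ws-2") == (404, None)   # not 403
assert get_workspace("tenant-a", "nope") == (404, None)   # same shape
```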
When a Phase 2 tenant cancels and confirms erasure:
- All `tenant_id`-scoped rows are hard-deleted in a single transaction.

Never put `tenant_id` into the JWT and trust it client-side. The client never names its own tenant. The server resolves session_id → user_id → tenant_id on every request and uses the server-resolved value. Trusting client-supplied tenant IDs is one of the top sources of multi-tenant data leaks.

Mihwar handles client data — sometimes sensitive infrastructure inventories, sometimes regulated information. Security is not a sprinkle on the end; it's a structural choice baked into the architecture.
Mihwar must defend against, in order of likelihood:
- Prompt injection via pasted client content — mitigated by wrapping untrusted text in a `<document>` block (never as system prompt), tool-use isolation, and output filtering before any tool invocation.
- Async-link guessing — tokens minted with `secrets.token_urlsafe(32)` — 256 bits of entropy. Stored as SHA-256 hashes in the DB.

Every API call identifies its caller before any work. Caller types are explicit and disjoint, each with its own credential mechanism:
| Actor type | Credential | Where it lives | Example |
|---|---|---|---|
| user | Session cookie (Argon2id-derived) | Browser, httpOnly | Ahmed running a Lab |
| service | Service token (random ≥256-bit) | Container env, never logged | Worker calling api |
| agent | Tool-use token, scoped per call | Issued per-job by API | aiproxy-driven tool call |
| webhook | HMAC-signed payload | Signing secret rotated quarterly | Async form submission |
| cron | Service token, restricted to cron paths | Coolify env | Nightly catalog re-embed |
Verified identity is attached to the request context (contextvars) and used for every downstream check. Permission is checked against the verified caller, never against client-supplied identifiers. Rate limit applies per verified identity, not IP alone.
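A minimal sketch of that request context with stdlib `contextvars` — variable and function names are illustrative; in Mihwar this would feed a structlog processor:

```python
import contextvars

# One ContextVar per identity field; isolated per async task/request.
request_id = contextvars.ContextVar("request_id", default=None)
tenant_id = contextvars.ContextVar("tenant_id", default=None)
actor_type = contextvars.ContextVar("actor_type", default="system")

def bind_caller(rid: str, tid: str, actor: str) -> None:
    """Called once per request, after the caller is verified."""
    request_id.set(rid)
    tenant_id.set(tid)
    actor_type.set(actor)

def log_context() -> dict:
    """What a log processor merges into every event line downstream."""
    return {
        "request_id": request_id.get(),
        "tenant_id": tenant_id.get(),
        "actor_type": actor_type.get(),
    }

bind_caller("01HV8Z…", "nmo-001", "user")
```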
- CSP: `default-src 'self'; img-src 'self' data:; style-src 'self' 'unsafe-inline' fonts.googleapis.com; font-src fonts.gstatic.com; connect-src 'self'` — no unsafe-eval, no unsafe-inline scripts. Nonces for inline if absolutely needed.
- Headers: `X-Content-Type-Options: nosniff`, `Referrer-Policy: strict-origin-when-cross-origin`, `X-Frame-Options: DENY`, `Permissions-Policy` tightly restricted.
- No `Server` / `X-Powered-By` headers.
- Sensitive values render masked as `•••• ••••` until explicitly revealed; reveal is audit-logged.

The Saudi Personal Data Protection Law applies whenever Mihwar processes personal data of KSA residents. Key obligations:
Untrusted input — pasted client docs, async form responses, third-party content — is treated as data, not instruction:
- Wrapped in delimited blocks (`<document index="1">…</document>`) in the prompt. The system prompt instructs the model to treat block content as data only.

Supply chain: dependencies are pinned via lockfiles (`uv.lock` or `poetry.lock` for Python; `pnpm-lock.yaml` for JS). CI scans: `pip-audit` + `pnpm audit` + `trivy fs`+`image` + `gitleaks`. HIGH or CRITICAL fails the build.

Production errors return: `{"error":"internal","reference":"ERR-7K2P9X"}`. The reference ID maps server-side to the full stack trace + request_id + tenant_id + user_id. Stack traces, paths, and schema info never reach the client. The Logs page lets the operator look up any reference ID in 5 seconds.
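Minting those opaque references is cheap — a sketch with an assumed helper name (`new_error_reference`); the 6-character length merely matches the example above:

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols

def new_error_reference() -> str:
    """Mint an opaque reference like ERR-7K2P9X for the client response.

    The full stack trace + request_id + tenant_id + user_id stay
    server-side, keyed by this reference; the client learns nothing else.
    """
    return "ERR-" + "".join(secrets.choice(ALPHABET) for _ in range(6))

ref = new_error_reference()
```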
We give clients $30k consulting on AI architecture security. We will not run a sloppy host. This page is the rigour we apply to Mihwar itself — what we lock down, how we patch, where the keys live, what the backups look like, who responds when something breaks.
- SSH: key-only, password auth disabled, no direct login as `root`.
- `fail2ban` with bans on auth failure, allowed-from-IP list for the operator's static IPs (with break-glass procedure documented).
- Internal services listen at most on `127.0.0.1` on the host, exposed to other containers via the Docker network only.

| Secret | Where it lives | Rotation |
|---|---|---|
| Anthropic API key | aiproxy env (Coolify-injected) | Quarterly + on suspicion |
| Voyage API key | aiproxy env | Quarterly |
| Postgres superuser | Coolify-managed, never in repo | Annual |
| App DB user | Coolify env, least-privilege | Annual |
| Session signing secret | Coolify env, ≥256-bit | Quarterly |
| HMAC webhook secret (async forms) | Coolify env | Quarterly |
| Blueprint signing key (Ed25519) | aiproxy env, archived versions kept for verification | Annual |
| KMS master key | External KMS (DigitalOcean / Hetzner / cloud-managed) | Annual + on suspicion |
| Per-tenant DEK | KMS-wrapped in DB; plaintext only in app memory at request time | On tenant request or annually |
| Backup encryption passphrase | Offline copy in a 1Password vault + sealed envelope physically held | Annual |
Never in source. A pre-commit hook (gitleaks) and CI scan reject any push that looks like a secret. The .env.example file is committed with placeholder values; the real .env is gitignored and lives only on the VPS via Coolify.
- Nightly `pg_dump`, encrypted client-side with the backup passphrase, shipped to off-region object storage. 30-day retention.
- Restore drill: `openssl enc -d` dry-run weekly via cron; alert on failure.
- Images pinned by digest — no `:latest`.
- No container runs `--privileged`.
- `trivy image` scans every image at build time; HIGH/CRITICAL fails the deploy.
- Branch protection on `main`: required reviews, required status checks (lint, types, tests, vuln scan, gitleaks).

A documented runbook in `/srv/mihwar/runbooks/incident.md` on the VPS itself (so it's available even if the website is down). Phases:
| Scenario | RPO | RTO | Procedure |
|---|---|---|---|
| VPS lost (provider outage) | ≤24h (last backup) | ≤4h | Provision new VPS via Terraform-recipes (kept in repo); Coolify recovery; restore latest backup; rotate all secrets; validate. |
| Database corruption | ≤1h (WAL) | ≤2h | PITR to last clean point; replay missed work from messages log + audit log; client notification if signoffs invalidated. |
| Anthropic API outage | — | — | aiproxy fails open with a "synthesis temporarily unavailable" UI message. Background queue retains jobs; resumes on recovery. |
| Key compromise | — | ≤30 min to rotate | Runbook drives rotation: Anthropic key, session signing, KMS keys (with re-wrap), HMAC, Ed25519 signing. |
| Single-passphrase compromise | — | ≤10 min | Force logout all sessions, rotate passphrase + TOTP, audit-log review for unexpected actions. |
- `/health` on api & web; Caddy probes them every 30s. Down for >2 min → Pushover alert to Ahmed.

Mihwar is built on Claude. Claude is the most expensive line item in the operating cost. The discipline that keeps a $25k Blueprint at 75% margin in Phase 1 — and makes a $1,200/mo subscription affordable in Phase 2 — is on this page.
Before any feature ships, we model: cost per call × calls per Blueprint × Blueprints per month. The targets:
| Stage | Calls / Blueprint | Avg cost / call | Cost contribution |
|---|---|---|---|
| Stage 1 · Lab (Sonnet · streaming) | ~30 turns | $0.05–$0.12 | ~$1.50–$3.50 |
| Stage 2 · Discovery filtering (Haiku) | ~5 calls | $0.01–$0.03 | ~$0.10 |
| Stage 2 · Async prompt drafting (Haiku) | ~10 calls | $0.01 | ~$0.10 |
| Stage 3 · Synthesis (Sonnet · ext. thinking) | 1–3 generations | $3–$8 | ~$5–$20 |
| Stage 4 · Playbook generation | 1–2 generations | $2–$5 | ~$3–$8 |
| Embedding catalog reads (Voyage) | ~50 lookups | $0.0005 | ~$0.03 |
| Total per Blueprint (target) | | | $10–$32 |
At a $25k Blueprint, AI cost is ≤0.13% of revenue. The discipline below is what keeps it there.
Two-tier routing throughout. Haiku handles: discovery question filtering, async prompt drafting, glossary expansion, classification (is this an inventory question or a use-case question?), single-turn lookups, simple tool selection. Sonnet handles: Lab interviewing, architecture synthesis, Playbook generation, anything where reasoning quality matters. Never use Opus unless an unsolved-for-Sonnet workload appears — and that becomes a separate budgeted decision.
The catalog snapshot, the house style guide, and the system prompt for each stage are cached via cache_control: ephemeral. Order: stable → variable. Verify on call #2+ that cache_read_input_tokens > 0; if zero, the prefix is drifting (timestamp, random tool order, mutable preamble). Hit rate target: ≥80% for repeated within a 5-min cache TTL.
```python
# Stable blocks first (house style, catalog, stage prompt), variable
# conversation last — the cacheable prefix must not drift between calls.
response = client.messages.create(
    model="claude-sonnet-4-6",
    system=[
        {"type": "text", "text": HOUSE_STYLE,
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": CATALOG_SNAPSHOT,  # ~50k tokens
         "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": STAGE3_PROMPT,
         "cache_control": {"type": "ephemeral"}},
    ],
    messages=conversation_history,
    max_tokens=4096,
)
# log: input_tokens, cache_read_tokens, cache_creation_tokens
```
The Batch API gives 50% off — used for: nightly catalog re-embedding, retroactive question generation when the catalog changes, eval runs against past Blueprints to spot regressions, scheduled summarisation of long workspace histories. Anything tolerating >seconds latency.
Set `max_tokens` on every call. Stage 1 turn: 1024. Stage 3 synthesis: 8192. Async draft: 256.

Hash `(model, prompt, tools, temperature)` → cache the response in Redis for hours when the prompt is non-personalised (catalog-only Q&A, glossary expansions). Semantic cache for near-duplicate queries (cosine ≥ 0.95) — not used in V1 but designed-for. Pre-compute predictable queries on a schedule (e.g. "expand each catalog entry into a one-paragraph summary" — done in Batch API, served from cache).
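The exact-match tier reduces to a deterministic key — a sketch; the `aicache:` prefix and field names are illustrative:

```python
import hashlib
import json

def response_cache_key(model: str, prompt: str, tools: list, temperature: float) -> str:
    """Deterministic Redis key from (model, prompt, tools, temperature).

    Canonical JSON (sorted keys, fixed separators) makes equal inputs
    hash equal regardless of dict ordering; any change to any input
    produces a different key, so stale responses are never served."""
    material = json.dumps(
        {"model": model, "prompt": prompt, "tools": tools, "temperature": temperature},
        sort_keys=True, separators=(",", ":"),
    )
    return "aicache:" + hashlib.sha256(material.encode("utf-8")).hexdigest()
```

The cached value would be written with a TTL of hours (e.g. Redis `SETEX`), per the policy above.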
Before any LLM call, the question: is there a regex / SQL aggregation / classical-ML / rules path that gets us to the answer 100×–10,000× cheaper? Examples in Mihwar:
Any tool-using flow caps max_iterations (default 10) AND max_tokens_per_session (default 30k). Tool selection is done by Haiku where possible. Tool results are cached. Independent tool calls are parallelised. The loop refuses on hitting a budget rather than spending unbounded.
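The budget rule can be sketched as a loop wrapper — illustrative names only; `step` stands in for one model/tool round-trip returning tokens spent and whether the task finished:

```python
def run_tool_loop(step, max_iterations=10, max_tokens_per_session=30_000):
    """Refuse — rather than spend unbounded — once either budget is hit."""
    spent = 0
    for i in range(max_iterations):
        tokens, done = step(i)            # one tool/model round-trip
        spent += tokens
        if spent > max_tokens_per_session:
            return {"status": "refused", "reason": "token_budget", "spent": spent}
        if done:
            return {"status": "done", "iterations": i + 1, "spent": spent}
    return {"status": "refused", "reason": "iteration_budget", "spent": spent}

# A session that finishes on its third tool call:
result = run_tool_loop(lambda i: (1_000, i == 2))
```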
Every aiproxy call writes a row to ai_calls:
```json
{
  "request_id": "01HV…",
  "tenant_id": "nmo-001",
  "user_id": "ahmed",
  "workspace_id": "ws-0042",
  "feature": "stage3.synthesis",
  "model": "claude-sonnet-4-6",
  "input_tokens": 52800,
  "cache_read_tokens": 50100,
  "cache_creation_tokens": 0,
  "output_tokens": 4200,
  "latency_ms": 38400,
  "cost_usd": 0.279,
  "cache_hit_rate": 0.949
}
```
The operator Logs page (see Observability) includes a Cost view: per-feature bar chart for the last 30 days, per-tenant ranking, anomaly highlights (a tenant burning 10× their normal rate). Drilling in shows the calls behind any bar.
Phase 2 changes the math: many tenants, lower revenue per Blueprint, more risk of pathological usage. Discipline tightens:
Every product Mihwar produces ships with an operator Logs page on day one. We hold ourselves to the same standard: when a client says "something happened on Tuesday at 3pm", an operator can reconstruct it in 60 seconds.
Logs are not for grep'ing on the day of an incident. Logs are the system's memory. Mihwar's logs let an operator at 2am, six months from now, answer: which user did what, with what data, when, with what result, and what did the system do downstream?
Every line of every service is structured JSON, one event per line, with this envelope:
```json
{
  "timestamp": "2026-05-07T14:32:18.420Z",
  "level": "info",
  "service": "mihwar-api",
  "env": "prod",
  "event": "stage.signoff",
  "message": "Stage 2 signed off",
  "request_id": "01HV8Z9K3J5XPQ8WMY4N6T2RES",
  "tenant_id": "nmo-001",
  "user_id": "ahmed",
  "actor_type": "user",
  "session_id_hash": "sha256:7c4b…",
  "ip": "91.193.x.x",
  "user_agent": "Mozilla/5.0 …",
  "workspace_id": "ws-0042",
  "stage": 2,
  "version": 5,
  "duration_ms": 14
}
```
- `user_id` — stable internal ID, never email. Explicit null with reason for unauthenticated paths.
- `tenant_id` — required on every request/job line. No exception.
- `actor_type` — `user | service | agent | webhook | cron | system`.
- `request_id` — generated at the edge (Caddy via `X-Request-Id` if present, else minted by api). Propagates to every downstream call.
- `session_id_hash` — for grouping a user's actions in a session without exposing the raw token.

| Event class | Examples | Reason |
|---|---|---|
| Auth events | auth.login.success, auth.login.failure, auth.logout, auth.mfa.enrolled, auth.token.refresh, auth.lockout | Forensic reconstruction of who-was-where. |
| Sensitive reads | org_profile.read with field list, blueprint.export | PDPL audit trail for personal/regulated data access. |
| Writes | Stage signoffs, profile updates with compact diff of changed fields | Reconstruct what changed when a client disputes a recommendation. |
| External calls | aiproxy.call with model, status, latency, retries, cost | Cost forensics; vendor incident correlation. |
| Jobs | job.enqueued, job.started, job.succeeded, job.failed, job.dead_lettered | "Why did Stage 3 never finish?" answered in 5s. |
| Errors | Stack trace + reference ID + tenant + user + request_id | Map a client's "ERR-7K2P" reference back to root cause. |
| Async link events | async.issued, async.opened, async.submitted, async.expired | Forensics on form-based data submissions. |
Two-layer defence. Field-name blocklist (password, token, secret, cookie, authorization, api_key, plus tenant-specific entries) recursively replaces values with ***REDACTED***. Value-pattern scrubbing catches credit-card / JWT / AWS-key shapes regardless of field name. Unit tests assert that a known sensitive payload never reaches the sink intact — these tests fail the build.
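A minimal sketch of the two layers — the blocklist names come from the list above; the JWT pattern is one example of value-shape scrubbing, not the full production set:

```python
import re

BLOCKLIST = {"password", "token", "secret", "cookie", "authorization", "api_key"}
JWT_SHAPE = re.compile(r"eyJ[\w-]+\.[\w-]+\.[\w-]+")  # three base64url segments

def redact(obj):
    """Layer 1: field-name blocklist, applied recursively.
    Layer 2: value-pattern scrubbing, so a secret hiding under an
    innocent key still never reaches the log sink."""
    if isinstance(obj, dict):
        return {
            k: "***REDACTED***" if k.lower() in BLOCKLIST else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    if isinstance(obj, str) and JWT_SHAPE.search(obj):
        return "***REDACTED***"
    return obj
```

Wired in as a structlog processor, every event dict would pass through `redact` before serialisation; the build-failing unit tests assert exactly this behaviour.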
Every Mihwar product has a Logs page from V1 — including Mihwar itself. Operator UI features:
- Filters: `user_id` / `tenant_id` / `request_id` / event class / level / service / time range.
- "Trace request_id" — joins every line across services into a chronological view.
- "Trace user_id in last N hours" — for forensic and support workflows.
- Paste an `ERR-…` code, see the stack + context.
- Permissions: `logs:read` for general; `logs:read:sensitive` for sensitive-read events; `logs:export` for CSV export with audit-log entry per export.

| Class | Hot retention | Cold retention |
|---|---|---|
| App logs (info) | 30 days | Compressed off-host for 90 days, then deleted |
| Errors / warns | 90 days | Off-host for 1 year |
| Audit log (auth, permissions, sensitive reads, admin) | 1 year hot | Indefinite cold storage with integrity hashing |
| ai_calls | 90 days raw | Aggregated (per-feature daily) kept indefinitely |
| Debug logs | ≤7 days, off in prod by default | — |
OpenTelemetry SDK is wired in V1 but quiet. Spans are created for: HTTP request, DB query, aiproxy call, queue job. Exporter is configured but pointed at a dev sink. When Phase 2 demands distributed tracing (e.g. dedicated DB tier triggers cross-host calls), turning on Tempo / Honeycomb / cloud trace is a config change, not a code change.
From empty Hostinger directory to first signed-off engagement Blueprint in six calendar weeks. Built primarily by Claude Code with Ahmed reviewing and steering.
Mihwar is built in vertical slices: each week ends with something demoable, not a half-finished horizontal layer. By Week 2 there's a working login and a working Lab. By Week 4 the full Stage 1 → Stage 3 → Blueprint export path works. Weeks 5 and 6 are polish, AR localisation, and dogfooding on a real client engagement.
- DNS live: `mihwar.nmopartners.com` on Hostinger.
- Private network `mihwar_net` with Docker Compose: postgres, redis, aiproxy, api skeleton, web skeleton, worker.

Demo at end of week: Ahmed logs in to an empty workspace UI on a real domain, healthcheck green, all containers running, backups firing nightly.
- Prompt caching verified (`cache_read_input_tokens > 0` by call 2).

Demo: Ahmed runs a full Lab session with a real client, produces a 1-pager.
- Async forms at `/async/{token}`, response capture.

Demo: a Stage 2 inventory completed across two live answers + three async-form submissions, with Stage 3 cleanly unlocked.
Demo: empty workspace → Stage 1 → Stage 2 → Stage 3 → click "Compile Blueprint" → bilingual HTML opens. Logs page shows the entire chain by request_id.
Demo: a full Tier 2 engagement walked end-to-end with all five stages, bilingual export, signed manifest verifiable.
Demo: First $25k Blueprint shipped. Mihwar is real.
Five sequential prompts covering the build. Each prompt is self-contained and is run inside a single Claude Code session. Run them in order. After each one, review the diff, commit, and proceed.
Before running any prompt:
- DNS: `*.nmopartners.com` resolves to the VPS.
- Secrets staged in Coolify env: `ANTHROPIC_API_KEY`, `VOYAGE_API_KEY`, `ADMIN_PASSPHRASE` (Argon2id-hashed at boot), `SESSION_SIGNING_SECRET` (≥256-bit), `HMAC_WEBHOOK_SECRET`, `BLUEPRINT_SIGNING_KEY` (Ed25519 private), `KMS_MASTER_KEY_ID`.
- Working directory `/srv/mihwar/`.
- Repo `Arcahmed93/mihwar` (private), with branch protection on `main`.
- CI: `gitleaks` + `ruff` + `mypy` + `biome`.

Scaffold the project, set up the database schema, implement single-passphrase auth, get Mihwar deployable on Hostinger via Coolify.
You are building Mihwar, a private consulting cockpit for AI use-case
discovery. This prompt scaffolds the project, sets up the database schema,
implements single-passphrase auth, and gets Mihwar deployable on Hostinger
via Coolify.
# DEPLOYMENT TARGET
- Hostinger KVM VPS, working directory /srv/mihwar/
- Subdomain mihwar.nmopartners.com (DNS already resolves to the VPS)
- Reverse proxy + TLS managed by Caddy via Coolify
- Containers on a NEW Docker network called mihwar_net (do NOT join apex_net)
# SIX CONTAINERS
1. mihwar-postgres — Postgres 16 + pgvector, volume mihwar_pg_data,
port 5435 internal only
2. mihwar-redis — Redis 7, port 6380 internal only
3. mihwar-aiproxy — LiteLLM proxy, routes claude-* via Anthropic and
voyage-* via Voyage. Port 4000 internal only
4. mihwar-api — Python 3.12 / FastAPI / SQLModel / asyncpg / arq client,
port 8000 internal only
5. mihwar-worker — same image as api, runs arq worker
6. mihwar-web — Next.js 14 (App Router) / TypeScript / Tailwind / shadcn,
SSR, port 3000 internal only
# FOUNDATIONAL RULES
- Pin every image by digest. No :latest.
- Containers run as non-root. Read-only root FS where possible.
- App DB user is least-privilege; migrations run as a separate role.
- gitleaks pre-commit hook in repo. CI runs trivy + pip-audit + pnpm audit.
- Structured JSON logging from line one (structlog in Python; pino in Node).
- request_id middleware on api: accept X-Request-Id, else mint ULID.
- Caller-identity context: contextvars carrying user_id, tenant_id,
actor_type, request_id. Every log line emits these via a structlog processor.
# SCHEMA (17 tables — see masterplan p-data)
Generate the SQLModel definitions and an initial Alembic migration.
Every business table has tenant_id NOT NULL with an index leading on it.
Enable Postgres RLS on every business table; policies use
current_setting('app.tenant_id')::uuid.
# AUTH
Single-passphrase login with Argon2id (memory_cost=65536, time_cost=3).
TOTP enrolment endpoint (issues secret + QR via otpauth URL, stored encrypted
under the per-tenant DEK). Session cookies: httpOnly, Secure, SameSite=Strict,
8h sliding. Sessions stored as SHA-256 hash of token.
Account lockout: 5 failures in 15min → 15min cooldown, exponential.
# CALLER IDENTITY
service_principals table seeded with:
- svc:worker (token in env, used for worker→api calls)
- svc:aiproxy (token in env, used for api→aiproxy calls)
- svc:cron (used for nightly jobs)
- webhook:async-form (HMAC verifier for /async/* submissions)
# DELIVERABLES
- /srv/mihwar/docker-compose.yml
- /srv/mihwar/api/ (FastAPI app, models, migrations, auth, identity)
- /srv/mihwar/web/ (Next.js scaffold, login page, theme toggle)
- /srv/mihwar/aiproxy/ (LiteLLM config, env)
- /srv/mihwar/worker/ (arq worker entrypoint)
- /srv/mihwar/.env.example with placeholders
- /srv/mihwar/Caddyfile (TLS, HSTS, CSP, security headers)
- /srv/mihwar/runbooks/ (incident.md, backup-restore.md, key-rotation.md)
- README.md with one-command bootstrap
- A green CI run, an opening commit, and a green Coolify deploy.
# DONE WHEN
Visiting https://mihwar.nmopartners.com presents the login page,
correct passphrase + TOTP yields an empty workspace UI, healthcheck
endpoint returns 200, structured JSON logs flow with request_id +
tenant_id + user_id on every authenticated line, and a sample
async-form GET returns a generic 404 for an unknown token.
Build Stage 1 of the Mihwar workflow: the Ideation Lab.
# UI
- Workspace shell with persistent sidebar (workspace list,
current workspace, stage navigator).
- Stage 1 panel: three sub-panels — chat (left), 1-pager artifact (right),
signoff bar (bottom).
- Theme toggle, EN/AR toggle.
# CHAT
- Streaming via SSE from /api/v1/workspaces/{ws}/stages/1/messages.
- Each turn enqueues a synchronous (not background) Sonnet call via aiproxy
with prompt-caching on the system prompt + house style.
- Verify cache_read_tokens > 0 by turn 2; log it.
- Cap max_tokens at 1024 per turn.
- Persist every turn in messages with request_id, user_id, tenant_id.
# SYSTEM PROMPT (cached)
"You are a Socratic AI use-case interviewer for NMO Partners… [full prompt
in /srv/mihwar/api/prompts/stage1.md]"
# ARTIFACT (1-PAGER)
- Live-rendered structured object: USE_CASE, PAIN, USER, TODAY, TARGET,
BLAST, INPUTS, DECISION_OWNER, OUT_OF_SCOPE.
- Updated incrementally as the conversation progresses (the AI emits
structured updates which the renderer applies).
- Versioned on signoff. signoff button calls /api/v1/.../sign with a
confirmation modal.
# CATALOG
Seed catalog_entries with 10 sample entries (provided in seed.json).
Stage 1 doesn't query the catalog yet; it's used in Stage 3.
# DONE WHEN
Ahmed runs a 30-turn Lab against a sample use case ("AI for our customer
voice line") and ends with a frozen v1 1-pager. Logs page shows every turn
joined by request_id. Cost view shows the lab session cost broken out.
Build Stage 2 of the Mihwar workflow plus the Org Profile foundation.
# ORG PROFILE
- New table org_profiles, versioned, tenant-scoped, linked to clients.
- Field-level encryption (AES-256-GCM) for sensitive sections using a
per-tenant DEK. DEK created on tenant creation, KMS-wrapped, stored as
ciphertext in tenants.dek_wrapped. App decrypts in-memory per request.
- Settings UI under /workspace/{ws}/profile to edit; versioned on save.
- Display masking by default; reveal explicit, audit-logged.
# STAGE 2
- Discovery taxonomy seeded in questions table (CSV + script).
- "Filter questions" Haiku call: given the Stage 1 1-pager + Org Profile
baseline, return the ~30 questions that need fresh answers.
- Stage 2 panel shows question list grouped by domain, each with:
status (unasked / answered / sent-async / awaiting / blocking),
inline answer, "send as async" button.
- /async/{token} endpoint:
- validates token (single-use, time-limited, tenant-scoped)
- renders a clean form with the one or two questions
- HMAC-signs submissions
- rate-limited
- generic 404 on invalid/expired
- Readiness meter computed server-side; Stage 3 unlock blocked until
blocking-set is empty (or consultant explicitly overrides with reason
captured in audit_log).
# DONE WHEN
A Stage 2 round-trip works end-to-end: 5 questions answered live,
3 async links sent and submitted, readiness reaches 100%, Stage 3
unlocks. Org Profile updated from Stage 2 deltas with confirmation.
Logs page shows: async.issued, async.opened, async.submitted events
per token. No sensitive value appears in any log line.
Build Stage 3 synthesis, the Blueprint compiler, and the operator Logs page.
# STAGE 3
- arq job stage3.synthesise:
inputs = stage1_artifact, stage2_inventory, org_profile, catalog_rag(top_K=12)
flow = aiproxy → claude-sonnet-4-6 with extended thinking, prompt-cached
catalog snapshot. max_tokens=8192. Streams progress to the api which
forwards via SSE.
- Output: structured JSON manifest with components, data-flow nodes,
trade-offs, alternatives, open-questions, compliance-overlay.
- Auto-render SVG layered diagram + data-flow diagram from manifest.
- stage_artifacts.v1 stored on completion. Re-runs create v2, etc.
# BLUEPRINT
- /api/v1/workspaces/{ws}/blueprint/compile job:
takes latest stage_artifacts, renders to a single HTML file using
/srv/mihwar/web/templates/blueprint.html (server-side render with
inlined CSS and inlined SVG). Manifest signed with Ed25519.
- Stored in blueprints table; downloadable + viewable in-browser.
# LOGS PAGE
At /admin/logs (gated by logs:read permission):
- Filters: time range, user_id, tenant_id, request_id, event class, level.
- "Trace request_id": joins all events with that request_id from api +
worker + aiproxy logs into a chronological timeline.
- "Trace user_id": last N hours of all events.
- "Trace error reference": paste ERR-… → the full stack trace + context.
- Cost view: ai_calls aggregated per feature / tenant / user / day.
- Export to CSV (capped 10k rows; logs:export permission; audit-logged).
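"Trace request_id" is essentially a k-way chronological merge of per-service event streams. A minimal sketch, assuming each stream is already sorted by timestamp (the field names are illustrative):

```python
from heapq import merge

def trace_request(request_id: str, *sources: list[dict]) -> list[dict]:
    """Chronological timeline for one request_id across the api, worker and
    aiproxy event streams. Each source is assumed sorted by 'ts' already,
    so heapq.merge can interleave them without a full re-sort."""
    matching = (
        [e for e in src if e.get("request_id") == request_id] for src in sources
    )
    return list(merge(*matching, key=lambda e: e["ts"]))
```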
# DONE WHEN
A workspace progresses cleanly from empty → 1-pager → inventory → synthesis
→ Blueprint HTML download → walkthrough mode. The Logs page reconstructs
every step. Stage 3 synthesis costs < $10 per run with cache hit rate
> 80%.
Wrap up Mihwar V1: Stage 4 (Build Playbook), full Arabic localisation,
Blueprint manifest signing, and dogfooding hooks.
# STAGE 4
- Five outputs: 6-week build plan, risk register, vendor short-list,
reference repos pointer (Tier-3 only), RFP spec (optional).
- Tier flag on workspace controls which outputs are produced.
- Each output is editable in the UI before signoff.
# AR LOCALISATION
- Translate the UI shell using the Mihwar AR pack (provided).
- RTL layout for AR mode (logical CSS properties; no physical
margin-left/right).
- Blueprint render in AR uses Amiri for body, Plus Jakarta for
numerals/code; the manifest carries language tags.
# MANIFEST SIGNING
- On Blueprint compile, the manifest JSON is canonicalised
(RFC 8785 JCS), hashed (SHA-256), signed with Ed25519
using BLUEPRINT_SIGNING_KEY. Public key embedded for offline
verification.
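Full RFC 8785 JCS and Ed25519 need dedicated libraries in production; the canonicalise-then-hash half can be approximated with the stdlib to show why canonicalisation matters — the digest must be stable regardless of key order:

```python
import hashlib
import json

def manifest_digest(manifest: dict) -> str:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8.
    (RFC 8785 additionally pins number serialisation; use a real JCS
    library in production.) The resulting SHA-256 digest is what gets
    signed with the Ed25519 BLUEPRINT_SIGNING_KEY."""
    canonical = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Without canonicalisation, two semantically identical manifests serialised with different key orders would produce different signatures, and offline verification against the embedded public key would fail spuriously.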
# DOGFOOD HOOKS
- /admin/feedback inline form for Ahmed to log paper-cuts during
  the first real engagement; entries auto-tagged with the workspace
  and the request_id active at the moment of submission.
# DONE WHEN
A full Tier-2 engagement runs end-to-end, EN and AR Blueprints both
render correctly, manifest verifies via the embedded public key, and
the engagement Blueprint is the first $25k delivery.
Mihwar runs as a single-operator service with a small team layered in over time. This is the day-to-day playbook: deploys, on-call, change windows, customer-facing incidents.
- Migrations: CREATE INDEX CONCURRENTLY + chunked backfills. Never apply one in the middle of a Stage 3 synthesis.
- Deploys: tagged vYYYY.MM.DD-HHmm-sha. Coolify retains the last 5 deploys for one-click rollback.
- On-call: Ahmed is on call 24/7 in V1. The job: respond to alerts within 2 hours during the working day, 4 hours overnight. Pager is Pushover on a personal device.
| Symptom | First 5 minutes | Resolution path |
|---|---|---|
| 500s spiking | Check Logs page → top error references for the spike window. Identify offending endpoint. | Rollback if regression; hotfix if data shape; communicate if external dependency. |
| aiproxy cost spike | Logs cost view → which feature, which tenant, which user. | If runaway loop: kill jobs, lower the max_iterations cap. If catalog cache miss: fix cache_control. If legitimate: confirm with consultant. |
| Worker DLQ filling | Logs page → DLQ events → root cause for the type of job failing. | Fix and replay. If transient (Anthropic 5xx), wait + retry from DLQ. |
| Backup didn't fire | Cron status, disk, backup target reachability. | Trigger manually. If recurring, ticket runbook fix. |
| Suspected key leak | Rotate the suspected key in Coolify (single command). Force logout all sessions. | Audit log review for the exposure window. Communicate per DR table. |
| Async link mis-issued (wrong recipient) | Revoke the token via POST /admin/async/revoke. Confirm not consumed. | Re-issue to correct recipient. Audit trail captured. |
| Customer "I can't see my Blueprint" | Logs by user_id → most recent compile event → status. | If failed: reproduce in staging; if version mismatch: re-compile. |
- Every change to main goes through a PR with at least one reviewer (Ahmed reviews Claude Code's PRs; another consultant reviews Ahmed's, when one exists).
- For active engagement clients, communication is direct (Ahmed → CTO). For Phase 2 customers, a status page (status.mihwar.app) is published from V1's Day 1 even though it has nothing on it; this normalises the surface for when it matters.
When NMO hires consultant #2, the handoff:
If V1 succeeds, Mihwar evolves from a consultant's cockpit into a self-serve platform clients run themselves. This section sketches what that looks like — the product, the billing, the go-to-market — so V1 architecture stays compatible.
Three triggers, any one of which validates the pivot:
Until then, V1 stays disciplined. Phase 2 too early kills the consulting margin.
| Concept | Phase 1 | Phase 2 |
|---|---|---|
| Tenants | 1 (NMO) | Many (each subscriber org) |
| Auth | Single passphrase + TOTP | SSO (OIDC, SAML), invite-only first, public sign-up later |
| Billing | Engagement invoices (manual) | Stripe, per-seat or per-Blueprint, with metered overage |
| Catalog | NMO's, used internally | NMO premium tier (read-only, paid) + customer-private tier (writeable) |
| Templates | Hard-coded NMO branding | Per-tenant theming, custom logos, optional white-label |
| Consultant role | Drives every engagement | Optional 2-hour expert review (Team tier and above); otherwise self-serve |
| Operator | Ahmed | Customer admin per tenant + NMO meta-admin |
| Support | Email + WhatsApp | In-app chat, knowledge base, ticketed |
Estimated effort to build Phase 2 from scratch, once the triggers fire:
Total: ~13 weeks (≈3 months) for Phase 2 v1, assuming V1 architecture has held the line. Funded by ~5 V1 engagements at $25k.
What changes about the experience when a client — not a senior NMO consultant — drives the workflow. The engine stays the same; the surfaces around it must compensate for the absence of the consultant in the room.
In Phase 1, the consultant interprets, refines, pushes back. In Phase 2, the user is alone with the AI. Without compensation, three failure modes appear:
Each prompt in Stage 1 ships with three affordances:
The Profile is the engine of the self-serve experience. A user who's been on Mihwar 6 months has a Profile that pre-fills 70%+ of every Stage 2 they touch. The third Blueprint they make takes a third of the time of the first.
The gate adapts when the consultant isn't there. Instead of "go ask your DBA", Mihwar offers:
For Tier "Team" and above, the user can pay for a 2-hour NMO expert review of their Blueprint draft before signoff. The reviewer reads the Blueprint, leaves margin notes, has a 30-minute call with the user, signs off the result with NMO's seal. This is the bridge between self-serve and consulting — and a high-margin upsell.
| Role | Permissions |
|---|---|
| Owner | Workspace admin, billing, can invite, can delete. |
| Editor | Run stages, edit artifacts, request signoff. |
| Reviewer | Read-only access plus comment on artifacts. |
| Contributor | Limited access — fill assigned Stage 2 slices, no Stage 3 access. |
| NMO Reviewer (paid) | External NMO consultant invited for the paid expert review (Team tier and above). |
Phase 2 plans, what each includes, and the metering that makes them work without runaway cost.
| Plan | Price | Audience | Includes |
|---|---|---|---|
| Starter | $1,200/mo or $9,600/yr | Single AI champion at a mid-market enterprise | 1 workspace · 3 Blueprints/yr · premium catalog read-only · EN/AR · email support · standard branding |
| Team | $3,500/mo | 5-seat AI office | 5 seats · unlimited Blueprints (within fair-use cost cap) · custom branding · SSO · 1 expert review/qtr included · priority support |
| Consultancy | $25k/yr + per-Blueprint | Boutique AI shops licensing Mihwar for their clients | White-label · multi-client workspaces · customer-private catalog tier · NMO catalog as premium · API access · per-Blueprint metering ($150 each beyond 50/yr included) |
| Enterprise | Custom (from $80k/yr) | Large org with strict residency / SSO / audit needs | Dedicated tenant in-region · BYO IDP · audit export · contractual residency · SLA · dedicated support |
Two meters, both implemented atomically in Redis:
cost_usd: soft warn at 80% of the plan-implied budget; hard refuse at 110% (the 10% headroom keeps a clean compile from failing over a single dollar).

| Plan | Expected Blueprints/yr | AI cost | Other cost | Margin at sticker |
|---|---|---|---|---|
| Starter | 3 | ~$60–$100 | ~$120 (infra share, support) | ~97% |
| Team | ~20 | ~$400–$700 | ~$1,800 (incl. 1 expert review) | ~94% |
| Consultancy | 50–150 | ~$2,500–$5,000 | ~$3,000 (white-label support) | ≈75–80% |
| Enterprise | varies | varies | dedicated infra share + named CSM | ≈60–70% |
Margins look generous — they assume V1's AI economics discipline survives. Without prompt caching, batch usage, two-tier model selection and tenant cost caps, those numbers degrade fast. See AI Economics.
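The soft-warn/hard-refuse thresholds could be enforced as below. The `check_budget`/`Verdict` names are illustrative; the production version does the increment and the comparison atomically in Redis (e.g. INCRBYFLOAT followed by the same check in a Lua script), so concurrent requests cannot race past the cap:

```python
from enum import Enum

class Verdict(Enum):
    OK = "ok"
    WARN = "warn"      # ≥80% of plan-implied budget: notify, keep serving
    REFUSE = "refuse"  # ≥110%: hard stop (10% headroom for in-flight work)

def check_budget(spent_usd: float, monthly_budget_usd: float) -> Verdict:
    """Threshold logic only; atomicity lives in Redis in the real system."""
    ratio = spent_usd / monthly_budget_usd
    if ratio >= 1.10:
        return Verdict.REFUSE
    if ratio >= 0.80:
        return Verdict.WARN
    return Verdict.OK
```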
How Mihwar opens to the public when the trigger fires. A 12-week launch plan from "trigger met" to "first 20 paying tenants."
| Channel | Phase 2 fit | Effort | CAC ceiling |
|---|---|---|---|
| NMO existing pipeline | Highest — warm, in-market | Low | ~10% of ARR |
| LinkedIn thought leadership (Ahmed) | Direct — KSA AI champions follow | Med | ~15% of ARR |
| SDAIA / Vision-2030 conferences | Strong for government tier | High | ~25% of ARR |
| Boutique AI shops (Consultancy plan) | Two-sided lever — they bring their clients | Med | ~30% of ARR |
| Public docs & SEO | Long compounding — start day 1 | Med | ~5% of ARR |
| Paid ads | Avoid in V1 — low intent | — | — |
| Metric | Month 3 target | Month 12 target |
|---|---|---|
| Paying tenants | 20 | 120 |
| ARR | $200k | $1.5M |
| Activation rate (sign-up → first Blueprint) | 40% | 60% |
| Time-to-first-Blueprint | ≤14 days | ≤7 days |
| Net revenue retention | — | ≥110% |
| Avg cost-per-Blueprint | ≤$30 | ≤$25 |
| NPS (customer survey) | ≥40 | ≥55 |
| Phase 1 → Phase 2 cannibalisation | <10% engagement loss | 0% (Phase 1 is now upsell) |
Phase 2 isn't a replacement for Phase 1 — it's a complement. Mihwar's full motion at year-2 looks like:
What "Mihwar is working" actually means, measured in numbers Ahmed can read off a dashboard. Lead measures predict business outcomes; lag measures confirm them.
Mihwar's metrics fall into three tiers. Tier 1 is the only one Ahmed checks daily. Tier 2 is reviewed weekly. Tier 3 is the quarterly retrospective.
| Metric | V1 target | Why it matters |
|---|---|---|
| Engagements signed per quarter | 5+ by Q3 | The revenue line. Below 3 means Mihwar isn't shifting deals. |
| Blueprint price realised (avg) | $20k+ | Below this, NMO is competing on price not on quality. |
| Conversion: Blueprint → Build | ≥30% | The most important number. Mihwar's whole thesis. Below 20% means Blueprints aren't selling next-stage work. |
| Margin per Blueprint | ≥65% | Engagement P&L test. Below this, the tool isn't compressing time enough. |
| NPS from Blueprint recipients | ≥50 | Survey delivered 30 days after Blueprint signoff. Drives word-of-mouth referrals. |
| Metric | V1 target | Why |
|---|---|---|
| Time-to-Blueprint | ≤7 working days | The core promise. Engagements that overrun erode the value proposition. |
| Stage 2 → Stage 3 cycle time | ≤4 days | Discovery is the bottleneck Mihwar exists to fix. Trend down. |
| Catalog growth rate | +5 entries/month | Compounding IP. Stagnant catalog means Mihwar isn't learning. |
| aiproxy cost per Blueprint | ≤$30 (P50), ≤$60 (P95) | Margin discipline; verifies prompt caching, two-tier model, batch. |
| aiproxy cache hit rate | ≥80% | Direct verification of AI Economics discipline. |
| Stage signoff rework rate | <15% | Stages reopened after signoff. Above 15% means the AI's output isn't trustworthy. |
| Async response rate | ≥70% within 7 days | If async forms aren't being filled, Stage 2 stalls. |
| Uptime (rolling 30 days) | ≥99.5% | Engagements get cancelled by 12-hour outages. |
| Backup success rate | 100% | Anything below 100% is a Sev-2 incident. |
| 5xx rate | <0.5% | Above this, the Logs page becomes the operator's daily destination. |
What could go wrong, ranked by likelihood × impact, with concrete mitigations. The discipline of writing risks down is half the mitigation.
The CTO wants the architecture deck now and is impatient with Stage 2 questions. This is the #1 expected friction point.
Mitigation: Sales script up-front: "We do discovery before architecture. That's not negotiable. It's why our deliverable doesn't fall apart in your procurement committee." If a client truly won't do Stage 2, NMO walks. The gate is the product.
Stage 2 captures what the client believes their environment looks like. Reality occasionally diverges. Architecture lands, build starts, surprise.
Mitigation: Stage 2 captures source-of-truth pointers (DBA name, dashboard URL) for every claim. Every architecture component cites its inventory source. Trade-offs section explicitly flags assumptions. Build phase starts with a 1-day "validate Stage 2" sprint.
Below 20%, the whole productisation thesis weakens.
Mitigation: 30-day post-Blueprint follow-up call mandatory. Common conversion blockers tracked, fed back into Stage 4 templates and the catalog. NPS survey identifies dissatisfaction before it becomes lost revenue.
A bug in a query lets one tenant see another's data. In Phase 1 this is one bug; in Phase 2 it ends the company.
Mitigation: RLS at DB layer + tenant_id in every app query (defence in depth). Cross-tenant fence test in CI on every commit. Schema-per-tenant for Enterprise tier. Manual audit of every new query that joins multiple workspace_id values. See Multi-Tenancy.
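The shape of the CI fence test, sketched against an in-memory stand-in for the real database (the row fixtures and `query_blueprints` helper are illustrative; the real test runs the same assertions over Postgres with RLS enabled):

```python
ROWS = [
    {"id": 1, "tenant_id": "nmo", "title": "NMO blueprint"},
    {"id": 2, "tenant_id": "acme", "title": "Acme blueprint"},
]

def query_blueprints(tenant_id: str) -> list[dict]:
    """Every app-layer query filters by tenant_id; RLS repeats the same
    predicate at the DB layer (defence in depth)."""
    return [r for r in ROWS if r["tenant_id"] == tenant_id]

def test_cross_tenant_fence() -> None:
    """CI fence: no tenant ever sees another tenant's rows."""
    for tenant in ("nmo", "acme"):
        for row in query_blueprints(tenant):
            assert row["tenant_id"] == tenant
```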
Stage 1 is mid-conversation; Stage 3 is mid-synthesis; Anthropic returns 5xx for an hour.
Mitigation: aiproxy retries with exponential backoff. Background jobs survive transient failures via DLQ. Stage 1 chat shows a "service temporarily unavailable" banner without losing draft state. Multi-region key support designed-for; failover provider candidacy reviewed quarterly.
A bug or pathological prompt drives 10× expected aiproxy spend.
Mitigation: Hard tenant + per-feature daily caps in aiproxy. Cost-spike alert at $10/h sustained. Logs cost view drives same-day diagnosis. Budget gate on agentic loops. See AI Economics.
Client pastes a contract; the contract carries a hidden instruction to exfiltrate data via a tool call.
Mitigation: Untrusted input wrapped in delimited blocks. Tools require human-in-the-loop confirmation for any side effect. Outbound allowlist blocks unauthorised destinations. Output guardrail rejects unexpected tool calls. See Client Security.
Over 3–6 months the Blueprint voice creeps toward generic LLM tone — exclamation marks, "I'd love to help!", emoji.
Mitigation: Banned-phrases filter in aiproxy rejects offending output. Quarterly Blueprint review by Ahmed catches subtle drift. House-style prompt versioned and updated based on observed regressions.
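The banned-phrases filter can be as simple as a regex pass on aiproxy output before it reaches the Blueprint. The patterns below (trailing exclamation, "I'd love to", the emoji block) are illustrative; the real list is versioned alongside the house-style prompt:

```python
import re

# Illustrative ban list, one pattern per drift symptom named above.
BANNED = [
    r"!\s*$",                      # lines ending in exclamation marks
    r"\bI'd love to\b",            # generic-assistant enthusiasm
    r"[\U0001F300-\U0001FAFF]",    # emoji block
]

def violates_house_style(text: str) -> bool:
    """aiproxy output guard: flag drafts drifting toward generic LLM tone."""
    return any(re.search(p, text, flags=re.MULTILINE) for p in BANNED)
```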
Vision-2030 procurement vehicles favour mega-vendors; boutique consultancies are squeezed out of preferred-supplier lists.
Mitigation: Speed of Mihwar's deliverable (7 days) gives NMO an entry point that mega-vendors cannot match. The Consultancy Phase 2 plan turns the squeeze into an opportunity (boutiques license Mihwar). Government tier pursued via Tier 2 Playbook + RFP spec deliverables.
Self-serve Phase 2 erodes the perceived value of $25k consulting engagements.
Mitigation: Phase 2 priced for the segment that wouldn't have engaged a $25k consultant anyway. Concierge codes preserve the engagement margin for NMO-introduced clients. Expert-review upsell within Phase 2 funnels into Phase 1 work.
Mihwar is one Ahmed. Anything happening to Ahmed is an existential risk.
Mitigation: Hire consultant #2 by Q3 (operational redundancy). Documented runbooks for every system surface. Backup passphrases sealed-envelope held by trusted party. Insurance review.
New PDPL implementing regulation tightens residency or processing rules.
Mitigation: Multi-tenancy levels 5 + 6 (dedicated schema / dedicated DB / in-region) designed-for. aiproxy abstracts model provider. Phase 2 sovereign tier ready as escape hatch for regulated clients.
This list is reviewed quarterly. Risk status (likelihood × impact) is re-rated. New risks added; resolved risks moved to an archive. Any risk that goes up in either dimension drives a same-quarter mitigation plan, not a "we'll think about it" slot.
The masterplan is real only when the first action is taken. This page lists, in order, the concrete actions Ahmed takes in the first working week to turn this document into a running app.
- Domain: mihwar.nmopartners.com for Phase 1; reserve mihwar.app for Phase 2.
- Repo: Arcahmed93/mihwar. Apply branch protection on main.
- Server root: /srv/mihwar/.
- Done when: https://mihwar.nmopartners.com resolves, the login page renders, auth works, and the healthcheck is green.