Architecture Proposal — Phase 1 Discovery

Pearl Platform

Test Environment Architecture Proposal

A comprehensive walkthrough of the Pearl platform, why a fully isolated test environment is critical, and the architecture we recommend — designed for MessageDirect's technical leadership.

Version: 2.0 • Date: 6 March 2026 • Status: Phase 1 — Discovery Complete


01

What Is Pearl?

Pearl is the enterprise-grade Telephone Answering Service (TAS) platform powering MessageDirect — a leading UK 24/7 virtual receptionist and contact centre business.

Core Mission

Enable call centre operators to answer phone calls on behalf of hundreds of subscribing client companies — capturing caller details, recording messages, triggering escalations via SMS/email/push, and providing clients with a self-service portal to view messages, manage rotas, and pay invoices.

📞

24/7 Contact Centre

Operators answer calls around the clock on behalf of client companies using dynamic answering scripts

💬

Message Handling

Capture caller details, record messages, and escalate to the right contact via SMS, email, or push

🌐

Client Self-Service

108+ portal pages for clients to view messages, manage rotas, search callers, and handle billing

💳

Billing & Payments

Automated billing lifecycle — usage tracking, invoice generation, card & DD payments, Xero accounting sync

🤖

AI & Voice

AI chatbots, voice assistants (ElevenLabs/Twilio), speech analytics, and GPT-powered QC scoring

📊

Multi-Brand

Operates MessageDirect, JAM, Answer.co.uk, Argyll, VirtuallyThere — all from one platform

The Core Flow: How a Call Becomes a Message

sequenceDiagram
    participant Caller
    participant Genesys as Genesys Cloud CX
    participant Pearl as Pearl Web App
    participant Op as Operator Browser
    participant DB as SQL MI (17 DBs)
    participant Esc as Escalation Engine
    participant Client as Client (SMS/Email)
    Caller->>Genesys: Dials client's number
    Genesys->>Pearl: DDI lookup (GET /exposed/genesys_ddilookupv2.aspx)
    Pearl->>DB: Lookup PhysicalDDIs → Company
    Pearl-->>Genesys: JSON routing instructions
    Genesys->>Op: Route call to operator
    Pearl->>Op: Screen pop via Totem (long-poll)
    Note over Op: Operator sees greeting script,<br/>data fields, special instructions
    Op->>Pearl: Save message (caller, body, "call for")
    Pearl->>DB: Write to Messages, Callers, ScreenInits
    Pearl->>Esc: PutIntoDispatchQueue()
    Esc->>DB: Resolve rota → contacts
    Esc->>Client: Deliver via SMS/Email/Push

02

Platform at Scale

A mature, organically-grown platform handling significant operational complexity.

489+ Database Tables: across 17 databases on SQL MI

321+ Service Endpoints: /exposed/ internal API pages

278+ Background Jobs: /utilityservices/ job pages

304+ Admin Tools: /Tools/ admin pages

108+ Client Portal Pages: /usercontrolpanel/ self-service

22 External Integrations: from Stripe to Genesys to AI

8 Application Components: web apps, workers, services

24 Hours / 7 Days: platform never sleeps

03

Current Production Architecture

The production layout powering 24/7 operations today.

flowchart TB
    subgraph Internet["Internet / Users"]
        Operators["Call Centre Operators<br/>(24/7)"]
        Clients["Client Portal Users"]
        CTI["Genesys Cloud CX<br/>(Telephony)"]
    end
    subgraph AzureProd["Azure Production Environment (UK South)"]
        subgraph WebTier["Web Tier — IIS Servers"]
            Pearl3["pearl3.private.pearl<br/>(IIS Web Server 1)"]
            Pearl4["pearl4.private.pearl<br/>(IIS Web Server 2)"]
        end
        subgraph InternalSvc["Internal Services Layer"]
            PearlInternal["pearlinternal.private.pearl<br/>pearl-webservices-azure<br/>+ utility-server"]
            Memcached["memcached.private.pearl:11211<br/>(Distributed Cache)"]
        end
        subgraph Workers["Background Workers"]
            QueueProc["queue-processor-azure<br/>(Job Queue HTTP Executor)"]
            SystemCheck["system-checker<br/>(Health Monitor + Alerting)"]
            AISpooler["ai-spooler<br/>(6-Lane AI QC Spooler)"]
            Totem["totem-2-cloud-nosql<br/>(Long-Poll Notification Socket)"]
        end
        subgraph DBTier["Database Tier"]
            SQLMI["Azure SQL Managed Instance<br/>(Business Critical)<br/>17 databases • 489+ tables<br/>2 SQL accounts (pearl, utility)"]
        end
    end
    subgraph ExtSvc["External Services (22 Integrations)"]
        direction LR
        S3["Amazon S3"]
        Solr["Apache Solr<br/>(5 Search Cores)"]
        Stripe["Stripe"]
        GC["GoCardless"]
        SagePay["SagePay"]
        Mailgun["Mailgun"]
        SMSGw["SMS Gateways<br/>(MediaBurst, MessageBird, ClickSend)"]
        Zoho["Zoho Desk/CRM"]
        Twilio["Twilio"]
        AzureAI["Azure OpenAI"]
        EL["ElevenLabs"]
        Xero["Xero Accounting"]
        BQ["BigQuery"]
    end
    Operators --> Pearl3
    Operators --> Pearl4
    Clients --> Pearl3
    CTI --> PearlInternal
    Pearl3 --> SQLMI
    Pearl4 --> SQLMI
    Pearl3 <--> Memcached
    Pearl4 <--> Memcached
    Pearl3 --> PearlInternal
    PearlInternal --> SQLMI
    PearlInternal --> S3
    PearlInternal --> Solr
    QueueProc --> SQLMI
    QueueProc --> PearlInternal
    SystemCheck --> SQLMI
    AISpooler --> PearlInternal
    Totem <--> Pearl3
    PearlInternal --> Stripe
    PearlInternal --> GC
    PearlInternal --> Mailgun
    PearlInternal --> SMSGw
    PearlInternal --> Zoho
    PearlInternal --> AzureAI
    PearlInternal --> Twilio
    PearlInternal --> EL
    PearlInternal --> Xero
    PearlInternal --> BQ

04

Technology Stack

The confirmed technology landscape powering every layer of Pearl.

Runtime

.NET Framework 4.8

Languages

VB.NET (~95%), C#

Web Framework

ASP.NET Web Forms ("Web Site" model)

UI Library

Telerik RadControls for ASP.NET AJAX

Database

Azure SQL MI (Business Critical)

Caching

Memcached (BeIT client, port 11211)

Session

SQL Server Session (5-hour timeout)

Authentication

ASP.NET Forms Auth (cookie) + 2FA/TOTP

Search

Apache Solr (5 cores)

Azure Region

UK South (London)

ASP.NET "Web Site" Compilation Model

This is not a Web Application project: the folder structure is the project. ASP.NET compiles App_Code/ dynamically on first request rather than at build time, so source .vb and .aspx files are deployed directly to the server. Production uses aspnet_compiler.exe for pre-compilation.

05

Application Components — Deep Dive

8 distinct components, each with unique runtime characteristics.

Component	Type	Framework	Role	Database Access
pearl-azure	ASP.NET Web Forms	.NET 4.8	Main UI — operators, admins, client portal. 321+ exposed endpoints, 304+ admin tools, 108+ portal pages	All 17 databases
pearl-webservices-azure	ASP.NET Web App	.NET 4.8	Background services — 278+ utility job pages, billing, stats, search indexing, AI QC endpoints, job scheduler	All 17 databases
utility-server	ASP.NET Web Forms (3 sub-apps)	.NET 4.8	PCI-isolated payments portal (Stripe), Xero accounting sync, multi-brand reporting	PearlBilling, PearlData, PearlOperations
queue-processor-azure	WinForms (.exe)	.NET 4.8	Job queue worker — claims rows from Process_JobQueue, executes HTTP calls with turn-based coordination	PearlQueues, PearlData, PearlBilling, PearlLog
system-checker	WinForms (.exe)	.NET 4.8	Health monitoring — ICMP ping, TCP, HTTP probe, SQL query, disk space checks with transition-based alerts	Checking, PearlOperations, PearlData
ai-spooler	WinForms (.exe)	.NET 4.8.1	AI QC spooler — 6-lane conveyor belt for speech analytics, round-robin distribution, 55s backoff on empty	Via HTTP to pearl-webservices
totem-2-cloud-nosql	Console App (Socket Server)	.NET 3.5	Real-time browser notifications via long-poll. /register, /poll, /notify protocol. All state in-memory	None (in-memory only)
alpha-code-generator	WinForms (.exe)	.NET 4.8	Batch generator for unique 9-char alphanumeric codes (base-31 encoding)	FreeAlphaCodes table
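
To make the Totem row above more concrete, the following is a minimal sketch of a long-poll notification hub with register, poll, and notify operations and all state held in memory. It is written in Python for brevity and is not the Totem implementation: the class name LongPollHub, the session-keyed queues, and the timeout handling are assumptions for illustration, and the real service is a .NET 3.5 socket server whose matching rules may differ.

```python
import threading
from collections import defaultdict, deque


class LongPollHub:
    """Illustrative in-memory hub. All state is lost when the process restarts."""

    def __init__(self):
        self._cond = threading.Condition()
        self._sessions = set()
        self._queues = defaultdict(deque)  # session id -> pending messages

    def register(self, session_id):
        """Counterpart of /register: start tracking a browser session."""
        with self._cond:
            self._sessions.add(session_id)

    def notify(self, session_id, message):
        """Counterpart of /notify: queue a message and wake any waiting poll."""
        with self._cond:
            if session_id not in self._sessions:
                return False
            self._queues[session_id].append(message)
            self._cond.notify_all()
            return True

    def poll(self, session_id, timeout=30.0):
        """Counterpart of /poll: block until a message arrives or the timeout expires."""
        with self._cond:
            self._cond.wait_for(lambda: self._queues[session_id], timeout=timeout)
            pending = list(self._queues[session_id])
            self._queues[session_id].clear()
            return pending
```

Because nothing is persisted, a restart of the worker VM drops every registered session, which is one reason the test environment keeps this component on its own tier rather than sharing a process with the web applications.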

Core Business Logic Modules (App_Code)

The backbone of Pearl's logic — VB.NET classes auto-compiled at runtime.

⚙️

PearlOperations.vb

~557 KB — Screen XML, message processing, DDI management, screen pop, real-time signalling

Core Engine

🖥️

PearlControls.vb

~338 KB — Dynamic UI generation from XML config. Renders answering screens, data grids, forms

UI Renderer

👤

PearlUserManagement.vb

~220 KB — User CRUD, login, permissions, shift tracking, password management

Identity

🏢

PearlCompanyManagement.vb

~153 KB — Client onboarding, company config, setup wizards

Clients

🔔

PearlEscalation.vb

~97 KB — Escalation rules, notification routing, on-call rota resolution

Dispatch

💳

PearlPayments.vb

~90 KB — Stripe, SagePay, GoCardless — gateway integrations & payment processing

Billing

06

How Data Flows Through Pearl

Five interconnected data flows that power the entire platform.

flowchart LR
    subgraph CallFlow["1. Call-to-Message Flow"]
        direction TB
        CF1["Genesys CTI Event"] --> CF2["DDI Lookup"]
        CF2 --> CF3["Screen Pop via Totem"]
        CF3 --> CF4["Operator Captures Message"]
        CF4 --> CF5["Save to PearlData"]
        CF5 --> CF6["Escalation Queue"]
        CF6 --> CF7["SMS / Email / Push"]
    end
    subgraph BillingFlow["2. Billing Flow"]
        direction TB
        BF1["Message Saved"] --> BF2["BillItem Created"]
        BF2 --> BF3["Rate Calculation"]
        BF3 --> BF4["Invoice Generation"]
        BF4 --> BF5["Stripe / GoCardless / SagePay"]
        BF5 --> BF6["Xero Sync"]
    end
    subgraph QueueFlow["3. Background Job Flow"]
        direction TB
        QF1["Job Scheduler<br/>(5-min cycle)"] --> QF2["Process_JobQueue"]
        QF2 --> QF3["Queue Processor Claims"]
        QF3 --> QF4["HTTP Execution"]
        QF4 --> QF5["Result Logged"]
    end
    subgraph AIFlow["4. AI QC Flow"]
        direction TB
        AF1["Message Created"] --> AF2["AI Spooler Fetches"]
        AF2 --> AF3["6-Lane Round-Robin"]
        AF3 --> AF4["Speech Analytics"]
        AF4 --> AF5["GPT-4o Scoring"]
    end
    subgraph RTFlow["5. Real-Time Flow"]
        direction TB
        RF1["State Change"] --> RF2["/notify to Totem"]
        RF2 --> RF3["Match Subscribed Sessions"]
        RF3 --> RF4["Return Script to Browser"]
    end

07

17 Databases at a Glance

Azure SQL Managed Instance (Business Critical) — the data backbone.

PearlData 165 tables — Core ops data

PearlUsers 79 tables — Users & companies

PearlBilling 76 tables — Invoices & payments

PearlQueues 39 tables — Job queues & dispatch

PearlLog 30 tables — Audit & access logs

PearlSwitch 30 tables — DDI & call routing

PearlAnalysis 23 tables — QC & text analysis

PearlArchive 22 tables — Historic archival

PearlOperations 22 tables — Screen setups & config

PearlSearch 3 tables — Solr sync tokens

SMSBroadcast SMS delivery spool

Messages Message storage (legacy)

MSGView Message viewing portal

LookupDBs Reference & postcodes

ASPNET Session state

ASPStateInMemory Legacy OLTP (disabled)

Checking Health check definitions

Database Access Pattern

Two SQL accounts: pearl (main apps — web & workers) and utility (utility-server & system-checker). Cross-database queries use 3-part naming. The ConfigStrings table in PearlOperations holds all connection strings, API keys, and feature flags — the central configuration hub.

08

22 External Integrations

Every external dependency Pearl relies on — from telephony to AI.

☎️ Telephony & Voice

Genesys Cloud CX Primary CTI — DDI routing, call stats, screen pops, speech analytics

Twilio Programmable voice for AI assistants

ElevenLabs Text-to-speech / conversational AI

💳 Payments

Stripe Primary card processor — Checkout + auto-pay

GoCardless Direct debit collections

SagePay/Opayo PAYG card payments (Answer.co.uk)

📨 Communications

Mailgun Transactional email delivery

MediaBurst (Route 21) SMS provider with failover

MessageBird (Route 22) SMS provider

ClickSend (Route 23) SMS provider

🧠 AI & Analytics

Azure OpenAI GPT-4o-mini for QC scoring

Genesys Speech Analytics Transcript + sentiment

BigQuery Analytics data export

🔧 Business Tools

Xero Accounting — invoice & payment sync

Zoho Desk + CRM Support & customer management

Amazon S3 Backups, recordings, AI QA archives

Apache Solr Full-text search (5 cores)

09

The Pain Points

A complex, mission-critical platform with no isolated test environment. Every change is a risk to the 24/7 production service.

⚠️

No Test Isolation

All development and testing happens against or very near production. Every deploy risks the live 24/7 service that operators and clients depend on around the clock.

🔗

22 Live Integrations at Risk

A test against the wrong config could trigger real Stripe charges, send SMS to real customers, or disrupt live Genesys call routing. No safety net exists.

🔄

No Repeatable Regression

Cannot wipe and rebuild a clean test state. No way to validate that a change doesn't break any of the 489+ tables, 321+ endpoints, or 278+ background jobs.

🔒

PII Exposure & GDPR Risk

Any test data access risks exposing real customer PII — names, phone numbers, billing details, message content. No masking or anonymisation layer exists.

🏗️

Legacy Architecture Constraints

.NET Framework 4.8 with WinForms workers, raw sockets (.NET 3.5 Totem), and hardcoded IPs — not cloud-native, cannot use modern PaaS services without refactoring.

📋

No Release Process

Deployments are robocopy-based file syncs with no approval gates, no rollback mechanism, no audit trail. Manual and error-prone.

The Bottom Line

Every code change, database migration, or configuration update is deployed directly to production with no safety net. For a 24/7 contact centre handling calls for hundreds of client companies, this is an unacceptable operational risk that must be resolved.

10

Client's Request for Proposal

MessageDirect issued an RFP to design and deliver a secure, fully isolated, repeatable test environment. A formal response can only be submitted once Phase 1 (Discovery) is finalised.

1

Safe Releases

Develop, deploy, and validate changes without any risk to production

2

Repeatability

Wipe and rebuild the test environment and reload test data on demand

3

Full Isolation

Complete isolation from production systems and data; private-only connectivity

4

Test Data

Clean, anonymised dataset (no production PII) with weekly refresh procedure

5

Scalability

Spin up multiple test environments per feature branch with minimal overhead

💰

Budget

£25,000 total cap (discovery + implementation)

📅

Timeline

Test environment ready by early to mid May 2026

🔐

Compliance

ISO 27001 aligned + GDPR data controls

📄

Deliverables

IaC, CI/CD, runbooks, SOPs, handover walkthrough

11

Architecture Options Evaluated

We evaluated 4 architecture options against Pearl's specific constraints.

★ SELECTED

3-VM Split + Azure SQL MI (General Purpose)

VM1 — Web Tier: IIS (pearl-azure + pearl-webservices + utility-server) + Memcached + Solr

VM2 — Worker Tier: queue-processor + system-checker + ai-spooler + totem-2-cloud-nosql

VM3 — Build/Dev: GitHub Actions runner + MSBuild 17 + .NET 4.8 SDK + restore tools

DB: Azure SQL MI (General Purpose, 4 vCores) — all 17 databases masked

Estimated implementation: 425h (incl. 20% buffer)

✅ Pros

Mirrors production layout — reliable test results
Zero code changes needed
Clear role separation (web vs worker vs build)
Familiar ops model (Windows Server + IIS)
Maps cleanly to IaC (Bicep/ARM)

⚠️ Trade-offs

Highest VM count of viable options
Moderate running cost (~£700/mo at 24/7)

2-VM + Azure Functions Hybrid

VM1 = Web + Memcached

VM2 = GitHub runner

Workers converted to Azure Functions

DB = Azure SQL MI (General Purpose)

Estimated implementation: 665h (incl. 20% buffer)

✅ Pros

Lower VM cost
Functions scale automatically

❌ Why Not

Requires refactoring WinForms workers to Functions
queue-processor uses in-process timer/state with turn-based coordination
ai-spooler uses 6-thread conveyor model with round-robin
totem uses raw .NET 3.5 sockets — incompatible with Functions
Significant engineering effort — explicitly out of RFP scope

Single VM — Everything on One Box

All components on one VM: web + workers + runner + Memcached + Solr

Totem as Azure Function (requires refactor)

Estimated implementation: 390h (incl. 20% buffer)

✅ Pros

Cheapest option (~£400/mo)

❌ Why Not

No isolation between web/worker/build processes
Resource contention — build jobs starve web tier
Doesn't replicate production topology
Test results unreliable for production prediction

4-VM Full Separation

VM1 = Web only

VM2 = Workers only

VM3 = Totem + Memcached + Solr

VM4 = Build runner

Estimated implementation: 460h (incl. 20% buffer)

✅ Pros

Maximum isolation per role

❌ Why Not

Over-engineered for a test environment
Highest cost (~£900/mo) — exceeds budget tolerance
Extra VM provides marginal benefit for testing

12

Weighted Comparison

5 criteria. 4 options. One clear winner.

Criteria and weights:

Production Fidelity (25%)
Cost Efficiency (25%)
Time to Deliver (20%)
Operational Simplicity (15%)
Scalability (15%)

[Chart: per-criterion bars for Options A, B, C, and D]

Overall result, in the order shown:

Option A — 3-VM ★ WINNER
Option D — 4-VM
Option C — Single
Option B — Hybrid
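
For readers who want to reproduce the comparison, the sketch below shows how a weighted total can be computed from the five criteria and weights listed above. Only the weights come from this proposal. The 0 to 10 scale, the function name weighted_score, and the placeholder scores are assumptions for illustration and are not the scores used in the evaluation.

```python
# Weights taken from the comparison criteria above (they sum to 100%).
WEIGHTS = {
    "production_fidelity": 0.25,
    "cost_efficiency": 0.25,
    "time_to_deliver": 0.20,
    "operational_simplicity": 0.15,
    "scalability": 0.15,
}


def weighted_score(scores):
    """Return the weighted total for one option.

    `scores` maps each criterion name to a score on an assumed 0-10 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Placeholder values only, NOT the proposal's actual scores.
    placeholder = {name: 5 for name in WEIGHTS}
    print(f"weighted total: {weighted_score(placeholder):.2f}")
```

Keeping the weights in one table makes it easy to test how sensitive the ranking is to a change in emphasis, for example shifting weight from cost efficiency to production fidelity.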

13

Why 3-VM Split Is the Right Answer

The architecture is dictated by Pearl's actual runtime constraints.

1

No Code Changes Required

WinForms workers (queue-processor, system-checker, ai-spooler) are architecturally bound to the Windows desktop runtime. Totem uses raw .NET 3.5 sockets. Converting to Azure Functions would be a major rewrite — explicitly out of the RFP scope.

2

Mirrors Production Topology

3-VM layout replicates the actual production separation: web tier (IIS), internal services tier (workers), and a dedicated build server. Test results reliably predict production behaviour.

3

Fastest Path to Delivery

Deploy existing compiled binaries via robocopy — the current deployment method. No new toolchain, no recompilation model, no replatforming. Ship in weeks, not months.

4

Operationally Familiar

Windows Server 2022 + IIS + Windows Services. The team already knows how to operate, troubleshoot, and deploy this stack. Zero learning curve.

5

Budget Appropriate

The recommended estate is budgeted at about £2.39k/month based on the current Azure calculator export. That figure is higher than a simple lab because it includes the controls that make the environment credible: SQL Managed Instance, the 3-VM role split, secure access, outbound control, monitoring, backup, and security services. It is still the right shape of spend because it funds safe delivery and testing rather than forcing risky shortcuts.

6

IaC-Ready for Repeatability

3 VMs + SQL MI + networking maps cleanly to Bicep/ARM templates. Entire environment can be torn down and rebuilt from code — meeting the RFP's repeatability requirement.

14

Target Architecture

The complete test environment design — fully isolated from production.

flowchart TB
    subgraph DevAccess["Developer / QA Access"]
        Dev["Developer Workstation"]
        QA["QA Tester"]
    end
    subgraph GitHub["GitHub"]
        Repo["GitHub Repository"]
        Actions["GitHub Actions CI/CD"]
    end
    subgraph AzureTest["Azure Test Subscription (UK South)"]
        subgraph HubVNet["Hub VNet"]
            Bastion["Azure Bastion<br/>(Secure RDP only)"]
            FW["Azure Firewall<br/>(Approved outbound only)"]
        end
        subgraph SpokeVNet["Test Spoke VNet"]
            VM1["VM1 — Web Tier<br/>D4s v5 • 4 vCPU • 16 GB<br/>IIS + Memcached + Solr"]
            VM2["VM2 — Worker Tier<br/>D2s v5 • 2 vCPU • 8 GB<br/>Queue Proc + System Check<br/>AI Spooler + Totem"]
            VM3["VM3 — Build / Restore Tier<br/>D2s v5 • 2 vCPU • 8 GB<br/>GitHub Runner + MSBuild<br/>Restore + Masking Tools"]
            TestSQL["Test SQL MI (GP, 4 vCores)<br/>17 masked databases"]
            Blob["Azure Blob Storage<br/>(Prod backup staging)"]
            KV["Azure Key Vault"]
        end
    end
    subgraph ProdRO["Production (Read Only)"]
        ProdSQL["Prod SQL MI<br/>(Weekly backup source)"]
    end
    subgraph Sandboxes["Approved Sandboxes"]
        GS["Genesys Sandbox"]
        ST["Stripe Test"]
        GC["GoCardless Sandbox"]
        MG["Mailgun Sandbox"]
        S3T["S3 Test Bucket"]
    end
    Dev -->|"Bastion RDP"| Bastion
    QA -->|"Bastion RDP"| Bastion
    Bastion --> VM1 & VM2 & VM3
    Repo --> Actions
    Actions -->|"Self-hosted runner"| VM3
    VM3 -->|"Deploy"| VM1 & VM2
    ProdSQL -->|"Weekly .bak to Blob"| Blob
    Blob -->|"Restore + mask"| VM3
    VM3 -->|"Restore to"| TestSQL
    VM1 --> TestSQL
    VM2 --> TestSQL
    KV --> VM1 & VM2 & VM3
    VM1 -. outbound via firewall .-> FW
    VM2 -. outbound via firewall .-> FW
    FW -. controlled egress .-> GS & ST & GC & MG & S3T
    VM2 -->|"HTTP jobs"| VM1
    VM1 -. "Totem notify" .-> VM2

Portable diagram asset: target-architecture.png

How This Architecture Works End to End

The design uses a private hub-and-spoke Azure layout so administrator access, application workload, and outbound internet traffic are controlled separately. Azure Bastion is the only RDP entry point, Azure Firewall is the single outbound checkpoint, and the spoke VNet hosts the actual Pearl workload across VM1 for IIS and local cache/search, VM2 for background workers, and VM3 for build, restore, and masking automation.

The single test SQL Managed Instance stores all 17 masked databases used by the environment. Production never connects directly to the test estate; it only places weekly backup files into Blob Storage, and VM3 restores, masks, and validates those backups before VM1 and VM2 use them. Azure Key Vault keeps the environment secrets out of the servers, and every external dependency is redirected to sandboxes such as Genesys, Stripe, GoCardless, Mailgun, and the test S3 bucket so the platform behaves like production without touching live customer data, live payments, or live telephony.
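
The masking step described above can be pictured with a short sketch. It shows one common approach, keyed deterministic pseudonymisation, in which the same input always maps to the same masked value so that joins across the restored databases still line up. This is an illustration only: the function names, the secret handling, and the choice of HMAC-SHA256 are assumptions, not the masking tooling that will be delivered, and the real scripts would most likely run as T-SQL against the restored databases.

```python
import hashlib
import hmac
import itertools


def pseudonymise(value, secret, prefix="anon"):
    """Replace a text value (for example a caller name) with a stable token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


def mask_phone(number, secret):
    """Replace every digit with a derived digit, keeping length and punctuation.

    The result is still number-shaped, so outbound SMS and voice must remain
    blocked by the firewall and sandbox routing rather than by masking alone.
    """
    digest = hmac.new(secret, number.encode("utf-8"), hashlib.sha256).digest()
    fake_digits = itertools.cycle(str(byte % 10) for byte in digest)
    return "".join(next(fake_digits) if ch.isdigit() else ch for ch in number)


if __name__ == "__main__":
    key = b"test-only-secret"  # hypothetical; a real key would come from Key Vault
    print(pseudonymise("Jane Doe", key))
    print(mask_phone("+44 20 7946 0000", key))
```

Using a keyed hash rather than a plain hash means the mapping cannot be reversed by hashing a list of known names or numbers without the key.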

15

Phase 1 — Discovery & Planning

Phase 1 is the foundation. The RFP response to the client cannot be submitted until Phase 1 is decided and finalised. This is where we confirm everything about the current system, size the target, and commit to the plan.

🔍

Review Production VM Setup

Confirm current IIS configuration, server roles, installed components, Windows features, and service accounts on pearl3, pearl4, pearlinternal

🗄️

Review SQL Server & Backup Size

Measure actual database sizes for all 17 databases. Confirm Business Critical tier specifics. Estimate .bak sizes for backup/restore pipeline

📦

Review Runtime Dependencies

Catalogue all .NET Framework versions (.NET 4.8, 4.8.1, 3.5), Telerik licence requirements, NuGet packages, Bin/ DLLs, and third-party assemblies

⚙️

Identify Environment Configs

Map all ConfigStrings entries, web.config connection strings, hardcoded IPs (10.0.0.12, 10.0.1.44), hostnames, and file paths that need repointing
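
As an illustration of the repointing exercise above, the sketch below scans configuration values for the production IPs and the private hostname suffix mentioned in this document. The two IP addresses and the .private.pearl suffix come from the proposal. The function name, the entry keys, and the idea of exporting ConfigStrings to a key/value mapping are assumptions made for the example.

```python
import re

# Markers of production-only targets named in this proposal.
PROD_MARKERS = [
    re.compile(r"\b10\.0\.0\.12\b"),
    re.compile(r"\b10\.0\.1\.44\b"),
    re.compile(r"\b[\w-]+\.private\.pearl\b", re.IGNORECASE),
]


def find_production_references(entries):
    """Return (key, matched_text) pairs for values that still point at production.

    `entries` is a mapping of config key to value, for example an export of
    the ConfigStrings table or parsed web.config settings.
    """
    hits = []
    for key, value in entries.items():
        for pattern in PROD_MARKERS:
            for match in pattern.finditer(value):
                hits.append((key, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = {  # hypothetical keys and values for illustration
        "CacheServer": "memcached.private.pearl:11211",
        "LegacyApi": "http://10.0.0.12/exposed/",
    }
    for key, found in find_production_references(sample):
        print(f"{key}: still references {found}")
```

A check like this could run as a gate after every restore, failing the refresh if any entry still points at a production address.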

🔒

Define Network & Security

Design hub-spoke topology, subnet addressing (10.1.x.x hub, 10.2.x.x spoke), NSG rules, Azure Bastion access, firewall egress allowlist
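
A quick way to sanity-check the proposed addressing is to confirm that the hub and spoke ranges do not overlap each other or the production range. The sketch below does this with Python's standard ipaddress module. The /16 prefix lengths and the production range 10.0.0.0/16 are assumptions inferred from the addresses quoted in this document and would be replaced by the confirmed values from discovery.

```python
import ipaddress
from itertools import combinations

# Prefix lengths are assumed; only the leading octets come from the proposal.
RANGES = {
    "production": ipaddress.ip_network("10.0.0.0/16"),
    "test-hub": ipaddress.ip_network("10.1.0.0/16"),
    "test-spoke": ipaddress.ip_network("10.2.0.0/16"),
}


def overlapping_pairs(ranges):
    """Return the names of every pair of address ranges that overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(ranges.items(), 2)
        if net_a.overlaps(net_b)
    ]


if __name__ == "__main__":
    clashes = overlapping_pairs(RANGES)
    print("no overlaps" if not clashes else f"overlaps found: {clashes}")
```

Non-overlapping ranges keep the option open to peer or route between networks later without readdressing, even though the test estate is designed to have no path to production.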

☁️

Confirm Azure Sizing

Finalise VM SKUs, SQL MI tier and vCores, storage requirements, region (UK South). This sizing recommendation drives the cost model.

Phase 1 Deliverables

📋

Architecture Diagram

Current production setup documented with all components, connections, and dependencies mapped

📐

Azure Sizing Recommendation

Finalised SKUs, vCores, storage tiers — the basis for the cost model and RFP response

⚠️

Risk Assessment Summary

All identified risks with likelihood, impact, and proposed mitigations

16

Azure Sizing Recommendation

The sizing recommendation is the key Phase 1 output — it determines the cost model and drives the RFP response to the client.

VM1 — Web Tier

D4s v5

vCPUs: 4
RAM: 16 GB
OS Disk: 128 GB Premium SSD (P10)
Data Disk: 256 GB Premium SSD (P15)
OS: Windows Server 2022 Datacenter

Why This Size?

IIS hosts 3 web applications — pearl-azure (321+ endpoints + 304+ admin tools + 108+ portal pages), pearl-webservices-azure (278+ job pages), and utility-server (3 sub-apps)
Memcached requires ~2-4 GB RAM — distributed cache serving all web requests
Apache Solr requires ~1-2 GB RAM — 5 search cores (messageanalytics, callers, faqs, elements, search)
ASP.NET Web Forms JIT compilation — first-request compilation of App_Code/ modules (PearlOperations.vb is 557 KB alone) is CPU-intensive
4 vCPUs provide headroom for concurrent IIS requests, Solr indexing, and cache operations

VM2 — Worker Tier

D2s v5

vCPUs: 2
RAM: 8 GB
OS Disk: 128 GB Premium SSD (P10)
OS: Windows Server 2022 + .NET 3.5 Feature

Why This Size?

queue-processor — timer-based poll loop, claims batches from Process_JobQueue, executes HTTP calls. Low CPU, moderate memory
ai-spooler — 6 concurrent worker threads + fetcher thread. Each thread holds one HTTP connection. Moderate parallel I/O
system-checker — 10-second timer loop running health checks (ICMP, TCP, HTTP, SQL). Low resource usage
totem-2-cloud-nosql — .NET 3.5 socket server handles long-poll connections. In-memory state only. Needs .NET 3.5 Framework feature enabled
128 GB OS disk — gives room for service logs, queue files, patching, and safe operating headroom over a minimal image-only disk
2 vCPUs sufficient — workers are I/O-bound (HTTP calls, SQL queries), not CPU-bound
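
The ai-spooler behaviour described above (six lanes, round-robin distribution, and a 55-second backoff when nothing is waiting) can be sketched as follows. The lane count and backoff value come from this document. The function names and the in-memory lists are assumptions for illustration, and the real spooler runs worker threads that each hold an HTTP connection.

```python
from itertools import cycle

LANES = 6
EMPTY_BACKOFF_SECONDS = 55


def distribute(items, lanes=LANES):
    """Assign items to lanes in round-robin order and return one list per lane."""
    assignment = [[] for _ in range(lanes)]
    lane_ids = cycle(range(lanes))
    for item in items:
        assignment[next(lane_ids)].append(item)
    return assignment


def next_fetch_delay(fetched_count, backoff=EMPTY_BACKOFF_SECONDS):
    """Seconds to wait before the next fetch: back off only after an empty fetch."""
    return backoff if fetched_count == 0 else 0


if __name__ == "__main__":
    for lane, work in enumerate(distribute(range(8))):
        print(f"lane {lane}: {work}")
```

Round-robin keeps each lane's share within one item of the others, which fits an I/O-bound workload where every item costs roughly one outbound HTTP call.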

VM3 — Build / Restore Tier

D2s v5

vCPUs: 2
RAM: 8 GB
OS Disk: 128 GB Premium SSD (P10)
Data Disk: 128 GB Premium SSD (P10)
OS: Windows Server 2022 Datacenter

Why This Size?

GitHub Actions self-hosted runner — runs CI/CD workflows triggered by repo events
MSBuild 17 + .NET 4.8 SDK — compiles all 7 components plus aspnet_compiler.exe precompilation
PowerShell restore tooling — downloads .bak files from Blob, runs RESTORE DATABASE commands, executes masking scripts
Dedicated 128 GB data disk — keeps restore staging files, build artefacts, and backup downloads away from the OS volume
2 vCPUs adequate — builds run sequentially (not parallel), restore is I/O-bound
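
To illustrate the restore tooling listed above, the sketch below builds one RESTORE DATABASE statement per database for a given weekly backup set. The 17 database names come from this document. The storage account URL, the dated folder layout, and the function names are hypothetical, and the sketch assumes the restore reads each .bak from a storage URL for which a credential already exists on the instance. The delivered tooling is planned as PowerShell, so this Python version is only a compact way to show the shape of the plan.

```python
DATABASES = [
    "PearlData", "PearlUsers", "PearlBilling", "PearlQueues", "PearlLog",
    "PearlSwitch", "PearlAnalysis", "PearlArchive", "PearlOperations",
    "PearlSearch", "SMSBroadcast", "Messages", "MSGView", "LookupDBs",
    "ASPNET", "ASPStateInMemory", "Checking",
]


def restore_statement(database, container_url, backup_date):
    """Build the T-SQL restore statement for one database.

    Assumes backups are stored as <container>/<backup_date>/<database>.bak,
    which is a hypothetical layout chosen for this example.
    """
    if database not in DATABASES:
        raise ValueError(f"unknown database: {database}")
    url = f"{container_url.rstrip('/')}/{backup_date}/{database}.bak"
    return f"RESTORE DATABASE [{database}] FROM URL = N'{url}';"


def build_plan(container_url, backup_date):
    """Return the restore statements for the full weekly set, in list order."""
    return [restore_statement(db, container_url, backup_date) for db in DATABASES]


if __name__ == "__main__":
    # Placeholder storage account, not a real endpoint.
    base = "https://examplestorage.blob.core.windows.net/pearl-backups"
    for statement in build_plan(base, "2026-04-12"):
        print(statement)
```

Generating the statements from a single list keeps the weekly refresh repeatable and makes a missing backup file easy to spot before masking starts.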

Test SQL Managed Instance

General Purpose

vCores: 4
Storage: 256 GB
Tier: General Purpose (not Business Critical)
Compute: PAYG

Why General Purpose (Not Business Critical)?

Production uses Business Critical for HA (Always On, low latency) — test doesn't need this
General Purpose costs ~60% less than Business Critical for equivalent vCores
4 vCores handles functional testing workload (not performance testing)
Supports all 17 databases with cross-database queries (3-part naming)
256 GB storage is the approved baseline for the first estimate and leaves operational room for the weekly restored test set

Supporting Resources

Resource	SKU / Config	Justification
Azure Blob Storage	Hot tier, LRS, ~500 GB	Weekly backup staging — 4 weekly copies of all 17 databases with 28-day retention
Azure Bastion	Standard SKU	Secure RDP to all VMs — no public IPs, no VPN needed. Audit-logged access
Azure Firewall	Standard SKU	Egress filtering — allowlist-only outbound to sandbox endpoints. Prevents accidental production contact
Azure Key Vault	Standard	All connection strings, API keys, secrets. Managed identity access. Versioned secret rotation
Azure Monitor + Log Analytics	Per-GB ingestion	Centralised logging, alerting, diagnostics, and support visibility across VM, SQL, firewall, and security events
Microsoft Defender for Cloud	Servers + SQL + Storage + Key Vault	Continuous vulnerability and threat monitoring so the test estate does not become the weak security point
Azure Backup	3 protected VMs, LRS	Fast recovery path for failed releases, broken configurations, or accidental deletion during testing
Azure Update Manager	3 managed servers	Automated patching to keep the Windows estate current without manual server-by-server maintenance
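
The allowlist-only egress rule in the Azure Firewall row can be expressed as a simple check, sketched below. It mirrors the idea of an FQDN allowlist rather than any Azure Firewall API. The hostnames use the reserved .example domain as placeholders because the approved sandbox endpoints are confirmed during implementation, and the function name is an assumption for illustration.

```python
# Placeholder sandbox domains, NOT the real endpoints.
ALLOWED_DOMAINS = (
    "genesys-sandbox.example",
    "stripe-test.example",
    "gocardless-sandbox.example",
    "mailgun-sandbox.example",
    "s3-test.example",
)


def is_egress_allowed(hostname, allowed=ALLOWED_DOMAINS):
    """Return True only if the host is an allowed domain or a subdomain of one."""
    host = hostname.strip().lower().rstrip(".")
    return any(host == entry or host.endswith("." + entry) for entry in allowed)


if __name__ == "__main__":
    for candidate in ("api.stripe-test.example", "api.live-payments.example"):
        verdict = "allow" if is_egress_allowed(candidate) else "deny"
        print(f"{candidate}: {verdict}")
```

Matching on a leading dot prevents a lookalike such as evilstripe-test.example from passing as a subdomain of an approved entry. Anything not listed is denied by default, which is what stops a misconfigured test job from reaching a live gateway.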

16b

Recommended Azure Cost Estimate

This is the client-ready cost estimate for the recommended build only: the approved 3-VM split, one Azure SQL Managed Instance, and the supporting Azure services needed to keep the environment secure, recoverable, and properly isolated from production.

Calculator Baseline

£2,387.62/month

Based on the exported Microsoft Azure Pricing Calculator estimate in UK South, Pay-As-You-Go, dated 16 April 2026. This is the current safe planning baseline for the recommended environment.

£28,651.44 per year • Recommended Option A only • Pricing export included

What This Budget Actually Buys

A production-shaped test platform with separate web, worker, and build / restore roles instead of one overloaded all-in-one server.
Safe administration through Bastion and a private-only SQL path, so the environment is not exposed directly to the internet.
Controlled outbound traffic through Azure Firewall, which stops the test environment from accidentally touching live payment, SMS, telephony, or email systems.
Operational resilience through backup, monitoring, patching, and security scanning so the team can test confidently without creating a support burden.

Monthly Baseline

£2,387.62/mo

Current calculator export total for the full recommended environment.

Recommended budget line

Annual View

£28,651.44/yr

Useful for annual planning, internal approval, and client budget framing.

Main Cost Drivers

£1,864.51/mo

Firewall, SQL Managed Instance, and the 3 VMs together account for about 78% of the total.

Pricing Calculator Reference

This estimate was taken from the official Microsoft Azure Pricing Calculator. The screenshot below is included as the visual source reference used for the client-facing cost breakdown. Source: azure.microsoft.com/en-us/pricing/calculator/

[Figure: Microsoft Azure Pricing Calculator screenshot showing the recommended Pearl test environment estimate]

Reference screenshot from the Microsoft Azure Pricing Calculator showing the estimate basis used for this recommendation. Included here to show the origin of the pricing numbers presented in this section.

Azure Service	Recommended Baseline	Monthly Cost	Purpose in Plain English	Why This Is Recommended
VM1 — Web Tier	D4s v5, 4 vCPU, 16 GB RAM, 128 GB P10 OS disk, 256 GB P15 data disk	£240.95	Runs the Pearl websites, internal web services, cache, and search. In simple terms, this is the front door that serves pages, handles requests, and keeps the user-facing side responsive.	This is the one server that needs the most headroom because it carries IIS, Memcached, and Solr together. The chosen size is large enough for realistic testing without paying for a production-scale machine.
VM2 — Worker Tier	D2s v5, 2 vCPU, 8 GB RAM, 128 GB P10 OS disk	£129.44	Runs the background jobs that users do not see directly, such as queue processing, health checks, AI spooler activity, and the Totem notification service.	Keeping this work off the web server protects test fidelity and matches how the live platform behaves. The smaller VM is enough because these services are mostly waiting on I/O rather than using heavy CPU.
VM3 — Build / Restore Tier	D2s v5, 2 vCPU, 8 GB RAM, 128 GB P10 OS disk, 128 GB P10 data disk	£147.39	Handles builds, deployments, backup downloads, database restores, and masking scripts. This is the engineering workbench for the environment.	Builds and restore jobs can be noisy and storage-heavy. Giving them their own server avoids slowing down test activity on VM1 and VM2 and makes the environment easier to support.
Azure SQL Managed Instance	General Purpose, 4 vCores, 256 GB storage	£660.15	Stores all 17 Pearl databases and provides the SQL features the application expects, including cross-database behaviour that simpler database services do not handle well.	Managed Instance is the right fit because it behaves much more like the current SQL estate. General Purpose keeps compatibility while avoiding the higher Business Critical price that test does not need.
Azure Blob Storage	General Purpose v2, Hot, LRS, 500 GB	£8.15	Acts as the landing zone for weekly backup files, restore artefacts, and a small amount of supporting automation content.	It is the cheapest practical way to keep several weekly copies ready for refresh work. This is a low-cost, high-value part of the design.
Azure Bastion	Standard SKU	£159.29	Provides secure browser-based admin access to the Windows servers without exposing remote desktop ports or public IPs on the VMs.	This is recommended because it sharply reduces the attack surface and gives a cleaner, more defensible security story for the client.
Azure Firewall	Standard SKU	£686.58	Works as the outbound checkpoint for the whole estate. It decides what the environment is allowed to contact outside Azure.	This is what keeps the test environment truly isolated. It prevents accidental calls to live payment gateways, live SMS routes, live email flows, or production-only destinations.
Azure Key Vault	Standard tier	£2.39	Stores passwords, connection strings, API keys, and certificates in one controlled vault instead of scattering secrets across servers and config files.	It is recommended because it removes one of the most common causes of security drift: secrets copied into scripts, notes, or server folders.
Azure Monitor + Log Analytics	Workspace, diagnostics, alerting, application insights	£140.23	Collects logs, metrics, alerts, and diagnostic events from the servers, SQL, firewall, and supporting Azure services.	Without it, every issue becomes a manual investigation across multiple machines. With it, the team gets one place to troubleshoot and prove what happened.
Microsoft Defender for Cloud	3 protected servers, SQL, storage, Key Vault	£29.91	Continuously checks the environment for vulnerabilities, missing controls, and suspicious security events.	Recommended because a weaker test environment can become the easiest route into the wider estate. This keeps the security posture close to production.
Azure Backup	3 protected VMs, LRS backup storage	£96.61	Creates recoverable restore points for the servers so the team can roll back quickly if a deployment or test change breaks the environment.	Recommended because rebuilding multiple Windows servers manually is slow and expensive. Backup reduces recovery time and lowers operational risk.
Azure Update Manager	3 managed servers	£11.29	Automates Windows patching so the environment stays current without someone having to maintain each server by hand.	Recommended because patching is often skipped on test platforms first. This keeps the estate supportable and avoids preventable security issues.
Microsoft Support Plan	Azure support coverage	£75.24	Gives access to Microsoft support engineers if there is a platform problem with Azure networking, SQL Managed Instance, or service-level issues.	Recommended because some Azure issues cannot be solved from inside the project team alone. It shortens outage time and gives a formal escalation path.
Total Recommended Monthly Cost	Recommended test environment baseline	£2,387.62	Annual view: £28,651.44. This is the current calculator-backed baseline for the proposed test estate.

Where The Spend Sits

Networking — Firewall + Bastion £845.87

Database — SQL Managed Instance £660.15

Compute — VM1, VM2, VM3 £517.78

Monitoring — Azure Monitor £140.23

Management — Backup + Update Manager £107.90

Support — Microsoft support plan £75.24

Security — Defender + Key Vault £32.30

Storage — Blob Storage £8.15

Total £2,387.62

The biggest cost items are the firewall, the database tier, and the three servers themselves. That is expected: isolation, SQL compatibility, and role separation are the three design choices that make this environment useful rather than just cheap.
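
As a cross-check on the figures above, the line items can be rolled up by category in a few lines of Python. This is a minimal sketch rather than part of the costing tooling: the amounts are copied from the estimate, the compute figure is the combined VM1 to VM3 value, and the helper names and the separate Storage bucket for Blob Storage are assumptions made for this illustration.

```python
from decimal import Decimal

# Monthly line items in GBP, copied from the estimate above.
# The category labels are this sketch's own grouping (assumption).
LINE_ITEMS = {
    "Compute (VM1, VM2, VM3)": ("Compute", Decimal("517.78")),
    "SQL Managed Instance": ("Database", Decimal("660.15")),
    "Blob Storage": ("Storage", Decimal("8.15")),
    "Bastion": ("Networking", Decimal("159.29")),
    "Firewall": ("Networking", Decimal("686.58")),
    "Key Vault": ("Security", Decimal("2.39")),
    "Monitor + Log Analytics": ("Monitoring", Decimal("140.23")),
    "Defender for Cloud": ("Security", Decimal("29.91")),
    "Backup": ("Management", Decimal("96.61")),
    "Update Manager": ("Management", Decimal("11.29")),
    "Support Plan": ("Support", Decimal("75.24")),
}


def totals_by_category(items):
    """Sum the monthly cost of every line item per category."""
    totals = {}
    for category, cost in items.values():
        totals[category] = totals.get(category, Decimal("0")) + cost
    return totals


def monthly_total(items):
    """Sum all line items into one monthly figure."""
    return sum((cost for _, cost in items.values()), Decimal("0"))


if __name__ == "__main__":
    ranked = sorted(totals_by_category(LINE_ITEMS).items(),
                    key=lambda pair: pair[1], reverse=True)
    for category, cost in ranked:
        print(f"{category:<12} £{cost:,.2f}")
    total = monthly_total(LINE_ITEMS)
    print(f"Monthly £{total:,.2f}, annual £{total * 12:,.2f}")
```

Using `Decimal` instead of floats keeps the pence exact, which matters when the roll-up has to reconcile to the calculator export.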

Important Planning Notes

This estimate is the current calculator export, not a discounted contract rate. It is suitable for planning and client presentation, but not a commercial quote.

The VM lines in the export were costed at full monthly runtime. Once business-hours auto-shutdown is applied, day-to-day compute spend should land lower than this baseline.

SQL Managed Instance stays on all the time. That is why its cost remains one of the dominant line items even if the VMs are scheduled down overnight.

Potential optimisations exist later. Azure Hybrid Benefit or reserved pricing could reduce costs once the client commits to the estate long term, but those savings should not be assumed up front.

Conclusion

The recommended test environment is worth funding because it removes live operational risk, not because it is the cheapest possible build. At about £2.39k per month, MessageDirect gets a secure, production-shaped platform where releases, restores, integration changes, and troubleshooting can happen safely before anything reaches the live Pearl service.

The cost is driven by the controls that make the environment credible: proper SQL compatibility, secure private access, outbound isolation, and recoverability. Those are exactly the controls the client is buying, and they are what turn this from a simple lab into a safe engineering platform.

16c

Azure Billing & Cost Control

Complete guide to Azure payment options, subscription types, and resource-level cost controls — with direct answers on what can be paused and what cannot.

Sources: Azure Cost Management docs · VM billing states

Direct client answer

Can this Azure estate be paused instead of paying Pay-As-You-Go all the time?

Partly, yes. The 3 virtual machines can be stopped and deallocated, which removes their compute runtime charge. However, Azure does not provide a single pause button for the whole estate. Several platform services continue billing while they remain deployed — especially SQL Managed Instance, Azure Firewall, Bastion, storage, backup retention, monitoring, and support cover.

Best short-term saving: auto-shutdown and deallocation of VM1, VM2, and especially VM3 outside working hours.
Main always-on cost: Azure SQL Managed Instance at £660.15/month in the current model.
Main fixed infrastructure cost: Azure Firewall and Bastion at £845.87/month combined while deployed.
Best long-term savings: Azure Hybrid Benefit, 1-year or 3-year reservations, and rightsizing once real usage is measured.

Important Terminology

In Azure, the word "subscription" has two distinct meanings:

Account-level subscription — the commercial agreement type (Pay-As-You-Go, Enterprise Agreement, CSP, etc.) that governs how Microsoft bills the customer.
Resource purchase model — the pricing method applied to a specific resource (Reserved Instance, Savings Plan, Azure Hybrid Benefit, etc.).

For this estate, the resource purchase model is the more important cost-control lever. Both categories are documented comprehensively below.

Account-level subscription types are the commercial agreements under which an Azure subscription can be created. They affect billing relationship, discount eligibility, and governance, but they do not change the technical behaviour of individual resources.

Current baseline

Pay-As-You-Go (PAYG)

No upfront commitment. Resources are billed monthly at list price. This is the model used in the current estimate.

Azure docs →

Recommended to evaluate

Microsoft Customer Agreement (MCA) / Azure Plan

The modern replacement for older PAYG. Provides access to Azure savings plans and reservations with a cleaner billing experience.

Azure docs →

For larger organisations

Enterprise Agreement (EA)

Volume licensing agreement typically for 500+ seats. Provides upfront monetary commitment with significant discounts, centralised billing, and reserved instance pricing.

Azure docs →

Partner-managed

Cloud Solution Provider (CSP)

Azure purchased through a Microsoft partner who manages billing, support, and provisioning. Pricing set by the partner; can include bundled services.

Azure docs →

Strong fit for test environments

Pay-As-You-Go Dev/Test

Discounted subscription type for non-production workloads. Removes Windows Server licence charges on VMs and provides reduced rates on several services. Requires Visual Studio subscription.

Azure docs →

Licence-linked

Visual Studio / MSDN Subscriber

Monthly Azure credits for Visual Studio subscribers. Suited for individual dev/test use, not production workloads. Enterprise subscribers receive up to £115/month in credits.

Azure docs →

EA-linked

Enterprise Dev/Test

Non-production subscription under an Enterprise Agreement. Same Dev/Test discounts but governed through the EA portal. Requires EA with Visual Studio subscribers.

Azure docs →

Onboarding / evaluation

Azure Free Trial

£150 credit for 30 days plus 12 months of free-tier services. Not suitable for a standing test environment, but useful for initial proof-of-concept.

Azure docs →

Academic / learning

Azure for Students

$100 credit per year, no credit card required. For verified students and educators only. Not applicable to commercial test environments.

Azure docs →

Special programmes

Azure Sponsorship

Credits provided through Microsoft programmes such as BizSpark, Imagine, or event sponsorships. Time-limited and usage-capped.

Azure docs →

Partner benefit

Microsoft Partner Network (MPN)

Monthly Azure credits for Microsoft partners (Action Pack, Silver, Gold, or Solutions Partner). Useful for internal demos and development.

Azure docs →

Resource purchase models are the pricing mechanisms applied to individual Azure resources. They determine how much each service actually costs and are the primary lever for cost optimisation on this estate.

Current baseline

Pay-As-You-Go

Default hourly/monthly billing at list price. No commitment, full flexibility, but the most expensive option for predictable workloads.

Azure pricing overview →

Typically 20-40% saving

Reserved Instances (1-year)

Commit to a specific resource size for 12 months. Applies to VMs, SQL MI, Cosmos DB, and others. Typically 20-40% cheaper than PAYG.

Azure docs →

Up to 72% saving

Reserved Instances (3-year)

Same as 1-year but with a 36-month commitment. Provides the deepest discount but requires confidence the environment will run long-term.

Azure docs →

Flexible commitment

Savings Plan for Compute

Commit to a fixed hourly spend (e.g., £0.50/hr) across any VM family or region. More flexible than Reserved Instances, but with a slightly smaller discount.

Azure docs →

Strong fit — check eligibility

Azure Hybrid Benefit

Use existing Windows Server or SQL Server licences (with Software Assurance) to reduce Azure VM and SQL MI costs. Can save up to 85% combined with Reserved Instances.

Windows Server docs → · SQL docs →

Non-production discount

Dev/Test Pricing

Reduced rates when running under a Dev/Test subscription. Removes Windows licence surcharge on VMs and gives lower rates on selected PaaS services.

Azure docs →

Not recommended for this estate

Spot VMs

Deeply discounted spare capacity (up to 90% off), but Azure can evict with 30 seconds notice. Unsuitable for the Pearl web, worker, and restore roles.

Azure docs →

Usage-based services

Consumption / Serverless

Pay only for what is consumed — applies to Azure Functions, Logic Apps, Event Grid, and similar. Not directly relevant to the VM-based Pearl architecture.

Azure docs →

Volume ingestion

Commitment Tiers

Fixed daily commitment for services like Log Analytics. Provides a per-GB discount when ingestion volume is predictable. Tiers start at 100 GB/day and step up through 200, 300, 400, and 500 GB/day, with larger tiers above that.

Azure docs →

Large-scale storage

Reserved Capacity (Storage)

Pre-purchase storage capacity at a discount. Only cost-effective at very large scale (100+ TB). Not applicable to the current 500 GB Blob Storage allocation, which costs about £8 per month.

Azure docs →

Resource-by-Resource Billing Matrix

How each resource in the current £2,387.62/month estimate behaves when you try to stop, pause, or reduce it.

Azure Resource	Current Cost	Applicable Purchase Models	Can It Be Stopped?
VM1, VM2, VM3 Compute runtime	£517.78	PAYG · Reserved Instance · Savings Plan · Azure Hybrid Benefit · Dev/Test pricing	✅ Yes. Stop + deallocate removes compute charges. Auto-shutdown schedulable. Disks, backup retention, monitoring, and Defender still bill.
Managed Disks Attached to VMs	Incl. in VM baseline	PAYG on provisioned disk size	⚠️ No pause. Disk charges continue while the VM is deallocated. Must delete or downgrade to save.
Azure SQL Managed Instance General Purpose, 4 vCores	£660.15	PAYG · Reserved Instance · Azure Hybrid Benefit for SQL	❌ No practical stop-start. Always-on in this design. Savings from reservation, rightsizing, or licence benefit.
Azure Firewall Standard tier	£686.58	PAYG (deployment hours + data processed)	❌ No meaningful pause. Bills while deployed. Delete for long mothball periods only.
Azure Bastion Standard tier	£159.29	PAYG (deployment hours)	❌ No meaningful pause. Bills while deployed. Same approach as Firewall.
Blob Storage LRS, Hot tier	£8.15	PAYG · Lifecycle tiering (Hot/Cool/Archive)	⚠️ No pause concept. Use lifecycle policies and Cool/Archive tiers for cost reduction.
Monitor + Log Analytics	£140.23	PAYG ingestion · Commitment tiers	⚠️ Tuneable. Reduce diagnostics, shorten retention, disable verbose collection.
Backup + Update Manager	£107.90	PAYG per protected instance	⚠️ Partial. Stop new backups, but retained recovery points still incur storage cost.
Support Plan	£75.24	Monthly subscription	⚠️ Downgrade only. Can be cancelled or moved to Basic (free) tier.
Security: Key Vault + Defender	£32.30	PAYG per operation / per protected resource	⚠️ Disableable. Defender can be turned off, but not recommended for security posture.

✅ Best Immediate Savings

Enforce business-hours auto-shutdown for VM1 and VM2. Keep VM3 off by default — run it only when builds, restores, or masking jobs are needed. This targets the part of the estate that genuinely can be paused.

🔒 Costs That Stay Fixed

SQL MI (£660), Firewall (£687), and Bastion (£159) total £1,506/month and do not pause. They are the compatibility and isolation controls this environment requires.

📉 Best Long-Term Discounts

Azure Hybrid Benefit + 1-year Reserved Instances for SQL MI and VMs are the strongest levers. Dev/Test subscription pricing is also a strong fit if Visual Studio licences exist.
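
The split between fixed and pausable spend can be sketched numerically. The snippet below is illustrative only: the always-on amounts come from the billing matrix above, while the 07:00 to 19:00 weekday window is an assumption borrowed from the auto-shutdown control described elsewhere in this proposal. The runtime fraction applies to VM compute runtime only. Disk, backup, and monitoring charges on the same VMs continue while they are deallocated, so the fraction is not a direct multiplier for the £517.78 compute line.

```python
from decimal import Decimal

# Always-on services and their monthly cost in GBP (from the billing matrix).
ALWAYS_ON = {
    "SQL Managed Instance": Decimal("660.15"),
    "Azure Firewall": Decimal("686.58"),
    "Azure Bastion": Decimal("159.29"),
}
ESTIMATE_TOTAL = Decimal("2387.62")
HOURS_PER_WEEK = 7 * 24


def fixed_monthly_cost(services=ALWAYS_ON):
    """Monthly spend that continues regardless of VM scheduling."""
    return sum(services.values(), Decimal("0"))


def weekly_running_hours(start_hour=7, stop_hour=19, working_days=5):
    """Hours per week a VM runs under a weekday-only schedule (assumed window)."""
    return (stop_hour - start_hour) * working_days


def running_fraction(start_hour=7, stop_hour=19, working_days=5):
    """Share of the week the VMs are running under that schedule."""
    hours = weekly_running_hours(start_hour, stop_hour, working_days)
    return Decimal(hours) / Decimal(HOURS_PER_WEEK)


if __name__ == "__main__":
    fixed = fixed_monthly_cost()
    print(f"Fixed: £{fixed:,.2f} ({fixed / ESTIMATE_TOTAL:.0%} of the estimate)")
    print(f"VM runtime under schedule: {running_fraction():.0%} of the week")
```

Under these assumptions the VMs run 60 of the 168 hours in a week, while the three always-on services account for roughly 63% of the monthly estimate.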

Cost-Cutting Options: What Could Be Removed?

This focuses on the resources already in the current estimate. It distinguishes where cost can realistically be cut, what risk that introduces, and how that risk could be controlled if the client chooses to reduce scope.

Resource / Control	Potential Saving	Risk If Removed or Reduced	Mitigation
VM3 runtime hours Build / restore VM	Medium	Low risk if managed properly. Build, restore, and masking jobs will not be instantly available outside planned windows.	Keep VM3 powered off by default, publish a startup runbook, and schedule restore/build windows ahead of testing cycles.
Azure Monitor / Log Analytics volume Diagnostics + retention	Low to Medium	Low operational risk. Reduced visibility into faults, slower troubleshooting, and less historical evidence.	Keep core health, security, and deployment logs; reduce verbose diagnostics only; shorten retention for non-critical data first.
Backup retention depth Recovery point storage	Low to Medium	Moderate resilience risk. Fewer restore points mean less ability to recover from older defects or operator mistakes.	Agree a minimum recovery objective, retain enough points for weekly rollback needs, and document the reduced recovery window explicitly.
Support plan downgrade Standard to Basic	~£75/month	Low technical risk, moderate support risk. Slower or no Microsoft escalation path if an Azure platform issue occurs.	Only downgrade if the client is comfortable relying on internal support and partner support during test-only operations.
Defender coverage scope Security plan reduction	Low	Moderate security risk. Reduced threat detection, weaker posture reporting, and less evidence for security review.	Only reduce on clearly non-sensitive components, and keep compensating controls such as NSGs, Key Vault, audit logs, and patching in place.
Azure Bastion Admin access layer	~£159/month	High access risk. Removing Bastion weakens the secure remote-access model and pushes the team toward less controlled admin access patterns.	Only remove if the environment is shut down for an extended period. Recreate Bastion before resuming active engineering or testing.
Azure Firewall Outbound control + segmentation	~£687/month	High security and compliance risk. Loss of central egress filtering and reduced confidence that test traffic cannot reach unintended external targets.	Not recommended while the environment is active. If ever removed, replace with an agreed alternative control set and re-run the security design review.
Azure SQL Managed Instance Core database platform	~£660/month	High platform risk. Removing it would remove the core database layer and effectively take the environment out of service.	Do not remove if the test environment is expected to remain usable. Cost reduction here should come from rightsizing, reservation, or licence benefit instead.
Core VM estate (VM1 / VM2) Web + worker runtime	High, but disruptive	High service risk. Removing or permanently stopping core VMs breaks the application runtime and prevents meaningful test execution.	Do not remove during active project use. Restrict savings to scheduled shutdown outside business hours rather than removal.

Safest Cuts

VM3 runtime scheduling, log-volume tuning, support-plan downgrade, and carefully reduced backup retention are the lowest-risk savings available inside the current design.

Conditional Cuts

Defender scope reduction and Bastion removal are possible only if the client explicitly accepts weaker security or access posture for a period of time.

Not Recommended To Remove

SQL MI, Azure Firewall, and the core VM runtime are foundational. Removing them is less a cost optimisation and more a decision to partially or fully suspend the environment.

📚 Azure Documentation References

All statements in this section are backed by official Microsoft documentation:

VM billing states — learn.microsoft.com/…/virtual-machines/states-billing

Reserved Instances — learn.microsoft.com/…/reservations/save-compute-costs-reservations

Savings Plan for Compute — learn.microsoft.com/…/savings-plan/savings-plan-compute-overview

Azure Hybrid Benefit (Windows) — learn.microsoft.com/…/azure-hybrid-benefit

Azure Hybrid Benefit (SQL) — learn.microsoft.com/…/azure-sql/azure-hybrid-benefit

SQL MI pricing model — learn.microsoft.com/…/managed-instance/pricing-model

Spot VMs — learn.microsoft.com/…/virtual-machines/spot-vms

Azure Firewall pricing — azure.microsoft.com/…/pricing/details/azure-firewall

Azure Bastion pricing — azure.microsoft.com/…/pricing/details/azure-bastion

Dev/Test pricing — azure.microsoft.com/…/pricing/dev-test

Microsoft Customer Agreement — learn.microsoft.com/…/understand/mca-overview

Enterprise Agreement — learn.microsoft.com/…/ea-portal-enrollment-invoices

17

Database Backup & Restore Strategy

Dual-mode data pipeline: Testing Mode for regression and Debug Mode for real-world investigation.

Source: RFP Section 5.2

CLEAN & DETERMINISTIC

Testing Mode — Regression Testing

The database is wiped and reloaded with a known synthetic seed dataset, ensuring repeatable, deterministic regression test results every time.

DROP all user databases on Test SQL MI
CREATE fresh databases from schema-only scripts (versioned in Git)
EXECUTE seed data scripts — synthetic companies, operators, callers, DDIs, messages
CLEAR Memcached + Solr indexes
REBUILD Solr indexes from seed data
VALIDATE + notify — "Testing Mode ready"

✅ Key Characteristics

Deterministic — Same data every time for reliable assertions
Fast — Schema + seed scripts in minutes (no large .bak downloads)
Zero PII risk — All data is synthetic, no masking needed
Versioned — Seed scripts in Git alongside application code

Debug Mode — Anonymised Production Data

Loads anonymised production data to investigate real-world issues that cannot be reproduced with synthetic data. Weekly automated pipeline.

sequenceDiagram
    participant ProdMI as Prod SQL MI
    participant Blob as Azure Blob Storage
    participant VM3 as VM3 (Restore Tool)
    participant TestMI as Test SQL MI
    Note over ProdMI: Weekly backup (Sat 02:00 UTC)
    ProdMI->>Blob: BACKUP DATABASE TO URL<br/>17 databases → .bak files
    Blob->>Blob: Encrypted at rest (SSE)<br/>Retention: 4 weekly copies
    Note over VM3: Saturday 06:00 UTC
    VM3->>Blob: Download .bak files via SAS token
    VM3->>TestMI: RESTORE DATABASE<br/>(all 17 databases sequentially)
    Note over VM3: Post-restore masking
    VM3->>TestMI: Execute T-SQL masking scripts<br/>(PII anonymisation per database)
    VM3->>TestMI: Repoint ConfigStrings to test endpoints
    VM3->>TestMI: Validate row counts + key queries
    VM3->>VM3: Log + email notification
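
The "4 weekly copies" retention rule in the pipeline above can be expressed as a small selection function. This is a sketch under stated assumptions: the proposal does not say whether pruning is done by a Blob lifecycle policy or by the restore tool, and the function name and date-based input are hypothetical.

```python
from datetime import date

RETENTION_COPIES = 4  # "Retention: 4 weekly copies" in the pipeline above


def split_by_retention(backup_dates, keep=RETENTION_COPIES):
    """Return (kept, expired) backup dates, keeping the newest `keep` copies."""
    ordered = sorted(set(backup_dates), reverse=True)
    return ordered[:keep], ordered[keep:]


if __name__ == "__main__":
    # Five consecutive Saturday backups (illustrative dates).
    saturdays = [date(2026, 2, 7), date(2026, 2, 14), date(2026, 2, 21),
                 date(2026, 2, 28), date(2026, 3, 7)]
    kept, expired = split_by_retention(saturdays)
    print("keep:  ", [d.isoformat() for d in kept])
    print("delete:", [d.isoformat() for d in expired])
```

Selecting by count rather than by age means a missed weekly backup never causes the environment to drop below four restorable copies.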

✅ Key Characteristics

Real-world data shape — Actual distributions, edge cases
Anonymised — All PII masked via T-SQL scripts
Weekly refresh — Automated Saturday pipeline
Approval-gated — Requires team lead to switch modes

Restore Tool — Mode Selection

A single CLI tool supports both modes: --mode testing (clean + seed) or --mode debug (backup + restore + mask). Both modes repoint ConfigStrings to test endpoints and validate data integrity before marking the environment ready.
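
A minimal sketch of how such a mode switch could be wired is given below. Only the `--mode testing` and `--mode debug` values come from this proposal. The program name `restore-tool`, the step labels, and the function names are assumptions, and each step is only printed here, not executed.

```python
import argparse

# Step labels paraphrase the two pipelines described above. The real tool's
# internals are not shown in this proposal, so each step is just a label.
STEPS = {
    "testing": [
        "drop user databases on Test SQL MI",
        "create databases from schema-only scripts",
        "run seed data scripts",
        "clear Memcached and Solr indexes",
        "rebuild Solr indexes from seed data",
    ],
    "debug": [
        "download .bak files from Blob Storage",
        "restore all databases",
        "run PII masking scripts",
    ],
}
# Both modes end with the same two steps.
COMMON_TAIL = ["repoint ConfigStrings to test endpoints", "validate and notify"]


def build_parser():
    parser = argparse.ArgumentParser(prog="restore-tool")  # hypothetical name
    parser.add_argument("--mode", choices=sorted(STEPS), required=True)
    return parser


def plan(mode):
    """Return the ordered list of steps for the selected mode."""
    return STEPS[mode] + COMMON_TAIL


def main(argv=None):
    args = build_parser().parse_args(argv)
    for number, step in enumerate(plan(args.mode), start=1):
        print(f"{number}. {step}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Keeping the shared tail in one place makes it harder for either mode to skip the ConfigStrings repointing, which is the step that stops a restored database from calling live endpoints.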

Database	Tables	PII Fields	Masking Approach
PearlData	Callers, CallerHistory	Name, Phone, Email, Address	Faker-generated UK data
PearlUsers	Users, CompanyContacts	Name, Email, Phone, Password	Hashed/randomised
PearlBilling	Invoices, Payments	CustomerName, BankDetails, CardRefs	Synthetic replacement
Messages / SMS	MessageContent, SMSSpoolOutgoing	CallerName, Phone, MessageText, Mobile	Faker replacement + anonymised
PearlLog	Various log tables	PII embedded in log payloads	Truncated/replaced
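
The masking approaches listed above are specified as T-SQL scripts with Faker-style replacement values. The Python sketch below illustrates a different but related technique, deterministic pseudonymisation, purely to show one useful property: the same input always maps to the same synthetic value, so joins between tables still line up after masking. The function names, the salt, and the `example.test` domain are assumptions. The phone range used is Ofcom's block of UK mobile numbers reserved for drama use, which holds only 1,000 values, so collisions are expected.

```python
import hashlib


def _digest(value, salt):
    """Salted SHA-256 hex digest of a value."""
    return hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()


def mask_email(email, salt):
    """Replace an email address with a synthetic one on a reserved test domain."""
    return f"user-{_digest(email.strip().lower(), salt)[:12]}@example.test"


def mask_uk_mobile(phone, salt):
    """Replace a phone number with one from the reserved 07700 900xxx range."""
    suffix = int(_digest(phone.replace(" ", ""), salt), 16) % 1000
    return f"07700900{suffix:03d}"


if __name__ == "__main__":
    salt = "per-refresh-secret"  # hypothetical; would be held in Key Vault
    print(mask_email("Jane.Doe@client.co.uk", salt))
    print(mask_uk_mobile("07123 456789", salt))
```

Changing the salt on every weekly refresh would stop masked values from being correlated across refreshes, at the cost of losing continuity between them.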

✅ No Masking Required (config/reference data only)

PearlOperations, PearlSwitch, PearlAnalysis, PearlSearch, LookupDBs, ASPNET, Checking

18

Integration Safety Model — All 22 Services

Every external integration safely sandboxed, stubbed, or disabled. Full inventory from third-party dependencies index.

flowchart LR
    subgraph TestVM1["Test VM1 (Web Tier)"]
        Pearl["Pearl Test Instance"]
    end
    subgraph Sandbox["Sandbox Mode ✅ (7)"]
        G["Genesys Cloud CX"]
        GSA["Genesys Speech Analytics"]
        ST["Stripe (sk_test)"]
        GCS["GoCardless"]
        SP["SagePay Simulator"]
        XR["Xero"]
        MG["Mailgun (sandbox domain)"]
    end
    subgraph Disabled["Disabled ⛔ (9)"]
        EL["ElevenLabs"]
        SMS1["MediaBurst"]
        SMS2["MessageBird"]
        SMS3["ClickSend"]
        BQ["BigQuery"]
        ZD["Zoho Desk/CRM"]
        TW2["Twitter (X)"]
        TP["Trustpilot"]
        SX["Sinerix"]
    end
    subgraph Local["Local / Isolated (5)"]
        Solr["Solr (localhost)"]
        MC["Memcached (localhost)"]
        TM["Totem (VM2)"]
        AI["Azure OpenAI (test)"]
        TW["Twilio (test creds)"]
    end
    subgraph Storage["Separate Resource (1)"]
        S3["S3 Test Bucket"]
    end
    Pearl --> G
    Pearl --> GSA
    Pearl --> ST
    Pearl --> GCS
    Pearl --> SP
    Pearl --> XR
    Pearl --> MG
    Pearl -.->|"disabled"| EL
    Pearl -.->|"disabled"| SMS1
    Pearl -.->|"disabled"| SMS2
    Pearl -.->|"disabled"| SMS3
    Pearl -.->|"disabled"| BQ
    Pearl -.->|"disabled"| ZD
    Pearl -.->|"disabled"| TW2
    Pearl -.->|"disabled"| TP
    Pearl -.->|"disabled"| SX
    Pearl --> Solr
    Pearl --> MC
    Pearl --> TM
    Pearl --> AI
    Pearl --> TW
    Pearl --> S3

Complete Integration Inventory

Source: 53-third-party-dependencies-index.md + Phase 1 codebase analysis

☎️ Telephony & Voice (3)

Genesys Cloud CX → Sandbox org Risk: Test triggers live call routing, screen pops hit production operators

Twilio → Test credentials Risk: Real voice calls placed, real SMS sent, charges incurred

ElevenLabs → Disabled Risk: AI voice calls to real numbers, TTS API credit consumption

💳 Payments (3)

Stripe → Test mode (sk_test) Risk: Real credit card charges, production webhooks contaminated

GoCardless → Sandbox Risk: Real DD mandates created, customer bank accounts debited

SagePay/Opayo → Simulator Risk: Real card payments via Answer.co.uk brand

📨 Communications (4)

Mailgun → Sandbox domain Risk: Real emails to customers — notifications, invoices, alerts

MediaBurst (Route 21) → Disabled Risk: Real SMS to customer mobiles

MessageBird (Route 22) → Disabled Risk: SMS failover fires to real numbers

ClickSend (Route 23) → Disabled Risk: Third SMS failover route sends to real numbers

🧠 AI & Analytics (3)

Azure OpenAI → Isolated instance Risk: QC scoring pollutes production analytics, API credits consumed

Genesys Speech Analytics → Sandbox org Risk: Results written to production Genesys, corrupting real QC data

BigQuery → Disabled Risk: Test data exported to production dataset, corrupts analytics

🔧 Business Tools (4)

Xero → Sandbox Risk: Invoices in production Xero, reconciliation corrupted

Zoho Desk/CRM → Disabled Risk: Test tickets in production Zoho, CRM stats corrupted

Twitter (X) → Disabled Risk: Test posts to production company accounts

Trustpilot → Disabled Risk: Review invitations sent to real customers

🗄️ Storage & Infrastructure (5)

Amazon S3 → Separate bucket Risk: Test writes to prod S3, backups/recordings overwritten

Apache Solr → Local (localhost:8983) Risk: Test indexing corrupts production search indexes

Memcached → Local (localhost:11211) Risk: Test cache writes corrupt production cache state

Totem → Test instance (VM2) Risk: Screen pops sent to production operator browsers

Sinerix → Disabled Risk: E-signature requests to real sessions
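
The inventory above lends itself to a fail-closed lookup that application code could consult before any outbound call. The sketch below is illustrative, not the Pearl implementation: the service names and modes are transcribed from the inventory, while the four mode labels, the exception type, and the function name are assumptions.

```python
from collections import Counter

# Mode per service, transcribed from the inventory above.
# "isolated" covers local, test-credential, and test-instance setups.
INTEGRATIONS = {
    "Genesys Cloud CX": "sandbox",
    "Twilio": "isolated",
    "ElevenLabs": "disabled",
    "Stripe": "sandbox",
    "GoCardless": "sandbox",
    "SagePay/Opayo": "sandbox",
    "Mailgun": "sandbox",
    "MediaBurst": "disabled",
    "MessageBird": "disabled",
    "ClickSend": "disabled",
    "Azure OpenAI": "isolated",
    "Genesys Speech Analytics": "sandbox",
    "BigQuery": "disabled",
    "Xero": "sandbox",
    "Zoho Desk/CRM": "disabled",
    "Twitter (X)": "disabled",
    "Trustpilot": "disabled",
    "Amazon S3": "separate",
    "Apache Solr": "isolated",
    "Memcached": "isolated",
    "Totem": "isolated",
    "Sinerix": "disabled",
}


class IntegrationBlocked(RuntimeError):
    """Raised when a call to a disabled or unknown integration is attempted."""


def require_enabled(service):
    """Return the service's test mode, or raise if it must not be called."""
    mode = INTEGRATIONS.get(service)
    if mode is None:
        # Fail closed: an integration missing from the inventory is blocked.
        raise IntegrationBlocked(f"{service} is not in the test inventory")
    if mode == "disabled":
        raise IntegrationBlocked(f"{service} is disabled in the test environment")
    return mode


if __name__ == "__main__":
    print(dict(Counter(INTEGRATIONS.values())))
    print(require_enabled("Stripe"))
```

An application-level guard like this complements the firewall allowlist and does not replace it: the firewall still stops anything the code misses.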

Integration Safety Summary

Sandbox · Disabled · Local / Isolated · Separate Resource · Deprecated (No Action)

19

CI/CD Pipeline & Infrastructure as Code

GitHub Actions with self-hosted runner, Terraform for multi-environment spawning, approval gates and rollback.

Source: RFP Section 5.4 & 5.5

Terraform — Multi-Environment Spawning

Per the RFP requirement, Terraform is the primary IaC tool for its superior multi-environment capabilities. A single terraform apply with variable overrides can provision N parallel test environments — e.g., one per feature branch or test cycle.

terraform/
├── modules/
│   ├── networking/      # Hub-Spoke VNets, NSGs, Bastion, Firewall
│   ├── compute/         # VM1 (Web), VM2 (Worker), VM3 (Build)
│   ├── database/        # SQL MI (General Purpose)
│   ├── security/        # Key Vault, RBAC, Managed Identities
│   ├── monitoring/      # Log Analytics, Azure Monitor, Audit
│   └── storage/         # Blob Storage, SAS policies
├── environments/
│   ├── test-01/         # Primary test environment
│   ├── test-02/         # Feature branch environment
│   └── test-N/          # N-th on-demand environment
├── main.tf              # Root module composition
├── variables.tf         # Parameterised config
└── backend.tf           # Azure Blob remote state
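
To make the per-environment flow concrete, the sketch below builds the Terraform command sequence for one named environment. It only constructs the commands and does not run them. The `terraform.tfvars` file name, the plan file naming, and the use of one workspace per environment are assumptions layered on the directory layout above, and `workspace select` presumes the workspace already exists.

```python
from pathlib import PurePosixPath


def terraform_commands(env_name, root="terraform"):
    """Build the init, plan, and apply commands for one test environment."""
    # Paths are relative to `root` because -chdir switches directory first.
    var_file = PurePosixPath("environments") / env_name / "terraform.tfvars"
    plan_file = f"{env_name}.tfplan"
    base = ["terraform", f"-chdir={root}"]
    return [
        base + ["init"],
        base + ["workspace", "select", env_name],
        base + ["plan", f"-var-file={var_file}", f"-out={plan_file}"],
        # Applying the saved plan guarantees that what was reviewed is what runs.
        base + ["apply", plan_file],
    ]


if __name__ == "__main__":
    for command in terraform_commands("test-02"):
        print(" ".join(command))
```

Splitting `plan` and `apply` around a saved plan file gives the approval gate a fixed artefact to review before anything is provisioned.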

flowchart TB
    subgraph DevSub["Developer"]
        D1["Push / Pull Request"]
    end
    subgraph Terraform["Terraform IaC"]
        TF1["terraform plan"]
        TF2["terraform apply"]
        TF3["Provision N environments"]
    end
    subgraph GHActions["GitHub Actions Workflows"]
        subgraph BuildWF["build.yml"]
            B1["Checkout code"]
            B2["NuGet restore"]
            B3["MSBuild all 7 components"]
            B4["aspnet_compiler precompile"]
            B5["Package artefacts"]
        end
        subgraph DeployWF["deploy-test.yml (manual)"]
            D2["🔒 Approval gate"]
            D3["Select target env (test-01..N)"]
            D4["Stop IIS + services"]
            D5["Backup → _rollback/"]
            D6["Robocopy artefacts → VMs"]
            D7["Start IIS + restart"]
            D8["Health check"]
        end
        subgraph DBMode["db-mode-switch.yml"]
            DB1["Select: testing / debug"]
            DB2["Execute restore tool"]
            DB3["Validate + notify"]
        end
        subgraph RollbackWF["Rollback"]
            R1["Restore _rollback/"]
            R2["Restart + notify"]
        end
    end
    D1 --> B1
    B1 --> B2 --> B3 --> B4 --> B5
    B5 --> D2
    D2 --> D3 --> D4 --> D5 --> D6 --> D7 --> D8
    D8 -->|"Failure"| R1
    R1 --> R2
    TF1 --> TF2 --> TF3

Deployment Strategy

Web Apps

Robocopy artefacts to IIS physical paths — mirrors current production method

Rollback: Previous build backed up to _rollback/

Workers

Stop Windows Service → copy binaries → restart service

Rollback: Same backup/restore approach

Database Modes

Testing mode: clean + seed scripts. Debug mode: backup/restore + mask. Switchable via pipeline.

Rollback: Re-run mode switch to restore state

Infrastructure

Terraform plan → apply with approval. Entire environment from code. Multi-env via workspaces.

Rollback: terraform destroy + terraform apply
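
The backup, copy, health-check, and rollback sequence used for the web and worker tiers can be sketched in a few lines. This is an illustration of the pattern only: the proposal uses Robocopy and stops IIS or the Windows Service around the copy, neither of which is modelled here, and the `_rollback` location beside the site folder and the function name are assumptions.

```python
import shutil
from pathlib import Path


def deploy_with_rollback(artefacts, site_dir, health_check):
    """Copy `artefacts` into `site_dir`; restore the previous build if the check fails."""
    artefacts, site_dir = Path(artefacts), Path(site_dir)
    rollback_dir = site_dir.parent / "_rollback" / site_dir.name

    # 1. Snapshot the currently deployed build, replacing any older snapshot.
    if rollback_dir.exists():
        shutil.rmtree(rollback_dir)
    shutil.copytree(site_dir, rollback_dir)

    # 2. Replace the live folder with the new build (a mirror-style copy).
    shutil.rmtree(site_dir)
    shutil.copytree(artefacts, site_dir)

    # 3. Keep the new build only if the health check passes.
    if health_check(site_dir):
        return True
    shutil.rmtree(site_dir)
    shutil.copytree(rollback_dir, site_dir)
    return False
```

`health_check` is any callable that takes the deployed path and returns a boolean, for example a function that requests a status page and checks for HTTP 200.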

20

Security Controls — ISO 27001 Aligned

18 controls covering network, identity, data, and audit layers.

Source: RFP Section 5.6

Required

Subscription Isolation

Separate Azure subscription for test environment

Required

No Public IPs

All VM access via Azure Bastion only — no exposed endpoints

Required

NSG Micro-Segmentation

Per-subnet NSGs with least-privilege port rules

Required

Egress Filtering

Azure Firewall with allowlisted outbound only

Required

No Prod Connectivity

No VNet peering to production subscription — air gap

Required

Secrets in Key Vault

All connection strings and API keys in Key Vault

Required

Managed Identities

System-assigned MI for Key Vault & Blob access

Required

RBAC Least Privilege

Custom role definitions per persona (see RBAC section)

Enhanced

PIM for Admin

Just-in-time elevation for dangerous operations

Required

Disk Encryption

ADE (BitLocker) with Customer-Managed Keys

Required

SQL MI TLS

Encrypt=True, TrustServerCertificate=False

Required

Audit Logging

Dedicated PearlAudit database + Log Analytics

Enhanced

Auto-Shutdown

VMs off 19:00-07:00 and at weekends (running in business hours only)

Required

Resource Tagging

Environment=Test, Project=Pearl enforced tags

Enhanced

Azure Policy

Enforce: no public IPs, required tags, region lock

Required

Data Masking

PII anonymisation executed on every Debug Mode restore

Required

GDPR Lifecycle

Documented retention, purpose limitation, access controls

Required

Key Rotation

90-day secret rotation, 365-day CMK rotation via Key Vault

20b

Audit Logging & Trail System

Dedicated PearlAudit database with separate tables for every category of change — complete, tamper-resistant audit trail.

Source: RFP Section 9.2 Option D

flowchart TB
    subgraph Apps["Application Layer"]
        PA["pearl-azure"]
        PW["pearl-webservices"]
        WK["Workers"]
    end
    subgraph Collector["Audit Collector"]
        AC["Structured Logging API"]
    end
    subgraph AuditDB["PearlAudit Database (Dedicated)"]
        T1["Audit_ChangeLog<br/>INSERT / UPDATE / DELETE"]
        T2["Audit_AccessLog<br/>Login, page views, API calls"]
        T3["Audit_ConfigChanges<br/>ConfigString modifications"]
        T4["Audit_SecurityEvents<br/>RBAC changes, auth failures"]
        T5["Audit_IntegrationLog<br/>External API calls (sanitised)"]
        T6["Audit_DeploymentLog<br/>CI/CD events + artefact hashes"]
        T7["Audit_DataAccessLog<br/>PII access tracking"]
        T8["Audit_SystemEvents<br/>Infrastructure changes"]
    end
    subgraph Export["Long-Term Storage"]
        LA["Azure Log Analytics<br/>90 days hot / 365 days archive"]
    end
    PA --> AC
    PW --> AC
    WK --> AC
    AC --> T1
    AC --> T2
    AC --> T3
    AC --> T4
    AC --> T5
    AC --> T6
    AC --> T7
    AC --> T8
    T1 --> LA
    T4 --> LA
    T6 --> LA

📝

SQL Triggers

State-changing tables from the audit scope below write INSERT/UPDATE/DELETE events to Audit_ChangeLog with before/after JSON values.

🌐

Application Middleware

Global.asax captures page access and API call events with user identity, IP address, and correlation IDs

🔗

Integration Wrapper

All external API calls logged with PII-sanitised payloads, response codes, and timing data

🚀

CI/CD Hooks

GitHub Actions posts deployment events via secure webhook — artefact hashes, approver identity, timestamps
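
To illustrate the shape of a change record with before/after JSON values, the sketch below models one Audit_ChangeLog entry. The field names, helper functions, and example values are assumptions; the proposal states only that before/after values, user identity, and correlation IDs are captured, and in the proposed design the rows are written by SQL triggers, not by Python.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

VALID_OPERATIONS = {"INSERT", "UPDATE", "DELETE"}
_MISSING = object()


@dataclass(frozen=True)
class ChangeLogEntry:
    database: str
    table: str
    operation: str
    changed_by: str
    correlation_id: str
    before: Optional[dict]
    after: Optional[dict]
    occurred_at: str  # ISO 8601 timestamp in UTC


def make_entry(database, table, operation, changed_by, correlation_id,
               before=None, after=None, now=None):
    """Build one audit entry, rejecting unknown operation types."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return ChangeLogEntry(database, table, operation, changed_by,
                          correlation_id, before, after, timestamp)


def changed_fields(entry):
    """Names of columns whose value differs between before and after."""
    before, after = entry.before or {}, entry.after or {}
    return sorted(key for key in before.keys() | after.keys()
                  if before.get(key, _MISSING) != after.get(key, _MISSING))


def to_json(entry):
    """Serialise the entry, including nested before/after values, as JSON."""
    return json.dumps(asdict(entry), sort_keys=True)
```

For example, an UPDATE on `ConfigStrings` that changes only the `Value` column yields `changed_fields(entry) == ["Value"]`, which is the detail a reviewer usually needs first.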

Which Tables Need Direct Audit Coverage?

Not all 489 tables receive synchronous SQL triggers. The direct audit scope focuses on the 28 high-risk tables that can change configuration, permissions, customer data, financial records, queue execution, or regulated communication content. The rest remain covered through access logs, integration logs, deployment logs, and system events unless later discovery promotes them into the trigger scope.

Source DB	Tables in direct audit scope	Primary audit route	Why in scope
PearlOperations	`ConfigStrings`	SQL trigger → Audit_ConfigChanges	Controls API keys, endpoints, feature flags, and runtime behaviour.
PearlUsers	`Users`, `Permissions`, `LoginLogs`, `Companies`, `CompanyInfo`, `CompanyContacts`, `Rotas`, `Shifts`	SQL trigger → Audit_ChangeLog / Audit_SecurityEvents	Identity, tenancy, rota, and permission changes determine who can access the system and how escalations are routed.
PearlData	`Messages`, `Callers`, `CallerHistory`, `ScreenInits`, `PhysicalDDIs`	SQL trigger → Audit_ChangeLog / Audit_DataAccessLog	Core message-taking tables hold caller PII, operator-entered content, screen state, and DDI routing context.
PearlQueues	`DispatchQueue`, `Process_JobQueue`, `Process_MachineStates`, `JobSchedules`	SQL trigger → Audit_ChangeLog	These tables control background execution, dispatch timing, and worker coordination.
PearlBilling	`Invoices`, `BillItems`, `Payments`	SQL trigger → Audit_ChangeLog / Audit_IntegrationLog	Financial correctness, payment actions, and accounting exports depend on these records.
Messages	`MessageContent`	SQL trigger → Audit_DataAccessLog	Legacy message body storage contains customer communication content.
SMSBroadcast	`SMSSpoolOutgoing`	SQL trigger → Audit_IntegrationLog	Outbound customer communications need clear send intent and change history.
PearlLog	`PageAccessLogs`, `APILogs`, `ProcessLogs`, `SecurityExceptions`	Async logging → Audit_AccessLog / Audit_SecurityEvents	Evidence of user journeys, API misuse, worker faults, and suspicious activity.
PearlSwitch	`CallRecordings`	SQL trigger → Audit_DataAccessLog	Call recording metadata is sensitive operational evidence and needs attributable access history.
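
The scope table above can be held as data so that the 28-table count and the trigger/async split stay checkable as discovery promotes more tables. This is a minimal sketch: the database and table names come from the table, while the dictionary layout, the `trigger`/`async` labels, and the helper name are illustrative assumptions rather than anything that exists in Pearl.

```python
# Direct audit scope transcribed from the table above.
# The dictionary layout and helper name are illustrative assumptions.
AUDIT_SCOPE = {
    "PearlOperations": ("trigger", ["ConfigStrings"]),
    "PearlUsers": ("trigger", ["Users", "Permissions", "LoginLogs", "Companies",
                               "CompanyInfo", "CompanyContacts", "Rotas", "Shifts"]),
    "PearlData": ("trigger", ["Messages", "Callers", "CallerHistory",
                              "ScreenInits", "PhysicalDDIs"]),
    "PearlQueues": ("trigger", ["DispatchQueue", "Process_JobQueue",
                                "Process_MachineStates", "JobSchedules"]),
    "PearlBilling": ("trigger", ["Invoices", "BillItems", "Payments"]),
    "Messages": ("trigger", ["MessageContent"]),
    "SMSBroadcast": ("trigger", ["SMSSpoolOutgoing"]),
    "PearlLog": ("async", ["PageAccessLogs", "APILogs", "ProcessLogs",
                           "SecurityExceptions"]),
    "PearlSwitch": ("trigger", ["CallRecordings"]),
}


def tables_in_scope(route=None):
    """Return (database, table) pairs, optionally filtered by audit route."""
    return [
        (db, table)
        for db, (db_route, tables) in AUDIT_SCOPE.items()
        if route is None or db_route == route
        for table in tables
    ]


if __name__ == "__main__":
    print(len(tables_in_scope()))           # all tables in direct audit scope
    print(len(tables_in_scope("trigger")))  # tables covered by SQL triggers
    print(len(tables_in_scope("async")))    # tables covered by async logging
```

Keeping the scope in one structure means a change to the table list and a change to the stated total cannot drift apart unnoticed.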

20c

RBAC Hardening Strategy

Custom Azure role definitions with least-privilege access and Privileged Identity Management.

Source: RFP Section 5.1

Role	Scope	Permissions	Assigned To
Pearl-TestEnv-Admin	Subscription	Full Contributor + KV admin + SQL MI admin	Infra team lead (PIM-gated)
Pearl-TestEnv-Developer	Resource Group	VM Contributor + KV Secret Reader + SQL MI Read/Write	Development team
Pearl-TestEnv-QA	Resource Group	VM Reader + SQL MI Data Reader (read-only)	QA/testing team
Pearl-TestEnv-Deployer	Resource Group	VM Contributor (start/stop) + Blob Reader + KV Secret Reader	GitHub Actions SP
Pearl-TestEnv-DBA	SQL MI	SQL MI Contributor + KV Secret Reader	Database administrators
Pearl-TestEnv-Auditor	Log Analytics	Log Analytics Reader + Audit DB read-only	Compliance / audit
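
The least-privilege intent of the role table can be illustrated with a deny-by-default lookup. This is a sketch only: the role names are taken from the table, but the action strings (`vm:write`, `sql:read`, and so on) are hypothetical shorthand for the Azure permissions listed and do not correspond to real Azure action identifiers.

```python
# Role names come from the table above; the action strings are
# hypothetical shorthand, not real Azure RBAC action identifiers.
ROLE_GRANTS = {
    "Pearl-TestEnv-Developer": {"vm:read", "vm:write", "kv:secret-read",
                                "sql:read", "sql:write"},
    "Pearl-TestEnv-QA": {"vm:read", "sql:read"},
    "Pearl-TestEnv-Deployer": {"vm:start-stop", "blob:read", "kv:secret-read"},
    "Pearl-TestEnv-DBA": {"sql:manage", "kv:secret-read"},
    "Pearl-TestEnv-Auditor": {"logs:read", "auditdb:read"},
}


def is_allowed(role, action):
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ROLE_GRANTS.get(role, set())


if __name__ == "__main__":
    print(is_allowed("Pearl-TestEnv-QA", "sql:read"))
    print(is_allowed("Pearl-TestEnv-QA", "sql:write"))
```

The Admin role is left out of the sketch on purpose, since the table gates it behind PIM rather than granting it as a standing assignment.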

Privileged Identity Management (PIM)

🔓

Admin Elevation

4-hour max duration. Requires tech lead approval. For infrastructure changes and Key Vault management.

🗄️

SQL MI Direct Access

2-hour max duration. Requires DBA lead approval. For emergency database operations only.

🔍

Debug Mode Activation

8-hour max duration. Requires DPO approval. For loading unmasked production data.
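
The three elevation rules above reduce to a maximum duration and a required approver per elevation type. The sketch below checks a request against those limits. The durations and approver roles are taken from the cards above; the function name, the dictionary shape, and the idea of passing the approver as a plain string are assumptions made for illustration, and in practice these limits would be enforced by PIM role settings rather than application code.

```python
from datetime import timedelta

# Limits transcribed from the PIM cards above; the structure is illustrative.
PIM_POLICIES = {
    "Admin Elevation": {"max": timedelta(hours=4), "approver": "tech lead"},
    "SQL MI Direct Access": {"max": timedelta(hours=2), "approver": "DBA lead"},
    "Debug Mode Activation": {"max": timedelta(hours=8), "approver": "DPO"},
}


def check_elevation(kind, requested, approved_by):
    """Return a list of policy violations; an empty list means compliant."""
    policy = PIM_POLICIES.get(kind)
    if policy is None:
        return [f"unknown elevation type: {kind}"]
    problems = []
    if requested > policy["max"]:
        problems.append(f"duration {requested} exceeds maximum {policy['max']}")
    if approved_by != policy["approver"]:
        problems.append(f"requires approval by {policy['approver']}")
    return problems


if __name__ == "__main__":
    print(check_elevation("SQL MI Direct Access", timedelta(hours=3), "tech lead"))
```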

20d

Key Vault & Encryption Management

Customer-Managed Keys for all encryption layers with automated rotation policies.

Source: RFP Section 9.2 Option D

💿

VM Disk Encryption

ADE (BitLocker) with Customer-Managed Key (CMK) stored in Key Vault. All VM disks encrypted at rest.

🗄️

SQL MI TDE

Transparent Data Encryption with CMK. SQL MI TDE protector key rotated annually via Key Vault policy.

📦

Blob Storage SSE

Server-Side Encryption with CMK. Backup .bak files encrypted at rest with customer-controlled keys.

🔑

Secret Rotation

90-day rotation for API keys and service principal secrets. 365-day rotation for encryption CMKs. Expiry alerts fire 14 days ahead for secrets and 30 days ahead for CMKs.

Rotation Schedule

Key / Secret Type	Rotation Period	Method	Alert Threshold
Integration API keys	90 days	Manual rotate + KV version	14 days before expiry
Service principal secrets	90 days	Auto-rotate via KV policy	14 days before expiry
SQL MI connection string	90 days	Auto-rotate via Azure Function	14 days before expiry
Disk encryption CMK	365 days	Auto-rotate via KV policy	30 days before expiry
SQL MI TDE protector	365 days	Auto-rotate via KV policy	30 days before expiry
Blob encryption CMK	365 days	Auto-rotate via KV policy	30 days before expiry
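
The schedule above implies two dates for every key or secret: when it expires and when the alert should fire. The sketch below derives both from the last rotation date. The periods and thresholds are copied from the table; the function name and the assumption that expiry equals last rotation plus the rotation period are illustrative.

```python
from datetime import date, timedelta

# (rotation period in days, alert lead time in days), from the table above.
ROTATION_SCHEDULE = {
    "Integration API keys": (90, 14),
    "Service principal secrets": (90, 14),
    "SQL MI connection string": (90, 14),
    "Disk encryption CMK": (365, 30),
    "SQL MI TDE protector": (365, 30),
    "Blob encryption CMK": (365, 30),
}


def rotation_dates(secret_type, last_rotated):
    """Return (alert_on, expires_on), assuming expiry = last rotation + period."""
    period_days, alert_days = ROTATION_SCHEDULE[secret_type]
    expires_on = last_rotated + timedelta(days=period_days)
    alert_on = expires_on - timedelta(days=alert_days)
    return alert_on, expires_on


if __name__ == "__main__":
    alert_on, expires_on = rotation_dates("Integration API keys", date(2026, 3, 6))
    print(alert_on, expires_on)
```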

20e

Delivery Plan

RFP-aligned weekly plan targeting MVTE readiness by mid-May 2026.

Source: RFP Section 7

W1

Week 1 — Discovery & Validation

Current-state validation. Full integration inventory (all 22 services). Data strategy selection (dual-mode). Terraform module design.

✅ Deliverables: Discovery report + Integration safety matrix + Agreed design

W2-3

Weeks 2–3 — Environment Build & Isolation

Azure infrastructure via Terraform. Hub-Spoke VNets, VMs, SQL MI, Bastion, Firewall, Key Vault. Initial app deployment via CI/CD. Integration sandbox/stub configuration.

✅ Deliverables: Running test environment + Initial deployment validated

W4

Week 4 — Data Pipeline & Security

Dual-mode data pipeline (testing + debug modes). Weekly refresh automation. RBAC hardening and audit logging system. Runbooks and SOPs.

✅ Deliverables: Data pipeline operational + Runbooks delivered + Security controls verified

W5

Week 5 — Stabilisation & Handover

Stabilisation and defect fixes. Multi-environment spawning verification. Acceptance evidence pack. Handover session and recorded walkthrough.

✅ Deliverables: Acceptance criteria met + Handover complete

21

Implementation Roadmap

425 hours / 53 person-days of effort on the selected 3-VM + 1 DB design, delivered across 5 weeks. Target: mid-May 2026.

Source: RFP Section 7, 9.2

Discovery

Current-state validation

40h (5 days) ✅ Completed

RFP Option A — MVTE Build

3-VM + 1 DB foundation

105h (13 days)

18h — Hub-spoke network, subnets, Bastion, Firewall, NSGs
22h — VM1 web tier, IIS, Memcached, and Solr baseline
14h — VM2 worker tier and Windows runtime hosting
16h — VM3 build runner, MSBuild, and restore tooling
15h — SQL MI provisioning and private connectivity
20h — CI/CD workflows, rollback path, and smoke test

RFP Option B — Data Pipeline

Restore, mask, and refresh automation

96h (12 days)

22h — Blob intake and restore orchestration
20h — Restore sequencing for 17 databases
28h — PII masking for the six high-risk databases
12h — Testing-mode seed data and debug-mode controls
14h — Weekly refresh automation, validation, and reporting

RFP Option C — Multi-env

Cloneable environment pattern

32h (4 days)

12h — Parameterise IaC for extra 3-VM + 1 DB stacks
6h — Naming, address-space, and DNS conventions
8h — Environment-specific secrets and runner targeting
6h — Provisioning verification and demo of an extra environment

RFP Option D — Security

Hardening and audit evidence

56h (7 days)

14h — Azure RBAC, PIM roles, and admin segregation
12h — Key Vault secret governance and rotation policy
18h — Audit logging database, triggers, and application capture
12h — Monitoring evidence, policy checks, and control validation

Validation

Testing & handover

56h (7 days)

End-to-end deploy + restore test
Smoke test key user journeys
Integration safety verification
Documentation + recorded handover

Buffer

Stabilisation

40h (5 days)

🎯 MVTE Ready — Mid-May 2026 — 425 hours total

22

Hours Estimate

Three-perspective comparison plus the detailed RFP Option A-D breakdown on the selected 3-VM + 1 DB design.

Source: RFP Section 9.2 — Mandatory Table Format

RFP 9.2 — Mandatory Hours Breakdown

Work Package (RFP 9.2)	Original Est.	RFP-Adjusted	Final Proposed
Discovery	24-32h	32-40h	40h
RFP Option A — MVTE Build	64-88h	88-112h	105h
RFP Option B — Data Pipeline	64-80h	80-104h	96h
RFP Option C — Multi-env Capability	0h (not estimated)	24-32h	32h
RFP Option D — Enhanced Security	16-24h	48-64h	56h
Validation & Handover	32-40h	48-64h	56h
Buffer	40h	40h	40h
TOTAL	240-304h	360-456h	425h (53 days)
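
The Final Proposed column can be re-added mechanically to confirm the 425h total and the 53-day figure. This is a small sketch; the 8-hour working day is an assumption inferred from the table's own pairings (for example 40h shown as 5 days), and the variable names are illustrative.

```python
# Final Proposed hours per work package, from the table above.
FINAL_PROPOSED_HOURS = {
    "Discovery": 40,
    "RFP Option A - MVTE Build": 105,
    "RFP Option B - Data Pipeline": 96,
    "RFP Option C - Multi-env Capability": 32,
    "RFP Option D - Enhanced Security": 56,
    "Validation & Handover": 56,
    "Buffer": 40,
}

HOURS_PER_DAY = 8  # assumption: inferred from 40h being shown as 5 days

total_hours = sum(FINAL_PROPOSED_HOURS.values())
total_days = round(total_hours / HOURS_PER_DAY)

if __name__ == "__main__":
    print(f"{total_hours}h ({total_days} days)")
```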

Estimate Comparison

Original Team (phases.txt) 240-304h

RFP-Adjusted (+ Options C, D) 360-456h

Final Proposed 425h

Why the increase? The uplift comes from standing up the selected 3-VM + 1 DB foundation, adding the restore-and-mask data pipeline, and then layering on multi-environment capability plus the enhanced security controls required by the RFP.

Architecture option totals for topology comparison (each includes a 20% buffer and is rounded to the nearest 5 hours):

Option A — 3-VM

425h/53 days

Baseline RFP-compliant path

Option B — 2-VM + Functions

665h/83 days

Includes worker refactor to Functions

Option C — Single VM

390h/49 days

Lower infra effort, less isolation

Option D — 4-VM Full

460h/58 days

Extra tiering and deployment complexity

Detailed RFP Option A-D Breakdown on the Selected 3-VM + 1 DB Design

These are work packages delivered on top of the chosen architecture, not alternative topologies. RFP Option A carries the 105h baseline because it creates the private landing zone, provisions VM1, VM2, and VM3, wires up the single SQL Managed Instance, and proves the first repeatable deployment path.

Sub-activity (RFP Option A: MVTE Build)	Hours	Why it is needed
Hub-spoke network, subnets, NSGs, Bastion, and firewall rules	18h	Create the private landing zone before any VM or SQL resource is attached.
VM1 web tier provisioning and IIS baseline	22h	VM1 hosts `pearl-azure`, `pearl-webservices`, `utility-server`, Memcached, and Solr together.
VM2 worker tier provisioning and service hosting	14h	Keep the Windows workers separate from browser traffic and give them the required runtime dependencies.
VM3 build tier, GitHub runner, and toolchain setup	16h	Provide a dedicated build and restore server for deploys, restores, and masking workflows.
Azure SQL MI GP provisioning, private access, and base configuration	15h	Create the single managed database tier that must hold all 17 masked databases.
CI/CD workflows, initial deployment, rollback, and smoke test	20h	Make the environment usable by proving a repeatable deployment path into VM1 and VM2.
Total	105h	13 days

Sub-activity (RFP Option B: Data Pipeline)	Hours	Why it is needed
Backup intake from Blob and restore orchestration	22h	Use production backup files as the only approved path into the test SQL MI.
Restore sequencing for 17 databases and dependency handling	20h	Script restore order, logins, jobs, and cross-database checks inside the single managed instance.
PII masking for the six high-risk databases	28h	Make the shared test data safe before any non-production access is allowed.
Testing-mode seed data and debug-mode controls	12h	Support both repeatable regression data and a governed debugging path on the same estate.
Weekly refresh automation, validation checks, and reporting	14h	Turn refresh into a repeatable operational run rather than a one-off manual restore.
Total	96h	12 days

Sub-activity (RFP Option C: Multi-env)	Hours	Why it is needed
Parameterise IaC for extra 3-VM + 1 DB stacks	12h	Clone the selected pattern instead of inventing a new topology for each extra environment.
Naming, address-space, and DNS conventions per environment	6h	Keep each environment predictable and isolated when more than one exists.
Environment-specific secrets, runner targeting, and configuration selection	8h	Ensure the build tier can deploy to the right VM and SQL set without leaking credentials.
Provisioning verification and demo of an extra environment	6h	Prove that one additional environment can be created and validated from the template.
Total	32h	4 days

Sub-activity (RFP Option D: Security)	Hours	Why it is needed
Azure RBAC, PIM roles, and admin segregation	14h	Separate infrastructure, database, and deployment access inside the 3-VM estate.
Key Vault secret governance and rotation policy	12h	Formalise rotation, access scoping, and recovery procedures for the secrets already used by the design.
Audit logging database, triggers, and application capture	18h	Collect evidence of privileged activity, data changes, and release actions across web, worker, and data tiers.
Monitoring evidence, policy checks, and control validation	12h	Provide proof points and alerting so the controls are supportable and reviewable.
Total	56h	7 days

23

Risk Assessment

Key risks identified during Phase 1 discovery.

R1

Hardcoded IPs in Source Code

High Likelihood Medium Impact

Risk: AI spooler uses http://10.0.0.12, reporting uses 10.0.1.44, queue-processor references pearlsqlmi2, totem reads from C:\totemscripts\*.txt

Mitigation: Audit all source for hardcoded IPs/hostnames. Create web.config overrides or hosts file entries on test VMs. Provision required filesystem paths.
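
The source audit in this mitigation can start with a simple text scan. The sketch below flags IPv4 literals and a list of known production hostnames line by line. The regular expression, the function name, and the `KNOWN_HOSTS` list (seeded only with `pearlsqlmi2` from the risk statement) are assumptions for illustration; four-part version strings such as `1.2.3.4` would also match, so hits need manual review.

```python
import re

# Matches dotted-quad literals; it does not validate octet ranges.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hostnames to flag; seeded from the risk statement, extend as discovered.
KNOWN_HOSTS = ("pearlsqlmi2",)


def find_hardcoded_endpoints(text):
    """Return (line_number, match) pairs for IP literals and known hostnames."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ip in IPV4_PATTERN.findall(line):
            hits.append((line_no, ip))
        for host in KNOWN_HOSTS:
            if host in line:
                hits.append((line_no, host))
    return hits


if __name__ == "__main__":
    sample = 'url = "http://10.0.0.12/api"\nconn = "Server=pearlsqlmi2;"\n'
    print(find_hardcoded_endpoints(sample))
```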

R2

Data Masking Incompleteness

Medium Likelihood High Impact

Risk: PII may exist in unexpected columns/tables across 489+ tables.

Mitigation: Data discovery audit before go-live. Deny-by-default approach — mask all string columns in PII-sensitive tables. Review PearlLog payloads.
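
Deny-by-default masking means every string column is masked unless it has been explicitly reviewed and allow-listed. The sketch below shows the rule on a single row. The column names, the mask token, and the function name are hypothetical, and a real pipeline would apply the rule in SQL during restore rather than row by row in Python.

```python
MASK_TOKEN = "***MASKED***"  # illustrative placeholder value


def mask_row(row, allow_list):
    """Mask every string value unless its column is explicitly allow-listed."""
    return {
        column: (MASK_TOKEN
                 if isinstance(value, str) and column not in allow_list
                 else value)
        for column, value in row.items()
    }


if __name__ == "__main__":
    # Hypothetical columns; only "Status" has been reviewed as non-PII.
    row = {"CallerID": 42, "Name": "Jane Example", "Status": "Open"}
    print(mask_row(row, allow_list={"Status"}))
```

A column added to the schema later is masked automatically until someone reviews it, which is the property this mitigation relies on.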

R3

Integration Credential Leakage

Medium Likelihood High Impact

Risk: Test/sandbox API keys (Stripe, Genesys, etc.) must not leak into production-visible systems.

Mitigation: All secrets stored in Azure Key Vault. No secrets in code, config files on disk, or repository. Managed Identity access only.

R4

SQL MI Backup Size Unknown

Medium Likelihood Medium Impact

Risk: Actual DB sizes not yet measured — restore pipeline duration and storage costs are estimates.

Mitigation: Measure actual sizes during Phase 1 completion. Consider trimming PearlLog/PearlArchive for test. Adjust Blob Storage tier if needed.

R5

Telerik Licence Coverage

Medium Likelihood Medium Impact

Risk: RadControls for ASP.NET AJAX require a valid licence for the test environment.

Mitigation: Verify existing licence covers non-production use. Contact Telerik/Progress if additional licence needed.

R6

Totem .NET 3.5 Dependency

Low Likelihood Low Impact

Risk: totem-2-cloud-nosql targets .NET Framework 3.5 — must be installed as a Windows feature.

Mitigation: Enable .NET 3.5 feature on VM2 via DISM during provisioning script. Also provision C:\totemscripts\ directory with script templates.

R10

Terraform State Corruption

Medium Likelihood High Impact

Risk: Remote state stored in Azure Blob could be corrupted by concurrent operations or manual changes outside Terraform.

Mitigation: Enable state locking via Azure Blob lease. Blob versioning for state file recovery. terraform plan mandatory before apply in CI/CD.

R11

Audit Logging Performance Impact

Medium Likelihood Medium Impact

Risk: SQL triggers on high-volume tables (e.g., PearlLog) could add latency to application transactions.

Mitigation: Triggers only on critical configuration/security tables. High-volume access logging via async middleware, not triggers. Performance test during Phase 4.

R12

Key Rotation Service Disruption

Low Likelihood High Impact

Risk: Automatic key rotation could temporarily break SQL MI TDE or Blob encryption if applications cache old keys.

Mitigation: Key Vault rotation events trigger Azure Function that validates connectivity. Dual-key overlap period (old key retained for 24h). Rotation during maintenance window.

R13

Multi-Environment SQL MI Cost Scaling

Medium Likelihood Medium Impact

Risk: Each additional Terraform workspace spawns a separate SQL MI instance (~£280/mo), which could exceed budget if many environments run simultaneously.

Mitigation: Auto-shutdown policy for non-primary environments. Spawn-on-demand, destroy-after-use workflow. Budget alerts at 80% threshold.
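
The 80% budget alert can be reasoned about with simple arithmetic on the per-environment SQL MI cost. The sketch below uses the approximate £280/month figure from the risk statement; the monthly budget value, the function name, and the decision to count SQL MI cost only (ignoring VMs and storage) are assumptions for illustration.

```python
SQL_MI_MONTHLY_GBP = 280  # approximate per-environment figure from the risk above


def budget_alert(extra_environments, monthly_budget_gbp, threshold=0.8):
    """Return True when extra SQL MI spend reaches the alert threshold."""
    spend = extra_environments * SQL_MI_MONTHLY_GBP
    return spend >= threshold * monthly_budget_gbp


if __name__ == "__main__":
    # Hypothetical budget of GBP 1,000 per month for extra environments.
    for count in (1, 2, 3):
        print(count, budget_alert(count, monthly_budget_gbp=1000))
```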

R14

Regression Seed Data Drift

Medium Likelihood Low Impact

Risk: Testing Mode synthetic seed data may diverge from production schema changes over time, causing false test failures.

Mitigation: Seed data scripts version-controlled in repo. Schema diff check included in CI/CD pipeline. Quarterly seed data review process.
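
The schema diff check in this mitigation amounts to comparing the columns the seed scripts expect against the columns the restored schema actually has. A minimal sketch follows, assuming each schema has already been extracted into a mapping of table name to column set; the extraction step, the function name, and the example columns are hypothetical.

```python
def schema_diff(production, seed):
    """Compare {table: set(columns)} mappings and report drift per table."""
    drift = {}
    for table in sorted(set(production) | set(seed)):
        prod_cols = production.get(table, set())
        seed_cols = seed.get(table, set())
        missing = sorted(prod_cols - seed_cols)  # in production, not in seed
        stale = sorted(seed_cols - prod_cols)    # in seed, not in production
        if missing or stale:
            drift[table] = {"missing_in_seed": missing, "stale_in_seed": stale}
    return drift


if __name__ == "__main__":
    # Hypothetical columns used only to demonstrate the comparison.
    production = {"Messages": {"MessageID", "Body", "Priority"}}
    seed = {"Messages": {"MessageID", "Body", "LegacyFlag"}}
    print(schema_diff(production, seed))
```

An empty result means the seed data still matches the schema, so a CI step could fail the build whenever the returned mapping is non-empty.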

R15

Integration Sandbox Availability

Medium Likelihood Medium Impact

Risk: Not all 22 integrations offer sandbox/test environments. Some vendors may charge extra or have limited test APIs.

Mitigation: Integration inventory categorises each service's test strategy. Disabled-by-default for services without sandbox. Confirm sandbox access during Week 1 discovery.

24

Summary & Next Steps

Phase 1 Discovery is complete. The architecture is designed, risks are mapped, and hours estimates follow the RFP 9.2 mandatory format. Once approved, the RFP response can be formalised.

Architecture 3-VM Split + SQL MI GP

Weighted Score 4.15 / 5

Estimated Hours 425h (53 days)

IaC Approach Terraform / Bicep

Monthly Run Cost ~£600-700/mo

Target Ready Date Mid-May 2026

Immediate Next Steps

1. Approve Phase 1 findings and architecture recommendation
2. Measure actual production DB sizes to finalise SQL MI storage tier
3. Confirm Azure subscription and Terraform state backend location
4. Verify Telerik licence covers non-production environment use
5. Confirm integration sandbox access with Genesys, Stripe, Chargebee, OpenAI
6. Define seed data scope for Testing Mode synthetic database content
7. Prioritise audit table implementation: which tables in Week 4 vs later
8. Confirm RBAC personas and PIM approval chains with client IT team
9. Formalise RFP response to client once Phase 1 is signed off

📚 Open Implementation Guide & Video References

Comprehensive step-by-step guide with 25+ embedded video tutorials for every phase

Infrastructure Engineering Team — March 2026