Test Environment Architecture Proposal
A comprehensive walkthrough of the Pearl platform, why a fully isolated test environment is critical, and the architecture we recommend — designed for MessageDirect's technical leadership.
Pearl is the enterprise-grade Telephone Answering Service (TAS) platform powering MessageDirect — a leading UK 24/7 virtual receptionist and contact centre business.
Enable call centre operators to answer phone calls on behalf of hundreds of subscribing client companies — capturing caller details, recording messages, triggering escalations via SMS/email/push, and providing clients with a self-service portal to view messages, manage rotas, and pay invoices.
Operators answer calls around the clock on behalf of client companies using dynamic answering scripts
Capture caller details, record messages, and escalate to the right contact via SMS, email, or push
108+ portal pages for clients to view messages, manage rotas, search callers, and handle billing
Automated billing lifecycle — usage tracking, invoice generation, card & DD payments, Xero accounting sync
AI chatbots, voice assistants (ElevenLabs/Twilio), speech analytics, and GPT-powered QC scoring
Operates MessageDirect, JAM, Answer.co.uk, Argyll, VirtuallyThere — all from one platform
A mature, organically-grown platform handling significant operational complexity.
The production layout powering 24/7 operations today.
The confirmed technology landscape powering every layer of Pearl.
This is an ASP.NET Web Site project rather than a Web Application project: the folder structure is the project. ASP.NET compiles App_Code/ dynamically at runtime, so source .vb and .aspx files can be deployed directly to the server. Production builds are pre-compiled with aspnet_compiler.exe.
8 distinct components, each with unique runtime characteristics.
| Component | Type | Framework | Role | Database Access |
|---|---|---|---|---|
| pearl-azure | ASP.NET Web Forms | .NET 4.8 | Main UI — operators, admins, client portal. 321+ exposed endpoints, 304+ admin tools, 108+ portal pages | All 17 databases |
| pearl-webservices-azure | ASP.NET Web App | .NET 4.8 | Background services — 278+ utility job pages, billing, stats, search indexing, AI QC endpoints, job scheduler | All 17 databases |
| utility-server | ASP.NET Web Forms (3 sub-apps) | .NET 4.8 | PCI-isolated payments portal (Stripe), Xero accounting sync, multi-brand reporting | PearlBilling, PearlData, PearlOperations |
| queue-processor-azure | WinForms (.exe) | .NET 4.8 | Job queue worker — claims rows from Process_JobQueue, executes HTTP calls with turn-based coordination | PearlQueues, PearlData, PearlBilling, PearlLog |
| system-checker | WinForms (.exe) | .NET 4.8 | Health monitoring — ICMP ping, TCP, HTTP probe, SQL query, disk space checks with transition-based alerts | Checking, PearlOperations, PearlData |
| ai-spooler | WinForms (.exe) | .NET 4.8.1 | AI QC spooler — 6-lane conveyor belt for speech analytics, round-robin distribution, 55s backoff on empty | Via HTTP to pearl-webservices |
| totem-2-cloud-nosql | Console App (Socket Server) | .NET 3.5 | Real-time browser notifications via long-poll. /register, /poll, /notify protocol. All state in-memory | None (in-memory only) |
| alpha-code-generator | WinForms (.exe) | .NET 4.8 | Batch generator for unique 9-char alphanumeric codes (base-31 encoding) | FreeAlphaCodes table |
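The base-31 encoding mentioned for alpha-code-generator can be illustrated with a short sketch. The proposal does not state which 31 symbols Pearl uses, so the alphabet below (digits 2-9 plus uppercase letters without I, L, and O) is an assumption chosen only to avoid look-alike characters.

```python
# Hypothetical 31-symbol alphabet: the real Pearl alphabet is not documented here.
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"
BASE = len(ALPHABET)      # 31
CODE_LENGTH = 9           # 9-character codes, as stated in the component table


def encode(number: int) -> str:
    """Encode a non-negative integer as a fixed-width 9-character base-31 code."""
    if not 0 <= number < BASE ** CODE_LENGTH:
        raise ValueError("number does not fit in a 9-character base-31 code")
    symbols = []
    for _ in range(CODE_LENGTH):
        number, remainder = divmod(number, BASE)
        symbols.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(symbols))


def decode(code: str) -> int:
    """Reverse of encode(): turn a base-31 code back into its integer."""
    number = 0
    for symbol in code:
        number = number * BASE + ALPHABET.index(symbol)
    return number
```

With nine positions the code space is 31^9, roughly 2.6 × 10^13 values, which is why a sequential counter run through an encoder like this can produce unique codes in batches.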
The backbone of Pearl's logic — VB.NET classes auto-compiled at runtime.
Core Engine (~557 KB): Screen XML, message processing, DDI management, screen pop, real-time signalling
UI Renderer (~338 KB): Dynamic UI generation from XML config. Renders answering screens, data grids, forms
Identity (~220 KB): User CRUD, login, permissions, shift tracking, password management
Clients (~153 KB): Client onboarding, company config, setup wizards
Dispatch (~97 KB): Escalation rules, notification routing, on-call rota resolution
Billing (~90 KB): Gateway integrations and payment processing for Stripe, SagePay, and GoCardless
Five interconnected data flows that power the entire platform.
Azure SQL Managed Instance (Business Critical) — the data backbone.
Two SQL accounts: pearl (main apps — web & workers) and utility (utility-server & system-checker). Cross-database queries use 3-part naming. The ConfigStrings table in PearlOperations holds all connection strings, API keys, and feature flags — the central configuration hub.
Every external dependency Pearl relies on — from telephony to AI.
A complex, mission-critical platform with no isolated test environment. Every change is a risk to the 24/7 production service.
All development and testing happens against or very near production. Every deploy risks the live 24/7 service that operators and clients depend on around the clock.
A test against the wrong config could trigger real Stripe charges, send SMS to real customers, or disrupt live Genesys call routing. No safety net exists.
Cannot wipe and rebuild a clean test state. No way to validate that a change doesn't break any of the 489+ tables, 321+ endpoints, or 278+ background jobs.
Any test data access risks exposing real customer PII — names, phone numbers, billing details, message content. No masking or anonymisation layer exists.
.NET Framework 4.8 with WinForms workers, raw sockets (.NET 3.5 Totem), and hardcoded IPs — not cloud-native, cannot use modern PaaS services without refactoring.
Deployments are robocopy-based file syncs with no approval gates, no rollback mechanism, no audit trail. Manual and error-prone.
Every code change, database migration, or configuration update is deployed directly to production with no safety net. For a 24/7 contact centre handling calls for hundreds of client companies, this is an unacceptable operational risk that must be resolved.
MessageDirect issued an RFP to design and deliver a secure, fully isolated, repeatable test environment. The RFP can only be formally responded to once Phase 1 (Discovery) is finalised.
Develop, deploy, and validate changes without any risk to production
Wipe and rebuild the test environment and reload test data on demand
Complete isolation from production systems and data; private-only connectivity
Clean, anonymised dataset (no production PII) with weekly refresh procedure
Spin up multiple test environments per feature branch with minimal overhead
£25,000 total cap (discovery + implementation)
Test environment ready by early to mid May 2026
ISO 27001 aligned + GDPR data controls
IaC, CI/CD, runbooks, SOPs, handover walkthrough
We evaluated 4 architecture options against Pearl's specific constraints.
5 criteria. 4 options. One clear winner.
The architecture is dictated by Pearl's actual runtime constraints.
WinForms workers (queue-processor, system-checker, ai-spooler) are architecturally bound to the Windows desktop runtime. Totem uses raw .NET 3.5 sockets. Converting to Azure Functions would be a major rewrite — explicitly out of the RFP scope.
3-VM layout replicates the actual production separation: web tier (IIS), internal services tier (workers), and a dedicated build server. Test results reliably predict production behaviour.
Deploy existing compiled binaries via robocopy — the current deployment method. No new toolchain, no recompilation model, no replatforming. Ship in weeks, not months.
Windows Server 2022 + IIS + Windows Services. The team already knows how to operate, troubleshoot, and deploy this stack. Zero learning curve.
The recommended estate is budgeted at about £2.39k/month based on the current Azure calculator export. That figure is higher than a simple lab because it includes the controls that make the environment credible: SQL Managed Instance, the 3-VM role split, secure access, outbound control, monitoring, backup, and security services. It is still the right shape of spend because it funds safe delivery and testing rather than forcing risky shortcuts.
3 VMs + SQL MI + networking maps cleanly to infrastructure-as-code modules (Terraform is the primary IaC tool in this proposal). The entire environment can be torn down and rebuilt from code, meeting the RFP's repeatability requirement.
The complete test environment design — fully isolated from production.
Portable diagram asset: target-architecture.png
The design uses a private hub-and-spoke Azure layout so administrator access, application workload, and outbound internet traffic are controlled separately. Azure Bastion is the only RDP entry point, Azure Firewall is the single outbound checkpoint, and the spoke VNet hosts the actual Pearl workload across VM1 for IIS and local cache/search, VM2 for background workers, and VM3 for build, restore, and masking automation.
The single test SQL Managed Instance stores all 17 masked databases used by the environment. Production never connects directly to the test estate; it only places weekly backup files into Blob Storage, and VM3 restores, masks, and validates those backups before VM1 and VM2 use them. Azure Key Vault keeps the environment secrets out of the servers, and every external dependency is redirected to a sandbox or test equivalent (Genesys, Stripe, GoCardless, Mailgun, and the test S3 bucket) so the platform behaves like production without touching live customer data, live payments, or live telephony.
Phase 1 is the foundation. The RFP response to the client cannot be submitted until Phase 1 is decided and finalised. This is where we confirm everything about the current system, size the target, and commit to the plan.
Confirm current IIS configuration, server roles, installed components, Windows features, and service accounts on pearl3, pearl4, pearlinternal
Measure actual database sizes for all 17 databases. Confirm Business Critical tier specifics. Estimate .bak sizes for backup/restore pipeline
Catalog all .NET Framework versions (.NET 4.8, 4.8.1, 3.5), Telerik licence requirements, NuGet packages, Bin/ DLLs, and third-party assemblies
Map all ConfigStrings entries, web.config connection strings, hardcoded IPs (10.0.0.12, 10.0.1.44), hostnames, and file paths that need repointing
Design hub-spoke topology, subnet addressing (10.1.x.x hub, 10.2.x.x spoke), NSG rules, Azure Bastion access, firewall egress allowlist
Finalise VM SKUs, SQL MI tier and vCores, storage requirements, region (UK South). This sizing recommendation drives the cost model.
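The configuration-mapping task above (finding hardcoded IPs such as 10.0.0.12 and 10.0.1.44) lends itself to a simple scan. The following is a minimal sketch, assuming config files are read as plain text; the sample keys (`SolrHost`, `CacheHost`, `Version`) are invented for illustration and are not taken from Pearl's web.config.

```python
import ipaddress
import re

# Candidate dotted-quad literals; validity is checked separately below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_private_ips(text: str) -> list[str]:
    """Return the sorted, de-duplicated private IPv4 addresses found in text.

    Restricting to private ranges filters out look-alikes such as
    assembly version strings (for example 4.8.1.0).
    """
    found = set()
    for candidate in IPV4_CANDIDATE.findall(text):
        try:
            address = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. 999.1.1.1 is not a real address
        if address.is_private:
            found.add(address)
    return [str(address) for address in sorted(found)]


# Hypothetical config fragment used only to exercise the scanner.
SAMPLE_CONFIG = """
<add key="SolrHost" value="http://10.0.0.12:8983/solr" />
<add key="CacheHost" value="10.0.1.44" />
<add key="Version" value="4.8.1.0" />
"""

print(find_hardcoded_private_ips(SAMPLE_CONFIG))
```

Running the same function over ConfigStrings values and every web.config would give Phase 1 a repeatable list of entries that need repointing, rather than relying on manual search.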
Current production setup documented with all components, connections, and dependencies mapped
Finalised SKUs, vCores, storage tiers — the basis for the cost model and RFP response
All identified risks with likelihood, impact, and proposed mitigations
The sizing recommendation is the key Phase 1 output — it determines the cost model and drives the RFP response to the client.
| Resource | SKU / Config | Justification |
|---|---|---|
| Azure Blob Storage | Hot tier, LRS, ~500 GB | Weekly backup staging — 4 weekly copies of all 17 databases with 28-day retention |
| Azure Bastion | Standard SKU | Secure RDP to all VMs — no public IPs, no VPN needed. Audit-logged access |
| Azure Firewall | Standard SKU | Egress filtering — allowlist-only outbound to sandbox endpoints. Prevents accidental production contact |
| Azure Key Vault | Standard | All connection strings, API keys, secrets. Managed identity access. Versioned secret rotation |
| Azure Monitor + Log Analytics | Per-GB ingestion | Centralised logging, alerting, diagnostics, and support visibility across VM, SQL, firewall, and security events |
| Microsoft Defender for Cloud | Servers + SQL + Storage + Key Vault | Continuous vulnerability and threat monitoring so the test estate does not become the weak security point |
| Azure Backup | 3 protected VMs, LRS | Fast recovery path for failed releases, broken configurations, or accidental deletion during testing |
| Azure Update Manager | 3 managed servers | Automated patching to keep the Windows estate current without manual server-by-server maintenance |
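The Blob Storage row above describes four weekly backup copies held for 28 days. A minimal sketch of that retention rule follows; it assumes backups are identified by their date and that "28-day retention" means a copy expires once it is 28 days old or more. In practice a storage lifecycle policy could enforce this, so the function is only a way to reason about the rule.

```python
from datetime import date, timedelta

RETENTION_DAYS = 28  # retention period stated in the sizing table


def split_by_retention(backup_dates: list[date], today: date) -> tuple[list[date], list[date]]:
    """Split backup dates into (keep, expire) lists under the 28-day rule."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    keep = sorted(d for d in backup_dates if d > cutoff)
    expire = sorted(d for d in backup_dates if d <= cutoff)
    return keep, expire


# Five consecutive weekly backups; dates are illustrative only.
weekly_backups = [date(2026, 5, 10) - timedelta(weeks=n) for n in range(5)]
keep, expire = split_by_retention(weekly_backups, today=date(2026, 5, 11))
print(len(keep), "kept;", len(expire), "expired")
```

With a weekly cadence the rule keeps exactly four copies at any time, which matches the ~500 GB staging allowance being sized for four sets of the 17 databases.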
This is the client-ready cost estimate for the recommended build only: the approved 3-VM split, one Azure SQL Managed Instance, and the supporting Azure services needed to keep the environment secure, recoverable, and properly isolated from production.
Based on the exported Microsoft Azure Pricing Calculator estimate in UK South, Pay-As-You-Go, dated 16 April 2026. This is the current safe planning baseline for the recommended environment.
Current calculator export total for the full recommended environment.
Recommended budget line: useful for annual planning, internal approval, and client budget framing.
Firewall, SQL Managed Instance, and the 3 VMs together account for about 78% of the total.
This estimate was taken from the official Microsoft Azure Pricing Calculator. The screenshot below is included as the visual source reference used for the client-facing cost breakdown. Source: azure.microsoft.com/en-us/pricing/calculator/
| Azure Service | Recommended Baseline | Monthly Cost | Purpose in Plain English | Why This Is Recommended |
|---|---|---|---|---|
| VM1: Web Tier | D4s v5, 4 vCPU, 16 GB RAM, 128 GB P10 OS disk, 256 GB P15 data disk | £240.95 | Runs the Pearl websites, internal web services, cache, and search. In simple terms, this is the front door that serves pages, handles requests, and keeps the user-facing side responsive. | This is the one server that needs the most headroom because it carries IIS, Memcached, and Solr together. The chosen size is large enough for realistic testing without paying for a production-scale machine. |
| VM2: Worker Tier | D2s v5, 2 vCPU, 8 GB RAM, 128 GB P10 OS disk | £129.44 | Runs the background jobs that users do not see directly, such as queue processing, health checks, AI spooler activity, and the Totem notification service. | Keeping this work off the web server protects test fidelity and matches how the live platform behaves. The smaller VM is enough because these services are mostly waiting on I/O rather than using heavy CPU. |
| VM3: Build / Restore Tier | D2s v5, 2 vCPU, 8 GB RAM, 128 GB P10 OS disk, 128 GB P10 data disk | £147.39 | Handles builds, deployments, backup downloads, database restores, and masking scripts. This is the engineering workbench for the environment. | Builds and restore jobs can be noisy and storage-heavy. Giving them their own server avoids slowing down test activity on VM1 and VM2 and makes the environment easier to support. |
| Azure SQL Managed Instance | General Purpose, 4 vCores, 256 GB storage | £660.15 | Stores all 17 Pearl databases and provides the SQL features the application expects, including cross-database behaviour that simpler database services do not handle well. | Managed Instance is the right fit because it behaves much more like the current SQL estate. General Purpose keeps compatibility while avoiding the higher Business Critical price that test does not need. |
| Azure Blob Storage | General Purpose v2, Hot, LRS, 500 GB | £8.15 | Acts as the landing zone for weekly backup files, restore artefacts, and a small amount of supporting automation content. | It is the cheapest practical way to keep several weekly copies ready for refresh work. This is a low-cost, high-value part of the design. |
| Azure Bastion | Standard SKU | £159.29 | Provides secure browser-based admin access to the Windows servers without exposing remote desktop ports or public IPs on the VMs. | This is recommended because it sharply reduces the attack surface and gives a cleaner, more defensible security story for the client. |
| Azure Firewall | Standard SKU | £686.58 | Works as the outbound checkpoint for the whole estate. It decides what the environment is allowed to contact outside Azure. | This is what keeps the test environment truly isolated. It prevents accidental calls to live payment gateways, live SMS routes, live email flows, or production-only destinations. |
| Azure Key Vault | Standard tier | £2.39 | Stores passwords, connection strings, API keys, and certificates in one controlled vault instead of scattering secrets across servers and config files. | It is recommended because it removes one of the most common causes of security drift: secrets copied into scripts, notes, or server folders. |
| Azure Monitor + Log Analytics | Workspace, diagnostics, alerting, application insights | £140.23 | Collects logs, metrics, alerts, and diagnostic events from the servers, SQL, firewall, and supporting Azure services. | Without it, every issue becomes a manual investigation across multiple machines. With it, the team gets one place to troubleshoot and prove what happened. |
| Microsoft Defender for Cloud | 3 protected servers, SQL, storage, Key Vault | £29.91 | Continuously checks the environment for vulnerabilities, missing controls, and suspicious security events. | Recommended because a weaker test environment can become the easiest route into the wider estate. This keeps the security posture close to production. |
| Azure Backup | 3 protected VMs, LRS backup storage | £96.61 | Creates recoverable restore points for the servers so the team can roll back quickly if a deployment or test change breaks the environment. | Recommended because rebuilding multiple Windows servers manually is slow and expensive. Backup reduces recovery time and lowers operational risk. |
| Azure Update Manager | 3 managed servers | £11.29 | Automates Windows patching so the environment stays current without someone having to maintain each server by hand. | Recommended because patching is often skipped on test platforms first. This keeps the estate supportable and avoids preventable security issues. |
| Microsoft Support Plan | Azure support coverage | £75.24 | Gives access to Microsoft support engineers if there is a platform problem with Azure networking, SQL Managed Instance, or service-level issues. | Recommended because some Azure issues cannot be solved from inside the project team alone. It shortens outage time and gives a formal escalation path. |
| Total Recommended Monthly Cost | Recommended test environment baseline | £2,387.62 | Annual view: £28,651.44. This is the current calculator-backed baseline for the proposed test estate. | |
The biggest cost items are the firewall, the database tier, and the three servers themselves. That is expected: isolation, SQL compatibility, and role separation are the three design choices that make this environment useful rather than just cheap.
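The totals quoted in this section can be re-derived from the line items, which is a useful check whenever the calculator export is refreshed. The sketch below simply re-adds the monthly figures from the table; the dictionary keys are shortened labels chosen for this example.

```python
from decimal import Decimal

# Monthly line items copied from the cost table (GBP, Pay-As-You-Go).
MONTHLY_COSTS = {
    "VM1": Decimal("240.95"),
    "VM2": Decimal("129.44"),
    "VM3": Decimal("147.39"),
    "SQL Managed Instance": Decimal("660.15"),
    "Blob Storage": Decimal("8.15"),
    "Bastion": Decimal("159.29"),
    "Firewall": Decimal("686.58"),
    "Key Vault": Decimal("2.39"),
    "Monitor + Log Analytics": Decimal("140.23"),
    "Defender for Cloud": Decimal("29.91"),
    "Backup": Decimal("96.61"),
    "Update Manager": Decimal("11.29"),
    "Support Plan": Decimal("75.24"),
}

monthly_total = sum(MONTHLY_COSTS.values(), Decimal("0"))
annual_total = monthly_total * 12

# Firewall, SQL MI, and the three VMs are the dominant items.
dominant = ["Firewall", "SQL Managed Instance", "VM1", "VM2", "VM3"]
dominant_total = sum((MONTHLY_COSTS[name] for name in dominant), Decimal("0"))
dominant_share = (dominant_total / monthly_total * 100).quantize(Decimal("0.1"))

print(f"Monthly: £{monthly_total}  Annual: £{annual_total}  Dominant share: {dominant_share}%")
```

Using `Decimal` rather than floats keeps the pence exact, so the output can be compared directly with the exported estimate.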
This estimate is the current calculator export, not a discounted contract rate. It is suitable for planning and client presentation, but not a commercial quote.
The VM lines in the export were costed at full monthly runtime. Once business-hours auto-shutdown is applied, day-to-day compute spend should land lower than this baseline.
SQL Managed Instance stays on all the time. That is why its cost remains one of the dominant line items even if the VMs are scheduled down overnight.
Potential optimisations exist later. Azure Hybrid Benefit or reserved pricing could reduce costs once the client commits to the estate long term, but those savings should not be assumed up front.
The recommended test environment is worth funding because it removes live operational risk, not because it is the cheapest possible build. At about £2.39k per month, MessageDirect gets a secure, production-shaped platform where releases, restores, integration changes, and troubleshooting can happen safely before anything reaches the live Pearl service.
The cost is driven by the controls that make the environment credible: proper SQL compatibility, secure private access, outbound isolation, and recoverability. Those are exactly the controls the client is buying, and they are what turn this from a simple lab into a safe engineering platform.
Complete guide to Azure payment options, subscription types, and resource-level cost controls — with direct answers on what can be paused and what cannot.
Sources: Azure Cost Management docs · VM billing states
Can the whole estate be paused to save money? Partly. The 3 virtual machines can be stopped and deallocated, which removes their compute runtime charge. However, Azure does not provide a single pause button for the whole estate. Several platform services continue billing while they remain deployed, especially SQL Managed Instance, Azure Firewall, Bastion, storage, backup retention, monitoring, and support cover.
In Azure, the word "subscription" has two distinct meanings:
For this estate, the resource purchase model is the more important cost-control lever. Both categories are documented comprehensively below.
How each resource in the current £2,387.62/month estimate behaves when you try to stop, pause, or reduce it.
| Azure Resource | Current Cost | Applicable Purchase Models | Can It Be Stopped? |
|---|---|---|---|
| VM1, VM2, VM3 (compute runtime) | £517.78 | PAYG · Reserved Instance · Savings Plan · Azure Hybrid Benefit · Dev/Test pricing | ✅ Yes. Stop + deallocate removes compute charges. Auto-shutdown schedulable. Disks, backup retention, monitoring, and Defender still bill. |
| Managed Disks (attached to VMs) | Incl. in VM baseline | PAYG on provisioned disk size | ⚠️ No pause. Disk charges continue while the VM is deallocated. Must delete or downgrade to save. |
| Azure SQL Managed Instance (General Purpose, 4 vCores) | £660.15 | PAYG · Reserved Instance · Azure Hybrid Benefit for SQL | ❌ No practical stop-start. Always-on in this design. Savings from reservation, rightsizing, or licence benefit. |
| Azure Firewall (Standard tier) | £686.58 | PAYG (deployment hours + data processed) | ❌ No meaningful pause. Bills while deployed. Delete for long mothball periods only. |
| Azure Bastion (Standard tier) | £159.29 | PAYG (deployment hours) | ❌ No meaningful pause. Bills while deployed. Same approach as Firewall. |
| Blob Storage (LRS, Hot tier) | £8.15 | PAYG · Lifecycle tiering (Hot/Cool/Archive) | ⚠️ No pause concept. Use lifecycle policies and Cool/Archive tiers for cost reduction. |
| Monitor + Log Analytics | £140.23 | PAYG ingestion · Commitment tiers | ⚠️ Tuneable. Reduce diagnostics, shorten retention, disable verbose collection. |
| Backup + Update Manager | £107.90 | PAYG per protected instance | ⚠️ Partial. Stop new backups, but retained recovery points still incur storage cost. |
| Support Plan | £75.24 | Monthly subscription | ⚠️ Downgrade only. Can be cancelled or moved to Basic (free) tier. |
| Security: Key Vault + Defender | £32.30 | PAYG per operation / per protected resource | ⚠️ Disableable. Defender can be turned off, but not recommended for security posture. |
Enforce business-hours auto-shutdown for VM1 and VM2. Keep VM3 off by default — run it only when builds, restores, or masking jobs are needed. This targets the part of the estate that genuinely can be paused.
SQL MI (£660), Firewall (£687), and Bastion (£159) total £1,506/month and do not pause. They are the compatibility and isolation controls this environment requires.
Azure Hybrid Benefit + 1-year Reserved Instances for SQL MI and VMs are the strongest levers. Dev/Test subscription pricing is also a strong fit if Visual Studio licences exist.
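To make the pausable versus always-on split concrete, the sketch below encodes the shutdown window stated elsewhere in this proposal (VMs off 19:00-07:00 and at weekends) and totals the three services that keep billing. It assumes local UK time with naive datetimes. The VM line items include disk charges that continue while a VM is deallocated, so the running fraction applies only to the compute portion, which the export does not break out.

```python
from datetime import datetime
from decimal import Decimal

RUN_START_HOUR = 7    # VMs start at 07:00
RUN_END_HOUR = 19     # VMs stop at 19:00

# Services that bill continuously while deployed (monthly GBP from the estimate).
ALWAYS_ON_MONTHLY = {
    "SQL Managed Instance": Decimal("660.15"),
    "Azure Firewall": Decimal("686.58"),
    "Azure Bastion": Decimal("159.29"),
}


def vm_should_run(moment: datetime) -> bool:
    """True when the moment falls inside the weekday business-hours window."""
    is_weekday = moment.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and RUN_START_HOUR <= moment.hour < RUN_END_HOUR


def weekly_running_fraction() -> float:
    """Share of a 168-hour week during which scheduled VMs are running."""
    running_hours = 5 * (RUN_END_HOUR - RUN_START_HOUR)
    return running_hours / (7 * 24)


def always_on_total() -> Decimal:
    """Monthly cost that the shutdown schedule cannot reduce."""
    return sum(ALWAYS_ON_MONTHLY.values(), Decimal("0"))


print(f"VMs run {weekly_running_fraction():.1%} of the week; "
      f"always-on services cost £{always_on_total()}/month")
```

The schedule leaves the VMs running for 60 of 168 hours each week, so compute runtime is the only part of the bill that scales down this way; the always-on figure is the floor regardless of schedule.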
This focuses on the resources already in the current estimate. It distinguishes where cost can realistically be cut, what risk that introduces, and how that risk could be controlled if the client chooses to reduce scope.
| Resource / Control | Potential Saving | Risk If Removed or Reduced | Mitigation |
|---|---|---|---|
| VM3 runtime hours (build / restore VM) | Medium | Low risk if managed properly. Build, restore, and masking jobs will not be instantly available outside planned windows. | Keep VM3 powered off by default, publish a startup runbook, and schedule restore/build windows ahead of testing cycles. |
| Azure Monitor / Log Analytics volume (diagnostics + retention) | Low to Medium | Low operational risk. Reduced visibility into faults, slower troubleshooting, and less historical evidence. | Keep core health, security, and deployment logs; reduce verbose diagnostics only; shorten retention for non-critical data first. |
| Backup retention depth (recovery point storage) | Low to Medium | Moderate resilience risk. Fewer restore points mean less ability to recover from older defects or operator mistakes. | Agree a minimum recovery objective, retain enough points for weekly rollback needs, and document the reduced recovery window explicitly. |
| Support plan downgrade (Standard to Basic) | ~£75/month | Low technical risk, moderate support risk. Slower or no Microsoft escalation path if an Azure platform issue occurs. | Only downgrade if the client is comfortable relying on internal support and partner support during test-only operations. |
| Defender coverage scope (security plan reduction) | Low | Moderate security risk. Reduced threat detection, weaker posture reporting, and less evidence for security review. | Only reduce on clearly non-sensitive components, and keep compensating controls such as NSGs, Key Vault, audit logs, and patching in place. |
| Azure Bastion (admin access layer) | ~£159/month | High access risk. Removing Bastion weakens the secure remote-access model and pushes the team toward less controlled admin access patterns. | Only remove if the environment is shut down for an extended period. Recreate Bastion before resuming active engineering or testing. |
| Azure Firewall (outbound control + segmentation) | ~£687/month | High security and compliance risk. Loss of central egress filtering and reduced confidence that test traffic cannot reach unintended external targets. | Not recommended while the environment is active. If ever removed, replace with an agreed alternative control set and re-run the security design review. |
| Azure SQL Managed Instance (core database platform) | ~£660/month | High platform risk. Removing it would remove the core database layer and effectively take the environment out of service. | Do not remove if the test environment is expected to remain usable. Cost reduction here should come from rightsizing, reservation, or licence benefit instead. |
| Core VM estate, VM1 / VM2 (web + worker runtime) | High, but disruptive | High service risk. Removing or permanently stopping core VMs breaks the application runtime and prevents meaningful test execution. | Do not remove during active project use. Restrict savings to scheduled shutdown outside business hours rather than removal. |
VM3 runtime scheduling, log-volume tuning, support-plan downgrade, and carefully reduced backup retention are the lowest-risk savings available inside the current design.
Defender scope reduction and Bastion removal are possible only if the client explicitly accepts weaker security or access posture for a period of time.
SQL MI, Azure Firewall, and the core VM runtime are foundational. Removing them is less a cost optimisation and more a decision to partially or fully suspend the environment.
All statements in this section are backed by official Microsoft documentation:
Dual-mode data pipeline: Testing Mode for regression and Debug Mode for real-world investigation.
Source: RFP Section 5.2
A single CLI tool supports both modes: --mode testing (clean + seed) or --mode debug (backup + restore + mask). Both modes repoint ConfigStrings to test endpoints and validate data integrity before marking the environment ready.
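The two modes could be wired up along the following lines. This is a sketch of the command-line surface only: the program name `pearl-env` and the step names are placeholders invented here, and each step would call real scripts in an actual implementation.

```python
import argparse

# Mode-specific steps as described above; step names are illustrative.
MODE_STEPS = {
    "testing": ["clean", "seed"],
    "debug": ["backup", "restore", "mask"],
}

# Both modes finish by repointing ConfigStrings and validating data integrity.
COMMON_STEPS = ["repoint_configstrings", "validate_integrity"]


def build_parser() -> argparse.ArgumentParser:
    """Create the parser for the hypothetical pearl-env tool."""
    parser = argparse.ArgumentParser(
        prog="pearl-env",
        description="Prepare the Pearl test environment in testing or debug mode.",
    )
    parser.add_argument("--mode", choices=sorted(MODE_STEPS), required=True)
    return parser


def plan_steps(argv: list[str]) -> list[str]:
    """Return the ordered steps for the requested mode."""
    args = build_parser().parse_args(argv)
    return MODE_STEPS[args.mode] + COMMON_STEPS


print(plan_steps(["--mode", "debug"]))
```

Keeping the common steps in one list means neither mode can mark the environment ready without repointing and validation, which is the safety property the paragraph above relies on.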
| Tables | PII fields | Masking method |
|---|---|---|
| Callers, CallerHistory | Name, Phone, Email, Address | Faker-generated UK data |
| Users, CompanyContacts | Name, Email, Phone, Password | Hashed/randomised |
| Invoices, Payments | CustomerName, BankDetails, CardRefs | Synthetic replacement |
| MessageContent, SMSSpoolOutgoing | CallerName, Phone, MessageText, Mobile | Faker replacement + anonymised |
| Various log tables | PII embedded in log payloads | Truncated/replaced |
Every external integration safely sandboxed, stubbed, or disabled. Full inventory from third-party dependencies index.
Source: 53-third-party-dependencies-index.md + Phase 1 codebase analysis
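One inexpensive safeguard for the sandboxing goal is a check that no live-mode credential survives the ConfigStrings repoint. The sketch below looks only for Stripe's live key prefixes (`sk_live_` and `pk_live_`); the entry names are hypothetical, and other providers would need their own markers added after the Phase 1 inventory.

```python
# Prefixes Stripe uses for live-mode secret and publishable keys.
LIVE_KEY_PREFIXES = ("sk_live_", "pk_live_")


def find_live_credentials(config: dict[str, str]) -> dict[str, str]:
    """Return {entry name: offending prefix} for values that look like live keys."""
    offenders = {}
    for name, value in config.items():
        for prefix in LIVE_KEY_PREFIXES:
            if prefix in value:
                offenders[name] = prefix
                break
    return offenders


# Hypothetical ConfigStrings snapshot after repointing; values are dummies.
SNAPSHOT = {
    "StripeSecretKey": "sk_test_0000",
    "StripePublishableKey": "pk_live_0000",
    "MailgunDomain": "sandbox.example.com",
}

problems = find_live_credentials(SNAPSHOT)
print("Environment ready" if not problems else f"Blocked: {problems}")
```

A check like this complements the firewall allowlist: Stripe's test and live modes share the same API host, so egress filtering alone cannot tell them apart.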
GitHub Actions with self-hosted runner, Terraform for multi-environment spawning, approval gates and rollback.
Source: RFP Section 5.4 & 5.5
Per the RFP requirement, Terraform is the primary IaC tool for its superior multi-environment capabilities. A single terraform apply with variable overrides can provision N parallel test environments — e.g., one per feature branch or test cycle.
terraform/
├── modules/
│ ├── networking/ # Hub-Spoke VNets, NSGs, Bastion, Firewall
│ ├── compute/ # VM1 (Web), VM2 (Worker), VM3 (Build)
│ ├── database/ # SQL MI (General Purpose)
│ ├── security/ # Key Vault, RBAC, Managed Identities
│ ├── monitoring/ # Log Analytics, Azure Monitor, Audit
│ └── storage/ # Blob Storage, SAS policies
├── environments/
│ ├── test-01/ # Primary test environment
│ ├── test-02/ # Feature branch environment
│ └── test-N/ # N-th on-demand environment
├── main.tf # Root module composition
├── variables.tf # Parameterised config
└── backend.tf # Azure Blob remote state
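Per-environment variable overrides for a layout like this could be generated rather than hand-written. The sketch below emits the contents of a `.tfvars.json` file for environment N, assuming one /16 spoke per environment starting at 10.2.0.0/16 and the hub at 10.1.0.0/16. The variable names (`environment_name`, `spoke_address_space`, `tags`) and the addressing scheme beyond test-01 are assumptions, not part of the agreed design.

```python
import ipaddress
import json

HUB_NETWORK = ipaddress.ip_network("10.1.0.0/16")  # hub range from the proposal
MAX_ENVIRONMENTS = 200                              # arbitrary cap for this sketch


def environment_vars(index: int) -> dict:
    """Build variable overrides for test environment number `index` (1-based)."""
    if not 1 <= index <= MAX_ENVIRONMENTS:
        raise ValueError(f"index must be between 1 and {MAX_ENVIRONMENTS}")
    # test-01 uses 10.2.0.0/16, test-02 uses 10.3.0.0/16, and so on.
    spoke = ipaddress.ip_network(f"10.{index + 1}.0.0/16")
    if spoke.overlaps(HUB_NETWORK):
        raise ValueError("spoke address space overlaps the hub")
    return {
        "environment_name": f"test-{index:02d}",
        "spoke_address_space": str(spoke),
        "tags": {"Environment": "Test", "Project": "Pearl"},
    }


def render_tfvars_json(index: int) -> str:
    """Serialise the overrides in the JSON form Terraform accepts as a var file."""
    return json.dumps(environment_vars(index), indent=2)


print(render_tfvars_json(2))
```

Deriving the address space from the index keeps parallel environments from colliding, which matters if any of them are ever connected to the shared hub at the same time.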
Robocopy artefacts to IIS physical paths, mirroring the current production method
Rollback: Previous build backed up to _rollback/
Stop Windows Service → copy binaries → restart service
Rollback: Same backup/restore approach
Testing mode: clean + seed scripts. Debug mode: backup/restore + mask. Switchable via pipeline.
Rollback: Re-run mode switch to restore state
Terraform plan → apply with approval. Entire environment from code. Multi-env via workspaces.
Rollback: terraform destroy + terraform apply
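The backup-then-copy pattern used for the web tier rollback can be shown in miniature. This sketch uses Python's `shutil` in place of robocopy and runs against temporary folders, so the paths, the rollback folder name, and the `Default.aspx` file are stand-ins rather than Pearl's real layout.

```python
import shutil
import tempfile
from pathlib import Path


def deploy(build_dir: Path, site_dir: Path, rollback_dir: Path) -> None:
    """Save the current site to rollback_dir, then mirror the new build in."""
    if rollback_dir.exists():
        shutil.rmtree(rollback_dir)       # keep only the most recent backup
    shutil.copytree(site_dir, rollback_dir)
    shutil.rmtree(site_dir)
    shutil.copytree(build_dir, site_dir)  # mirror: the site now equals the build


def rollback(site_dir: Path, rollback_dir: Path) -> None:
    """Restore the site from the previous build saved by deploy()."""
    shutil.rmtree(site_dir)
    shutil.copytree(rollback_dir, site_dir)


def demo() -> tuple[str, str]:
    """Deploy v2 over v1, then roll back; return the page content after each."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        site, build, backup = root / "site", root / "build", root / "_rollback"
        site.mkdir()
        build.mkdir()
        (site / "Default.aspx").write_text("v1")
        (build / "Default.aspx").write_text("v2")
        deploy(build, site, backup)
        after_deploy = (site / "Default.aspx").read_text()
        rollback(site, backup)
        after_rollback = (site / "Default.aspx").read_text()
        return after_deploy, after_rollback


print(demo())
```

The sketch keeps a single backup generation. A real pipeline would likely timestamp the backup folders so that more than one previous build stays recoverable.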
18 controls covering network, identity, data, and audit layers.
Source: RFP Section 5.6
Separate Azure subscription for test environment
All VM access via Azure Bastion only — no exposed endpoints
Per-subnet NSGs with least-privilege port rules
Azure Firewall with allowlisted outbound only
No VNet peering to production subscription — air gap
All connection strings and API keys in Key Vault
System-assigned MI for Key Vault & Blob access
Custom role definitions per persona (see RBAC section)
Just-in-time elevation for dangerous operations
ADE (BitLocker) with Customer-Managed Keys
Encrypt=True, TrustServerCertificate=False
Dedicated PearlAudit database + Log Analytics
VMs shut down 19:00-07:00 and at weekends (running during business hours only)
Environment=Test, Project=Pearl enforced tags
Enforce: no public IPs, required tags, region lock
PII anonymisation executed on every Debug Mode restore
Documented retention, purpose limitation, access controls
90-day secret rotation, 365-day CMK rotation via Key Vault
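Of the controls above, PII anonymisation on every Debug Mode restore is the one most easily shown in code. The sketch below is a stand-in for the Faker-based masking named in the data pipeline section: it uses only `hashlib` so it runs on the standard library, and it maps phone numbers into 07700 900000-900999, the UK mobile range Ofcom reserves for drama use. Column names follow the Callers fields listed earlier; the salt and output formats are assumptions.

```python
import hashlib

DRAMA_MOBILE_PREFIX = "07700900"  # Ofcom drama range 07700 900000-900999


def _stable_number(value: str, salt: str, modulus: int) -> int:
    """Derive a repeatable number from a value so the same input masks the same way."""
    digest = hashlib.sha256(f"{salt}:{value}".encode("utf-8")).hexdigest()
    return int(digest, 16) % modulus


def mask_phone(value: str, salt: str) -> str:
    return f"{DRAMA_MOBILE_PREFIX}{_stable_number(value, salt, 1000):03d}"


def mask_name(value: str, salt: str) -> str:
    return f"Caller {_stable_number(value, salt, 10**6):06d}"


def mask_email(value: str, salt: str) -> str:
    return f"caller{_stable_number(value, salt, 10**6):06d}@example.com"


MASKERS = {"Name": mask_name, "Phone": mask_phone, "Email": mask_email}


def mask_row(row: dict, salt: str) -> dict:
    """Return a copy of the row with PII columns replaced; other columns pass through."""
    return {
        column: MASKERS[column](value, salt) if column in MASKERS and value is not None else value
        for column, value in row.items()
    }


sample = {"CallerId": 42, "Name": "Jane Doe", "Phone": "07123456789", "Email": "jane@realco.co.uk"}
print(mask_row(sample, salt="per-refresh-secret"))
```

Deterministic masking keeps related rows consistent (the same caller masks identically in Callers and CallerHistory), at the cost of collisions in the small drama-number range. The salt should be treated as a secret and changed per refresh so masked values cannot be reversed by brute force.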
Dedicated PearlAudit database with separate tables for every category of change — complete, tamper-resistant audit trail.
Source: RFP Section 9.2 Option D
State-changing tables from the audit scope below write INSERT/UPDATE/DELETE events to Audit_ChangeLog with before/after JSON values.
Global.asax captures page access and API call events with user identity, IP address, and correlation IDs
All external API calls logged with PII-sanitised payloads, response codes, and timing data
GitHub Actions posts deployment events via secure webhook — artefact hashes, approver identity, timestamps
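The shape of an Audit_ChangeLog record with before/after JSON can be sketched as follows. In the proposed design the rows are written by SQL triggers, so this Python version only illustrates the record structure; every field name here is an assumption pending the PearlAudit schema design.

```python
import json
from datetime import datetime, timezone

VALID_OPERATIONS = {"INSERT", "UPDATE", "DELETE"}


def build_change_event(table, operation, before, after, changed_by, changed_at=None):
    """Assemble one audit record with before/after snapshots serialised as JSON."""
    if operation not in VALID_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    before = before or {}
    after = after or {}
    changed_columns = sorted(
        column
        for column in set(before) | set(after)
        if before.get(column) != after.get(column)
    )
    return {
        "table_name": table,
        "operation": operation,
        "changed_by": changed_by,
        "changed_at_utc": (changed_at or datetime.now(timezone.utc)).isoformat(),
        "changed_columns": changed_columns,
        # INSERT has no "before" snapshot and DELETE has no "after" snapshot.
        "before_json": json.dumps(before, sort_keys=True) if before else None,
        "after_json": json.dumps(after, sort_keys=True) if after else None,
    }


event = build_change_event(
    table="ConfigStrings",
    operation="UPDATE",
    before={"Name": "SmsEndpoint", "Value": "https://old.example.com"},
    after={"Name": "SmsEndpoint", "Value": "https://sandbox.example.com"},
    changed_by="deploy-pipeline",
    changed_at=datetime(2026, 5, 11, 9, 0, tzinfo=timezone.utc),
)
print(event["changed_columns"])
```

Storing the list of changed columns alongside the full snapshots makes common audit questions, such as who changed a given ConfigStrings value, answerable without parsing the JSON payloads.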
Not all 489 tables receive synchronous SQL triggers. The direct audit scope focuses on the 28 high-risk tables that can change configuration, permissions, customer data, financial records, queue execution, or regulated communication content. The rest remain covered through access logs, integration logs, deployment logs, and system events unless later discovery promotes them into the trigger scope.
| Source DB | Tables in direct audit scope | Primary audit route | Why in scope |
|---|---|---|---|
| PearlOperations | ConfigStrings | SQL trigger → Audit_ConfigChanges | Controls API keys, endpoints, feature flags, and runtime behaviour. |
| PearlUsers | Users, Permissions, LoginLogs, Companies, CompanyInfo, CompanyContacts, Rotas, Shifts | SQL trigger → Audit_ChangeLog / Audit_SecurityEvents | Identity, tenancy, rota, and permission changes determine who can access the system and how escalations are routed. |
| PearlData | Messages, Callers, CallerHistory, ScreenInits, PhysicalDDIs | SQL trigger → Audit_ChangeLog / Audit_DataAccessLog | Core message-taking tables hold caller PII, operator-entered content, screen state, and DDI routing context. |
| PearlQueues | DispatchQueue, Process_JobQueue, Process_MachineStates, JobSchedules | SQL trigger → Audit_ChangeLog | These tables control background execution, dispatch timing, and worker coordination. |
| PearlBilling | Invoices, BillItems, Payments | SQL trigger → Audit_ChangeLog / Audit_IntegrationLog | Financial correctness, payment actions, and accounting exports depend on these records. |
| Messages | MessageContent | SQL trigger → Audit_DataAccessLog | Legacy message body storage contains customer communication content. |
| SMSBroadcast | SMSSpoolOutgoing | SQL trigger → Audit_IntegrationLog | Outbound customer communications need clear send intent and change history. |
| PearlLog | PageAccessLogs, APILogs, ProcessLogs, SecurityExceptions | Async logging → Audit_AccessLog / Audit_SecurityEvents | Evidence of user journeys, API misuse, worker faults, and suspicious activity. |
| PearlSwitch | CallRecordings | SQL trigger → Audit_DataAccessLog | Call recording metadata is sensitive operational evidence and needs attributable access history. |
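The scope table above can be cross-checked with a short sketch. The table names are transcribed from the table; the dictionary layout and the helper names `scope_counts` and `is_in_direct_scope` are assumptions made for illustration.

```python
# Tables in direct audit scope, transcribed from the table above.
AUDIT_SCOPE = {
    "PearlOperations": ["ConfigStrings"],
    "PearlUsers": ["Users", "Permissions", "LoginLogs", "Companies",
                   "CompanyInfo", "CompanyContacts", "Rotas", "Shifts"],
    "PearlData": ["Messages", "Callers", "CallerHistory",
                  "ScreenInits", "PhysicalDDIs"],
    "PearlQueues": ["DispatchQueue", "Process_JobQueue",
                    "Process_MachineStates", "JobSchedules"],
    "PearlBilling": ["Invoices", "BillItems", "Payments"],
    "Messages": ["MessageContent"],
    "SMSBroadcast": ["SMSSpoolOutgoing"],
    "PearlLog": ["PageAccessLogs", "APILogs", "ProcessLogs",
                 "SecurityExceptions"],
    "PearlSwitch": ["CallRecordings"],
}

# PearlLog is the only source routed through async logging rather than triggers.
ASYNC_LOGGED_DBS = {"PearlLog"}

def scope_counts(scope=AUDIT_SCOPE):
    """Count tables in scope, split by audit route."""
    total = sum(len(tables) for tables in scope.values())
    async_logged = sum(len(tables) for db, tables in scope.items()
                       if db in ASYNC_LOGGED_DBS)
    return {"total": total, "trigger": total - async_logged, "async": async_logged}

def is_in_direct_scope(db, table, scope=AUDIT_SCOPE):
    """True when the given table is listed for the given source database."""
    return table in scope.get(db, [])
```

Counting the rows this way gives 28 tables in total, of which 24 use SQL triggers and the 4 PearlLog tables use async logging.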
Custom Azure role definitions with least-privilege access and Privileged Identity Management.
Source: RFP Section 5.1
| Role | Scope | Permissions | Assigned To |
|---|---|---|---|
| Pearl-TestEnv-Admin | Subscription | Full Contributor + KV admin + SQL MI admin | Infra team lead (PIM-gated) |
| Pearl-TestEnv-Developer | Resource Group | VM Contributor + KV Secret Reader + SQL MI Read/Write | Development team |
| Pearl-TestEnv-QA | Resource Group | VM Reader + SQL MI Data Reader (read-only) | QA/testing team |
| Pearl-TestEnv-Deployer | Resource Group | VM Contributor (start/stop) + Blob Reader + KV Secret Reader | GitHub Actions SP |
| Pearl-TestEnv-DBA | SQL MI | SQL MI Contributor + KV Secret Reader | Database administrators |
| Pearl-TestEnv-Auditor | Log Analytics | Log Analytics Reader + Audit DB read-only | Compliance / audit |
4-hour max duration. Requires tech lead approval. For infrastructure changes and Key Vault management.
2-hour max duration. Requires DBA lead approval. For emergency database operations only.
8-hour max duration. Requires DPO approval. For loading unmasked production data.
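The three elevation rules above can be expressed as a small validation sketch. The durations and approvers come from the text; the policy keys (`infra_admin`, `dba_emergency`, `unmasked_data`) and the function `validate_request` are hypothetical labels, and a real deployment would enforce these limits in Privileged Identity Management settings rather than in application code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElevationPolicy:
    max_hours: int
    approver: str

# Keys are hypothetical labels; durations and approvers come from the text above.
POLICIES = {
    "infra_admin": ElevationPolicy(4, "tech lead"),
    "dba_emergency": ElevationPolicy(2, "DBA lead"),
    "unmasked_data": ElevationPolicy(8, "DPO"),
}

def validate_request(kind, requested_hours, approved_by):
    """Return (ok, reason) for a just-in-time elevation request."""
    policy = POLICIES.get(kind)
    if policy is None:
        return False, f"unknown elevation type: {kind}"
    if requested_hours <= 0 or requested_hours > policy.max_hours:
        return False, f"duration must be above 0h and at most {policy.max_hours}h"
    if approved_by != policy.approver:
        return False, f"requires approval from {policy.approver}"
    return True, "ok"
```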
Customer-Managed Keys for all encryption layers with automated rotation policies.
Source: RFP Section 9.2 Option D
ADE (BitLocker) with Customer-Managed Key (CMK) stored in Key Vault. All VM disks encrypted at rest.
Transparent Data Encryption with CMK. SQL MI TDE protector key rotated annually via Key Vault policy.
Server-Side Encryption with CMK. Backup .bak files encrypted at rest with customer-controlled keys.
90-day rotation for API keys and service principal secrets. 365-day rotation for encryption CMKs. Expiry alerts fire 14 days ahead for secrets and 30 days ahead for CMKs.
| Key / Secret Type | Rotation Period | Method | Alert Threshold |
|---|---|---|---|
| Integration API keys | 90 days | Manual rotate + KV version | 14 days before expiry |
| Service principal secrets | 90 days | Auto-rotate via KV policy | 14 days before expiry |
| SQL MI connection string | 90 days | Auto-rotate via Azure Function | 14 days before expiry |
| Disk encryption CMK | 365 days | Auto-rotate via KV policy | 30 days before expiry |
| SQL MI TDE protector | 365 days | Auto-rotate via KV policy | 30 days before expiry |
| Blob encryption CMK | 365 days | Auto-rotate via KV policy | 30 days before expiry |
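To make the schedule concrete, the sketch below derives the next rotation date and the first alert date from the periods in the table above. The dictionary keys and the functions `rotation_dates` and `needs_alert` are illustrative names, not part of any Key Vault API.

```python
from datetime import date, timedelta

# (rotation period in days, alert lead time in days), from the table above.
ROTATION_POLICY = {
    "integration_api_key": (90, 14),
    "service_principal_secret": (90, 14),
    "sql_mi_connection_string": (90, 14),
    "disk_encryption_cmk": (365, 30),
    "sql_mi_tde_protector": (365, 30),
    "blob_encryption_cmk": (365, 30),
}

def rotation_dates(kind, last_rotated):
    """Return the due date and the date from which expiry alerts should fire."""
    period, alert_lead = ROTATION_POLICY[kind]
    due = last_rotated + timedelta(days=period)
    return {"due": due, "alert_from": due - timedelta(days=alert_lead)}

def needs_alert(kind, last_rotated, today):
    """True once today falls inside the alert window for this key or secret."""
    return today >= rotation_dates(kind, last_rotated)["alert_from"]
```

For example, an integration API key rotated on 1 January 2026 is due on 1 April 2026 and starts alerting on 18 March 2026.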
RFP-aligned weekly plan targeting MVTE readiness by mid-May 2026.
Source: RFP Section 7
Current-state validation. Full integration inventory (all 22 services). Data strategy selection (dual-mode). Terraform module design.
✅ Deliverables: Discovery report + Integration safety matrix + Agreed design
Azure infrastructure via Terraform. Hub-Spoke VNets, VMs, SQL MI, Bastion, Firewall, Key Vault. Initial app deployment via CI/CD. Integration sandbox/stub configuration.
✅ Deliverables: Running test environment + Initial deployment validated
Dual-mode data pipeline (testing + debug modes). Weekly refresh automation. RBAC hardening and audit logging system. Runbooks and SOPs.
✅ Deliverables: Data pipeline operational + Runbooks delivered + Security controls verified
Stabilisation and defect fixes. Multi-environment spawning verification. Acceptance evidence pack. Handover session and recorded walkthrough.
✅ Deliverables: Acceptance criteria met + Handover complete
425 hours / 53 working days on the selected 3-VM + 1 DB design across 5 weeks. Target: mid-May 2026.
Source: RFP Section 7, 9.2
Current-state validation: 40h (5 days) ✅ Completed
3-VM + 1 DB foundation: 105h (13 days)
Restore, mask, and refresh automation: 96h (12 days)
Cloneable environment pattern: 32h (4 days)
Hardening and audit evidence: 56h (7 days)
Testing & handover: 56h (7 days)
Stabilisation: 40h (5 days)
Three-perspective comparison plus the detailed RFP Option A-D breakdown on the selected 3-VM + 1 DB design.
Source: RFP Section 9.2 — Mandatory Table Format
| Work Package (RFP 9.2) | Original Est. | RFP-Adjusted | Final Proposed |
|---|---|---|---|
| Discovery | 24-32h | 32-40h | 40h |
| RFP Option A — MVTE Build | 64-88h | 88-112h | 105h |
| RFP Option B — Data Pipeline | 64-80h | 80-104h | 96h |
| RFP Option C — Multi-env Capability | 0h (not estimated) | 24-32h | 32h |
| RFP Option D — Enhanced Security | 16-24h | 48-64h | 56h |
| Validation & Handover | 32-40h | 48-64h | 56h |
| Buffer | 40h | 40h | 40h |
| TOTAL | 240-304h | 360-456h | 425h (53 days) |
Why the increase? The uplift comes from standing up the selected 3-VM + 1 DB foundation, adding the restore-and-mask data pipeline, and then layering on multi-environment capability plus the enhanced security controls required by the RFP.
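The totals in the table can be re-derived with a few lines of arithmetic. The hours are copied from the table; the 8-hour working day is an assumption, chosen because it matches the 40h = 5 days conversion used in this proposal.

```python
# (original low, original high, adjusted low, adjusted high, final), in hours.
WORK_PACKAGES = {
    "Discovery": (24, 32, 32, 40, 40),
    "RFP Option A - MVTE Build": (64, 88, 88, 112, 105),
    "RFP Option B - Data Pipeline": (64, 80, 80, 104, 96),
    "RFP Option C - Multi-env Capability": (0, 0, 24, 32, 32),
    "RFP Option D - Enhanced Security": (16, 24, 48, 64, 56),
    "Validation & Handover": (32, 40, 48, 64, 56),
    "Buffer": (40, 40, 40, 40, 40),
}

HOURS_PER_DAY = 8  # assumption, consistent with 40h = 5 days

def column_totals(packages=WORK_PACKAGES):
    """Sum each of the five numeric columns across all work packages."""
    return tuple(sum(row[i] for row in packages.values()) for i in range(5))

def working_days(hours):
    """Convert hours to whole working days, rounded to the nearest day."""
    return round(hours / HOURS_PER_DAY)

totals = column_totals()
print(totals, working_days(totals[4]))
```

The sums reproduce the TOTAL row: 240-304h originally, 360-456h after RFP adjustment, and 425h (53 days) as finally proposed.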
Architecture option totals below include a 20% buffer and are rounded to the nearest 5 hours.
Architecture option totals for topology comparison:
Baseline RFP-compliant path
Includes worker refactor to Functions
Lower infra effort, less isolation
Extra tiering and deployment complexity
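The buffer-and-round rule stated above can be sketched as follows. The input values in the usage note are hypothetical, the function name `option_total` is illustrative, and rounding exact ties upward is an assumption because the rule does not say how ties are handled.

```python
BUFFER_PERCENT = 20
ROUND_TO_HOURS = 5

def option_total(base_hours):
    """Apply the buffer, then round to the nearest 5 hours.

    base_hours must be a whole number of hours. Exact ties round up,
    which is an assumption.
    """
    # Work in hundredths of an hour so everything stays in integer arithmetic.
    buffered = base_hours * (100 + BUFFER_PERCENT)
    step = ROUND_TO_HOURS * 100
    return (buffered + step // 2) // step * ROUND_TO_HOURS
```

With hypothetical inputs, a 100h base becomes 120h, a 93h base (111.6h buffered) becomes 110h, and a 94h base (112.8h buffered) becomes 115h.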
These are work packages delivered on top of the chosen architecture, not alternative topologies. RFP Option A carries the 105h baseline because it creates the private landing zone, provisions VM1, VM2, VM3, wires the single SQL Managed Instance, and proves the first repeatable deployment path.
Key risks identified during Phase 1 discovery.
Risk: AI spooler uses http://10.0.0.12, reporting uses 10.0.1.44, queue-processor references pearlsqlmi2, totem reads from C:\totemscripts\*.txt
Mitigation: Audit all source for hardcoded IPs/hostnames. Create web.config overrides or hosts file entries on test VMs. Provision required filesystem paths.
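A source audit for hardcoded IPs could start from a sketch like the one below. The file suffix list and the function names are assumptions, and the pattern only finds IPv4 literals, so hostnames such as pearlsqlmi2 and filesystem paths would need their own searches. Dotted version strings (for example assembly versions) will also match and need manual review.

```python
import re
from pathlib import Path

# Matches dotted IPv4 literals with each octet in the range 0-255.
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)

# Assumed set of file types worth scanning; adjust to the real codebase.
SOURCE_SUFFIXES = {".vb", ".aspx", ".config", ".cs", ".json"}

def find_hardcoded_ips(text):
    """Return the distinct IPv4 literals found in a block of text."""
    return sorted(set(IPV4.findall(text)))

def scan_tree(root):
    """Walk a source tree and return (path, line number, ip) for every hit."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                for ip in find_hardcoded_ips(line):
                    hits.append((str(path), lineno, ip))
    return hits
```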
Risk: PII may exist in unexpected columns/tables across 489+ tables.
Mitigation: Data discovery audit before go-live. Deny-by-default approach — mask all string columns in PII-sensitive tables. Review PearlLog payloads.
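The deny-by-default idea can be illustrated with a minimal sketch: every string column is masked unless it has been explicitly reviewed and allowlisted. The allowlist entry, the salt, and the salted-hash masking method are all assumptions for illustration; the real pipeline may use a different masking technique.

```python
import hashlib

# Hypothetical allowlist: string columns explicitly reviewed as non-PII.
SAFE_STRING_COLUMNS = {("Callers", "Status")}

def mask_value(value, salt="test-env"):
    """Replace a string with a deterministic, non-reversible token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "MASKED-" + digest[:12]

def mask_row(table, row, safe=SAFE_STRING_COLUMNS):
    """Mask every string column in a row unless it is on the allowlist."""
    masked = {}
    for column, value in row.items():
        if isinstance(value, str) and (table, column) not in safe:
            masked[column] = mask_value(value)
        else:
            masked[column] = value
    return masked
```

Because the default is to mask, a newly added string column is protected without any change to the allowlist, which is the property the mitigation relies on.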
Risk: Test/sandbox API keys (Stripe, Genesys, etc.) must not leak into production-visible systems.
Mitigation: All secrets stored in Azure Key Vault. No secrets in code, config files on disk, or repository. Managed Identity access only.
Risk: Actual DB sizes not yet measured — restore pipeline duration and storage costs are estimates.
Mitigation: Measure actual sizes during Phase 1 completion. Consider trimming PearlLog/PearlArchive for test. Adjust Blob Storage tier if needed.
Risk: RadControls for ASP.NET AJAX require a valid licence for the test environment.
Mitigation: Verify existing licence covers non-production use. Contact Telerik/Progress if additional licence needed.
Risk: totem-2-cloud-nosql targets .NET Framework 3.5 — must be installed as a Windows feature.
Mitigation: Enable .NET 3.5 feature on VM2 via DISM during provisioning script. Also provision C:\totemscripts\ directory with script templates.
Risk: Remote state stored in Azure Blob could be corrupted by concurrent operations or manual changes outside Terraform.
Mitigation: Enable state locking via Azure Blob lease. Blob versioning for state file recovery. terraform plan mandatory before apply in CI/CD.
Risk: SQL triggers on high-volume tables (e.g., those in PearlLog) could add latency to application transactions.
Mitigation: Triggers only on the high-risk tables in direct audit scope. High-volume access logging via async middleware, not triggers. Performance test during Phase 4.
Risk: Automatic key rotation could temporarily break SQL MI TDE or Blob encryption if applications cache old keys.
Mitigation: Key Vault rotation events trigger an Azure Function that validates connectivity. Dual-key overlap period (old key retained for 24h). Rotation during maintenance window.
Risk: Each additional Terraform workspace spawns a separate SQL MI instance (~£280/mo), which could exceed budget if many environments run simultaneously.
Mitigation: Auto-shutdown policy for non-primary environments. Spawn-on-demand, destroy-after-use workflow. Budget alerts at 80% threshold.
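A rough sketch of the budget check follows. It uses the approximate £280 per month figure quoted above and the 80% alert threshold; the monthly budget passed in is hypothetical, and the calculation covers SQL MI cost only, not VMs or storage.

```python
SQL_MI_MONTHLY_GBP = 280     # approximate per-environment figure quoted above
ALERT_THRESHOLD_PERCENT = 80

def monthly_sql_mi_cost(environments):
    """Approximate monthly SQL MI spend for a number of live environments."""
    return environments * SQL_MI_MONTHLY_GBP

def budget_alert(environments, monthly_budget_gbp):
    """True when SQL MI spend reaches the alert threshold of the budget."""
    cost = monthly_sql_mi_cost(environments)
    # Integer comparison avoids floating-point rounding at the boundary.
    return cost * 100 >= monthly_budget_gbp * ALERT_THRESHOLD_PERCENT
```

With a hypothetical £1,000 monthly budget, two environments (£560) stay below the threshold while a third (£840) triggers the alert.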
Risk: Testing Mode synthetic seed data may diverge from production schema changes over time, causing false test failures.
Mitigation: Seed data scripts version-controlled in repo. Schema diff check included in CI/CD pipeline. Quarterly seed data review process.
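The schema diff check could be as simple as comparing the columns the seed scripts populate against the columns in the current schema. The input shape (a mapping of table name to a set of column names) and the function name `schema_drift` are assumptions; how those mappings are extracted from the database and the seed scripts is left open.

```python
def schema_drift(seed_columns, live_columns):
    """Compare seed-script columns against the live schema.

    Both arguments map a table name to a set of column names.
    Returns only the tables where the two disagree.
    """
    report = {}
    for table in sorted(seed_columns.keys() | live_columns.keys()):
        seed = seed_columns.get(table, set())
        live = live_columns.get(table, set())
        missing_in_seed = sorted(live - seed)   # added to schema, not yet seeded
        stale_in_seed = sorted(seed - live)     # seeded, but removed from schema
        if missing_in_seed or stale_in_seed:
            report[table] = {
                "missing_in_seed": missing_in_seed,
                "stale_in_seed": stale_in_seed,
            }
    return report
```

A CI step could fail the build whenever the returned report is non-empty, so drift is caught before it shows up as a false test failure.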
Risk: Not all 22 integrations offer sandbox/test environments. Some vendors may charge extra or have limited test APIs.
Mitigation: Integration inventory categorises each service's test strategy. Disabled-by-default for services without sandbox. Confirm sandbox access during Week 1 discovery.
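The disabled-by-default rule can be sketched as a lookup that falls back to "disabled" for any service not explicitly categorised. The inventory entries below are placeholders for illustration only; the actual mode for each of the 22 services is decided in the integration inventory.

```python
from enum import Enum

class Mode(Enum):
    SANDBOX = "sandbox"    # vendor-provided test environment
    STUB = "stub"          # local fake standing in for the vendor
    DISABLED = "disabled"  # no outbound calls at all

# Placeholder entries; the real assignments come from the integration inventory.
INVENTORY = {
    "Stripe": Mode.SANDBOX,
    "Twilio": Mode.STUB,
}

def resolve_mode(service, inventory=INVENTORY):
    """Return the test-environment mode, defaulting to DISABLED when unlisted."""
    return inventory.get(service, Mode.DISABLED)

def may_call_out(service, inventory=INVENTORY):
    """Only sandboxed integrations are allowed to make real outbound calls."""
    return resolve_mode(service, inventory) is Mode.SANDBOX
```

Defaulting to `DISABLED` means an integration that was missed during discovery cannot reach a live vendor endpoint from the test environment.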
Phase 1 Discovery is complete. The architecture is designed, risks are mapped, and hour estimates follow the RFP 9.2 mandatory format. Once approved, the RFP response can be formalised.
Comprehensive step-by-step guide with 25+ embedded video tutorials for every phase
Infrastructure Engineering Team — March 2026