Implementation Guide & Video References — Phases 2–6

Pearl Test Environment — Complete Implementation Guide

A comprehensive, step-by-step reference with curated video tutorials for every phase of building the Pearl test environment on Azure. Designed for teams new to Azure, CI/CD, GitHub Actions, and Terraform.

25+ Video Tutorials
5 Build Phases
385h Total Effort
~49 Working Days
00

Prerequisites — Before You Start Anything

Before Phase 2 begins, these items must be completed. Think of these as the "tickets to enter" — without them, the build will stall immediately.

Administrative Prerequisites

These are the business and access approvals your team needs before touching any Azure resources. Each one prevents a blocker later.

Technical Measurements Required

Before we build the test database, we need to know how big the current databases actually are. You wouldn't move furniture into a new flat without measuring the doorways first — same principle.

Accounts & Access You'll Need

🔑

Azure Portal Access

An Azure account with at least Contributor role on the test subscription. You'll use portal.azure.com for most provisioning.

🐙

GitHub Account

Admin access to the repository. Needed to register self-hosted runners and create environment protection rules.

💳

Stripe Test Keys

Stripe test-mode API keys (sk_test_xxx). Available from the Stripe dashboard under Developers → API keys.

📧

Mailgun Sandbox

Mailgun sandbox domain and API key. All test emails route through the sandbox so no real emails are sent.

💰

GoCardless Sandbox

GoCardless sandbox token. Prevents any real Direct Debit mandates from being created.

📞

Genesys Sandbox

Genesys Cloud CX sandbox org ID. Test telephony routing without touching real phone lines.

⚠️ Don't Skip the Prerequisites

The most common reason test environment builds stall is missing prerequisites. Azure subscription creation can take days if your organisation requires approval. Sandbox API keys from third parties can take a week or more. Start collecting these now, even before Phase 2 begins.


LP

Recommended Learning Path

If you're new to Azure, CI/CD, and Infrastructure as Code, follow this learning path before starting the build. Each step builds on the previous one.

1

Understand Cloud & Azure Basics (2–3 hours)

What is a subscription, resource group, region, and VNet? Start with the AZ-900 and Azure overview videos below. Don't worry about memorising everything — just get familiar with the vocabulary.

2

Learn What CI/CD Means (1 hour)

Understand "Continuous Integration" (auto-building code) and "Continuous Deployment" (auto-pushing it to servers). Watch the CI/CD and DevOps concept videos. You don't need to be an expert — just understand the flow.

3

Learn GitHub Actions (2–3 hours)

This is YOUR deployment tool. Understand workflows, jobs, steps, runners, and secrets. The TechWorld with Nana and CoderDave tutorials are excellent starting points.

4

Learn Terraform Basics (3–4 hours)

Terraform lets you describe your entire Azure infrastructure in code files. Instead of clicking through the Azure portal, you write a file that says "create a VNet, create a VM" and Terraform builds it. The freeCodeCamp Terraform + Azure course is perfect.

5

Combine It All — IaC + CI/CD (2 hours)

Watch the Traversy Media DevOps crash course to see how Docker, Terraform, and GitHub Actions work together. This connects the dots between everything you've learned.


FV

Foundation Videos — Watch These First

These videos cover the core technologies used in this project. Watch them before diving into any phase-specific work. They're ordered from easiest to most advanced.

Azure Cloud Fundamentals

If you've never used Azure before, start here. These videos teach you what Azure is, how resources are organised, and how networking works in the cloud.

AZ-900 | Microsoft Azure Fundamentals Full Course
Adam Marczak — Azure for Everyone · Essential · Beginner
Azure Full Course — Learn Microsoft Azure in 8 Hours
Edureka · Beginner
AZ-900 Ep 10 | Networking — VNet, VPN Gateway, Load Balancer
Adam Marczak — Azure for Everyone · Essential · Beginner

DevOps & CI/CD Concepts

Before learning specific tools, understand the why. What is DevOps? What does CI/CD actually mean? Why do we automate deployments instead of copying files manually?

What is DevOps? REALLY Understand It
TechWorld with Nana · Essential · Beginner
DevOps CI/CD Explained in 100 Seconds
Fireship · Essential · Beginner
DevOps Prerequisites Course — Getting Started with DevOps
freeCodeCamp · Beginner
What is Infrastructure as Code? IaC Tools Compared
TechWorld with Nana · Essential · Beginner

PowerShell Fundamentals

You'll use PowerShell extensively for configuring Windows servers, running database scripts, and automating tasks. If you've never used PowerShell, this is essential viewing.

PowerShell Master Class — PowerShell Fundamentals
John Savill's Technical Training · Essential · Beginner


02

Phase 2 — Build the Cloud Infrastructure

This is the foundation. We're building the Azure cloud infrastructure from scratch — a secure private network, three Windows servers, a database, and the "plumbing" that connects everything.

☁️

Phase 2 — Azure Infrastructure Provisioning

Networks, VMs, Database, Key Vault, Storage — built from the ground up

85h ~11 working days

💡 What You're Building (In Plain English)

Imagine building a secure office building. First, you build the walls and security gates (networking). Then you set up the rooms (VMs). Then you install the filing cabinets (database). Then you put locks on the doors and a key cabinet (Key Vault). That's Phase 2.

Step 2.1–2.4: Networking & Security Foundation

What this is: Creating isolated private networks in Azure. Think of a VNet (Virtual Network) as a private office building. A subnet is a floor in that building. A Network Security Group (NSG) is the security guard who checks badges at each floor. Azure Bastion is a secure front door that lets admins in without exposing the building to the public street. Azure Firewall is the security checkpoint that controls what goes out.

1 Create Azure Subscription & Resource Groups

A subscription is like a billing account — all resources you create go inside it. Resource groups are folders that organise related resources together. We create three: one for networking (hub), one for application servers (spoke), and one for database resources (data).

# Create three resource groups in UK South region
az group create --name rg-pearl-test-hub --location uksouth --tags Environment=Test Project=Pearl
az group create --name rg-pearl-test-spoke --location uksouth --tags Environment=Test Project=Pearl
az group create --name rg-pearl-test-data --location uksouth --tags Environment=Test Project=Pearl
  • Subscription created with budget alert at £800/month (prevents surprise bills)
  • Resource groups created with proper tags (so you can track costs and ownership)
  • RBAC roles assigned: Contributor for infra team, Reader for QA team
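
As a concrete sketch of the RBAC item above, assuming the Az PowerShell module and two Azure AD groups (the group names here are hypothetical), the assignments look like this:

# Hypothetical team groups - substitute your organisation's real group names
$infra = Get-AzADGroup -DisplayName 'pearl-infra-team'
$qa = Get-AzADGroup -DisplayName 'pearl-qa-team'

# Contributor for the infra team, Reader for QA, scoped to a resource group
New-AzRoleAssignment -ObjectId $infra.Id -RoleDefinitionName 'Contributor' -ResourceGroupName 'rg-pearl-test-spoke'
New-AzRoleAssignment -ObjectId $qa.Id -RoleDefinitionName 'Reader' -ResourceGroupName 'rg-pearl-test-spoke'

Repeat per resource group, or assign once at subscription scope if the whole subscription is test-only.
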
2 Build the Hub Network (VNet + Bastion + Firewall)

The Hub VNet is the "security checkpoint" network. It contains Azure Bastion (your secure remote desktop gateway — you connect to VMs through this instead of exposing them to the internet) and Azure Firewall (controls what outbound traffic is allowed).

# Hub VNet layout:
# Hub VNet: 10.1.0.0/16
# ├── AzureBastionSubnet:  10.1.0.0/26 (Azure Bastion lives here)
# └── AzureFirewallSubnet: 10.1.1.0/26 (Azure Firewall lives here)

# Create the Hub VNet
az network vnet create --name vnet-pearl-hub --resource-group rg-pearl-test-hub \
  --location uksouth --address-prefix 10.1.0.0/16

# Create subnets inside the hub
az network vnet subnet create --name AzureBastionSubnet --vnet-name vnet-pearl-hub \
  --resource-group rg-pearl-test-hub --address-prefix 10.1.0.0/26
az network vnet subnet create --name AzureFirewallSubnet --vnet-name vnet-pearl-hub \
  --resource-group rg-pearl-test-hub --address-prefix 10.1.1.0/26

Then deploy Azure Bastion (Standard SKU) and Azure Firewall (Standard SKU). Configure firewall rules to allow outbound access to GitHub, NuGet, Azure Blob Storage, and sandbox API endpoints only. Block everything else by default.

  • Hub VNet created (10.1.0.0/16)
  • Azure Bastion deployed — you can now RDP to VMs securely
  • Azure Firewall deployed with outbound rules for GitHub, NuGet, Blob Storage, and sandbox APIs
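
To give a flavour of that outbound allow-list, here is a minimal sketch using the Az.Network cmdlets; the firewall name, source range, and FQDN list are illustrative assumptions rather than confirmed project values:

# Illustrative allow-list rule (names and FQDNs are assumptions)
$rule = New-AzFirewallApplicationRule -Name 'allow-build-deps' `
    -SourceAddress '10.2.0.0/16' `
    -TargetFqdn 'github.com', '*.github.com', 'api.nuget.org' `
    -Protocol 'Https:443'
$coll = New-AzFirewallApplicationRuleCollection -Name 'pearl-outbound' `
    -Priority 200 -Rule $rule -ActionType Allow

$fw = Get-AzFirewall -Name 'fw-pearl-hub' -ResourceGroupName 'rg-pearl-test-hub'
$fw.AddApplicationRuleCollection($coll)   # anything not listed stays denied by default
Set-AzFirewall -AzureFirewall $fw
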
3 Build the Spoke Network (Application VNet)

The Spoke VNet is where your actual application servers live. We divide it into subnets (floors) so the web server, worker server, build server, and database are each isolated.

# Spoke VNet layout:
# Spoke VNet: 10.2.0.0/16
# ├── snet-web:    10.2.1.0/24 (VM1 — web server)
# ├── snet-worker: 10.2.2.0/24 (VM2 — background services)
# ├── snet-build:  10.2.3.0/24 (VM3 — build/deploy server)
# └── snet-data:   10.2.4.0/24 (SQL Managed Instance)

# Connect hub and spoke with VNet Peering
# This allows Bastion (in the hub) to reach VMs (in the spoke)
# and forces internet traffic through the hub's firewall
  • Spoke VNet created with 4 subnets
  • VNet peering established between Hub ↔ Spoke
  • Route table created: all internet traffic (0.0.0.0/0) → Azure Firewall
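
A minimal sketch of the peering and the forced-tunnel route, assuming the VNet names used in this guide; 10.1.1.4 as the firewall's private IP is an assumption, so read the real value off the firewall resource:

# Peer hub and spoke (a peering is created in each direction)
$hub = Get-AzVirtualNetwork -Name 'vnet-pearl-hub' -ResourceGroupName 'rg-pearl-test-hub'
$spoke = Get-AzVirtualNetwork -Name 'vnet-pearl-spoke' -ResourceGroupName 'rg-pearl-test-spoke'
Add-AzVirtualNetworkPeering -Name 'hub-to-spoke' -VirtualNetwork $hub -RemoteVirtualNetworkId $spoke.Id
Add-AzVirtualNetworkPeering -Name 'spoke-to-hub' -VirtualNetwork $spoke -RemoteVirtualNetworkId $hub.Id

# Send all internet-bound traffic via the firewall's private IP
$rt = New-AzRouteTable -Name 'rt-pearl-spoke' -ResourceGroupName 'rg-pearl-test-spoke' -Location 'uksouth'
Add-AzRouteConfig -RouteTable $rt -Name 'default-via-firewall' `
    -AddressPrefix '0.0.0.0/0' -NextHopType VirtualAppliance -NextHopIpAddress '10.1.1.4' | Set-AzRouteTable

Remember to associate the route table with each spoke subnet afterwards.
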
4 Configure Network Security Groups (NSGs)

NSGs are like security guards at each floor. They check: "Is this traffic allowed to come in / go out of this subnet?" We create one NSG per subnet with rules that only allow the exact traffic Pearl needs.

NSG | Key Inbound Rules | Key Outbound Rules
nsg-snet-web | Worker→Web (80,443), Build→Web (80,443), Bastion→RDP (3389) | Web→DB (1433), Web→Worker (8080), Web→Internet (443 via FW)
nsg-snet-worker | Web→Worker (8080), Bastion→RDP (3389) | Worker→Web (80,443), Worker→DB (1433)
nsg-snet-build | Bastion→RDP (3389) | Build→Web (80,443), Build→Worker (*), Build→DB (1433), Build→Internet (443 via FW)
  • One NSG created and attached per subnet
  • Default deny — only explicitly allowed traffic passes
  • Verified: VMs cannot be accessed directly from the internet
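
For example, the Bastion-to-RDP inbound rule from the table, expressed with Az.Network; the address prefixes follow the subnet layout above:

# Allow RDP into snet-web only from the Bastion subnet
$rdp = New-AzNetworkSecurityRuleConfig -Name 'allow-bastion-rdp' `
    -Direction Inbound -Access Allow -Protocol Tcp -Priority 100 `
    -SourceAddressPrefix '10.1.0.0/26' -SourcePortRange '*' `
    -DestinationAddressPrefix '10.2.1.0/24' -DestinationPortRange 3389

New-AzNetworkSecurityGroup -Name 'nsg-snet-web' -ResourceGroupName 'rg-pearl-test-spoke' `
    -Location 'uksouth' -SecurityRules $rdp
# Azure's built-in default rules then deny everything not explicitly allowed
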

Videos for Networking & Azure Infrastructure

Azure Networking — VNet, VPN Gateway, CDN, Load Balancer
Adam Marczak · Essential
Azure Active Directory Tutorial | Identity & Access Management
Adam Marczak · Intermediate

Step 2.5–2.7: Core Services (Key Vault, Storage, SQL MI)

5 Set Up Azure Key Vault (Password Vault)

Key Vault is like a secure lockbox for passwords and API keys. Instead of storing database passwords in configuration files (insecure!), we store them in Key Vault and let the servers retrieve them securely using their managed identities.

  1. Create Key Vault: kv-pearl-test in rg-pearl-test-spoke
  2. Enable soft-delete (accidentally deleted secrets can be recovered) and purge protection (prevents permanent deletion for 90 days)
  3. Add all the secrets: 17 database connection strings, Stripe test key, GoCardless sandbox token, Mailgun sandbox key, Genesys sandbox org ID, S3 test bucket credentials

💡 Why Not Just Put Passwords in Config Files?

Config files can be accidentally committed to Git, read by anyone with server access, or leaked in error messages. Key Vault centralises secrets, provides access logging (who read what, when), and allows easy rotation without touching server configs.
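
A minimal sketch of steps 1–3 with the Az.KeyVault cmdlets; soft-delete is enabled by default on new vaults, and the secret value shown is a placeholder:

# Create the vault with purge protection, then load one secret
New-AzKeyVault -Name 'kv-pearl-test' -ResourceGroupName 'rg-pearl-test-spoke' `
    -Location 'uksouth' -EnablePurgeProtection

$secret = ConvertTo-SecureString 'sk_test_xxx' -AsPlainText -Force   # placeholder value
Set-AzKeyVaultSecret -VaultName 'kv-pearl-test' -Name 'stripe-secret-key' -SecretValue $secret

Repeat Set-AzKeyVaultSecret for each of the connection strings and sandbox keys listed above.
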

6 Set Up Blob Storage

Azure Blob Storage is like a cloud hard drive. We use it to store database backup files, restore scripts, and deployment artifacts. Create a storage account called stpearltest with three containers: backup (holds DB backups from production), restore (processed files), and scripts (automation scripts).

  • Storage account created (LRS redundancy, Hot tier, UK South)
  • Lifecycle management: auto-delete backups older than 28 days
  • Encryption at rest enabled (Microsoft-managed keys)
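
A sketch of the account, containers, and the 28-day lifecycle rule using Az.Storage; the resource group name is assumed from the Phase 2 layout:

# Storage account and the three containers
New-AzStorageAccount -Name 'stpearltest' -ResourceGroupName 'rg-pearl-test-data' `
    -Location 'uksouth' -SkuName Standard_LRS -Kind StorageV2 -AccessTier Hot
$ctx = (Get-AzStorageAccount -ResourceGroupName 'rg-pearl-test-data' -Name 'stpearltest').Context
'backup', 'restore', 'scripts' | ForEach-Object { New-AzStorageContainer -Name $_ -Context $ctx }

# Lifecycle rule: delete blobs under backup/ once they are older than 28 days
$action = Add-AzStorageAccountManagementPolicyAction -BaseBlobAction Delete -DaysAfterModificationGreaterThan 28
$filter = New-AzStorageAccountManagementPolicyFilter -PrefixMatch 'backup' -BlobType blockBlob
$rule = New-AzStorageAccountManagementPolicyRule -Name 'purge-old-backups' -Action $action -Filter $filter
Set-AzStorageAccountManagementPolicy -ResourceGroupName 'rg-pearl-test-data' -StorageAccountName 'stpearltest' -Rule $rule
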
7 Provision SQL Managed Instance

SQL Managed Instance is a fully-managed SQL Server in the cloud. It's compatible with on-premises SQL Server features (which Pearl relies on heavily) but Azure handles patching, backups, and high availability for you. This is where all 17 databases will live.

⚠️ SQL MI Takes 4–6 Hours to Provision

This is the longest single provisioning step. Start this early in the day. Kick off the provisioning, then work on other tasks while Azure sets it up. Configuration: General Purpose tier, 4 vCores, 256 GB storage, public endpoint DISABLED, placed in the snet-data subnet.

Videos for Key Vault, Storage & SQL

Azure Key Vault Tutorial | Secure Secrets, Keys & Certificates
Adam Marczak · Essential
Azure Storage Tutorial | Blob, Queue, Table & File Share
Adam Marczak · Essential
SQL Course for Beginners (Full Course)
Programming with Mosh · Beginner

Step 2.8–2.11: Server Build-Out (3 VMs)

Now we build the three Windows Servers. Each has a specific role. Think of them as three employees in the office, each with a distinct job description.

8 Build VM1 — Web Server

This is the "front desk". It runs IIS (Internet Information Services — Microsoft's web server), hosts the Pearl web application, internal web services, Memcached (a caching layer), and Apache Solr (the search engine). Operators and clients interact with Pearl through this server.

  1. Provision VM: Standard_D4s_v5 (4 vCPU, 16 GB RAM), Windows Server 2022 Datacenter
  2. Place in snet-web subnet, NO public IP (accessed only via Bastion)
  3. Harden OS: enable Windows Update, disable unnecessary features, enable disk encryption (BitLocker), configure Windows Firewall to match NSG rules
  4. Install IIS with ASP.NET 4.5 support:
    Install-WindowsFeature Web-Server, Web-Asp-Net45, Web-ISAPI-Ext, Web-ISAPI-Filter, Web-Mgmt-Console, Web-Http-Redirect
  5. Create 5 IIS sites with correct bindings:
Site | Path | Binding | App Pool
pearl-azure | D:\apps\pearl-azure | http:80, https:443 | PearlAppPool (.NET 4.0, Integrated)
pearl-webservices | D:\apps\pearl-webservices | http:8081 | PearlWSAppPool (.NET 4.0, Integrated)
utility-payments | D:\apps\utility-server\payments | http:8082 | UtilityAppPool (.NET 4.0)
utility-xero | D:\apps\utility-server\xero | http:8083 | UtilityAppPool
utility-reporting | D:\apps\utility-server\reporting | http:8084 | UtilityAppPool
  6. Install and configure Memcached as a Windows service on port 11211
  7. Install Java Runtime + Apache Solr as a Windows service on port 8983
  8. Add hosts file entries pointing to internal IPs (see the sketch below)
  9. Configure auto-shutdown at 19:00 UTC (saves costs since this is test-only)
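
The hosts-file entries from item 8 can be scripted; the hostnames and IPs below are placeholders for whatever internal names Pearl's configs actually expect:

# Placeholder hostnames/IPs - use your real subnet assignments
Add-Content -Path "$env:SystemRoot\System32\drivers\etc\hosts" -Value @"
10.2.2.4    worker.pearl.test
10.2.4.4    sql.pearl.test
"@
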
9 Build VM2 — Worker Server

This is the "back office". It runs 4 Windows services that do background processing — queue processing (executing scheduled jobs), health monitoring, AI quality scoring, and real-time notifications (Totem). These run 24/7 and don't have a visible web interface.

  1. Provision VM: Standard_D2s_v5 (2 vCPU, 8 GB RAM), Windows Server 2022
  2. Install .NET Framework 3.5 (for Totem — it's built on an older .NET version) and .NET 4.8.1 (for ai-spooler)
  3. Create directory structure and register 4 Windows services:
    # Use sc.exe (in PowerShell, plain 'sc' is an alias for Set-Content);
    # sc.exe requires a space after each 'option='
    sc.exe create QueueProcessor binPath= "D:\apps\queue-processor\MessageQueueProcessor.exe" start= auto
    sc.exe create SystemChecker binPath= "D:\apps\system-checker\SystemChecker.exe" start= auto
    sc.exe create AISpooler binPath= "D:\apps\ai-spooler\AISpooler.exe" start= auto
    sc.exe create TotemServer binPath= "D:\apps\totem\Totem2.exe" start= auto
  4. Add hosts file entries mapping internal hostnames to IPs
  5. Configure auto-shutdown at 19:00 UTC
10 Build VM3 — Build/Dev Server

This is the "workshop". It compiles code (MSBuild), runs the GitHub Actions runner (which automates deployments), and hosts the database restore tooling. No end-users ever interact with this server.

  1. Provision VM: Standard_D2s_v5, Windows Server 2022
  2. Install Visual Studio Build Tools 2022 (with MSBuild 17, .NET 4.8 and 3.5 targeting packs, NuGet CLI)
  3. Install Git for Windows
  4. Install and configure GitHub Actions self-hosted runner (detailed in Phase 4)
  5. Create restore tools directory: D:\restore-tools\
  6. Configure auto-shutdown at 19:00 UTC
11 Configure Managed Identities & Key Vault Access

Managed Identity is like giving a VM its own ID card. Instead of storing Key Vault passwords on the VM (which defeats the purpose), the VM uses its identity to authenticate with Key Vault automatically. Azure handles this behind the scenes — no passwords needed.

  1. Enable System-assigned Managed Identity on VM1, VM2, VM3
  2. Grant Key Vault access: VM1 and VM2 get Secret Read; VM3 gets Secret Read plus Storage Contributor
  3. Test: RDP into each VM, try to read a secret from Key Vault using PowerShell — it should work without providing any credentials
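
The step 3 test is two lines with the Az module, and no credentials appear anywhere:

# Run inside an RDP session on any of the three VMs
Connect-AzAccount -Identity                                            # authenticate as the VM's managed identity
Get-AzKeyVaultSecret -VaultName 'kv-pearl-test' -Name 'stripe-secret-key' -AsPlainText

A value coming back proves the identity-to-Key-Vault chain works end to end.
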

🏁 Phase 2 Checkpoint

Three servers running, database online, network secured — but no application code deployed yet. You should be able to RDP to any VM via Bastion, VMs can reach the database, and the firewall blocks any outbound traffic that isn't explicitly allowed.


03

Phase 3 — Database Backup, Restore & Masking

Copy production data safely, remove all personal information, and automate weekly refreshes so the test environment always has realistic (but anonymised) data.

🗄️

Phase 3 — Database Pipeline

Backup → Download → Restore → Mask → Validate — automated weekly

96h 12 working days

💡 Why Not Just Use Production Data Directly?

Production databases contain real customer names, phone numbers, email addresses, bank details, and payment card references. Using this in a test environment violates GDPR and ISO 27001. We must mask (replace with fake data) all personal information before anyone touches the test databases.

1 Configure Production Backup to Blob Storage

Set up production SQL MI to write weekly backups of all 17 databases to the Azure Blob Storage container we created in Phase 2. In simple terms, this means production creates a sealed copy of each database and places it in a secure cloud storage location so the test environment can work from a copy instead of touching the live system.

Think of Azure Blob Storage as a locked filing cabinet in the cloud. Production places a fresh pack of database copies into that cabinet each week. The test environment only takes copies from the cabinet. It never pulls data straight out of the live system. That separation makes the process safer, easier to audit, and much easier to recover when something goes wrong.

-- For each of the 17 databases, create a backup to Blob URL:
BACKUP DATABASE [PearlData]
TO URL = 'https://stpearltest.blob.core.windows.net/backup/PearlData_20260420.bak'
WITH CREDENTIAL = 'PearlTestBackupCredential', COMPRESSION, CHECKSUM;

Start by testing one database manually before scheduling all 17. Confirm the backup file lands in the correct Blob container, confirm the file size looks sensible, and confirm the job history reports success. Once the first test works, repeat the pattern for every database using a clear naming format such as DatabaseName_YYYYMMDD.bak.

  • Create the Blob credential first — SQL MI needs permission to write the backup files into Azure Blob Storage. If the SAS token or credential is wrong, the job fails immediately.
  • Use a naming standard — Include the database name and date in every file name so the team can immediately tell which backup set belongs to which weekly refresh.
  • Schedule during a quiet window — Saturday 02:00 UTC is a low-risk time that reduces pressure on production while still giving the test environment fresh data.
  • Keep compression and checksum enabled — Compression makes the files smaller and quicker to move. Checksum adds an integrity check so damage is easier to detect early.
  • Record evidence every week — Save file sizes, start and finish times, and job history. This proves the source data was created correctly before any restore begins.
  • SQL credential created on production MI for Blob SAS token
  • Weekly backup scheduled: Saturday 02:00 UTC
  • Initial test backup completed — record actual file sizes

🧭 What the Operator Actually Does

Connect to the production managed instance with SQL Server Management Studio or Azure Data Studio, create the Blob credential, run one manual test backup, then convert that logic into a scheduled SQL Agent job. After the job finishes, open the Azure portal and verify the file exists in the backup container before moving on.
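
As a sketch of the credential itself, runnable from SSMS or via Invoke-Sqlcmd as below. One caveat: on SQL Managed Instance, backup-to-URL is normally done with a SAS credential named after the container URL, in which case the BACKUP statement omits the WITH CREDENTIAL clause; confirm which style applies before scheduling all 17 jobs. The SAS value and host name are placeholders:

# SAS credential, named after the container it writes to (SAS value is a placeholder)
$sql = @"
CREATE CREDENTIAL [https://stpearltest.blob.core.windows.net/backup]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<generated-sas-token>';
"@
Invoke-Sqlcmd -ServerInstance '<production-mi-host>' -Database master -Query $sql
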

2 Build the Automated Restore Tool (PowerShell)

A PowerShell script on VM3 automates the entire refresh process: download backups from Blob → restore to test SQL MI → run masking → validate → send notification. This script runs every Saturday and takes about 2–4 hours to complete. VM3 is the correct place to run this because it sits inside the private Azure network and can reach Blob Storage, SQL MI, and the rest of the internal environment safely.

In layman's terms, VM3 is the control room for the entire refresh. Instead of an engineer doing dozens of repetitive manual steps every weekend, the script performs the same sequence every time and writes down exactly what happened. That makes the process supportable and far less dependent on memory.

# VM3 preparation checklist
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope LocalMachine
Install-Module Az -Scope AllUsers -Force
Install-Module SqlServer -Scope AllUsers -Force
New-Item -Path D:\restore-tools\logs -ItemType Directory -Force
New-Item -Path D:\restore-tools\downloads -ItemType Directory -Force
New-Item -Path D:\restore-tools\scripts -ItemType Directory -Force

# Example manual run from an elevated PowerShell window on VM3
cd D:\restore-tools
.\Invoke-PearlRestore.ps1 -BackupDate 2026-04-20 -Verbose
# Invoke-PearlRestore.ps1 — Key Functions:
# 1. Download-Backups     → Pull .bak from Blob (Az.Storage module)
# 2. Restore-Databases    → RESTORE DATABASE for each of 17 DBs
# 3. Run-MaskingScripts   → Execute T-SQL masking per database
# 4. Update-ConfigStrings → Repoint endpoints to test values
# 5. Validate-Restore     → Row count checks, key queries
# 6. Send-Notification    → Email result summary
  • Connect through Azure Bastion — Operators should log into VM3 through Bastion, then open PowerShell as Administrator so the machine can manage modules, scripts, folders, and scheduled tasks properly.
  • Keep the folder structure obvious — Separate downloads, logs, scripts, and notifications. During an incident, clarity matters more than elegance.
  • Write human-readable logs — Each stage should clearly say which database is being downloaded, restored, masked, or validated so non-developers can follow progress.
  • Fail loudly — If one backup file is missing or one database restore fails, stop and report the exact failure rather than silently continuing with partial data.
  • Support a manual rerun — The runbook should tell an operator exactly how to log into VM3 and rerun the job with a date parameter when a scheduled run fails.

🖥️ Running It Manually on the VM

Connect to VM3, open an elevated PowerShell window, move to D:\restore-tools, and run the restore script with the backup date you want to use. Watch the log output. If one database is taking far longer than expected, stop guessing and check the log file, SQL restore status, free disk space, and Blob download completion before rerunning anything.
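
A minimal sketch of the Restore-Databases stage, assuming the blob credential from step 1 also exists on the test instance; the host name, database list, and date are placeholders:

# SQL MI restores natively from a blob URL, so no local .bak copy is strictly required
foreach ($db in @('PearlData', 'PearlUsers', 'PearlBilling')) {
    $query = "RESTORE DATABASE [$db] FROM URL = 'https://stpearltest.blob.core.windows.net/backup/${db}_20260420.bak';"
    Invoke-Sqlcmd -ServerInstance '<test-mi-host>' -Database master -Query $query -QueryTimeout 0
    Write-Host "Restored $db"
}
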

3 Create Data Masking Scripts (T-SQL)

Masking = replacing real data with fake but realistic-looking data. "John Smith" becomes "Test User 12345". "07700 900000" becomes "00000 000000". Real email addresses become "masked_123@test.pearl". These scripts run automatically after every restore. This is the step that turns a risky production copy into something safe enough for a test environment.

The important idea is that the data still needs to behave like real data even after it is anonymised. Testers still need to search it, filter it, and run workflows against it. The system should still feel realistic, but nobody should be able to identify a real customer, caller, user, or payment detail from what they see.

Script | What Gets Masked
mask-PearlData.sql | Caller names, phone numbers, email addresses, physical addresses
mask-PearlUsers.sql | User names, emails, phone numbers, company contacts, password hashes
mask-PearlBilling.sql | Customer names on invoices, bank details, card references
mask-SMSBroadcast.sql | Mobile numbers, SMS message body text
mask-Messages.sql | Caller names, phone numbers, message text content
mask-PearlLog.sql | Truncate/replace PII in audit log text fields
  • Mask the highest-risk fields first — Names, phone numbers, emails, addresses, bank details, card references, message bodies, and free-text notes are the main danger areas.
  • Preserve relationships where needed — If one person appears in multiple tables, the replacement values should still line up logically so testing remains useful.
  • Reset access safely — Password hashes, service credentials, and admin access should be replaced with non-production values the team controls.
  • Do manual spot checks — Review at least ten records in each important table after the first masking run to prove no real personal data remains.

🔍 What Success Looks Like

The environment contains believable but fake customer and message data. Testers can do their work, but they cannot discover a real phone number, real email address, or real payment reference anywhere in the restored estate.
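
To make the pattern concrete, here is an illustrative UPDATE in the style these scripts use; the table and column names are hypothetical, so map them onto the real schema:

# Hypothetical table/columns - keying fake values off the row ID keeps rows distinct
$mask = @"
UPDATE Callers SET
    FirstName = 'Test',
    LastName  = 'User ' + CAST(CallerID AS varchar(10)),
    Phone     = '00000 000000',
    Email     = 'masked_' + CAST(CallerID AS varchar(10)) + '@test.pearl'
WHERE Email NOT LIKE '%@test.pearl';   -- skip already-masked rows so reruns are safe
"@
Invoke-Sqlcmd -ServerInstance '<test-mi-host>' -Database PearlData -Query $mask
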

4 Override ConfigStrings for Test Safety

After restoring and masking, override configuration values in the database so the test environment cannot accidentally call live services. This is the safety net that prevents test environments from sending real SMS messages, using live email domains, charging real cards, or talking to live telephony and payment systems.

This matters because many applications store live settings inside database tables, not just inside config files. If you restore a production database and do nothing else, the application can still contain live API keys, live URLs, or live feature flags. This step deliberately swaps those values for safe sandbox or disabled settings.

-- Run against PearlOperations on Test SQL MI after each restore
UPDATE ConfigStrings SET ConfigString = 'sk_test_xxx' WHERE ResourceKey = 'stripesecretkey';
UPDATE ConfigStrings SET ConfigString = 'sandbox_token' WHERE ResourceKey = 'gocardless_token';
UPDATE ConfigStrings SET ConfigString = 'sandbox.mailgun.org' WHERE ResourceKey = 'mailgun_domain';
UPDATE ConfigStrings SET ConfigString = '0' WHERE ResourceKey = 'sms_enabled';
UPDATE ConfigStrings SET ConfigString = 'sandbox_org_id' WHERE ResourceKey = 'genesys_org_id';
  • Swap every live payment key — All financial integrations must use sandbox credentials only.
  • Disable outbound messaging where appropriate — If the environment should never send real SMS or email, turn the feature off and verify the services obey the flag.
  • Repoint internal endpoints — Totem, Solr, storage buckets, and private services should all reference the test environment, not production.
  • Validate the final values — Run a query after the override script and compare the results against an approved list of sandbox settings.
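
The final-values check from the last bullet can be a single read-back query using the same ResourceKey values as the override script:

# Read back the overridden values and compare against the approved sandbox list
$check = @"
SELECT ResourceKey, ConfigString FROM ConfigStrings
WHERE ResourceKey IN ('stripesecretkey', 'gocardless_token', 'mailgun_domain',
                      'sms_enabled', 'genesys_org_id');
"@
Invoke-Sqlcmd -ServerInstance '<test-mi-host>' -Database PearlOperations -Query $check
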
5 Test Full Pipeline End-to-End

Run the complete cycle and prove it works: Backup → Download → Restore → Mask → Override → Validate. This is the proof step. Until the team completes one full refresh successfully, the process is still theory rather than an operational capability.

The first full test should be run like a rehearsal. One operator drives the process on VM3 while another person watches, takes notes, records timings, and confirms the evidence. That turns the first run into both a technical proof and a training exercise.

  • All 17 databases restored successfully
  • Spot-check 10 records per masked table — PII fields are anonymised
  • ConfigStrings point to sandbox endpoints
  • Total pipeline duration under 4 hours
  • Process documented in restore runbook
  • Measure each stage — Capture timings for backup creation, Blob upload, VM3 download, restore, masking, override, and validation.
  • Test the application after the refresh — A database restore only counts as successful if the websites and services can actually use the data afterward.
  • Document failure handling — Write down what to do if one backup is missing, if one database fails to restore, or if masking stops halfway through.
  • Finish the runbook immediately — Include the VM3 commands, log locations, notification recipients, and success checks while the process is still fresh in the team's memory.
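
One way to capture the stage timings, assuming each stage is callable as its own script under D:\restore-tools\scripts (those per-stage script names are hypothetical):

# Record how long each refresh stage takes during the rehearsal run
$timings = foreach ($stage in 'Download', 'Restore', 'Mask', 'Override', 'Validate') {
    $elapsed = Measure-Command { & "D:\restore-tools\scripts\$stage.ps1" }
    [pscustomobject]@{ Stage = $stage; Minutes = [math]::Round($elapsed.TotalMinutes, 1) }
}
$timings | Export-Csv D:\restore-tools\logs\refresh-timings.csv -NoTypeInformation
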

Database Backup Video Walkthroughs

These are search-based video links rather than single fixed videos so your team can pick the most up-to-date walkthroughs. Azure screens and PowerShell modules change often, so current tutorials are more useful than a stale recording.


🏁 Phase 3 Checkpoint

Realistic but anonymised data refreshing weekly. Personal information is protected. The test environment always has fresh, safe data.

04

Phase 4 — CI/CD Pipeline (Automated Build & Deploy)

This is the phase that transforms "copy files to the server manually" into "push code to GitHub and it deploys automatically". This section is extra detailed because deployment is the most critical area to get right.

🚀

Phase 4 — The Most Important Phase for Your Team

This is where deployment automation lives. Extra detail provided below.

20h ~3 working days

Understanding CI/CD — The Big Picture

CI/CD stands for Continuous Integration / Continuous Deployment. Here's what each part means in plain English:

📦 Continuous Integration (CI) = "Auto-Build"

Every time a developer pushes code to GitHub, the system automatically compiles (builds) the code and checks for errors. If the build fails, the team knows immediately. For Pearl: CI means MSBuild compiles the ASP.NET Web Forms project and all 7 other components automatically.

🚀 Continuous Deployment (CD) = "Auto-Deploy"

After the code builds successfully, the system automatically deploys it to the test server. In Pearl's case, this means: stop the IIS websites → copy the new files to VM1 → restart → verify the site comes back up. If something goes wrong, it automatically rolls back to the previous version.

🔧 How It All Connects

Developer pushes code → GitHub Actions runs → Code compiles on VM3 → Artifacts deployed to VM1/VM2 → Health check verifies success → Done. The entire process takes minutes instead of hours of manual work, and it's repeatable, logged, and rollback-safe.

Essential CI/CD Concept Videos — Watch These First

DevOps CI/CD Explained in 100 Seconds
Fireship · Essential · Beginner
What is GitOps? How GitOps Works and Why It's Useful
TechWorld with Nana · Intermediate
Azure DevOps Tutorial for Beginners | CI/CD with Azure Pipelines
TechWorld with Nana · Intermediate

GitHub Actions Deep Dive

GitHub Actions is the automation engine inside GitHub. When you push code, it reads a YAML file (a recipe) in your repository and follows the instructions. Here's the vocabulary you need:

Term | What It Means (Plain English) | Pearl Example
Workflow | A recipe file (.yml) that tells GitHub what to do | .github/workflows/build.yml
Trigger | The event that starts the workflow | Push to main branch, or manual button click
Job | A group of related steps that run on one machine | "build" job, "deploy" job
Step | A single command or action within a job | "Checkout code", "Run MSBuild", "Copy files"
Runner | The machine where the job executes | VM3 (self-hosted runner)
Self-hosted runner | A runner YOU control (not GitHub's cloud) | Needed because Pearl must build inside the private network
Artifact | The output files from a build (compiled code) | The .dll and .aspx files that get deployed
Environment | A target like "test" or "production" with protection rules | "test" environment requiring 1 approval
Secret | A password or key stored securely in GitHub | VM connection details, Key Vault credentials

GitHub Actions Video Tutorials — Complete Learning Path

GitHub Actions Tutorial — Basic Concepts and CI/CD Pipeline
TechWorld with Nana · Essential · Beginner
GitHub Actions | From Zero to Hero in 90 Minutes (Environments, Secrets, Runners)
CoderDave · Essential · Intermediate
How GitHub Actions 10x My Productivity
Beyond Fireship · Beginner
Automate Your Workflows with GitHub Actions
The Roadmap · Intermediate

Terraform Deep Dive — Infrastructure as Code

Terraform is the tool that lets you create all the Azure infrastructure (VNets, VMs, databases, Key Vault) by writing text files instead of clicking around the Azure portal. This is called "Infrastructure as Code" (IaC).

💡 Why Use Terraform Instead of the Azure Portal?

Imagine building your Azure environment by clicking through the portal. Now imagine someone asks you to build an identical second environment. You'd have to remember every click, every setting, every rule. With Terraform, the entire environment is defined in code files — to build a second environment, you just run the same files with different parameters. It's repeatable, auditable, and version-controlled.

🔧 How Terraform Works — The Basics

1. Write: Create .tf files that describe your desired infrastructure (e.g., "I want a VNet with these subnets").
2. Plan: Run terraform plan — Terraform shows you what it WILL create/change (a preview).
3. Apply: Run terraform apply — Terraform actually creates the resources in Azure.
4. State: Terraform remembers what it created in a "state file" so it knows what already exists.

# Example: Creating an Azure Resource Group with Terraform
# File: main.tf

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "pearl_hub" {
  name     = "rg-pearl-test-hub"
  location = "uksouth"
  tags = {
    Environment = "Test"
    Project     = "Pearl"
  }
}

resource "azurerm_virtual_network" "hub_vnet" {
  name                = "vnet-pearl-hub"
  location            = azurerm_resource_group.pearl_hub.location
  resource_group_name = azurerm_resource_group.pearl_hub.name
  address_space       = ["10.1.0.0/16"]
}

# To deploy: terraform init → terraform plan → terraform apply

Terraform Video Tutorials — Complete Learning Path

Terraform in 100 Seconds
Fireship · Essential · Beginner
What is Infrastructure as Code? IaC Tools Compared
TechWorld with Nana · Essential · Beginner
Learn Terraform with Azure — Full Course for Beginners
freeCodeCamp · Essential · Intermediate
Complete Terraform Course — From BEGINNER to PRO!
DevOps Directive · Intermediate
HashiCorp Terraform Associate Certification Course
freeCodeCamp · Advanced
Learn Terraform by Building a Dev Environment — Full Course
freeCodeCamp · Intermediate

Combining Everything — DevOps Crash Course

DevOps Crash Course (Docker, Terraform, GitHub Actions)
Traversy Media · Intermediate

The Actual Build & Deploy Workflow — Step by Step

Now let's walk through exactly what you'll build in Phase 4, step by step.

1 Register the GitHub Actions Self-Hosted Runner on VM3

What is a self-hosted runner? GitHub Actions normally runs your code on GitHub's servers in the cloud. But Pearl must build and deploy inside your private Azure network: the build tools and deployment targets sit behind the firewall, where GitHub's hosted runners can't reach them. A self-hosted runner is software you install on VM3 that connects to GitHub and says "I'm ready to run jobs."

  1. Go to your GitHub repository → Settings → Actions → Runners → "New self-hosted runner"
  2. Select "Windows" and "x64"
  3. GitHub shows you a token — copy it, then remote desktop into VM3 via Bastion
  4. Download the runner package and run the configuration:
    # On VM3, in PowerShell (Administrator):

    # Create a folder for the runner
    mkdir C:\actions-runner ; cd C:\actions-runner

    # Download the latest runner (GitHub shows you the URL)
    Invoke-WebRequest -Uri https://github.com/actions/runner/releases/download/v2.XXX.X/actions-runner-win-x64-2.XXX.X.zip -OutFile actions-runner.zip

    # Extract
    Expand-Archive -Path actions-runner.zip -DestinationPath .

    # Configure — GitHub gives you the exact command with your token.
    # On Windows the service install is part of config.cmd (there is no svc.cmd);
    # --runasservice registers the runner as a Windows service so it starts automatically
    .\config.cmd --url https://github.com/YOUR-ORG/message-direct --token YOUR_TOKEN --name pearl-test-runner --labels self-hosted,windows,pearl-test --runasservice
  5. Verify: go back to GitHub → Settings → Actions → Runners — your runner should show "Online" with a green dot

💡 Why Self-Hosted Instead of GitHub-Hosted?

GitHub's hosted runners run on GitHub's servers in the public cloud. They can't reach your private Azure VNet. Since Pearl's build tools, NuGet packages, and deployment targets are all inside the private network, we need a runner that lives inside that network. That's VM3.

2 Create the Build Workflow (.github/workflows/build.yml)

This is the recipe that tells GitHub how to compile Pearl's code. It's a YAML file that lives in your repository. Here's what it does:

# .github/workflows/build.yml
name: Build Pearl

on:
  push:
    branches: [main, develop, 'release/*']    # Triggers when code is pushed
  workflow_dispatch:                          # Also allows manual trigger

jobs:
  build:
    runs-on: [self-hosted, windows, pearl-test]   # Runs on VM3
    steps:
      - name: Checkout code
        uses: actions/checkout@v4                 # Downloads the code

      - name: Restore NuGet packages
        run: nuget restore Pearl.sln              # Downloads dependencies

      - name: Build with MSBuild                  # Compiles all projects
        run: msbuild Pearl.sln /p:Configuration=Release /p:Platform="Any CPU"

      - name: Package artifacts
        uses: actions/upload-artifact@v4          # Saves the compiled output
        with:
          name: pearl-build
          path: |
            pearl-azure/bin/
            pearl-webservices/bin/
            utility-server/bin/
            queue-processor/bin/Release/
            system-checker/bin/Release/
            ai-spooler/bin/Release/
            totem/bin/Release/

When a developer pushes code to the main, develop, or release/* branch, this workflow automatically:

  1. Downloads the latest code from GitHub
  2. Restores NuGet packages (third-party libraries Pearl depends on)
  3. Compiles all 7+ components using MSBuild
  4. Packages the compiled output as a "build artifact" that the deploy workflow can use
3 Create the Deploy Workflow (.github/workflows/deploy-test.yml)

This is the recipe that deploys compiled code to the test servers. It has a manual trigger and requires someone to click "Approve" before it runs — so nobody accidentally deploys.

# .github/workflows/deploy-test.yml
name: Deploy to Test

on:
  workflow_dispatch:             # Manual trigger only
  workflow_run:                  # Or auto-trigger after build
    workflows: ["Build Pearl"]
    types: [completed]

jobs:
  deploy:
    runs-on: [self-hosted, windows, pearl-test]
    environment: test            # Requires approval (set up in GitHub)
    steps:
      - name: Download build artifacts
        uses: actions/download-artifact@v4
        with:
          name: pearl-build

      # --- STEP 1: Create rollback backup ---
      - name: Backup current deployment
        run: |
          robocopy D:\apps\pearl-azure D:\apps\_rollback\pearl-azure /MIR /NFL /NDL
          robocopy D:\apps\pearl-webservices D:\apps\_rollback\pearl-webservices /MIR /NFL /NDL
          if ($LASTEXITCODE -le 7) { exit 0 }   # robocopy exit codes 0-7 mean success

      # --- STEP 2: Stop IIS sites ---
      - name: Stop IIS sites on VM1
        run: |
          Invoke-Command -ComputerName 10.2.1.X -ScriptBlock {
            Stop-Website -Name 'pearl-azure'
            Stop-Website -Name 'pearl-webservices'
          }

      # --- STEP 3: Deploy new code ---
      - name: Copy web artifacts to VM1
        run: |
          robocopy pearl-azure\bin\ \\10.2.1.X\D$\apps\pearl-azure\bin\ /MIR /NFL /NDL
          robocopy pearl-webservices\bin\ \\10.2.1.X\D$\apps\pearl-webservices\bin\ /MIR /NFL /NDL
          if ($LASTEXITCODE -le 7) { exit 0 }

      - name: Copy worker artifacts to VM2
        run: |
          robocopy queue-processor\bin\Release\ \\10.2.2.X\D$\apps\queue-processor\ /MIR /NFL /NDL
          robocopy system-checker\bin\Release\ \\10.2.2.X\D$\apps\system-checker\ /MIR /NFL /NDL
          if ($LASTEXITCODE -le 7) { exit 0 }

      # --- STEP 4: Start everything back up ---
      - name: Start IIS sites
        run: |
          Invoke-Command -ComputerName 10.2.1.X -ScriptBlock {
            Start-Website -Name 'pearl-azure'
            Start-Website -Name 'pearl-webservices'
          }

      # --- STEP 5: Health check ---
      - name: Verify deployment
        run: |
          $response = Invoke-WebRequest -Uri http://10.2.1.X/login.aspx -UseBasicParsing
          if ($response.StatusCode -ne 200) { throw "Health check failed!" }
          Write-Host "✅ Deployment verified — login page responded HTTP 200"

⚠️ Setting Up the Approval Gate

In GitHub, go to Settings → Environments → New environment → name it "test" → check "Required reviewers" → add at least 1 person. Now every deploy requires someone to click "Approve" in GitHub before it runs. This prevents accidental deployments.

4 Create the DB Refresh Workflow

This workflow runs the PowerShell restore script from Phase 3 on a schedule (every Saturday) or manually when needed.

# .github/workflows/db-refresh.yml
name: Refresh Test Database

on:
  schedule:
    - cron: '0 6 * * 6'        # Every Saturday at 06:00 UTC
  workflow_dispatch:           # Manual trigger available

jobs:
  refresh:
    runs-on: [self-hosted, windows, pearl-test]
    steps:
      - name: Run restore pipeline
        run: D:\restore-tools\Invoke-PearlRestore.ps1

      - name: Notify on success
        if: success()
        run: Write-Host "✅ Database refresh completed successfully"

      - name: Notify on failure
        if: failure()
        run: Write-Host "❌ Database refresh FAILED — check logs"
5 Validate the Full CI/CD Pipeline

Before calling Phase 4 done, prove everything works reliably:

  • Test #1: Push code to develop branch → verify build workflow runs automatically and succeeds
  • Test #2: Trigger deploy workflow → approve → verify code appears on VM1 and VM2
  • Test #3: Access the Pearl login page via browser → verify HTTP 200
  • Test #4: Force a deployment failure (deploy broken code) → verify rollback restores the previous version
  • Test #5: Trigger database refresh → verify all 17 databases are restored and masked
  • Document all workflows, parameters, and troubleshooting steps in the operations runbook

🏁 Phase 4 Checkpoint

Code changes can be built, approved, deployed, and rolled back automatically. The entire process is logged in GitHub. No more manual file copying.


05

Phase 5 — Security, Compliance & Monitoring

Lock everything down, prove the estate is isolated from production, harden identity and secrets, establish audit evidence, and enforce monitoring and governance controls.

🔐

Phase 5 — Security & Compliance

Isolation verification, RBAC, Key Vault hardening, audit logging, monitoring, Azure Policy

56h 7 working days
1 Verify Complete Isolation from Production

This is the most critical security check. The test environment must have ZERO network connectivity to production. If a developer accidentally runs the wrong script, it must be impossible for it to reach production databases or services.

  • Confirm no VNet peering exists between test and production subscriptions
  • Run connectivity tests: ping/telnet from test VMs to production SQL MI → must fail
  • Review Azure Firewall logs: only allowed outbound destinations appearing
  • Document evidence that the environment remains private-only with no accidental production route
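
The connectivity checks in this list are one-liners from any test VM; the production host names are placeholders:

# Both checks must FAIL (TcpTestSucceeded : False) if isolation is correct
Test-NetConnection -ComputerName '<production-mi-host>' -Port 1433
Test-NetConnection -ComputerName '<production-vm-ip>' -Port 3389

Save the output as part of the isolation evidence pack.
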
2 Set Up Role-Based Access Controls (RBAC)

RBAC is how you stop the wrong people having the wrong power. The goal here is to make sure infrastructure admins, QA users, and operations staff each get only the level of access they genuinely need.

  • Review and confirm RBAC assignments for infrastructure, QA, and operations teams
  • Validate least-privilege scope on the subscription and resource groups
  • Remove or reduce any unnecessary elevated access before sign-off
3 Harden Key Vault & Set Up Secret Rotation

Key Vault is the control point for secrets. This step ensures passwords, connection strings, and API keys are not scattered across servers or scripts and that the team has a clear process for rotating them safely.

  • Verify Key Vault protection settings, access boundaries, and managed identity usage
  • Confirm secrets are not stored in local configuration files or deployment scripts
  • Document the operational procedure for secret rotation and access review
  • Verify Key Vault access logs show only expected Managed Identity callers
4 Build the Audit Logging System

Audit logging is your evidence trail. If a secret was read, a deployment was triggered, or a privileged change was made, the environment should produce a reviewable record that tells you what happened, when it happened, and through which control path.

  • Verify all VM disks are encrypted (BitLocker via Azure Disk Encryption)
  • Confirm audit-relevant events are captured for infrastructure, access, deployment, and secret usage
  • Validate the audit evidence path is usable for investigation and review
  • Confirm retained logs and records are sufficient for security and operational traceability
5 Set Up Monitoring & Azure Policies

Monitoring tells you when something breaks; Azure Policy stops bad configurations from being created in the first place. Together they provide operational visibility and governance guardrails.

  • Create a Log Analytics Workspace and connect all VMs and Azure resources
  • Set up alerts: VM unavailable, SQL MI high DTU, failed deployment, unexpected firewall denials
  • Verify auto-shutdown schedules fire correctly at 19:00 UTC
  • Apply Azure Policy assignments to deny public IPs, require tags, and restrict resources to UK South
  • Complete the ISO 27001-aligned security controls checklist
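
As a sketch of one policy assignment, assuming the built-in definition exists under this display name in your tenant (verify the exact name, and note the property layout varies between Az.Resources versions):

# Deny public IPs on NICs across the test subscription (subscription ID is a placeholder)
$scope = '/subscriptions/<test-subscription-id>'
$def = Get-AzPolicyDefinition -Builtin |
    Where-Object { $_.Properties.DisplayName -eq 'Network interfaces should not have public IPs' }
New-AzPolicyAssignment -Name 'deny-public-ips' -PolicyDefinition $def -Scope $scope
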

Videos for Azure Security & Identity

Azure Active Directory Tutorial | Identity & Access Management
Adam Marczak · Essential
Azure Key Vault — Secure Secrets, Keys & Certificates
Adam Marczak · Intermediate

🏁 Phase 5 Checkpoint

Environment is hardened, auditable, and demonstrably isolated from production. Security controls documented and verified.


06

Phase 6 — Testing, Documentation & Handover

Prove everything works for real-world scenarios, write clear documentation, and hand over to the operations team.

✔️

Phase 6 — Validation & Handover

Smoke tests, documentation, recorded walkthroughs, team handover

56h 7 working days
1 Run Full Smoke Tests

"Smoke testing" means testing the most important things work. The name comes from electronics — if you plug something in and smoke comes out, you know there's a problem. We test the critical user journeys:

  • Operator login: Navigate to Pearl login page → authenticate → see the dashboard
  • Message capture: Create a test message via the operator screen → verify it appears in the database
  • Client portal: Log in as a test client → view messages → view rota schedule
  • Queue processor: Verify jobs are processing (check Process_MachineStates table)
  • System checker: Verify health checks running (check checkjobs table)
  • Totem: Verify long-poll registration works (/ping endpoint responds)
  • Integration safety: Verify Stripe calls use test mode (check API logs)
  • Integration safety: Verify no SMS messages are sent (check SMSSpoolOutgoing table)
  • Integration safety: Verify Mailgun uses sandbox domain (emails don't reach real people)
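
Two of the integration-safety checks can be scripted immediately; the web host and database name are placeholders:

# Login page answers, and nothing is queued for real SMS delivery
$login = Invoke-WebRequest -Uri 'http://10.2.1.X/login.aspx' -UseBasicParsing
Write-Host "Login page: HTTP $($login.StatusCode)"

Invoke-Sqlcmd -ServerInstance '<test-mi-host>' -Database '<sms-database>' `
    -Query 'SELECT COUNT(*) AS PendingOutgoing FROM SMSSpoolOutgoing;'
# Expect PendingOutgoing = 0 while sms_enabled = 0
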
2 Write Operational Documentation

Create the documents the team will rely on after the build is complete:

  • Deployment guide: How to deploy code changes (step-by-step with screenshots)
  • Backup & restore guide: How to refresh test data manually when needed
  • Troubleshooting runbook: Common issues and how to fix them
  • Secret rotation procedure: How to rotate passwords and API keys
  • VM start/stop guide: How to turn VMs on/off for cost savings
  • Architecture diagram: Final diagram with actual IPs, hostnames, and component placement
  • Recorded walkthrough video: Screen recording of the full deploy cycle (build → approve → deploy → verify)
3 Handover to Operations Team

The formal transfer of ownership. After this, the operations team runs the environment independently.

  1. Live walkthrough session with the operations team (deployment, restore, rollback, troubleshooting)
  2. Record the session for future reference
  3. Confirm named owners and support responsibilities
  4. Formal handover acceptance sign-off

🏁 Phase 6 Checkpoint — Project Complete

Environment validated, documented, and handed over. The operations team is self-sufficient. The test environment is live and operational.


Summary & Completion Checklist

Everything you need to complete this project, in one checklist. Each line maps back to a specific acceptance criterion from the original RFP.

# | Acceptance Criteria | Evidence Required | Phase
1 | Environment provisioned from documented automation | Terraform templates / PowerShell scripts in repo | Phase 2
2 | Private-only, isolated from production | Network test results, no VNet peering proof | Phase 5
3 | App components deploy, key user journeys work | Smoke test results log | Phase 6
4 | Integrations safely sandboxed | API call logs showing test/sandbox endpoints | Phase 6
5 | Test data load/refresh works end-to-end | DB refresh execution log | Phase 3
6 | Release process supports approvals + rollback | 2+ successful deploys + 1 rollback evidence | Phase 4
7 | Documentation enables internal operation | Runbook set + walkthrough recording | Phase 6

All Video References Index

Quick reference of every video recommended in this guide, organised by topic:

Category | Video Title | Channel | Level
Azure | AZ-900 Microsoft Azure Fundamentals Full Course | Adam Marczak | Beginner
Azure | Azure Full Course — 8 Hours | Edureka | Beginner
Azure | AZ-900 Networking (VNet, VPN, Load Balancer) | Adam Marczak | Beginner
Azure | Azure Key Vault Tutorial | Adam Marczak | Intermediate
Azure | Azure Storage Tutorial (Blob, Queue, Table) | Adam Marczak | Intermediate
Azure | Azure Active Directory / Entra ID Tutorial | Adam Marczak | Intermediate
DevOps | What is DevOps? REALLY Understand It | TechWorld with Nana | Beginner
DevOps | CI/CD Explained in 100 Seconds | Fireship | Beginner
DevOps | DevOps Prerequisites Course | freeCodeCamp | Beginner
DevOps | DevOps Crash Course (Docker, Terraform, GitHub Actions) | Traversy Media | Intermediate
IaC | What is Infrastructure as Code? | TechWorld with Nana | Beginner
IaC | What is GitOps? | TechWorld with Nana | Intermediate
Terraform | Terraform in 100 Seconds | Fireship | Beginner
Terraform | Learn Terraform with Azure — Full Course | freeCodeCamp | Intermediate
Terraform | Complete Terraform Course (Beginner to Pro) | DevOps Directive | Intermediate
Terraform | HashiCorp Terraform Certification Course | freeCodeCamp | Advanced
Terraform | Learn Terraform by Building a Dev Environment | freeCodeCamp | Intermediate
GitHub Actions | GitHub Actions — Basic Concepts & CI/CD Pipeline | TechWorld with Nana | Beginner
GitHub Actions | GitHub Actions Zero to Hero (90 min) | CoderDave | Intermediate
GitHub Actions | How GitHub Actions 10x Productivity | Beyond Fireship | Beginner
GitHub Actions | Automate Workflows with GitHub Actions | The Roadmap | Intermediate
CI/CD | Azure DevOps Tutorial for Beginners | TechWorld with Nana | Intermediate
PowerShell | PowerShell Master Class — Fundamentals | John Savill | Beginner
SQL | SQL Course for Beginners (Full Course) | Programming with Mosh | Beginner

📌 What to Do Next

1. Complete all prerequisites (accounts, access, measurements).
2. Watch the foundation videos (Azure, CI/CD concepts, GitHub Actions basics).
3. Begin Phase 2 — start with networking and kick off SQL MI provisioning early.
4. Work through each phase in order — don't skip ahead.
5. Use this guide as a reference throughout the build. Come back to the relevant section whenever you need a refresher.
