Pearl Test Environment — Complete Implementation Guide
A comprehensive, step-by-step reference with curated video tutorials for every phase of building the Pearl test environment on Azure. Designed for teams new to Azure, CI/CD, GitHub Actions, and Terraform.
Prerequisites — Before You Start Anything
Before Phase 2 begins, these items must be completed. Think of these as the "tickets to enter" — without them, the build will stall immediately.
Administrative Prerequisites
These are the business and access approvals your team needs before touching any Azure resources. Each one prevents a blocker later.
- Phase 1 proposal approved by technical heads — This is the green light. Without formal sign-off, don't start spending on Azure.
- Azure subscription created for the test environment — This must be a separate subscription from production. Think of it like having a separate bank account for test expenses.
- Azure Active Directory / Entra ID group created — This controls who can access the test environment. Create a security group called something like "Pearl-Test-Team".
- GitHub repository access confirmed — Everyone on the team needs push access to the repo. The CI/CD pipeline (GitHub Actions) needs a registration token.
- Telerik RadControls licence verified — Pearl uses Telerik components. Confirm the licence covers non-production use.
- Genesys Cloud CX sandbox org credentials obtained — Contact Genesys to get a sandbox environment. This prevents test calls from reaching real customers.
Technical Measurements Required
Before we build the test database, we need to know how big the current databases actually are. You wouldn't move furniture into a new flat without measuring the doorways first — same principle.
- Measure all 17 production database sizes — Run `EXEC sp_spaceused` on each database. Write down the total size. This tells us how much storage to provision on the SQL Managed Instance.
- Document the production SQL MI backup schedule — Note retention period and when backups happen. We'll mirror this process for test data refreshes.
- List all IIS site bindings on pearl3/pearl4/pearlinternal — Document every site name, port, protocol, and app pool. We'll recreate these exactly on VM1.
- List all Windows services on the production worker machines — Document each service name, executable path, startup type, and run-as account.
- List Memcached cache key prefixes currently in use — This helps us configure Memcached correctly on the test web server.
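The per-database sizing above can also be gathered in a single pass. The sketch below reads `sys.master_files`, where sizes are stored as 8 KB pages; treat it as a starting point and cross-check a few databases against `sp_spaceused`:

```sql
-- Approximate allocated size (MB) per database, one row each.
SELECT DB_NAME(database_id) AS database_name,
       SUM(CASE WHEN type_desc = 'ROWS' THEN size END) * 8 / 1024 AS data_mb,
       SUM(CASE WHEN type_desc = 'LOG'  THEN size END) * 8 / 1024 AS log_mb
FROM sys.master_files
GROUP BY database_id
ORDER BY data_mb DESC;
```

Add `data_mb` and `log_mb` across all 17 rows, then allow headroom (50% is a common rule of thumb) when choosing the SQL MI storage size.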
Accounts & Access You'll Need
Azure Portal Access
An Azure account with at least Contributor role on the test subscription. You'll use portal.azure.com for most provisioning.
GitHub Account
Admin access to the repository. Needed to register self-hosted runners and create environment protection rules.
Stripe Test Keys
Stripe test-mode API keys (sk_test_xxx). Available from the Stripe dashboard under Developers → API keys.
Mailgun Sandbox
Mailgun sandbox domain and API key. All test emails route through the sandbox so no real emails are sent.
GoCardless Sandbox
GoCardless sandbox token. Prevents any real Direct Debit mandates from being created.
Genesys Sandbox
Genesys Cloud CX sandbox org ID. Test telephony routing without touching real phone lines.
⚠️ Don't Skip the Prerequisites
The most common reason test environment builds stall is missing prerequisites. Azure subscription creation can take days if your organisation requires approval. Sandbox API keys from third parties can take a week or more. Start collecting these now, even before Phase 2 begins.
📚 Related Documentation
- Azure subscription management documentation — Good for understanding subscription ownership, access boundaries, and operational control.
- Azure budgets and cost alerts — Useful when setting up the monthly budget guardrail for the test estate.
- Azure RBAC overview — Explains how Contributor, Reader, and least-privilege access should work.
- Microsoft Entra group management — Useful when creating and maintaining the team access group.
- GitHub repository access management — Relevant for making sure the right people and automation accounts can use the repo.
- Identity and access design guidance — Helpful when deciding how to separate environments and access patterns.
Recommended Learning Path
If you're new to Azure, CI/CD, and Infrastructure as Code, follow this learning path before starting the build. Each step builds on the previous one.
Understand Cloud & Azure Basics (2–3 hours)
What is a subscription, resource group, region, and VNet? Start with the AZ-900 and Azure overview videos below. Don't worry about memorising everything — just get familiar with the vocabulary.
Learn What CI/CD Means (1 hour)
Understand "Continuous Integration" (auto-building code) and "Continuous Deployment" (auto-pushing it to servers). Watch the CI/CD and DevOps concept videos. You don't need to be an expert — just understand the flow.
Learn GitHub Actions (2–3 hours)
This is YOUR deployment tool. Understand workflows, jobs, steps, runners, and secrets. The TechWorld with Nana and CoderDave tutorials are excellent starting points.
Learn Terraform Basics (3–4 hours)
Terraform lets you describe your entire Azure infrastructure in code files. Instead of clicking through the Azure portal, you write a file that says "create a VNet, create a VM" and Terraform builds it. The freeCodeCamp Terraform + Azure course is perfect.
Combine It All — IaC + CI/CD (2 hours)
Watch the Traversy Media DevOps crash course to see how Docker, Terraform, and GitHub Actions work together. This connects the dots between everything you've learned.
📚 Related Documentation
- Microsoft Azure Fundamentals learning path — Best official starting point for Azure terminology and core services.
- Azure Well-Architected Framework: Operational Excellence — Helps frame why repeatability, runbooks, and automation matter.
- GitHub Actions documentation hub — Official reference for workflows, runners, secrets, and deployments.
- Understanding GitHub Actions — Good companion to the CI/CD learning steps.
- Terraform introduction — Covers the basic mental model of infrastructure as code.
- Terraform on Azure getting started guide — Useful once the team moves from concepts into Azure-specific Terraform work.
Foundation Videos — Watch These First
These videos cover the core technologies used in this project. Watch them before diving into any phase-specific work. They're ordered from easiest to most advanced.
Azure Cloud Fundamentals
If you've never used Azure before, start here. These videos teach you what Azure is, how resources are organised, and how networking works in the cloud.
DevOps & CI/CD Concepts
Before learning specific tools, understand the why. What is DevOps? What does CI/CD actually mean? Why do we automate deployments instead of copying files manually?
PowerShell Fundamentals
You'll use PowerShell extensively for configuring Windows servers, running database scripts, and automating tasks. If you've never used PowerShell, this is essential viewing.
📚 Related Documentation
- Azure Resource Manager overview — Explains how Azure resources, groups, and deployments are organised.
- Azure virtual network overview — Good background reading before the networking sections.
- What is DevOps? — Plain-language Microsoft explanation of DevOps and delivery flow.
- Introduction to GitHub Actions — Good written companion to the video set.
- Terraform language documentation — Useful once the team begins reading or writing `.tf` files.
- PowerShell scripting overview — Useful because PowerShell is used heavily on the Windows VMs and automation paths.
Phase 2 — Build the Cloud Infrastructure
This is the foundation. We're building the Azure cloud infrastructure from scratch — a secure private network, three Windows servers, a database, and the "plumbing" that connects everything.
💡 What You're Building (In Plain English)
Imagine building a secure office building. First, you build the walls and security gates (networking). Then you set up the rooms (VMs). Then you install the filing cabinets (database). Then you put locks on the doors and a key cabinet (Key Vault). That's Phase 2.
Step 2.1–2.4: Networking & Security Foundation
What this is: Creating isolated private networks in Azure. Think of a VNet (Virtual Network) as a private office building. A subnet is a floor in that building. A Network Security Group (NSG) is the security guard who checks badges at each floor. Azure Bastion is a secure front door that lets admins in without exposing the building to the public street. Azure Firewall is the security checkpoint that controls what goes out.
A subscription is like a billing account — all resources you create go inside it. Resource groups are folders that organise related resources together. We create three: one for networking (hub), one for application servers (spoke), and one for database resources (data).
- Subscription created with budget alert at £800/month (prevents surprise bills)
- Resource groups created with proper tags (so you can track costs and ownership)
- RBAC roles assigned: Contributor for infra team, Reader for QA team
The Hub VNet is the "security checkpoint" network. It contains Azure Bastion (your secure remote desktop gateway — you connect to VMs through this instead of exposing them to the internet) and Azure Firewall (controls what outbound traffic is allowed).
Then deploy Azure Bastion (Standard SKU) and Azure Firewall (Standard SKU). Configure firewall rules to allow outbound access to GitHub, NuGet, Azure Blob Storage, and sandbox API endpoints only. Block everything else by default.
- Hub VNet created (10.1.0.0/16)
- Azure Bastion deployed — you can now RDP to VMs securely
- Azure Firewall deployed with outbound rules for GitHub, NuGet, Blob Storage, and sandbox APIs
The Spoke VNet is where your actual application servers live. We divide it into subnets (floors) so the web server, worker server, build server, and database are each isolated.
- Spoke VNet created with 4 subnets
- VNet peering established between Hub ↔ Spoke
- Route table created: all internet traffic (0.0.0.0/0) → Azure Firewall
NSGs are like security guards at each floor. They check: "Is this traffic allowed to come in / go out of this subnet?" We create one NSG per subnet with rules that only allow the exact traffic Pearl needs.
| NSG | Key Inbound Rules | Key Outbound Rules |
|---|---|---|
| nsg-snet-web | Worker→Web (80,443), Build→Web (80,443), Bastion→RDP (3389) | Web→DB (1433), Web→Worker (8080), Web→Internet (443 via FW) |
| nsg-snet-worker | Web→Worker (8080), Bastion→RDP (3389) | Worker→Web (80,443), Worker→DB (1433) |
| nsg-snet-build | Bastion→RDP (3389) | Build→Web (80,443), Build→Worker (*), Build→DB (1433), Build→Internet (443 via FW) |
- One NSG created and attached per subnet
- Default deny — only explicitly allowed traffic passes
- Verified: VMs cannot be accessed directly from the internet
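In Terraform, each row of the table above becomes one or more `azurerm_network_security_rule` resources. A minimal sketch for the Bastion→RDP rule on the web subnet — the address prefixes and resource names here are illustrative assumptions, not values from the real configuration:

```terraform
resource "azurerm_network_security_rule" "bastion_rdp_to_web" {
  name                        = "Allow-Bastion-RDP"
  priority                    = 100
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  source_address_prefix       = "10.1.1.0/26"  # assumed AzureBastionSubnet range in the hub
  destination_port_range      = "3389"
  destination_address_prefix  = "10.2.1.0/24"  # assumed snet-web range
  resource_group_name         = azurerm_resource_group.spoke.name
  network_security_group_name = azurerm_network_security_group.web.name
}
```

Keeping one rule per resource (rather than giant inline rule blocks) makes `terraform plan` diffs readable when a single port changes.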
Videos for Networking & Azure Infrastructure
Step 2.5–2.7: Core Services (Key Vault, Storage, SQL MI)
Key Vault is like a secure lockbox for passwords and API keys. Instead of storing database passwords in configuration files (insecure!), we store them in Key Vault and let the servers retrieve them securely using their managed identities.
- Create Key Vault: `kv-pearl-test` in `rg-pearl-test-spoke`
- Enable soft-delete (accidentally deleted secrets can be recovered) and purge protection (prevents permanent deletion for 90 days)
- Add all the secrets: 17 database connection strings, Stripe test key, GoCardless sandbox token, Mailgun sandbox key, Genesys sandbox org ID, S3 test bucket credentials
💡 Why Not Just Put Passwords in Config Files?
Config files can be accidentally committed to Git, read by anyone with server access, or leaked in error messages. Key Vault centralises secrets, provides access logging (who read what, when), and allows easy rotation without touching server configs.
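Adding a secret is a one-liner once the vault exists. A sketch using the Az PowerShell module (run `Connect-AzAccount` first; the secret name `Stripe-TestKey` is an illustrative assumption):

```powershell
# Prompt for the value so it never lands in shell history or a script file
$secret = Read-Host -AsSecureString "Paste the Stripe test key"
Set-AzKeyVaultSecret -VaultName "kv-pearl-test" -Name "Stripe-TestKey" -SecretValue $secret
```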
Azure Blob Storage is like a cloud hard drive. We use it to store database backup files, restore scripts, and deployment artifacts. Create a storage account called stpearltest with three containers: backup (holds DB backups from production), restore (processed files), and scripts (automation scripts).
- Storage account created (LRS redundancy, Hot tier, UK South)
- Lifecycle management: auto-delete backups older than 28 days
- Encryption at rest enabled (Microsoft-managed keys)
SQL Managed Instance is a fully-managed SQL Server in the cloud. It's compatible with on-premises SQL Server features (which Pearl relies on heavily) but Azure handles patching, backups, and high availability for you. This is where all 17 databases will live.
⚠️ SQL MI Takes 4–6 Hours to Provision
This is the longest single provisioning step. Start this early in the day. Kick off the provisioning, then work on other tasks while Azure sets it up. Configuration: General Purpose tier, 4 vCores, 256 GB storage, public endpoint DISABLED, placed in the snet-data subnet.
Videos for Key Vault, Storage & SQL
Step 2.8–2.11: Server Build-Out (3 VMs)
Now we build the three Windows Servers. Each has a specific role. Think of them as three employees in the office, each with a distinct job description.
This is the "front desk". It runs IIS (Internet Information Services — Microsoft's web server), hosts the Pearl web application, internal web services, Memcached (a caching layer), and Apache Solr (the search engine). Operators and clients interact with Pearl through this server.
- Provision VM: Standard_D4s_v5 (4 vCPU, 16 GB RAM), Windows Server 2022 Datacenter
- Place in snet-web subnet, NO public IP (accessed only via Bastion)
- Harden OS: enable Windows Update, disable unnecessary features, enable disk encryption (BitLocker), configure Windows Firewall to match NSG rules
- Install IIS with ASP.NET 4.5 support:
Install-WindowsFeature Web-Server, Web-Asp-Net45, Web-ISAPI-Ext, Web-ISAPI-Filter, Web-Mgmt-Console, Web-Http-Redirect
- Create 5 IIS sites with correct bindings:
| Site | Path | Binding | App Pool |
|---|---|---|---|
| pearl-azure | D:\apps\pearl-azure | http:80, https:443 | PearlAppPool (.NET 4.0, Integrated) |
| pearl-webservices | D:\apps\pearl-webservices | http:8081 | PearlWSAppPool (.NET 4.0, Integrated) |
| utility-payments | D:\apps\utility-server\payments | http:8082 | UtilityAppPool (.NET 4.0) |
| utility-xero | D:\apps\utility-server\xero | http:8083 | UtilityAppPool |
| utility-reporting | D:\apps\utility-server\reporting | http:8084 | UtilityAppPool |
- Install and configure Memcached as a Windows service on port 11211
- Install Java Runtime + Apache Solr as a Windows service on port 8983
- Add hosts file entries pointing to internal IPs
- Configure auto-shutdown at 19:00 UTC (saves costs since this is test-only)
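As a sketch of the site-creation step, one row of the bindings table can be scripted with the WebAdministration module that ships with IIS. Paths and names below come from the table; the runtime settings are assumptions to verify against production:

```powershell
Import-Module WebAdministration

# App pool on .NET 4.0 (Integrated pipeline is the module default)
New-WebAppPool -Name "PearlWSAppPool"
Set-ItemProperty "IIS:\AppPools\PearlWSAppPool" managedRuntimeVersion "v4.0"

# Site with the http:8081 binding from the table
New-Website -Name "pearl-webservices" `
    -PhysicalPath "D:\apps\pearl-webservices" `
    -Port 8081 `
    -ApplicationPool "PearlWSAppPool"
```

Repeat per table row; scripting this (rather than clicking through IIS Manager) means VM1 can be rebuilt identically later.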
This is the "back office". It runs 4 Windows services that do background processing — queue processing (executing scheduled jobs), health monitoring, AI quality scoring, and real-time notifications (Totem). These run 24/7 and don't have a visible web interface.
- Provision VM: Standard_D2s_v5 (2 vCPU, 8 GB RAM), Windows Server 2022
- Install .NET Framework 3.5 (for Totem — it's built on an older .NET version) and .NET 4.8.1 (for ai-spooler)
- Create directory structure and register 4 Windows services:
sc.exe create QueueProcessor binPath= "D:\apps\queue-processor\MessageQueueProcessor.exe" start= auto
sc.exe create SystemChecker binPath= "D:\apps\system-checker\SystemChecker.exe" start= auto
sc.exe create AISpooler binPath= "D:\apps\ai-spooler\AISpooler.exe" start= auto
sc.exe create TotemServer binPath= "D:\apps\totem\Totem2.exe" start= auto
(Note: use sc.exe rather than sc in PowerShell, where sc is an alias for Set-Content, and keep the space after each = — sc.exe requires it.)
- Add hosts file entries mapping internal hostnames to IPs
- Configure auto-shutdown at 19:00 UTC
This is the "workshop". It compiles code (MSBuild), runs the GitHub Actions runner (which automates deployments), and hosts the database restore tooling. No end-users ever interact with this server.
- Provision VM: Standard_D2s_v5, Windows Server 2022
- Install Visual Studio Build Tools 2022 (with MSBuild 17, .NET 4.8 and 3.5 targeting packs, NuGet CLI)
- Install Git for Windows
- Install and configure GitHub Actions self-hosted runner (detailed in Phase 4)
- Create restore tools directory:
D:\restore-tools\ - Configure auto-shutdown at 19:00 UTC
Managed Identity is like giving a VM its own ID card. Instead of storing Key Vault passwords on the VM (which defeats the purpose), the VM uses its identity to authenticate with Key Vault automatically. Azure handles this behind the scenes — no passwords needed.
- Enable System-assigned Managed Identity on VM1, VM2, VM3
- Grant Key Vault access: VM1 gets Secret Read, VM2 gets Secret Read, VM3 gets Secret Read + Storage Contributor
- Test: RDP into each VM, try to read a secret from Key Vault using PowerShell — it should work without providing any credentials
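The "no credentials" test in the last step can be run against the Azure instance metadata endpoint directly. A sketch (the secret name is an illustrative assumption; the IMDS address and API versions are standard Azure values):

```powershell
# Run inside a VM that has a system-assigned managed identity.
# Step 1: ask IMDS for a Key Vault access token — no password anywhere.
$token = (Invoke-RestMethod -Headers @{ Metadata = "true" } -Uri `
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net").access_token

# Step 2: read the secret with that token.
Invoke-RestMethod -Headers @{ Authorization = "Bearer $token" } -Uri `
  "https://kv-pearl-test.vault.azure.net/secrets/Stripe-TestKey?api-version=7.4"
```

If this returns the secret value, the identity and access policy are wired correctly; a 403 means the VM's identity lacks Secret Read on the vault.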
🏁 Phase 2 Checkpoint
Three servers running, database online, network secured — but no application code deployed yet. You should be able to RDP to any VM via Bastion, VMs can reach the database, and the firewall blocks any outbound traffic that isn't explicitly allowed.
📚 Related Documentation
- Create a virtual network in the Azure portal — Good for the basic VNet and subnet workflow.
- Network security groups overview — Useful for designing and validating the NSG rules.
- Azure Firewall overview — Explains rules, routing, and central outbound control.
- Azure Bastion overview — Relevant for secure VM administration without public RDP exposure.
- Azure Key Vault overview — Useful for understanding secret storage and retrieval patterns.
- Azure Blob Storage introduction — Important for backup files, restore assets, and operational scripts.
- Azure SQL Managed Instance overview — Covers the SQL platform this design depends on.
- Create a Windows VM in Azure — Useful for the VM1, VM2, and VM3 build steps.
- Managed identities for Azure resources — Relevant for VM-to-Key Vault and VM-to-Storage authentication.
Phase 3 — Database Backup, Restore & Masking
Copy production data safely, remove all personal information, and automate weekly refreshes so the test environment always has realistic (but anonymised) data.
💡 Why Not Just Use Production Data Directly?
Production databases contain real customer names, phone numbers, email addresses, bank details, and payment card references. Using this in a test environment violates GDPR and ISO 27001. We must mask (replace with fake data) all personal information before anyone touches the test databases.
Set up production SQL MI to write weekly backups of all 17 databases to the Azure Blob Storage container we created in Phase 2. In simple terms, this means production creates a sealed copy of each database and places it in a secure cloud storage location so the test environment can work from a copy instead of touching the live system.
Think of Azure Blob Storage as a locked filing cabinet in the cloud. Production places a fresh pack of database copies into that cabinet each week. The test environment only takes copies from the cabinet. It never pulls data straight out of the live system. That separation makes the process safer, easier to audit, and much easier to recover when something goes wrong.
Start by testing one database manually before scheduling all 17. Confirm the backup file lands in the correct Blob container, confirm the file size looks sensible, and confirm the job history reports success. Once the first test works, repeat the pattern for every database using a clear naming format such as DatabaseName_YYYYMMDD.bak.
- Create the Blob credential first — SQL MI needs permission to write the backup files into Azure Blob Storage. If the SAS token or credential is wrong, the job fails immediately.
- Use a naming standard — Include the database name and date in every file name so the team can immediately tell which backup set belongs to which weekly refresh.
- Schedule during a quiet window — Saturday 02:00 UTC is a low-risk time that reduces pressure on production while still giving the test environment fresh data.
- Keep compression and checksum enabled — Compression makes the files smaller and quicker to move. Checksum adds an integrity check so damage is easier to detect early.
- Record evidence every week — Save file sizes, start and finish times, and job history. This proves the source data was created correctly before any restore begins.
- SQL credential created on production MI for Blob SAS token
- Weekly backup scheduled: Saturday 02:00 UTC
- Initial test backup completed — record actual file sizes
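The manual test backup for one database looks roughly like this once the SAS credential exists. The database name and date are illustrative; the credential's name must match the container URL exactly:

```sql
BACKUP DATABASE [PearlData]
TO URL = 'https://stpearltest.blob.core.windows.net/backup/PearlData_20250104.bak'
WITH COMPRESSION, CHECKSUM, FORMAT, STATS = 10;
```

`STATS = 10` prints progress every 10%, which is useful evidence for the weekly job-history record.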
🧭 What the Operator Actually Does
Connect to the production managed instance with SQL Server Management Studio or Azure Data Studio, create the Blob credential, run one manual test backup, then convert that logic into a scheduled SQL Agent job. After the job finishes, open the Azure portal and verify the file exists in the backup container before moving on.
A PowerShell script on VM3 automates the entire refresh process: download backups from Blob → restore to test SQL MI → run masking → validate → send notification. This script runs every Saturday and takes about 2–4 hours to complete. VM3 is the correct place to run this because it sits inside the private Azure network and can reach Blob Storage, SQL MI, and the rest of the internal environment safely.
In layman's terms, VM3 is the control room for the entire refresh. Instead of an engineer doing dozens of repetitive manual steps every weekend, the script performs the same sequence every time and writes down exactly what happened. That makes the process supportable and far less dependent on memory.
- Connect through Azure Bastion — Operators should log into VM3 through Bastion, then open PowerShell as Administrator so the machine can manage modules, scripts, folders, and scheduled tasks properly.
- Keep the folder structure obvious — Separate downloads, logs, scripts, and notifications. During an incident, clarity matters more than elegance.
- Write human-readable logs — Each stage should clearly say which database is being downloaded, restored, masked, or validated so non-developers can follow progress.
- Fail loudly — If one backup file is missing or one database restore fails, stop and report the exact failure rather than silently continuing with partial data.
- Support a manual rerun — The runbook should tell an operator exactly how to log into VM3 and rerun the job with a date parameter when a scheduled run fails.
🖥️ Running It Manually on the VM
Connect to VM3, open an elevated PowerShell window, move to D:\restore-tools, and run the restore script with the backup date you want to use. Watch the log output. If one database is taking far longer than expected, stop guessing and check the log file, SQL restore status, free disk space, and Blob download completion before rerunning anything.
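The overall shape of the refresh script might look like the skeleton below. This is a sketch of the staging and fail-loudly behaviour only — the real script lives in `D:\restore-tools`, and the `databases.txt` helper file is an assumption:

```powershell
param([string]$BackupDate = (Get-Date -Format "yyyyMMdd"))

# One database name per line — keeps the list editable without touching code
$databases = Get-Content "D:\restore-tools\databases.txt"

foreach ($db in $databases) {
    $bak = "${db}_${BackupDate}.bak"
    Write-Host "[$(Get-Date -Format o)] Starting refresh for $bak"
    # Stages: 1. download from Blob  2. restore  3. mask  4. validate.
    # Each stage should throw on failure (fail loudly) rather than
    # continue with partial data; the catch block reports and stops.
}
```

Supporting the `-BackupDate` parameter is what makes the manual rerun in the runbook possible after a failed Saturday run.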
Masking = replacing real data with fake but realistic-looking data. "John Smith" becomes "Test User 12345". "07700 900000" becomes "00000 000000". Real email addresses become "masked_123@test.pearl". These scripts run automatically after every restore. This is the step that turns a risky production copy into something safe enough for a test environment.
The important idea is that the data still needs to behave like real data even after it is anonymised. Testers still need to search it, filter it, and run workflows against it. The system should still feel realistic, but nobody should be able to identify a real customer, caller, user, or payment detail from what they see.
| Script | What Gets Masked |
|---|---|
| mask-PearlData.sql | Caller names, phone numbers, email addresses, physical addresses |
| mask-PearlUsers.sql | User names, emails, phone numbers, company contacts, password hashes |
| mask-PearlBilling.sql | Customer names on invoices, bank details, card references |
| mask-SMSBroadcast.sql | Mobile numbers, SMS message body text |
| mask-Messages.sql | Caller names, phone numbers, message text content |
| mask-PearlLog.sql | Truncate/replace PII in audit log text fields |
- Mask the highest-risk fields first — Names, phone numbers, emails, addresses, bank details, card references, message bodies, and free-text notes are the main danger areas.
- Preserve relationships where needed — If one person appears in multiple tables, the replacement values should still line up logically so testing remains useful.
- Reset access safely — Password hashes, service credentials, and admin access should be replaced with non-production values the team controls.
- Do manual spot checks — Review at least ten records in each important table after the first masking run to prove no real personal data remains.
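A representative masking statement, purely for illustration — the table and column names here are assumptions, and the real rules live in the scripts listed above:

```sql
-- Deterministic replacements keyed on the row ID keep records distinct,
-- searchable, and consistent across related tables.
UPDATE dbo.Callers
SET    FirstName = 'Test',
       LastName  = 'User ' + CAST(CallerId AS varchar(10)),
       Phone     = '00000 000000',
       Email     = 'masked_' + CAST(CallerId AS varchar(10)) + '@test.pearl';
```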
🔍 What Success Looks Like
The environment contains believable but fake customer and message data. Testers can do their work, but they cannot discover a real phone number, real email address, or real payment reference anywhere in the restored estate.
After restoring and masking, override configuration values in the database so the test environment cannot accidentally call live services. This is the safety net that prevents test environments from sending real SMS messages, using live email domains, charging real cards, or talking to live telephony and payment systems.
This matters because many applications store live settings inside database tables, not just inside config files. If you restore a production database and do nothing else, the application can still contain live API keys, live URLs, or live feature flags. This step deliberately swaps those values for safe sandbox or disabled settings.
- Swap every live payment key — All financial integrations must use sandbox credentials only.
- Disable outbound messaging where appropriate — If the environment should never send real SMS or email, turn the feature off and verify the services obey the flag.
- Repoint internal endpoints — Totem, Solr, storage buckets, and private services should all reference the test environment, not production.
- Validate the final values — Run a query after the override script and compare the results against an approved list of sandbox settings.
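The override step is typically a handful of UPDATE statements against the configuration table. A sketch against ConfigStrings — the column and key names are illustrative assumptions:

```sql
-- Swap live integration settings for sandbox or disabled values
UPDATE dbo.ConfigStrings SET ConfigValue = 'sk_test_xxx'
WHERE  ConfigKey = 'Stripe.ApiKey';

UPDATE dbo.ConfigStrings SET ConfigValue = 'https://api-sandbox.gocardless.com'
WHERE  ConfigKey = 'GoCardless.BaseUrl';

UPDATE dbo.ConfigStrings SET ConfigValue = '0'
WHERE  ConfigKey = 'Sms.OutboundEnabled';
```

The validation query afterwards should select these keys and compare them line-by-line against the approved sandbox list.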
Run the complete cycle and prove it works: Backup → Download → Restore → Mask → Override → Validate. This is the proof step. Until the team completes one full refresh successfully, the process is still theory rather than an operational capability.
The first full test should be run like a rehearsal. One operator drives the process on VM3 while another person watches, takes notes, records timings, and confirms the evidence. That turns the first run into both a technical proof and a training exercise.
- All 17 databases restored successfully
- Spot-check 10 records per masked table — PII fields are anonymised
- ConfigStrings point to sandbox endpoints
- Total pipeline duration under 4 hours
- Process documented in restore runbook
- Measure each stage — Capture timings for backup creation, Blob upload, VM3 download, restore, masking, override, and validation.
- Test the application after the refresh — A database restore only counts as successful if the websites and services can actually use the data afterward.
- Document failure handling — Write down what to do if one backup is missing, if one database fails to restore, or if masking stops halfway through.
- Finish the runbook immediately — Include the VM3 commands, log locations, notification recipients, and success checks while the process is still fresh in the team's memory.
Database Backup Video Walkthroughs
These are search-based video links rather than single fixed videos so your team can pick the most up-to-date walkthroughs. Azure screens and PowerShell modules change often, so current tutorials are more useful than a stale recording.
- Azure SQL Managed Instance backup and restore walkthroughs — Focus on backup strategy, restore options, and SQL tooling.
- SQL Server backup and restore with PowerShell — Useful for learning how the scripted restore flow behaves.
- Azure Blob Storage upload and download with PowerShell — Helpful for the backup file transfer part of the process.
- Windows Server PowerShell basics for operators — Good for team members who will run the refresh from VM3.
- Azure VM administration with Windows Server and PowerShell — Covers server-side checks, scripting, and troubleshooting on the VM itself.
- SQL Server data masking walkthroughs — Useful when building or reviewing the anonymisation scripts.
📚 Related Documentation
- SQL Server backup to URL — Relevant for writing backups straight to Azure Blob Storage.
- Restore a SQL Server database backup — Useful reference for the restore side of the flow.
- Install-Module documentation — Helpful when preparing PowerShell modules on VM3.
- Az.Storage PowerShell module reference — Covers the commands used to interact with Blob Storage from the VM.
- SQL Server PowerShell module documentation — Useful for scripted SQL execution and administration tasks.
- PowerShell scripts documentation — Good for operators who need to understand how the restore script is structured and executed.
- SQL Server dynamic data masking overview — Helpful background reading when planning masking rules and validation.
🏁 Phase 3 Checkpoint
Realistic but anonymised data refreshing weekly. Personal information is protected. The test environment always has fresh, safe data.
Phase 4 — CI/CD Pipeline (Automated Build & Deploy)
This is the phase that transforms "copy files to the server manually" into "push code to GitHub and it deploys automatically". This section is extra detailed because deployment is the most critical area to get right.
Understanding CI/CD — The Big Picture
CI/CD stands for Continuous Integration / Continuous Deployment. Here's what each part means in plain English:
📦 Continuous Integration (CI) = "Auto-Build"
Every time a developer pushes code to GitHub, the system automatically compiles (builds) the code and checks for errors. If the build fails, the team knows immediately. For Pearl: CI means MSBuild compiles the ASP.NET Web Forms project and all 7 other components automatically.
🚀 Continuous Deployment (CD) = "Auto-Deploy"
After the code builds successfully, the system automatically deploys it to the test server. In Pearl's case, this means: stop the IIS websites → copy the new files to VM1 → restart → verify the site comes back up. If something goes wrong, it automatically rolls back to the previous version.
🔧 How It All Connects
Developer pushes code → GitHub Actions runs → Code compiles on VM3 → Artifacts deployed to VM1/VM2 → Health check verifies success → Done. The entire process takes minutes instead of hours of manual work, and it's repeatable, logged, and rollback-safe.
Essential CI/CD Concept Videos — Watch These First
GitHub Actions Deep Dive
GitHub Actions is the automation engine inside GitHub. When you push code, it reads a YAML file (a recipe) in your repository and follows the instructions. Here's the vocabulary you need:
| Term | What It Means (Plain English) | Pearl Example |
|---|---|---|
| Workflow | A recipe file (.yml) that tells GitHub what to do | .github/workflows/build.yml |
| Trigger | The event that starts the workflow | Push to main branch, or manual button click |
| Job | A group of related steps that run on one machine | "build" job, "deploy" job |
| Step | A single command or action within a job | "Checkout code", "Run MSBuild", "Copy files" |
| Runner | The machine where the job executes | VM3 (self-hosted runner) |
| Self-hosted runner | A runner YOU control (not GitHub's cloud) | Needed because Pearl must build inside the private network |
| Artifact | The output files from a build (compiled code) | The .dll and .aspx files that get deployed |
| Environment | A target like "test" or "production" with protection rules | "test" environment requiring 1 approval |
| Secret | A password or key stored securely in GitHub | VM connection details, Key Vault credentials |
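Putting the vocabulary together, a stripped-down workflow file might look like this. It is a sketch only: the solution file name, runner labels, and deploy commands are assumptions to replace with the real values:

```yaml
# .github/workflows/build.yml (sketch)
name: build-and-deploy
on:
  push:
    branches: [main]
  workflow_dispatch:          # the "manual button click" trigger

jobs:
  build:
    runs-on: [self-hosted, windows, pearl-test]   # VM3
    steps:
      - uses: actions/checkout@v4
      - name: Run MSBuild
        run: msbuild Pearl.sln /p:Configuration=Release   # solution name assumed

  deploy:
    needs: build
    runs-on: [self-hosted, windows, pearl-test]
    environment: test         # protection rule: 1 approval before this job runs
    steps:
      - name: Deploy to VM1
        run: Write-Host "stop IIS sites, copy artifacts, restart, health-check"
```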
GitHub Actions Video Tutorials — Complete Learning Path
Terraform Deep Dive — Infrastructure as Code
Terraform is the tool that lets you create all the Azure infrastructure (VNets, VMs, databases, Key Vault) by writing text files instead of clicking around the Azure portal. This is called "Infrastructure as Code" (IaC).
💡 Why Use Terraform Instead of the Azure Portal?
Imagine building your Azure environment by clicking through the portal. Now imagine someone asks you to build an identical second environment. You'd have to remember every click, every setting, every rule. With Terraform, the entire environment is defined in code files — to build a second environment, you just run the same files with different parameters. It's repeatable, auditable, and version-controlled.
🔧 How Terraform Works — The Basics
1. Write: Create .tf files that describe your desired infrastructure (e.g., "I want a VNet with these subnets").
2. Plan: Run terraform plan — Terraform shows you what it WILL create/change (a preview).
3. Apply: Run terraform apply — Terraform actually creates the resources in Azure.
4. State: Terraform remembers what it created in a "state file" so it knows what already exists.
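As a small illustration of the Write step, a .tf file describing "I want a VNet with a subnet" might look like this. The resource names, region, and address ranges are placeholders, not Pearl's actual values:

```hcl
# main.tf — a sketch of a resource group, VNet, and one subnet
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "test" {
  name     = "rg-pearl-test"
  location = "UK South"
}

resource "azurerm_virtual_network" "test" {
  name                = "vnet-pearl-test"
  resource_group_name = azurerm_resource_group.test.name
  location            = azurerm_resource_group.test.location
  address_space       = ["10.10.0.0/16"]
}

resource "azurerm_subnet" "app" {
  name                 = "snet-app"
  resource_group_name  = azurerm_resource_group.test.name
  virtual_network_name = azurerm_virtual_network.test.name
  address_prefixes     = ["10.10.1.0/24"]
}
```

Running terraform plan against this file previews the three resources before anything is created; terraform apply then creates them and records them in the state file.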
Terraform Video Tutorials — Complete Learning Path
Combining Everything — DevOps Crash Course
The Actual Build & Deploy Workflow — Step by Step
Now let's walk through exactly what you'll build in Phase 4, step by step.
What is a self-hosted runner? GitHub Actions normally runs your code on GitHub's servers in the cloud. But Pearl needs to build inside your private Azure network (because the code needs access to NuGet packages and the network is firewalled). A self-hosted runner is software you install on VM3 that connects to GitHub and says "I'm ready to run jobs."
- Go to your GitHub repository → Settings → Actions → Runners → "New self-hosted runner"
- Select "Windows" and "x64"
- GitHub shows you a registration token — copy it, then remote desktop into VM3 via Bastion
- Download the runner package and run the configuration:
```powershell
# On VM3, in PowerShell (Administrator):

# Create a folder for the runner
mkdir C:\actions-runner ; cd C:\actions-runner

# Download the latest runner (GitHub shows you the URL)
Invoke-WebRequest -Uri https://github.com/actions/runner/releases/download/v2.XXX.X/actions-runner-win-x64-2.XXX.X.zip -OutFile actions-runner.zip

# Extract
Expand-Archive -Path actions-runner.zip -DestinationPath .

# Configure — GitHub gives you the exact command with your token
.\config.cmd --url https://github.com/YOUR-ORG/message-direct --token YOUR_TOKEN --name pearl-test-runner --labels self-hosted,windows,pearl-test

# Install as a Windows service so it starts automatically
.\svc.cmd install
.\svc.cmd start
```
- Verify: go back to GitHub → Settings → Actions → Runners — your runner should show "Online" with a green dot
💡 Why Self-Hosted Instead of GitHub-Hosted?
GitHub's hosted runners run on GitHub's servers in the public cloud. They can't reach your private Azure VNet. Since Pearl's build tools, NuGet packages, and deployment targets are all inside the private network, we need a runner that lives inside that network. That's VM3.
This is the recipe that tells GitHub how to compile Pearl's code. It's a YAML file that lives in your repository. Here's what it does:
When a developer pushes code to the main, develop, or release/* branch, this workflow automatically:
- Downloads the latest code from GitHub
- Restores NuGet packages (third-party libraries Pearl depends on)
- Compiles all 7+ components using MSBuild
- Packages the compiled output as a "build artifact" that the deploy workflow can use
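The four steps above map onto a workflow file roughly like this. This is a sketch: the branch triggers come from the description above, but the solution name and output path are assumptions:

```yaml
name: Build

on:
  push:
    branches: [main, develop, 'release/*']

jobs:
  build:
    runs-on: [self-hosted, windows, pearl-test]   # runs on VM3
    steps:
      - uses: actions/checkout@v4                 # download the latest code
      - name: Restore NuGet packages              # third-party libraries Pearl depends on
        run: nuget restore Pearl.sln              # solution name is a placeholder
      - name: Compile with MSBuild
        run: msbuild Pearl.sln /p:Configuration=Release
      - name: Package build artifact              # output the deploy workflow will use
        uses: actions/upload-artifact@v4
        with:
          name: pearl-build
          path: .\bin\Release
```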
This is the recipe that deploys compiled code to the test servers. It has a manual trigger and requires someone to click "Approve" before it runs — so nobody accidentally deploys.
⚠️ Setting Up the Approval Gate
In GitHub, go to Settings → Environments → New environment → name it "test" → check "Required reviewers" → add at least 1 person. Now every deploy requires someone to click "Approve" in GitHub before it runs. This prevents accidental deployments.
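In the deploy workflow YAML, the approval gate is wired up by referencing that environment on the job. A sketch, with the deployment step simplified to a single hypothetical script:

```yaml
name: Deploy to Test

on:
  workflow_dispatch:        # manual trigger only: someone must click "Run workflow"

jobs:
  deploy:
    runs-on: [self-hosted, windows, pearl-test]
    environment: test       # the job pauses here until a required reviewer approves
    steps:
      - name: Download build artifact
        uses: actions/download-artifact@v4
        with:
          name: pearl-build
      - name: Deploy to IIS
        # deploy.ps1 is a hypothetical script: stop sites, copy files, restart, health check
        run: .\scripts\deploy.ps1
```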
This workflow runs the PowerShell restore script from Phase 3 on a schedule (every Saturday) or manually when needed.
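The Saturday schedule plus the manual option can be expressed with two triggers. A sketch, where the script path and run time are assumptions:

```yaml
name: Refresh Test Databases

on:
  schedule:
    - cron: '0 2 * * 6'     # every Saturday at 02:00 UTC (day 6 = Saturday)
  workflow_dispatch:         # also allow a manual run when needed

jobs:
  refresh:
    runs-on: [self-hosted, windows, pearl-test]
    steps:
      - uses: actions/checkout@v4
      - name: Run restore script
        # hypothetical path to the Phase 3 PowerShell restore script
        run: .\scripts\restore-test-databases.ps1
```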
Before calling Phase 4 done, prove everything works reliably:
- Test #1: Push code to develop branch → verify build workflow runs automatically and succeeds
- Test #2: Trigger deploy workflow → approve → verify code appears on VM1 and VM2
- Test #3: Access the Pearl login page via browser → verify HTTP 200
- Test #4: Force a deployment failure (deploy broken code) → verify rollback restores the previous version
- Test #5: Trigger database refresh → verify all 17 databases are restored and masked
- Document all workflows, parameters, and troubleshooting steps in the operations runbook
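For Test #3, a quick PowerShell check of the kind you might run from VM3. The URL is a placeholder; substitute the real internal hostname for VM1:

```powershell
# Hypothetical internal URL — replace with the actual Pearl login page address
try {
    $response = Invoke-WebRequest -Uri "http://pearl-test.internal/login.aspx" -UseBasicParsing
    Write-Output "Test #3 passed: login page returned HTTP $($response.StatusCode)"
} catch {
    # Invoke-WebRequest throws on non-success status codes, so failures land here
    Write-Output "Test #3 FAILED: $($_.Exception.Message)"
}
```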
🏁 Phase 4 Checkpoint
Code changes can be built, approved, deployed, and rolled back automatically. The entire process is logged in GitHub. No more manual file copying.
📚 Related Documentation
- GitHub Actions workflow syntax — Core reference for writing or reviewing the workflow YAML files.
- About self-hosted runners — Relevant because VM3 hosts the runner for private builds and deployments.
- Using environments for deployment — Useful for approvals and environment protections.
- Store and share workflow artifacts — Relevant for build outputs passed into deployment jobs.
- Variables and secrets in GitHub Actions — Useful for handling deployment settings securely.
- Terraform with Azure tutorial — Good support material for the infrastructure-as-code side of the deployment process.
Phase 5 — Security, Compliance & Monitoring
Lock everything down, prove the estate is isolated from production, harden identity and secrets, establish audit evidence, and enforce monitoring and governance controls.
This is the most critical security check. The test environment must have ZERO network connectivity to production. If a developer accidentally runs the wrong script, it must be impossible for it to reach production databases or services.
- Confirm no VNet peering exists between test and production subscriptions
- Run connectivity tests: ping/telnet from test VMs to production SQL MI → must fail
- Review Azure Firewall logs: only allowed outbound destinations appearing
- Document evidence that the environment remains private-only with no accidental production route
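The connectivity test in the second bullet can be scripted from any test VM. A sketch, where the hostname is a placeholder for the production SQL MI endpoint:

```powershell
# Hypothetical production SQL MI hostname — substitute the real one
$prodSqlMi = "prod-sqlmi.database.windows.net"

# Test-NetConnection attempts a TCP handshake on the SQL port.
# In a correctly isolated test environment this MUST fail.
$result = Test-NetConnection -ComputerName $prodSqlMi -Port 1433 -WarningAction SilentlyContinue
if ($result.TcpTestSucceeded) {
    Write-Output "ISOLATION FAILURE: test VM can reach production SQL MI"
} else {
    Write-Output "Isolation confirmed: connection to production SQL MI failed as expected"
}
```

Save the transcript of this run as part of the isolation evidence in the last bullet.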
RBAC is how you stop the wrong people having the wrong power. The goal here is to make sure infrastructure admins, QA users, and operations staff each get only the level of access they genuinely need.
- Review and confirm RBAC assignments for infrastructure, QA, and operations teams
- Validate least-privilege scope on the subscription and resource groups
- Remove or reduce any unnecessary elevated access before sign-off
Key Vault is the control point for secrets. This step ensures passwords, connection strings, and API keys are not scattered across servers or scripts and that the team has a clear process for rotating them safely.
- Verify Key Vault protection settings, access boundaries, and managed identity usage
- Confirm secrets are not stored in local configuration files or deployment scripts
- Document the operational procedure for secret rotation and access review
- Verify Key Vault access logs show only expected Managed Identity callers
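If Key Vault diagnostic logs are routed to Log Analytics, the last check can be run as a KQL query. A sketch; exact column names depend on your diagnostic settings, so treat this as a starting point:

```kusto
// Key Vault data-plane operations over the last 7 days, grouped by operation and caller.
// Review the callers: only the expected managed identities should appear.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.KEYVAULT"
| where TimeGenerated > ago(7d)
| summarize Operations = count() by OperationName, CallerIPAddress
| order by Operations desc
```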
Audit logging is your evidence trail. If a secret was read, a deployment was triggered, or a privileged change was made, the environment should produce a reviewable record that tells you what happened, when it happened, and through which control path.
- Verify all VM disks are encrypted (BitLocker via Azure Disk Encryption)
- Confirm audit-relevant events are captured for infrastructure, access, deployment, and secret usage
- Validate the audit evidence path is usable for investigation and review
- Confirm retained logs and records are sufficient for security and operational traceability
Monitoring tells you when something breaks; Azure Policy stops bad configurations from being created in the first place. Together they provide operational visibility and governance guardrails.
- Create a Log Analytics Workspace and connect all VMs and Azure resources
- Set up alerts: VM unavailable, SQL MI high DTU, failed deployment, unexpected firewall denials
- Verify auto-shutdown schedules fire correctly at 19:00 UTC
- Apply Azure Policy assignments to deny public IPs, require tags, and restrict resources to UK South
- Complete the ISO 27001-aligned security controls checklist
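The deny-public-IPs guardrail in the fourth bullet can be expressed as a very small Azure Policy rule. This is a sketch of the policyRule portion only; the full definition also needs a name, display name, and assignment scope:

```json
{
  "if": {
    "field": "type",
    "equals": "Microsoft.Network/publicIPAddresses"
  },
  "then": {
    "effect": "deny"
  }
}
```

With this assigned at the subscription scope, any attempt to create a public IP in the test subscription is rejected at deployment time, before the resource exists.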
Videos for Azure Security & Identity
🏁 Phase 5 Checkpoint
Environment is hardened, auditable, and demonstrably isolated from production. Security controls documented and verified.
📚 Related Documentation
- Azure RBAC overview — Supports access review and least-privilege design.
- Azure Key Vault security features — Useful for hardening the secret store and reviewing controls.
- Azure Policy overview — Relevant for denying public IPs and enforcing mandatory tags and regions.
- Azure Monitor overview — Covers the monitoring platform used for visibility and alerting.
- Log Analytics workspace overview — Useful for centralised logs and investigations.
- Azure Monitor alerts overview — Helps define operational alerting and notification patterns.
Phase 6 — Testing, Documentation & Handover
Prove everything works for real-world scenarios, write clear documentation, and hand over to the operations team.
"Smoke testing" means testing the most important things work. The name comes from electronics — if you plug something in and smoke comes out, you know there's a problem. We test the critical user journeys:
- Operator login: Navigate to Pearl login page → authenticate → see the dashboard
- Message capture: Create a test message via the operator screen → verify it appears in the database
- Client portal: Log in as a test client → view messages → view rota schedule
- Queue processor: Verify jobs are processing (check Process_MachineStates table)
- System checker: Verify health checks running (check checkjobs table)
- Totem: Verify long-poll registration works (/ping endpoint responds)
- Integration safety: Verify Stripe calls use test mode (check API logs)
- Integration safety: Verify no SMS messages are sent (check SMSSpoolOutgoing table)
- Integration safety: Verify Mailgun uses sandbox domain (emails don't reach real people)
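The SMS safety check, for example, can be verified with a quick query against the test database. The table name comes from the checklist above; the server and database names are placeholders:

```powershell
# Requires the SqlServer PowerShell module (Install-Module SqlServer)
# Hypothetical server/database names — substitute the test SQL MI details
Invoke-Sqlcmd -ServerInstance "pearl-test-sqlmi.internal" -Database "PearlDB" -Query @"
SELECT COUNT(*) AS PendingSms
FROM SMSSpoolOutgoing
WHERE CreatedDate > DATEADD(day, -1, GETDATE());
"@
# Rows queued in the last day mean messages were generated — confirm the
# SMS sender service is disabled or pointed at a sandbox in the test environment
```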
Create the documents the team will rely on after the build is complete:
- Deployment guide: How to deploy code changes (step-by-step with screenshots)
- Backup & restore guide: How to refresh test data manually when needed
- Troubleshooting runbook: Common issues and how to fix them
- Secret rotation procedure: How to rotate passwords and API keys
- VM start/stop guide: How to turn VMs on/off for cost savings
- Architecture diagram: Final diagram with actual IPs, hostnames, and component placement
- Recorded walkthrough video: Screen recording of the full deploy cycle (build → approve → deploy → verify)
The formal transfer of ownership. After this, the operations team runs the environment independently.
- Live walkthrough session with the operations team (deployment, restore, rollback, troubleshooting)
- Record the session for future reference
- Confirm named owners and support responsibilities
- Formal handover acceptance sign-off
🏁 Phase 6 Checkpoint — Project Complete
Environment validated, documented, and handed over. The operations team is self-sufficient. The test environment is live and operational.
📚 Related Documentation
- Operational Excellence overview — Good guidance for handover quality, supportability, and ongoing operations.
- Health modeling guidance — Useful when deciding what should be included in smoke tests and acceptance checks.
- Monitoring and analysis guidance — Helpful for deciding what evidence and dashboards matter after handover.
- Safe deployment guidance — Useful for documenting rollback, deployment, and validation expectations.
- Reviewing deployments in GitHub — Helpful for the operational handover around deployment approvals and checks.
Summary & Completion Checklist
Everything you need to complete this project, in one checklist. Each line maps back to a specific acceptance criterion from the original RFP.
| # | Acceptance Criteria | Evidence Required | Phase |
|---|---|---|---|
| 1 | Environment provisioned from documented automation | Terraform templates / PowerShell scripts in repo | Phase 2 |
| 2 | Private-only, isolated from production | Network test results, no VNet peering proof | Phase 5 |
| 3 | App components deploy, key user journeys work | Smoke test results log | Phase 6 |
| 4 | Integrations safely sandboxed | API call logs showing test/sandbox endpoints | Phase 6 |
| 5 | Test data load/refresh works end-to-end | DB refresh execution log | Phase 3 |
| 6 | Release process supports approvals + rollback | 2+ successful deploys + 1 rollback evidence | Phase 4 |
| 7 | Documentation enables internal operation | Runbook set + walkthrough recording | Phase 6 |
All Video References Index
Quick reference of every video recommended in this guide, organised by topic:
| Category | Video Title | Channel | Level |
|---|---|---|---|
| Azure | AZ-900 Microsoft Azure Fundamentals Full Course | Adam Marczak | Beginner |
| Azure | Azure Full Course — 8 Hours | Edureka | Beginner |
| Azure | AZ-900 Networking (VNet, VPN, Load Balancer) | Adam Marczak | Beginner |
| Azure | Azure Key Vault Tutorial | Adam Marczak | Intermediate |
| Azure | Azure Storage Tutorial (Blob, Queue, Table) | Adam Marczak | Intermediate |
| Azure | Azure Active Directory / Entra ID Tutorial | Adam Marczak | Intermediate |
| DevOps | What is DevOps? REALLY Understand It | TechWorld with Nana | Beginner |
| DevOps | CI/CD Explained in 100 Seconds | Fireship | Beginner |
| DevOps | DevOps Prerequisites Course | freeCodeCamp | Beginner |
| DevOps | DevOps Crash Course (Docker, Terraform, GitHub Actions) | Traversy Media | Intermediate |
| IaC | What is Infrastructure as Code? | TechWorld with Nana | Beginner |
| IaC | What is GitOps? | TechWorld with Nana | Intermediate |
| Terraform | Terraform in 100 Seconds | Fireship | Beginner |
| Terraform | Learn Terraform with Azure — Full Course | freeCodeCamp | Intermediate |
| Terraform | Complete Terraform Course (Beginner to Pro) | DevOps Directive | Intermediate |
| Terraform | HashiCorp Terraform Certification Course | freeCodeCamp | Advanced |
| Terraform | Learn Terraform by Building a Dev Environment | freeCodeCamp | Intermediate |
| GitHub Actions | GitHub Actions — Basic Concepts & CI/CD Pipeline | TechWorld with Nana | Beginner |
| GitHub Actions | GitHub Actions Zero to Hero (90 min) | CoderDave | Intermediate |
| GitHub Actions | How GitHub Actions 10x Productivity | Beyond Fireship | Beginner |
| GitHub Actions | Automate Workflows with GitHub Actions | The Roadmap | Intermediate |
| CI/CD | Azure DevOps Tutorial for Beginners | TechWorld with Nana | Intermediate |
| PowerShell | PowerShell Master Class — Fundamentals | John Savill | Beginner |
| SQL | SQL Course for Beginners (Full Course) | Programming with Mosh | Beginner |
📌 What to Do Next
1. Complete all prerequisites (accounts, access, measurements).
2. Watch the foundation videos (Azure, CI/CD concepts, GitHub Actions basics).
3. Begin Phase 2 — start with networking and kick off SQL MI provisioning early.
4. Work through each phase in order — don't skip ahead.
5. Use this guide as a reference throughout the build. Come back to the relevant section whenever you need a refresher.