Deploying Multi-Tenant Fabric Monitoring with Azure Bicep
Infrastructure as code isn't optional for production monitoring systems. When you're responsible for monitoring dozens of customer Fabric capacities, manually clicking through the Azure Portal isn't just slow. It's a compliance and disaster-recovery nightmare.
I built Fabric Capacity Monitor to deploy entirely via Azure Bicep. Here's why that matters and how to deploy it yourself.
Why IaC for monitoring infrastructure
Repeatability across environments
You deploy this monitoring stack once for your consulting company. It monitors 10, 50, or 1000 customer Fabric capacities from a single centralized platform. If a customer asks "can you deploy a dedicated instance just for us?" you run one command and their environment spins up identically.
No "it worked in dev" problems. No configuration drift between regions.
Version control for infrastructure changes
Every change to your monitoring infrastructure goes through Git. Security team wants to audit the database configuration? Point them to the commit history. Need to roll back a networking change that broke something? Git revert and redeploy.
This is table stakes for enterprise customers. They won't accept "we configured it manually via Portal."
Audit trail for compliance
Your customer's security team asks: "What Azure resources does this monitoring solution deploy? What permissions does it need? Show us the exact configuration."
You send them main.bicep. They review it, approve it, and track the exact commit SHA running in production. This is the difference between a 2-week security review and a 2-month one.
Disaster recovery
Your monitoring database gets corrupted. Your Key Vault gets accidentally deleted. A region outage takes down your Container App.
With Bicep: Redeploy the entire stack in 15 minutes. Database connection strings are regenerated and stored in Key Vault. Managed identities get re-assigned. Everything rebuilds from code.
Without Bicep: Hope your documentation is up to date.
Architecture overview
The Fabric Capacity Monitor is a containerized FastAPI backend that collects capacity metrics from multiple customer tenants via cross-tenant service principals. Here's the Azure infrastructure:
Resource Group (your monitoring environment)
├── Container Apps Environment
│   └── FastAPI Backend Container
├── Key Vault (secrets)
├── PostgreSQL Flexible Server (private)
├── Storage Account (distributed locking)
├── Container Registry
└── Virtual Network (10.0.0.0/16)
    ├── App Subnet (10.0.0.0/23)
    ├── DB Subnet (10.0.2.0/24)
    └── Private Endpoint Subnet (10.0.3.0/24)
Each component has a dedicated Bicep module. Let's walk through them.
Module 1: PostgreSQL Flexible Server
Why Flexible Server over Single Server: Better networking options (VNet injection), zone redundancy, and the Burstable tier is cost-effective for small deployments. Single Server is being deprecated anyway.
// infra/modules/database.bicep
@description('Environment type for SKU selection')
@allowed(['Starter', 'Enterprise'])
param environmentType string

param serverName string
param location string
param subnetId string
param vnetId string
param administratorLogin string

@secure()
param administratorLoginPassword string

resource postgresServer 'Microsoft.DBforPostgreSQL/flexibleServers@2023-03-01-preview' = {
  name: serverName
  location: location
  sku: {
    name: environmentType == 'Starter' ? 'Standard_B1ms' : 'Standard_D2s_v3'
    tier: environmentType == 'Starter' ? 'Burstable' : 'GeneralPurpose'
  }
  properties: {
    version: '15'
    administratorLogin: administratorLogin
    administratorLoginPassword: administratorLoginPassword
    storage: {
      storageSizeGB: environmentType == 'Starter' ? 32 : 128
    }
    backup: {
      backupRetentionDays: environmentType == 'Starter' ? 7 : 35
      geoRedundantBackup: environmentType == 'Starter' ? 'Disabled' : 'Enabled'
    }
    highAvailability: {
      mode: environmentType == 'Enterprise' ? 'ZoneRedundant' : 'Disabled'
    }
    network: {
      delegatedSubnetResourceId: subnetId
      privateDnsZoneArmResourceId: privateDnsZone.id
    }
  }
}
Key configuration decisions:
SKU selection: Burstable B1ms for Starter (~13 EUR/month), General Purpose D2s_v3 for Enterprise (~200 EUR/month). The tier dramatically affects cost but also performance under load.
VNet integration: The network.delegatedSubnetResourceId parameter injects the database into a dedicated subnet. No public endpoint. This is mandatory for any production deployment touching customer data.
High availability: Zone-redundant HA costs 2x but survives zone failures. Starter tier skips it to save money. Enterprise tier requires it for SLAs.
Backup retention: 7 days is fine for dev/test. Enterprise customers want 35 days for compliance. Geo-redundant backup adds another 2x cost multiplier but protects against regional disasters.
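The tier conditionals scattered through the module boil down to a small decision table. A Python sketch that mirrors the ternaries above (illustrative only, not part of the deployment):

```python
# Illustrative decision table mirroring the Starter/Enterprise
# ternaries in the Bicep database module.
TIER_CONFIG = {
    "Starter": {
        "sku": "Standard_B1ms",
        "tier": "Burstable",
        "storage_gb": 32,
        "backup_days": 7,
        "geo_redundant": False,
        "high_availability": "Disabled",
    },
    "Enterprise": {
        "sku": "Standard_D2s_v3",
        "tier": "GeneralPurpose",
        "storage_gb": 128,
        "backup_days": 35,
        "geo_redundant": True,
        "high_availability": "ZoneRedundant",
    },
}

def database_config(environment_type: str) -> dict:
    """Return the PostgreSQL settings for a given environment type."""
    return TIER_CONFIG[environment_type]
```

Collapsing the ternaries into one table like this also makes it obvious that every cost-relevant knob flips together with the tier.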
Private DNS zone
resource privateDnsZone 'Microsoft.Network/privateDnsZones@2020-06-01' = {
  name: 'privatelink.postgres.database.azure.com'
  location: 'global'
}

resource privateDnsZoneLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2020-06-01' = {
  parent: privateDnsZone
  name: '${serverName}-link'
  location: 'global'
  properties: {
    registrationEnabled: false
    virtualNetwork: {
      id: vnetId
    }
  }
}
This is critical. Without private DNS, your Container App can't resolve the database hostname to its private IP. Azure does this automatically in the Portal, but in Bicep you must explicitly create the zone and link it to your VNet.
Module 2: Azure Container Apps
Container Apps are serverless containers. Think "App Service for containers" but with scale-to-zero and better networking.
// infra/modules/container-app.bicep
@allowed(['Starter', 'Enterprise'])
param environmentType string

param environmentName string
param appName string
param location string
param subnetId string
param managedIdentityId string
param keyVaultName string
param containerImage string

resource containerAppEnvironment 'Microsoft.App/managedEnvironments@2024-03-01' = {
  name: environmentName
  location: location
  properties: {
    vnetConfiguration: {
      infrastructureSubnetId: subnetId
    }
    workloadProfiles: [
      {
        name: 'Consumption'
        workloadProfileType: 'Consumption'
      }
    ]
    zoneRedundant: environmentType == 'Enterprise'
  }
}

resource containerApp 'Microsoft.App/containerApps@2024-03-01' = {
  name: appName
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${managedIdentityId}': {}
    }
  }
  properties: {
    environmentId: containerAppEnvironment.id
    configuration: {
      ingress: {
        external: true
        targetPort: 8000
        transport: 'http'
        allowInsecure: false
      }
      secrets: [
        {
          name: 'db-connection-string'
          keyVaultUrl: 'https://${keyVaultName}.vault.azure.net/secrets/db-connection-string'
          identity: managedIdentityId
        }
      ]
    }
    template: {
      containers: [
        {
          name: 'main'
          image: containerImage
          resources: {
            cpu: json(environmentType == 'Starter' ? '0.5' : '2.0')
            memory: environmentType == 'Starter' ? '1Gi' : '4Gi'
          }
          env: [
            {
              name: 'DATABASE_URL'
              secretRef: 'db-connection-string'
            }
            {
              name: 'AZURE_KEY_VAULT_URL'
              value: 'https://${keyVaultName}.vault.azure.net'
            }
          ]
        }
      ]
      scale: {
        minReplicas: environmentType == 'Starter' ? 0 : 1
        maxReplicas: environmentType == 'Starter' ? 3 : 10
        rules: [
          {
            name: 'http-scaling'
            http: {
              metadata: {
                concurrentRequests: '100'
              }
            }
          }
        ]
      }
    }
  }
}
Why this configuration matters:
Scale-to-zero (Starter): minReplicas: 0 means the app shuts down completely when idle. You pay nothing during idle time. The first request after idle has a 3-5 second cold start, which is acceptable for a monitoring backend with a 15-minute collection interval.
Always-on (Enterprise): minReplicas: 1 keeps at least one replica warm. No cold starts, better for real-time dashboards or customers expecting instant API responses.
Key Vault secret references: The keyVaultUrl in the secrets section tells Container Apps to fetch the database connection string from Key Vault at runtime using the managed identity. No secrets in your Bicep files or deployment outputs.
VNet integration: The vnetConfiguration.infrastructureSubnetId parameter injects the Container App Environment into your VNet. This allows it to reach the private PostgreSQL endpoint.
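The scale block is easier to reason about with a toy model. Container Apps scaling is KEDA-based under the hood; a rough Python sketch of how the http rule translates concurrent requests into a replica count (a simplification for intuition, not the real autoscaler logic):

```python
import math

def desired_replicas(concurrent_requests: int, *, target_per_replica: int = 100,
                     min_replicas: int = 0, max_replicas: int = 3) -> int:
    """Rough model of the http scale rule: roughly one replica per
    `target_per_replica` concurrent requests, clamped to the configured
    min/max. Defaults mirror the Starter tier above."""
    needed = math.ceil(concurrent_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))

print(desired_replicas(0))     # Starter idles down to zero replicas
print(desired_replicas(150))   # 150 concurrent requests -> 2 replicas
```

With Enterprise defaults (`min_replicas=1, max_replicas=10`), the same model never drops below one warm replica, which is exactly the no-cold-start behavior described above.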
Why not App Service?
Container Apps cost less at low traffic (scale-to-zero), scale better at high traffic (auto-scales to 10+ replicas), and have simpler networking. App Service makes sense if you need Windows containers or deployment slots.
Module 3: Key Vault
Secrets management is non-negotiable. Database passwords, customer service principal secrets, admin API keys. All go in Key Vault.
// infra/modules/keyvault.bicep
@allowed(['Starter', 'Enterprise'])
param environmentType string

param keyVaultName string
param location string
param managedIdentityPrincipalId string
param privateEndpointSubnetId string
param vnetId string

@secure()
param databaseConnectionString string

resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: keyVaultName
  location: location
  properties: {
    sku: {
      family: 'A'
      name: 'standard'
    }
    tenantId: subscription().tenantId
    enableRbacAuthorization: true
    enableSoftDelete: true
    softDeleteRetentionInDays: environmentType == 'Enterprise' ? 90 : 7
    // The API rejects an explicit false here; null omits the property instead
    enablePurgeProtection: environmentType == 'Enterprise' ? true : null
    publicNetworkAccess: environmentType == 'Enterprise' ? 'Disabled' : 'Enabled'
  }
}
I use enableRbacAuthorization: true because RBAC is the modern approach. It integrates with Azure AD, supports conditional access policies, and audits better. Access policies are legacy.
Granting managed identity access
resource secretsUserRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: keyVault
  name: guid(keyVault.id, managedIdentityPrincipalId, 'Key Vault Secrets User')
  properties: {
    roleDefinitionId: subscriptionResourceId(
      'Microsoft.Authorization/roleDefinitions',
      '4633458b-17de-408a-b874-0445c86b69e6' // Key Vault Secrets User
    )
    principalId: managedIdentityPrincipalId
    principalType: 'ServicePrincipal'
  }
}
The guid() function ensures the role assignment has a deterministic name. If you run the deployment twice, it won't try to create duplicate assignments.
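Bicep's guid() is a deterministic hash of its string arguments, not a random value. Conceptually it's the same idea as name-based UUIDs (RFC 4122 version 5); a Python analogy, not the exact algorithm Bicep uses:

```python
import uuid

# Name-based UUIDs: the same inputs always hash to the same UUID, so
# re-running a deployment derives the same role-assignment name instead
# of generating a fresh random one. NAMESPACE and the part names here
# are illustrative stand-ins, not Bicep internals.
NAMESPACE = uuid.NAMESPACE_URL  # any fixed namespace works

def deterministic_name(*parts: str) -> str:
    return str(uuid.uuid5(NAMESPACE, "/".join(parts)))

a = deterministic_name("kvResourceId", "principalId", "Key Vault Secrets User")
b = deterministic_name("kvResourceId", "principalId", "Key Vault Secrets User")
# a == b: the same inputs yield the same name on every run
```

That stability is what makes role assignments idempotent: ARM sees an assignment with the same name already exists and leaves it alone.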
Enterprise: Private Link for Key Vault
resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-11-01' = if (environmentType == 'Enterprise') {
  name: '${keyVaultName}-pe'
  location: location
  properties: {
    subnet: {
      id: privateEndpointSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: '${keyVaultName}-pe-connection'
        properties: {
          privateLinkServiceId: keyVault.id
          groupIds: ['vault']
        }
      }
    ]
  }
}
Enterprise deployments disable public network access to Key Vault and use a private endpoint instead. The Container App reaches the vault through the VNet, and an attacker would need a foothold inside that VNet just to reach it. Much harder than hitting a public endpoint.
See the security architecture for more on how Key Vault fits into the overall design.
Module 4: Networking
Virtual networks in Azure are free, but you pay for traffic and certain resources like NAT Gateways or VPN connections. This solution uses only subnets and NSGs, so networking costs are near-zero.
// infra/modules/network.bicep
param vnetName string
param location string
param namePrefix string

var vnetAddressPrefix = '10.0.0.0/16'
var appSubnetAddressPrefix = '10.0.0.0/23'
var dbSubnetAddressPrefix = '10.0.2.0/24'
var privateEndpointSubnetAddressPrefix = '10.0.3.0/24'

resource vnet 'Microsoft.Network/virtualNetworks@2023-11-01' = {
  name: vnetName
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [vnetAddressPrefix]
    }
    subnets: [
      {
        name: '${namePrefix}-snet-app'
        properties: {
          addressPrefix: appSubnetAddressPrefix
          networkSecurityGroup: { id: nsgApp.id }
          delegations: [
            {
              name: 'Microsoft.App.environments'
              properties: {
                serviceName: 'Microsoft.App/environments'
              }
            }
          ]
        }
      }
      {
        name: '${namePrefix}-snet-db'
        properties: {
          addressPrefix: dbSubnetAddressPrefix
          networkSecurityGroup: { id: nsgDb.id }
          delegations: [
            {
              name: 'Microsoft.DBforPostgreSQL.flexibleServers'
              properties: {
                serviceName: 'Microsoft.DBforPostgreSQL/flexibleServers'
              }
            }
          ]
        }
      }
      {
        name: '${namePrefix}-snet-pe'
        properties: {
          addressPrefix: privateEndpointSubnetAddressPrefix
          networkSecurityGroup: { id: nsgPrivateEndpoint.id }
          privateEndpointNetworkPolicies: 'Disabled'
        }
      }
    ]
  }
}
Why three subnets:
- App subnet: Hosts the Container Apps Environment. Delegated to Microsoft.App/environments so Azure can inject the necessary infrastructure.
- DB subnet: Hosts PostgreSQL. Delegated to Microsoft.DBforPostgreSQL/flexibleServers. The database has no public IP and is only accessible from this subnet.
- Private endpoint subnet: Hosts private endpoints for Key Vault (Enterprise). Must have privateEndpointNetworkPolicies: 'Disabled'.
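The address plan can be sanity-checked offline before you deploy. A quick sketch with Python's ipaddress module, confirming each subnet fits inside the VNet and none of them overlap:

```python
import ipaddress
from itertools import combinations

vnet = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    # /23 = 512 addresses; the Container Apps infrastructure needs the headroom
    "app": ipaddress.ip_network("10.0.0.0/23"),
    "db": ipaddress.ip_network("10.0.2.0/24"),
    "pe": ipaddress.ip_network("10.0.3.0/24"),
}

# Every subnet must sit inside the VNet address space...
assert all(s.subnet_of(vnet) for s in subnets.values())
# ...and no two subnets may overlap (the /23 ends at 10.0.1.255,
# just before the DB subnet begins).
assert not any(a.overlaps(b) for a, b in combinations(subnets.values(), 2))
```

Catching an overlapping-prefix mistake this way takes seconds; catching it via a failed `az deployment` takes minutes per attempt.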
NSG rules
resource nsgDb 'Microsoft.Network/networkSecurityGroups@2023-11-01' = {
  name: '${namePrefix}-nsg-db'
  location: location
  properties: {
    securityRules: [
      {
        name: 'AllowAppSubnet'
        properties: {
          protocol: 'Tcp'
          sourcePortRange: '*'
          destinationPortRange: '5432'
          sourceAddressPrefix: appSubnetAddressPrefix
          destinationAddressPrefix: '*'
          access: 'Allow'
          priority: 100
          direction: 'Inbound'
        }
      }
    ]
  }
}
This NSG allows only the app subnet to reach the database on port 5432. Everything else is denied by default. Security teams love explicit allow rules.
Putting it together
The main Bicep file orchestrates all modules:
// infra/main.bicep
targetScope = 'resourceGroup'

param appName string

@allowed(['Starter', 'Enterprise'])
param environmentType string = 'Starter'

param location string = resourceGroup().location

var nameSuffix = uniqueString(resourceGroup().id)

module identity 'modules/identity.bicep' = {
  name: 'identity-deployment'
  params: {
    location: location
    identityName: 'id-${appName}-${nameSuffix}'
  }
}

module network 'modules/network.bicep' = {
  name: 'network-deployment'
  params: {
    location: location
    vnetName: 'vnet-${appName}-${nameSuffix}'
    namePrefix: appName
  }
}

module database 'modules/database.bicep' = {
  name: 'database-deployment'
  params: {
    location: location
    serverName: 'psql-${appName}-${nameSuffix}'
    environmentType: environmentType
    subnetId: network.outputs.dbSubnetId
    vnetId: network.outputs.vnetId
  }
}

module keyVault 'modules/keyvault.bicep' = {
  name: 'keyvault-deployment'
  params: {
    location: location
    keyVaultName: 'kv-${appName}-${nameSuffix}'
    managedIdentityPrincipalId: identity.outputs.principalId
    databaseConnectionString: database.outputs.connectionString
    environmentType: environmentType
    privateEndpointSubnetId: network.outputs.privateEndpointSubnetId
    vnetId: network.outputs.vnetId
  }
}

module containerApp 'modules/container-app.bicep' = {
  name: 'containerapp-deployment'
  params: {
    location: location
    appName: 'ca-${appName}-${nameSuffix}'
    environmentType: environmentType
    subnetId: network.outputs.appSubnetId
    managedIdentityId: identity.outputs.identityId
    keyVaultName: keyVault.outputs.keyVaultName
  }
}
Bicep figures out dependencies automatically. Identity and network deploy first, in parallel (neither consumes another module's outputs), then database (needs network), then Key Vault (needs identity and database), then the Container App (needs everything).
If you've written Terraform, you'll recognize this is cleaner. No explicit depends_on blocks unless you have non-obvious dependencies.
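The ordering Bicep infers from output references can be made explicit with a topological sort. A small Python sketch of the module graph as wired in main.bicep above:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Module -> set of modules whose outputs it consumes in main.bicep.
deps = {
    "identity": set(),
    "network": set(),
    "database": {"network"},
    "keyvault": {"identity", "database", "network"},
    "containerapp": {"identity", "network", "keyvault"},
}

order = list(TopologicalSorter(deps).static_order())
# identity and network come first (in either order, and ARM can run
# them in parallel); containerapp is always last.
print(order)
```

This is exactly what Azure Resource Manager computes from the compiled ARM template before it starts creating resources.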
Deployment walkthrough
One-command deployment
# Create resource group
az group create --name rg-fabricmon-prod --location westeurope
# Deploy infrastructure
az deployment group create `
--resource-group rg-fabricmon-prod `
--template-file infra/main.bicep `
--parameters appName=fabricmon environmentType=Enterprise
What happens:
- Bicep validates the template (syntax, resource types, API versions)
- Converts Bicep to ARM JSON
- Submits deployment to Azure Resource Manager
- ARM deploys resources in dependency order
- 10-15 minutes later, you have a running monitoring stack
Using a parameters file
For production, don't pass parameters on the command line. Use a JSON file:
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "appName": {
      "value": "fabricmon"
    },
    "environmentType": {
      "value": "Enterprise"
    },
    "location": {
      "value": "westeurope"
    }
  }
}
Then deploy with:
az deployment group create `
--resource-group rg-fabricmon-prod `
--template-file infra/main.bicep `
--parameters @infra/parameters.prod.json
Commit parameters.prod.json to Git (it contains no secrets). Now your infrastructure configuration is version-controlled.
Post-deployment
After deployment completes:
# Get registry name from deployment output
$registryName = (az deployment group show `
--resource-group rg-fabricmon-prod `
--name main `
--query properties.outputs.registryName.value -o tsv)
# Build and push container
az acr login --name $registryName
docker build -t ${registryName}.azurecr.io/fabricmon:v1.0.0 backend/
docker push ${registryName}.azurecr.io/fabricmon:v1.0.0
# Update Container App
az containerapp update `
--name ca-fabricmon-abc123 `
--resource-group rg-fabricmon-prod `
--image ${registryName}.azurecr.io/fabricmon:v1.0.0
Troubleshooting common errors
Key Vault name already taken
Error: The vault name 'kv-fabricmon-abc123' is already in use.
Key Vault has soft delete enabled. Deleted vaults reserve the name for 7-90 days. Purge the deleted vault:
az keyvault purge --name kv-fabricmon-abc123
Or choose a different appName parameter so the generated vault name changes. Note that the uniqueString() suffix is derived from the resource group ID, not from appName, so the suffix itself only changes if you deploy to a different resource group.
Insufficient quota for PostgreSQL
Error: Operation could not be completed as it results in exceeding approved quota
Your subscription has a regional quota limit for PostgreSQL servers. Request a quota increase via Azure Portal > Subscriptions > Usage + quotas. Takes 1-2 business days.
Workaround: Deploy to a different region with available quota.
Container App fails to start
Check the logs:
az containerapp logs show `
--name ca-fabricmon-abc123 `
--resource-group rg-fabricmon-prod `
--tail 100
Common causes:
- Database migration failed (check Alembic logs)
- Key Vault secret reference incorrect (check managed identity permissions)
- Container image not found (verify registry and image tag)
How to rollback
If a deployment breaks production, use Git:
git revert HEAD
az deployment group create `
--resource-group rg-fabricmon-prod `
--template-file infra/main.bicep `
--parameters @infra/parameters.prod.json
Bicep deployments are idempotent. Running the same template twice doesn't duplicate resources. Only changed resources get updated.
To preview changes before deploying:
az deployment group what-if `
--resource-group rg-fabricmon-prod `
--template-file infra/main.bicep `
--parameters @infra/parameters.prod.json
This shows a diff of resources that will be created, updated, or deleted.
Starter vs Enterprise tier
The parameterized environmentType isn't just about saving money. It's about matching deployment complexity to organizational maturity.
Starter tier (15-30 EUR/month):
- Small consulting firms with 5-20 customers
- Scale-to-zero Container App (no idle costs)
- Burstable PostgreSQL (handles spiky workloads)
- 7-day backups
- No HA, no private endpoints
Enterprise tier (150-300 EUR/month):
- MSPs monitoring 100+ customer tenants
- Always-on Container App (instant responses)
- General Purpose PostgreSQL with zone-redundant HA
- 35-day geo-redundant backups (compliance)
- Private Link for Key Vault
The infrastructure scales with your business. Start with Starter, upgrade to Enterprise when you sign your 50th customer or your first Fortune 500.
CI/CD integration
GitHub Actions
name: Deploy Infrastructure

on:
  push:
    branches: [main]
    paths:
      - 'infra/**'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Deploy Bicep
        uses: azure/arm-deploy@v1
        with:
          scope: resourcegroup
          resourceGroupName: rg-fabricmon-prod
          template: ./infra/main.bicep
          parameters: appName=fabricmon environmentType=Enterprise
Every push to main that modifies /infra triggers a deployment. Add approval gates for production.
Full implementation
The complete Bicep modules are in the GitHub repo. Clone it, customize the parameters, and deploy to your subscription.
You now have a production-grade, multi-tenant Fabric monitoring infrastructure defined in ~500 lines of Bicep. It deploys in 15 minutes, costs 15-300 EUR/month depending on tier, and scales to 1000+ customer tenants.