Azure · 13 min read

Deploying Multi-Tenant Fabric Monitoring with Azure Bicep

#azure#bicep#infrastructure-as-code#container-apps#postgresql

Infrastructure as code isn't optional for production monitoring systems. When you're responsible for monitoring dozens of customer Fabric capacities, manually clicking through Azure Portal isn't just slow. It's a compliance and disaster recovery nightmare.

I built Fabric Capacity Monitor to deploy entirely via Azure Bicep. Here's why that matters and how to deploy it yourself.

Why IaC for monitoring infrastructure

Repeatability across environments

You deploy this monitoring stack once for your consulting company. It monitors 10, 50, or 1000 customer Fabric capacities from a single centralized platform. If a customer asks "can you deploy a dedicated instance just for us?" you run one command and their environment spins up identically.

No "it worked in dev" problems. No configuration drift between regions.

Version control for infrastructure changes

Every change to your monitoring infrastructure goes through Git. Security team wants to audit the database configuration? Point them to the commit history. Need to roll back a networking change that broke something? Git revert and redeploy.

This is table stakes for enterprise customers. They won't accept "we configured it manually via Portal."

Audit trail for compliance

Your customer's security team asks: "What Azure resources does this monitoring solution deploy? What permissions does it need? Show us the exact configuration."

You send them main.bicep. They review it, approve it, and track the exact commit SHA running in production. This is the difference between a 2-week security review and a 2-month one.

Disaster recovery

Your monitoring database gets corrupted. Your Key Vault gets accidentally deleted. A region outage takes down your Container App.

With Bicep: Redeploy the entire stack in 15 minutes. Database connection strings regenerate automatically and are stored back in Key Vault. Managed identities are re-assigned. Everything rebuilds from code.

Without Bicep: Hope your documentation is up to date.

Architecture overview

The Fabric Capacity Monitor is a containerized FastAPI backend that collects capacity metrics from multiple customer tenants via cross-tenant service principals. Here's the Azure infrastructure:

Resource Group (your monitoring environment)
├── Container Apps Environment
│   └── FastAPI Backend Container
├── Key Vault (secrets)
├── PostgreSQL Flexible Server (private)
├── Storage Account (distributed locking)
├── Container Registry
└── Virtual Network (10.0.0.0/16)
    ├── App Subnet (10.0.0.0/23)
    ├── DB Subnet (10.0.2.0/24)
    └── Private Endpoint Subnet (10.0.3.0/24)
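The subnet carve-up above can be sanity-checked offline before you deploy anything. A quick sketch with Python's ipaddress module, using the prefixes from the diagram (Azure reserves 5 IPs in every subnet):

```python
import ipaddress

# VNet and subnet prefixes from the architecture diagram
vnet = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "app": ipaddress.ip_network("10.0.0.0/23"),
    "db": ipaddress.ip_network("10.0.2.0/24"),
    "pe": ipaddress.ip_network("10.0.3.0/24"),
}

# Every subnet must sit inside the VNet address space
assert all(s.subnet_of(vnet) for s in subnets.values())

# No two subnets may overlap
pairs = [("app", "db"), ("app", "pe"), ("db", "pe")]
assert not any(subnets[a].overlaps(subnets[b]) for a, b in pairs)

# Azure reserves 5 addresses per subnet; compute usable host counts
usable = {name: s.num_addresses - 5 for name, s in subnets.items()}
print(usable)  # the /23 app subnet leaves 507 usable addresses
```

The app subnet gets a /23 because Container Apps Environments demand at least that much space for their injected infrastructure.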

Each component has a dedicated Bicep module. Let's walk through them.

Module 1: PostgreSQL Flexible Server

Why Flexible Server over Single Server: Better networking options (VNet injection), zone redundancy, and the Burstable tier is cost-effective for small deployments. Single Server is being deprecated anyway.

// infra/modules/database.bicep
@description('Environment type for SKU selection')
@allowed(['Starter', 'Enterprise'])
param environmentType string

resource postgresServer 'Microsoft.DBforPostgreSQL/flexibleServers@2023-03-01-preview' = {
  name: serverName
  location: location
  sku: {
    name: environmentType == 'Starter' ? 'Standard_B1ms' : 'Standard_D2s_v3'
    tier: environmentType == 'Starter' ? 'Burstable' : 'GeneralPurpose'
  }
  properties: {
    version: '15'
    administratorLogin: administratorLogin
    administratorLoginPassword: administratorLoginPassword
    storage: {
      storageSizeGB: environmentType == 'Starter' ? 32 : 128
    }
    backup: {
      backupRetentionDays: environmentType == 'Starter' ? 7 : 35
      geoRedundantBackup: environmentType == 'Starter' ? 'Disabled' : 'Enabled'
    }
    highAvailability: {
      mode: environmentType == 'Enterprise' ? 'ZoneRedundant' : 'Disabled'
    }
    network: {
      delegatedSubnetResourceId: subnetId
      privateDnsZoneArmResourceId: privateDnsZone.id
    }
  }
}

Key configuration decisions:

SKU selection: Burstable B1ms for Starter (~13 EUR/month), General Purpose D2s_v3 for Enterprise (~200 EUR/month). The tier dramatically affects cost but also performance under load.

VNet integration: The network.delegatedSubnetResourceId parameter injects the database into a dedicated subnet. No public endpoint. This is mandatory for any production deployment touching customer data.

High availability: Zone-redundant HA costs 2x but survives zone failures. Starter tier skips it to save money. Enterprise tier requires it for SLAs.

Backup retention: 7 days is fine for dev/test. Enterprise customers want 35 days for compliance. Geo-redundant backup adds another 2x cost multiplier but protects against regional disasters.
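All of those ternary expressions in the module reduce to a small per-tier config table. The same selection logic in Python form, mirroring the values in database.bicep:

```python
# Mirror of the environmentType ternaries in database.bicep.
TIER_CONFIG = {
    "Starter": {
        "sku": "Standard_B1ms", "tier": "Burstable",
        "storage_gb": 32, "backup_days": 7,
        "geo_backup": False, "ha": "Disabled",
    },
    "Enterprise": {
        "sku": "Standard_D2s_v3", "tier": "GeneralPurpose",
        "storage_gb": 128, "backup_days": 35,
        "geo_backup": True, "ha": "ZoneRedundant",
    },
}

def db_config(environment_type: str) -> dict:
    # Unknown tiers fail loudly, like the @allowed decorator in Bicep
    return TIER_CONFIG[environment_type]

assert db_config("Starter")["backup_days"] == 7
assert db_config("Enterprise")["ha"] == "ZoneRedundant"
```

Keeping the two tiers side by side like this makes the cost/resilience trade-off reviewable at a glance, which is exactly what security teams ask for.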

Private DNS zone

resource privateDnsZone 'Microsoft.Network/privateDnsZones@2020-06-01' = {
  name: 'privatelink.postgres.database.azure.com'
  location: 'global'
}

resource privateDnsZoneLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2020-06-01' = {
  parent: privateDnsZone
  name: '${serverName}-link'
  location: 'global'
  properties: {
    registrationEnabled: false
    virtualNetwork: {
      id: vnetId
    }
  }
}

This is critical. Without private DNS, your Container App can't resolve the database hostname to its private IP. Azure does this automatically in Portal, but in Bicep you must explicitly create the zone and link it to your VNet.

Module 2: Azure Container Apps

Container Apps are serverless containers. Think "App Service for containers" but with scale-to-zero and better networking.

// infra/modules/container-app.bicep
resource containerAppEnvironment 'Microsoft.App/managedEnvironments@2024-03-01' = {
  name: environmentName
  location: location
  properties: {
    vnetConfiguration: {
      infrastructureSubnetId: subnetId
    }
    workloadProfiles: [
      {
        name: 'Consumption'
        workloadProfileType: 'Consumption'
      }
    ]
    zoneRedundant: environmentType == 'Enterprise'
  }
}

resource containerApp 'Microsoft.App/containerApps@2024-03-01' = {
  name: appName
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${managedIdentityId}': {}
    }
  }
  properties: {
    environmentId: containerAppEnvironment.id
    configuration: {
      ingress: {
        external: true
        targetPort: 8000
        transport: 'http'
        allowInsecure: false
      }
      secrets: [
        {
          name: 'db-connection-string'
          keyVaultUrl: 'https://${keyVaultName}.vault.azure.net/secrets/db-connection-string'
          identity: managedIdentityId
        }
      ]
    }
    template: {
      containers: [
        {
          name: 'main'
          image: containerImage
          resources: {
            cpu: json(environmentType == 'Starter' ? '0.5' : '2.0')
            memory: environmentType == 'Starter' ? '1Gi' : '4Gi'
          }
          env: [
            {
              name: 'DATABASE_URL'
              secretRef: 'db-connection-string'
            }
            {
              name: 'AZURE_KEY_VAULT_URL'
              value: 'https://${keyVaultName}.vault.azure.net'
            }
          ]
        }
      ]
      scale: {
        minReplicas: environmentType == 'Starter' ? 0 : 1
        maxReplicas: environmentType == 'Starter' ? 3 : 10
        rules: [
          {
            name: 'http-scaling'
            http: {
              metadata: {
                concurrentRequests: '100'
              }
            }
          }
        ]
      }
    }
  }
}

Why this configuration matters:

Scale-to-zero (Starter): minReplicas: 0 means the app shuts down completely when idle. You pay nothing during idle time. The first request after idle has 3-5 second cold start, which is acceptable for a monitoring backend with a 15-minute collection interval.

Always-on (Enterprise): minReplicas: 1 keeps at least one replica warm. No cold starts, better for real-time dashboards or customers expecting instant API responses.

Key Vault secret references: The keyVaultUrl in the secrets section tells Container Apps to fetch the database connection string from Key Vault at runtime using the managed identity. No secrets in your Bicep files or deployment outputs.
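From the application's point of view, the resolved secret simply appears as an environment variable. A minimal sketch of how the backend might consume it (the guard function is illustrative, not taken from the repo):

```python
import os

# Container Apps resolves the Key Vault reference at startup and
# injects the plain value as the DATABASE_URL environment variable.
# The app itself never fetches this particular secret from Key Vault.
def get_database_url() -> str:
    url = os.environ.get("DATABASE_URL")
    if not url:
        # Illustrative fail-fast guard: a clear startup error beats a
        # cryptic connection failure deep inside the ORM later.
        raise RuntimeError(
            "DATABASE_URL is not set; check the Key Vault secret "
            "reference and the managed identity's permissions"
        )
    return url
```

If this raises on boot, the usual culprits are the managed identity missing the Key Vault Secrets User role or a typo in the secret name.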

VNet integration: The vnetConfiguration.infrastructureSubnetId parameter injects the Container App Environment into your VNet. This allows it to reach the private PostgreSQL endpoint.

Why not App Service?

Container Apps cost less at low traffic (scale-to-zero), scale better at high traffic (auto-scales to 10+ replicas), and have simpler networking. App Service makes sense if you need Windows containers or deployment slots.

Module 3: Key Vault

Secrets management is non-negotiable. Database passwords, customer service principal secrets, admin API keys. All go in Key Vault.

// infra/modules/keyvault.bicep
resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' = {
  name: keyVaultName
  location: location
  properties: {
    sku: {
      family: 'A'
      name: 'standard'
    }
    tenantId: subscription().tenantId
    enableRbacAuthorization: true
    enableSoftDelete: true
    softDeleteRetentionInDays: environmentType == 'Enterprise' ? 90 : 7
    enablePurgeProtection: environmentType == 'Enterprise'
    publicNetworkAccess: environmentType == 'Enterprise' ? 'Disabled' : 'Enabled'
  }
}

I use enableRbacAuthorization: true because RBAC is the modern approach. It integrates with Azure AD, supports conditional access policies, and audits better. Access policies are legacy.

Granting managed identity access

resource secretsUserRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: keyVault
  name: guid(keyVault.id, managedIdentityPrincipalId, 'Key Vault Secrets User')
  properties: {
    roleDefinitionId: subscriptionResourceId(
      'Microsoft.Authorization/roleDefinitions',
      '4633458b-17de-408a-b874-0445c86b69e6'  // Key Vault Secrets User
    )
    principalId: managedIdentityPrincipalId
    principalType: 'ServicePrincipal'
  }
}

The guid() function ensures the role assignment has a deterministic name. If you run the deployment twice, it won't try to create duplicate assignments.
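Bicep's guid() is a pure hash of its arguments: same inputs, same GUID, every run. Python's uuid5 gives the same flavour of determinism; a sketch (the scope string and principal ID below are placeholders, not real resource IDs):

```python
import uuid

# Like Bicep's guid(), uuid5 deterministically hashes its inputs, so
# re-running a deployment derives the identical role-assignment name
# instead of trying to create a duplicate.
def role_assignment_name(scope_id: str, principal_id: str, role: str) -> str:
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{scope_id}|{principal_id}|{role}"))

a = role_assignment_name("/subscriptions/placeholder/kv", "principal-123", "Key Vault Secrets User")
b = role_assignment_name("/subscriptions/placeholder/kv", "principal-123", "Key Vault Secrets User")
assert a == b  # idempotent: a second deployment computes the same name
```

Any change to the scope, principal, or role produces a different name, which is exactly why the Bicep code seeds guid() with all three.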

resource privateEndpoint 'Microsoft.Network/privateEndpoints@2023-11-01' = if (environmentType == 'Enterprise') {
  name: '${keyVaultName}-pe'
  location: location
  properties: {
    subnet: {
      id: privateEndpointSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: '${keyVaultName}-pe-connection'
        properties: {
          privateLinkServiceId: keyVault.id
          groupIds: ['vault']
        }
      }
    ]
  }
}

Enterprise deployments disable public network access to Key Vault and use a private endpoint. The Container App accesses it via the VNet. If your Key Vault is compromised, the attacker needs to be inside your VNet. Much harder than hitting a public endpoint.

See the security architecture for more on how Key Vault fits into the overall design.

Module 4: Networking

Virtual networks in Azure are free, but you pay for traffic and certain resources like NAT Gateways or VPN connections. This solution uses only subnets and NSGs, so networking costs are near-zero.

// infra/modules/network.bicep
var vnetAddressPrefix = '10.0.0.0/16'
var appSubnetAddressPrefix = '10.0.0.0/23'
var dbSubnetAddressPrefix = '10.0.2.0/24'
var privateEndpointSubnetAddressPrefix = '10.0.3.0/24'

resource vnet 'Microsoft.Network/virtualNetworks@2023-11-01' = {
  name: vnetName
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [vnetAddressPrefix]
    }
    subnets: [
      {
        name: '${namePrefix}-snet-app'
        properties: {
          addressPrefix: appSubnetAddressPrefix
          networkSecurityGroup: { id: nsgApp.id }
          delegations: [
            {
              name: 'Microsoft.App.environments'
              properties: {
                serviceName: 'Microsoft.App/environments'
              }
            }
          ]
        }
      }
      {
        name: '${namePrefix}-snet-db'
        properties: {
          addressPrefix: dbSubnetAddressPrefix
          networkSecurityGroup: { id: nsgDb.id }
          delegations: [
            {
              name: 'Microsoft.DBforPostgreSQL.flexibleServers'
              properties: {
                serviceName: 'Microsoft.DBforPostgreSQL/flexibleServers'
              }
            }
          ]
        }
      }
      {
        name: '${namePrefix}-snet-pe'
        properties: {
          addressPrefix: privateEndpointSubnetAddressPrefix
          networkSecurityGroup: { id: nsgPrivateEndpoint.id }
          privateEndpointNetworkPolicies: 'Disabled'
        }
      }
    ]
  }
}

Why three subnets:

  1. App subnet: Hosts Container Apps Environment. Delegated to Microsoft.App/environments so Azure can inject the necessary infrastructure.
  2. DB subnet: Hosts PostgreSQL. Delegated to Microsoft.DBforPostgreSQL/flexibleServers. Database has no public IP. Only accessible from this subnet.
  3. Private endpoint subnet: Hosts private endpoints for Key Vault (Enterprise). Must have privateEndpointNetworkPolicies: 'Disabled'.

NSG rules

resource nsgDb 'Microsoft.Network/networkSecurityGroups@2023-11-01' = {
  name: '${namePrefix}-nsg-db'
  location: location
  properties: {
    securityRules: [
      {
        name: 'AllowAppSubnet'
        properties: {
          protocol: 'Tcp'
          sourcePortRange: '*'
          destinationPortRange: '5432'
          sourceAddressPrefix: appSubnetAddressPrefix
          destinationAddressPrefix: '*'
          access: 'Allow'
          priority: 100
          direction: 'Inbound'
        }
      }
    ]
  }
}

This NSG allows only the app subnet to reach the database on port 5432. Everything else is denied by default. Security teams love explicit allow rules.
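The allow-from-app-subnet-only behaviour is easy to model and test. A simplified sketch of how the rule set evaluates (explicit allow first, then Azure's implicit DenyAllInbound):

```python
import ipaddress

APP_SUBNET = ipaddress.ip_network("10.0.0.0/23")

# Simplified model of the nsg-db rule set: one explicit allow for the
# app subnet on 5432, then the implicit deny that Azure always appends.
def db_inbound_allowed(source_ip: str, port: int) -> bool:
    src = ipaddress.ip_address(source_ip)
    if src in APP_SUBNET and port == 5432:
        return True   # AllowAppSubnet, priority 100
    return False      # implicit DenyAllInbound

assert db_inbound_allowed("10.0.1.17", 5432)      # a Container App replica
assert not db_inbound_allowed("10.0.3.9", 5432)   # private endpoint subnet
assert not db_inbound_allowed("10.0.1.17", 22)    # right subnet, wrong port
```

The real NSG evaluates rules by ascending priority, but with a single allow rule the outcome reduces to exactly this check.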

Putting it together

The main Bicep file orchestrates all modules:

// infra/main.bicep
targetScope = 'resourceGroup'

param appName string
param environmentType string = 'Starter'
param location string = resourceGroup().location

var nameSuffix = uniqueString(resourceGroup().id)

module identity 'modules/identity.bicep' = {
  name: 'identity-deployment'
  params: {
    location: location
    identityName: 'id-${appName}-${nameSuffix}'
  }
}

module network 'modules/network.bicep' = {
  name: 'network-deployment'
  params: {
    location: location
    vnetName: 'vnet-${appName}-${nameSuffix}'
    namePrefix: appName
  }
}

module database 'modules/database.bicep' = {
  name: 'database-deployment'
  params: {
    location: location
    serverName: 'psql-${appName}-${nameSuffix}'
    environmentType: environmentType
    subnetId: network.outputs.dbSubnetId
    vnetId: network.outputs.vnetId
  }
}

module keyVault 'modules/keyvault.bicep' = {
  name: 'keyvault-deployment'
  params: {
    location: location
    keyVaultName: 'kv-${appName}-${nameSuffix}'
    managedIdentityPrincipalId: identity.outputs.principalId
    databaseConnectionString: database.outputs.connectionString
    environmentType: environmentType
    privateEndpointSubnetId: network.outputs.privateEndpointSubnetId
    vnetId: network.outputs.vnetId
  }
}

module containerApp 'modules/container-app.bicep' = {
  name: 'containerapp-deployment'
  params: {
    location: location
    appName: 'ca-${appName}-${nameSuffix}'
    environmentType: environmentType
    subnetId: network.outputs.appSubnetId
    managedIdentityId: identity.outputs.identityId
    keyVaultName: keyVault.outputs.keyVaultName
  }
}

Bicep figures out dependencies automatically. Identity and network deploy first, in parallel (neither consumes another module's outputs), then the database (needs network), then Key Vault (needs identity and database), then the Container App (needs everything).

If you've written Terraform, you'll recognize this is cleaner. No explicit depends_on blocks unless you have non-obvious dependencies.
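The ordering Bicep derives is nothing more than a topological sort over the module graph. A sketch using Python's standard library, with edges mirroring the parameter references in main.bicep above:

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules whose outputs it consumes,
# mirroring the parameter wiring in main.bicep.
deps = {
    "identity": set(),
    "network": set(),
    "database": {"network"},
    "keyVault": {"identity", "database", "network"},
    "containerApp": {"network", "identity", "keyVault"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)
# Any valid order places network before database and keyVault before
# containerApp; identity and network have no mutual edge, so ARM is
# free to deploy them in parallel.
```

ARM does the same computation server-side on the compiled template, which is why you only need an explicit dependsOn for dependencies that aren't expressed through outputs.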

Deployment walkthrough

One-command deployment

# Create resource group
az group create --name rg-fabricmon-prod --location westeurope

# Deploy infrastructure
az deployment group create `
  --resource-group rg-fabricmon-prod `
  --template-file infra/main.bicep `
  --parameters appName=fabricmon environmentType=Enterprise

What happens:

  1. Bicep validates the template (syntax, resource types, API versions)
  2. Converts Bicep to ARM JSON
  3. Submits deployment to Azure Resource Manager
  4. ARM deploys resources in dependency order
  5. 10-15 minutes later, you have a running monitoring stack

Using a parameters file

For production, don't pass parameters on the command line. Use a JSON file:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "appName": {
      "value": "fabricmon"
    },
    "environmentType": {
      "value": "Enterprise"
    },
    "location": {
      "value": "westeurope"
    }
  }
}

Then deploy with:

az deployment group create `
  --resource-group rg-fabricmon-prod `
  --template-file infra/main.bicep `
  --parameters @infra/parameters.prod.json

Commit parameters.prod.json to Git (it contains no secrets). Now your infrastructure configuration is version-controlled.
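Committing the file safely is worth enforcing, not just asserting. A lightweight pre-commit-style check that rejects a parameters file whose parameter names look like secrets (the key-name heuristics are illustrative):

```python
import json

# Illustrative heuristic: parameter names containing these words
# probably hold secrets and belong in Key Vault, not in Git.
SUSPICIOUS = ("password", "secret", "token", "connectionstring")

def check_parameters(raw: str) -> list[str]:
    params = json.loads(raw)["parameters"]
    return [name for name in params
            if any(word in name.lower() for word in SUSPICIOUS)]

good = '{"parameters": {"appName": {"value": "fabricmon"}, "environmentType": {"value": "Enterprise"}}}'
bad = '{"parameters": {"adminPassword": {"value": "hunter2"}}}'
assert check_parameters(good) == []
assert check_parameters(bad) == ["adminPassword"]
```

Wire something like this into CI and a leaked credential gets caught at review time instead of in an incident report.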

Post-deployment

After deployment completes:

# Get registry name from deployment output
$registryName = (az deployment group show `
  --resource-group rg-fabricmon-prod `
  --name main `
  --query properties.outputs.registryName.value -o tsv)

# Build and push container
az acr login --name $registryName
docker build -t ${registryName}.azurecr.io/fabricmon:v1.0.0 backend/
docker push ${registryName}.azurecr.io/fabricmon:v1.0.0

# Update Container App
az containerapp update `
  --name ca-fabricmon-abc123 `
  --resource-group rg-fabricmon-prod `
  --image ${registryName}.azurecr.io/fabricmon:v1.0.0

Troubleshooting common errors

Key Vault name already taken

Error: The vault name 'kv-fabricmon-abc123' is already in use.

Key Vault has soft delete enabled. Deleted vaults reserve the name for 7-90 days. Purge the deleted vault:

az keyvault purge --name kv-fabricmon-abc123

Or choose a different appName parameter to generate a different uniqueString().

Insufficient quota for PostgreSQL

Error: Operation could not be completed as it results in exceeding approved quota

Your subscription has a regional quota limit for PostgreSQL servers. Request a quota increase via Azure Portal > Subscriptions > Usage + quotas. Takes 1-2 business days.

Workaround: Deploy to a different region with available quota.

Container App fails to start

Check the logs:

az containerapp logs show `
  --name ca-fabricmon-abc123 `
  --resource-group rg-fabricmon-prod `
  --tail 100

Common causes:

  • Database migration failed (check Alembic logs)
  • Key Vault secret reference incorrect (check managed identity permissions)
  • Container image not found (verify registry and image tag)

How to rollback

If a deployment breaks production, use Git:

git revert HEAD
az deployment group create `
  --resource-group rg-fabricmon-prod `
  --template-file infra/main.bicep `
  --parameters @infra/parameters.prod.json

Bicep deployments are idempotent. Running the same template twice doesn't duplicate resources. Only changed resources get updated.

To preview changes before deploying:

az deployment group what-if `
  --resource-group rg-fabricmon-prod `
  --template-file infra/main.bicep `
  --parameters @infra/parameters.prod.json

This shows a diff of resources that will be created, updated, or deleted.

Starter vs Enterprise tier

The parameterized environmentType isn't just about saving money. It's about matching deployment complexity to organizational maturity.

Starter tier (15-30 EUR/month):

  • Small consulting firms with 5-20 customers
  • Scale-to-zero Container App (no idle costs)
  • Burstable PostgreSQL (handles spiky workloads)
  • 7-day backups
  • No HA, no private endpoints

Enterprise tier (150-300 EUR/month):

  • MSPs monitoring 100+ customer tenants
  • Always-on Container App (instant responses)
  • General Purpose PostgreSQL with zone-redundant HA
  • 35-day geo-redundant backups (compliance)
  • Private Link for Key Vault

The infrastructure scales with your business. Start with Starter, upgrade to Enterprise when you sign your 50th customer or your first Fortune 500.

CI/CD integration

GitHub Actions

name: Deploy Infrastructure

on:
  push:
    branches: [main]
    paths:
      - 'infra/**'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      
      - name: Deploy Bicep
        uses: azure/arm-deploy@v1
        with:
          scope: resourcegroup
          resourceGroupName: rg-fabricmon-prod
          template: ./infra/main.bicep
          parameters: appName=fabricmon environmentType=Enterprise

Every push to main that modifies /infra triggers a deployment. Add approval gates for production.

Full implementation

The complete Bicep modules are in the GitHub repo. Clone it, customize the parameters, and deploy to your subscription.

You now have a production-grade, multi-tenant Fabric monitoring infrastructure defined in ~500 lines of Bicep. It deploys in 15 minutes, costs 15-300 EUR/month depending on tier, and scales to 1000+ customer tenants.


Written by Yari Bouwman, Data Engineer and Solution Designer specializing in scalable data platforms and modern cloud solutions.
