
Building Fabric Capacity Monitor: From Morning Frustration to Open Source Tool

#fabric #azure #open-source #capacity-planning #monitoring

Every morning started the same way. Open the Azure portal. Switch tenant. Click through to the Fabric capacity. Check the metrics. Note any spikes. Repeat for the next customer. And the next one. And the next.

With 10+ Fabric customers across different tenants, this ritual ate about an hour of my morning. Just to answer a simple question: is anyone overloaded right now?

That frustration eventually led me to build Fabric Capacity Monitor. An open source tool that does this automatically.

The problem with Fabric capacity management

If you manage Fabric for multiple organizations, you know the pain. Each customer has their own Azure tenant. Each capacity lives in its own little world. There's no way to see everything in one place.

And Fabric's built-in metrics? They only go back 14 days. Try doing capacity planning with two weeks of data. It doesn't work. You can't see seasonal patterns or month-end spikes. You're basically flying blind.

The worst part was how I found out about throttling issues. A customer would call: "Hey, our reports are slow." Then I'd log in and see their capacity had been at 150% for the last three hours. Reactive instead of proactive. Not a great look.

Why not just use existing tools?

I looked at commercial alternatives. There are a few SaaS products that do Fabric monitoring. But they all had the same problems:

Vendor lock-in. Once you send your capacity data to their platform, you're stuck. Want to switch? Good luck migrating your historical data.

Monthly subscription costs. Most charge per capacity or per user. With 10+ capacities across customers, that adds up fast. We're talking hundreds of euros per month for what's essentially a fancy dashboard.

Data sovereignty concerns. Some customers in regulated industries weren't comfortable with their capacity metrics going to a third party. Even though it's "just" utilization data, their compliance teams didn't love it.

Limited customization. The alerting rules were too basic. I wanted to alert on specific patterns like "throttling during business hours only" or "sustained high usage over 4 hours." Most tools just had simple threshold alerts.
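To make "too basic" concrete: the rules I wanted are easy to express in a few lines of code once you own the alerting logic. This is an illustrative sketch, not the tool's actual rule engine; the function names and the 80% threshold are my own examples here.

```python
from datetime import datetime, time

def sustained_high_usage(samples, threshold=80.0, min_hours=4, interval_minutes=5):
    """True if utilization stayed above `threshold` for at least `min_hours`
    of consecutive samples, assuming one sample every `interval_minutes`."""
    needed = (min_hours * 60) // interval_minutes
    streak = 0
    for pct in samples:  # samples ordered oldest -> newest
        streak = streak + 1 if pct > threshold else 0
        if streak >= needed:
            return True
    return False

def business_hours_throttling(sample_time: datetime, utilization_pct: float):
    """Only flag throttling (>100% CU) between 08:00 and 18:00."""
    return utilization_pct > 100 and time(8) <= sample_time.time() < time(18)
```

With threshold alerts as plain functions, adding a customer-specific pattern is a pull request, not a feature request to a vendor.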

There wasn't a good open source alternative. The few projects I found were either abandoned, Fabric-specific but too basic, or required a complex Kubernetes setup just to run.

So I built my own. And made it open source, because honestly the Fabric community needs better tooling and I don't want to become a SaaS vendor.

Tech choices and why

After trying a few approaches, I landed on this stack. You can see the full architecture on the project page.

Container Apps for compute

I needed something that could run scheduled jobs without costing money when idle. Azure Functions was the obvious choice, but I wanted more control over the runtime and didn't want to deal with consumption-plan cold starts.

Container Apps with scale-to-zero hit the sweet spot. The job runs every 5 minutes, does its thing, scales back to zero. Monthly compute cost is basically nothing. Maybe 2-3 EUR.
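The "does its thing" part is deliberately boring: one short-lived process that collects and exits. A minimal sketch of that entry point, assuming injected `fetch_metrics` and `store_rows` callables (stand-ins for the real Azure Monitor client and database writer, not names from the project):

```python
import sys

def run_collection(capacities, fetch_metrics, store_rows):
    """One collection cycle for the scheduled job: pull current utilization
    for every configured capacity, persist the rows, then let the process
    exit so Container Apps scales the job back to zero."""
    failures = 0
    for cap in capacities:
        try:
            store_rows(cap, fetch_metrics(cap))
        except Exception as exc:
            # One unreachable tenant shouldn't block collection for the rest.
            print(f"collection failed for {cap}: {exc}", file=sys.stderr)
            failures += 1
    return failures  # nonzero exit code makes the failed run visible
```

Because the process finishes in seconds and the cron trigger fires every 5 minutes, you only pay for those seconds.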

The container approach also means no vendor lock-in on the compute side. If I need to move this to a different cloud or self-host on a VM, the Docker image works anywhere.

PostgreSQL Flexible Server for storage

I debated between a few options here:

  • Cosmos DB: too expensive for this use case
  • Azure SQL: overkill and the pricing is confusing
  • Table Storage: cheap but querying historical data is painful
  • PostgreSQL Flexible: good balance of cost, features, and portability

I went with PostgreSQL Flexible Server on the Burstable B1ms tier. About 12 EUR per month. It handles millions of metric rows without issues. And if a customer wants to run their own instance, they can use any Postgres-compatible database.

The flexible server also handles backups automatically. Important when you're storing months of historical data.
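The data model that makes relational storage the right fit is simple: one row per capacity per sample, and historical questions become plain SQL. The schema and column names below are illustrative, not the project's actual DDL, and sqlite3 stands in for PostgreSQL so the sketch runs anywhere:

```python
import sqlite3

# Illustrative time-series schema: one row per capacity per 5-minute sample.
# The real tool would talk to PostgreSQL via a driver such as psycopg.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE capacity_metrics (
        capacity_id   TEXT NOT NULL,
        sampled_at    TEXT NOT NULL,      -- UTC timestamp
        cu_percentage REAL NOT NULL,      -- capacity unit utilization
        throttled     INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (capacity_id, sampled_at)
    )
""")
conn.execute(
    "INSERT INTO capacity_metrics VALUES (?, ?, ?, ?)",
    ("cust-a-f64", "2024-01-31T09:00:00Z", 152.0, 1),
)

# Historical analysis stays one query away, e.g. peak utilization per capacity:
peak = conn.execute(
    "SELECT capacity_id, MAX(cu_percentage) FROM capacity_metrics"
    " GROUP BY capacity_id"
).fetchone()
```

This is the kind of query that's painful against Table Storage and trivial against Postgres.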

Key Vault for credentials

Each capacity needs service principal credentials to access its metrics. Storing those in environment variables or config files is asking for trouble.

Key Vault keeps everything secure and lets me rotate credentials without redeploying the application. The integration with Container Apps makes it seamless. Credentials get injected as environment variables at runtime.
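From the application's point of view, the injected secrets are just environment variables. A minimal sketch of reading them per capacity; the `CAP_*` naming convention here is my own illustration, not the tool's actual scheme:

```python
import os

def load_capacity_credentials(capacity_id: str) -> dict:
    """Read the service principal credentials that Container Apps injected
    from Key Vault as environment variables at runtime. Fails loudly if a
    capacity is configured without its secrets."""
    prefix = f"CAP_{capacity_id.upper().replace('-', '_')}"
    creds = {
        "tenant_id": os.environ.get(f"{prefix}_TENANT_ID"),
        "client_id": os.environ.get(f"{prefix}_CLIENT_ID"),
        "client_secret": os.environ.get(f"{prefix}_CLIENT_SECRET"),
    }
    missing = [k for k, v in creds.items() if not v]
    if missing:
        raise RuntimeError(f"missing credentials for {capacity_id}: {missing}")
    return creds
```

Rotating a secret in Key Vault then only requires restarting the job, not a redeploy.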

Total infrastructure cost runs about 15-25 EUR per month depending on how much data you retain. Compare that to SaaS tools charging 50+ EUR per capacity per month.

Making it open source

The decision to open source wasn't obvious at first. I'd built something useful for my consulting work. Could have kept it proprietary and maybe charged for it someday.

But that felt wrong for a few reasons.

First, the Fabric community is small. We all deal with the same problems. Hoarding tools doesn't help anyone.

Second, open source means better security. People can audit the code, find issues, suggest improvements. A closed source monitoring tool that handles credentials? Hard to trust.

Third, self-hosted gives customers full control. They choose where to deploy it, what data to collect, how long to retain it. Everything runs in their Azure subscription. They own the infrastructure and the data.

The code is on GitHub. MIT licensed. Fork it, modify it, run it however you want.

What it looks like now

Current state: monitoring 23 Fabric capacities across 12 customer tenants. All from one dashboard.

The time savings are real. That morning ritual went from an hour to about 5 minutes. And most days I don't even need those 5 minutes, because the alerts tell me if something needs attention.

The historical data is the real win though. I can now show customers their usage patterns over 6 months. Makes capacity planning conversations way easier. "Look, you spike every month-end. Let's talk about autoscale or a bigger capacity."
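Backing that conversation with a number is straightforward once the history exists. A hypothetical helper (not part of the tool) that compares peak utilization in the last three days of each month against the rest:

```python
import calendar
from datetime import date
from statistics import mean

def month_end_spike_ratio(daily_peaks: dict) -> float:
    """daily_peaks maps date -> that day's peak CU %. Returns the ratio of
    average month-end peaks (last 3 days of each month) to the average of
    all other days. A ratio well above 1.0 supports the month-end story."""
    month_end, rest = [], []
    for day, peak in daily_peaks.items():
        last_day = calendar.monthrange(day.year, day.month)[1]
        (month_end if day.day > last_day - 3 else rest).append(peak)
    return mean(month_end) / mean(rest)
```

Six months of rows in Postgres, one aggregation like this, and the autoscale discussion writes itself.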

Caught two major throttling issues before customers noticed. Both were runaway dataflows that would have caused problems for hours. Got alerts within 15 minutes and paused the workloads.

Check the cost comparison table if you want to see how self-hosted stacks up against commercial options. Spoiler: it's significantly cheaper at scale.

If you're managing multiple Fabric capacities and you're tired of the portal clicking, give the tool a try. The setup takes about 30 minutes and the infrastructure costs less than a nice lunch.



Written by Yari Bouwman, Data Engineer and Solution Designer specializing in scalable data platforms and modern cloud solutions.
