
Microsoft Fabric Capacity Governance: Concurrency, Smoothing, and Protecting Critical Workloads

Average utilization does not tell you why Fabric feels slow. Smoothing, carryforward, and bad workload placement do. Here is how to govern capacity before background jobs wreck your interactive users.

fabric · capacity-planning · governance · performance · monitoring


Fabric capacity problems rarely start with a dramatic outage. They usually start with someone saying, "This report feels random lately."

At 9:15 the semantic model is slow. At 9:20 it is fine again. Your daily utilization looks like 45%, so everyone assumes the SKU is large enough. Then a refresh chain, some notebooks, and a warehouse ingestion window line up, and suddenly users hit delays or outright rejections.

That is the trap. In Fabric, average utilization is a comfort metric. It is not an operating model.

Reviewed April 2026

This post reflects current Fabric throttling, surge protection, and warehouse workload management behavior. Microsoft keeps changing details around workload controls, so always validate thresholds and platform behavior against your tenant before turning rules loose in production.

Why average utilization lies

Fabric does not bill your capacity only at the moment a job runs. It uses bursting and smoothing.

  • Interactive operations are smoothed over at least 5 minutes and up to 64 minutes.
  • Background operations are smoothed over 24 hours.
  • Evaluation happens in 30-second timepoints.

That means a heavy background job can finish quickly, look successful, and still leave a long tail of committed future capacity behind it. Fabric calls that carryforward. The idle periods after that job are effectively paying off what you already spent.

This is why a capacity can feel slow in the morning even though the overnight load already finished. The compute was consumed yesterday. The penalty shows up now.
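To make the arithmetic concrete, here is a small sketch of how 24-hour smoothing turns one job into a day of committed capacity. The numbers are illustrative, not from any tenant, and this is a mental model rather than an official Fabric API:

```python
# Illustrative smoothing arithmetic -- a mental model, not platform code.
# A background operation's CU-seconds are spread evenly across 24 hours
# of 30-second evaluation timepoints (24 * 60 * 2 = 2880 timepoints).

TIMEPOINTS_PER_DAY = 24 * 60 * 2  # 30-second evaluation windows in 24 hours

def smoothed_per_timepoint(total_cu_seconds: float) -> float:
    """CU-seconds charged against each future 30-second timepoint."""
    return total_cu_seconds / TIMEPOINTS_PER_DAY

# A notebook run that burned 144,000 CU-seconds in 10 minutes of wall time
# still commits 50 CU-seconds against every timepoint for the next 24 hours.
print(smoothed_per_timepoint(144_000))  # 50.0
```

The job itself is long gone; the commitment against tomorrow's timepoints is not. That is the carryforward users feel the next morning.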

Warehouse makes this even more confusing. Most Warehouse and SQL analytics endpoint operations are classified as background to benefit from 24-hour smoothing. That is useful for throughput, but it also means heavy warehouse activity can quietly eat tomorrow's headroom if you are not watching the right charts.

So stop asking only, "What was my average utilization yesterday?" Ask these instead:

  • How much 10-minute headroom did I have during business hours?
  • How much 24-hour background budget was already committed before users logged in?
  • Which workspaces created the carryforward?

If you are already collecting platform data, compare this with your capacity monitoring approach. Discovery data and runtime pressure are different problems. Treating them as the same thing is how bad governance starts.

The three throttling stages that actually matter

Fabric does not jump straight from healthy to dead. It degrades in stages.

1. Interactive delay

Once the next 10 minutes of future capacity are effectively full, new interactive operations get delayed by 20 seconds. Users experience this as "Fabric is weirdly sluggish."

2. Interactive rejection

If the next 60 minutes are full, new interactive operations get rejected. This is where report viewers and analysts start seeing errors instead of just slow performance.

3. Background rejection

If the next 24 hours are full, Fabric rejects new background work too. At that point you are not managing load anymore. You are in recovery mode.

Two practical points matter here:

  • In-flight work is not throttled mid-run. The next requests are what suffer.
  • A utilization spike above 100% does not automatically mean throttling. You need to look at the throttling and burndown views, not just raw utilization.

That distinction matters because people often overreact to one spike and underreact to slow burndown. The dangerous situation is not "we peaked." The dangerous situation is "we are still paying off that peak when the next business cycle begins."
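The staged behavior above can be sketched as a simple decision ladder. This is an illustrative model of the documented stages, not platform code; the real evaluation happens inside Fabric at each 30-second timepoint:

```python
# Mental model of Fabric's throttling stages -- illustrative only.
# 'future_overage_minutes' represents how much future capacity is
# already committed, expressed as minutes of smoothed carryforward.

def throttling_stage(future_overage_minutes: float) -> str:
    """Map committed future capacity to the stage that NEW requests hit.

    In-flight work keeps running; only new operations are affected.
    """
    if future_overage_minutes <= 10:
        return "overage protection: no throttling"
    if future_overage_minutes <= 60:
        return "interactive delay: new interactive ops wait ~20s"
    if future_overage_minutes <= 24 * 60:
        return "interactive rejection: new interactive ops fail"
    return "background rejection: all new operations fail"

print(throttling_stage(5))     # overage protection: no throttling
print(throttling_stage(45))    # interactive delay: new interactive ops wait ~20s
print(throttling_stage(2000))  # background rejection: all new operations fail
```

Note how wide the gap is between "users feel a delay" and "background work is rejected." Most teams discover stage one from a helpdesk ticket and stage three from an incident bridge.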

Start with blast radius, not department names

Most Fabric estates are grouped by org chart. Finance gets a workspace. Sales gets a workspace. Data engineering gets a workspace. Then everything lands on one shared F64 because it is convenient.

Convenient is not the same as operable.

The better pattern is to separate workloads by failure mode:

  • Capacity A: mission-critical semantic models and executive reporting
  • Capacity B: shared engineering workloads like notebooks, Dataflows Gen2, pipelines, and heavy warehouse ingestion
  • Capacity C: dev, test, ad hoc exploration, and experimental workloads

If you keep critical BI and heavy engineering on the same capacity, you are making a design choice to let background work damage interactive users. Sometimes that is acceptable for a small team. In production, it usually is not.

This does not mean duplicating all your data. Use OneLake shortcuts and shared curated zones where it makes sense. The point is compute isolation, not storage sprawl.

If you are still deciding whether a workload belongs in a lakehouse or warehouse, read lakehouse vs warehouse in Fabric first. Bad capacity governance often starts with choosing the right engine for the wrong usage pattern.

Use surge protection, but do not pretend it is architecture

Fabric surge protection is useful because it lets you reject background work earlier than the default platform limits. That gives you a chance to protect interactive users before the capacity falls into deep 24-hour overload.

A reasonable starting approach on shared capacities is:

  • set a background rejection threshold below your observed pain point
  • set a recovery threshold below normal operating background load
  • review both against the Compute page in the Capacity Metrics app, not gut feel

If your background rejection chart usually sits around 35% and spikes to 70% on bad days, a threshold somewhere between those values is a sane first move. If you set it at 95%, you have effectively done nothing. If you set it at 20%, you will reject jobs all day and blame the platform for your own settings.
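As a sanity check on whatever threshold you pick, a sketch like this makes the "between normal load and observed pain" rule explicit. The helper is hypothetical, not a Fabric API; the inputs should come from your own Capacity Metrics history:

```python
# Hypothetical sanity check for a surge protection threshold.
# 'typical_pct' and 'spike_pct' come from your Capacity Metrics history.

def threshold_is_sane(threshold: float, typical_pct: float, spike_pct: float) -> bool:
    """A background rejection threshold should sit strictly between the
    usual background load and the level at which users actually felt pain."""
    return typical_pct < threshold < spike_pct

print(threshold_is_sane(50, typical_pct=35, spike_pct=70))  # True: reasonable
print(threshold_is_sane(95, typical_pct=35, spike_pct=70))  # False: does nothing
print(threshold_is_sane(20, typical_pct=35, spike_pct=70))  # False: rejects all day
```

Re-run the check after every significant workload change. A threshold that was sane last quarter drifts out of range as ingestion patterns grow.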

Workspace-level surge protection is even more interesting. It lets you treat workspaces as:

  • Available: normal shared behavior
  • Mission Critical: exempt from workspace-level blocking
  • Blocked: manually or automatically rejected

This is one of the few real governance levers Fabric gives you for noisy-neighbor control. Use it. One runaway workspace should not be allowed to burn an entire shared capacity for everyone else.

But be honest about the limitation: mission-critical status does not override overall capacity throttling. If the whole SKU is cooked, everything still hurts. Surge protection is a guardrail. It is not a substitute for separating blast radius.

The workspace is your SQL isolation boundary

Warehouse teams often think only in terms of tables and queries. In Fabric, the workspace boundary matters just as much.

For Warehouse and the SQL analytics endpoint, each workspace gets its own SQL compute boundary. By default, Fabric splits the available compute 50/50 into isolated SELECT and non-SELECT pools. That split is there for a reason: read pressure and ingestion pressure should not destroy each other.

If one workspace is serving dashboard traffic and handling ETL-heavy writes at the same time, inspect the workload before you just buy a bigger SKU.

Start with the DMVs:

SELECT TOP 20
    session_id,
    status,
    command,
    total_elapsed_time / 1000.0 AS elapsed_seconds,
    start_time
FROM sys.dm_exec_requests
ORDER BY total_elapsed_time DESC;

Then ask a harder question: should these readers and writers even be on the same workspace engine?

For read-heavy patterns, consider exposing the curated tables into a second workspace by using OneLake shortcuts. That gives you another SQL isolation boundary without creating a second copy of the data. It is often a cleaner fix than endlessly tuning query timeouts and hoping concurrency improves.

Reduce background spend before you scale up

Many capacity issues are not really capacity issues. They are waste issues.

Common examples:

  • Spark jobs shuffling far more data than necessary
  • Delta tables fragmented into thousands of small files
  • refresh windows stacked on top of each other for no business reason
  • notebooks used for workloads that would fit in pandas or SQL

Before you resize a capacity, fix the obvious waste first. My Spark notebook optimization guide and Delta Lake maintenance guide cover the mechanics in detail. At governance level, the point is simple: bad physics on one workload becomes shared pain for every workload on that capacity.

Here is a small example of reducing background pressure in a notebook before it ever hits your capacity budget:

from pyspark.sql import functions as F

# Let AQE right-size shuffles at runtime; cap the partition count explicitly.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.shuffle.partitions", "400")

# Prune rows and columns as early as possible: only the last 7 days,
# only the columns the aggregation actually needs.
sales = (
    spark.table("sales")
    .filter(F.col("date") >= F.date_sub(F.current_date(), 7))
    .select("customer_id", "product_id", "amount", "date")
)

products = spark.table("dim_product").select("product_id", "category")

# Broadcast the small dimension table so the join avoids a full shuffle.
result = (
    sales.join(F.broadcast(products), "product_id")
    .groupBy("category")
    .agg(F.sum("amount").alias("amount"))
)

result.write.mode("overwrite").saveAsTable("category_sales_last_7_days")

The code itself is not the main point here. The point is that fewer shuffled bytes, better join strategies, and leaner writes reduce the 24-hour bill your entire capacity has to carry.

Also schedule maintenance like OPTIMIZE, heavy refreshes, and backfills as if other people exist. Because they do.

Alert on leading indicators, not just on failures

If your first signal is an end user complaining, your monitoring is already late.

Use the Fabric Capacity Metrics app for the detailed story: throttling state, burndown, timepoint drillthrough, and workspace contribution. Use Azure-level monitoring for the tripwire that gets your attention.

For example, you can alert on the Fabric capacity resource itself:

resource actionGroup 'Microsoft.Insights/actionGroups@2023-01-01' = {
  name: 'ag-fabric-ops'
  location: 'global'
  properties: {
    groupShortName: 'fabops'
    emailReceivers: [
      {
        name: 'platform-team'
        emailAddress: 'fabric-alerts@contoso.com'
      }
    ]
  }
}

resource alert 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'fabric-capacity-utilization-high'
  location: 'global'
  properties: {
    scopes: [
      fabricCapacity.id
    ]
    severity: 2
    enabled: true
    evaluationFrequency: 'PT5M'
    windowSize: 'PT15M'
    criteria: {
      allOf: [
        {
          name: 'high-capacity-utilization'
          metricNamespace: 'Microsoft.Fabric/capacities'
          metricName: 'CapacityUtilization'
          operator: 'GreaterThan'
          threshold: 85
          timeAggregation: 'Average'
          criterionType: 'StaticThresholdCriterion'
        }
      ]
    }
    actions: [
      {
        actionGroupId: actionGroup.id
      }
    ]
  }
}

That alert alone will not explain the root cause, but it gets you moving before the helpdesk does. If you want the full monitoring stack, I already wrote about building a Fabric capacity monitor and deploying it with Azure Bicep.

Common governance mistakes

These show up constantly:

One giant shared capacity

Cheap on paper. Expensive in user pain.

Everything marked mission critical

If everything is critical, nothing is. You just removed your own control plane.

Using daily averages for sizing

Fabric pressure is about windows, carryforward, and recovery time. Averages hide all three.

Scaling before fixing storage layout

A fragmented Delta table can waste more CU than a lot of people realize. Capacity upgrades do not cure bad file layout.

Treating workspaces as folders

In Fabric they are also compute and governance boundaries. Design them like it matters, because it does.
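The "daily averages" mistake above is easy to demonstrate with made-up numbers: two days can share the same average utilization while only one of them ever saturated a window:

```python
# Two illustrative days with identical average utilization but very
# different user experience. Values are percentages per evaluation window.

calm_day = [45] * 48                # flat 45% all day
spiky_day = [20] * 40 + [170] * 8   # quiet, then 8 saturated windows

def avg(xs):
    return sum(xs) / len(xs)

print(avg(calm_day))   # 45.0
print(avg(spiky_day))  # 45.0 -- identical average
print(max(calm_day))   # 45  -- never throttled
print(max(spiky_day))  # 170 -- carryforward and throttling territory
```

Any sizing conversation that starts from the first two numbers and ignores the last two is already wrong.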

Final thought

Fabric capacity governance is mostly about deciding who gets to hurt whom.

That sounds harsh, but it is true. Shared compute means shared consequences. The real architecture decision is not whether an F32 or F64 is cheaper. It is whether one workspace, one refresh chain, or one notebook backlog is allowed to ruin everyone else's day.

Use separate capacities where the blast radius justifies it. Use workspace-level controls where sharing is still sensible. Fix waste before buying more SKU. And when you do scale, do it because demand is real, not because governance was missing.

Fabric is a good platform for mixed analytics. It is a bad platform for wishful thinking.



Yari Bouwman

Platform Architect and Solution Designer specializing in scalable data platforms and modern cloud solutions.
