Ever wonder why your Dataverse pipeline feels like it’s built out of duct tape and bad decisions? You’re not alone. Most of us end up picking between Synapse Link and Dataflow Gen2 without a clear idea of which one actually fits. That’s what kills projects — picking wrong. Here’s the promise: by the end of this, you’ll know which to choose based on refresh frequency, storage ownership and cost, and rollback safety — the three things that decide whether your project hums along or blows up at 2 a.m. For context, Dataflow Gen2 caps out at 48 refreshes per day (about every 30 minutes), while Synapse Link can push as fast as every 15 minutes if you’re willing to manage compute. Hit subscribe to the M365.Show newsletter at m365 dot show for the full cheat sheet and follow the M365.Show LinkedIn page for MVP livestreams. Now, let’s put the scalpel on the table and talk about control.

The Scalpel on the Table: Synapse Link’s Control Obsession

You ever meet that one engineer who measures coffee beans with a digital scale? Not eyeball it, not a scoop, but grams on the nose. That’s the Synapse Link personality. This tool isn’t built for quick fixes or “close enough.” It’s built for the teams who want to tune, monitor, and control every moving part of their pipeline. If that’s your style, you’ll be thrilled. If not, there’s a good chance you’ll feel like you’ve been handed a jet engine manual when all you wanted was a light switch.

At its core, Synapse Link is Microsoft giving you the sharpest blade in the drawer. You decide which Dataverse tables to sync. You can narrow it to only the fields you need, dictate refresh schedules, and direct where the data lands. And here’s the important part: it exports data into your own Azure Data Lake Storage Gen2 account, not into Microsoft’s managed Dataverse lake. That means you own the data, you control access, and you satisfy those governance and compliance folks who ask endless questions about where data physically lives.

But that freedom comes with a trade-off. If you want Delta files that Fabric tools can consume directly, it’s up to you to manage that conversion — either by enabling Synapse’s transformation or by spinning up your own Spark jobs. No one’s doing it for you. Control and flexibility, yes. But also your compute bill, your responsibility.
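To make that conversion step less abstract, here’s a minimal PySpark sketch. It assumes a Spark environment with Delta Lake available (Synapse and Fabric Spark pools ship with it) and assumes the link is dropping CSV files per table into your storage account; the storage account, container, and folder names below are placeholders, not anything Synapse Link dictates.

```python
# Minimal sketch: turn a Dataverse table exported by Synapse Link (CSV files
# landing in your own ADLS Gen2 account) into a Delta table Fabric can read.
# All names below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dataverse-to-delta").getOrCreate()

source = "abfss://dataverse-export@yourlakeaccount.dfs.core.windows.net/salesorder/"
target = "abfss://curated@yourlakeaccount.dfs.core.windows.net/delta/salesorder"

# Read the raw export. Inferring the schema keeps the sketch short; a real job
# would pin an explicit schema so column or type drift fails loudly.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(source)
)

# Write it out as Delta. Overwrite keeps the example simple; in production you
# would merge incremental changes on the table's key instead of rewriting it all.
df.write.format("delta").mode("overwrite").save(target)
```

Point a Lakehouse shortcut or warehouse at that target folder and downstream tools see an ordinary Delta table; the compute that runs this job, though, lands squarely on your Azure bill.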
And speaking of responsibility, setup is not some two-click wizard. You’re provisioning Azure resources: an active subscription, a resource group, a storage account with hierarchical namespace enabled, plus an app registration with the right permissions or a service principal with data lake roles. Miss one setting, and your sync won’t even start. It’s the opposite of a low-code “just works” setup. This is infrastructure-first, so anyone running it needs to be comfortable with the Azure portal and permissions at a granular level (there’s a small provisioning sketch at the end of this section).

Let’s go back to that freedom. The draw here is selective syncing and near-real-time refreshes. With Synapse Link, refreshes can run as often as every 15 minutes. For revenue forecasting dashboards or operational reporting — think sales orders that need to appear in Fabric within the hour — that precision is gold. Teams can engineer their pipelines to pull only the tables they need, partition the outputs into optimal formats, and minimize unnecessary storage. It’s exactly the kind of setup you’d want if you’re running pipelines with transformations before shipping data into a warehouse or lakehouse.

But precision has a cost. Every refresh you tighten, every table you add, every column you leave in “just in case” spins up compute jobs. That means resources in Azure are running on your dime, which also means finance gets involved sooner rather than later. The bargain you’re striking is clear: total control plus table-level precision equals heavy operational overhead if you’re not disciplined with scoping and scheduling.

Let me share a cautionary tale. One enterprise wanted fine-grained control and jumped into Synapse Link with excitement. They scoped tables carefully, enabled hourly syncs, even partitioned their exports. It worked beautifully for a while — until multiple teams set up overlapping links on the same dataset. Suddenly, they had redundant refreshes running at overlapping intervals, duplicated data spread across multiple lakes, and governance meetings that felt like crime-scene investigations. The problem wasn’t the tool. It was that giving everyone surgical precision with no central rules led to chaos. The lesson: governance has to be baked in from day one, or Synapse Link will expose every gap in your processes.

From a technical angle, it’s impressive. Data lands in Parquet, not some black-box service. You can pipe it wherever you want — Lakehouse, Warehouse, or even external analytics platforms. That open format and storage ownership are exactly what get engineers excited. Synapse Link isn’t trying to hide the internals. It’s exposing them and expecting you to handle them properly. If your team already has infrastructure for pipeline monitoring, cost management, and security, Synapse Link slots right in. If you don’t, it can sink you fast.

So who’s the right audience? If you’re a data engineer who wants to trace each byte, control scheduling down to the quarter-hour, and satisfy compliance by controlling exactly where the data lives, Synapse Link is the right choice. A concrete example: you’re running near-real-time sales feeds into Fabric for forecasting. You only need four tables, but you need them every 15 minutes. You want to avoid extra Dataverse storage costs while running downstream machine learning pipelines. Synapse Link makes perfect sense there. If you’re a business analyst who just wants to light up a Power BI dashboard, this is the wrong tool. It’s like giving a surgical kit to someone who just wanted to open Amazon packages.

Bottom line: Synapse Link gives surgical-grade control of your Dataverse integration. That’s freeing if you have the skills, infrastructure, and budgets to handle it. But without that, it’s complexity overload.
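Because that setup checklist is where most first attempts stall, here’s a minimal sketch of the storage-side prerequisite using the Azure Python management SDK (azure-mgmt-storage). The subscription ID, resource group, account name, and region are placeholders, and the app registration plus data lake role assignment still have to be handled separately.

```python
# Minimal sketch: create the ADLS Gen2 storage account Synapse Link needs,
# i.e. a StorageV2 account with hierarchical namespace enabled.
# Subscription ID, resource group, account name, and region are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import StorageAccountCreateParameters, Sku

credential = DefaultAzureCredential()
client = StorageManagementClient(credential, "<subscription-id>")

poller = client.storage_accounts.begin_create(
    resource_group_name="rg-dataverse-link",
    account_name="dataverselakedemo",  # must be globally unique
    parameters=StorageAccountCreateParameters(
        location="westeurope",
        sku=Sku(name="Standard_LRS"),
        kind="StorageV2",
        is_hns_enabled=True,  # hierarchical namespace = the "Gen2" part
    ),
)
account = poller.result()

# The dfs endpoint is what downstream Spark jobs will point at.
print(account.primary_endpoints.dfs)
```

Nothing exotic, but it is real infrastructure that someone on your team has to own.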
And let’s be real: most teams don’t need scalpel-level control just to get a dashboard working. Sometimes speed and simplicity mean more than precision. And that’s where the other option shows up — not the scalpel, but the multitool. Sometimes you don’t need surgical precision. You just need something fast, cheap, and easy enough to get the job done without bleeding everywhere.

The Swiss Army Knife That Breaks Nail Files: Dataflow Gen2’s Low-Code Magic

If Synapse Link is for control freaks, Dataflow Gen2 is for the rest of us who just want to see something on a dashboard before lunch. Think of it as that cheap multitool hanging by the cash register at the gas station. It’s not elegant, it’s not durable, but it can get you through a surprising number of situations. The whole point here is speed — moving Dataverse data into Fabric without needing a dedicated data engineer lurking behind every button click.

Where Synapse feels like a surgical suite, Dataflow Gen2 is more like grabbing the screwdriver out of the kitchen drawer. Any Power BI user can pick tables, apply a few drag‑and‑drop transformations, and send the output straight into Fabric Lakehouses or Warehouses. No SQL scripts, no complex Azure provisioning. Analysts, low‑code makers, and even the guy in marketing who runs six dashboards can spin up a Dataflow in minutes. Demo time: imagine setting up a customer engagement dashboard, pulling the leads and contacts tables straight from Dataverse. You’ll have visuals running before your coffee goes cold. Sounds impressive — but the gotchas show up the minute you start scheduling refreshes.

Here’s the ceiling you can’t push through: Dataflow Gen2 runs refreshes up to 48 times a day — that’s once every 30 minutes at best. No faster. And unlike Synapse, you don’t get true incremental loads or row‑level updates. What happens is one of two things: append mode, which keeps adding to the Delta table in OneLake, or overwrite mode, which completely replaces the table contents during each run. That’s great if you’re testing a demo, but it can be disastrous if you’re depending on precise tracking or rollback. A lot of teams miss this nuance and assume it works like a transactionally safe system. It’s not — it’s bulk append or wholesale replace (there’s a short sketch of the difference at the end of this section).

I’ve seen the pain firsthand. One finance dashboard was hailed as a success story after a team stood it up in under an hour with Dataflow Gen2. Two weeks later, their nightly overwrite job was wiping historical rows. To leadership, the dashboard looked fine. Under the hood? Years of transaction history were half scrambled and permanently lost. That’s not a “quirk” — that’s structural. Dataflow doesn’t give you row‑level delta tracking or rollback states. You either keep every refresh stacked up with append (risking bloat and duplication) or overwrite and pray the current version is correct.

Now, let’s talk money. Synapse makes you pull out the checkbook for Azure storage and compute. With Dataflow Gen2, cost is tied to Fabric capacity units (CUs). That’s a whole different kind of silent killer. It doesn’t run up Azure storage charges — instead, every refresh eats into a shared pool of capacity. If you don’t manage refresh frequency and volume, you’ll burn CUs faster than you expect. At first you barely notice; then, during mid‑day loads, your workspace slows to a crawl because too many Dataflows are chewing on the same capacity pie. The users don’t blame poor scheduling — they just say “Fabric is slow.” That’s how sneaky the cost trade‑off works.

And don’t overlook governance here. Dataflow Gen2 feels almost too open-handed. You can pick tables, filter columns, and mash them into golden datasets… right up until refresh jobs collide. I’ve watched teams unknowingly set up Dev and Prod Dataflows pointing at the same destination tables, overwriting each other at 2:00 a.m. That’s not a one‑off disaster — it’s a pattern if nobody enforces workspaces, naming, and ownership.
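To make the append-versus-overwrite difference concrete, here’s a minimal PySpark sketch against a Delta table. The table and column names are invented, and it models what happens at the destination rather than anything Dataflow Gen2 exposes directly.

```python
# Minimal sketch of the two refresh behaviors, using a Delta table as the
# destination. Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("append-vs-overwrite").getOrCreate()

todays_load = spark.createDataFrame(
    [(1001, "2024-06-01", 250.0)],
    ["transaction_id", "posted_date", "amount"],
)

# Append: every refresh stacks new rows onto what is already there. History
# survives, but reruns and overlapping refreshes quietly create duplicates.
todays_load.write.format("delta").mode("append").saveAsTable("finance_transactions")

# Overwrite: the table becomes whatever this refresh happened to pull. If the
# source query only covered the current period, the older rows are simply gone.
todays_load.write.format("delta").mode("overwrite").saveAsTable("finance_transactions")
```

Neither mode knows which rows actually changed; that judgment lives entirely with whoever scheduled the refresh, which is exactly why the Dev and Prod collision above hurt so much.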
Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-show-modern-work-security-and-productivity-with-microsoft-365--6704921/support.
Follow us on:
LinkedIn
Substack