Tech Talk with Jamel Cato

 

Jamel H. Cato is the CEO of Cato Systems.

A client recently came to me with an interesting enhancement request. I had previously built an Azure Data Factory for them that performed a complex 4-Tier ETL process in SQL Server. The Factory was working great, but due to new operational requirements, several pipelines within it needed to run both on a regular schedule and on demand whenever a certain event occurred. The problem, if you’re familiar with ADF, is that a pipeline can have either a schedule-based trigger or an event-based trigger. (And if you’re not familiar with Azure Data Factory, it’s essentially SSIS that runs entirely in the Cloud.)

There is a simple and elegant solution to this problem: just have your ADF pipeline check for the existence of a file at the very beginning and only perform its normal operations if that file exists. This file can be placed in the target location on a regular schedule (making the pipeline time-based) as well as whenever a triggering event occurs (making it simultaneously event-based). Then simply add a Trigger to the pipeline that runs every 5 minutes.

Because Azure Data Factories can natively access objects in Azure Storage, the easiest way to implement this solution is to place an empty file into a blob storage container. That empty file will serve as the trigger for the pipeline.

Here are the steps:

Add a Get Metadata Activity as the very first step of your pipeline. For clarity, name it ‘File Check’. Point the dataset of this activity to the location of an empty file stored in Azure Storage. Then add the Exists and Item name items to the Field List. It should look like this in ADF Studio:
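That same configuration shows up in the pipeline’s underlying JSON, which ADF Studio generates for you. A rough sketch of the activity definition is below; the dataset name TriggerFileDataset is a placeholder for whatever you call the dataset that points at your empty file.

{
    "name": "File Check",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "TriggerFileDataset",
            "type": "DatasetReference"
        },
        "fieldList": [ "exists", "itemName" ]
    }
}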

If you only need your pipeline to carry out its activities on specific dates, then within the Dataset append a date stamp to the name of your target file with a pipeline expression like this:
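The exact expression depends on your file-naming convention. Assuming a hypothetical pattern of TriggerFile_yyyyMMdd.txt (an illustration, not a requirement of the technique), the file name property of the Dataset could be set to an expression such as:

@concat('TriggerFile_', formatDateTime(utcNow(), 'yyyyMMdd'), '.txt')

Because utcNow() is evaluated at run time, the pipeline will only find the file on days when your external process actually dropped one carrying that day’s date stamp.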

Next, drag the green success endpoint of the File Check activity to the next activity in your pipeline. As in SSIS, this is a precedence constraint that will control the flow of your job. In this case, it will prevent the execution of the next activity unless the target file exists, because behind the scenes the pipeline automatically evaluates this expression from the first step:
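The Exists result from the first step is surfaced on the activity’s output, and downstream expressions reference it in the form shown below (assuming you kept the activity name ‘File Check’):

@activity('File Check').output.exists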

At this point, your pipeline satisfies our original ask. However, my preference is to take this one step further by wrapping all the original activities within an If Condition Activity. This has two benefits. First, it makes the pipeline more understandable because it is visually clear that everything within the If Condition Activity is conditionally triggered. Even better, you’ll see something easily readable like this when you click on it (which can be a lifesaver if you built the pipeline long ago or inherited it from another developer):
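As a sketch of what that reads like in the pipeline JSON, the If Condition activity evaluates the same Exists output and holds the original activities in its ifTrueActivities list. The activity name below is a placeholder, and the empty ifTrueActivities array is where your real activities go.

{
    "name": "If Trigger File Exists",
    "type": "IfCondition",
    "dependsOn": [
        {
            "activity": "File Check",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "expression": {
            "value": "@activity('File Check').output.exists",
            "type": "Expression"
        },
        "ifTrueActivities": [ ]
    }
}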

Now that you have your pipeline configured to check for and conditionally act upon your target file, the last two things to do are to attach a trigger to your pipeline that runs very often (such as every 5 minutes) and then to develop some process to automatically place your target file in the target folder. I usually satisfy the latter requirement by writing a PowerShell or Python script.
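A minimal sketch of such a script in Python, using the azure-storage-blob package, is shown here. The connection string environment variable, container name, and file-naming pattern are assumptions for illustration; point them at whatever your ‘File Check’ dataset actually reads.

import os
from datetime import datetime, timezone

from azure.storage.blob import BlobClient

# Placeholder names; match them to your own storage account, container,
# and the dataset used by the 'File Check' activity.
CONNECTION_STRING = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
CONTAINER_NAME = "pipeline-triggers"
BLOB_NAME = f"TriggerFile_{datetime.now(timezone.utc):%Y%m%d}.txt"


def drop_trigger_file() -> None:
    """Upload a zero-byte blob whose presence tells the pipeline to run."""
    blob = BlobClient.from_connection_string(
        conn_str=CONNECTION_STRING,
        container_name=CONTAINER_NAME,
        blob_name=BLOB_NAME,
    )
    # The file's contents are irrelevant; only its existence matters.
    blob.upload_blob(b"", overwrite=True)


if __name__ == "__main__":
    drop_trigger_file()

Run it from wherever the schedule or the triggering event originates (a scheduled job, a webhook handler, another pipeline), and the same file drop covers both the time-based and the event-based cases.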

Finally, don’t forget to delete your target file after the pipeline runs. I typically do this with a Delete activity at the very end of the pipeline itself, but if a Factory has many pipelines or many target files, then I’ll resort to creating a Power Automate flow to loop through the folders and delete as it goes.
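For the single-pipeline case, the Delete activity is a small addition at the end of the pipeline. A rough sketch of its definition follows, reusing the placeholder names from earlier; logging is switched off because the file is disposable.

{
    "name": "Remove Trigger File",
    "type": "Delete",
    "dependsOn": [
        {
            "activity": "If Trigger File Exists",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "dataset": {
            "referenceName": "TriggerFileDataset",
            "type": "DatasetReference"
        },
        "enableLogging": false
    }
}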

If, like me, you are a SQL developer, you may be thinking that you could accomplish the same objective by simply checking a column in a table. While that’s technically true, I would caution you to think twice about that approach here because it requires a database, a database connection, and a database read operation, all of which are financially expensive in Microsoft Azure. Plus, DB connections in Data Factory are a challenge to migrate when you deploy to production, especially if they are secured by an Azure Key Vault. But anyone can drop a file in a folder, and the metadata check and empty file in Azure Storage will together cost a fraction of one cent.

I hope this quick tip makes you look like a hero to your team. Happy data crunching.

— Jamel