You may have used Durable Functions to implement various kinds of workflows or long-running tasks, but have you considered what they do under the hood? I was curious about those details and thought I'd share notes from going through the Durable Functions and DurableTask repositories. To be clear, we will focus on the Azure Storage implementation of DurableTask, which Durable Functions uses by default.

Trigger binding

The Functions host will discover available functions during startup. If we generate a Durable Function in Visual Studio, we get an orchestrator function like this:

[FunctionName("Function1")]
public static async Task<List<string>> RunOrchestrator(
    [OrchestrationTrigger] IDurableOrchestrationContext context)
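{
    var outputs = new List<string>();
    // Activity calls are omitted here; their results would be added to outputs.
    return outputs;
}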

Durable Functions provides a "binding provider" for the [OrchestrationTrigger] attribute: OrchestrationTriggerAttributeBindingProvider. There are similar classes for activity and entity triggers. The Functions host calls TryCreateAsync on this class to create a binding object; in this case an OrchestrationTriggerBinding is returned. The host then calls the binding's CreateListenerAsync method, which returns a DurableTaskListener.

Tomasz Pęczek has a pretty nice article on how Functions extensions work if you are interested.

Starting the listener

DurableTaskListener will be responsible for starting Function executions. Its StartAsync method is called by the host to start it. There are two main things this method does:

  1. Create the resources needed by the durability provider (in this case Storage blobs, tables and queues)
  2. Start listening for orchestration/activity messages that will trigger Function executions

Here is a (hopefully helpful) sequence diagram of what happens:

Sequence diagram of DurableTaskListener StartAsync

The only thing StartAsync itself does is call DurableTaskExtension.StartTaskHubWorkerIfNotStartedAsync. This method uses a boolean flag combined with an AsyncLock to ensure it only runs once. It calls CreateIfNotExistsAsync on the default durability provider and then starts the task hub worker.
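To make the "boolean flag plus async lock" idea concrete, here is a minimal sketch of the pattern. It is not the actual DurableTaskExtension code: the class and member names are hypothetical, and SemaphoreSlim stands in for the AsyncLock type used in the repository.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration of a run-once async start; not the real DurableTaskExtension.
public sealed class RunOnceStarter
{
    // SemaphoreSlim(1, 1) is used here as a stand-in for an async-compatible lock.
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);
    private bool started;

    // Counts how many times the start work actually ran (for demonstration only).
    public int StartCount { get; private set; }

    public async Task StartIfNotStartedAsync()
    {
        await gate.WaitAsync();
        try
        {
            if (started)
            {
                return; // an earlier caller already completed the start
            }

            await CreateResourcesAndStartWorkerAsync();
            started = true; // set only after success, so a failed start can be retried
        }
        finally
        {
            gate.Release();
        }
    }

    // Placeholder for "create storage resources, then start the task hub worker".
    private Task CreateResourcesAndStartWorkerAsync()
    {
        StartCount++;
        return Task.Delay(10);
    }
}
```

Even if several listeners call StartIfNotStartedAsync at the same time, only the first caller to get the lock does the work; the others wait for it and then return immediately.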

The durability provider class calls into AzureStorageOrchestrationService.CreateIfNotExistsAsync, which just calls EnsureTaskHubAsync to create the "task hub" if it does not already exist. This one service is the core of the DurableTask implementation on Azure Storage. It has around 2000 lines of code, though, so maybe it could use some refactoring! The DurableTask repository also contains implementations for other providers: Service Fabric, Redis, Service Bus, and SQL Server. There is also an "Emulator" implementation that runs in-process for testing purposes.

Ensuring the task hub exists

The EnsureTaskHubAsync method is designed so that it runs the initialization only once, though it can be reset if the provider runs into problems such as connection issues. You can find the implementation in GetTaskHubCreatorTask. The "task hub" refers to the blob containers, tables and queues needed to run orchestrations. It will have the default name "TestHubName" when running locally. I usually change the name for each project through host.json so that each project uses its own queues and does not get mixed up with other projects.
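As an example of changing the name, a host.json for the v2+ Functions runtime could look roughly like this. The hub name "MyProjectTaskHub" is a made-up value; use a name of your own.

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "hubName": "MyProjectTaskHub"
    }
  }
}
```

With a setting like this, the containers, tables and queues described below would be prefixed with that name instead of "TestHubName".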

When EnsureTaskHubAsync first runs, it will create (replace "taskhub" with the name of your task hub):

  1. Blob container taskhub-applease
  2. Blob taskhub-appleaseinfo in above container
  3. Blob container taskhub-leases
  4. Tables taskhubHistory and taskhubInstances
  5. Work item queue taskhub-workitems
  6. Control queues (e.g. taskhub-control-partitionnum), one for each partition
  7. Blobs for each control queue in taskhub-leases container
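The naming scheme in the list above can be sketched as a small helper. This is only an illustration with a hypothetical class name, not code from DurableTask. Lower-casing the container and queue names and using a zero-padded two-digit partition suffix are assumptions on my part; check the names in your own storage account.

```csharp
using System.Collections.Generic;

// Hypothetical helper that derives resource names from a task hub name.
public static class TaskHubResourceNames
{
    // Blob containers (lower-casing is an assumption in this sketch).
    public static string AppLeaseContainer(string taskHub) => $"{taskHub.ToLowerInvariant()}-applease";
    public static string LeasesContainer(string taskHub) => $"{taskHub.ToLowerInvariant()}-leases";

    // Tracking store tables.
    public static string HistoryTable(string taskHub) => $"{taskHub}History";
    public static string InstancesTable(string taskHub) => $"{taskHub}Instances";

    // Queues (lower-casing is an assumption in this sketch).
    public static string WorkItemQueue(string taskHub) => $"{taskHub.ToLowerInvariant()}-workitems";

    // One control queue per partition; the two-digit suffix format is an assumption.
    public static IReadOnlyList<string> ControlQueues(string taskHub, int partitionCount)
    {
        var names = new List<string>();
        for (int i = 0; i < partitionCount; i++)
        {
            names.Add($"{taskHub.ToLowerInvariant()}-control-{i:00}");
        }
        return names;
    }
}
```

For example, TaskHubResourceNames.ControlQueues("MyHub", 4) produces four names, one per partition, under the assumptions stated above.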

The app lease blob is utilized by the AppLeaseManager class to "ensure a single app's partition manager is started at a time".

The lease blobs in the leases container are used by DurableTask to figure out which instance currently has control of which partition. So if you have, for example, two instances in Azure Functions and there are four partitions, each instance could have control of two partitions to split the work.

We will discuss leases and how DurableTask deals with partition ownership in a later part in more detail, so please stay tuned.

The "tracking store" is what DurableTask uses to store information about the orchestrations and their state. In the Azure Storage implementation it consists of the history and instances tables. The status of each orchestration is stored in the instances table. The history table is where all events are stored; it is used by Durable Functions for orchestrator replay.

Control queues and the work item queue are discussed in a lot of detail in my previous article on how Durable Functions scale. They are used by the Azure Storage implementation to trigger orchestrator and activity functions. Each of the control queues gets two lease blobs to control which instance is reading messages from the queue.

Starting the task hub worker

Now that the durability provider has finished creating the resources it needs to run orchestrations, we can move on to TaskHubWorker.StartAsync. This creates a TaskOrchestrationDispatcher and a TaskActivityDispatcher. These classes seem to be responsible for getting new messages and triggering execution for them. Next, AzureStorageOrchestrationService.StartAsync is called.

AzureStorageOrchestrationService.CreateIfNotExistsAsync gets called again, though this time it won't do anything since the things in Azure Storage were already created earlier.

AzureTableTrackingStore.StartAsync only disables Nagle's algorithm for the history and instances table URIs. There is an older blog article mentioned in the comments of DurableTask that says disabling Nagle's algorithm can greatly improve throughput for table inserts and updates in Azure Storage.
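For reference, disabling Nagle's algorithm for a specific endpoint in .NET can be done through ServicePointManager. The sketch below is my own minimal illustration of that API, not the AzureTableTrackingStore code. The helper name is hypothetical, and it assumes a runtime where ServicePointManager settings apply to the connections being made (for example .NET Framework).

```csharp
using System;
using System.Net;

// Hypothetical helper; illustrates the ServicePointManager API only.
public static class NagleSettings
{
    // Turns off Nagle's algorithm for connections to the given endpoint
    // and returns the ServicePoint that was configured.
    public static ServicePoint DisableNagle(Uri endpoint)
    {
        ServicePoint servicePoint = ServicePointManager.FindServicePoint(endpoint);
        servicePoint.UseNagleAlgorithm = false;
        return servicePoint;
    }
}
```

A call could look like NagleSettings.DisableNagle(new Uri("https://myaccount.table.core.windows.net")), where "myaccount" is a placeholder storage account name.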

The last thing that the orchestration service does is call AppLeaseManager.StartAsync. We will discuss leases in more detail in a future part, but essentially this will start attempting to acquire the app lease. If it is acquired, the partition manager is started.

The resources required by Durable Functions/DurableTask now exist and the listener has been started. When a message is received in a control queue, an orchestration will be started. We will discuss that in more detail in part 2.

Summary

When a Functions app with Durable Functions starts up, two things happen at a high level:

  1. Create the required resources in Azure Storage (blobs, tables, queues)
  2. Start listening for messages

There was quite a bit of code to go through in the two repositories, but the flow here is quite straightforward to follow.

Next time we will look at what happens when an orchestration is started:

[FunctionName("Function1_HttpStart")]
public static async Task<HttpResponseMessage> HttpStart(
    [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post")] HttpRequestMessage req,
    [DurableClient] IDurableClient starter)
{
    string instanceId = await starter.StartNewAsync("Function1", null);
    return starter.CreateCheckStatusResponse(req, instanceId);
}
