MessageLockLostException in Azure WebJobs

Microsoft Azure is a strange beast. And no part of it is stranger than the Azure Service Bus. It’s possible to experience a MessageLockLostException if your WebJob isn’t handling the lock on the message correctly, even when the application is performing as expected within the debugger.

The Setup

In the case I was experiencing, the original developer opted to use the auto-magical ServiceBusTrigger to listen for incoming messages on the service bus. They had reasonably assumed that it takes care of managing message locks. In fact, until now, the service had worked for a long time without any problems. So the message was being deserialized and consumed much like the following code:

public static void ReceiveMessage([ServiceBusTrigger("myqueue")] BrokeredMessage receivedMessage)
{
	MyMessageType message = null;

	try
	{
		message = receivedMessage.GetBody<MyMessageType>();

		if (message == null)
		{
			throw new NullReferenceException("Message body returned null MyMessageType object");
		}
	}
	catch (Exception e)
	{
		Trace.TraceError($"Unable to deserialize queue item to MyMessageType: {e}");

		receivedMessage.DeadLetter();
	}

	if (message == null)
	{
		return;
	}
	
	try
	{
		Trace.TraceInformation($"Received message with ID: {receivedMessage.MessageId}");

		HandleMessageContent();
		
		Trace.TraceInformation($"Done with message with ID: {receivedMessage.MessageId}");
	}
	catch (Exception e)
	{
		Trace.TraceError($"Error trying to handle message: {e}");
		
		receivedMessage.DeadLetter();
	}
	finally
	{
		receivedMessage.Complete();
	}
}

Unfortunately, this stopped working in our QA environment and in every new environment we set up afterwards. Out of the blue, every message sent to the queue failed and went straight to the dead-letter queue with:

Microsoft.ServiceBus.Messaging.MessageLockLostException: The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue.

The Fix

I spent some time searching the forums for help. I couldn’t find much that addressed this problem directly, but one answer, to a question about deleting a message that had hit a MessageLockLostException, made me think that actively renewing the message lock while the handler runs would probably resolve the issue.

Here’s what I ended up doing. Now we can send as many messages through the queue as we like, and MessageLockLostException is a thing of the past!

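// Despite the name, this is how often (in ms) the background task checks whether the lock needs renewing.
private const int MESSAGE_LOCK_TIME_OUT_IN_MS = 500;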

public static void ReceiveMessage([ServiceBusTrigger("myqueue")] BrokeredMessage receivedMessage)
{
	Trace.TraceInformation("Received message event trigger from \"myqueue\"");
	Trace.TraceInformation($"Received message with ID: {receivedMessage.MessageId}");

	var renewMessageCancellationTokenSource = new CancellationTokenSource();
	RenewMessageLock(receivedMessage, MESSAGE_LOCK_TIME_OUT_IN_MS, renewMessageCancellationTokenSource);

	try
	{
		var message = receivedMessage.GetBody<MyMessageType>();

		if (message == null)
		{
			throw new NullReferenceException("Message body returned null MyMessageType object");
		}

		HandleMessageContent();
		
		Trace.TraceInformation($"Done with message with ID: {receivedMessage.MessageId}");

		receivedMessage.Complete();
	}
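	catch (MessageLockLostException)
	{
		Trace.TraceWarning($"Message lock lost on {receivedMessage.MessageId}");

		// The lock is already gone, so there is nothing to release here. If it simply
		// expired, Service Bus makes the message available again and redelivers it
		// until the queue's MaxDeliveryCount is reached.
	}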
	catch (Exception e)
	{
		Trace.TraceError($"Error trying to sync entities: {e}");
		
		receivedMessage.DeadLetter();
	}
	finally
	{
		renewMessageCancellationTokenSource.Cancel();
	}
}

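private static void RenewMessageLock(BrokeredMessage brokeredMessage, int messageLockTimeoutInMs, CancellationTokenSource cancellationTokenSource)
{
	Trace.TraceInformation($"Checking message lock every {messageLockTimeoutInMs} ms");

	// Fire-and-forget background loop; it ends when the caller cancels the token.
	Task.Factory.StartNew(() =>
	{
		try
		{
			while (!cancellationTokenSource.Token.IsCancellationRequested)
			{
				// Renew once the lock is within 10 seconds of expiring.
				if (DateTime.UtcNow > brokeredMessage.LockedUntilUtc.AddSeconds(-10))
				{
					brokeredMessage.RenewLock();
				}

				Thread.Sleep(messageLockTimeoutInMs);
			}
		}
		catch (Exception e)
		{
			// Without this, a failed renewal would surface as an unobserved task exception.
			Trace.TraceWarning($"Stopped renewing message lock: {e}");
		}
	}, cancellationTokenSource.Token);
}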
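
The RenewMessageLock loop above holds a thread-pool thread in Thread.Sleep between checks. A non-blocking variant is possible; the following is only a rough sketch, not what is running in our WebJob. The names LockRenewal and RenewUntilCancelledAsync are made up for illustration, and the loop takes delegates instead of a BrokeredMessage so that it can be exercised without a Service Bus connection.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LockRenewal
{
    // Hypothetical helper: polls until the token is cancelled and calls
    // renewLockAsync whenever the lock is within `margin` of expiring.
    public static async Task RenewUntilCancelledAsync(
        Func<DateTime> getLockedUntilUtc,
        Func<Task> renewLockAsync,
        TimeSpan pollInterval,
        TimeSpan margin,
        CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                if (DateTime.UtcNow > getLockedUntilUtc() - margin)
                {
                    await renewLockAsync().ConfigureAwait(false);
                }

                // Unlike Thread.Sleep, this frees the thread and wakes up
                // immediately when the token is cancelled.
                await Task.Delay(pollInterval, token).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException)
        {
            // Expected: the handler finished and cancelled the token.
        }
    }
}
```

With a BrokeredMessage, the two delegates would presumably be () => receivedMessage.LockedUntilUtc and () => receivedMessage.RenewLockAsync(), with the same 500 ms interval and 10 second margin as above. Any exception from a renewal ends the loop and is stored on the returned task, so the caller should await or otherwise observe that task after cancelling.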

2 Replies to “MessageLockLostException in Azure WebJobs”

  1. A few potential performance items:
    1. Switch from public static void ReceiveMessage( to public static async Task ReceiveMessage( so that you can use the async overloads of the ASB SDK.
    2. You might want to replace Thread.Sleep(messageLockTimeoutInMs); with await Task.Delay(messageLockTimeoutInMs);

    Just curious, why did you DLQ your message on a MessageLockLostException? Messages can be delivered several times, and until DeliveryCount is exhausted a message shouldn’t be DLQed.

    1. Thank you for your feedback, Sean. All tips are thoroughly welcome and appreciated!

      In response to your question on why I deadletter the message, there is an issue of synchronicity with our systems. The queue is being used to propagate data from system A to system B. Each action undertaken on system A is represented as an atomic unit and persisted across the service bus. An initial action to create a record on system B might be followed up by an action to update that same record. However, if the record failed to be created due to a lost message lock, the subsequent update will fail, but for potentially an entirely different reason (the data does not exist to update).

      I only inherited this code recently and was looking to resolve the immediate problem with the MessageLockLostException. I haven’t yet delved too deeply into the specifics of how the service bus retries message delivery, but any retry has to take this dependency between events into account. We have a monitoring process right now that can detect when the data is out of sync and rectify it before it becomes a problem. It’s less than ideal, but this system-to-system process is transient and is being phased out in the coming months by an architecture that better suits our needs.

      You do make a good point, though: it’s not a good example to promote, as it’s likely not what somebody wants to do. I’ve modified the code to take this into account.
