Our product Turbo360 is growing rapidly, with a new release every two weeks that adds new features or enhancements. The regular flow of customer feedback and new requirements keeps our team busy while challenging our knowledge of Azure Service Bus and other technologies. It is time to share some of the knowledge we have gained while developing these features to help you operate your Service Bus.
A recent piece of customer feedback asked for the ability to resubmit messages from a queue in bulk and delete them on success. The requirement sounds simple, but several factors make it hard to achieve if you do not know how queues operate and what they make possible.
Challenge with Lock Duration
The first challenge is time, specifically the lock duration of each message once it has been retrieved. The messages must be displayed in the UI, and the customer then needs time to choose which ones to resubmit or delete. One solution would be to ask the customer to increase the lock duration, but this is not possible for several reasons. Firstly, a longer lock duration might not suit the customer's integration (and the maximum is five minutes in any case), so we cannot depend on it.
API Architecture Doesn’t Provide a Way to Retain the Client
Secondly, our API does not keep the Service Bus client alive between requests. Once the API call has returned the JSON array representing the messages, there is no way to return to the same thread, and therefore the same client, when the Resubmit API is called later.
The initial proof of concept (POC) used the Peek() method with PeekLock mode, which, according to the Microsoft documentation, returns messages regardless of their state. The messages were filtered, and only active messages were returned to the UI. The selected messages were then passed to the Resubmit API, which used the same approach to fetch them, compare them and resubmit them. That was the theory, at least: once we started working on resubmitting, we ran into various errors.
This was an important lesson, and one that is not fully documented, or at least not clearly explained. There are two things you need to consider when you decide how to retrieve messages:
- The method to get the messages: either Receive() or Peek()
- The mode in which you get the messages: PeekLock or ReceiveAndDelete
Challenges with Retrieval Operations Provided by the Service Bus Client Library
When you use the Receive() method in PeekLock mode, it returns a number of messages from the head of the queue while leaving them in the queue. Those messages are locked for the lock duration specified in the queue properties, and they remain in the queue until you call Complete() to delete them or Abandon() to release the lock and make them available again. If neither is called before the lock expires, the message becomes available to receivers again. In ReceiveAndDelete mode, by contrast, messages are removed from the queue as soon as they are delivered, so if your process fails or crashes, the messages are gone and there is no way to get them back.
Users of the Service Bus Explorer tool experience this issue and are unable to use it in production environments.
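To make the difference between the two receive modes concrete, here is a toy in-memory model. It is a sketch under stated assumptions, not the Azure Service Bus SDK: `ToyQueue`, `Message`, `receive`, `complete`, `abandon` and `count` are hypothetical names chosen to mirror the concepts described above, and lock expiry is not modelled.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: compare messages by identity, not by field values
class Message:
    sequence_number: int
    body: str
    locked: bool = False


class ToyQueue:
    """Hypothetical in-memory queue; NOT the Azure Service Bus SDK."""

    def __init__(self, bodies):
        self._messages = [Message(i + 1, b) for i, b in enumerate(bodies)]

    def receive(self, max_count, mode="PeekLock"):
        if mode not in ("PeekLock", "ReceiveAndDelete"):
            raise ValueError(f"unknown receive mode: {mode}")
        # Take up to max_count unlocked messages from the head of the queue.
        batch = [m for m in self._messages if not m.locked][:max_count]
        for m in batch:
            if mode == "PeekLock":
                # Stays in the queue but is hidden from other receivers.
                # (A real lock also expires after the lock duration; not modelled here.)
                m.locked = True
            else:
                # ReceiveAndDelete: removed on delivery, so a crash now loses it.
                self._messages.remove(m)
        return batch

    def complete(self, msg):
        self._messages.remove(msg)  # settle the message: delete it from the queue

    def abandon(self, msg):
        msg.locked = False  # release the lock: the message is active again

    def count(self):
        return len(self._messages)
```

For example, receiving two of three messages in PeekLock mode leaves `count()` at 3 until one is completed, whereas the same call in ReceiveAndDelete mode drops the count to 1 immediately.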
The Peek() method, on the other hand, returns messages from the head of the queue regardless of whether they are locked. This is where it gets confusing: using PeekLock mode together with the Peek() method does not lock the messages. This is because Peek() returns only the shell of a message, without the message context that allows you to control it, e.g. lock it, unlock it or delete it. The name PeekLock suggests that something will be locked, but Peek() ignores the receive mode entirely. This is why our initial POC failed and the development team had to look for other ways to achieve the task.
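The following toy model shows why a peeked message cannot be acted upon. It is a hypothetical in-memory sketch, not the Azure SDK: the "message context" is modelled as a `lock_token` field that only `receive` sets, and `complete` refuses any message without a valid token. All names here are illustrative assumptions.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    sequence_number: int
    body: str
    lock_token: Optional[str] = None  # hypothetical stand-in for the message context


class ToyQueue:
    """Hypothetical in-memory queue contrasting peek and receive; NOT the Azure SDK."""

    def __init__(self, bodies):
        self._store = {i + 1: b for i, b in enumerate(bodies)}
        self._locks = {}  # sequence_number -> lock token of the current holder

    def peek(self, max_count):
        # Returns copies whether or not the messages are locked.
        # Nothing is locked and no token is attached, so they cannot be settled.
        items = list(self._store.items())[:max_count]
        return [Message(seq, body) for seq, body in items]

    def receive(self, max_count):
        # PeekLock-style receive: skip locked messages, lock the ones returned.
        result = []
        for seq, body in self._store.items():
            if len(result) == max_count:
                break
            if seq not in self._locks:
                token = str(uuid.uuid4())
                self._locks[seq] = token
                result.append(Message(seq, body, token))
        return result

    def complete(self, msg):
        if msg.lock_token is None or self._locks.get(msg.sequence_number) != msg.lock_token:
            raise ValueError("message has no valid lock; was it peeked?")
        del self._locks[msg.sequence_number]
        del self._store[msg.sequence_number]
```

In this model, after `receive(1)` locks message 1, `peek(2)` still returns messages 1 and 2, but calling `complete` on either peeked copy raises `ValueError`; only the received copy can be completed.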
There was no message context, so resubmitting via the Peek() method was not possible. The next step was to replace the resubmit logic: use the Receive() method to fetch messages and compare their IDs to find the ones we wanted to act upon. However, testing showed that this approach was unreliable, because the results returned by Receive() and Peek() differed too much. It would also take a long time for the process to identify the messages the user wanted to resubmit, especially when there are thousands of them. The last approach was to use the Receive() method to display the messages and unlock them immediately, expecting a second call to return very similar results.
Message retrieval is counted as a transaction, and every transaction is billed
However, this was not the case: after calling Resubmit we got mixed results, and the approach turned out to be unreliable. Furthermore, few people realize that every call to the Service Bus API is charged, so users should use batch methods wherever possible to reduce cost. The other problem with this approach was that unlocking a message requires a call to Abandon(), and there is no batch Abandon method, so for thousands of messages the extra calls could add significantly to the customer's charges.
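A quick count shows how the per-message Abandon() calls dominate. This sketch assumes, as the paragraph above states, that each API call is billed, that Receive() can fetch up to `batch_size` messages per call, and that Abandon() must be called once per message. The function name and the batch size of 100 are hypothetical; no prices are implied.

```python
def abandon_approach_calls(message_count: int, batch_size: int) -> dict:
    """Count API calls to fetch messages in batches and then unlock each one.

    Assumptions (illustrative): one billable operation per API call, batched
    Receive() of up to `batch_size` messages, and no batch Abandon().
    """
    if message_count < 0 or batch_size < 1:
        raise ValueError("message_count must be >= 0 and batch_size >= 1")
    receive_calls = (message_count + batch_size - 1) // batch_size  # ceiling division
    abandon_calls = message_count  # no batch Abandon: one call per message
    return {
        "receive": receive_calls,
        "abandon": abandon_calls,
        "total": receive_calls + abandon_calls,
    }


print(abandon_approach_calls(10_000, 100))
# {'receive': 100, 'abandon': 10000, 'total': 10100}
```

Under these assumptions, unlocking 10,000 messages costs 100 times more calls than fetching them, and the whole round trip has to be repeated every time the UI reloads the list.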
Defer a Message for easy retrieval
The development team had reached a dead end and the idea was about to be dropped, but then IntelliSense came to the rescue. The Receive() method can accept the sequence number of the message you want to receive, which is exactly what was required to complete the task. However, the POC showed that Receive() returned nothing when given a sequence number. Further research revealed what is needed for the receiver to return a message by its sequence number: this overload is used to fetch messages that are in the deferred state. The Defer() method lets you set a message aside. It is useful when you have received a message but the process is not ready to complete it, e.g. while waiting for the result of another API call; deferring the message suspends it from further processing until later. Note that you must save the message's sequence number and store it safely, as it is the only way to get the message back. Once a message is deferred, the receiver will not return it unless the process supplies the corresponding sequence number.
Once a Message is deferred, it can only be retrieved using a Sequence Number
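The deferral rules can be sketched with another toy in-memory model. This is hypothetical and not the Azure SDK: `ToyQueue`, `defer` and `receive_deferred` are illustrative names, and locking is omitted for brevity. It shows the two properties that matter here: a normal receive skips deferred messages, and receiving by sequence number only works for a message that has been deferred.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Message:
    sequence_number: int
    body: str
    state: str = "Active"  # "Active" or "Deferred"


class ToyQueue:
    """Hypothetical in-memory illustration of deferral; NOT the Azure SDK."""

    def __init__(self, bodies):
        self._messages = {i + 1: Message(i + 1, b) for i, b in enumerate(bodies)}

    def receive(self, max_count):
        # A normal receive only sees active messages; deferred ones are skipped.
        # (Locking is omitted to keep the sketch short.)
        active = [m for m in self._messages.values() if m.state == "Active"]
        return active[:max_count]

    def defer(self, msg):
        msg.state = "Deferred"  # the caller must keep msg.sequence_number

    def receive_deferred(self, sequence_number):
        # The only way back to a deferred message is its sequence number.
        # Like the POC observed, nothing is returned for a non-deferred message.
        msg = self._messages.get(sequence_number)
        if msg is None or msg.state != "Deferred":
            return None
        return msg
```

Usage: after `first = q.receive(1)[0]` on a queue holding `"a"`, `"b"` and `"c"`, saving `first.sequence_number` and calling `q.defer(first)` makes later `receive` calls return only `"b"` and `"c"`, while `q.receive_deferred(saved)` still returns `"a"`.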
Finally, this was the approved way of resubmitting and deleting messages from the queue. Turbo360 users make a deliberate decision, through our UI, to defer messages so that they can later resubmit or delete them safely. The best part is that if the browser crashes or the internal process fails for any reason after the messages have been deferred, the user can still access those messages through the “Deferred Dead Letter” tab in the operations section of Turbo360.
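The defer-then-act workflow could look roughly like this in a toy in-memory model. It is a hypothetical sketch, not Turbo360's implementation and not the Azure SDK: `ToyQueue`, `resubmit` and `delete` are illustrative names, and "resubmit" is assumed to mean sending a copy of the deferred message and then completing the original.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Message:
    sequence_number: int
    body: str
    deferred: bool = False


class ToyQueue:
    """Hypothetical in-memory sketch of defer-then-resubmit/delete; NOT the Azure SDK."""

    def __init__(self):
        self._messages = {}
        self._next_seq = 1

    def send(self, body):
        self._messages[self._next_seq] = Message(self._next_seq, body)
        self._next_seq += 1  # every new message, including a copy, gets a new number

    def defer(self, sequence_numbers):
        for seq in sequence_numbers:
            self._messages[seq].deferred = True
        return list(sequence_numbers)  # the caller must persist these numbers

    def receive_deferred(self, seq):
        msg = self._messages.get(seq)
        return msg if msg is not None and msg.deferred else None

    def complete(self, msg):
        del self._messages[msg.sequence_number]

    def bodies(self):
        return [m.body for m in self._messages.values()]


def resubmit(queue, saved_sequence_numbers):
    """Send a copy of each deferred message, then complete the original."""
    for seq in saved_sequence_numbers:
        msg = queue.receive_deferred(seq)
        if msg is None:
            continue  # already handled, e.g. by an earlier interrupted attempt
        queue.send(msg.body)
        queue.complete(msg)


def delete(queue, saved_sequence_numbers):
    """Complete (remove) each deferred message without resubmitting it."""
    for seq in saved_sequence_numbers:
        msg = queue.receive_deferred(seq)
        if msg is not None:
            queue.complete(msg)
```

Sending the copy before completing the original means a crash in between could leave a duplicate, but never loses a message; and because the saved sequence numbers survive a crash, `resubmit` can simply be run again, skipping anything already handled.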