Be on top of Azure Service Bus issues with proactive monitoring

Role of Azure Service Bus in Enterprise Integrations

Gone are the days when large applications ran on tens of servers handling gigabytes of data, and when response times of seconds and hours of offline maintenance were acceptable. Modern applications run on infrastructure spanning thousands of multi-core processors; end users expect millisecond response times and 100% uptime. Not to mention that these applications work with data measured in petabytes.

The internet has become the lifeline connecting people, technologies, applications, devices, and data across the globe. We now have many distributed applications that can reach anywhere and use any amount of data to improve business efficiency, productivity, and, ultimately, customer satisfaction. Modern applications often turn out to be integrations of multiple distributed applications.

Achieving seamless communication between these distributed applications is critical. Granted, APIs do the job, but how do you ensure asynchronous communication in scenarios that involve a 24×7 business service? What if your message sender and receiver aren’t available at the same time? Azure Service Bus is the savior!

Microsoft Azure Service Bus is a highly scalable and reliable Enterprise Messaging Service that can connect software, including cloud applications, on-premises applications, and Azure services.

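Let us consider a simple e-commerce scenario to understand the significance of Azure Service Bus in an integration: a customer-facing website compiles each submitted order into a message and pushes it into a Service Bus Queue, and a Logic App listening to the Queue processes the orders.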

This message-driven integration achieves the following:

Asynchronous message passing: Irrespective of variations in the incoming traffic, the customer-facing website should deliver the expected experience. To meet this expectation, every submitted order is processed asynchronously. In this scenario, the website’s only responsibility is to compile the order into a message and push it into the Service Bus Queue. The messages are processed asynchronously by the Logic App listening to the Queue.
Loosely coupled components: The Service Bus Queue decouples the website from the backend order-processing Logic App. They need not be online at the same time for incoming orders to be processed. This design also keeps failures contained within individual components: even if the Logic App is down, the website can continue to receive orders. Once the Logic App is up and running again, it processes the pending orders, making the entire integration reliable.
Delegate failures as messages: Service Bus has a secondary sub-queue called the Dead-letter Queue that holds messages that cannot be delivered to any receiver or cannot be processed. In the above scenario, if the Logic App processes an order and finds that the ordered item is out of stock, it can dead-letter the message with the custom dead-letter reason “Item_out_of_Stock.” This can trigger a follow-up action to initiate reordering; the failure is delegated as a message.
Reliability & Resilience: Azure Service Bus, in this scenario, ensures that messages aren’t lost even if communication fails between the website and the backend processor. The website can post messages to the Queue, and the Logic App can retrieve them when communication is reestablished. The website isn’t blocked unless it loses connectivity with the Service Bus Queue. Persisting the message in the Queue also enables resilience. If the Logic App fails while processing the message, another consumer can process the message.
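To make the decoupling concrete, here is a minimal in-memory sketch. It is not the Azure SDK. `InMemoryQueue`, `website_submit_order`, and `processor_drain` are hypothetical names that only model how the website keeps accepting orders while the processor is down, and how the processor later works through the backlog in arrival order.

```python
from collections import deque


class InMemoryQueue:
    """Toy stand-in for a Service Bus Queue: messages persist until received."""

    def __init__(self):
        self._messages = deque()

    def send(self, body):
        self._messages.append(body)

    def receive_all(self):
        # Hand over every pending message in arrival (FIFO) order.
        received = list(self._messages)
        self._messages.clear()
        return received


def website_submit_order(queue, order_id, item):
    # The website only compiles the order into a message and enqueues it;
    # it never waits for the backend processor to be online.
    queue.send({"order_id": order_id, "item": item})


def processor_drain(queue):
    # The backend processor (the Logic App in the scenario) works through
    # whatever accumulated while it was unavailable.
    return [message["order_id"] for message in queue.receive_all()]


orders = InMemoryQueue()
# The processor is down, yet the website keeps accepting orders.
website_submit_order(orders, "ORD-1", "laptop")
website_submit_order(orders, "ORD-2", "mouse")
# Once the processor is back, it processes the backlog in arrival order.
print(processor_drain(orders))  # ['ORD-1', 'ORD-2']
```

The real service adds durability, locking, and retries that this toy leaves out. The point is only that the sender and receiver never call each other directly.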

Service Bus features that fit into the business

In my experience with Enterprises across business domains, I have observed that Azure Service Bus is indispensable in Enterprise integration applications. I also found that the features of Service Bus Queues and Topics map well to business needs:

Enabling sessions on a Service Bus Queue or Topic Subscription guarantees ordered processing of related messages. When processing Order Line Items, set the SessionId to the OrderId so that all Line Items of an Order are processed together, in the order of their arrival.
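With the azure-servicebus Python package, the sender would set `session_id` on each `ServiceBusMessage`. The sketch below does not talk to Service Bus. `group_by_session` is a hypothetical helper that models the resulting guarantee: messages are grouped by SessionId (the OrderId here) and keep their arrival order within each group. Session locking across receivers is not modeled.

```python
from collections import defaultdict


def group_by_session(messages):
    """Toy model of session-ordered delivery: group messages by session_id
    (the OrderId here), keeping arrival order inside each session."""
    sessions = defaultdict(list)
    for message in messages:
        session_id = message.get("session_id")
        if session_id is None:
            # This sketch rejects messages without a session id, mirroring the
            # rule that a session-enabled entity expects a SessionId on every message.
            raise ValueError("message has no session_id")
        sessions[session_id].append(message["line_item"])
    return dict(sessions)


# Line items of two orders arrive interleaved.
incoming = [
    {"session_id": "ORD-1", "line_item": "ORD-1/line-1"},
    {"session_id": "ORD-2", "line_item": "ORD-2/line-1"},
    {"session_id": "ORD-1", "line_item": "ORD-1/line-2"},
]
print(group_by_session(incoming))
# {'ORD-1': ['ORD-1/line-1', 'ORD-1/line-2'], 'ORD-2': ['ORD-2/line-1']}
```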

A common business requirement, ignoring duplicate orders received within a specified time frame, can be met through simple property configuration on Service Bus Queues and Topics: turn on ‘Duplicate Detection’ and configure the ‘Duplicate Detection Timeframe Window.’ Service Bus identifies duplicates by the MessageId, so the sender should set it to a business key such as the OrderId.
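The following is a toy model of the idea, not the service’s implementation. `DuplicateDetector` is a hypothetical class, and the 10-minute window is an arbitrary assumption. With the azure-servicebus package, the sender would pass the OrderId as `message_id` to `ServiceBusMessage` and let the broker do the filtering.

```python
from datetime import datetime, timedelta


class DuplicateDetector:
    """Toy model of duplicate detection: a message whose MessageId was already
    accepted within the history time window is dropped."""

    def __init__(self, window):
        self.window = window
        self._first_seen = {}  # MessageId -> time it was first accepted

    def accept(self, message_id, now):
        first_seen = self._first_seen.get(message_id)
        if first_seen is not None and now - first_seen < self.window:
            return False  # duplicate inside the window: ignored
        self._first_seen[message_id] = now
        return True


detector = DuplicateDetector(window=timedelta(minutes=10))
t0 = datetime(2024, 1, 1, 12, 0)
print(detector.accept("ORD-1", t0))                          # True
print(detector.accept("ORD-1", t0 + timedelta(minutes=2)))   # False (duplicate)
print(detector.accept("ORD-1", t0 + timedelta(minutes=11)))  # True (window elapsed)
```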

Failures in business are inevitable, be they expected business exceptions or unprecedented system exceptions, and Service Bus helps handle them effectively. The dead-letter sub-queue of a Service Bus Queue or Topic Subscription holds messages that cannot be processed right away, so they can be investigated and reprocessed later. In this e-commerce scenario, it helps manage orders that cannot be processed for business reasons like ‘Item out of stock’ or ‘Vendor inactive.’ System conditions that push messages into the dead-letter queue include ‘Time to live expiry,’ ‘Session id is null,’ and ‘Message header size exceeded.’
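Against the real service, the azure-servicebus receiver offers `dead_letter_message(message, reason=..., error_description=...)` for business failures like these. The sketch below keeps things runnable by using a plain dict for stock and a list for the dead-letter sub-queue. `process_order` is a hypothetical consumer, and the stock data is made up for illustration.

```python
def process_order(order, stock, dead_letters):
    """Toy consumer: complete the order, or dead-letter it with a custom reason
    when the ordered item is out of stock."""
    if stock.get(order["item"], 0) <= 0:
        # Mirrors the DeadLetterReason property that Service Bus attaches to a
        # dead-lettered message; the reason string is the business code.
        dead_letters.append({**order, "DeadLetterReason": "Item_out_of_Stock"})
        return "dead-lettered"
    stock[order["item"]] -= 1
    return "completed"


stock = {"laptop": 0, "mouse": 5}
dead_letter_queue = []
print(process_order({"order_id": "ORD-1", "item": "laptop"}, stock, dead_letter_queue))  # dead-lettered
print(process_order({"order_id": "ORD-2", "item": "mouse"}, stock, dead_letter_queue))   # completed
print(dead_letter_queue[0]["DeadLetterReason"])  # Item_out_of_Stock
```

Because the reason travels with the message, a separate reorder or investigation process can filter the dead-letter queue by reason without re-running the business check.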

I recommend this blog to explore a few other valuable features that Azure Service Bus offers.

Why should we monitor Service Bus?

Dead Letters

A message in a Service Bus Queue can end up in the dead-letter Queue for several reasons, as discussed in the section above. The dead-letter Queue holds messages that can’t be delivered to any receiver or can’t be processed. These messages can then be removed from the DLQ and inspected. With the help of an operator, an application might correct the issues and resubmit the message, log the error, and take corrective action.

Mapping this to our e-commerce scenario, every dead-letter message is an unprocessed order that needs to be investigated and reprocessed. It is critical for the business to keep an eye on dead-letter messages.
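A simple way to watch dead letters is to poll the counts periodically and alert on growth. In the azure-servicebus package, `ServiceBusAdministrationClient.get_queue_runtime_properties(...)` and `get_subscription_runtime_properties(...)` expose a `dead_letter_message_count`. The function below only compares two snapshots. `new_dead_letters` and the entity names are hypothetical.

```python
def new_dead_letters(previous, current):
    """Compare two polls of dead-letter counts (entity path -> count) and
    return the entities whose count grew, with the size of the increase."""
    return {
        entity: count - previous.get(entity, 0)
        for entity, count in current.items()
        if count > previous.get(entity, 0)
    }


last_poll = {"orders": 3, "payments": 0}
this_poll = {"orders": 5, "payments": 0, "orders-topic/subscriptions/billing": 1}
print(new_dead_letters(last_poll, this_poll))
# {'orders': 2, 'orders-topic/subscriptions/billing': 1}
```

Alerting on the increase rather than on the absolute count avoids re-alerting on messages that are already under investigation.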

Message stagnation

Assume the Logic App listening to the Queue in the e-commerce scenario above is down; active messages will then stagnate in the Queue. If the Queue has the ‘Auto Delete on Idle’ property set, these messages, which are actual incoming orders, could be lost. It is necessary to monitor the active message count on the Service Bus Queue or Topic Subscription.
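One rough heuristic for stagnation is an active message count that is non-zero and never goes down across several consecutive polls. The runtime properties in azure-servicebus expose `active_message_count` for such polling. `is_stagnating` and its default of three polls are illustrative assumptions, not a Service Bus feature.

```python
from typing import Sequence


def is_stagnating(active_counts: Sequence[int], polls: int = 3) -> bool:
    """Flag stagnation when the active message count over the last `polls`
    polls is non-zero and never decreases, i.e. nothing appears consumed."""
    if len(active_counts) < polls:
        return False  # not enough history to judge
    recent = list(active_counts[-polls:])
    never_drops = all(later >= earlier for earlier, later in zip(recent, recent[1:]))
    return recent[0] > 0 and never_drops


print(is_stagnating([10, 12, 15]))  # True: backlog keeps growing
print(is_stagnating([10, 4, 6]))    # False: messages are being consumed
print(is_stagnating([0, 0, 0]))     # False: queue is empty
```

This heuristic also fires when a consumer is running but cannot keep up with inflow. That case is arguably worth an alert too.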

Server Errors

Azure Service Bus is a reliable, enterprise-grade messaging service; still, it is necessary to monitor server errors on the service to catch issues such as outages in the region where the Service Bus resource exists.

User Errors

This interesting metric helped me discover thousands of exceptions raised by the client application accessing the Service Bus Queues in one of my customers’ integrations. I also shared my findings on User errors in a StackOverflow response; do check it out.

Size

The maximum size of a Queue or Topic (1, 2, 3, 4, or 5 GB) is defined on creation or update. Once that limit is reached, subsequent incoming messages are rejected, and the sending application receives an exception. It is necessary to monitor the Queue size to prevent message loss and exceptions at the user-facing website.
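To get warned before senders start failing, compare the current size with the configured maximum. In azure-servicebus, `size_in_bytes` is available on the runtime properties, and `max_size_in_megabytes` is available on `QueueProperties`. The helper names and the 80% threshold below are assumptions chosen for illustration.

```python
def size_usage_percent(size_in_bytes, max_size_in_megabytes):
    """Percentage of the entity's configured maximum size currently in use."""
    return 100.0 * size_in_bytes / (max_size_in_megabytes * 1024 * 1024)


def size_alert(size_in_bytes, max_size_in_megabytes, threshold_percent=80.0):
    """True once usage crosses the threshold, leaving headroom before
    senders start getting rejected."""
    return size_usage_percent(size_in_bytes, max_size_in_megabytes) >= threshold_percent


one_gb = 1024  # a 1 GB queue, expressed in megabytes
print(size_usage_percent(512 * 1024 * 1024, one_gb))  # 50.0
print(size_alert(900 * 1024 * 1024, one_gb))          # True
print(size_alert(100 * 1024 * 1024, one_gb))          # False
```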

These are a few major reasons to focus on monitoring Azure Service Bus.

How can I be on top of the Service Bus issues?

Azure Monitor can monitor the metrics of Azure Service Bus Queues and Topics, but there are a few gaps that Turbo360 can address. Microsoft acknowledges Turbo360 as a Service Bus management tool on its documentation page. Let me shed light on some of the critical features in Turbo360 for achieving observability and remediation of Service Bus issues.

Continuous Monitoring

Azure Monitor is limited to monitoring the metric information of Service Bus Queues and Topics. In real-world operations, there is also a need to monitor the status and current values of critical properties like Message Count, Active Message Count, and Dead-letter Message Count. Turbo360 monitors the status, properties, and metrics of Service Bus Queues, Topics, and Topic Subscriptions, providing complete coverage.

Turbo360 is the only tool that offers to monitor the message count and dead-letter message count of a Topic Subscription

You can also set up real-time dashboards on the key performance indicators of Azure Service Bus Queues and Topics and stay on top of them.

Observability

Below is the view of the Service Bus resources in the Azure portal:

From this view, it is hard to infer the current status of a Service Bus Queue or Topic from the perspectives discussed above, such as message count, dead-letter message count, throttled messages, and the Queue running out of space.

Turbo360 adds value here through its Service Map feature, which provides advanced observability. Visualize the Service Bus, the sender, and the receiver resources to depict the entire application, and spot the instant status of each resource along with the details behind any failure.

Operational Toolset

Application Performance Monitoring tools like Dynatrace and AppDynamics stop at identifying a failure, whereas Turbo360 also offers an operational toolset to remediate it. The message processing feature in Turbo360 provides an interface to view and process active and dead-letter messages. If a message ended up in the dead-letter queue due to ‘Vendor Inactive,’ the remediation would be to repair the message, assign the order to a different vendor, and resubmit it to the same Queue for the Logic App to process.

Any APM stops at detecting issues; Turbo360 goes beyond APM by letting you Visualize, Spot, and Fix an issue

This manual message processing interface is a time saver for processing messages without writing code. But what if the scenario demands processing a massive volume of dead-letter messages? For example, suppose there was an outage in the backend processing application over the weekend. Many incoming messages ended up in the dead-letter Queue because they were not processed within the configured time to live (with dead-lettering on message expiration enabled). The backend processor is now restored and ready to receive and process incoming messages. The operational requirement is to resubmit the entire set of dead-letter messages to the original Queue so that the backend processor can pick them up and process them. Doing this manually for a considerable volume, say 30,000 messages, is practically impossible. Turbo360 addresses this need with ease through its Automated Task configuration.
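Conceptually, a bulk resubmission task sends dead-lettered messages back to the original queue in batches. The sketch below illustrates that idea only. It is not Turbo360’s implementation. `resubmit_in_batches` is hypothetical, `send_batch` is a hypothetical callback (an azure-servicebus `ServiceBusSender.send_messages` call could play that role), and the batch size of 100 is arbitrary.

```python
def resubmit_in_batches(dead_letters, send_batch, batch_size=100):
    """Send dead-lettered messages back to the original queue in fixed-size
    batches and return how many were resubmitted. Dead-letter metadata is
    dropped so the resubmitted copy looks like a fresh message."""
    resubmitted = 0
    for start in range(0, len(dead_letters), batch_size):
        batch = [
            {key: value for key, value in message.items()
             if key not in ("DeadLetterReason", "DeadLetterErrorDescription")}
            for message in dead_letters[start:start + batch_size]
        ]
        send_batch(batch)  # e.g. a wrapper around the original queue's sender
        resubmitted += len(batch)
    return resubmitted


sent_batches = []
backlog = [{"order_id": f"ORD-{n}", "DeadLetterReason": "TTLExpiredException"}
           for n in range(30000)]
print(resubmit_in_batches(backlog, sent_batches.append))  # 30000
print(len(sent_batches))                                  # 300
```

In a real run, each dead-lettered message should be completed (removed from the DLQ) only after its copy has been sent successfully, so a failure midway does not lose orders.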

90% of the production issues are functional errors, 60% of which can be automated using Turbo360 Automated Tasks

Advanced Security

When a team of users works with your Azure Service Bus Queues and Topics, sharing secure access to the resources is critical. The challenge with tools like Service Bus Explorer is that they require sharing the namespace connection string with Manage claims, which is not advisable. The Azure portal supports role-based access control, but it is complex to manage. Turbo360 addresses these challenges by facilitating access to the Service Bus resources through a Service Principal. The admin can associate the help. User access management in Turbo360 allows defining custom roles with granular user access policies.

Every action on the Service Bus Queues and Topics gets governed and audited. ‘Who’ did ‘what’ on ‘which’ resource, ‘when’ can be gathered from every action tracked.

Conclusion

Azure Service Bus Queues and Topics play a critical role in decoupling application components and building reliable Enterprise Integrations. To stay on top of the issues Service Bus might encounter, defining the support strategy with proactive monitoring and automated remediation is necessary. Turbo360 is the solution that can enable your Operations team to visualize the entire application, spot any issues, and remediate them even before the business realizes a failure. Several Enterprises across the globe from various business domains like Media & Entertainment, Logistics, Health Care, and Government are finding value in using Turbo360 for supporting their Azure Integrations. Sign up for a free trial and reap the benefits!