What’s a message?
Applications and systems communicate with each other by passing information back and forth. Information is passed as messages. A message is no more than packaged information: a data record passed from one party, usually known as the sender or producer, to one or more parties, usually called receivers or consumers.
There are different types of messages, with very different intents and usage patterns, for example, commands and events. Regardless of the type, all messages have a few things in common.
Common message attributes
Data is the whole reason a message is created and sent, whether it instructs to perform an action such as “CreateUser”, carrying user details, or notifies about a Storage blob that was modified, carrying blob information. The data is also known as the message body. When one system packages information and passes it to another system as a message, having just a body is not enough. Consider a scenario where a document written in Afrikaans is sent to China. While the message will make it, if the recipient of the message is not aware of the language, it will not be able to understand it. A hint about what language is used in the message is not part of the data that was originally sent.
This is where metadata comes in. Metadata, also known as message properties or headers, is supplemental information. This information can describe the message data to allow interpretation and processing on the receiving side. Going back to the document originally written in Afrikaans, metadata could include a header expressing what language the message was written in, utilizing ISO 639, the internationally accepted nomenclature for classifying languages. The message could have the following metadata:
{ "language": "afr" }
For systems with different message serialization types, it is crucial to indicate the serialization in the metadata. .NET applications sharing message contracts can benefit from metadata that allows message deserialization without the need to discover the message type. Azure Service Bus, for example, has a system property called “ContentType” specifically designed for that.
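As a minimal sketch of attaching both kinds of metadata with the Azure.Messaging.ServiceBus client (the connection string and the documents queue name are placeholders):

```csharp
using Azure.Messaging.ServiceBus;

// Placeholders: replace with your own connection string and queue name.
await using var client = new ServiceBusClient("<service-bus-connection-string>");
ServiceBusSender sender = client.CreateSender("documents");

var message = new ServiceBusMessage("{ \"text\": \"Goeie môre\" }")
{
    // System property describing the body's serialization.
    ContentType = "application/json"
};
// Custom header carrying the ISO 639 language code.
message.ApplicationProperties["language"] = "afr";

await sender.SendMessageAsync(message);
```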
How big can your message be?
The last, but not least, common attribute of all messages is the size limit. It varies from system to system and from broker to broker, but it’s always defined, and sooner or later it reminds you about itself at the most inconvenient time. While the rule of thumb is to keep messages as small as possible, when building systems that rely on messaging, understanding the message size quota permitted by the underlying infrastructure is vital. And different technologies have different limits. A developer using MSMQ on-premises can send messages up to 4 MB in size, while an Azure developer using Storage Queues will be forced to fit everything into 64 KB.
The same Azure developer choosing to use Service Bus instead will benefit from a whopping 256 KB when using the Standard tier and a cosmic 1 MB when using the Premium tier. If your messages adhere to the core messaging principles and are a few kilobytes in size, there’s nothing to worry about. Though it’s not always possible to stick to the pure approach, and the force of reality can have its say in how big messages will be.
So how do you handle large Azure Service Bus messages?
Reduce Azure Service Bus message size
The simplest approach is to reduce the message size. Review anything that is not necessary and remove it.
For example, let’s look at the following ChargeCustomer message:
{
  "CustomerId": "1234567890",
  "OrderId": "abcdefgh",
  …
  "SelectedProducts": [
    {
      "ProductId": "product01",
      "ProductDescription": "description",
      "ProductRatings": [ … ]
    },
    {
      "ProductId": "product02",
      "ProductDescription": "description",
      "ProductRatings": [ … ]
    },
    …
  ]
}
At first glance, this looks like a valid message. Looking closer, this message is intended to instruct the billing service to charge the customer for a purchase. The message contains the minimum required information: customer and order IDs. And yet it also contains information about the purchased products: product ID, description, and ratings. That information is most likely available by querying the order using the order ID, making it redundant and unnecessary for this message. Not only is this information unnecessary, but it could easily cause the message to surpass its maximum size and fail to be dispatched. Take a product such as Fitbit Charge 2 on Amazon. It “only” has over 16 thousand reviews. Taking just 50 characters from each of these review titles would end up consuming 822 KB.
Carefully examine what data goes into the message body. Information that is not needed for immediate message processing, or that is optional and can be retrieved by querying when needed, should not be added.
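Trimmed down to what the billing service actually needs, the ChargeCustomer message above could look like this:

```json
{
  "CustomerId": "1234567890",
  "OrderId": "abcdefgh"
}
```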
Use optimal serialization
Another aspect to keep in mind is what serialization is used. The most size-efficient serialization is binary. But in many systems the message body needs to be human-readable, which rules binary serialization out. Text-based serializers are not created equal, though. Let’s compare the same message serialized using XML and JSON serializers.
{
  "Id": "ab123",
  "Firstname": "John",
  "LastName": "Doe",
  "Address": "address",
  "ZipCode": "90210",
  "State": "CA",
  "Country": "USA"
}
<root>
  <Address>address</Address>
  <Country>USA</Country>
  <Firstname>John</Firstname>
  <Id>ab123</Id>
  <LastName>Doe</LastName>
  <State>CA</State>
  <ZipCode>90210</ZipCode>
</root>
From a quick observation, it’s noticeable that XML serialization is less desirable: the opening and closing tags add overhead to every single attribute. For example, when using Storage Queues, the maximum message size of 64 KB can be reached in no time when XML serialization is used. Therefore, whenever possible, use a serialization that suits the business needs and does not inflate the message size.
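As a rough way to verify this for your own contracts, a sketch like the following (the Customer class mirrors the example above; any serializers of choice would do) serializes the same object both ways and compares the byte counts:

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Xml.Serialization;

public class Customer
{
    public string Id { get; set; }
    public string Firstname { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
    public string ZipCode { get; set; }
    public string State { get; set; }
    public string Country { get; set; }
}

public static class SerializationSizeCheck
{
    public static void Main()
    {
        var customer = new Customer
        {
            Id = "ab123", Firstname = "John", LastName = "Doe",
            Address = "address", ZipCode = "90210", State = "CA", Country = "USA"
        };

        // JSON: serialize straight to UTF-8 bytes.
        byte[] json = JsonSerializer.SerializeToUtf8Bytes(customer);

        // XML: serialize to a memory stream and measure its length.
        using var stream = new MemoryStream();
        new XmlSerializer(typeof(Customer)).Serialize(stream, customer);

        Console.WriteLine($"JSON: {json.Length} bytes, XML: {stream.Length} bytes");
    }
}
```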
Use claim check pattern
Binary data is everywhere:
- PDF files associated with procurement process
- X-Ray image files linked to a patient’s dental visit
- Home inspector’s report with digital images
There are no arguments about the convenience and ease of sending messages along with binary files packaged in a single envelope. While it’s convenient, it takes a toll on message size and overall messaging performance. Imagine a workflow that passes binary data from one stage to another and another. The attachment data might be needed at some point, or might not be used at all. But the tax of carrying the binary data is paid by every single message. The threat of a message failing at some point, after just some minor additions, is real. What’s a possible solution? The claim check pattern.
The claim check pattern is one of those patterns borrowed from domains outside software development, where it has been used for centuries. For example, coat check tags were used at theatres where guests would leave their coats, receiving tags in exchange. The coats would be stored for the duration of the show and retrieved at the end. Each tag would uniquely identify the coat it was exchanged for. The person with the tag could enjoy the show and sit worry-free, without a bulky coat.
The same pattern is applied to messaging in software development. Look back at the home inspector’s report from above. Just like a coat at the theatre, digital images are bulky and might not be required for processing. They could be stored elsewhere and retrieved when required. Azure offers the Storage service for exactly this purpose. Storage blobs are cheap, safe, and reliable. The digital images would be stored as blobs rather than being included in the message body. The message sent would reference the blob identifiers, allowing receiving parties to retrieve the binary data if required.
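A minimal sketch of the sending side, assuming the Azure.Messaging.ServiceBus and Azure.Storage.Blobs SDKs; the connection strings, the reports queue, the report-attachments container, and the attachment-blob property name are all placeholders:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Azure.Storage.Blobs;

public static class ClaimCheckSender
{
    public static async Task SendReportAsync(byte[] inspectionImages, string reportJson)
    {
        // 1. Store the bulky binary data as a blob (the "coat").
        var blobName = Guid.NewGuid().ToString();
        var blobClient = new BlobClient(
            "<storage-connection-string>", "report-attachments", blobName);
        await blobClient.UploadAsync(new MemoryStream(inspectionImages));

        // 2. Send a small message carrying only a reference (the "tag").
        await using var client = new ServiceBusClient("<service-bus-connection-string>");
        ServiceBusSender sender = client.CreateSender("reports");

        var message = new ServiceBusMessage(reportJson) { ContentType = "application/json" };
        message.ApplicationProperties["attachment-blob"] = blobName;
        await sender.SendMessageAsync(message);
    }
}
```

On the receiving side, a consumer that actually needs the images would read the attachment-blob property and download the blob; consumers that don’t can skip the download entirely.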
Consider cleanup of associated data after processing
The claim check pattern is a very powerful pattern. And as with anything powerful, responsibility comes along with it. Messages with only a body do not require any extra work: once a message is processed, it’s completed and gone for good. Messages with attachments stored as external blobs do require additional work. Once a message is processed, the attachment has to be removed as part of the cleanup. While the actual cleanup itself might not be too sophisticated a process, the devil is in the details. Consider a message that represents an event. An event is a broadcast message and can have an unbounded number of consumers. Neither the sender nor the individual consumers know when a blob representing an attachment can be safely removed.
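For the simple point-to-point case, where a queue has a single logical consumer, cleanup can happen right after processing. A hedged sketch, reusing the hypothetical attachment-blob property and report-attachments container from the earlier example:

```csharp
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Azure.Storage.Blobs;

await using var client = new ServiceBusClient("<service-bus-connection-string>");
ServiceBusProcessor processor = client.CreateProcessor("reports");

processor.ProcessMessageAsync += async args =>
{
    // ... process the message body here ...

    // If the message carried a claim check, delete the blob once done.
    if (args.Message.ApplicationProperties.TryGetValue("attachment-blob", out var blobName))
    {
        var blob = new BlobClient(
            "<storage-connection-string>", "report-attachments", (string)blobName);
        await blob.DeleteIfExistsAsync();
    }

    await args.CompleteMessageAsync(args.Message);
};
processor.ProcessErrorAsync += args => Task.CompletedTask; // log in real code

await processor.StartProcessingAsync();
```

For the broadcast (event) scenario, though, no single consumer can safely perform this deletion on its own.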
There are several ways to mitigate this complication:
- Leave attachments as-is; storage is cheap
- Define a time-to-live value for attachments that complies with the business needs (see the policy sketch after this list)
- Build a centralized solution that collects metadata about events and consumers, audits and registers that information to determine whether all consumers have processed a given event, and executes cleanup logic based on the collected information
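For the time-to-live option, one possibility is an Azure Storage lifecycle management policy. In the sketch below, the rule name, the container prefix, and the 30-day window are all illustrative; the rule deletes attachment blobs some time after they were last modified:

```json
{
  "rules": [
    {
      "name": "expire-report-attachments",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "report-attachments/" ]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 30 }
          }
        }
      }
    }
  ]
}
```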
Adjust Azure Service Bus message size at run-time
A common scenario is to have messages of the same type that result in different sizes. For example, 95% of messages of a given type would be well below the maximum size, but that pesky 5% would exceed it, causing a message size violation and a failure at run-time. For scenarios like this, the claim check pattern applied conditionally to the message body could result in better overall performance. That is, messages smaller than the maximum allowed size would go through without the hassle of mandatory offloading of the body data into a Storage blob; the latency of that operation would add up quickly for the 95% of messages that don’t need it. For the 5% that would not fit into the quota, the claim check pattern would ensure that “obese” messages go on a diet and can be sent and received.
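A hedged sketch of such a conditional claim check; the 192 KB threshold is an illustrative safety margin below the 256 KB Standard tier quota, leaving room for headers and metadata:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Azure.Storage.Blobs;

public static class ConditionalClaimCheck
{
    // Illustrative safety margin below the 256 KB Standard tier quota.
    private const int MaxInlineBodyBytes = 192 * 1024;

    public static async Task SendAsync(
        ServiceBusSender sender, BlobContainerClient attachments, byte[] body)
    {
        var message = new ServiceBusMessage();

        if (body.Length <= MaxInlineBodyBytes)
        {
            // Small enough: send the body inline, no extra round trip.
            message.Body = BinaryData.FromBytes(body);
        }
        else
        {
            // Too big: offload the body to a blob and send only a reference.
            var blobName = Guid.NewGuid().ToString();
            await attachments.UploadBlobAsync(blobName, new MemoryStream(body));
            message.ApplicationProperties["attachment-blob"] = blobName;
        }

        await sender.SendMessageAsync(message);
    }
}
```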
What’s next?
This post gives you options for dealing with large Azure Service Bus messages. In the next post, we’ll explore how to build a basic claim check pattern in Azure and find out how the Azure Service Bus .NET Standard client offers run-time message size adjustment.