Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. Created at LinkedIn and open-sourced in 2011, Kafka has since evolved into a full-fledged event streaming platform, and many enterprises have already implemented it or plan to in the near future.
With trillions of transactions flowing between integrated systems, exceptions are bound to occur. These exceptions must be handled gracefully, in keeping with SLAs and with the least impact on other transactions.
This article discusses some approaches to handling exceptions for specific business scenarios.
Reporting/Statistics:
Consumers that read messages for reporting purposes may tolerate some exceptions, as long as the overall reported data is not skewed by those errors. Such consumers can monitor their error percentage and, if it crosses a certain threshold, delay reporting until the exceptions are cleared within the reporting SLAs. For example, a consumer that reports the number of payments over $100 in the last hour, along with an error percentage, can continue to report regardless of the error threshold. However, a consumer reporting total sales for the day may need to wait until the error percentage clears; otherwise it may display an incorrect figure.
Note: this assumes the error is with message processing, not with the sales process itself.
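As an illustration, below is a minimal Java sketch of the error-threshold idea using the standard Kafka consumer client. The topic name, the 2% threshold, and the assumption that each message value is a sale amount are all hypothetical, and a single poll is used for brevity; real reporting logic would be considerably richer.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ErrorThresholdReporter {

    // Hypothetical threshold: hold the report if more than 2% of messages failed.
    private static final double MAX_ERROR_RATE = 0.02;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "daily-sales-report");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        long processed = 0, failed = 0;
        double totalSales = 0.0;

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("sales"));   // assumed topic name
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                try {
                    totalSales += Double.parseDouble(record.value()); // simplified "processing"
                    processed++;
                } catch (RuntimeException e) {
                    failed++;                                         // count the error, keep consuming
                }
            }
        }

        long total = processed + failed;
        double errorRate = total == 0 ? 0.0 : (double) failed / total;
        if (errorRate <= MAX_ERROR_RATE) {
            System.out.printf("Total sales: %.2f (error rate %.2f%%)%n", totalSales, errorRate * 100);
        } else {
            // Error rate too high: delay the report until failed messages are reprocessed.
            System.out.printf("Report held: error rate %.2f%% exceeds threshold%n", errorRate * 100);
        }
    }
}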
Singular Messages:
Topics whose messages are independent of the other messages in the topic don't need to be processed in the sequence in which they arrive, and any exceptions raised while processing them can likewise be handled in any order. Examples include product sales, credit card transactions, and news feed events.
Exception-handling:
Approach 1: Insert the offset of the failed message into a new RDBMS table (e.g. MySQL, SQL Server, or Oracle). The table should store the topic name, partition, offset, exception-handling status, and retry count.
The exception-handling process reads through the table and processes the failed messages one by one. Messages processed successfully are marked complete; messages that throw exceptions again have their retry count incremented until the maximum retry count is reached. Messages that reach the maximum retry count are not processed again. Such messages can also be sent to a Dead Letter Queue topic for manual research and handling.
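Below is a rough Java sketch of Approach 1. The rows of the exception table are hard-coded in a list for brevity, the process() method stands in for whatever business logic originally failed, and the SQL updates are indicated only as comments; the topic names, retry limit, and DLQ naming convention are assumptions, not a prescribed implementation.

import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class DbRetryProcessor {

    /** One row of the exception table: topic, partition, offset, retry count. */
    record FailedMessage(String topic, int partition, long offset, int retries) {}

    private static final int MAX_RETRIES = 3;

    public static void main(String[] args) {
        // In practice these rows are read from the RDBMS exception table
        // (topic, partition, offset, status, retries); hard-coded here for brevity.
        List<FailedMessage> pending = List.of(new FailedMessage("payments", 0, 42L, 1));

        Properties cProps = new Properties();
        cProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        cProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {

            for (FailedMessage fm : pending) {
                // Re-read exactly the failed message by seeking to its stored offset.
                TopicPartition tp = new TopicPartition(fm.topic(), fm.partition());
                consumer.assign(Collections.singletonList(tp));
                consumer.seek(tp, fm.offset());

                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(5))) {
                    if (rec.offset() != fm.offset()) continue;        // only the stored offset
                    try {
                        process(rec);                                 // business logic (assumed)
                        // UPDATE exception table: status = 'COMPLETE'
                    } catch (RuntimeException e) {
                        if (fm.retries() + 1 >= MAX_RETRIES) {
                            // Retries exhausted: forward to the dead-letter topic for manual handling.
                            producer.send(new ProducerRecord<>(fm.topic() + "_dlq", rec.key(), rec.value()));
                            // UPDATE exception table: status = 'DEAD_LETTER'
                        } else {
                            // UPDATE exception table: retries = retries + 1
                        }
                    }
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> rec) {
        // Placeholder for the real business processing that originally failed.
        System.out.println("Reprocessing offset " + rec.offset() + ": " + rec.value());
    }
}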
Approach 2: Publish the offset of the failed message to a separate retry topic, along with its retry count. The consumer of that topic retries processing the message. Messages that throw exceptions again can be published to the next retry topic (e.g. retry_2) and processed again, and so on. Once the maximum number of retries is reached, the message is sent to a Dead Letter Queue topic for manual research and handling.
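The sketch below illustrates the retry-topic routing for Approach 2. Here the failed record's key and value are republished directly (a common variant of the pattern, rather than publishing only the offset), with the retry count carried in a record header; the topic-naming scheme, header name, and retry limit are assumptions.

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryTopicRouter {

    private static final int MAX_RETRIES = 3;                 // assumed retry budget

    private final KafkaProducer<String, String> producer;

    public RetryTopicRouter() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    /** Called from the consumer's catch block when processing a record fails. */
    public void routeFailure(String baseTopic, ConsumerRecord<String, String> failed) {
        int attempts = readRetryCount(failed);

        // Past the retry budget: hand off to the dead-letter topic for manual handling.
        String target = (attempts >= MAX_RETRIES)
                ? baseTopic + "_dlq"
                : baseTopic + "_retry_" + (attempts + 1);     // e.g. orders_retry_1, orders_retry_2, ...

        ProducerRecord<String, String> out =
                new ProducerRecord<>(target, failed.key(), failed.value());
        out.headers().add("retry-count",
                Integer.toString(attempts + 1).getBytes(StandardCharsets.UTF_8));
        producer.send(out);
    }

    private int readRetryCount(ConsumerRecord<String, String> rec) {
        Header h = rec.headers().lastHeader("retry-count");
        return h == null ? 0 : Integer.parseInt(new String(h.value(), StandardCharsets.UTF_8));
    }
}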
Sequential Messages:
Topics whose messages must be processed in sequence may require special exception handling to maintain the order of events. The handling is very similar to that for singular messages, except that related messages must be processed in the same sequence in which they were created. Also, if a message in the sequence throws an error, related messages should be held until the error is resolved.
The offset of the failed message is kept in the DB table, along with the retry count and status. In addition, the key of the failed message is stored in a separate table to track sequential messages. For example, if the messages for a customer must be handled in sequence, the customer id is the key. That customer id is stored in a separate table with its status, and any new message with the same key is stored in the exception table until all errors for that key are cleared.
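The following Java sketch shows how a consumer might guard per-key ordering. An in-memory set stands in for the blocked-keys table and a placeholder method represents the exception-table writes; in practice both would be backed by the RDBMS tables described above.

import java.util.HashSet;
import java.util.Set;

import org.apache.kafka.clients.consumer.ConsumerRecord;

public class SequentialExceptionGuard {

    // Stands in for the "blocked keys" table (e.g. customer ids with unresolved errors);
    // in practice this is read from and written to the RDBMS.
    private final Set<String> blockedKeys = new HashSet<>();

    public void handle(ConsumerRecord<String, String> record) {
        String key = record.key();                      // e.g. customer id

        if (blockedKeys.contains(key)) {
            // An earlier message for this key failed: park this one too,
            // so per-key ordering is preserved when the errors are cleared.
            parkInExceptionTable(record);
            return;
        }

        try {
            process(record);                            // business logic (assumed)
        } catch (RuntimeException e) {
            blockedKeys.add(key);                       // INSERT key into the blocked-keys table
            parkInExceptionTable(record);               // INSERT offset/retries into the exception table
        }
    }

    /** Called once all parked messages for a key have been reprocessed successfully. */
    public void unblock(String key) {
        blockedKeys.remove(key);                        // DELETE key from the blocked-keys table
    }

    private void process(ConsumerRecord<String, String> record) {
        System.out.println("Processing " + record.key() + " @ " + record.offset());
    }

    private void parkInExceptionTable(ConsumerRecord<String, String> record) {
        System.out.println("Parked " + record.key() + " @ " + record.offset());
    }
}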
If the maximum number of retries is reached, the message can be sent to a Dead Letter Queue topic, along with all related messages, for further analysis and handling.
We hope this helped you understand how to handle business exceptions with Apache Kafka. If your organization has any additional questions about this distributed event streaming platform, please don’t hesitate to reach out to us. We’d love to help you get started. Lastly, to achieve digital transformation in record time, check out our Enterprise Architecture Modernization Kickstart.
References for further reading:
https://www.confluent.io/what-is-apache-kafka/
https://eng.uber.com/reliable-reprocessing/