Why do insurers need data streaming? And which technology fits best? (Spoiler: Why Kafka?)

May 26, 2021

By Dariusz Zieliński

Ever since we committed to strengthening our data streaming competency at Sollers Consulting, I have been asked time and again: But why data streaming in insurance? That is when it occurred to me that it was high time to gather the various arguments into one comprehensive article, which I would like to present to you today. Read on to find out more.

Let me start with the basics: What is data streaming?

Data streaming is a mechanism that enables safe and extremely efficient (in terms of throughput) data transfer between systems, continuously and in real time. Period.

The less effort we put into moving data around, the sooner outcomes bring value and the more we can focus on the core business at hand. With that in mind, whenever we see a data flow issue today, it is worth considering a data streaming technology such as Kafka. This is, of course, a generalization, but one that lets us mentally replace the question “why?” with the much more interesting “why not?”.

Going into a little more detail, the data streaming mechanism, for example the one offered by Kafka, relies on the publisher (a sender, in Kafka: a producer) of a piece of data (a message) not sending it directly to the subscriber (a receiver, in Kafka: a consumer). Instead, the publisher classifies the message (tags it, “adds a topic”, or “puts it in one stream”), and the message is stored centrally. Receivers then subscribe to a central hub where messages are published (a broker, for example Kafka itself) to receive certain classes of messages.
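To make the publish/subscribe idea concrete, here is a minimal in-memory sketch. It is not Kafka’s client API: `ToyBroker`, `publish`, `read`, and the topic names are hypothetical, chosen only to show that a producer addresses a topic rather than a specific receiver.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical stand-in for a broker: stores messages per topic in append order."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> list of messages

    def publish(self, topic, message):
        # The producer never addresses a receiver directly; it only names a topic.
        self.topics[topic].append(message)

    def read(self, topic, offset=0):
        # Subscribers pull the messages of a topic from a given position onwards.
        return self.topics[topic][offset:]


broker = ToyBroker()
broker.publish("claims", {"claim_id": 1, "amount": 1200})
broker.publish("policies", {"policy_id": "P-7", "status": "issued"})
broker.publish("claims", {"claim_id": 2, "amount": 300})

# A claims-handling subscriber only sees messages tagged with the "claims" topic.
claims_seen = broker.read("claims")
print(claims_seen)
```

The producer and the subscriber never reference each other; the topic name is the only contract between them.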

For efficiency’s sake, messages are written to the broker in batches, each belonging to a single topic. Topics are in turn broken down into smaller, more technical units called partitions. A stream is considered to be a single topic of data, regardless of the number of partitions.
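The sketch below illustrates how one topic might be split into partitions and written in batches. It assumes keyed messages and uses CRC32 as a stand-in hash (Kafka’s default partitioner hashes record keys with murmur2); the partition count, batch size, and the `telematics` topic name are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for one topic
BATCH_SIZE = 2      # hypothetical number of messages per batch


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32,
    # but the principle is the same: one key always maps to one partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def write_in_batches(messages, batch_size=BATCH_SIZE):
    """Group (key, value) messages per partition, then cut each group into batches."""
    per_partition = defaultdict(list)
    for key, value in messages:
        per_partition[partition_for(key)].append((key, value))
    batches = []
    for partition, msgs in sorted(per_partition.items()):
        for start in range(0, len(msgs), batch_size):
            batches.append((partition, msgs[start:start + batch_size]))
    return batches


# One topic ("telematics", hypothetical) is one stream, however many partitions it has.
events = [("car-1", 50), ("car-2", 80), ("car-1", 55), ("car-3", 40), ("car-1", 60)]
for partition, batch in write_in_batches(events):
    print(partition, batch)
```

Because the same key always lands in the same partition, the order of one vehicle’s readings is preserved within that partition, while different vehicles can be spread across partitions.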

Kafka owes its remarkable performance to partitioning: splitting each topic into partitions lets the data be spread across servers and read in parallel, which allows the entire system to scale. Distributing messages across partitions, reading them back, and handling errors are all managed by Kafka itself.
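As an illustration of why partitions help scaling, here is a simplified sketch of how a group of consumers might share a topic’s partitions. The round-robin rule is an assumption chosen for readability (Kafka ships several assignment strategies), and the consumer names and partition count are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin style.

    Each partition goes to exactly one consumer in the group.
    """
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]  # hypothetical: one topic with six partitions
print(assign_partitions(partitions, ["consumer-a"]))
print(assign_partitions(partitions, ["consumer-a", "consumer-b", "consumer-c"]))
```

With one consumer, it reads all six partitions; with three, each reads two. Adding consumers (up to the number of partitions) spreads the work, while consumers beyond that number would receive no partitions.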

The final feature especially worth mentioning is configurable retention of messages, which means durable storage of messages for a defined period.

In other words, without the technical jargon: if we want to send a message (data) from point A to point B, we send it to the central hub with the appropriate tag, and anyone interested in that tag (not only point B) can read it from there for a specific, limited period of time, after which the message is deleted.
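A minimal sketch of time-based retention, assuming a hypothetical seven-day limit and a plain list standing in for the stored stream. In Kafka itself retention is configured per topic (for example with `retention.ms`), and old data is removed in whole log segments rather than message by message, so deletion there is coarser than in this toy version.

```python
from dataclasses import dataclass

RETENTION_SECONDS = 7 * 24 * 3600  # hypothetical retention of 7 days


@dataclass
class StoredMessage:
    timestamp: float  # seconds since the start of the simulation
    value: str


def apply_retention(log, now, retention=RETENTION_SECONDS):
    """Drop messages older than the retention period; newer ones stay readable."""
    return [m for m in log if now - m.timestamp < retention]


day = 24 * 3600
log = [
    StoredMessage(timestamp=0 * day, value="quote requested"),
    StoredMessage(timestamp=5 * day, value="policy issued"),
    StoredMessage(timestamp=9 * day, value="claim reported"),
]
# On day 10, anything written 7 or more days ago is gone.
remaining = apply_retention(log, now=10 * day)
print([m.value for m in remaining])
```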

Where is the value and which use cases are currently most discussed in insurance?

In preparing to write this piece, I researched numerous case studies of Kafka-based solutions. The overwhelming majority of key takeaways pointed to “not neglecting IT infrastructure” and “focusing on processes first”. On top of that, I had some fascinating discussions with our insurance clients about the reasons for, and the effects of, using data streaming.

Kafka is used wherever large numbers of messages are sent between applications, particularly where the collected data undergoes processing and transformation at multiple stages. Does that ring a bell?

Typical use cases include the increasingly popular usage-based insurance, prevention, and individual risk assessment with automated underwriting; in other words, processing various data streams from IoT devices, wearables, or user behaviour in any channel.
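Multi-stage processing of this kind can be sketched as a chain of stages, each reading one topic and writing its results to the next. This is an in-memory illustration only: the topic names and the speeding rule for telematics readings are hypothetical, and in a real deployment each stage would run continuously as its own consumer/producer application (or with a library such as Kafka Streams) rather than in a single pass.

```python
from collections import defaultdict

topics = defaultdict(list)  # hypothetical in-memory topics: name -> messages


def stage(source, target, transform):
    """Read every message from one topic, transform it, publish the result to the next."""
    for message in topics[source]:
        result = transform(message)
        if result is not None:  # a stage may also filter messages out
            topics[target].append(result)


# Stage 0: raw readings arrive, e.g. from a telematics device.
topics["raw-readings"] += [
    {"device": "d1", "speed_kmh": "72"},
    {"device": "d2", "speed_kmh": "n/a"},
    {"device": "d1", "speed_kmh": "131"},
]


def parse(msg):
    # Stage 1: clean and type the data; drop unreadable readings.
    try:
        return {"device": msg["device"], "speed_kmh": int(msg["speed_kmh"])}
    except ValueError:
        return None


def flag(msg):
    # Stage 2: enrich with a hypothetical business rule (speeding above 120 km/h).
    return {**msg, "speeding": msg["speed_kmh"] > 120}


stage("raw-readings", "clean-readings", parse)
stage("clean-readings", "flagged-readings", flag)
print(topics["flagged-readings"])
```

Because every intermediate result is itself a topic, further consumers (pricing, prevention, reporting) could read `clean-readings` or `flagged-readings` without touching the earlier stages.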

With data streaming technology, we can improve the customer journey through deep personalization. Real-time analytics and fraud detection, where “reporting” time is of colossal importance, are also classic use cases where streaming brings value.

We should also mention cross-industry use cases such as user activity tracking, real-time calculation of all kinds of metrics, and sometimes logging.
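Real-time fraud detection and metric calculation often come down to counting events over a moving time window. The following sketch assumes events arrive in timestamp order and uses a hypothetical rule (more than two claims on the same policy within one hour raises an alert); it illustrates the windowing technique, not a fraud model.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # hypothetical window: one hour
MAX_CLAIMS = 2         # hypothetical rule: more than 2 claims per hour is suspicious


class SlidingWindowCounter:
    """Count events per key over a sliding time window, updated as each event arrives."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = defaultdict(deque)  # key -> timestamps still inside the window

    def add(self, key, timestamp):
        q = self.events[key]
        q.append(timestamp)
        # Evict timestamps that have slid out of the window.
        while q and timestamp - q[0] >= self.window:
            q.popleft()
        return len(q)


counter = SlidingWindowCounter()
claims = [("P-1", 0), ("P-2", 100), ("P-1", 600), ("P-1", 1200), ("P-1", 5000)]
alerts = []
for policy, ts in claims:  # events are processed one by one, in arrival order
    if counter.add(policy, ts) > MAX_CLAIMS:
        alerts.append((policy, ts))
print(alerts)
```

The alert fires at the moment the third claim arrives, not after a nightly batch, which is exactly where “reporting” time matters.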

Still, going back to the aforementioned “reporting” topic, I would put it forward as the key argument and business case in favor of implementing Kafka in an insurance organization. Why?

Kafka is not just a project. It is THE project. It is a transformation that cuts across the whole organization on several levels: process, human, organizational and, of course, infrastructural.

It is a huge undertaking; however, one should start small. Take a process that will let you see immediate returns and help others understand the value Kafka brings.

Insurers today need 20+ hours to prepare daily reports and handle the flow of data through their batches. With data streaming, it will take mere minutes, not hours. It is a tremendously money-saving project. For instance, data preparation that today requires 2 FTEs for 2 weeks should take only one hour with Kafka, and there are hundreds of such activities per month. The calculation is a simple multiplication, and the freed capacity can be used elsewhere to generate more value.
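The “simple multiplication” can be written out as a small function. The 2 FTEs, 2 weeks, and one hour come from the example above; the 40-hour working week, counting the one hour as one person-hour, and the figure of 100 activities per month (a conservative reading of “hundreds”) are assumptions to be replaced with your own numbers.

```python
HOURS_PER_WEEK = 40  # assumption: a full-time week of 40 working hours


def hours_saved_per_month(fte, weeks, streaming_hours, activities_per_month,
                          hours_per_week=HOURS_PER_WEEK):
    """Person-hours freed per month when a batch-era activity is replaced by streaming."""
    batch_hours = fte * weeks * hours_per_week  # effort per activity today
    saved_per_activity = batch_hours - streaming_hours
    return saved_per_activity * activities_per_month


# 2 FTEs for 2 weeks today vs. one (person-)hour with streaming;
# the activity count is a hypothetical stand-in for "hundreds per month".
print(hours_saved_per_month(fte=2, weeks=2, streaming_hours=1, activities_per_month=100))
```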

Without streaming, you are doomed to struggle with delays caused by queuing and batches. Moreover, when something along the way does not work as it should, there is also the problem of handling “incomplete” message transmission. That puts quite a lot on your plate, doesn’t it? Professionals know how to deal with it in a “traditional batch” mode, but I encourage you to calculate at what cost.
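One way a streaming platform deals with incomplete transmissions is to let each consumer record how far it has successfully processed a stream and resume from there after a failure. The sketch below models this with a plain list and a hypothetical `state` dictionary; in Kafka the rough equivalent is committing consumer offsets only after processing (for example with automatic commits disabled). The payment messages and the one-time failure are made up for the illustration.

```python
def consume(log, state, process):
    """Process messages from the last committed offset; commit only after success."""
    while state["committed"] < len(log):
        message = log[state["committed"]]
        process(message)         # may raise; the offset is then not advanced
        state["committed"] += 1  # "commit" only after the message is fully handled


log = ["payment-1", "payment-2", "payment-3"]
state = {"committed": 0}
handled = []
fail_once = {"payment-2"}  # hypothetical: the downstream system fails once on this message


def process(message):
    if message in fail_once:
        fail_once.discard(message)  # fail on the first attempt only
        raise RuntimeError(f"downstream system unavailable for {message}")
    handled.append(message)


try:
    consume(log, state, process)
except RuntimeError as err:
    print("crashed:", err, "| resume from offset", state["committed"])

consume(log, state, process)  # restart: picks up exactly at the failed message
print(handled)
```

This gives at-least-once processing: if a crash happened after a message was handled but before the offset was advanced, that message would be processed again, so downstream steps should tolerate duplicates.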

Why do I think it should be Kafka?

The answer is simple: because it is vendor-agnostic, and it has been open source since 2011. There are several alternative tools for implementing efficient and secure data streaming; however, most of them are strongly associated with one supplier, be it cloud-based solutions from Amazon and Google or an ecosystem of cloud-based and on-premise solutions such as the one provided by Microsoft. Kafka integrates well with multiple platforms and technologies, on both the producer and the consumer side. Support for multiple consumers is another important feature of Kafka: any consumer can read any stream of messages without interfering with the others. Before Kafka, a frequent limitation of message queuing solutions was that once a message had been consumed by one client, it was no longer available to any other. Kafka handles this issue perfectly: messages stay in the stream, and each consumer keeps track of its own position in it.
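The difference between a classic queue and a Kafka-style log can be shown in a few lines. Both classes are hypothetical simplifications: the queue removes a message when it is read, while the log keeps its messages and tracks a separate offset per consumer group (the group names are made up for the example).

```python
from collections import deque


class ClassicQueue:
    """Destructive read: once one client takes a message, it is gone for everyone."""

    def __init__(self, messages):
        self.items = deque(messages)

    def take(self):
        return self.items.popleft() if self.items else None


class ToyLog:
    """Non-destructive read: every consumer group keeps its own offset into the stream."""

    def __init__(self, messages):
        self.items = list(messages)
        self.offsets = {}  # consumer group -> next offset to read

    def take(self, group):
        offset = self.offsets.get(group, 0)
        if offset >= len(self.items):
            return None
        self.offsets[group] = offset + 1
        return self.items[offset]


events = ["claim-1", "claim-2"]

# Classic queue: two applications compete for the same messages.
queue = ClassicQueue(events)
first_from_queue = queue.take()   # taken by fraud scoring
second_from_queue = queue.take()  # taken by reporting, which never sees claim-1

# Log: each consumer group reads the full stream independently.
log = ToyLog(events)
fraud = [log.take("fraud-scoring") for _ in events]
reporting = [log.take("reporting") for _ in events]
print(fraud, reporting)
```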

Final notes: Streaming vs. good old batches – what would Kafka be better at?

While Kafka supports real-time messaging, it does not have to be used that way. You can look at it as a very efficient mechanism that replaces classic batches. It is highly adjustable: it can, for example, be slowed down to “once a day” or sped up to “every hour”, allowing you to scale and innovate in the future.
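To illustrate that real time is a choice rather than a requirement, the sketch below lets two hypothetical consumers read the same stored stream: one after every new message, the other once at the end of a simulated day. The hourly events and the `read_since` helper are assumptions made for the example.

```python
def read_since(log, offset):
    """Return all messages after the consumer's offset, plus the new offset."""
    return log[offset:], len(log)


log = []
realtime_seen, daily_seen = [], []
realtime_offset = daily_offset = 0

for hour in range(24):            # one simulated day
    log.append(f"event-{hour}")   # one hypothetical message per hour
    # Real-time consumer: polls after every new message.
    new, realtime_offset = read_since(log, realtime_offset)
    realtime_seen.extend(new)

# Batch-style consumer: runs once at the end of the day against the same stream.
new, daily_offset = read_since(log, daily_offset)
daily_seen.extend(new)

print(realtime_seen == daily_seen, len(daily_seen))
```

Both consumers end up with the same data; only the timing differs. As long as the stream is retained long enough, moving the daily consumer to an hourly or continuous schedule later needs no change on the producer side.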
The matter of future developments is the key argument here and the focal point of business cases in large insurance organizations. Do you ever find yourself experiencing FOMO (fear of missing out)?
If you struggle with your data flow today, how do you think you will manage in 5 years? After all, the volume of data transmission is only going to grow.

Author: Dariusz Zieliński, Lead Consultant at Sollers Consulting


For the last 13 years, Dariusz has been associated with financial services, mostly at the junction of software and insurance. He has been involved in several successful transformations and projects, among them establishing greenfield start-up insurance companies. His area of expertise is mainly P&C insurance, with a specialisation in claims and sales processes. He was also responsible for the program that created and launched a P&C core insurance system developed from scratch.

At Sollers Consulting, Dariusz is responsible for the Data Competency, with the goal of enabling our customers to become truly data-driven companies.
