ARAG Insurance seizes the power of automation with data lake technology

European insurance group ARAG SE partnered with Xomnia to improve its way of working and its customer service with data. To do so, it migrated its operations to a data lake and automated the formatting of the sales data supplied by its third-party providers.

The project, known as the Datahub, was successfully completed at the end of 2022, but our collaboration with ARAG SE continues. With it, we aim to satisfy the growing data needs of ARAG’s business and modernize its IT infrastructure with a future-proof data lake.

“Xomnia has helped us, and continues to help us, develop on our own. Our data engineers have learned a lot from the expertise that Xomnia has brought to ARAG and continue to apply the lessons learned during the development of the Datahub.”


The vast majority of ARAG SE’s insurance policies are sold by hundreds of third-party providers, such as major banks. The insurer, however, couldn’t make full use of the data on policies sold through those external partners, because different partners report their sales in different formats and through different channels. Consequently, working with the reported data was tedious and involved a considerable amount of manual work.

The insurer approached Xomnia to collaborate on overcoming this challenge by building the data infrastructure needed to clean and unify all data about its sales, policy portfolio and claims. This would also allow it to generate insights from its various sales data quickly and accurately.


The first challenge that needed to be addressed was replacing ARAG’s traditional in-house data warehouse with a data lake, because the on-premises data warehouse had become outdated and costly to operate and maintain.

Xomnia’s Machine Learning Engineers Dustin van Weersel, Siem van den Reijen and Maarten van Raaij joined ARAG SE’s Datahub team to further develop the data lake and accompanying data marts to replace the old data warehouse. Afterwards, they worked on creating a data pipeline that automates the process of cleaning and transforming all the data on sold insurance policies into a universal format, from which insights could be generated.

“Using certain parameters and settings, we tried to automate this process, and conducted some manual mapping tasks to get to where we want to go, based on research about each distribution partner and the information that they supply,” explained Siem.

“To future-proof the solution, our data pipeline uses Spark, an analytics engine suited for large-scale data pipelines, to transform the data,” added Dustin. “This is combined with Kubernetes, an open-source container orchestration system that allows us to orchestrate and scale all the various parts of our application.”
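The parameter-driven mapping described above can be illustrated with a minimal sketch. It is written in plain Python rather than Spark to stay self-contained, and it is not the Datahub's actual code: the partner names, column names, date formats and the `to_unified` helper are all hypothetical, chosen only to show how per-partner settings can drive one shared transformation.

```python
from datetime import datetime

# Hypothetical per-partner settings: which source column feeds each unified
# field, and how that partner writes dates and decimal numbers.
PARTNER_CONFIG = {
    "bank_a": {
        "columns": {
            "policy_id": "PolicyNo",
            "product": "Product",
            "premium": "Premium",
            "start_date": "StartDate",
        },
        "date_format": "%d-%m-%Y",
        "decimal_comma": True,
    },
    "broker_b": {
        "columns": {
            "policy_id": "id",
            "product": "product_name",
            "premium": "yearly_premium",
            "start_date": "effective",
        },
        "date_format": "%Y/%m/%d",
        "decimal_comma": False,
    },
}


def to_unified(partner: str, row: dict) -> dict:
    """Map one raw partner record onto a single, unified policy schema."""
    cfg = PARTNER_CONFIG[partner]
    cols = cfg["columns"]

    premium = row[cols["premium"]].strip()
    if cfg["decimal_comma"]:
        # Convert "1.250,50" style numbers to "1250.50".
        premium = premium.replace(".", "").replace(",", ".")

    start = datetime.strptime(row[cols["start_date"]], cfg["date_format"])
    return {
        "partner": partner,
        "policy_id": row[cols["policy_id"]].strip(),
        "product": row[cols["product"]].strip().lower(),
        "premium": float(premium),
        "start_date": start.date(),
    }


# Two made-up records, reported in two different partner formats.
raw_a = {"PolicyNo": " A-001 ", "Product": "Legal Basic",
         "Premium": "1.250,50", "StartDate": "01-03-2022"}
raw_b = {"id": "B-77", "product_name": "LEGAL BASIC",
         "yearly_premium": "980.00", "effective": "2022/04/15"}

unified = [to_unified("bank_a", raw_a), to_unified("broker_b", raw_b)]
```

In a setup like this, onboarding a new distribution partner mostly means adding a configuration entry, while the manual mapping work goes into researching what each partner actually supplies.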

After the data has been transformed, it is used to create a data model within SQL Server, a relational database. The resulting data model is then ingested by Power BI to create datasets, which are made available to the business for reporting purposes. Data engineers and data scientists can use the relational data model within SQL Server directly for analyses and modeling purposes.

Since ARAG SE’s specialty is (legal) insurance, it mainly deals with the policies sold to customers and the claims they make on those policies. Therefore, our team is developing separate data models dedicated specifically to policies and claims, which will give the business more insight into its portfolio and help improve the service provided to customers when they have questions or want to make claims on their policies.
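As a rough illustration of how separate policy and claims models in a relational database can be combined for reporting, the sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server. The table layout, column names and sample rows are assumptions made for this example and do not reflect ARAG's actual data model.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified policy and claim tables.
conn.executescript("""
CREATE TABLE policy (
    policy_id TEXT PRIMARY KEY,
    partner   TEXT NOT NULL,
    product   TEXT NOT NULL,
    premium   REAL NOT NULL
);
CREATE TABLE claim (
    claim_id  TEXT PRIMARY KEY,
    policy_id TEXT NOT NULL REFERENCES policy(policy_id),
    amount    REAL NOT NULL
);
""")

# Made-up sample data.
conn.executemany(
    "INSERT INTO policy VALUES (?, ?, ?, ?)",
    [
        ("A-001", "bank_a", "legal basic", 1250.5),
        ("B-77", "broker_b", "legal basic", 980.0),
        ("B-78", "broker_b", "legal plus", 1500.0),
    ],
)
conn.executemany(
    "INSERT INTO claim VALUES (?, ?, ?)",
    [
        ("C-1", "A-001", 300.0),
        ("C-2", "A-001", 200.0),
    ],
)

# Per product: number of policies sold and total amount claimed.
# LEFT JOIN keeps products that have no claims yet.
rows = conn.execute("""
SELECT p.product,
       COUNT(DISTINCT p.policy_id) AS policies,
       COALESCE(SUM(c.amount), 0)  AS claimed
FROM policy AS p
LEFT JOIN claim AS c ON c.policy_id = p.policy_id
GROUP BY p.product
ORDER BY p.product
""").fetchall()

for product, policies, claimed in rows:
    print(product, policies, claimed)
```

A query of this kind is the sort of aggregate a reporting dataset could expose, while analysts can still query the underlying tables directly.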


Following our collaboration, ARAG SE can make better use of its policy and claims data, as well as its internal and external sales data. For instance, it will be able to quickly understand trends in sales and clients, such as its most popular policies among different clientele and geographies. It will also have more detailed insights into the performance of specific coverages within its product offerings.

Apart from the core business of ARAG, the Datahub also serves various internal departments, such as Finance and Control, HR and Sales. For example, consolidating the policy data of all the various reinsurers into a single data model will allow the service center to respond more quickly, because it no longer has to query external APIs, and with more confidence in the information, because there is a single source of truth.

Moreover, the first dashboard, which has already been built, gives increased insight into the profit and loss statements (P&L) and full-time equivalents (FTEs). The dashboard is still being developed to improve the way it delivers those insights. Stay tuned for more developments.